How to calculate the average of random data in R


I'm new to R. I have a large file with multiple columns and I've been asked to split the data into 2 parts. I have R split the data randomly by 70% into a group called nTrain, and 30% into a group called nTest.

I was able to split the data randomly, but I now need to calculate the AVERAGE of a specific column in the 70% random data and do the same for the 30% random data. Can someone please explain how to do so?

Thanks.

If it helps understand my situation, this is what I have so far in R:

length(DataFile)
(nData=nrow(DataFile))
DataFile
set.seed(0)
(trainIdx<- sample(seq(1,nrow(DataFile)), floor(nrow(DataFile)*0.70)))
> (nTrain=length(trainIdx))
[1] 15129
> (nTest=nData-nTrain)
[1] 6484

edited Nov 24 '18 at 12:40 by Roman

asked Nov 24 '18 at 6:53 by R Newbie

Welcome to StackOverflow! Please read the info about how to ask a good question and how to give a reproducible example. This will make it much easier for others to help you.

– Ronak Shah
Nov 24 '18 at 8:00

Thanks for the advice Ronak. I will read the info on how to ask a good question and how to give a reproducible example.

– R Newbie
Nov 25 '18 at 6:28


1 Answer

Welcome to Stackoverflow!

By R convention, you should stick to the <- operator for most types of assignments (you can find more info here and here).
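To make the difference concrete, here is a minimal sketch of how = and <- behave inside a function call. The function name assignment_demo and the variable names are made up for illustration only; they are not part of the original post.

```r
# Hypothetical helper, used only to keep the demo variables in their own scope
assignment_demo <- function() {
  # Inside a call, `=` only names the argument; no variable x is created
  m1 <- mean(x = c(1, 2, 3))
  x_after_equals <- exists("x", inherits = FALSE)

  # `<-` inside a call assigns x in the calling frame, then passes it on
  m2 <- mean(x <- c(10, 20, 30))
  x_after_arrow <- exists("x", inherits = FALSE)

  list(m1 = m1, m2 = m2,
       x_after_equals = x_after_equals,
       x_after_arrow = x_after_arrow)
}

result <- assignment_demo()
result
```

At the top level of a script both operators assign, so the question's code still runs; the distinction only shows up inside calls like the ones above.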

The code/output you posted is incomplete, really (e.g., the output after the first line, length(DataFile), is missing).

Let's go through this step by step.

1. Create mock data

set.seed(1701)
DataFile <- sample(seq(0, 1, 0.01), 10000, replace = TRUE)

2. Create a dataset

# This randomizes the order
DataSet <- sample(DataFile)

3. Split Train and Test

# floor() keeps the split point a whole number
split <- floor(length(DataSet) * 0.7)
# You use length() for one-dimensional objects, and
# nrow() for matrices, tables, etc.

DataTrain <- head(DataSet, split)
DataTest <- tail(DataSet, length(DataSet) - split)

# With a whole-number split point this avoids rounding errors when
# splitting, and as our dataset is already randomized we can split linearly.

4. Calculate average

> mean(DataTrain)
[1] 0.5029891
> mean(DataTest)
[1] 0.496056
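The same split can also be sketched with an index vector, which is closer to the trainIdx approach in the question. This is only a sketch on mock data: values, trainPart and testPart are illustrative names, and runif() stands in for the real data.

```r
set.seed(0)

# Mock one-dimensional data standing in for the real file (assumption)
values <- runif(1000)

# Draw 70% of the positions at random, without replacement
trainIdx <- sample(seq_along(values), floor(length(values) * 0.70))

trainPart <- values[trainIdx]    # the sampled 70%
testPart  <- values[-trainIdx]   # everything that was not sampled

mean(trainPart)
mean(testPart)
```

Negative indexing (values[-trainIdx]) drops the sampled positions, so the two parts never overlap and together cover the whole vector.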

edited Nov 24 '18 at 12:52

answered Nov 24 '18 at 10:35 by Roman

Thanks Roman for the step-by-step instructions. I will try your process and let you know if I can get it to work.

– R Newbie
Nov 25 '18 at 1:09

The file that I'm pulling the data from has 20 columns with headers. I need to pull the average of only 1 of the columns, for only 70% of the data. Can you explain how I can do this? I appreciate your help!

– R Newbie
Nov 25 '18 at 1:38

Can you post a dput(head(data)) into your original post and specify the column? As a general approach, @AdamB showed the right method if you work with table-shaped data.

– Roman
Nov 25 '18 at 1:42

I was able to solve my issue! Thanks Roman!

– R Newbie
Nov 25 '18 at 2:08
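For the table-shaped case raised in the comments (many columns, average of one column per part), a minimal sketch could look like the following. The mock data frame and the column name price are assumptions for illustration, since the real column was never specified, and this is not necessarily the method referred to above.

```r
set.seed(0)

# Mock table standing in for the real 20-column file (assumption);
# "price" is a hypothetical name for the column of interest.
DataFile <- data.frame(
  id    = 1:200,
  price = runif(200, min = 0, max = 100),
  group = sample(c("a", "b"), 200, replace = TRUE)
)

nData    <- nrow(DataFile)
trainIdx <- sample(seq_len(nData), floor(nData * 0.70))

# Row subsets: sampled rows for training, all remaining rows for testing
trainData <- DataFile[trainIdx, ]
testData  <- DataFile[-trainIdx, ]

# Average of one column in each part
trainMean <- mean(trainData$price)
testMean  <- mean(testData$price)

trainMean
testMean
```

If the column contains missing values, mean(trainData$price, na.rm = TRUE) skips them instead of returning NA.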


