How to calculate the average of random data in R
I'm new to R. I have a large file with multiple columns, and I've been asked to split the data into two parts. I had R split the data randomly: 70% into a group called nTrain and 30% into a group called nTest.
I was able to split the data randomly, but I now need to calculate the AVERAGE of a specific column in the 70% random data, and do the same for the 30% random data. Can someone please explain how to do so?
Thanks.
If it helps to understand my situation, this is what I have so far in R:
length(DataFile)
(nData=nrow(DataFile))
DataFile
set.seed(0)
(trainIdx<- sample(seq(1,nrow(DataFile)), floor(nrow(DataFile)*0.70)))
> (nTrain=length(trainIdx))
[1] 15129
> (nTest=nData-nTrain)
[1] 6484
edited Nov 24 '18 at 12:40 by Roman
asked Nov 24 '18 at 6:53 by R Newbie
Welcome to StackOverflow! Please read the info about how to ask a good question and how to give a reproducible example. This will make it much easier for others to help you.
– Ronak Shah
Nov 24 '18 at 8:00
Thanks for the advice Ronak. I will read the info on how to ask a good question and how to give a reproducible example.
– R Newbie
Nov 25 '18 at 6:28
1 Answer
Welcome to Stack Overflow!
- By R convention, you should stick to the <- operator for most types of assignment (you can find more info here and here).
- The code/output you posted is incomplete (e.g., the output after the first line, length(DataFile), is missing).
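One concrete reason for the convention, shown as a small sketch (the variable name x is only illustrative): inside a function call, = merely names an argument, whereas <- creates or overwrites a variable in the calling environment as a side effect.

```r
# Start from a clean slate so the checks below are meaningful
if (exists("x", inherits = FALSE)) rm(x)

# `=` inside a call only names the argument `x` of median()
median(x = 1:10)
print(exists("x", inherits = FALSE))  # FALSE: no variable was created

# `<-` inside a call assigns 1:10 to `x`, then passes that value to median()
median(x <- 1:10)
print(exists("x", inherits = FALSE))  # TRUE: `x` now exists
```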
Let's go through this step by step.
1. Create mock data
set.seed(1701)
DataFile <- sample(seq(0, 1, 0.01), 10000, replace = TRUE)
2. Create a dataset
# This randomizes the order
DataSet <- sample(DataFile)
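A quick sanity check, reusing the mock data from step 1: sample(x) called without a size argument returns a random permutation of x, so the shuffled copy holds exactly the same values in a new order. (One documented pitfall: if x were a single number n >= 1, sample(x) would sample from 1:n instead.)

```r
set.seed(1701)
DataFile <- sample(seq(0, 1, 0.01), 10000, replace = TRUE)  # mock data from step 1

DataSet <- sample(DataFile)  # random permutation: no size argument, no replacement

# Only the order changes; the values themselves stay the same
print(length(DataSet) == length(DataFile))       # TRUE
print(identical(sort(DataSet), sort(DataFile)))  # TRUE
```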
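3. Split Train and Test
split <- floor(length(DataSet) * 0.7)
# Use length() for one-dimensional objects such as vectors, and
# nrow() for matrices, data frames, etc.
DataTrain <- head(DataSet, split)
DataTest <- tail(DataSet, length(DataSet) - split)
# Flooring the split point once and taking the test set as everything after it
# keeps the two parts from overlapping or losing an element when
# length(DataSet) * 0.7 is not a whole number. Because the dataset is already
# shuffled, we can simply take the first 70% and the remaining 30%.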
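For a row count that is not a multiple of 10, such as the 21,613 rows implied by the output in the question (15129 + 6484), n * 0.7 is not a whole number. Below is a sketch of flooring the split point once (the same floor() the question's code applies) and deriving the test size from it, with a plain index vector standing in for a shuffled dataset:

```r
n <- 15129 + 6484  # 21613, the total implied by the question's output
x <- seq_len(n)    # stand-in for an already shuffled dataset

nTrainRows <- floor(n * 0.7)      # n * 0.7 is 15129.1, so this is 15129
train <- head(x, nTrainRows)      # first 70%
test  <- tail(x, n - nTrainRows)  # everything after the split point

print(c(length(train), length(test)))       # 15129 6484
print(length(train) + length(test) == n)    # TRUE: nothing dropped
print(length(intersect(train, test)) == 0)  # TRUE: no overlap
```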
4. Calculate average
> mean(DataTrain)
[1] 0.5029891
> mean(DataTest)
[1] 0.496056
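The steps above work on a single vector. For a data frame read from a file with headers (the comments mention 20 columns), the row indices from the question's trainIdx approach can select the rows, and $ or [, "name"] selects the column. A minimal sketch, in which the mock data frame and the column name price are hypothetical stand-ins for the real file and column:

```r
set.seed(0)

# Hypothetical stand-in for the real file (e.g. one read with read.csv(..., header = TRUE))
DataFile <- data.frame(
  id    = seq_len(1000),
  price = runif(1000, min = 100, max = 200),  # hypothetical column to average
  rooms = sample(1:5, 1000, replace = TRUE)
)

nData    <- nrow(DataFile)
trainIdx <- sample(seq_len(nData), floor(nData * 0.7))  # 70% of the row numbers

train <- DataFile[trainIdx, ]   # the sampled rows
test  <- DataFile[-trainIdx, ]  # all remaining rows

# Average of one column in each part
mean(train$price)
mean(test$price)

# Equivalent, without building the subsets first
mean(DataFile$price[trainIdx])
mean(DataFile[-trainIdx, "price"])
```

If the real column contains missing values, mean(..., na.rm = TRUE) skips them instead of returning NA.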
edited Nov 24 '18 at 12:52
answered Nov 24 '18 at 10:35 by Roman
Thanks Roman for the step-by-step instructions. I will try your process and let you know if I can get it to work.
– R Newbie
Nov 25 '18 at 1:09
The file that I'm pulling the data from has 20 columns with headers. I need to pull the average of only one of the columns, for only 70% of the data. Can you explain how I can do this? I appreciate your help!
– R Newbie
Nov 25 '18 at 1:38
Can you post a dput(head(data)) into your original post and specify the column? As a general approach, @AdamB showed the right method if you work with table-shaped data.
– Roman
Nov 25 '18 at 1:42
I was able to solve my issue! Thanks Roman!
– R Newbie
Nov 25 '18 at 2:08
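For reference, the dput(head(data)) output that Roman asks for above is R code that recreates the object, which lets others paste your data straight into their session. A sketch with a tiny hypothetical data frame in place of the real one:

```r
# Tiny hypothetical data frame standing in for the real one
df <- data.frame(id = 1:3, price = c(120.5, 180.25, 99.75))

# Prints code such as structure(list(id = 1:3, ...), ...) that rebuilds df
dput(head(df))

# Round trip: evaluating the printed code gives back the same data
rebuilt <- eval(parse(text = capture.output(dput(head(df)))))
print(all.equal(rebuilt, df))  # TRUE
```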