How to calculate the average of random data in R



























I'm new to R. I have a large file with multiple columns, and I've been asked to split the data into two parts. I had R split the data randomly, 70% into a group called nTrain and 30% into a group called nTest.

I was able to split the data randomly, but I now need to calculate the average of a specific column in the 70% sample and do the same for the 30% sample. Can someone please explain how to do so?



Thanks.



If it helps understand my situation, this is what I have so far in R:



length(DataFile)

(nData=nrow(DataFile))

DataFile

set.seed(0)

(trainIdx<- sample(seq(1,nrow(DataFile)), floor(nrow(DataFile)*0.70)))

> (nTrain=length(trainIdx))
[1] 15129

> (nTest=nData-nTrain)
[1] 6484










  • Welcome to StackOverflow! Please read the info about how to ask a good question and how to give a reproducible example. This will make it much easier for others to help you.

    – Ronak Shah
    Nov 24 '18 at 8:00











  • Thanks for the advice Ronak. I will read the info on how to ask a good question and how to give a reproducible example.

    – R Newbie
    Nov 25 '18 at 6:28
















edited Nov 24 '18 at 12:40 by Roman

asked Nov 24 '18 at 6:53 by R Newbie














1 Answer

Welcome to Stack Overflow!

  1. By R convention, you should stick to the <- operator for most types of assignment.

  2. The code/output you posted is incomplete (e.g., the output after the first line, length(DataFile), is missing).
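
To illustrate the first point, here is a minimal sketch of one practical difference between the two operators. The variable names are made up for the example. Inside a function call, = only names an argument, while <- creates a variable and then passes its value along.

```r
# Inside a call, `=` only names an argument; no variable is created.
m1 <- mean(x = c(1, 2, 3))
exists("x", inherits = FALSE)   # FALSE in a fresh session

# `<-` inside a call assigns first, then passes the value to the function.
m2 <- mean(y <- c(1, 2, 3))
exists("y")                     # TRUE: y now holds c(1, 2, 3)
```

Both calls return the same mean. The only difference is whether a variable is left behind, which is why <- is the usual choice for assignment and = is kept for naming arguments.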


Let's go through this step by step.



1. Create mock data



set.seed(1701)
DataFile <- sample(seq(0, 1, 0.01), 10000, replace = TRUE)


2. Create a dataset



# This randomizes the order
DataSet <- sample(DataFile)


3. Split Train and Test



# floor() keeps the split point a whole number
split <- floor(length(DataSet) * 0.7)
# You use length() for one-dimensional objects, and
# nrow() for matrices, tables, etc.

DataTrain <- head(DataSet, split)
DataTest <- tail(DataSet, length(DataSet) - split)

# Because head() and tail() share the same whole-number split point,
# every element lands in exactly one group, and as our dataset is
# already randomized we can split it linearly.


4. Calculate average



> mean(DataTrain)
[1] 0.5029891
> mean(DataTest)
[1] 0.496056
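
The steps above work on a plain vector. For table-shaped data, such as the 20-column file described in the question, the same idea can be sketched with row indices. This is only an illustration under assumptions: the data frame and the column name price are invented here, and trainIdx is built the way the question builds it.

```r
set.seed(0)

# Hypothetical stand-in for the real file: 100 rows, 3 columns.
DataFile <- data.frame(
  id    = seq_len(100),
  price = runif(100, min = 0, max = 50),
  qty   = sample(1:10, 100, replace = TRUE)
)

nData <- nrow(DataFile)

# Draw 70% of the row numbers at random, as in the question.
trainIdx <- sample(seq_len(nData), floor(nData * 0.70))

# Rows in trainIdx form the training set; all other rows form the test set.
DataTrain <- DataFile[trainIdx, ]
DataTest  <- DataFile[-trainIdx, ]

# Average of one column in each part.
mean(DataTrain$price)
mean(DataTest$price)

# If the column contains missing values, drop them explicitly.
mean(DataTrain$price, na.rm = TRUE)
```

Indexing with trainIdx and -trainIdx keeps whole rows together, so every column of a given row ends up in the same part.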






  • Thanks Roman for the step-by-step instructions. I will try your process and let you know if I can get it to work.

    – R Newbie
    Nov 25 '18 at 1:09











  • The file that I'm pulling the data from has 20 columns with headers. I need the average of only one of the columns, for only 70% of the data. Can you explain how I can do this? I appreciate your help!

    – R Newbie
    Nov 25 '18 at 1:38











  • Can you post a dput(head(data)) into your original post and specify the column? As a general approach, @AdamB showed the right method if you work with table-shaped data.

    – Roman
    Nov 25 '18 at 1:42











  • I was able to solve my issue! Thanks Roman!

    – R Newbie
    Nov 25 '18 at 2:08
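
On the dput(head(data)) request above: dput() prints R code that recreates an object, so others can copy a small sample of the data straight into their own session. A minimal sketch, with a made-up data frame standing in for the real file:

```r
# Hypothetical data; replace with your own data frame.
DataFile <- data.frame(
  id    = 1:8,
  price = c(10.5, 3, 7.25, 12, 9.5, 4, 6.75, 8)
)

# head() keeps the first six rows; dput() prints code that rebuilds them.
dput(head(DataFile))

# The same text can be captured as a character vector and pasted into a post.
snippet <- capture.output(dput(head(DataFile)))
```

Pasting that output into a question gives readers the column names, types, and a few real values without sharing the whole file.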











edited Nov 24 '18 at 12:52

answered Nov 24 '18 at 10:35 by Roman












