How to calculate regression residuals in R for each individual in a longitudinal analysis?


I am working on a longitudinal/repeated-measures multilevel model (MLM). Usually, for a time-varying covariate (in my case "weekly gross income/1000"), you would calculate a person-mean-centered version of the variable (i.e. subtract the average of the person's weekly income across all of their time points from each person-year income value). However, this can lead to bias (see here), and hence a better (more generalisable) approach is to center around a regression line for each individual; the residuals from that per-person regression serve exactly this purpose.
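For reference, the two kinds of centering can be written as follows. The notation is ours, not the asker's: $x_{ti}$ is the income of person $i$ at time $t$, $\bar{x}_i$ and $\bar{t}_i$ are that person's means over their own time points, and $\hat{\beta}_{0i}$, $\hat{\beta}_{1i}$ are the intercept and slope of that person's own OLS fit of income on time.

```latex
\begin{aligned}
\text{person-mean centering:}\quad & \tilde{x}_{ti} = x_{ti} - \bar{x}_{i} \\
\text{regression centering:}\quad & \tilde{x}_{ti} = x_{ti} - \left(\hat{\beta}_{0i} + \hat{\beta}_{1i}\, t\right)
  = (x_{ti} - \bar{x}_{i}) - \hat{\beta}_{1i}\,(t - \bar{t}_{i}) \\
\text{with}\quad & \hat{\beta}_{1i} = \frac{\sum_{t}(t - \bar{t}_{i})(x_{ti} - \bar{x}_{i})}{\sum_{t}(t - \bar{t}_{i})^{2}},
  \qquad \hat{\beta}_{0i} = \bar{x}_{i} - \hat{\beta}_{1i}\,\bar{t}_{i}
\end{aligned}
```

The second line shows that regression centering is person-mean centering minus the person's own linear time trend.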

Therefore, I need to fit the following regression separately for each individual (roughly 10,000 individuals with 25,000 observations):

lm(Weekly_Gross_Pay_Main_Job~nYear, data=df)

The critical part is that I then need to extract the residuals into a separate column of my main dataset, matched to each person's rows. These residuals will take the place of my group-mean-centered variable (which will in turn be used in my MLM).
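One way to do this in base R (a minimal sketch, not tested on the full 25,000-row data): split the data by person, fit `lm()` within each piece, and use `unsplit()` to put the residuals back in the original row order. `na.action = na.exclude` keeps one residual per row (NA where income is missing), and a person with only one or two observations gets residuals of (numerically) 0, because a line fits them exactly. The toy data frame and the column name `Weekly_Gross_Pay_Main_Jobres` are hypothetical; with the real data you would use `df` from the question.

```r
# Hypothetical toy data in the same long format as the question's df
df <- data.frame(
  Person_ID = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L),
  nYear     = c(1L, 2L, 3L, 2L, 3L, 4L, 6L, 5L),
  Weekly_Gross_Pay_Main_Job = c(0.10, 0.50, 0.30, 0.035, 0.65, 0.195, 0.43, 0.12)
)

# Fit one lm() per person and keep only its residuals
resid_by_person <- lapply(
  split(df, df$Person_ID),
  function(d) residuals(lm(Weekly_Gross_Pay_Main_Job ~ nYear, data = d,
                           na.action = na.exclude))
)

# unsplit() reassembles the per-person pieces in the original row order
df$Weekly_Gross_Pay_Main_Jobres <- unname(unsplit(resid_by_person, df$Person_ID))
```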

Here is a possible starting point: the function I use for group-mean centering. If it could be adapted to fit a regression for each person and return the residuals, that would be ideal (if not, I am open to other approaches):

# Group-mean centering a variable. Relevant for L1 variables only.
gmc = function(variable, group){
  return(ave(variable, group, FUN = function(x){x - mean(x)}))
}

df$Weekly_Gross_Pay_Main_Jobgmc <- gmc(df$Weekly_Gross_Pay_Main_Job, df$Person_ID)
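To stay close to `gmc()`, the same per-person residuals can be computed with `ave()` alone, using the closed-form OLS slope for each person, which avoids fitting thousands of separate `lm()` objects. `grc()` is a hypothetical helper written for illustration (not part of any package). This sketch assumes there are no missing values in the variable or in time. A person with a single time point gets slope 0 and therefore a value of 0, the same as `gmc()` gives.

```r
# Hypothetical helper: center `variable` around each group's own OLS line
# on `time`. Returns (x - mean(x)) - slope * (t - mean(t)) within each group,
# which equals the residuals of lm(variable ~ time) fitted per group.
# Assumes no NAs in `variable` or `time`.
grc <- function(variable, time, group) {
  center <- function(x) x - mean(x)
  t_dev <- ave(time, group, FUN = center)
  x_dev <- ave(variable, group, FUN = center)
  sxy <- ave(t_dev * x_dev, group, FUN = sum)   # per-group cross-product
  sxx <- ave(t_dev^2, group, FUN = sum)         # per-group sum of squares of time
  # Groups with no variation in time (e.g. one observation) get slope 0
  slope <- ifelse(sxx > 0, sxy / sxx, 0)
  x_dev - slope * t_dev
}

# Usage with the question's data frame:
# df$Weekly_Gross_Pay_Main_Jobgrc <- grc(df$Weekly_Gross_Pay_Main_Job, df$nYear, df$Person_ID)
```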

Data extract in long format (Person_ID identifies the person, nYear is time, Weekly_Gross_Pay_Main_Job is weekly income/1000, and Weekly_Gross_Pay_Main_Jobgmc is the group-mean-centered version):

structure(list(Person_ID = c(100003L, 100003L, 100003L, 100006L, 
100006L, 100006L, 100006L, 100010L, 100010L, 100010L, 100010L, 
100010L, 100010L, 100011L, 100014L, 100014L, 100014L, 100014L, 
100014L, 100016L, 100018L, 100018L, 100018L, 100018L, 100018L, 
100018L, 100018L, 100018L, 100018L, 100020L, 100020L, 100020L, 
100020L, 100020L, 100020L, 100020L, 100020L, 100020L, 100021L, 
100021L, 100024L, 100024L, 100024L, 100024L, 100024L, 100024L, 
100024L, 100024L, 100024L, 100024L, 100025L, 100025L, 100025L, 
100025L, 100025L, 100025L, 100025L, 100025L, 100027L, 100027L, 
100027L, 100027L, 100029L, 100029L, 100029L, 100029L, 100029L, 
100031L, 100031L, 100031L, 100032L, 100032L, 100032L, 100033L, 
100033L, 100033L, 100033L, 100033L, 100033L, 100034L, 100034L, 
100034L, 100037L, 100037L, 100037L, 100037L, 100037L, 100037L, 
100037L, 100044L, 100044L, 100044L, 100044L, 100044L, 100044L, 
100044L, 100045L, 100045L, 100045L, 100045L), nYear = c(5L, 6L, 
7L, 2L, 3L, 4L, 6L, 5L, 6L, 7L, 8L, 9L, 10L, 1L, 5L, 6L, 7L, 
8L, 9L, 5L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L, 2L, 3L, 4L, 
5L, 6L, 7L, 8L, 9L, 1L, 2L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 
13L, 14L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L, 2L, 3L, 4L, 5L, 
6L, 7L, 8L, 9L, 4L, 5L, 6L, 1L, 2L, 3L, 3L, 4L, 5L, 6L, 7L, 8L, 
2L, 3L, 5L, 5L, 6L, 7L, 8L, 9L, 11L, 13L, 2L, 3L, 4L, 6L, 7L, 
8L, 9L, 4L, 5L, 6L, 7L), Weekly_Gross_Pay_Main_Job = c(0, 0.58, 
0.35, 0.035, 0.65, 0.195, 0.43, 0, 0, 0, 0, 0, 0, 0.12, 1.653, 
0.967, 1.742, 1.323, 0, 0.709, 0.155, 0.431, 0.235, 0.17, 0.285, 
0.357, 0.28, 0.335, 0.375, 0.111, 0.333, 0.582, 0.882, 0.85, 
0.944, 1.615, 1.615, 1.35, 0.168, 0.08, 0, 0, 0, 0, 0, 0, 0, 
0.134, 0.737, 0, 0.02, 0.372, 0.1, 0.014, 0.307, 0.39, 0.671, 
0.5, 0.278, 0.32, 0.425, 0.4, 0.57, 0.917, 0.75, 0.402, 0.437, 
0.211, 0.537, 0.54, 0.135, 0.15, 0.65, 0.324, 0.399, 0.497, 0.67, 
0.825, 0.825, 0.25, 0.319, 0.35, 0.885, 0.941, 0.975, 0.975, 
1.02, 1.096, 1.148, 0.1, 0.11, 0.413, 0.477, 0.578, 0.686, 0.686, 
0.511, 0.578, 0.8, 0.75), Weekly_Gross_Pay_Main_Jobgmc = c(-0.31, 
0.27, 0.04, -0.2925, 0.3225, -0.1325, 0.1025, 0, 0, 0, 0, 0, 
0, 0, 0.516, -0.17, 0.605, 0.186, -1.137, 0, -0.136444444444444, 
0.139555555555556, -0.0564444444444445, -0.121444444444444, -0.00644444444444447, 
0.0655555555555555, -0.0114444444444444, 0.0435555555555556, 
0.0835555555555555, -0.809222222222222, -0.587222222222222, -0.338222222222222, 
-0.0382222222222223, -0.0702222222222223, 0.0237777777777777, 
0.694777777777778, 0.694777777777778, 0.429777777777778, 0.044, 
-0.044, -0.0871, -0.0871, -0.0871, -0.0871, -0.0871, -0.0871, 
-0.0871, 0.0469, 0.6499, -0.0871, -0.27675, 0.07525, -0.19675, 
-0.28275, 0.01025, 0.09325, 0.37425, 0.20325, -0.07775, -0.03575, 
0.06925, 0.04425, -0.0452, 0.3018, 0.1348, -0.2132, -0.1782, 
-0.218333333333333, 0.107666666666667, 0.110666666666667, -0.176666666666667, 
-0.161666666666667, 0.338333333333333, -0.266, -0.191, -0.093, 
0.0800000000000001, 0.235, 0.235, -0.0563333333333333, 0.0126666666666667, 
0.0436666666666666, -0.120714285714286, -0.0647142857142858, 
-0.0307142857142858, -0.0307142857142858, 0.0142857142857142, 
0.0902857142857143, 0.142285714285714, -0.335714285714286, -0.325714285714286, 
-0.0227142857142857, 0.0412857142857143, 0.142285714285714, 0.250285714285714, 
0.250285714285714, -0.1368, -0.0698000000000001, 0.1522, 0.1022
)), row.names = c(NA, 100L), class = "data.frame")

asked Nov 20 at 3:16 by aspark2020

r regression longitudinal multilevel-analysis


2 Answers

I'm not sure I'm reading you right, and this might be a naive answer that misses the point, but doesn't `residuals()` just work? Here's a linear mixed-effects model (`lme()` from the nlme package) fitted to some data I had lying around:

    library(nlme)
    some.model <- lme(DV ~ IV, random = ~1 | Id, data = df)
    head(residuals(some.model))

           7            7           24           24           32           32 
    -0.054135825 -0.054135825  0.064271638  0.064271638 -0.001975424 -0.001975424

If you want the residuals in a column with the ID number next to them, it takes a few more steps. It can probably be done in a single step, but I'm not very good at this:

    extra.column <- residuals(some.model)
    extra.column.id <- names(extra.column)
    cbind(extra.column, extra.column.id)

       extra.column          extra.column.id
    7  "-0.0541358252373243" "7"
    7  "-0.0541358252373243" "7"
    24 "0.0642716380035857"  "24"
    24 "0.0642716380035857"  "24"
    32 "-0.0019754241828096" "32"
    32 "-0.0019754241828096" "32"

Sorry if this is not what you're looking for, but do check out the `residuals()` function (see `?residuals`).
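Two notes on the above, shown as a base-R sketch with a hypothetical toy data frame and an ordinary pooled `lm()` standing in for the `lme()` fit (so it runs without nlme). First, `cbind()` of a numeric vector with a character vector returns a character matrix, which is why the residuals above print in quotes; assigning the residuals directly as a data-frame column keeps them numeric. Second, with `na.action = na.exclude` the residual vector has one entry per row of the data (NA where the response is missing), so it lines up with the rows without needing the ID names. Note also that residuals from a single mixed model are generally not the same quantity as the per-person regression residuals the question asks for.

```r
# Hypothetical toy data; row 3 has a missing response
toy <- data.frame(
  Id = c(7, 7, 24, 24, 32, 32),
  IV = c(1, 2, 1, 3, 2, 4),
  DV = c(0.2, 0.5, NA, 0.9, 0.4, 0.8)
)

fit <- lm(DV ~ IV, data = toy, na.action = na.exclude)

# cbind() with a character column coerces everything to character
as_matrix <- cbind(residuals(fit), names(residuals(fit)))

# Assigning as a column keeps the residuals numeric and aligned with the rows
toy$resid <- residuals(fit)
```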

answered Nov 22 at 9:39 by Huy Pham

Accepted answer:

Here is how I ended up doing it:

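# Packages: tidyr and dplyr (called with ::), broom for augment(), ggplot2 for the plot.
library(broom)
library(ggplot2)

# Before you begin, grand-mean center time.
df$nYearmc <- df$nYear - mean(df$nYear, na.rm = TRUE)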
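A quick check worth knowing, as a base-R sketch using the values for person 100006 from the question's data extract (the shift of 5.3 is an arbitrary stand-in for the grand mean): because each per-person regression includes an intercept, shifting time by a constant changes the intercept but leaves the residuals unchanged. The centering therefore matters for interpreting intercepts, and possibly for the later MLM, but not for the residual column itself.

```r
# Person 100006 from the question's data extract
year <- c(2, 3, 4, 6)
pay  <- c(0.035, 0.65, 0.195, 0.43)

# Time shifted by a constant (illustrative stand-in for the grand mean)
year_mc <- year - 5.3

res_raw <- residuals(lm(pay ~ year))
res_mc  <- residuals(lm(pay ~ year_mc))

# Residuals agree; intercepts differ by slope * 5.3
all.equal(res_raw, res_mc)
coef(lm(pay ~ year))[["(Intercept)"]]
coef(lm(pay ~ year_mc))[["(Intercept)"]]
```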



# Now regress the time-varying covariate on grand-mean-centered time, person by person.

# First, create a person-year key and group the data by person (`by_Person`).
df <- tidyr::unite(df, Person_Year, c(Person_ID, nYearmc), remove = FALSE)
by_Person <- dplyr::group_by(df, Person_ID)

# Second, regress the time-varying covariate on grand-mean-centered time within each person,
# then merge the augment() output (residuals are in .resid) back into the main data frame.
df.Weekly_Gross_Pay_Main_Job <- dplyr::do(by_Person, augment(lm(Weekly_Gross_Pay_Main_Job ~ nYearmc, data = .)))
df.Weekly_Gross_Pay_Main_Job <- tidyr::unite(df.Weekly_Gross_Pay_Main_Job, Person_Year, c(Person_ID, nYearmc), remove = FALSE)
df <- merge(df, df.Weekly_Gross_Pay_Main_Job, by = "Person_Year")
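The merge above joins on a text key built from `Person_ID` and `nYearmc`; if any person has two rows for the same year, `merge()` returns every combination of the duplicated keys and those rows multiply. A minimal base-R sketch of an alternative that avoids the string key (toy data hypothetical; `row_id`, `pieces` and `res` are illustrative names): tag each row with its position, fit per person, and match the residuals back on that integer ID. No `.x`/`.y` columns are created, so the tidy-up step is not needed.

```r
# Hypothetical toy data in the question's long format
df <- data.frame(
  Person_ID = c(10L, 10L, 10L, 20L, 20L),
  nYear     = c(1L, 2L, 4L, 3L, 5L),
  Weekly_Gross_Pay_Main_Job = c(0.2, 0.4, 0.3, 0.9, 1.1)
)
df$nYearmc <- df$nYear - mean(df$nYear)
df$row_id  <- seq_len(nrow(df))   # integer key, unique by construction

# One small data frame of (row_id, residual) per person
pieces <- lapply(split(df, df$Person_ID), function(d) {
  fit <- lm(Weekly_Gross_Pay_Main_Job ~ nYearmc, data = d, na.action = na.exclude)
  data.frame(row_id = d$row_id, resid = residuals(fit))
})
res <- do.call(rbind, pieces)

# Attach residuals by exact integer match; the row order of df is untouched
df$RegResGrossPay <- res$resid[match(df$row_id, res$row_id)]
```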



# Third, copy the residuals into a new column (renaming .resid would work just as well).
df$RegResGrossPay <- df$.resid

# Fourth (optional), tidy up: restore the original column names and drop the helper columns.
colnames(df)[colnames(df)=="Person_ID.x"] <- "Person_ID"
colnames(df)[colnames(df)=="nYearmc.x"] <- "nYearmc"
colnames(df)[colnames(df)=="Weekly_Gross_Pay_Main_Job.x"] <- "Weekly_Gross_Pay_Main_Job"
df$Person_ID.y <- NULL
df$nYearmc.y <- NULL
df$Weekly_Gross_Pay_Main_Job.y <- NULL
df$.fitted <- NULL
df$.se.fit <- NULL
df$.resid <- NULL
df$.hat <- NULL
df$.sigma <- NULL
df$.cooksd <- NULL
df$.std.resid <- NULL
rm(df.Weekly_Gross_Pay_Main_Job)

# Fifth, plot the regression-centered variable over time for each person.
ggplot(df, aes(nYearmc, RegResGrossPay)) + geom_line(aes(group = Person_ID), alpha = 1/3) + geom_smooth(method = "lm", se = FALSE)

answered Nov 27 at 6:17 by aspark2020

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53385696%2fhow-to-calculate-regression-residuals-in-r-for-each-individual-in-a-longitudinal%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

up vote
1
down vote

not sure if I'm reading you right, this might be a very naive answer missing the point, but doesn't "residuals" just work.
Here's a linear mixed effects model with some data i had lying around

    some.model<-lme(DV~IV, random=~1|Id, data=df)

    head(residuals(some.model))

       7            7           24           24           32           32 

    -0.054135825 -0.054135825  0.064271638  0.064271638 -0.001975424 -0.001975424

If you really want to put it into a column with the idnumber next to it it takes a few more steps. It probably can be done in a single step but i'm really bad.

   extra.column<-residuals(some.model)

   extra.column.id<-names(residuals(some.model))

   extra.column<-residuals(some.model)

   cbind(extra.column,extra.column.id)

   extra.column            extra.column.id

   7    "-0.0541358252373243"   "7"            

   7    "-0.0541358252373243"   "7"            

   24   "0.0642716380035857"    "24"           

   24   "0.0642716380035857"    "24"           

   32   "-0.0019754241828096"   "32"           

   32   "-0.0019754241828096"   "32"

Sorry if this is not what you're looking for, but check out the residuals command.

answered Nov 22 at 9:39

Huy Pham

1315

add a comment |

up vote
1
down vote

not sure if I'm reading you right, this might be a very naive answer missing the point, but doesn't "residuals" just work.
Here's a linear mixed effects model with some data i had lying around

    some.model<-lme(DV~IV, random=~1|Id, data=df)

    head(residuals(some.model))

       7            7           24           24           32           32 

    -0.054135825 -0.054135825  0.064271638  0.064271638 -0.001975424 -0.001975424

If you really want to put it into a column with the idnumber next to it it takes a few more steps. It probably can be done in a single step but i'm really bad.

   extra.column<-residuals(some.model)

   extra.column.id<-names(residuals(some.model))

   extra.column<-residuals(some.model)

   cbind(extra.column,extra.column.id)

   extra.column            extra.column.id

   7    "-0.0541358252373243"   "7"            

   7    "-0.0541358252373243"   "7"            

   24   "0.0642716380035857"    "24"           

   24   "0.0642716380035857"    "24"           

   32   "-0.0019754241828096"   "32"           

   32   "-0.0019754241828096"   "32"

Sorry if this is not what you're looking for, but check out the residuals command.

answered Nov 22 at 9:39

Huy Pham

1315

add a comment |

up vote
1
down vote

not sure if I'm reading you right, this might be a very naive answer missing the point, but doesn't "residuals" just work.
Here's a linear mixed effects model with some data i had lying around

    some.model<-lme(DV~IV, random=~1|Id, data=df)

    head(residuals(some.model))

       7            7           24           24           32           32 

    -0.054135825 -0.054135825  0.064271638  0.064271638 -0.001975424 -0.001975424

If you really want to put it into a column with the idnumber next to it it takes a few more steps. It probably can be done in a single step but i'm really bad.

   extra.column<-residuals(some.model)

   extra.column.id<-names(residuals(some.model))

   extra.column<-residuals(some.model)

   cbind(extra.column,extra.column.id)

   extra.column            extra.column.id

   7    "-0.0541358252373243"   "7"            

   7    "-0.0541358252373243"   "7"            

   24   "0.0642716380035857"    "24"           

   24   "0.0642716380035857"    "24"           

   32   "-0.0019754241828096"   "32"           

   32   "-0.0019754241828096"   "32"

Sorry if this is not what you're looking for, but check out the residuals command.

answered Nov 22 at 9:39

Huy Pham

1315

not sure if I'm reading you right, this might be a very naive answer missing the point, but doesn't "residuals" just work.
Here's a linear mixed effects model with some data i had lying around

    some.model<-lme(DV~IV, random=~1|Id, data=df)

    head(residuals(some.model))

       7            7           24           24           32           32 

    -0.054135825 -0.054135825  0.064271638  0.064271638 -0.001975424 -0.001975424

If you really want to put it into a column with the idnumber next to it it takes a few more steps. It probably can be done in a single step but i'm really bad.

   extra.column<-residuals(some.model)

   extra.column.id<-names(residuals(some.model))

   extra.column<-residuals(some.model)

   cbind(extra.column,extra.column.id)

   extra.column            extra.column.id

   7    "-0.0541358252373243"   "7"            

   7    "-0.0541358252373243"   "7"            

   24   "0.0642716380035857"    "24"           

   24   "0.0642716380035857"    "24"           

   32   "-0.0019754241828096"   "32"           

   32   "-0.0019754241828096"   "32"

Sorry if this is not what you're looking for, but check out the residuals command.

answered Nov 22 at 9:39

Huy Pham

1315

answered Nov 22 at 9:39

Huy Pham

1315

answered Nov 22 at 9:39

Huy Pham

1315

answered Nov 22 at 9:39

Huy Pham

1315

add a comment |

up vote
0
down vote

accepted

Here is how I ended up doing it:

#Before you begin, time needs to be grand-mean centered.

df$nYearmc <- df$nYear-mean(df$nYear, na.rm=TRUE)



#Now to regress the time-varying covariate onto grand-mean centered time and complete the process.



#First, create a group called `by_person`.

df <- tidyr::unite(df, Person_Year, c(Person_ID, nYearmc), remove=FALSE)

by_Person <- dplyr::group_by(df, Person_ID)



#Second, regress the time-varying covariate onto the newly created grand-mean centered time variable and merge with the main data frame.

df.Weekly_Gross_Pay_Main_Job <- dplyr::do(by_Person, augment(lm(Weekly_Gross_Pay_Main_Job~nYearmc, data=.)))

df.Weekly_Gross_Pay_Main_Job <- tidyr::unite(df.Weekly_Gross_Pay_Main_Job, Person_Year, c(Person_ID, nYearmc), remove=FALSE)

df <- merge(df, df.Weekly_Gross_Pay_Main_Job, by="Person_Year")



#Third, copy over the required columns (renaming them would be more efficient, but either way).

df$RegResGrossPay <- df$.resid



#Fourth, do an optional tidy up.

colnames(df)[colnames(df)=="Person_ID.x"] <- "Person_ID"

colnames(df)[colnames(df)=="nYearmc.x"] <- "nYearmc"

colnames(df)[colnames(df)=="Weekly_Gross_Pay_Main_Job.x"] <- "Weekly_Gross_Pay_Main_Job"

df$Person_ID.y <- NULL

df$nYearmc.y <- NULL

df$Weekly_Gross_Pay_Main_Job.y <- NULL

df$.fitted <- NULL

df$.se.fit <- NULL

df$.resid <- NULL

df$.hat <- NULL

df$.sigma <- NULL

df$.cooksd <- NULL

df$.std.resid <- NULL

df.Weekly_Gross_Pay_Main_Job <- NULL



#Fifth, generate plots of the variables you need.

ggplot(df, aes(nYearmc, RegResGrossPay))+geom_line(aes(group=Person_ID), alpha =1/3)+geom_smooth(method="lm",se=FALSE)

answered Nov 27 at 6:17

aspark2020

205

add a comment |

up vote
0
down vote

accepted

Here is how I ended up doing it:

#Before you begin, time needs to be grand-mean centered.

df$nYearmc <- df$nYear-mean(df$nYear, na.rm=TRUE)



#Now to regress the time-varying covariate onto grand-mean centered time and complete the process.



#First, create a group called `by_person`.

df <- tidyr::unite(df, Person_Year, c(Person_ID, nYearmc), remove=FALSE)

by_Person <- dplyr::group_by(df, Person_ID)



#Second, regress the time-varying covariate onto the newly created grand-mean centered time variable and merge with the main data frame.

df.Weekly_Gross_Pay_Main_Job <- dplyr::do(by_Person, augment(lm(Weekly_Gross_Pay_Main_Job~nYearmc, data=.)))

df.Weekly_Gross_Pay_Main_Job <- tidyr::unite(df.Weekly_Gross_Pay_Main_Job, Person_Year, c(Person_ID, nYearmc), remove=FALSE)

df <- merge(df, df.Weekly_Gross_Pay_Main_Job, by="Person_Year")



#Third, copy over the required columns (renaming them would be more efficient, but either way).

df$RegResGrossPay <- df$.resid



#Fourth, do an optional tidy up.

colnames(df)[colnames(df)=="Person_ID.x"] <- "Person_ID"

colnames(df)[colnames(df)=="nYearmc.x"] <- "nYearmc"

colnames(df)[colnames(df)=="Weekly_Gross_Pay_Main_Job.x"] <- "Weekly_Gross_Pay_Main_Job"

df$Person_ID.y <- NULL

df$nYearmc.y <- NULL

df$Weekly_Gross_Pay_Main_Job.y <- NULL

df$.fitted <- NULL

df$.se.fit <- NULL

df$.resid <- NULL

df$.hat <- NULL

df$.sigma <- NULL

df$.cooksd <- NULL

df$.std.resid <- NULL

df.Weekly_Gross_Pay_Main_Job <- NULL



#Fifth, generate plots of the variables you need.

ggplot(df, aes(nYearmc, RegResGrossPay))+geom_line(aes(group=Person_ID), alpha =1/3)+geom_smooth(method="lm",se=FALSE)

answered Nov 27 at 6:17

aspark2020

205

