How to calculate regression residuals in R for each individual in a longitudinal analysis?























I am working on a longitudinal/repeated measures multilevel model (MLM). Usually, for time-varying covariates (in my case "weekly gross income/1000"), you would calculate a person-mean centered version of the variable (i.e. subtract the person's average weekly income, taken across all of that person's time points, from each person-year income value). However, this can lead to bias (see here), so a better (more generalisable) approach is to center around a regression line for each individual; the residuals from that regression serve this purpose.
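To illustrate the distinction with made-up numbers (a minimal sketch, not data from this study): for one hypothetical person whose income trends upward, person-mean centering is the same as taking residuals from an intercept-only regression, whereas centering around the person's regression line also removes the time trend. The variable names `time`, `income`, `pmc` and `detrended` are invented for this example.

```r
# Made-up income series for one person with an upward trend.
time   <- 1:5
income <- c(0.2, 0.4, 0.5, 0.7, 0.9)

# Person-mean centering: deviation from the person's own average.
pmc <- income - mean(income)

# The same values, obtained as residuals from an intercept-only regression.
pmc_lm <- as.numeric(residuals(lm(income ~ 1)))

# Centering around the person's regression line: residuals from income ~ time.
detrended <- as.numeric(residuals(lm(income ~ time)))

# The person-mean centered values still carry the time trend;
# the regression residuals are uncorrelated with time by construction.
cor(pmc, time)
cor(detrended, time)
```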



Therefore, I need to calculate the following regression, but for each individual (roughly 10,000 individuals with 25,000 observations):



lm(Weekly_Gross_Pay_Main_Job~nYear, data=df)


Then, the really critical part is that I need to extract the residuals to a separate column in my main dataset, matched up with each person. These residuals will take the place of my group-mean centered variable (which will in turn be used in my MLM).



Here is a possible starting point using the function that I have for the group-mean centering. If this could be updated to fit a regression with the residuals output for each person, then that would be ideal (if not, then I am open to other approaches):



#Group mean-centering a variable. Relevant for L1 variables only.
gmc = function(variable, group){
return(ave(variable, group, FUN = function(x){x - mean(x)}))
}

df$Weekly_Gross_Pay_Main_Jobgmc <- gmc(df$Weekly_Gross_Pay_Main_Job, df$Person_ID)


Data extract in long format (where Person_ID is the person, nYear is time, Weekly_Gross_Pay_Main_Job is weekly income/1000 and Weekly_Gross_Pay_Main_Jobgmc is the group-mean centered version):



structure(list(Person_ID = c(100003L, 100003L, 100003L, 100006L, 
100006L, 100006L, 100006L, 100010L, 100010L, 100010L, 100010L,
100010L, 100010L, 100011L, 100014L, 100014L, 100014L, 100014L,
100014L, 100016L, 100018L, 100018L, 100018L, 100018L, 100018L,
100018L, 100018L, 100018L, 100018L, 100020L, 100020L, 100020L,
100020L, 100020L, 100020L, 100020L, 100020L, 100020L, 100021L,
100021L, 100024L, 100024L, 100024L, 100024L, 100024L, 100024L,
100024L, 100024L, 100024L, 100024L, 100025L, 100025L, 100025L,
100025L, 100025L, 100025L, 100025L, 100025L, 100027L, 100027L,
100027L, 100027L, 100029L, 100029L, 100029L, 100029L, 100029L,
100031L, 100031L, 100031L, 100032L, 100032L, 100032L, 100033L,
100033L, 100033L, 100033L, 100033L, 100033L, 100034L, 100034L,
100034L, 100037L, 100037L, 100037L, 100037L, 100037L, 100037L,
100037L, 100044L, 100044L, 100044L, 100044L, 100044L, 100044L,
100044L, 100045L, 100045L, 100045L, 100045L), nYear = c(5L, 6L,
7L, 2L, 3L, 4L, 6L, 5L, 6L, 7L, 8L, 9L, 10L, 1L, 5L, 6L, 7L,
8L, 9L, 5L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 9L, 1L, 2L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L,
13L, 14L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L, 2L, 3L, 4L, 5L,
6L, 7L, 8L, 9L, 4L, 5L, 6L, 1L, 2L, 3L, 3L, 4L, 5L, 6L, 7L, 8L,
2L, 3L, 5L, 5L, 6L, 7L, 8L, 9L, 11L, 13L, 2L, 3L, 4L, 6L, 7L,
8L, 9L, 4L, 5L, 6L, 7L), Weekly_Gross_Pay_Main_Job = c(0, 0.58,
0.35, 0.035, 0.65, 0.195, 0.43, 0, 0, 0, 0, 0, 0, 0.12, 1.653,
0.967, 1.742, 1.323, 0, 0.709, 0.155, 0.431, 0.235, 0.17, 0.285,
0.357, 0.28, 0.335, 0.375, 0.111, 0.333, 0.582, 0.882, 0.85,
0.944, 1.615, 1.615, 1.35, 0.168, 0.08, 0, 0, 0, 0, 0, 0, 0,
0.134, 0.737, 0, 0.02, 0.372, 0.1, 0.014, 0.307, 0.39, 0.671,
0.5, 0.278, 0.32, 0.425, 0.4, 0.57, 0.917, 0.75, 0.402, 0.437,
0.211, 0.537, 0.54, 0.135, 0.15, 0.65, 0.324, 0.399, 0.497, 0.67,
0.825, 0.825, 0.25, 0.319, 0.35, 0.885, 0.941, 0.975, 0.975,
1.02, 1.096, 1.148, 0.1, 0.11, 0.413, 0.477, 0.578, 0.686, 0.686,
0.511, 0.578, 0.8, 0.75), Weekly_Gross_Pay_Main_Jobgmc = c(-0.31,
0.27, 0.04, -0.2925, 0.3225, -0.1325, 0.1025, 0, 0, 0, 0, 0,
0, 0, 0.516, -0.17, 0.605, 0.186, -1.137, 0, -0.136444444444444,
0.139555555555556, -0.0564444444444445, -0.121444444444444, -0.00644444444444447,
0.0655555555555555, -0.0114444444444444, 0.0435555555555556,
0.0835555555555555, -0.809222222222222, -0.587222222222222, -0.338222222222222,
-0.0382222222222223, -0.0702222222222223, 0.0237777777777777,
0.694777777777778, 0.694777777777778, 0.429777777777778, 0.044,
-0.044, -0.0871, -0.0871, -0.0871, -0.0871, -0.0871, -0.0871,
-0.0871, 0.0469, 0.6499, -0.0871, -0.27675, 0.07525, -0.19675,
-0.28275, 0.01025, 0.09325, 0.37425, 0.20325, -0.07775, -0.03575,
0.06925, 0.04425, -0.0452, 0.3018, 0.1348, -0.2132, -0.1782,
-0.218333333333333, 0.107666666666667, 0.110666666666667, -0.176666666666667,
-0.161666666666667, 0.338333333333333, -0.266, -0.191, -0.093,
0.0800000000000001, 0.235, 0.235, -0.0563333333333333, 0.0126666666666667,
0.0436666666666666, -0.120714285714286, -0.0647142857142858,
-0.0307142857142858, -0.0307142857142858, 0.0142857142857142,
0.0902857142857143, 0.142285714285714, -0.335714285714286, -0.325714285714286,
-0.0227142857142857, 0.0412857142857143, 0.142285714285714, 0.250285714285714,
0.250285714285714, -0.1368, -0.0698000000000001, 0.1522, 0.1022
)), row.names = c(NA, 100L), class = "data.frame")









      r regression longitudinal multilevel-analysis
















      asked Nov 20 at 3:16









      aspark2020

























          2 Answers













          I'm not sure I'm reading you right, and this might be a naive answer that misses the point, but doesn't residuals() just work?
          Here's a linear mixed-effects model fitted to some data I had lying around:



          library(nlme)  # provides lme()
          some.model <- lme(DV ~ IV, random = ~1 | Id, data = df)
          head(residuals(some.model))
          #            7            7           24           24           32           32
          # -0.054135825 -0.054135825  0.064271638  0.064271638 -0.001975424 -0.001975424


          If you really want to put the residuals into a column with the ID number next to them, it takes a few more steps. It can probably be done in a single step, but I'm not very good at this.



          extra.column <- residuals(some.model)
          # The names of the residual vector are the group IDs.
          extra.column.id <- names(extra.column)
          # Note: cbind() on a numeric and a character vector gives a character matrix.
          cbind(extra.column, extra.column.id)
          #    extra.column          extra.column.id
          # 7  "-0.0541358252373243" "7"
          # 7  "-0.0541358252373243" "7"
          # 24 "0.0642716380035857"  "24"
          # 24 "0.0642716380035857"  "24"
          # 32 "-0.0019754241828096" "32"
          # 32 "-0.0019754241828096" "32"


          Sorry if this is not what you're looking for, but check out the residuals() function.
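As a rough sketch of the "single step" version, the residuals and their IDs can go straight into a data frame, which keeps the residuals numeric instead of turning them into strings. This example uses the `Orthodont` data set that ships with nlme, not the data from the question, and `res_df` is a name made up here. Note also that these are residuals from one pooled mixed model, which is not the same thing as fitting a separate regression for each person.

```r
library(nlme)

# Random-intercept model on the Orthodont example data bundled with nlme.
fit <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)

# Within-group residuals; the vector's names are the Subject labels.
r <- residuals(fit)

# One row per observation: the ID as character, the residual as a number.
res_df <- data.frame(
  Subject = names(r),
  resid   = as.numeric(r),
  stringsAsFactors = FALSE
)

head(res_df)
```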









            accepted










            Here is how I ended up doing it:



            #Before you begin, time needs to be grand-mean centered.
            df$nYearmc <- df$nYear-mean(df$nYear, na.rm=TRUE)

            #Now to regress the time-varying covariate onto grand-mean centered time and complete the process.

            #First, build a Person_Year key (used for the merge below) and group the data by person as `by_Person`.
            df <- tidyr::unite(df, Person_Year, c(Person_ID, nYearmc), remove=FALSE)
            by_Person <- dplyr::group_by(df, Person_ID)

            #Second, regress the time-varying covariate onto the newly created grand-mean centered time variable and merge with the main data frame.
            #augment() comes from the broom package; it returns the model data plus .fitted, .resid, etc.
            df.Weekly_Gross_Pay_Main_Job <- dplyr::do(by_Person, broom::augment(lm(Weekly_Gross_Pay_Main_Job~nYearmc, data=.)))
            df.Weekly_Gross_Pay_Main_Job <- tidyr::unite(df.Weekly_Gross_Pay_Main_Job, Person_Year, c(Person_ID, nYearmc), remove=FALSE)
            df <- merge(df, df.Weekly_Gross_Pay_Main_Job, by="Person_Year")

            #Third, copy over the required columns (renaming them would be more efficient, but either way).
            df$RegResGrossPay <- df$.resid

            #Fourth, do an optional tidy up.
            colnames(df)[colnames(df)=="Person_ID.x"] <- "Person_ID"
            colnames(df)[colnames(df)=="nYearmc.x"] <- "nYearmc"
            colnames(df)[colnames(df)=="Weekly_Gross_Pay_Main_Job.x"] <- "Weekly_Gross_Pay_Main_Job"
            df$Person_ID.y <- NULL
            df$nYearmc.y <- NULL
            df$Weekly_Gross_Pay_Main_Job.y <- NULL
            df$.fitted <- NULL
            df$.se.fit <- NULL
            df$.resid <- NULL
            df$.hat <- NULL
            df$.sigma <- NULL
            df$.cooksd <- NULL
            df$.std.resid <- NULL
            df.Weekly_Gross_Pay_Main_Job <- NULL

            #Fifth, generate plots of the variables you need.
            library(ggplot2)
            ggplot(df, aes(nYearmc, RegResGrossPay)) + geom_line(aes(group=Person_ID), alpha=1/3) + geom_smooth(method="lm", se=FALSE)
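For comparison, here is a minimal base-R sketch in the spirit of the `gmc()` helper from the question. It is an alternative to the steps above, not the method I actually used. The helper name `reg_resid` and the toy values are made up for illustration; only the column names mirror the question. The idea is to pass row indices to `ave()`, so each person's regression sees both the outcome and the time variable, and the residuals come back in the original row order without any merge.

```r
# Toy long-format data; column names mirror the question, values are made up.
df <- data.frame(
  Person_ID = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L),
  nYear     = c(1L, 2L, 3L, 2L, 3L, 4L, 6L, 5L),
  Weekly_Gross_Pay_Main_Job = c(0.10, 0.30, 0.20, 0.50, 0.70, 0.60, 0.90, 0.40)
)

# Hypothetical helper: residuals of y ~ time, fitted separately within each group.
# ave() hands FUN the row indices of one group at a time and writes the
# returned values back into those same rows.
reg_resid <- function(y, time, group) {
  ave(seq_along(y), group, FUN = function(i) {
    # na.exclude pads the residuals with NA so the length always matches the group.
    fit <- lm(y[i] ~ time[i], na.action = na.exclude)
    as.numeric(residuals(fit))
  })
}

df$RegResGrossPay <- reg_resid(df$Weekly_Gross_Pay_Main_Job, df$nYear, df$Person_ID)
df
```

A person with a single observation (person 3 here) gets a residual of 0, because the intercept alone fits that point exactly. Since every per-person fit includes an intercept, shifting time by a constant (for example, using the grand-mean centered `nYearmc` instead of `nYear`) should leave the residuals unchanged.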





            share|improve this answer





















              Your Answer






              StackExchange.ifUsing("editor", function () {
              StackExchange.using("externalEditor", function () {
              StackExchange.using("snippets", function () {
              StackExchange.snippets.init();
              });
              });
              }, "code-snippets");

              StackExchange.ready(function() {
              var channelOptions = {
              tags: "".split(" "),
              id: "1"
              };
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function() {
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled) {
              StackExchange.using("snippets", function() {
              createEditor();
              });
              }
              else {
              createEditor();
              }
              });

              function createEditor() {
              StackExchange.prepareEditor({
              heartbeatType: 'answer',
              convertImagesToLinks: true,
              noModals: true,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: 10,
              bindNavPrevention: true,
              postfix: "",
              imageUploader: {
              brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
              contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
              allowUrls: true
              },
              onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              });


              }
              });














              draft saved

              draft discarded


















              StackExchange.ready(
              function () {
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53385696%2fhow-to-calculate-regression-residuals-in-r-for-each-individual-in-a-longitudinal%23new-answer', 'question_page');
              }
              );

              Post as a guest















              Required, but never shown

























              2 Answers
              2






              active

              oldest

              votes








              2 Answers
              2






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes








              up vote
              1
              down vote













              not sure if I'm reading you right, this might be a very naive answer missing the point, but doesn't "residuals" just work.
              Here's a linear mixed effects model with some data i had lying around



                  some.model<-lme(DV~IV, random=~1|Id, data=df)
              head(residuals(some.model))
              7 7 24 24 32 32
              -0.054135825 -0.054135825 0.064271638 0.064271638 -0.001975424 -0.001975424


              If you really want to put it into a column with the idnumber next to it it takes a few more steps. It probably can be done in a single step but i'm really bad.



                 extra.column<-residuals(some.model)
              extra.column.id<-names(residuals(some.model))
              extra.column<-residuals(some.model)
              cbind(extra.column,extra.column.id)
              extra.column extra.column.id
              7 "-0.0541358252373243" "7"
              7 "-0.0541358252373243" "7"
              24 "0.0642716380035857" "24"
              24 "0.0642716380035857" "24"
              32 "-0.0019754241828096" "32"
              32 "-0.0019754241828096" "32"


              Sorry if this is not what you're looking for, but check out the residuals command.






              share|improve this answer

























                up vote
                1
                down vote













                not sure if I'm reading you right, this might be a very naive answer missing the point, but doesn't "residuals" just work.
                Here's a linear mixed effects model with some data i had lying around



                    some.model<-lme(DV~IV, random=~1|Id, data=df)
                head(residuals(some.model))
                7 7 24 24 32 32
                -0.054135825 -0.054135825 0.064271638 0.064271638 -0.001975424 -0.001975424


                If you really want to put it into a column with the idnumber next to it it takes a few more steps. It probably can be done in a single step but i'm really bad.



                   extra.column<-residuals(some.model)
                extra.column.id<-names(residuals(some.model))
                extra.column<-residuals(some.model)
                cbind(extra.column,extra.column.id)
                extra.column extra.column.id
                7 "-0.0541358252373243" "7"
                7 "-0.0541358252373243" "7"
                24 "0.0642716380035857" "24"
                24 "0.0642716380035857" "24"
                32 "-0.0019754241828096" "32"
                32 "-0.0019754241828096" "32"


                Sorry if this is not what you're looking for, but check out the residuals command.
















                  answered Nov 22 at 9:39









                  Huy Pham

                  1315




























                      up vote
                      0
                      down vote



                      accepted










                      Here is how I ended up doing it:



                      #Before you begin, time needs to be grand-mean centered.
                      df$nYearmc <- df$nYear - mean(df$nYear, na.rm=TRUE)

                      #Now regress the time-varying covariate onto grand-mean centered time and complete the process.

                      #First, build a Person_Year merge key and create a grouped data frame called `by_Person`.
                      df <- tidyr::unite(df, Person_Year, c(Person_ID, nYearmc), remove=FALSE)
                      by_Person <- dplyr::group_by(df, Person_ID)

                      #Second, regress the time-varying covariate onto the grand-mean centered time variable, separately for each person
                      #(augment() comes from the broom package), and merge the result with the main data frame.
                      df.Weekly_Gross_Pay_Main_Job <- dplyr::do(by_Person, broom::augment(lm(Weekly_Gross_Pay_Main_Job~nYearmc, data=.)))
                      df.Weekly_Gross_Pay_Main_Job <- tidyr::unite(df.Weekly_Gross_Pay_Main_Job, Person_Year, c(Person_ID, nYearmc), remove=FALSE)
                      df <- merge(df, df.Weekly_Gross_Pay_Main_Job, by="Person_Year")

                      #Third, copy over the required column (renaming it would be more efficient, but either way).
                      df$RegResGrossPay <- df$.resid

                      #Fourth, do an optional tidy up: restore the original column names and drop the duplicates created by merge().
                      colnames(df)[colnames(df)=="Person_ID.x"] <- "Person_ID"
                      colnames(df)[colnames(df)=="nYearmc.x"] <- "nYearmc"
                      colnames(df)[colnames(df)=="Weekly_Gross_Pay_Main_Job.x"] <- "Weekly_Gross_Pay_Main_Job"
                      df$Person_ID.y <- NULL
                      df$nYearmc.y <- NULL
                      df$Weekly_Gross_Pay_Main_Job.y <- NULL
                      df$.fitted <- NULL
                      df$.se.fit <- NULL
                      df$.resid <- NULL
                      df$.hat <- NULL
                      df$.sigma <- NULL
                      df$.cooksd <- NULL
                      df$.std.resid <- NULL
                      rm(df.Weekly_Gross_Pay_Main_Job)

                      #Fifth, generate plots of the variables you need.
                      library(ggplot2)
                      ggplot(df, aes(nYearmc, RegResGrossPay))+geom_line(aes(group=Person_ID), alpha=1/3)+geom_smooth(method="lm", se=FALSE)
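For comparison, the same per-person residuals can be sketched in base R by extending the gmc() helper from the question, which avoids the unite/merge round trip. This is only a sketch under stated assumptions: reg_resid() and the toy data are hypothetical names made up for illustration, and every person is assumed to have at least one non-missing income value.

```r
# Hypothetical helper: residuals from a separate lm(y ~ time) for each group,
# returned in the original row order (called the same way as gmc()).
reg_resid <- function(y, time, group) {
  idx <- as.numeric(seq_along(y))
  ave(idx, group, FUN = function(i) {
    if (length(i) == 0) return(numeric(0))  # skip empty factor levels
    d <- data.frame(y = y[i], t = time[i])
    # na.exclude pads the residuals with NA for rows that have missing values
    as.numeric(residuals(lm(y ~ t, data = d, na.action = na.exclude)))
  })
}

# Toy data for illustration only (not the data from the question)
toy <- data.frame(
  Person_ID = c(1, 1, 1, 2, 2, 2, 2),
  nYear = c(1, 2, 3, 1, 2, 3, 4),
  Weekly_Gross_Pay_Main_Job = c(1.0, 1.5, 2.0, 0.8, 1.1, 0.9, 1.4)
)

toy$RegResGrossPay <- reg_resid(toy$Weekly_Gross_Pay_Main_Job,
                                toy$nYear, toy$Person_ID)
toy
```

Because each regression includes an intercept, shifting time by a constant does not change the residuals, so passing nYear or the grand-mean centered nYearmc should give the same column.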















                          answered Nov 27 at 6:17









                          aspark2020

                          205

































