R - How to selectively copy long format case responses to future years?











up vote
0
down vote

favorite












I am conducting a longitudinal analysis with data in long format. I have personality (O, C, E, A, N) measured in 2005, 2009 and 2013 with my dependent variable (job satisfaction) measured every year. Each person has their own ID. I have two goals (which I think might be able to be achieved with the same underlying logic):



1) I need another variable (column) indicating the first year that personality was measured for each person. If personality was never measured, then return NA. For example, if personality was first measured in 2009, then “2009” needs to be returned in this column for every row for that ID.



2) I need another variable (column) which copies the personality item responses for each year for each ID, unless it is replaced by a later measurement year (i.e. 2009, 2013). For example, if an ID completed the personality test in 2005 and 2009, then I need their responses for 2005 copied to years 2006, 2007, and 2008, but not 2009, as later entries (if they exist) should then become what gets copied forward to future years.



I have tried making a function for these, but ultimately failed.



I hope that makes sense. Extract of 100 cases below:



structure(list(Person_ID = c(100003L, 100003L, 100003L, 100003L, 
100003L, 100003L, 100003L, 100003L, 100003L, 100003L, 100005L,
100005L, 100005L, 100005L, 100005L, 100005L, 100005L, 100005L,
100005L, 100006L, 100006L, 100006L, 100006L, 100006L, 100006L,
100006L, 100006L, 100007L, 100007L, 100007L, 100007L, 100007L,
100008L, 100008L, 100008L, 100009L, 100009L, 100010L, 100010L,
100010L, 100010L, 100010L, 100010L, 100010L, 100010L, 100010L,
100010L, 100011L, 100011L, 100011L, 100011L, 100011L, 100014L,
100014L, 100014L, 100014L, 100014L, 100014L, 100014L, 100014L,
100014L, 100014L, 100015L, 100015L, 100015L, 100015L, 100015L,
100015L, 100015L, 100015L, 100015L, 100015L, 100016L, 100016L,
100016L, 100016L, 100016L, 100016L, 100016L, 100016L, 100016L,
100016L, 100018L, 100018L, 100018L, 100018L, 100018L, 100018L,
100018L, 100018L, 100018L, 100018L, 100019L, 100019L, 100019L,
100019L, 100019L, 100019L, 100019L, 100019L), Job_Satisfaction = c(0L,
NA, 7L, NA, 8L, 10L, NA, NA, NA, NA, 9L, NA, NA, NA, NA, NA,
NA, NA, NA, 6L, 10L, 8L, 9L, 7L, NA, 9L, 3L, NA, 10L, NA, NA,
NA, NA, NA, NA, NA, NA, 5L, NA, NA, NA, NA, 5L, 4L, 8L, 5L, 5L,
NA, NA, NA, 8L, NA, 8L, 6L, 8L, 7L, 7L, NA, NA, NA, NA, NA, 10L,
9L, 9L, 9L, 10L, 8L, 10L, 9L, 8L, 9L, 7L, 9L, 9L, 8L, 8L, 9L,
9L, 9L, 8L, 9L, NA, 7L, 7L, 8L, 7L, 8L, 7L, 7L, 8L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 8L), Year = c(2005L, 2006L, 2007L, 2008L,
2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2007L, 2008L,
2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2008L, 2009L,
2010L, 2011L, 2012L, 2013L, 2014L, 2008L, 2009L, 2010L, 2011L,
2012L, 2005L, 2006L, 2007L, 2006L, 2007L, 2005L, 2006L, 2007L,
2008L, 2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2006L,
2007L, 2008L, 2009L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L,
2011L, 2012L, 2013L, 2014L, 2005L, 2006L, 2007L, 2008L, 2009L,
2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2006L, 2007L, 2008L,
2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2006L, 2007L,
2008L, 2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2006L,
2007L, 2008L, 2009L, 2010L, 2011L, 2012L), O = c(5, NA, NA, NA,
5.5, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
4.83333333333333, NA, 4.16666666666667, NA, NA, NA, 3.5, NA,
NA, NA, NA, NA, NA, 3.83333333333333, NA, NA, NA, NA, 5, NA,
NA, NA, 5.16666666666667, NA, NA, NA, 5, NA, 6, NA, NA, NA, NA,
5.5, NA, NA, NA, 5.16666666666667, NA, NA, NA, 5, NA, 4.5, NA,
NA, NA, 4.33333333333333, NA, NA, NA, 4, NA, 3, NA, NA, NA, 3.16666666666667,
NA, NA, NA, 3.16666666666667, NA, 5.5, NA, NA, NA, 5.66666666666667,
NA, NA, NA, 5.33333333333333, NA, 5.5, NA, NA, NA, 5, NA, NA,
NA), C = c(4.66666666666667, NA, NA, NA, 6.83333333333333, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 4.33333333333333,
NA, 4.16666666666667, NA, NA, NA, 4.33333333333333, NA, NA, NA,
NA, NA, NA, 5.83333333333333, NA, NA, NA, NA, 5.66666666666667,
NA, NA, NA, 6.66666666666667, NA, NA, NA, 6.66666666666667, NA,
6.33333333333333, NA, NA, NA, NA, 5.83333333333333, NA, NA, NA,
6, NA, NA, NA, 6, NA, 6.66666666666667, NA, NA, NA, 6.5, NA,
NA, NA, 6.5, NA, 5.5, NA, NA, NA, 5.83333333333333, NA, NA, NA,
5.16666666666667, NA, 5.5, NA, NA, NA, 5.66666666666667, NA,
NA, NA, 5.16666666666667, NA, 5.66666666666667, NA, NA, NA, 6,
NA, NA, NA), E = c(4.33333333333333, NA, NA, NA, 5.16666666666667,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 4.66666666666667,
NA, 4.33333333333333, NA, NA, NA, 3.83333333333333, NA, NA, NA,
NA, NA, NA, 4.33333333333333, NA, NA, NA, NA, 5.5, NA, NA, NA,
5.16666666666667, NA, NA, NA, 5.33333333333333, NA, 6.16666666666667,
NA, NA, NA, NA, 5.83333333333333, NA, NA, NA, 4.83333333333333,
NA, NA, NA, 4.5, NA, 4.33333333333333, NA, NA, NA, 4.66666666666667,
NA, NA, NA, 3.83333333333333, NA, 3.5, NA, NA, NA, 3.5, NA, NA,
NA, 5.16666666666667, NA, 2, NA, NA, NA, 3.16666666666667, NA,
NA, NA, 2.66666666666667, NA, 2.33333333333333, NA, NA, NA, 2.5,
NA, NA, NA), A = c(6.75, NA, NA, NA, 7, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 5.75, NA, 4.75, NA, NA, NA, 5,
NA, NA, NA, NA, NA, NA, 5.5, NA, NA, NA, NA, 5, NA, NA, NA, 4.75,
NA, NA, NA, 5, NA, 5.25, NA, NA, NA, NA, 5.5, NA, NA, NA, 5,
NA, NA, NA, 5.5, NA, 6.75, NA, NA, NA, 5.75, NA, NA, NA, 6.75,
NA, 5, NA, NA, NA, 3.5, NA, NA, NA, 4.75, NA, 5.75, NA, NA, NA,
5.75, NA, NA, NA, 5.5, NA, 5.75, NA, NA, NA, 5.75, NA, NA, NA
), N = c(3.16666666666667, NA, NA, NA, 3.33333333333333, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 3.66666666666667,
NA, 4.5, NA, NA, NA, 4.16666666666667, NA, NA, NA, NA, NA, NA,
2.66666666666667, NA, NA, NA, NA, 2.66666666666667, NA, NA, NA,
2.33333333333333, NA, NA, NA, 3.16666666666667, NA, 1.83333333333333,
NA, NA, NA, NA, 1.33333333333333, NA, NA, NA, 2.83333333333333,
NA, NA, NA, 1.83333333333333, NA, 3.83333333333333, NA, NA, NA,
3, NA, NA, NA, 3.33333333333333, NA, 1.66666666666667, NA, NA,
NA, 2.5, NA, NA, NA, 2.5, NA, 3.83333333333333, NA, NA, NA, 3.33333333333333,
NA, NA, NA, 4.16666666666667, NA, 3.83333333333333, NA, NA, NA,
4.5, NA, NA, NA)), .Names = c("Person_ID", "Job_Satisfaction",
"Year", "O", "C", "E", "A", "N"), row.names = c(NA, 100L), class = "data.frame")









share|improve this question


























    up vote
    0
    down vote

    favorite












    I am conducting a longitudinal analysis with data in long format. I have personality (O, C, E, A, N) measured in 2005, 2009 and 2013 with my dependent variable (job satisfaction) measured every year. Each person has their own ID. I have two goals (which I think might be able to be achieved with the same underlying logic):



    1) I need another variable (column) indicating the first year that personality was measured for each person. If personality was never measured, then return NA. For example, if personality was first measured in 2009, then “2009” needs to be returned in this column for every row for that ID.



    2) I need another variable (column) which copies the personality item responses for each year for each ID, unless it is replaced by a later measurement year (i.e. 2009, 2013). For example, if an ID completed the personality test in 2005 and 2009, then I need their responses for 2005 copied to years 2006, 2007, and 2008, but not 2009, as later entries (if they exist) should then become what gets copied forward to future years.



    I have tried making a function for these, but ultimately failed.



    I hope that makes sense. Extract of 100 cases below:



    structure(list(Person_ID = c(100003L, 100003L, 100003L, 100003L, 
    100003L, 100003L, 100003L, 100003L, 100003L, 100003L, 100005L,
    100005L, 100005L, 100005L, 100005L, 100005L, 100005L, 100005L,
    100005L, 100006L, 100006L, 100006L, 100006L, 100006L, 100006L,
    100006L, 100006L, 100007L, 100007L, 100007L, 100007L, 100007L,
    100008L, 100008L, 100008L, 100009L, 100009L, 100010L, 100010L,
    100010L, 100010L, 100010L, 100010L, 100010L, 100010L, 100010L,
    100010L, 100011L, 100011L, 100011L, 100011L, 100011L, 100014L,
    100014L, 100014L, 100014L, 100014L, 100014L, 100014L, 100014L,
    100014L, 100014L, 100015L, 100015L, 100015L, 100015L, 100015L,
    100015L, 100015L, 100015L, 100015L, 100015L, 100016L, 100016L,
    100016L, 100016L, 100016L, 100016L, 100016L, 100016L, 100016L,
    100016L, 100018L, 100018L, 100018L, 100018L, 100018L, 100018L,
    100018L, 100018L, 100018L, 100018L, 100019L, 100019L, 100019L,
    100019L, 100019L, 100019L, 100019L, 100019L), Job_Satisfaction = c(0L,
    NA, 7L, NA, 8L, 10L, NA, NA, NA, NA, 9L, NA, NA, NA, NA, NA,
    NA, NA, NA, 6L, 10L, 8L, 9L, 7L, NA, 9L, 3L, NA, 10L, NA, NA,
    NA, NA, NA, NA, NA, NA, 5L, NA, NA, NA, NA, 5L, 4L, 8L, 5L, 5L,
    NA, NA, NA, 8L, NA, 8L, 6L, 8L, 7L, 7L, NA, NA, NA, NA, NA, 10L,
    9L, 9L, 9L, 10L, 8L, 10L, 9L, 8L, 9L, 7L, 9L, 9L, 8L, 8L, 9L,
    9L, 9L, 8L, 9L, NA, 7L, 7L, 8L, 7L, 8L, 7L, 7L, 8L, 7L, 7L, 7L,
    7L, 7L, 7L, 7L, 7L, 8L), Year = c(2005L, 2006L, 2007L, 2008L,
    2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2007L, 2008L,
    2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2008L, 2009L,
    2010L, 2011L, 2012L, 2013L, 2014L, 2008L, 2009L, 2010L, 2011L,
    2012L, 2005L, 2006L, 2007L, 2006L, 2007L, 2005L, 2006L, 2007L,
    2008L, 2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2006L,
    2007L, 2008L, 2009L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L,
    2011L, 2012L, 2013L, 2014L, 2005L, 2006L, 2007L, 2008L, 2009L,
    2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2006L, 2007L, 2008L,
    2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2006L, 2007L,
    2008L, 2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2006L,
    2007L, 2008L, 2009L, 2010L, 2011L, 2012L), O = c(5, NA, NA, NA,
    5.5, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    4.83333333333333, NA, 4.16666666666667, NA, NA, NA, 3.5, NA,
    NA, NA, NA, NA, NA, 3.83333333333333, NA, NA, NA, NA, 5, NA,
    NA, NA, 5.16666666666667, NA, NA, NA, 5, NA, 6, NA, NA, NA, NA,
    5.5, NA, NA, NA, 5.16666666666667, NA, NA, NA, 5, NA, 4.5, NA,
    NA, NA, 4.33333333333333, NA, NA, NA, 4, NA, 3, NA, NA, NA, 3.16666666666667,
    NA, NA, NA, 3.16666666666667, NA, 5.5, NA, NA, NA, 5.66666666666667,
    NA, NA, NA, 5.33333333333333, NA, 5.5, NA, NA, NA, 5, NA, NA,
    NA), C = c(4.66666666666667, NA, NA, NA, 6.83333333333333, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 4.33333333333333,
    NA, 4.16666666666667, NA, NA, NA, 4.33333333333333, NA, NA, NA,
    NA, NA, NA, 5.83333333333333, NA, NA, NA, NA, 5.66666666666667,
    NA, NA, NA, 6.66666666666667, NA, NA, NA, 6.66666666666667, NA,
    6.33333333333333, NA, NA, NA, NA, 5.83333333333333, NA, NA, NA,
    6, NA, NA, NA, 6, NA, 6.66666666666667, NA, NA, NA, 6.5, NA,
    NA, NA, 6.5, NA, 5.5, NA, NA, NA, 5.83333333333333, NA, NA, NA,
    5.16666666666667, NA, 5.5, NA, NA, NA, 5.66666666666667, NA,
    NA, NA, 5.16666666666667, NA, 5.66666666666667, NA, NA, NA, 6,
    NA, NA, NA), E = c(4.33333333333333, NA, NA, NA, 5.16666666666667,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 4.66666666666667,
    NA, 4.33333333333333, NA, NA, NA, 3.83333333333333, NA, NA, NA,
    NA, NA, NA, 4.33333333333333, NA, NA, NA, NA, 5.5, NA, NA, NA,
    5.16666666666667, NA, NA, NA, 5.33333333333333, NA, 6.16666666666667,
    NA, NA, NA, NA, 5.83333333333333, NA, NA, NA, 4.83333333333333,
    NA, NA, NA, 4.5, NA, 4.33333333333333, NA, NA, NA, 4.66666666666667,
    NA, NA, NA, 3.83333333333333, NA, 3.5, NA, NA, NA, 3.5, NA, NA,
    NA, 5.16666666666667, NA, 2, NA, NA, NA, 3.16666666666667, NA,
    NA, NA, 2.66666666666667, NA, 2.33333333333333, NA, NA, NA, 2.5,
    NA, NA, NA), A = c(6.75, NA, NA, NA, 7, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, 5.75, NA, 4.75, NA, NA, NA, 5,
    NA, NA, NA, NA, NA, NA, 5.5, NA, NA, NA, NA, 5, NA, NA, NA, 4.75,
    NA, NA, NA, 5, NA, 5.25, NA, NA, NA, NA, 5.5, NA, NA, NA, 5,
    NA, NA, NA, 5.5, NA, 6.75, NA, NA, NA, 5.75, NA, NA, NA, 6.75,
    NA, 5, NA, NA, NA, 3.5, NA, NA, NA, 4.75, NA, 5.75, NA, NA, NA,
    5.75, NA, NA, NA, 5.5, NA, 5.75, NA, NA, NA, 5.75, NA, NA, NA
    ), N = c(3.16666666666667, NA, NA, NA, 3.33333333333333, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 3.66666666666667,
    NA, 4.5, NA, NA, NA, 4.16666666666667, NA, NA, NA, NA, NA, NA,
    2.66666666666667, NA, NA, NA, NA, 2.66666666666667, NA, NA, NA,
    2.33333333333333, NA, NA, NA, 3.16666666666667, NA, 1.83333333333333,
    NA, NA, NA, NA, 1.33333333333333, NA, NA, NA, 2.83333333333333,
    NA, NA, NA, 1.83333333333333, NA, 3.83333333333333, NA, NA, NA,
    3, NA, NA, NA, 3.33333333333333, NA, 1.66666666666667, NA, NA,
    NA, 2.5, NA, NA, NA, 2.5, NA, 3.83333333333333, NA, NA, NA, 3.33333333333333,
    NA, NA, NA, 4.16666666666667, NA, 3.83333333333333, NA, NA, NA,
    4.5, NA, NA, NA)), .Names = c("Person_ID", "Job_Satisfaction",
    "Year", "O", "C", "E", "A", "N"), row.names = c(NA, 100L), class = "data.frame")









    share|improve this question
























      up vote
      0
      down vote

      favorite









      up vote
      0
      down vote

      favorite











      I am conducting a longitudinal analysis with data in long format. I have personality (O, C, E, A, N) measured in 2005, 2009 and 2013 with my dependent variable (job satisfaction) measured every year. Each person has their own ID. I have two goals (which I think might be able to be achieved with the same underlying logic):



      1) I need another variable (column) indicating the first year that personality was measured for each person. If personality was never measured, then return NA. For example, if personality was first measured in 2009, then “2009” needs to be returned in this column for every row for that ID.



      2) I need another variable (column) which copies the personality item responses for each year for each ID, unless it is replaced by a later measurement year (i.e. 2009, 2013). For example, if an ID completed the personality test in 2005 and 2009, then I need their responses for 2005 copied to years 2006, 2007, and 2008, but not 2009, as later entries (if they exist) should then become what gets copied forward to future years.



      I have tried making a function for these, but ultimately failed.



      I hope that makes sense. Extract of 100 cases below:



      structure(list(Person_ID = c(100003L, 100003L, 100003L, 100003L, 
      100003L, 100003L, 100003L, 100003L, 100003L, 100003L, 100005L,
      100005L, 100005L, 100005L, 100005L, 100005L, 100005L, 100005L,
      100005L, 100006L, 100006L, 100006L, 100006L, 100006L, 100006L,
      100006L, 100006L, 100007L, 100007L, 100007L, 100007L, 100007L,
      100008L, 100008L, 100008L, 100009L, 100009L, 100010L, 100010L,
      100010L, 100010L, 100010L, 100010L, 100010L, 100010L, 100010L,
      100010L, 100011L, 100011L, 100011L, 100011L, 100011L, 100014L,
      100014L, 100014L, 100014L, 100014L, 100014L, 100014L, 100014L,
      100014L, 100014L, 100015L, 100015L, 100015L, 100015L, 100015L,
      100015L, 100015L, 100015L, 100015L, 100015L, 100016L, 100016L,
      100016L, 100016L, 100016L, 100016L, 100016L, 100016L, 100016L,
      100016L, 100018L, 100018L, 100018L, 100018L, 100018L, 100018L,
      100018L, 100018L, 100018L, 100018L, 100019L, 100019L, 100019L,
      100019L, 100019L, 100019L, 100019L, 100019L), Job_Satisfaction = c(0L,
      NA, 7L, NA, 8L, 10L, NA, NA, NA, NA, 9L, NA, NA, NA, NA, NA,
      NA, NA, NA, 6L, 10L, 8L, 9L, 7L, NA, 9L, 3L, NA, 10L, NA, NA,
      NA, NA, NA, NA, NA, NA, 5L, NA, NA, NA, NA, 5L, 4L, 8L, 5L, 5L,
      NA, NA, NA, 8L, NA, 8L, 6L, 8L, 7L, 7L, NA, NA, NA, NA, NA, 10L,
      9L, 9L, 9L, 10L, 8L, 10L, 9L, 8L, 9L, 7L, 9L, 9L, 8L, 8L, 9L,
      9L, 9L, 8L, 9L, NA, 7L, 7L, 8L, 7L, 8L, 7L, 7L, 8L, 7L, 7L, 7L,
      7L, 7L, 7L, 7L, 7L, 8L), Year = c(2005L, 2006L, 2007L, 2008L,
      2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2007L, 2008L,
      2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2008L, 2009L,
      2010L, 2011L, 2012L, 2013L, 2014L, 2008L, 2009L, 2010L, 2011L,
      2012L, 2005L, 2006L, 2007L, 2006L, 2007L, 2005L, 2006L, 2007L,
      2008L, 2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2006L,
      2007L, 2008L, 2009L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L,
      2011L, 2012L, 2013L, 2014L, 2005L, 2006L, 2007L, 2008L, 2009L,
      2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2006L, 2007L, 2008L,
      2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2006L, 2007L,
      2008L, 2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2006L,
      2007L, 2008L, 2009L, 2010L, 2011L, 2012L), O = c(5, NA, NA, NA,
      5.5, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
      4.83333333333333, NA, 4.16666666666667, NA, NA, NA, 3.5, NA,
      NA, NA, NA, NA, NA, 3.83333333333333, NA, NA, NA, NA, 5, NA,
      NA, NA, 5.16666666666667, NA, NA, NA, 5, NA, 6, NA, NA, NA, NA,
      5.5, NA, NA, NA, 5.16666666666667, NA, NA, NA, 5, NA, 4.5, NA,
      NA, NA, 4.33333333333333, NA, NA, NA, 4, NA, 3, NA, NA, NA, 3.16666666666667,
      NA, NA, NA, 3.16666666666667, NA, 5.5, NA, NA, NA, 5.66666666666667,
      NA, NA, NA, 5.33333333333333, NA, 5.5, NA, NA, NA, 5, NA, NA,
      NA), C = c(4.66666666666667, NA, NA, NA, 6.83333333333333, NA,
      NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 4.33333333333333,
      NA, 4.16666666666667, NA, NA, NA, 4.33333333333333, NA, NA, NA,
      NA, NA, NA, 5.83333333333333, NA, NA, NA, NA, 5.66666666666667,
      NA, NA, NA, 6.66666666666667, NA, NA, NA, 6.66666666666667, NA,
      6.33333333333333, NA, NA, NA, NA, 5.83333333333333, NA, NA, NA,
      6, NA, NA, NA, 6, NA, 6.66666666666667, NA, NA, NA, 6.5, NA,
      NA, NA, 6.5, NA, 5.5, NA, NA, NA, 5.83333333333333, NA, NA, NA,
      5.16666666666667, NA, 5.5, NA, NA, NA, 5.66666666666667, NA,
      NA, NA, 5.16666666666667, NA, 5.66666666666667, NA, NA, NA, 6,
      NA, NA, NA), E = c(4.33333333333333, NA, NA, NA, 5.16666666666667,
      NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 4.66666666666667,
      NA, 4.33333333333333, NA, NA, NA, 3.83333333333333, NA, NA, NA,
      NA, NA, NA, 4.33333333333333, NA, NA, NA, NA, 5.5, NA, NA, NA,
      5.16666666666667, NA, NA, NA, 5.33333333333333, NA, 6.16666666666667,
      NA, NA, NA, NA, 5.83333333333333, NA, NA, NA, 4.83333333333333,
      NA, NA, NA, 4.5, NA, 4.33333333333333, NA, NA, NA, 4.66666666666667,
      NA, NA, NA, 3.83333333333333, NA, 3.5, NA, NA, NA, 3.5, NA, NA,
      NA, 5.16666666666667, NA, 2, NA, NA, NA, 3.16666666666667, NA,
      NA, NA, 2.66666666666667, NA, 2.33333333333333, NA, NA, NA, 2.5,
      NA, NA, NA), A = c(6.75, NA, NA, NA, 7, NA, NA, NA, NA, NA, NA,
      NA, NA, NA, NA, NA, NA, NA, NA, 5.75, NA, 4.75, NA, NA, NA, 5,
      NA, NA, NA, NA, NA, NA, 5.5, NA, NA, NA, NA, 5, NA, NA, NA, 4.75,
      NA, NA, NA, 5, NA, 5.25, NA, NA, NA, NA, 5.5, NA, NA, NA, 5,
      NA, NA, NA, 5.5, NA, 6.75, NA, NA, NA, 5.75, NA, NA, NA, 6.75,
      NA, 5, NA, NA, NA, 3.5, NA, NA, NA, 4.75, NA, 5.75, NA, NA, NA,
      5.75, NA, NA, NA, 5.5, NA, 5.75, NA, NA, NA, 5.75, NA, NA, NA
      ), N = c(3.16666666666667, NA, NA, NA, 3.33333333333333, NA,
      NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 3.66666666666667,
      NA, 4.5, NA, NA, NA, 4.16666666666667, NA, NA, NA, NA, NA, NA,
      2.66666666666667, NA, NA, NA, NA, 2.66666666666667, NA, NA, NA,
      2.33333333333333, NA, NA, NA, 3.16666666666667, NA, 1.83333333333333,
      NA, NA, NA, NA, 1.33333333333333, NA, NA, NA, 2.83333333333333,
      NA, NA, NA, 1.83333333333333, NA, 3.83333333333333, NA, NA, NA,
      3, NA, NA, NA, 3.33333333333333, NA, 1.66666666666667, NA, NA,
      NA, 2.5, NA, NA, NA, 2.5, NA, 3.83333333333333, NA, NA, NA, 3.33333333333333,
      NA, NA, NA, 4.16666666666667, NA, 3.83333333333333, NA, NA, NA,
      4.5, NA, NA, NA)), .Names = c("Person_ID", "Job_Satisfaction",
      "Year", "O", "C", "E", "A", "N"), row.names = c(NA, 100L), class = "data.frame")









      share|improve this question













      I am conducting a longitudinal analysis with data in long format. I have personality (O, C, E, A, N) measured in 2005, 2009 and 2013 with my dependent variable (job satisfaction) measured every year. Each person has their own ID. I have two goals (which I think might be able to be achieved with the same underlying logic):



      1) I need another variable (column) indicating the first year that personality was measured for each person. If personality was never measured, then return NA. For example, if personality was first measured in 2009, then “2009” needs to be returned in this column for every row for that ID.



      2) I need another variable (column) which copies the personality item responses for each year for each ID, unless it is replaced by a later measurement year (i.e. 2009, 2013). For example, if an ID completed the personality test in 2005 and 2009, then I need their responses for 2005 copied to years 2006, 2007, and 2008, but not 2009, as later entries (if they exist) should then become what gets copied forward to future years.



      I have tried making a function for these, but ultimately failed.



      I hope that makes sense. Extract of 100 cases below:



      structure(list(Person_ID = c(100003L, 100003L, 100003L, 100003L, 
      100003L, 100003L, 100003L, 100003L, 100003L, 100003L, 100005L,
      100005L, 100005L, 100005L, 100005L, 100005L, 100005L, 100005L,
      100005L, 100006L, 100006L, 100006L, 100006L, 100006L, 100006L,
      100006L, 100006L, 100007L, 100007L, 100007L, 100007L, 100007L,
      100008L, 100008L, 100008L, 100009L, 100009L, 100010L, 100010L,
      100010L, 100010L, 100010L, 100010L, 100010L, 100010L, 100010L,
      100010L, 100011L, 100011L, 100011L, 100011L, 100011L, 100014L,
      100014L, 100014L, 100014L, 100014L, 100014L, 100014L, 100014L,
      100014L, 100014L, 100015L, 100015L, 100015L, 100015L, 100015L,
      100015L, 100015L, 100015L, 100015L, 100015L, 100016L, 100016L,
      100016L, 100016L, 100016L, 100016L, 100016L, 100016L, 100016L,
      100016L, 100018L, 100018L, 100018L, 100018L, 100018L, 100018L,
      100018L, 100018L, 100018L, 100018L, 100019L, 100019L, 100019L,
      100019L, 100019L, 100019L, 100019L, 100019L), Job_Satisfaction = c(0L,
      NA, 7L, NA, 8L, 10L, NA, NA, NA, NA, 9L, NA, NA, NA, NA, NA,
      NA, NA, NA, 6L, 10L, 8L, 9L, 7L, NA, 9L, 3L, NA, 10L, NA, NA,
      NA, NA, NA, NA, NA, NA, 5L, NA, NA, NA, NA, 5L, 4L, 8L, 5L, 5L,
      NA, NA, NA, 8L, NA, 8L, 6L, 8L, 7L, 7L, NA, NA, NA, NA, NA, 10L,
      9L, 9L, 9L, 10L, 8L, 10L, 9L, 8L, 9L, 7L, 9L, 9L, 8L, 8L, 9L,
      9L, 9L, 8L, 9L, NA, 7L, 7L, 8L, 7L, 8L, 7L, 7L, 8L, 7L, 7L, 7L,
      7L, 7L, 7L, 7L, 7L, 8L), Year = c(2005L, 2006L, 2007L, 2008L,
      2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2007L, 2008L,
      2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2008L, 2009L,
      2010L, 2011L, 2012L, 2013L, 2014L, 2008L, 2009L, 2010L, 2011L,
      2012L, 2005L, 2006L, 2007L, 2006L, 2007L, 2005L, 2006L, 2007L,
      2008L, 2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2006L,
      2007L, 2008L, 2009L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L,
      2011L, 2012L, 2013L, 2014L, 2005L, 2006L, 2007L, 2008L, 2009L,
      2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2006L, 2007L, 2008L,
      2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2006L, 2007L,
      2008L, 2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2006L,
      2007L, 2008L, 2009L, 2010L, 2011L, 2012L), O = c(5, NA, NA, NA,
      5.5, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
      4.83333333333333, NA, 4.16666666666667, NA, NA, NA, 3.5, NA,
      NA, NA, NA, NA, NA, 3.83333333333333, NA, NA, NA, NA, 5, NA,
      NA, NA, 5.16666666666667, NA, NA, NA, 5, NA, 6, NA, NA, NA, NA,
      5.5, NA, NA, NA, 5.16666666666667, NA, NA, NA, 5, NA, 4.5, NA,
      NA, NA, 4.33333333333333, NA, NA, NA, 4, NA, 3, NA, NA, NA, 3.16666666666667,
      NA, NA, NA, 3.16666666666667, NA, 5.5, NA, NA, NA, 5.66666666666667,
      NA, NA, NA, 5.33333333333333, NA, 5.5, NA, NA, NA, 5, NA, NA,
      NA), C = c(4.66666666666667, NA, NA, NA, 6.83333333333333, NA,
      NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 4.33333333333333,
      NA, 4.16666666666667, NA, NA, NA, 4.33333333333333, NA, NA, NA,
      NA, NA, NA, 5.83333333333333, NA, NA, NA, NA, 5.66666666666667,
      NA, NA, NA, 6.66666666666667, NA, NA, NA, 6.66666666666667, NA,
      6.33333333333333, NA, NA, NA, NA, 5.83333333333333, NA, NA, NA,
      6, NA, NA, NA, 6, NA, 6.66666666666667, NA, NA, NA, 6.5, NA,
      NA, NA, 6.5, NA, 5.5, NA, NA, NA, 5.83333333333333, NA, NA, NA,
      5.16666666666667, NA, 5.5, NA, NA, NA, 5.66666666666667, NA,
      NA, NA, 5.16666666666667, NA, 5.66666666666667, NA, NA, NA, 6,
      NA, NA, NA), E = c(4.33333333333333, NA, NA, NA, 5.16666666666667,
      NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 4.66666666666667,
      NA, 4.33333333333333, NA, NA, NA, 3.83333333333333, NA, NA, NA,
      NA, NA, NA, 4.33333333333333, NA, NA, NA, NA, 5.5, NA, NA, NA,
      5.16666666666667, NA, NA, NA, 5.33333333333333, NA, 6.16666666666667,
      NA, NA, NA, NA, 5.83333333333333, NA, NA, NA, 4.83333333333333,
      NA, NA, NA, 4.5, NA, 4.33333333333333, NA, NA, NA, 4.66666666666667,
      NA, NA, NA, 3.83333333333333, NA, 3.5, NA, NA, NA, 3.5, NA, NA,
      NA, 5.16666666666667, NA, 2, NA, NA, NA, 3.16666666666667, NA,
      NA, NA, 2.66666666666667, NA, 2.33333333333333, NA, NA, NA, 2.5,
      NA, NA, NA), A = c(6.75, NA, NA, NA, 7, NA, NA, NA, NA, NA, NA,
      NA, NA, NA, NA, NA, NA, NA, NA, 5.75, NA, 4.75, NA, NA, NA, 5,
      NA, NA, NA, NA, NA, NA, 5.5, NA, NA, NA, NA, 5, NA, NA, NA, 4.75,
      NA, NA, NA, 5, NA, 5.25, NA, NA, NA, NA, 5.5, NA, NA, NA, 5,
      NA, NA, NA, 5.5, NA, 6.75, NA, NA, NA, 5.75, NA, NA, NA, 6.75,
      NA, 5, NA, NA, NA, 3.5, NA, NA, NA, 4.75, NA, 5.75, NA, NA, NA,
      5.75, NA, NA, NA, 5.5, NA, 5.75, NA, NA, NA, 5.75, NA, NA, NA
      ), N = c(3.16666666666667, NA, NA, NA, 3.33333333333333, NA,
      NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 3.66666666666667,
      NA, 4.5, NA, NA, NA, 4.16666666666667, NA, NA, NA, NA, NA, NA,
      2.66666666666667, NA, NA, NA, NA, 2.66666666666667, NA, NA, NA,
      2.33333333333333, NA, NA, NA, 3.16666666666667, NA, 1.83333333333333,
      NA, NA, NA, NA, 1.33333333333333, NA, NA, NA, 2.83333333333333,
      NA, NA, NA, 1.83333333333333, NA, 3.83333333333333, NA, NA, NA,
      3, NA, NA, NA, 3.33333333333333, NA, 1.66666666666667, NA, NA,
      NA, 2.5, NA, NA, NA, 2.5, NA, 3.83333333333333, NA, NA, NA, 3.33333333333333,
      NA, NA, NA, 4.16666666666667, NA, 3.83333333333333, NA, NA, NA,
      4.5, NA, NA, NA)), .Names = c("Person_ID", "Job_Satisfaction",
      "Year", "O", "C", "E", "A", "N"), row.names = c(NA, 100L), class = "data.frame")






      r






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Aug 26 at 4:56









      aspark2020

      205




      205
























          2 Answers
          2






          active

          oldest

          votes

















          up vote
          1
          down vote



          accepted










          I ended up using a series of loops to do what I needed. Using the trait of "O" as an example:



          df$PersonalityYear <- ifelse(df$Year=="2005",
          "2005",
          ifelse(df$Year=="2009",
          "2009",
          ifelse(df$Year=="2013",
          "2013",
          "No")))

          #Code to solve the second goal.
          df$MostRecentYear <- NA
          n <- nrow(df)
          for (i in 2:n) df$MostRecentYear[i] <- ifelse(df$PersonalityYear[i]=="2013",
          2013,
          ifelse(df$PersonalityYear[i]=="2009",
          2009,
          ifelse(df$PersonalityYear[i]=="2005",
          2005,
          ifelse(df$Person_ID[i]==df$Person_ID[i-1],
          df$MostRecentYear[i-1],
          NA))))

          df$MostRecentO <- df$O
          for (i in 2:n) df$MostRecentO[i] <- ifelse(is.na(df$MostRecentYear[i]),
          NA,
          ifelse(df$MostRecentYear[i]==df$PersonalityYear[i],
          df$O[i],
          df$MostRecentO[i-1]))

          #Code to solve the first goal.
          df$FirstYear <- NA
          for (i in 2:n) df$FirstYear[i] <- ifelse(df$Person_ID[i]==df$Person_ID[i-1],
          ifelse(is.na(df$MostRecentYear[i-1]),
          df$MostRecentYear[i],
          df$FirstYear[i-1]),
          df$MostRecentYear[i])

          df$FirstYearO <- df$O
          for (i in 2:n) df$FirstYearO[i] <- ifelse(is.na(df$FirstYear[i]),
          NA,
          ifelse(df$FirstYear[i]==df$PersonalityYear[i],
          df$O[i],
          df$FirstYearO[i-1]))





          share|improve this answer




























            up vote
            0
            down vote













            library(dplyr)
            library(zoo)
            data %>% group_by(Person_ID) %>%
            mutate(first_year=dplyr::first(Year[!is.na(O)])) %>% #return first year where O is not missing
            mutate(O_fill=na.locf(O, na.rm = FALSE)) %>% #using zoo::na.locf to replace NA's with non-NA prior to it
            head(n=20)

            # A tibble: 20 x 10
            # Groups: Person_ID [3]
            Person_ID Job_Satisfaction Year O C E A N first_year O_fill
            <int> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
            1 100003 0 2005 5 4.67 4.33 6.75 3.17 2005 5
            2 100003 NA 2006 NA NA NA NA NA 2005 5
            3 100003 7 2007 NA NA NA NA NA 2005 5
            4 100003 NA 2008 NA NA NA NA NA 2005 5
            5 100003 8 2009 5.5 6.83 5.17 7 3.33 2005 5.5
            6 100003 10 2010 NA NA NA NA NA 2005 5.5
            7 100003 NA 2011 NA NA NA NA NA 2005 5.5
            8 100003 NA 2012 NA NA NA NA NA 2005 5.5
            9 100003 NA 2013 NA NA NA NA NA 2005 5.5
            10 100003 NA 2014 NA NA NA NA NA 2005 5.5
            11 100005 9 2005 NA NA NA NA NA NA NA
            12 100005 NA 2007 NA NA NA NA NA NA NA
            13 100005 NA 2008 NA NA NA NA NA NA NA
            14 100005 NA 2009 NA NA NA NA NA NA NA
            15 100005 NA 2010 NA NA NA NA NA NA NA
            16 100005 NA 2011 NA NA NA NA NA NA NA
            17 100005 NA 2012 NA NA NA NA NA NA NA
            18 100005 NA 2013 NA NA NA NA NA NA NA
            19 100005 NA 2014 NA NA NA NA NA NA NA
            20 100006 6 2005 4.83 4.33 4.67 5.75 3.67 2005 4.83





            share|improve this answer























            • Thanks for this. Unfortunately, though, this does not appear to reset the data in 'first_year' or 'O_fill' for a new ID.
              – aspark2020
              Aug 26 at 22:24










            • @aspark2020 as you can see above, it works for me. Probably you have a name collision between dplyr and plyr in group_by verb, try loading dplyr and zoo in a fresh R session or call group_by directly using dplyr::group_by.
              – A. Suliman
              Aug 27 at 2:37













            Your Answer






            StackExchange.ifUsing("editor", function () {
            StackExchange.using("externalEditor", function () {
            StackExchange.using("snippets", function () {
            StackExchange.snippets.init();
            });
            });
            }, "code-snippets");

            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "1"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            convertImagesToLinks: true,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f52023309%2fr-how-to-selectively-copy-long-format-case-responses-to-future-years%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            2 Answers
            2






            active

            oldest

            votes








            2 Answers
            2






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes








            up vote
            1
            down vote



            accepted










            I ended up using a series of loops to do what I needed. Using the trait of "O" as an example:



            df$PersonalityYear <- ifelse(df$Year=="2005",
            "2005",
            ifelse(df$Year=="2009",
            "2009",
            ifelse(df$Year=="2013",
            "2013",
            "No")))

            #Code to solve the second goal.
            df$MostRecentYear <- NA
            n <- nrow(df)
            for (i in 2:n) df$MostRecentYear[i] <- ifelse(df$PersonalityYear[i]=="2013",
            2013,
            ifelse(df$PersonalityYear[i]=="2009",
            2009,
            ifelse(df$PersonalityYear[i]=="2005",
            2005,
            ifelse(df$Person_ID[i]==df$Person_ID[i-1],
            df$MostRecentYear[i-1],
            NA))))

            df$MostRecentO <- df$O
            for (i in 2:n) df$MostRecentO[i] <- ifelse(is.na(df$MostRecentYear[i]),
            NA,
            ifelse(df$MostRecentYear[i]==df$PersonalityYear[i],
            df$O[i],
            df$MostRecentO[i-1]))

            #Code to solve the first goal.
            df$FirstYear <- NA
            for (i in 2:n) df$FirstYear[i] <- ifelse(df$Person_ID[i]==df$Person_ID[i-1],
            ifelse(is.na(df$MostRecentYear[i-1]),
            df$MostRecentYear[i],
            df$FirstYear[i-1]),
            df$MostRecentYear[i])

            df$FirstYearO <- df$O
            for (i in 2:n) df$FirstYearO[i] <- ifelse(is.na(df$FirstYear[i]),
            NA,
            ifelse(df$FirstYear[i]==df$PersonalityYear[i],
            df$O[i],
            df$FirstYearO[i-1]))





            share|improve this answer

























              up vote
              1
              down vote



              accepted










              I ended up using a series of loops to do what I needed. Using the trait of "O" as an example:



              df$PersonalityYear <- ifelse(df$Year=="2005",
              "2005",
              ifelse(df$Year=="2009",
              "2009",
              ifelse(df$Year=="2013",
              "2013",
              "No")))

              #Code to solve the second goal.
              df$MostRecentYear <- NA
              n <- nrow(df)
              for (i in 2:n) df$MostRecentYear[i] <- ifelse(df$PersonalityYear[i]=="2013",
              2013,
              ifelse(df$PersonalityYear[i]=="2009",
              2009,
              ifelse(df$PersonalityYear[i]=="2005",
              2005,
              ifelse(df$Person_ID[i]==df$Person_ID[i-1],
              df$MostRecentYear[i-1],
              NA))))

              df$MostRecentO <- df$O
              for (i in 2:n) df$MostRecentO[i] <- ifelse(is.na(df$MostRecentYear[i]),
              NA,
              ifelse(df$MostRecentYear[i]==df$PersonalityYear[i],
              df$O[i],
              df$MostRecentO[i-1]))

              #Code to solve the first goal.
              df$FirstYear <- NA
              for (i in 2:n) df$FirstYear[i] <- ifelse(df$Person_ID[i]==df$Person_ID[i-1],
              ifelse(is.na(df$MostRecentYear[i-1]),
              df$MostRecentYear[i],
              df$FirstYear[i-1]),
              df$MostRecentYear[i])

              df$FirstYearO <- df$O
              for (i in 2:n) df$FirstYearO[i] <- ifelse(is.na(df$FirstYear[i]),
              NA,
              ifelse(df$FirstYear[i]==df$PersonalityYear[i],
              df$O[i],
              df$FirstYearO[i-1]))





              share|improve this answer























                up vote
                1
                down vote



                accepted







                up vote
                1
                down vote



                accepted






                I ended up using a series of loops to do what I needed. Using the trait of "O" as an example:



                df$PersonalityYear <- ifelse(df$Year=="2005",
                "2005",
                ifelse(df$Year=="2009",
                "2009",
                ifelse(df$Year=="2013",
                "2013",
                "No")))

                #Code to solve the second goal.
                df$MostRecentYear <- NA
                n <- nrow(df)
                for (i in 2:n) df$MostRecentYear[i] <- ifelse(df$PersonalityYear[i]=="2013",
                2013,
                ifelse(df$PersonalityYear[i]=="2009",
                2009,
                ifelse(df$PersonalityYear[i]=="2005",
                2005,
                ifelse(df$Person_ID[i]==df$Person_ID[i-1],
                df$MostRecentYear[i-1],
                NA))))

                df$MostRecentO <- df$O
                for (i in 2:n) df$MostRecentO[i] <- ifelse(is.na(df$MostRecentYear[i]),
                NA,
                ifelse(df$MostRecentYear[i]==df$PersonalityYear[i],
                df$O[i],
                df$MostRecentO[i-1]))

                #Code to solve the first goal.
                df$FirstYear <- NA
                for (i in 2:n) df$FirstYear[i] <- ifelse(df$Person_ID[i]==df$Person_ID[i-1],
                ifelse(is.na(df$MostRecentYear[i-1]),
                df$MostRecentYear[i],
                df$FirstYear[i-1]),
                df$MostRecentYear[i])

                df$FirstYearO <- df$O
                for (i in 2:n) df$FirstYearO[i] <- ifelse(is.na(df$FirstYear[i]),
                NA,
                ifelse(df$FirstYear[i]==df$PersonalityYear[i],
                df$O[i],
                df$FirstYearO[i-1]))





                share|improve this answer












                I ended up using a series of loops to do what I needed. Using the trait of "O" as an example:



                df$PersonalityYear <- ifelse(df$Year=="2005",
                "2005",
                ifelse(df$Year=="2009",
                "2009",
                ifelse(df$Year=="2013",
                "2013",
                "No")))

                #Code to solve the second goal.
                df$MostRecentYear <- NA
                n <- nrow(df)
                for (i in 2:n) df$MostRecentYear[i] <- ifelse(df$PersonalityYear[i]=="2013",
                2013,
                ifelse(df$PersonalityYear[i]=="2009",
                2009,
                ifelse(df$PersonalityYear[i]=="2005",
                2005,
                ifelse(df$Person_ID[i]==df$Person_ID[i-1],
                df$MostRecentYear[i-1],
                NA))))

                df$MostRecentO <- df$O
                for (i in 2:n) df$MostRecentO[i] <- ifelse(is.na(df$MostRecentYear[i]),
                NA,
                ifelse(df$MostRecentYear[i]==df$PersonalityYear[i],
                df$O[i],
                df$MostRecentO[i-1]))

                #Code to solve the first goal.
                df$FirstYear <- NA
                for (i in 2:n) df$FirstYear[i] <- ifelse(df$Person_ID[i]==df$Person_ID[i-1],
                ifelse(is.na(df$MostRecentYear[i-1]),
                df$MostRecentYear[i],
                df$FirstYear[i-1]),
                df$MostRecentYear[i])

                df$FirstYearO <- df$O
                for (i in 2:n) df$FirstYearO[i] <- ifelse(is.na(df$FirstYear[i]),
                NA,
                ifelse(df$FirstYear[i]==df$PersonalityYear[i],
                df$O[i],
                df$FirstYearO[i-1]))






                share|improve this answer












                share|improve this answer



                share|improve this answer










                answered Nov 20 at 1:38









                aspark2020

                205




                205
























                    up vote
                    0
                    down vote













                    library(dplyr)
                    library(zoo)
                    data %>% group_by(Person_ID) %>%
                    mutate(first_year=dplyr::first(Year[!is.na(O)])) %>% #return first year where O is not missing
                    mutate(O_fill=na.locf(O, na.rm = FALSE)) %>% #using zoo::na.locf to replace NA's with non-NA prior to it
                    head(n=20)

                    # A tibble: 20 x 10
                    # Groups: Person_ID [3]
                    Person_ID Job_Satisfaction Year O C E A N first_year O_fill
                    <int> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
                    1 100003 0 2005 5 4.67 4.33 6.75 3.17 2005 5
                    2 100003 NA 2006 NA NA NA NA NA 2005 5
                    3 100003 7 2007 NA NA NA NA NA 2005 5
                    4 100003 NA 2008 NA NA NA NA NA 2005 5
                    5 100003 8 2009 5.5 6.83 5.17 7 3.33 2005 5.5
                    6 100003 10 2010 NA NA NA NA NA 2005 5.5
                    7 100003 NA 2011 NA NA NA NA NA 2005 5.5
                    8 100003 NA 2012 NA NA NA NA NA 2005 5.5
                    9 100003 NA 2013 NA NA NA NA NA 2005 5.5
                    10 100003 NA 2014 NA NA NA NA NA 2005 5.5
                    11 100005 9 2005 NA NA NA NA NA NA NA
                    12 100005 NA 2007 NA NA NA NA NA NA NA
                    13 100005 NA 2008 NA NA NA NA NA NA NA
                    14 100005 NA 2009 NA NA NA NA NA NA NA
                    15 100005 NA 2010 NA NA NA NA NA NA NA
                    16 100005 NA 2011 NA NA NA NA NA NA NA
                    17 100005 NA 2012 NA NA NA NA NA NA NA
                    18 100005 NA 2013 NA NA NA NA NA NA NA
                    19 100005 NA 2014 NA NA NA NA NA NA NA
                    20 100006 6 2005 4.83 4.33 4.67 5.75 3.67 2005 4.83





                    share|improve this answer























                    • Thanks for this. Unfortunately, though, this does not appear to reset the data in 'first_year' or 'O_fill' for a new ID.
                      – aspark2020
                      Aug 26 at 22:24










                    • @aspark2020 as you can see above, it works for me. Probably you have a name collision between dplyr and plyr in group_by verb, try loading dplyr and zoo in a fresh R session or call group_by directly using dplyr::group_by.
                      – A. Suliman
                      Aug 27 at 2:37

















                    up vote
                    0
                    down vote













                    library(dplyr)
                    library(zoo)
                    data %>% group_by(Person_ID) %>%
                    mutate(first_year=dplyr::first(Year[!is.na(O)])) %>% #return first year where O is not missing
                    mutate(O_fill=na.locf(O, na.rm = FALSE)) %>% #using zoo::na.locf to replace NA's with non-NA prior to it
                    head(n=20)

                    # A tibble: 20 x 10
                    # Groups: Person_ID [3]
                    Person_ID Job_Satisfaction Year O C E A N first_year O_fill
                    <int> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
                    1 100003 0 2005 5 4.67 4.33 6.75 3.17 2005 5
                    2 100003 NA 2006 NA NA NA NA NA 2005 5
                    3 100003 7 2007 NA NA NA NA NA 2005 5
                    4 100003 NA 2008 NA NA NA NA NA 2005 5
                    5 100003 8 2009 5.5 6.83 5.17 7 3.33 2005 5.5
                    6 100003 10 2010 NA NA NA NA NA 2005 5.5
                    7 100003 NA 2011 NA NA NA NA NA 2005 5.5
                    8 100003 NA 2012 NA NA NA NA NA 2005 5.5
                    9 100003 NA 2013 NA NA NA NA NA 2005 5.5
                    10 100003 NA 2014 NA NA NA NA NA 2005 5.5
                    11 100005 9 2005 NA NA NA NA NA NA NA
                    12 100005 NA 2007 NA NA NA NA NA NA NA
                    13 100005 NA 2008 NA NA NA NA NA NA NA
                    14 100005 NA 2009 NA NA NA NA NA NA NA
                    15 100005 NA 2010 NA NA NA NA NA NA NA
                    16 100005 NA 2011 NA NA NA NA NA NA NA
                    17 100005 NA 2012 NA NA NA NA NA NA NA
                    18 100005 NA 2013 NA NA NA NA NA NA NA
                    19 100005 NA 2014 NA NA NA NA NA NA NA
                    20 100006 6 2005 4.83 4.33 4.67 5.75 3.67 2005 4.83





                    share|improve this answer























                    • Thanks for this. Unfortunately, though, this does not appear to reset the data in 'first_year' or 'O_fill' for a new ID.
                      – aspark2020
                      Aug 26 at 22:24










                    • @aspark2020 as you can see above, it works for me. Probably you have a name collision between dplyr and plyr in group_by verb, try loading dplyr and zoo in a fresh R session or call group_by directly using dplyr::group_by.
                      – A. Suliman
                      Aug 27 at 2:37















                    up vote
                    0
                    down vote










                    up vote
                    0
                    down vote









                    library(dplyr)
                    library(zoo)
                    data %>% group_by(Person_ID) %>%
                    mutate(first_year=dplyr::first(Year[!is.na(O)])) %>% #return first year where O is not missing
                    mutate(O_fill=na.locf(O, na.rm = FALSE)) %>% #using zoo::na.locf to replace NA's with non-NA prior to it
                    head(n=20)

                    # A tibble: 20 x 10
                    # Groups: Person_ID [3]
                    Person_ID Job_Satisfaction Year O C E A N first_year O_fill
                    <int> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
                    1 100003 0 2005 5 4.67 4.33 6.75 3.17 2005 5
                    2 100003 NA 2006 NA NA NA NA NA 2005 5
                    3 100003 7 2007 NA NA NA NA NA 2005 5
                    4 100003 NA 2008 NA NA NA NA NA 2005 5
                    5 100003 8 2009 5.5 6.83 5.17 7 3.33 2005 5.5
                    6 100003 10 2010 NA NA NA NA NA 2005 5.5
                    7 100003 NA 2011 NA NA NA NA NA 2005 5.5
                    8 100003 NA 2012 NA NA NA NA NA 2005 5.5
                    9 100003 NA 2013 NA NA NA NA NA 2005 5.5
                    10 100003 NA 2014 NA NA NA NA NA 2005 5.5
                    11 100005 9 2005 NA NA NA NA NA NA NA
                    12 100005 NA 2007 NA NA NA NA NA NA NA
                    13 100005 NA 2008 NA NA NA NA NA NA NA
                    14 100005 NA 2009 NA NA NA NA NA NA NA
                    15 100005 NA 2010 NA NA NA NA NA NA NA
                    16 100005 NA 2011 NA NA NA NA NA NA NA
                    17 100005 NA 2012 NA NA NA NA NA NA NA
                    18 100005 NA 2013 NA NA NA NA NA NA NA
                    19 100005 NA 2014 NA NA NA NA NA NA NA
                    20 100006 6 2005 4.83 4.33 4.67 5.75 3.67 2005 4.83





                    share|improve this answer














                    library(dplyr)
                    library(zoo)
                    data %>% group_by(Person_ID) %>%
                    mutate(first_year=dplyr::first(Year[!is.na(O)])) %>% #return first year where O is not missing
                    mutate(O_fill=na.locf(O, na.rm = FALSE)) %>% #using zoo::na.locf to replace NA's with non-NA prior to it
                    head(n=20)

                    # A tibble: 20 x 10
                    # Groups: Person_ID [3]
                    Person_ID Job_Satisfaction Year O C E A N first_year O_fill
                    <int> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
                    1 100003 0 2005 5 4.67 4.33 6.75 3.17 2005 5
                    2 100003 NA 2006 NA NA NA NA NA 2005 5
                    3 100003 7 2007 NA NA NA NA NA 2005 5
                    4 100003 NA 2008 NA NA NA NA NA 2005 5
                    5 100003 8 2009 5.5 6.83 5.17 7 3.33 2005 5.5
                    6 100003 10 2010 NA NA NA NA NA 2005 5.5
                    7 100003 NA 2011 NA NA NA NA NA 2005 5.5
                    8 100003 NA 2012 NA NA NA NA NA 2005 5.5
                    9 100003 NA 2013 NA NA NA NA NA 2005 5.5
                    10 100003 NA 2014 NA NA NA NA NA 2005 5.5
                    11 100005 9 2005 NA NA NA NA NA NA NA
                    12 100005 NA 2007 NA NA NA NA NA NA NA
                    13 100005 NA 2008 NA NA NA NA NA NA NA
                    14 100005 NA 2009 NA NA NA NA NA NA NA
                    15 100005 NA 2010 NA NA NA NA NA NA NA
                    16 100005 NA 2011 NA NA NA NA NA NA NA
                    17 100005 NA 2012 NA NA NA NA NA NA NA
                    18 100005 NA 2013 NA NA NA NA NA NA NA
                    19 100005 NA 2014 NA NA NA NA NA NA NA
                    20 100006 6 2005 4.83 4.33 4.67 5.75 3.67 2005 4.83






                    share|improve this answer














                    share|improve this answer



                    share|improve this answer








                    edited Aug 27 at 3:27

























                    answered Aug 26 at 5:29









                    A. Suliman

                    4,48531022




                    4,48531022












                    • Thanks for this. Unfortunately, though, this does not appear to reset the data in 'first_year' or 'O_fill' for a new ID.
                      – aspark2020
                      Aug 26 at 22:24










                    • @aspark2020 as you can see above, it works for me. Probably you have a name collision between dplyr and plyr in group_by verb, try loading dplyr and zoo in a fresh R session or call group_by directly using dplyr::group_by.
                      – A. Suliman
                      Aug 27 at 2:37




















                    • Thanks for this. Unfortunately, though, this does not appear to reset the data in 'first_year' or 'O_fill' for a new ID.
                      – aspark2020
                      Aug 26 at 22:24










                    • @aspark2020 as you can see above, it works for me. Probably you have a name collision between dplyr and plyr in group_by verb, try loading dplyr and zoo in a fresh R session or call group_by directly using dplyr::group_by.
                      – A. Suliman
                      Aug 27 at 2:37


















                    Thanks for this. Unfortunately, though, this does not appear to reset the data in 'first_year' or 'O_fill' for a new ID.
                    – aspark2020
                    Aug 26 at 22:24




                    Thanks for this. Unfortunately, though, this does not appear to reset the data in 'first_year' or 'O_fill' for a new ID.
                    – aspark2020
                    Aug 26 at 22:24












                    @aspark2020 as you can see above, it works for me. Probably you have a name collision between dplyr and plyr in group_by verb, try loading dplyr and zoo in a fresh R session or call group_by directly using dplyr::group_by.
                    – A. Suliman
                    Aug 27 at 2:37






                    @aspark2020 as you can see above, it works for me. Probably you have a name collision between dplyr and plyr in group_by verb, try loading dplyr and zoo in a fresh R session or call group_by directly using dplyr::group_by.
                    – A. Suliman
                    Aug 27 at 2:37




















                    draft saved

                    draft discarded




















































                    Thanks for contributing an answer to Stack Overflow!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.





                    Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


                    Please pay close attention to the following guidance:


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function () {
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f52023309%2fr-how-to-selectively-copy-long-format-case-responses-to-future-years%23new-answer', 'question_page');
                    }
                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    Popular posts from this blog

                    Ottavio Pratesi

                    Tricia Helfer

                    15 giugno