R - How to selectively copy long format case responses to future years?
up vote
0
down vote
favorite
I am conducting a longitudinal analysis with data in long format. I have personality (O, C, E, A, N) measured in 2005, 2009 and 2013 with my dependent variable (job satisfaction) measured every year. Each person has their own ID. I have two goals (which I think might be able to be achieved with the same underlying logic):
1) I need another variable (column) indicating the first year that personality was measured for each person. If personality was never measured, then return NA. For example, if personality was first measured in 2009, then “2009” needs to be returned in this column for every row for that ID.
2) I need another variable (column) which copies the personality item responses for each year for each ID, unless it is replaced by a later measurement year (i.e. 2009, 2013). For example, if an ID completed the personality test in 2005 and 2009, then I need their responses for 2005 copied to years 2006, 2007, and 2008, but not 2009, as later entries (if they exist) should then become what gets copied forward to future years.
I have tried making a function for these, but ultimately failed.
I hope that makes sense. Extract of 100 cases below:
structure(list(Person_ID = c(100003L, 100003L, 100003L, 100003L,
100003L, 100003L, 100003L, 100003L, 100003L, 100003L, 100005L,
100005L, 100005L, 100005L, 100005L, 100005L, 100005L, 100005L,
100005L, 100006L, 100006L, 100006L, 100006L, 100006L, 100006L,
100006L, 100006L, 100007L, 100007L, 100007L, 100007L, 100007L,
100008L, 100008L, 100008L, 100009L, 100009L, 100010L, 100010L,
100010L, 100010L, 100010L, 100010L, 100010L, 100010L, 100010L,
100010L, 100011L, 100011L, 100011L, 100011L, 100011L, 100014L,
100014L, 100014L, 100014L, 100014L, 100014L, 100014L, 100014L,
100014L, 100014L, 100015L, 100015L, 100015L, 100015L, 100015L,
100015L, 100015L, 100015L, 100015L, 100015L, 100016L, 100016L,
100016L, 100016L, 100016L, 100016L, 100016L, 100016L, 100016L,
100016L, 100018L, 100018L, 100018L, 100018L, 100018L, 100018L,
100018L, 100018L, 100018L, 100018L, 100019L, 100019L, 100019L,
100019L, 100019L, 100019L, 100019L, 100019L), Job_Satisfaction = c(0L,
NA, 7L, NA, 8L, 10L, NA, NA, NA, NA, 9L, NA, NA, NA, NA, NA,
NA, NA, NA, 6L, 10L, 8L, 9L, 7L, NA, 9L, 3L, NA, 10L, NA, NA,
NA, NA, NA, NA, NA, NA, 5L, NA, NA, NA, NA, 5L, 4L, 8L, 5L, 5L,
NA, NA, NA, 8L, NA, 8L, 6L, 8L, 7L, 7L, NA, NA, NA, NA, NA, 10L,
9L, 9L, 9L, 10L, 8L, 10L, 9L, 8L, 9L, 7L, 9L, 9L, 8L, 8L, 9L,
9L, 9L, 8L, 9L, NA, 7L, 7L, 8L, 7L, 8L, 7L, 7L, 8L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 8L), Year = c(2005L, 2006L, 2007L, 2008L,
2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2007L, 2008L,
2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2008L, 2009L,
2010L, 2011L, 2012L, 2013L, 2014L, 2008L, 2009L, 2010L, 2011L,
2012L, 2005L, 2006L, 2007L, 2006L, 2007L, 2005L, 2006L, 2007L,
2008L, 2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2006L,
2007L, 2008L, 2009L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L,
2011L, 2012L, 2013L, 2014L, 2005L, 2006L, 2007L, 2008L, 2009L,
2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2006L, 2007L, 2008L,
2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2006L, 2007L,
2008L, 2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2006L,
2007L, 2008L, 2009L, 2010L, 2011L, 2012L), O = c(5, NA, NA, NA,
5.5, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
4.83333333333333, NA, 4.16666666666667, NA, NA, NA, 3.5, NA,
NA, NA, NA, NA, NA, 3.83333333333333, NA, NA, NA, NA, 5, NA,
NA, NA, 5.16666666666667, NA, NA, NA, 5, NA, 6, NA, NA, NA, NA,
5.5, NA, NA, NA, 5.16666666666667, NA, NA, NA, 5, NA, 4.5, NA,
NA, NA, 4.33333333333333, NA, NA, NA, 4, NA, 3, NA, NA, NA, 3.16666666666667,
NA, NA, NA, 3.16666666666667, NA, 5.5, NA, NA, NA, 5.66666666666667,
NA, NA, NA, 5.33333333333333, NA, 5.5, NA, NA, NA, 5, NA, NA,
NA), C = c(4.66666666666667, NA, NA, NA, 6.83333333333333, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 4.33333333333333,
NA, 4.16666666666667, NA, NA, NA, 4.33333333333333, NA, NA, NA,
NA, NA, NA, 5.83333333333333, NA, NA, NA, NA, 5.66666666666667,
NA, NA, NA, 6.66666666666667, NA, NA, NA, 6.66666666666667, NA,
6.33333333333333, NA, NA, NA, NA, 5.83333333333333, NA, NA, NA,
6, NA, NA, NA, 6, NA, 6.66666666666667, NA, NA, NA, 6.5, NA,
NA, NA, 6.5, NA, 5.5, NA, NA, NA, 5.83333333333333, NA, NA, NA,
5.16666666666667, NA, 5.5, NA, NA, NA, 5.66666666666667, NA,
NA, NA, 5.16666666666667, NA, 5.66666666666667, NA, NA, NA, 6,
NA, NA, NA), E = c(4.33333333333333, NA, NA, NA, 5.16666666666667,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 4.66666666666667,
NA, 4.33333333333333, NA, NA, NA, 3.83333333333333, NA, NA, NA,
NA, NA, NA, 4.33333333333333, NA, NA, NA, NA, 5.5, NA, NA, NA,
5.16666666666667, NA, NA, NA, 5.33333333333333, NA, 6.16666666666667,
NA, NA, NA, NA, 5.83333333333333, NA, NA, NA, 4.83333333333333,
NA, NA, NA, 4.5, NA, 4.33333333333333, NA, NA, NA, 4.66666666666667,
NA, NA, NA, 3.83333333333333, NA, 3.5, NA, NA, NA, 3.5, NA, NA,
NA, 5.16666666666667, NA, 2, NA, NA, NA, 3.16666666666667, NA,
NA, NA, 2.66666666666667, NA, 2.33333333333333, NA, NA, NA, 2.5,
NA, NA, NA), A = c(6.75, NA, NA, NA, 7, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 5.75, NA, 4.75, NA, NA, NA, 5,
NA, NA, NA, NA, NA, NA, 5.5, NA, NA, NA, NA, 5, NA, NA, NA, 4.75,
NA, NA, NA, 5, NA, 5.25, NA, NA, NA, NA, 5.5, NA, NA, NA, 5,
NA, NA, NA, 5.5, NA, 6.75, NA, NA, NA, 5.75, NA, NA, NA, 6.75,
NA, 5, NA, NA, NA, 3.5, NA, NA, NA, 4.75, NA, 5.75, NA, NA, NA,
5.75, NA, NA, NA, 5.5, NA, 5.75, NA, NA, NA, 5.75, NA, NA, NA
), N = c(3.16666666666667, NA, NA, NA, 3.33333333333333, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 3.66666666666667,
NA, 4.5, NA, NA, NA, 4.16666666666667, NA, NA, NA, NA, NA, NA,
2.66666666666667, NA, NA, NA, NA, 2.66666666666667, NA, NA, NA,
2.33333333333333, NA, NA, NA, 3.16666666666667, NA, 1.83333333333333,
NA, NA, NA, NA, 1.33333333333333, NA, NA, NA, 2.83333333333333,
NA, NA, NA, 1.83333333333333, NA, 3.83333333333333, NA, NA, NA,
3, NA, NA, NA, 3.33333333333333, NA, 1.66666666666667, NA, NA,
NA, 2.5, NA, NA, NA, 2.5, NA, 3.83333333333333, NA, NA, NA, 3.33333333333333,
NA, NA, NA, 4.16666666666667, NA, 3.83333333333333, NA, NA, NA,
4.5, NA, NA, NA)), .Names = c("Person_ID", "Job_Satisfaction",
"Year", "O", "C", "E", "A", "N"), row.names = c(NA, 100L), class = "data.frame")
r
add a comment |
up vote
0
down vote
favorite
I am conducting a longitudinal analysis with data in long format. I have personality (O, C, E, A, N) measured in 2005, 2009 and 2013 with my dependent variable (job satisfaction) measured every year. Each person has their own ID. I have two goals (which I think might be able to be achieved with the same underlying logic):
1) I need another variable (column) indicating the first year that personality was measured for each person. If personality was never measured, then return NA. For example, if personality was first measured in 2009, then “2009” needs to be returned in this column for every row for that ID.
2) I need another variable (column) which copies the personality item responses for each year for each ID, unless it is replaced by a later measurement year (i.e. 2009, 2013). For example, if an ID completed the personality test in 2005 and 2009, then I need their responses for 2005 copied to years 2006, 2007, and 2008, but not 2009, as later entries (if they exist) should then become what gets copied forward to future years.
I have tried making a function for these, but ultimately failed.
I hope that makes sense. Extract of 100 cases below:
structure(list(Person_ID = c(100003L, 100003L, 100003L, 100003L,
100003L, 100003L, 100003L, 100003L, 100003L, 100003L, 100005L,
100005L, 100005L, 100005L, 100005L, 100005L, 100005L, 100005L,
100005L, 100006L, 100006L, 100006L, 100006L, 100006L, 100006L,
100006L, 100006L, 100007L, 100007L, 100007L, 100007L, 100007L,
100008L, 100008L, 100008L, 100009L, 100009L, 100010L, 100010L,
100010L, 100010L, 100010L, 100010L, 100010L, 100010L, 100010L,
100010L, 100011L, 100011L, 100011L, 100011L, 100011L, 100014L,
100014L, 100014L, 100014L, 100014L, 100014L, 100014L, 100014L,
100014L, 100014L, 100015L, 100015L, 100015L, 100015L, 100015L,
100015L, 100015L, 100015L, 100015L, 100015L, 100016L, 100016L,
100016L, 100016L, 100016L, 100016L, 100016L, 100016L, 100016L,
100016L, 100018L, 100018L, 100018L, 100018L, 100018L, 100018L,
100018L, 100018L, 100018L, 100018L, 100019L, 100019L, 100019L,
100019L, 100019L, 100019L, 100019L, 100019L), Job_Satisfaction = c(0L,
NA, 7L, NA, 8L, 10L, NA, NA, NA, NA, 9L, NA, NA, NA, NA, NA,
NA, NA, NA, 6L, 10L, 8L, 9L, 7L, NA, 9L, 3L, NA, 10L, NA, NA,
NA, NA, NA, NA, NA, NA, 5L, NA, NA, NA, NA, 5L, 4L, 8L, 5L, 5L,
NA, NA, NA, 8L, NA, 8L, 6L, 8L, 7L, 7L, NA, NA, NA, NA, NA, 10L,
9L, 9L, 9L, 10L, 8L, 10L, 9L, 8L, 9L, 7L, 9L, 9L, 8L, 8L, 9L,
9L, 9L, 8L, 9L, NA, 7L, 7L, 8L, 7L, 8L, 7L, 7L, 8L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 8L), Year = c(2005L, 2006L, 2007L, 2008L,
2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2007L, 2008L,
2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2008L, 2009L,
2010L, 2011L, 2012L, 2013L, 2014L, 2008L, 2009L, 2010L, 2011L,
2012L, 2005L, 2006L, 2007L, 2006L, 2007L, 2005L, 2006L, 2007L,
2008L, 2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2006L,
2007L, 2008L, 2009L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L,
2011L, 2012L, 2013L, 2014L, 2005L, 2006L, 2007L, 2008L, 2009L,
2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2006L, 2007L, 2008L,
2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2006L, 2007L,
2008L, 2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2006L,
2007L, 2008L, 2009L, 2010L, 2011L, 2012L), O = c(5, NA, NA, NA,
5.5, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
4.83333333333333, NA, 4.16666666666667, NA, NA, NA, 3.5, NA,
NA, NA, NA, NA, NA, 3.83333333333333, NA, NA, NA, NA, 5, NA,
NA, NA, 5.16666666666667, NA, NA, NA, 5, NA, 6, NA, NA, NA, NA,
5.5, NA, NA, NA, 5.16666666666667, NA, NA, NA, 5, NA, 4.5, NA,
NA, NA, 4.33333333333333, NA, NA, NA, 4, NA, 3, NA, NA, NA, 3.16666666666667,
NA, NA, NA, 3.16666666666667, NA, 5.5, NA, NA, NA, 5.66666666666667,
NA, NA, NA, 5.33333333333333, NA, 5.5, NA, NA, NA, 5, NA, NA,
NA), C = c(4.66666666666667, NA, NA, NA, 6.83333333333333, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 4.33333333333333,
NA, 4.16666666666667, NA, NA, NA, 4.33333333333333, NA, NA, NA,
NA, NA, NA, 5.83333333333333, NA, NA, NA, NA, 5.66666666666667,
NA, NA, NA, 6.66666666666667, NA, NA, NA, 6.66666666666667, NA,
6.33333333333333, NA, NA, NA, NA, 5.83333333333333, NA, NA, NA,
6, NA, NA, NA, 6, NA, 6.66666666666667, NA, NA, NA, 6.5, NA,
NA, NA, 6.5, NA, 5.5, NA, NA, NA, 5.83333333333333, NA, NA, NA,
5.16666666666667, NA, 5.5, NA, NA, NA, 5.66666666666667, NA,
NA, NA, 5.16666666666667, NA, 5.66666666666667, NA, NA, NA, 6,
NA, NA, NA), E = c(4.33333333333333, NA, NA, NA, 5.16666666666667,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 4.66666666666667,
NA, 4.33333333333333, NA, NA, NA, 3.83333333333333, NA, NA, NA,
NA, NA, NA, 4.33333333333333, NA, NA, NA, NA, 5.5, NA, NA, NA,
5.16666666666667, NA, NA, NA, 5.33333333333333, NA, 6.16666666666667,
NA, NA, NA, NA, 5.83333333333333, NA, NA, NA, 4.83333333333333,
NA, NA, NA, 4.5, NA, 4.33333333333333, NA, NA, NA, 4.66666666666667,
NA, NA, NA, 3.83333333333333, NA, 3.5, NA, NA, NA, 3.5, NA, NA,
NA, 5.16666666666667, NA, 2, NA, NA, NA, 3.16666666666667, NA,
NA, NA, 2.66666666666667, NA, 2.33333333333333, NA, NA, NA, 2.5,
NA, NA, NA), A = c(6.75, NA, NA, NA, 7, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 5.75, NA, 4.75, NA, NA, NA, 5,
NA, NA, NA, NA, NA, NA, 5.5, NA, NA, NA, NA, 5, NA, NA, NA, 4.75,
NA, NA, NA, 5, NA, 5.25, NA, NA, NA, NA, 5.5, NA, NA, NA, 5,
NA, NA, NA, 5.5, NA, 6.75, NA, NA, NA, 5.75, NA, NA, NA, 6.75,
NA, 5, NA, NA, NA, 3.5, NA, NA, NA, 4.75, NA, 5.75, NA, NA, NA,
5.75, NA, NA, NA, 5.5, NA, 5.75, NA, NA, NA, 5.75, NA, NA, NA
), N = c(3.16666666666667, NA, NA, NA, 3.33333333333333, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 3.66666666666667,
NA, 4.5, NA, NA, NA, 4.16666666666667, NA, NA, NA, NA, NA, NA,
2.66666666666667, NA, NA, NA, NA, 2.66666666666667, NA, NA, NA,
2.33333333333333, NA, NA, NA, 3.16666666666667, NA, 1.83333333333333,
NA, NA, NA, NA, 1.33333333333333, NA, NA, NA, 2.83333333333333,
NA, NA, NA, 1.83333333333333, NA, 3.83333333333333, NA, NA, NA,
3, NA, NA, NA, 3.33333333333333, NA, 1.66666666666667, NA, NA,
NA, 2.5, NA, NA, NA, 2.5, NA, 3.83333333333333, NA, NA, NA, 3.33333333333333,
NA, NA, NA, 4.16666666666667, NA, 3.83333333333333, NA, NA, NA,
4.5, NA, NA, NA)), .Names = c("Person_ID", "Job_Satisfaction",
"Year", "O", "C", "E", "A", "N"), row.names = c(NA, 100L), class = "data.frame")
r
add a comment |
up vote
0
down vote
favorite
up vote
0
down vote
favorite
I am conducting a longitudinal analysis with data in long format. I have personality (O, C, E, A, N) measured in 2005, 2009 and 2013 with my dependent variable (job satisfaction) measured every year. Each person has their own ID. I have two goals (which I think might be able to be achieved with the same underlying logic):
1) I need another variable (column) indicating the first year that personality was measured for each person. If personality was never measured, then return NA. For example, if personality was first measured in 2009, then “2009” needs to be returned in this column for every row for that ID.
2) I need another variable (column) which copies the personality item responses for each year for each ID, unless it is replaced by a later measurement year (i.e. 2009, 2013). For example, if an ID completed the personality test in 2005 and 2009, then I need their responses for 2005 copied to years 2006, 2007, and 2008, but not 2009, as later entries (if they exist) should then become what gets copied forward to future years.
I have tried making a function for these, but ultimately failed.
I hope that makes sense. Extract of 100 cases below:
structure(list(Person_ID = c(100003L, 100003L, 100003L, 100003L,
100003L, 100003L, 100003L, 100003L, 100003L, 100003L, 100005L,
100005L, 100005L, 100005L, 100005L, 100005L, 100005L, 100005L,
100005L, 100006L, 100006L, 100006L, 100006L, 100006L, 100006L,
100006L, 100006L, 100007L, 100007L, 100007L, 100007L, 100007L,
100008L, 100008L, 100008L, 100009L, 100009L, 100010L, 100010L,
100010L, 100010L, 100010L, 100010L, 100010L, 100010L, 100010L,
100010L, 100011L, 100011L, 100011L, 100011L, 100011L, 100014L,
100014L, 100014L, 100014L, 100014L, 100014L, 100014L, 100014L,
100014L, 100014L, 100015L, 100015L, 100015L, 100015L, 100015L,
100015L, 100015L, 100015L, 100015L, 100015L, 100016L, 100016L,
100016L, 100016L, 100016L, 100016L, 100016L, 100016L, 100016L,
100016L, 100018L, 100018L, 100018L, 100018L, 100018L, 100018L,
100018L, 100018L, 100018L, 100018L, 100019L, 100019L, 100019L,
100019L, 100019L, 100019L, 100019L, 100019L), Job_Satisfaction = c(0L,
NA, 7L, NA, 8L, 10L, NA, NA, NA, NA, 9L, NA, NA, NA, NA, NA,
NA, NA, NA, 6L, 10L, 8L, 9L, 7L, NA, 9L, 3L, NA, 10L, NA, NA,
NA, NA, NA, NA, NA, NA, 5L, NA, NA, NA, NA, 5L, 4L, 8L, 5L, 5L,
NA, NA, NA, 8L, NA, 8L, 6L, 8L, 7L, 7L, NA, NA, NA, NA, NA, 10L,
9L, 9L, 9L, 10L, 8L, 10L, 9L, 8L, 9L, 7L, 9L, 9L, 8L, 8L, 9L,
9L, 9L, 8L, 9L, NA, 7L, 7L, 8L, 7L, 8L, 7L, 7L, 8L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 8L), Year = c(2005L, 2006L, 2007L, 2008L,
2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2007L, 2008L,
2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2008L, 2009L,
2010L, 2011L, 2012L, 2013L, 2014L, 2008L, 2009L, 2010L, 2011L,
2012L, 2005L, 2006L, 2007L, 2006L, 2007L, 2005L, 2006L, 2007L,
2008L, 2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2006L,
2007L, 2008L, 2009L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L,
2011L, 2012L, 2013L, 2014L, 2005L, 2006L, 2007L, 2008L, 2009L,
2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2006L, 2007L, 2008L,
2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2006L, 2007L,
2008L, 2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2006L,
2007L, 2008L, 2009L, 2010L, 2011L, 2012L), O = c(5, NA, NA, NA,
5.5, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
4.83333333333333, NA, 4.16666666666667, NA, NA, NA, 3.5, NA,
NA, NA, NA, NA, NA, 3.83333333333333, NA, NA, NA, NA, 5, NA,
NA, NA, 5.16666666666667, NA, NA, NA, 5, NA, 6, NA, NA, NA, NA,
5.5, NA, NA, NA, 5.16666666666667, NA, NA, NA, 5, NA, 4.5, NA,
NA, NA, 4.33333333333333, NA, NA, NA, 4, NA, 3, NA, NA, NA, 3.16666666666667,
NA, NA, NA, 3.16666666666667, NA, 5.5, NA, NA, NA, 5.66666666666667,
NA, NA, NA, 5.33333333333333, NA, 5.5, NA, NA, NA, 5, NA, NA,
NA), C = c(4.66666666666667, NA, NA, NA, 6.83333333333333, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 4.33333333333333,
NA, 4.16666666666667, NA, NA, NA, 4.33333333333333, NA, NA, NA,
NA, NA, NA, 5.83333333333333, NA, NA, NA, NA, 5.66666666666667,
NA, NA, NA, 6.66666666666667, NA, NA, NA, 6.66666666666667, NA,
6.33333333333333, NA, NA, NA, NA, 5.83333333333333, NA, NA, NA,
6, NA, NA, NA, 6, NA, 6.66666666666667, NA, NA, NA, 6.5, NA,
NA, NA, 6.5, NA, 5.5, NA, NA, NA, 5.83333333333333, NA, NA, NA,
5.16666666666667, NA, 5.5, NA, NA, NA, 5.66666666666667, NA,
NA, NA, 5.16666666666667, NA, 5.66666666666667, NA, NA, NA, 6,
NA, NA, NA), E = c(4.33333333333333, NA, NA, NA, 5.16666666666667,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 4.66666666666667,
NA, 4.33333333333333, NA, NA, NA, 3.83333333333333, NA, NA, NA,
NA, NA, NA, 4.33333333333333, NA, NA, NA, NA, 5.5, NA, NA, NA,
5.16666666666667, NA, NA, NA, 5.33333333333333, NA, 6.16666666666667,
NA, NA, NA, NA, 5.83333333333333, NA, NA, NA, 4.83333333333333,
NA, NA, NA, 4.5, NA, 4.33333333333333, NA, NA, NA, 4.66666666666667,
NA, NA, NA, 3.83333333333333, NA, 3.5, NA, NA, NA, 3.5, NA, NA,
NA, 5.16666666666667, NA, 2, NA, NA, NA, 3.16666666666667, NA,
NA, NA, 2.66666666666667, NA, 2.33333333333333, NA, NA, NA, 2.5,
NA, NA, NA), A = c(6.75, NA, NA, NA, 7, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 5.75, NA, 4.75, NA, NA, NA, 5,
NA, NA, NA, NA, NA, NA, 5.5, NA, NA, NA, NA, 5, NA, NA, NA, 4.75,
NA, NA, NA, 5, NA, 5.25, NA, NA, NA, NA, 5.5, NA, NA, NA, 5,
NA, NA, NA, 5.5, NA, 6.75, NA, NA, NA, 5.75, NA, NA, NA, 6.75,
NA, 5, NA, NA, NA, 3.5, NA, NA, NA, 4.75, NA, 5.75, NA, NA, NA,
5.75, NA, NA, NA, 5.5, NA, 5.75, NA, NA, NA, 5.75, NA, NA, NA
), N = c(3.16666666666667, NA, NA, NA, 3.33333333333333, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 3.66666666666667,
NA, 4.5, NA, NA, NA, 4.16666666666667, NA, NA, NA, NA, NA, NA,
2.66666666666667, NA, NA, NA, NA, 2.66666666666667, NA, NA, NA,
2.33333333333333, NA, NA, NA, 3.16666666666667, NA, 1.83333333333333,
NA, NA, NA, NA, 1.33333333333333, NA, NA, NA, 2.83333333333333,
NA, NA, NA, 1.83333333333333, NA, 3.83333333333333, NA, NA, NA,
3, NA, NA, NA, 3.33333333333333, NA, 1.66666666666667, NA, NA,
NA, 2.5, NA, NA, NA, 2.5, NA, 3.83333333333333, NA, NA, NA, 3.33333333333333,
NA, NA, NA, 4.16666666666667, NA, 3.83333333333333, NA, NA, NA,
4.5, NA, NA, NA)), .Names = c("Person_ID", "Job_Satisfaction",
"Year", "O", "C", "E", "A", "N"), row.names = c(NA, 100L), class = "data.frame")
r
I am conducting a longitudinal analysis with data in long format. I have personality (O, C, E, A, N) measured in 2005, 2009 and 2013 with my dependent variable (job satisfaction) measured every year. Each person has their own ID. I have two goals (which I think might be able to be achieved with the same underlying logic):
1) I need another variable (column) indicating the first year that personality was measured for each person. If personality was never measured, then return NA. For example, if personality was first measured in 2009, then “2009” needs to be returned in this column for every row for that ID.
2) I need another variable (column) which copies the personality item responses for each year for each ID, unless it is replaced by a later measurement year (i.e. 2009, 2013). For example, if an ID completed the personality test in 2005 and 2009, then I need their responses for 2005 copied to years 2006, 2007, and 2008, but not 2009, as later entries (if they exist) should then become what gets copied forward to future years.
I have tried making a function for these, but ultimately failed.
I hope that makes sense. Extract of 100 cases below:
structure(list(Person_ID = c(100003L, 100003L, 100003L, 100003L,
100003L, 100003L, 100003L, 100003L, 100003L, 100003L, 100005L,
100005L, 100005L, 100005L, 100005L, 100005L, 100005L, 100005L,
100005L, 100006L, 100006L, 100006L, 100006L, 100006L, 100006L,
100006L, 100006L, 100007L, 100007L, 100007L, 100007L, 100007L,
100008L, 100008L, 100008L, 100009L, 100009L, 100010L, 100010L,
100010L, 100010L, 100010L, 100010L, 100010L, 100010L, 100010L,
100010L, 100011L, 100011L, 100011L, 100011L, 100011L, 100014L,
100014L, 100014L, 100014L, 100014L, 100014L, 100014L, 100014L,
100014L, 100014L, 100015L, 100015L, 100015L, 100015L, 100015L,
100015L, 100015L, 100015L, 100015L, 100015L, 100016L, 100016L,
100016L, 100016L, 100016L, 100016L, 100016L, 100016L, 100016L,
100016L, 100018L, 100018L, 100018L, 100018L, 100018L, 100018L,
100018L, 100018L, 100018L, 100018L, 100019L, 100019L, 100019L,
100019L, 100019L, 100019L, 100019L, 100019L), Job_Satisfaction = c(0L,
NA, 7L, NA, 8L, 10L, NA, NA, NA, NA, 9L, NA, NA, NA, NA, NA,
NA, NA, NA, 6L, 10L, 8L, 9L, 7L, NA, 9L, 3L, NA, 10L, NA, NA,
NA, NA, NA, NA, NA, NA, 5L, NA, NA, NA, NA, 5L, 4L, 8L, 5L, 5L,
NA, NA, NA, 8L, NA, 8L, 6L, 8L, 7L, 7L, NA, NA, NA, NA, NA, 10L,
9L, 9L, 9L, 10L, 8L, 10L, 9L, 8L, 9L, 7L, 9L, 9L, 8L, 8L, 9L,
9L, 9L, 8L, 9L, NA, 7L, 7L, 8L, 7L, 8L, 7L, 7L, 8L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 8L), Year = c(2005L, 2006L, 2007L, 2008L,
2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2007L, 2008L,
2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2008L, 2009L,
2010L, 2011L, 2012L, 2013L, 2014L, 2008L, 2009L, 2010L, 2011L,
2012L, 2005L, 2006L, 2007L, 2006L, 2007L, 2005L, 2006L, 2007L,
2008L, 2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2006L,
2007L, 2008L, 2009L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L,
2011L, 2012L, 2013L, 2014L, 2005L, 2006L, 2007L, 2008L, 2009L,
2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2006L, 2007L, 2008L,
2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2006L, 2007L,
2008L, 2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2005L, 2006L,
2007L, 2008L, 2009L, 2010L, 2011L, 2012L), O = c(5, NA, NA, NA,
5.5, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
4.83333333333333, NA, 4.16666666666667, NA, NA, NA, 3.5, NA,
NA, NA, NA, NA, NA, 3.83333333333333, NA, NA, NA, NA, 5, NA,
NA, NA, 5.16666666666667, NA, NA, NA, 5, NA, 6, NA, NA, NA, NA,
5.5, NA, NA, NA, 5.16666666666667, NA, NA, NA, 5, NA, 4.5, NA,
NA, NA, 4.33333333333333, NA, NA, NA, 4, NA, 3, NA, NA, NA, 3.16666666666667,
NA, NA, NA, 3.16666666666667, NA, 5.5, NA, NA, NA, 5.66666666666667,
NA, NA, NA, 5.33333333333333, NA, 5.5, NA, NA, NA, 5, NA, NA,
NA), C = c(4.66666666666667, NA, NA, NA, 6.83333333333333, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 4.33333333333333,
NA, 4.16666666666667, NA, NA, NA, 4.33333333333333, NA, NA, NA,
NA, NA, NA, 5.83333333333333, NA, NA, NA, NA, 5.66666666666667,
NA, NA, NA, 6.66666666666667, NA, NA, NA, 6.66666666666667, NA,
6.33333333333333, NA, NA, NA, NA, 5.83333333333333, NA, NA, NA,
6, NA, NA, NA, 6, NA, 6.66666666666667, NA, NA, NA, 6.5, NA,
NA, NA, 6.5, NA, 5.5, NA, NA, NA, 5.83333333333333, NA, NA, NA,
5.16666666666667, NA, 5.5, NA, NA, NA, 5.66666666666667, NA,
NA, NA, 5.16666666666667, NA, 5.66666666666667, NA, NA, NA, 6,
NA, NA, NA), E = c(4.33333333333333, NA, NA, NA, 5.16666666666667,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 4.66666666666667,
NA, 4.33333333333333, NA, NA, NA, 3.83333333333333, NA, NA, NA,
NA, NA, NA, 4.33333333333333, NA, NA, NA, NA, 5.5, NA, NA, NA,
5.16666666666667, NA, NA, NA, 5.33333333333333, NA, 6.16666666666667,
NA, NA, NA, NA, 5.83333333333333, NA, NA, NA, 4.83333333333333,
NA, NA, NA, 4.5, NA, 4.33333333333333, NA, NA, NA, 4.66666666666667,
NA, NA, NA, 3.83333333333333, NA, 3.5, NA, NA, NA, 3.5, NA, NA,
NA, 5.16666666666667, NA, 2, NA, NA, NA, 3.16666666666667, NA,
NA, NA, 2.66666666666667, NA, 2.33333333333333, NA, NA, NA, 2.5,
NA, NA, NA), A = c(6.75, NA, NA, NA, 7, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 5.75, NA, 4.75, NA, NA, NA, 5,
NA, NA, NA, NA, NA, NA, 5.5, NA, NA, NA, NA, 5, NA, NA, NA, 4.75,
NA, NA, NA, 5, NA, 5.25, NA, NA, NA, NA, 5.5, NA, NA, NA, 5,
NA, NA, NA, 5.5, NA, 6.75, NA, NA, NA, 5.75, NA, NA, NA, 6.75,
NA, 5, NA, NA, NA, 3.5, NA, NA, NA, 4.75, NA, 5.75, NA, NA, NA,
5.75, NA, NA, NA, 5.5, NA, 5.75, NA, NA, NA, 5.75, NA, NA, NA
), N = c(3.16666666666667, NA, NA, NA, 3.33333333333333, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 3.66666666666667,
NA, 4.5, NA, NA, NA, 4.16666666666667, NA, NA, NA, NA, NA, NA,
2.66666666666667, NA, NA, NA, NA, 2.66666666666667, NA, NA, NA,
2.33333333333333, NA, NA, NA, 3.16666666666667, NA, 1.83333333333333,
NA, NA, NA, NA, 1.33333333333333, NA, NA, NA, 2.83333333333333,
NA, NA, NA, 1.83333333333333, NA, 3.83333333333333, NA, NA, NA,
3, NA, NA, NA, 3.33333333333333, NA, 1.66666666666667, NA, NA,
NA, 2.5, NA, NA, NA, 2.5, NA, 3.83333333333333, NA, NA, NA, 3.33333333333333,
NA, NA, NA, 4.16666666666667, NA, 3.83333333333333, NA, NA, NA,
4.5, NA, NA, NA)), .Names = c("Person_ID", "Job_Satisfaction",
"Year", "O", "C", "E", "A", "N"), row.names = c(NA, 100L), class = "data.frame")
r
r
asked Aug 26 at 4:56
aspark2020
205
205
add a comment |
add a comment |
2 Answers
2
active
oldest
votes
up vote
1
down vote
accepted
I ended up using a series of loops to do what I needed. Using the trait of "O" as an example:
df$PersonalityYear <- ifelse(df$Year=="2005",
"2005",
ifelse(df$Year=="2009",
"2009",
ifelse(df$Year=="2013",
"2013",
"No")))
#Code to solve the second goal.
df$MostRecentYear <- NA
n <- nrow(df)
for (i in 2:n) df$MostRecentYear[i] <- ifelse(df$PersonalityYear[i]=="2013",
2013,
ifelse(df$PersonalityYear[i]=="2009",
2009,
ifelse(df$PersonalityYear[i]=="2005",
2005,
ifelse(df$Person_ID[i]==df$Person_ID[i-1],
df$MostRecentYear[i-1],
NA))))
df$MostRecentO <- df$O
for (i in 2:n) df$MostRecentO[i] <- ifelse(is.na(df$MostRecentYear[i]),
NA,
ifelse(df$MostRecentYear[i]==df$PersonalityYear[i],
df$O[i],
df$MostRecentO[i-1]))
#Code to solve the first goal.
df$FirstYear <- NA
for (i in 2:n) df$FirstYear[i] <- ifelse(df$Person_ID[i]==df$Person_ID[i-1],
ifelse(is.na(df$MostRecentYear[i-1]),
df$MostRecentYear[i],
df$FirstYear[i-1]),
df$MostRecentYear[i])
df$FirstYearO <- df$O
for (i in 2:n) df$FirstYearO[i] <- ifelse(is.na(df$FirstYear[i]),
NA,
ifelse(df$FirstYear[i]==df$PersonalityYear[i],
df$O[i],
df$FirstYearO[i-1]))
add a comment |
up vote
0
down vote
library(dplyr)
library(zoo)
data %>% group_by(Person_ID) %>%
mutate(first_year=dplyr::first(Year[!is.na(O)])) %>% #return first year where O is not missing
mutate(O_fill=na.locf(O, na.rm = FALSE)) %>% #using zoo::na.locf to replace NA's with non-NA prior to it
head(n=20)
# A tibble: 20 x 10
# Groups: Person_ID [3]
Person_ID Job_Satisfaction Year O C E A N first_year O_fill
<int> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
1 100003 0 2005 5 4.67 4.33 6.75 3.17 2005 5
2 100003 NA 2006 NA NA NA NA NA 2005 5
3 100003 7 2007 NA NA NA NA NA 2005 5
4 100003 NA 2008 NA NA NA NA NA 2005 5
5 100003 8 2009 5.5 6.83 5.17 7 3.33 2005 5.5
6 100003 10 2010 NA NA NA NA NA 2005 5.5
7 100003 NA 2011 NA NA NA NA NA 2005 5.5
8 100003 NA 2012 NA NA NA NA NA 2005 5.5
9 100003 NA 2013 NA NA NA NA NA 2005 5.5
10 100003 NA 2014 NA NA NA NA NA 2005 5.5
11 100005 9 2005 NA NA NA NA NA NA NA
12 100005 NA 2007 NA NA NA NA NA NA NA
13 100005 NA 2008 NA NA NA NA NA NA NA
14 100005 NA 2009 NA NA NA NA NA NA NA
15 100005 NA 2010 NA NA NA NA NA NA NA
16 100005 NA 2011 NA NA NA NA NA NA NA
17 100005 NA 2012 NA NA NA NA NA NA NA
18 100005 NA 2013 NA NA NA NA NA NA NA
19 100005 NA 2014 NA NA NA NA NA NA NA
20 100006 6 2005 4.83 4.33 4.67 5.75 3.67 2005 4.83
Thanks for this. Unfortunately, though, this does not appear to reset the data in 'first_year' or 'O_fill' for a new ID.
– aspark2020
Aug 26 at 22:24
@aspark2020 as you can see above, it works for me. Probably you have a name collision betweendplyrandplyringroup_byverb, try loadingdplyrandzooin a fresh R session or callgroup_bydirectly usingdplyr::group_by.
– A. Suliman
Aug 27 at 2:37
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f52023309%2fr-how-to-selectively-copy-long-format-case-responses-to-future-years%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
1
down vote
accepted
I ended up using a series of loops to do what I needed. Using the trait of "O" as an example:
df$PersonalityYear <- ifelse(df$Year=="2005",
"2005",
ifelse(df$Year=="2009",
"2009",
ifelse(df$Year=="2013",
"2013",
"No")))
#Code to solve the second goal.
df$MostRecentYear <- NA
n <- nrow(df)
for (i in 2:n) df$MostRecentYear[i] <- ifelse(df$PersonalityYear[i]=="2013",
2013,
ifelse(df$PersonalityYear[i]=="2009",
2009,
ifelse(df$PersonalityYear[i]=="2005",
2005,
ifelse(df$Person_ID[i]==df$Person_ID[i-1],
df$MostRecentYear[i-1],
NA))))
df$MostRecentO <- df$O
for (i in 2:n) df$MostRecentO[i] <- ifelse(is.na(df$MostRecentYear[i]),
NA,
ifelse(df$MostRecentYear[i]==df$PersonalityYear[i],
df$O[i],
df$MostRecentO[i-1]))
#Code to solve the first goal.
df$FirstYear <- NA
for (i in 2:n) df$FirstYear[i] <- ifelse(df$Person_ID[i]==df$Person_ID[i-1],
ifelse(is.na(df$MostRecentYear[i-1]),
df$MostRecentYear[i],
df$FirstYear[i-1]),
df$MostRecentYear[i])
df$FirstYearO <- df$O
for (i in 2:n) df$FirstYearO[i] <- ifelse(is.na(df$FirstYear[i]),
NA,
ifelse(df$FirstYear[i]==df$PersonalityYear[i],
df$O[i],
df$FirstYearO[i-1]))
add a comment |
up vote
1
down vote
accepted
I ended up using a series of loops to do what I needed. Using the trait of "O" as an example:
df$PersonalityYear <- ifelse(df$Year=="2005",
"2005",
ifelse(df$Year=="2009",
"2009",
ifelse(df$Year=="2013",
"2013",
"No")))
#Code to solve the second goal.
df$MostRecentYear <- NA
n <- nrow(df)
for (i in 2:n) df$MostRecentYear[i] <- ifelse(df$PersonalityYear[i]=="2013",
2013,
ifelse(df$PersonalityYear[i]=="2009",
2009,
ifelse(df$PersonalityYear[i]=="2005",
2005,
ifelse(df$Person_ID[i]==df$Person_ID[i-1],
df$MostRecentYear[i-1],
NA))))
df$MostRecentO <- df$O
for (i in 2:n) df$MostRecentO[i] <- ifelse(is.na(df$MostRecentYear[i]),
NA,
ifelse(df$MostRecentYear[i]==df$PersonalityYear[i],
df$O[i],
df$MostRecentO[i-1]))
#Code to solve the first goal.
df$FirstYear <- NA
for (i in 2:n) df$FirstYear[i] <- ifelse(df$Person_ID[i]==df$Person_ID[i-1],
ifelse(is.na(df$MostRecentYear[i-1]),
df$MostRecentYear[i],
df$FirstYear[i-1]),
df$MostRecentYear[i])
df$FirstYearO <- df$O
for (i in 2:n) df$FirstYearO[i] <- ifelse(is.na(df$FirstYear[i]),
NA,
ifelse(df$FirstYear[i]==df$PersonalityYear[i],
df$O[i],
df$FirstYearO[i-1]))
add a comment |
up vote
1
down vote
accepted
up vote
1
down vote
accepted
I ended up using a series of loops to do what I needed. Using the trait of "O" as an example:
df$PersonalityYear <- ifelse(df$Year=="2005",
"2005",
ifelse(df$Year=="2009",
"2009",
ifelse(df$Year=="2013",
"2013",
"No")))
#Code to solve the second goal.
df$MostRecentYear <- NA
n <- nrow(df)
for (i in 2:n) df$MostRecentYear[i] <- ifelse(df$PersonalityYear[i]=="2013",
2013,
ifelse(df$PersonalityYear[i]=="2009",
2009,
ifelse(df$PersonalityYear[i]=="2005",
2005,
ifelse(df$Person_ID[i]==df$Person_ID[i-1],
df$MostRecentYear[i-1],
NA))))
df$MostRecentO <- df$O
for (i in 2:n) df$MostRecentO[i] <- ifelse(is.na(df$MostRecentYear[i]),
NA,
ifelse(df$MostRecentYear[i]==df$PersonalityYear[i],
df$O[i],
df$MostRecentO[i-1]))
#Code to solve the first goal.
df$FirstYear <- NA
for (i in 2:n) df$FirstYear[i] <- ifelse(df$Person_ID[i]==df$Person_ID[i-1],
ifelse(is.na(df$MostRecentYear[i-1]),
df$MostRecentYear[i],
df$FirstYear[i-1]),
df$MostRecentYear[i])
df$FirstYearO <- df$O
for (i in 2:n) df$FirstYearO[i] <- ifelse(is.na(df$FirstYear[i]),
NA,
ifelse(df$FirstYear[i]==df$PersonalityYear[i],
df$O[i],
df$FirstYearO[i-1]))
I ended up using a series of loops to do what I needed. Using the trait of "O" as an example:
df$PersonalityYear <- ifelse(df$Year=="2005",
"2005",
ifelse(df$Year=="2009",
"2009",
ifelse(df$Year=="2013",
"2013",
"No")))
#Code to solve the second goal.
df$MostRecentYear <- NA
n <- nrow(df)
for (i in 2:n) df$MostRecentYear[i] <- ifelse(df$PersonalityYear[i]=="2013",
2013,
ifelse(df$PersonalityYear[i]=="2009",
2009,
ifelse(df$PersonalityYear[i]=="2005",
2005,
ifelse(df$Person_ID[i]==df$Person_ID[i-1],
df$MostRecentYear[i-1],
NA))))
df$MostRecentO <- df$O
for (i in 2:n) df$MostRecentO[i] <- ifelse(is.na(df$MostRecentYear[i]),
NA,
ifelse(df$MostRecentYear[i]==df$PersonalityYear[i],
df$O[i],
df$MostRecentO[i-1]))
#Code to solve the first goal.
df$FirstYear <- NA
for (i in 2:n) df$FirstYear[i] <- ifelse(df$Person_ID[i]==df$Person_ID[i-1],
ifelse(is.na(df$MostRecentYear[i-1]),
df$MostRecentYear[i],
df$FirstYear[i-1]),
df$MostRecentYear[i])
df$FirstYearO <- df$O
for (i in 2:n) df$FirstYearO[i] <- ifelse(is.na(df$FirstYear[i]),
NA,
ifelse(df$FirstYear[i]==df$PersonalityYear[i],
df$O[i],
df$FirstYearO[i-1]))
answered Nov 20 at 1:38
aspark2020
205
205
add a comment |
add a comment |
up vote
0
down vote
library(dplyr)
library(zoo)
data %>% group_by(Person_ID) %>%
mutate(first_year=dplyr::first(Year[!is.na(O)])) %>% #return first year where O is not missing
mutate(O_fill=na.locf(O, na.rm = FALSE)) %>% #using zoo::na.locf to replace NA's with non-NA prior to it
head(n=20)
# A tibble: 20 x 10
# Groups: Person_ID [3]
Person_ID Job_Satisfaction Year O C E A N first_year O_fill
<int> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
1 100003 0 2005 5 4.67 4.33 6.75 3.17 2005 5
2 100003 NA 2006 NA NA NA NA NA 2005 5
3 100003 7 2007 NA NA NA NA NA 2005 5
4 100003 NA 2008 NA NA NA NA NA 2005 5
5 100003 8 2009 5.5 6.83 5.17 7 3.33 2005 5.5
6 100003 10 2010 NA NA NA NA NA 2005 5.5
7 100003 NA 2011 NA NA NA NA NA 2005 5.5
8 100003 NA 2012 NA NA NA NA NA 2005 5.5
9 100003 NA 2013 NA NA NA NA NA 2005 5.5
10 100003 NA 2014 NA NA NA NA NA 2005 5.5
11 100005 9 2005 NA NA NA NA NA NA NA
12 100005 NA 2007 NA NA NA NA NA NA NA
13 100005 NA 2008 NA NA NA NA NA NA NA
14 100005 NA 2009 NA NA NA NA NA NA NA
15 100005 NA 2010 NA NA NA NA NA NA NA
16 100005 NA 2011 NA NA NA NA NA NA NA
17 100005 NA 2012 NA NA NA NA NA NA NA
18 100005 NA 2013 NA NA NA NA NA NA NA
19 100005 NA 2014 NA NA NA NA NA NA NA
20 100006 6 2005 4.83 4.33 4.67 5.75 3.67 2005 4.83
Thanks for this. Unfortunately, though, this does not appear to reset the data in 'first_year' or 'O_fill' for a new ID.
– aspark2020
Aug 26 at 22:24
@aspark2020 as you can see above, it works for me. Probably you have a name collision betweendplyrandplyringroup_byverb, try loadingdplyrandzooin a fresh R session or callgroup_bydirectly usingdplyr::group_by.
– A. Suliman
Aug 27 at 2:37
add a comment |
up vote
0
down vote
library(dplyr)
library(zoo)
data %>% group_by(Person_ID) %>%
mutate(first_year=dplyr::first(Year[!is.na(O)])) %>% #return first year where O is not missing
mutate(O_fill=na.locf(O, na.rm = FALSE)) %>% #using zoo::na.locf to replace NA's with non-NA prior to it
head(n=20)
# A tibble: 20 x 10
# Groups: Person_ID [3]
Person_ID Job_Satisfaction Year O C E A N first_year O_fill
<int> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
1 100003 0 2005 5 4.67 4.33 6.75 3.17 2005 5
2 100003 NA 2006 NA NA NA NA NA 2005 5
3 100003 7 2007 NA NA NA NA NA 2005 5
4 100003 NA 2008 NA NA NA NA NA 2005 5
5 100003 8 2009 5.5 6.83 5.17 7 3.33 2005 5.5
6 100003 10 2010 NA NA NA NA NA 2005 5.5
7 100003 NA 2011 NA NA NA NA NA 2005 5.5
8 100003 NA 2012 NA NA NA NA NA 2005 5.5
9 100003 NA 2013 NA NA NA NA NA 2005 5.5
10 100003 NA 2014 NA NA NA NA NA 2005 5.5
11 100005 9 2005 NA NA NA NA NA NA NA
12 100005 NA 2007 NA NA NA NA NA NA NA
13 100005 NA 2008 NA NA NA NA NA NA NA
14 100005 NA 2009 NA NA NA NA NA NA NA
15 100005 NA 2010 NA NA NA NA NA NA NA
16 100005 NA 2011 NA NA NA NA NA NA NA
17 100005 NA 2012 NA NA NA NA NA NA NA
18 100005 NA 2013 NA NA NA NA NA NA NA
19 100005 NA 2014 NA NA NA NA NA NA NA
20 100006 6 2005 4.83 4.33 4.67 5.75 3.67 2005 4.83
Thanks for this. Unfortunately, though, this does not appear to reset the data in 'first_year' or 'O_fill' for a new ID.
– aspark2020
Aug 26 at 22:24
@aspark2020 as you can see above, it works for me. Probably you have a name collision betweendplyrandplyringroup_byverb, try loadingdplyrandzooin a fresh R session or callgroup_bydirectly usingdplyr::group_by.
– A. Suliman
Aug 27 at 2:37
add a comment |
up vote
0
down vote
up vote
0
down vote
library(dplyr)
library(zoo)
data %>% group_by(Person_ID) %>%
mutate(first_year=dplyr::first(Year[!is.na(O)])) %>% #return first year where O is not missing
mutate(O_fill=na.locf(O, na.rm = FALSE)) %>% #using zoo::na.locf to replace NA's with non-NA prior to it
head(n=20)
# A tibble: 20 x 10
# Groups: Person_ID [3]
Person_ID Job_Satisfaction Year O C E A N first_year O_fill
<int> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
1 100003 0 2005 5 4.67 4.33 6.75 3.17 2005 5
2 100003 NA 2006 NA NA NA NA NA 2005 5
3 100003 7 2007 NA NA NA NA NA 2005 5
4 100003 NA 2008 NA NA NA NA NA 2005 5
5 100003 8 2009 5.5 6.83 5.17 7 3.33 2005 5.5
6 100003 10 2010 NA NA NA NA NA 2005 5.5
7 100003 NA 2011 NA NA NA NA NA 2005 5.5
8 100003 NA 2012 NA NA NA NA NA 2005 5.5
9 100003 NA 2013 NA NA NA NA NA 2005 5.5
10 100003 NA 2014 NA NA NA NA NA 2005 5.5
11 100005 9 2005 NA NA NA NA NA NA NA
12 100005 NA 2007 NA NA NA NA NA NA NA
13 100005 NA 2008 NA NA NA NA NA NA NA
14 100005 NA 2009 NA NA NA NA NA NA NA
15 100005 NA 2010 NA NA NA NA NA NA NA
16 100005 NA 2011 NA NA NA NA NA NA NA
17 100005 NA 2012 NA NA NA NA NA NA NA
18 100005 NA 2013 NA NA NA NA NA NA NA
19 100005 NA 2014 NA NA NA NA NA NA NA
20 100006 6 2005 4.83 4.33 4.67 5.75 3.67 2005 4.83
library(dplyr)
library(zoo)
data %>% group_by(Person_ID) %>%
mutate(first_year=dplyr::first(Year[!is.na(O)])) %>% #return first year where O is not missing
mutate(O_fill=na.locf(O, na.rm = FALSE)) %>% #using zoo::na.locf to replace NA's with non-NA prior to it
head(n=20)
# A tibble: 20 x 10
# Groups: Person_ID [3]
Person_ID Job_Satisfaction Year O C E A N first_year O_fill
<int> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
1 100003 0 2005 5 4.67 4.33 6.75 3.17 2005 5
2 100003 NA 2006 NA NA NA NA NA 2005 5
3 100003 7 2007 NA NA NA NA NA 2005 5
4 100003 NA 2008 NA NA NA NA NA 2005 5
5 100003 8 2009 5.5 6.83 5.17 7 3.33 2005 5.5
6 100003 10 2010 NA NA NA NA NA 2005 5.5
7 100003 NA 2011 NA NA NA NA NA 2005 5.5
8 100003 NA 2012 NA NA NA NA NA 2005 5.5
9 100003 NA 2013 NA NA NA NA NA 2005 5.5
10 100003 NA 2014 NA NA NA NA NA 2005 5.5
11 100005 9 2005 NA NA NA NA NA NA NA
12 100005 NA 2007 NA NA NA NA NA NA NA
13 100005 NA 2008 NA NA NA NA NA NA NA
14 100005 NA 2009 NA NA NA NA NA NA NA
15 100005 NA 2010 NA NA NA NA NA NA NA
16 100005 NA 2011 NA NA NA NA NA NA NA
17 100005 NA 2012 NA NA NA NA NA NA NA
18 100005 NA 2013 NA NA NA NA NA NA NA
19 100005 NA 2014 NA NA NA NA NA NA NA
20 100006 6 2005 4.83 4.33 4.67 5.75 3.67 2005 4.83
edited Aug 27 at 3:27
answered Aug 26 at 5:29
A. Suliman
4,48531022
4,48531022
Thanks for this. Unfortunately, though, this does not appear to reset the data in 'first_year' or 'O_fill' for a new ID.
– aspark2020
Aug 26 at 22:24
@aspark2020 as you can see above, it works for me. Probably you have a name collision betweendplyrandplyringroup_byverb, try loadingdplyrandzooin a fresh R session or callgroup_bydirectly usingdplyr::group_by.
– A. Suliman
Aug 27 at 2:37
add a comment |
Thanks for this. Unfortunately, though, this does not appear to reset the data in 'first_year' or 'O_fill' for a new ID.
– aspark2020
Aug 26 at 22:24
@aspark2020 as you can see above, it works for me. Probably you have a name collision betweendplyrandplyringroup_byverb, try loadingdplyrandzooin a fresh R session or callgroup_bydirectly usingdplyr::group_by.
– A. Suliman
Aug 27 at 2:37
Thanks for this. Unfortunately, though, this does not appear to reset the data in 'first_year' or 'O_fill' for a new ID.
– aspark2020
Aug 26 at 22:24
Thanks for this. Unfortunately, though, this does not appear to reset the data in 'first_year' or 'O_fill' for a new ID.
– aspark2020
Aug 26 at 22:24
@aspark2020 as you can see above, it works for me. Probably you have a name collision between
dplyr and plyr in group_by verb, try loading dplyr and zoo in a fresh R session or call group_by directly using dplyr::group_by.– A. Suliman
Aug 27 at 2:37
@aspark2020 as you can see above, it works for me. Probably you have a name collision between
dplyr and plyr in group_by verb, try loading dplyr and zoo in a fresh R session or call group_by directly using dplyr::group_by.– A. Suliman
Aug 27 at 2:37
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f52023309%2fr-how-to-selectively-copy-long-format-case-responses-to-future-years%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown