Subtract two dates from different columns based on data availability
Below is my data frame (posted as an image, not reproduced here).
I need to subtract the date in c/d from the date in a/b, depending on which values are available. If 'a' is NA, I need to use the value from 'b'. The same goes for c and d: if 'c' is NA, I need to use the value from 'd'. I need a column 'e' containing the difference.
How can I loop through each row and perform this kind of subtraction?
python dataframe
To be sure, what you want is to take the difference of a and c, but if a or c is NA, you want to swap to use b and/or d accordingly. Correct? If so, you can make intermediate columns for the a or b and c or d, and then just subtract those two columns instead. Could you paste the data so that we can try this example out ourselves with your data?
– Alexander Reynolds
Nov 19 at 1:16
Yes, what you said is correct. In fact, I am struggling to build those intermediate columns. I already have code to subtract the dates.
– Arun Kumaran
Nov 19 at 1:24
edited Nov 19 at 1:40
asked Nov 19 at 1:04
Arun Kumaran
1 Answer
accepted
Following the logic in my comment, the easiest thing to do in pandas most of the time is to create intermediate columns. You can remove them or optimize them away later if you don't want them, but they are an easy way to encapsulate your logic. What you want to do is take a dataframe like this:
>>> df
a b c d
0 0.414762 0.113796 0.134529 NaN
1 NaN 0.662192 0.703417 NaN
2 0.958970 NaN 0.237540 NaN
3 0.975512 0.241572 NaN 0.720148
4 0.719265 0.735744 0.801279 NaN
and make intermediate columns that hold the value of df['a'] when it is not NaN, and otherwise the value of df['b']. You can do this pretty easily with fillna(), which can fill the NaN values of one column with the values from another column. Then you just take the difference of the two intermediate columns. For example:
>>> df['a_or_b'] = df['a'].fillna(df['b'])
>>> df['c_or_d'] = df['c'].fillna(df['d'])
>>> df['e'] = df['a_or_b'] - df['c_or_d']
>>> df
a b c d a_or_b c_or_d e
0 0.414762 0.113796 0.134529 NaN 0.414762 0.134529 0.280233
1 NaN 0.662192 0.703417 NaN 0.662192 0.703417 -0.041225
2 0.958970 NaN 0.237540 NaN 0.958970 0.237540 0.721430
3 0.975512 0.241572 NaN 0.720148 0.975512 0.720148 0.255364
4 0.719265 0.735744 0.801279 NaN 0.719265 0.801279 -0.082013
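Since the question is about dates rather than floats, here is a minimal sketch of the same fillna() idea on datetime columns. The sample values are made up (the real data was only posted as an image), and the names `start` and `end` are just illustrative; missing dates show up as NaT and are filled the same way.

```python
import pandas as pd

# Hypothetical sample data; the original frame was only shown as an image.
df = pd.DataFrame({
    "a": ["2018-11-10", None, "2018-11-12"],
    "b": ["2018-11-09", "2018-11-11", None],
    "c": ["2018-11-01", None, "2018-11-05"],
    "d": [None, "2018-11-03", "2018-11-04"],
})

# Parse each column as datetimes; missing entries become NaT.
for col in ["a", "b", "c", "d"]:
    df[col] = pd.to_datetime(df[col])

# Prefer 'a' and fall back to 'b'; prefer 'c' and fall back to 'd'.
start = df["a"].fillna(df["b"])
end = df["c"].fillna(df["d"])

# Subtracting two datetime columns gives a timedelta column.
df["e"] = start - end
print(df)
```

If you want the difference as a plain number of days, `df["e"].dt.days` should give that.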
The fillna() approach assumes the missing values are real NaN values, whereas yours are the string N/A. You can also use replace() in the same way to swap the string for the value from the other column:
>>> df
a b c d
0 0.414762 0.113796 0.134529 N/A
1 N/A 0.662192 0.703417 N/A
2 0.95897 N/A 0.23754 N/A
3 0.975512 0.241572 N/A 0.720148
4 0.719265 0.735744 0.801279 N/A
>>> df['a_or_b'] = df['a'].replace('N/A', df['b'])
>>> df['c_or_d'] = df['c'].replace('N/A', df['d'])
>>> df['e'] = df['a_or_b'] - df['c_or_d']
>>> df
a b c d a_or_b c_or_d e
0 0.414762 0.113796 0.134529 N/A 0.414762 0.134529 0.280233
1 N/A 0.662192 0.703417 N/A 0.662192 0.703417 -0.041225
2 0.95897 N/A 0.23754 N/A 0.958970 0.237540 0.721430
3 0.975512 0.241572 N/A 0.720148 0.975512 0.720148 0.255364
4 0.719265 0.735744 0.801279 N/A 0.719265 0.801279 -0.082013
That said, I do recommend using actual null-type values such as NaN (np.nan) or None when you're working with missing data, instead of a string like N/A.
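As a rough sketch of that recommendation: one way to get real null values is to convert the columns first and then use fillna() as before. Passing a whole column as the replacement value to replace() may not be accepted by every pandas version, so this route sidesteps it. The frame `raw` below is made-up sample data, and pd.to_numeric with errors="coerce" is my choice here, not something from the question.

```python
import pandas as pd

# Hypothetical sample data with the string 'N/A' marking missing values.
raw = pd.DataFrame({
    "a": [0.414762, "N/A", 0.95897],
    "b": [0.113796, 0.662192, "N/A"],
    "c": [0.134529, 0.703417, "N/A"],
    "d": ["N/A", "N/A", 0.720148],
})

# errors="coerce" turns anything that cannot be parsed as a number into NaN,
# so the columns end up as float columns with real missing values.
clean = raw.apply(pd.to_numeric, errors="coerce")

# Now the plain fillna() approach works without any string handling.
clean["e"] = clean["a"].fillna(clean["b"]) - clean["c"].fillna(clean["d"])
print(clean)
```

For date columns, pd.to_datetime(..., errors="coerce") plays the same role and gives NaT for the unparseable entries.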
Either way, now you know what the intermediate columns are, so you can use those results directly instead of assigning them to the dataframe if you don't want to.
>>> df
a b c d
0 0.414762 0.113796 0.134529 N/A
1 N/A 0.662192 0.703417 N/A
2 0.95897 N/A 0.23754 N/A
3 0.975512 0.241572 N/A 0.720148
4 0.719265 0.735744 0.801279 N/A
>>> df['e'] = df['a'].replace('N/A', df['b']) - df['c'].replace('N/A', df['d'])
>>> df
a b c d e
0 0.414762 0.113796 0.134529 N/A 0.280233
1 N/A 0.662192 0.703417 N/A -0.041225
2 0.95897 N/A 0.23754 N/A 0.721430
3 0.975512 0.241572 N/A 0.720148 0.255364
4 0.719265 0.735744 0.801279 N/A -0.082013
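If the missing values are already real NaN values, another way to write the one-liner is with combine_first(), which takes the calling column's value where present and the other column's value otherwise. This is only a sketch on made-up data, and it should give the same result as fillna() for this case.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with real NaN values.
df = pd.DataFrame({
    "a": [0.414762, np.nan, 0.958970],
    "b": [0.113796, 0.662192, np.nan],
    "c": [0.134529, 0.703417, np.nan],
    "d": [np.nan, np.nan, 0.720148],
})

# 'a' where available, else 'b'; 'c' where available, else 'd'.
df["e"] = df["a"].combine_first(df["b"]) - df["c"].combine_first(df["d"])
print(df)
```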
edited Nov 19 at 1:31
answered Nov 19 at 1:26
Alexander Reynolds