Subtract two dates from different columns based on data availability

Below is my data-frame.

[Image: screenshot of the data frame]

I need to subtract the date in 'c' or 'd' from the date in 'a' or 'b', depending on which values are available. If 'a' is NA, I need to use the value from 'b'; likewise, if 'c' is NA, I need to use the value from 'd'. The result should go in a new column 'e' containing the difference.

How can I loop through each row and perform this kind of subtraction?

edited Nov 19 at 1:40

asked Nov 19 at 1:04

Arun Kumaran

To be sure: you want to take the difference of a and c, but if a or c is NA, you want to use b and/or d instead. Correct? If so, you can make intermediate columns for "a or b" and "c or d", and then just subtract those two columns. Could you paste the data as text so that we can try this example out ourselves?
– Alexander Reynolds
Nov 19 at 1:16

Yes, what you said is correct. In fact, I am struggling to get those intermediate columns. I already have code to subtract the dates.
– Arun Kumaran
Nov 19 at 1:24

python dataframe

1 Answer

accepted

Following the logic in my comment, the easiest thing to do in pandas, most of the time, is to create intermediate columns. You can remove them or optimize them away later if you don't want them, but they are an easy way to encapsulate your logic. Suppose you have a dataframe like this:

>>> df
          a         b         c         d
0  0.414762  0.113796  0.134529       NaN
1       NaN  0.662192  0.703417       NaN
2  0.958970       NaN  0.237540       NaN
3  0.975512  0.241572       NaN  0.720148
4  0.719265  0.735744  0.801279       NaN

and you want intermediate columns that hold the value of df['a'] where it is not NaN and the value of df['b'] otherwise. You can do this with fillna() pretty easily: it can fill the NaN values with values from another column. Then you can just take the difference of the two intermediate columns. For example:

>>> df['a_or_b'] = df['a'].fillna(df['b'])
>>> df['c_or_d'] = df['c'].fillna(df['d'])
>>> df['e'] = df['a_or_b'] - df['c_or_d']
>>> df
          a         b         c         d    a_or_b    c_or_d         e
0  0.414762  0.113796  0.134529       NaN  0.414762  0.134529  0.280233
1       NaN  0.662192  0.703417       NaN  0.662192  0.703417 -0.041225
2  0.958970       NaN  0.237540       NaN  0.958970  0.237540  0.721430
3  0.975512  0.241572       NaN  0.720148  0.975512  0.720148  0.255364
4  0.719265  0.735744  0.801279       NaN  0.719265  0.801279 -0.082013
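Since the question is about dates rather than floats, here is a minimal sketch of the same fillna() pattern applied to datetime columns. The sample dates are made up for illustration (the original data was only posted as an image), and the sketch assumes the columns can be parsed with pd.to_datetime so that missing entries become NaT.

```python
import pandas as pd

# Hypothetical sample data: these dates are invented for illustration.
df = pd.DataFrame({
    "a": ["2018-11-10", None, "2018-11-12"],
    "b": ["2018-11-09", "2018-11-11", None],
    "c": ["2018-11-01", "2018-11-05", None],
    "d": [None, None, "2018-11-02"],
})

# Parse every column as datetime; missing entries become NaT.
for col in ["a", "b", "c", "d"]:
    df[col] = pd.to_datetime(df[col])

# Prefer 'a' and 'c'; fall back to 'b' and 'd' where they are missing.
df["a_or_b"] = df["a"].fillna(df["b"])
df["c_or_d"] = df["c"].fillna(df["d"])

# Subtracting two datetime columns gives a timedelta column.
df["e"] = df["a_or_b"] - df["c_or_d"]
print(df["e"].dt.days.tolist())  # [9, 6, 10]
```

The result in 'e' is a timedelta column, so `.dt.days` can be used if only the whole-day difference is needed.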

The fillna() example above assumes the missing values are NaN, but yours are the string N/A. You can also use replace() in the same way to replace string values:

>>> df
          a         b         c         d
0  0.414762  0.113796  0.134529       N/A
1       N/A  0.662192  0.703417       N/A
2   0.95897       N/A   0.23754       N/A
3  0.975512  0.241572       N/A  0.720148
4  0.719265  0.735744  0.801279       N/A
>>> df['a_or_b'] = df['a'].replace('N/A', df['b'])
>>> df['c_or_d'] = df['c'].replace('N/A', df['d'])
>>> df['e'] = df['a_or_b'] - df['c_or_d']
>>> df
          a         b         c         d    a_or_b    c_or_d         e
0  0.414762  0.113796  0.134529       N/A  0.414762  0.134529  0.280233
1       N/A  0.662192  0.703417       N/A  0.662192  0.703417 -0.041225
2   0.95897       N/A   0.23754       N/A  0.958970  0.237540  0.721430
3  0.975512  0.241572       N/A  0.720148  0.975512  0.720148  0.255364
4  0.719265  0.735744  0.801279       N/A  0.719265  0.801279 -0.082013

That said, I recommend representing missing data with actual null values, like NaN (np.nan) or None, rather than a string like N/A.
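One way to follow that recommendation is to convert the placeholder strings to real NaN values first and then reuse the fillna() pattern. The sketch below assumes the columns are meant to be numeric and that "N/A" is the only non-numeric entry; the sample values are hypothetical. It uses pd.to_numeric with errors="coerce", which turns anything it cannot parse into NaN.

```python
import pandas as pd

# Hypothetical frame where missing values were read in as the string "N/A".
df = pd.DataFrame({
    "a": [0.5, "N/A", 2.0],
    "b": [0.25, 1.5, "N/A"],
    "c": [0.125, 1.0, "N/A"],
    "d": ["N/A", "N/A", 0.75],
})

# Coerce each column to numeric; unparseable entries (here "N/A") become NaN.
df = df.apply(pd.to_numeric, errors="coerce")

# With real NaN values in place, the fillna() pattern applies directly.
df["e"] = df["a"].fillna(df["b"]) - df["c"].fillna(df["d"])
print(df["e"].tolist())  # [0.375, 0.5, 1.25]
```

For date columns, pd.to_datetime with errors="coerce" plays the same role and produces NaT for unparseable entries.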

Either way, now you know what the intermediate columns are, so you can use those results directly instead of assigning them to the dataframe if you don't want to keep them.

>>> df
          a         b         c         d
0  0.414762  0.113796  0.134529       N/A
1       N/A  0.662192  0.703417       N/A
2   0.95897       N/A   0.23754       N/A
3  0.975512  0.241572       N/A  0.720148
4  0.719265  0.735744  0.801279       N/A
>>> df['e'] = df['a'].replace('N/A', df['b']) - df['c'].replace('N/A', df['d'])
>>> df
          a         b         c         d         e
0  0.414762  0.113796  0.134529       N/A  0.280233
1       N/A  0.662192  0.703417       N/A -0.041225
2   0.95897       N/A   0.23754       N/A  0.721430
3  0.975512  0.241572       N/A  0.720148  0.255364
4  0.719265  0.735744  0.801279       N/A -0.082013
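If there could be more than one fallback column, the "first available value" logic can be wrapped in a small helper. This is only a sketch and not part of the approach above: first_available is a hypothetical name, the data is invented, and it assumes the missing values are real NaN. It back-fills across the selected columns so that the leftmost one ends up holding the first non-missing value in each row.

```python
import numpy as np
import pandas as pd

def first_available(df, columns):
    """Return, per row, the first non-missing value among `columns` (in order)."""
    # Back-fill across the selected columns so the leftmost column holds
    # the first non-missing value of each row, then take that column.
    return df[columns].bfill(axis=1).iloc[:, 0]

# Hypothetical data with real NaN values.
df = pd.DataFrame({
    "a": [0.5, np.nan, 2.0],
    "b": [0.25, 1.5, np.nan],
    "c": [0.125, 1.0, np.nan],
    "d": [np.nan, np.nan, 0.75],
})

df["e"] = first_available(df, ["a", "b"]) - first_available(df, ["c", "d"])
print(df["e"].tolist())  # [0.375, 0.5, 1.25]
```

For exactly two columns this gives the same result as df['a'].fillna(df['b']); the helper only pays off when the list of fallback columns grows.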

edited Nov 19 at 1:31

answered Nov 19 at 1:26

Alexander Reynolds
