Pandas assign a value to new row based on index on incoming live data
I am having trouble writing efficient code (without many loops) that assigns a value to a cell in a pandas DataFrame that is updated every minute or so (live stream). On the training set, my model did better with one-hot encoded timestamp variables than with continuous variables, so that's what I want to use in production. The DataFrame looks like this:
datetime             DOW_1  DOW_2  ...  DOW_7  Month1  Month2  Month3
2018-07-01 09:30:00  0      1      ...  0      0       0       1
As you can see, the columns are encoded with 0's and 1's to denote the month and day of week (I have more columns for day of year, is_holiday, etc.). I did this easily on the training, validation, and test data using pd.get_dummies, but now that a live stream of data is coming in I cannot find an easy way to 'assign' Month2 = 0 based on df.index.month.
I tried something along the lines of this loop, but it's quite tedious and slow:
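    i = 0
    while i < len(df):
        for m in range(1, 13):
            # Compare the month of the i-th timestamp with m.
            if df.index[i].month == m:
                df.loc[df.index[i], 'Month' + str(m)] = 1
        # Move on to the next row once all 12 months have been checked.
        i += 1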
Any better suggestions?
python pandas
edited Nov 20 at 2:59
asked Nov 20 at 1:46
Matt Elgazar
Is it possible that the line df['Month'+str(m)][i] = i should assign 1 instead of i?
– Julian Peller
Nov 20 at 2:05
Oops sorry typo.. Yep!
– Matt Elgazar
Nov 20 at 2:58
1 Answer
Accepted answer (score: 1)
I'm still thinking about a solution that removes even the for loop, but you can at least avoid the outer while loop over len(df) by using .loc:
    for m in range(1, 13):
        # Set the Month<m> column to 1 on every row whose index falls in month m.
        df.loc[df.index.month == m, 'Month' + str(m)] = 1
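A minimal, self-contained sketch of that loop on a made-up three-row frame. The timestamps and the `value` column are assumptions for illustration, not the asker's data:

```python
import pandas as pd

# Hypothetical minute-level data; only the DatetimeIndex matters here.
df = pd.DataFrame(
    {"value": [1.0, 2.0, 3.0]},
    index=pd.to_datetime(
        ["2018-01-15 09:30", "2018-02-01 09:31", "2018-02-01 09:32"]
    ),
)

# One vectorised assignment per month instead of one per row.
# Columns that do not exist yet are created by the assignment.
for m in range(1, 13):
    df.loc[df.index.month == m, "Month" + str(m)] = 1

print(df[["Month1", "Month2"]])
```

Rows that do not match a given month are left unset by this loop, so the new columns should be expected to contain missing values wherever no 1 was written.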
edited Nov 20 at 2:17
answered Nov 20 at 2:08
Julian Peller
This is great - pretty much exactly what I was looking for. I don't know if there's a way to do it without any loops, since some of the columns don't exist and have to be created. The only remaining issue is that it leaves the other columns with NaN values (e.g. Month1 has all 1's but Month2 through Month12 have NaN's). Nice work, I'll mark it as correct!
– Matt Elgazar
Nov 20 at 7:57
Thanks. Regarding the NaN, you are right. I can think of 2 options to handle that: casting all NaN to zero, if possible: df = df.fillna(0), or adding a second line to the for loop to fill the zeros: df.loc[df.index.month != m, 'Month'+str(m)] = 0 (note the != instead of ==). I couldn't figure out a completely for-free solution... I don't think it exists.
– Julian Peller
Nov 20 at 14:43
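As a hedged sketch of the second option, the `==` and `!=` assignments can also be folded into one line by casting the boolean mask to integers, so every row gets a 1 or a 0 and no NaN is left behind. The sample frame and its `value` column are made up for illustration:

```python
import pandas as pd

# Hypothetical two-row frame with a DatetimeIndex.
df = pd.DataFrame(
    {"value": [1.0, 2.0]},
    index=pd.to_datetime(["2018-01-15 09:30", "2018-02-01 09:31"]),
)

for m in range(1, 13):
    # The True/False mask becomes 1/0, filling matching and
    # non-matching rows in a single assignment.
    df["Month" + str(m)] = (df.index.month == m).astype(int)

print(df[["Month1", "Month2", "Month3"]])
```

Unlike `df.fillna(0)`, this only touches the month columns, so genuine missing values elsewhere in the frame are not overwritten.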
Nice! yeah the extra line makes more sense since I'm looping over a lot of columns
– Matt Elgazar
Nov 20 at 15:37
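One possible way to drop the `for` over months entirely, offered as a sketch rather than as part of the accepted answer, is to compare the month values against all twelve months at once with NumPy broadcasting. The sample frame, the `value` column, and the names `one_hot` and `month_cols` are assumptions for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame with a DatetimeIndex.
df = pd.DataFrame(
    {"value": [1.0, 2.0]},
    index=pd.to_datetime(["2018-01-15 09:30", "2018-12-01 09:31"]),
)

months = np.arange(1, 13)

# Shape (n_rows, 12): cell [r, c] is 1 when row r falls in month c + 1.
one_hot = (df.index.month.to_numpy()[:, None] == months).astype(int)

month_cols = pd.DataFrame(
    one_hot,
    index=df.index,
    columns=["Month" + str(m) for m in months],
)

# Attach all twelve columns in one step.
df = pd.concat([df, month_cols], axis=1)
print(df.filter(like="Month"))
```

This always produces all twelve columns, including months that do not occur in the current batch, which is usually what a model trained on a fixed one-hot layout needs. The same pattern should carry over to day of week by swapping in `df.index.dayofweek` and the matching range of values.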