Pandas assign a value to new row based on index on incoming live data























I am having trouble writing efficient code (without many loops) that assigns a value to a cell in a pandas DataFrame that is updated every minute or so (live stream). I trained my model with one-hot encoded timestamp variables, which did better than continuous variables, so that is what I want to use in production. The DataFrame looks like this:



    datetime             DOW_1  DOW_2  ...  DOW_7  Month1  Month2  Month3
    2018-07-01 09:30:00  0      1      ...  0      0       0       1


As you can see, the columns are encoded with 0s and 1s to denote the month and day of week (I have more columns for day of year, is_holiday, etc.). I did this easily on the training, validation, and test data using pd.get_dummies, but now that a live stream of data is coming in, I cannot find an easy way to assign, for example, Month2 = 0 based on df.index.month.



I tried a loop along these lines, but it is tedious and slow:



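    # Slow row-by-row attempt: test every month for every row
    for i in range(len(df)):
        for m in range(1, 13):
            if df.index[i].month == m:
                # positional assignment avoids chained indexing;
                # assumes the 'Month<m>' columns already exist
                df.iloc[i, df.columns.get_loc('Month' + str(m))] = 1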


Any better suggestions?










  • Is it possible that the line df['Month'+str(m)][i] = i should assign 1 instead of i?
    – Julian Peller
    Nov 20 at 2:05










  • Oops, sorry, that was a typo. Yep!
    – Matt Elgazar
    Nov 20 at 2:58















python pandas














edited Nov 20 at 2:59

























asked Nov 20 at 1:46









Matt Elgazar

























1 Answer




















accepted










I'm still thinking about a solution that removes even the for loop, but you can at least avoid the outer while loop over len(df) by using .loc:



    # One vectorized assignment per month instead of one per row
    for m in range(1, 13):
        df.loc[df.index.month == m, 'Month' + str(m)] = 1
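For reference, here is a minimal self-contained sketch of the loop above. The sample timestamps and the `price` column are made up for illustration; the only real assumption is that `df` has a DatetimeIndex. Rows that do not match a month are left as NaN in that month's column.

```python
import pandas as pd

# Hypothetical sample data: two timestamps in July, one in August
idx = pd.to_datetime([
    "2018-07-01 09:30:00",
    "2018-07-01 09:31:00",
    "2018-08-01 09:30:00",
])
df = pd.DataFrame({"price": [1.0, 2.0, 3.0]}, index=idx)

# Each .loc call sets all rows of month m at once via a boolean mask;
# the column is created on first assignment if it does not exist yet
for m in range(1, 13):
    df.loc[df.index.month == m, "Month" + str(m)] = 1

# Inspect the two months that occur in the sample
print(df[["Month7", "Month8"]])
```

The loop runs 12 times regardless of how many rows arrive, so the cost per update is driven by the vectorized mask rather than by a Python-level pass over every row.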




























  • This is great, pretty much exactly what I was looking for. I don't know if there's a way to do it without any loops, since some of the columns don't exist and have to be created. The only remaining issue is that it leaves NaN values in the other columns (e.g. Month1 has all 1s but Month2 to Month12 have NaNs). Nice work, I'll mark it as correct!
    – Matt Elgazar
    Nov 20 at 7:57












  • Thanks. Regarding the NaN, you are right. I can think of two options to handle that: casting all NaN to zero, if possible, with df = df.fillna(0), or adding a second line inside the for loop that fills in the zeros: df.loc[df.index.month != m, 'Month'+str(m)] = 0 (note the != instead of ==). I couldn't figure out a completely loop-free solution... I don't think it exists.
    – Julian Peller
    Nov 20 at 14:43












  • Nice! Yeah, the extra line makes more sense since I'm looping over a lot of columns.
    – Matt Elgazar
    Nov 20 at 15:37
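The two NaN-handling options from the comments above can be sketched as follows. The sample data and the `price` column are hypothetical; `df` uses the extra `!=` line inside the loop, and `df2` uses `fillna(0)` after the loop.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration
idx = pd.to_datetime(["2018-07-01 09:30:00", "2018-08-01 09:30:00"])

# Option A: write both the 1s and the 0s inside the loop
df = pd.DataFrame({"price": [1.0, 2.0]}, index=idx)
for m in range(1, 13):
    col = "Month" + str(m)
    df.loc[df.index.month == m, col] = 1  # rows that fall in month m
    df.loc[df.index.month != m, col] = 0  # every other row

# Option B: only write the 1s, then replace the remaining NaN with 0
df2 = pd.DataFrame({"price": [1.0, 2.0]}, index=idx)
for m in range(1, 13):
    df2.loc[df2.index.month == m, "Month" + str(m)] = 1
df2 = df2.fillna(0)  # caution: this also fills NaN in non-month columns
```

Option A only touches the month columns, which is safer when other columns may legitimately contain NaN. In both cases the columns start out as NaN and therefore end up as floats; if integer columns are needed, they can be converted afterwards with `astype(int)`.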




































edited Nov 20 at 2:17

























answered Nov 20 at 2:08









Julian Peller





































