pandas get_dummies for a column containing lists



























Input:

    empNo  name
    1234   [AB, DE]
    5678   [FG, IJ]

Command:

    dataFrame = dataFrame.join(dataFrame.name.str.join('|').str.get_dummies().add_prefix('dummy_name_'))

The command above creates a dummy column for each character in the name column, rather than one for each list element.

Output:

    empNo  name      dummy_name_A  dummy_name_B  dummy_name_D  dummy_name_E  dummy_name_F  dummy_name_G  dummy_name_I  dummy_name_J
    1234   [AB, DE]             1             1             1             1             0             0             0             0
    5678   [FG, IJ]             0             0             0             0             1             1             1             1

Expected:

    empNo  name      dummy_name_AB  dummy_name_DE  dummy_name_FG  dummy_name_IJ
    1234   [AB, DE]              1              1              0              0
    5678   [FG, IJ]              0              0              1              1





























  • Strange. Can you share your dataframe like this: print(dataFrame.to_dict()) and post the result here?

    – Anton vBR
    Nov 25 '18 at 19:03

  • If it is huge, use print(dataFrame.head(2).to_dict()) to limit it to two rows.

    – Anton vBR
    Nov 25 '18 at 19:35

  • I found the issue, Anton. I had accidentally converted the column to strings with .astype(str). After removing that, my earlier command works fine, as you correctly pointed out. Thanks for the help.

    – Jenny
    Nov 25 '18 at 19:46






















python pandas dataframe














edited Nov 25 '18 at 19:00 by Mayank Porwal

asked Nov 25 '18 at 18:58 by Jenny




















1 Answer














I suspect the name column holds strings that look like lists rather than actual lists, so use ast.literal_eval to convert the strings back to lists:

    import ast
    import pandas as pd

    df.name = df.name.apply(ast.literal_eval)
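A minimal sketch of how one might check whether the cells are strings or lists before converting them. The sample frame is an assumption: it takes the values from this question and stringifies them, as .astype(str) would have done. The names cell_types and parsed_types are hypothetical.

```python
import ast
import pandas as pd

# Assumed sample data: the lists have been turned into strings,
# which is what .astype(str) does to a column of lists.
df = pd.DataFrame({
    "empNo": [1234, 5678],
    "name": ["['AB', 'DE']", "['FG', 'IJ']"],
})

# Look at the Python type of each cell; str means the lists were lost.
cell_types = df["name"].map(type).unique().tolist()

# Parse only the cells that are strings; real lists pass through unchanged.
df["name"] = df["name"].apply(
    lambda v: ast.literal_eval(v) if isinstance(v, str) else v
)
parsed_types = df["name"].map(type).unique().tolist()
```

Guarding with isinstance keeps the conversion safe to re-run, since ast.literal_eval raises an error when it is handed a list instead of a string.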


Then use str.get_dummies:

    # groupby(level=0).sum() collapses the stacked rows back to one row per original index
    s = df.name.apply(pd.Series).stack().str.get_dummies().groupby(level=0).sum().add_prefix('dummy_name_')
    s
       dummy_name_AB  dummy_name_DE  dummy_name_FG  dummy_name_IJ
    0              1              1              0              0
    1              0              0              1              1
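As an alternative to apply(pd.Series).stack(), the same table could probably be built with Series.explode. This is a swapped-in technique, not the answerer's method, and it assumes a pandas version that provides explode (0.25 or later). The sample frame mirrors the data in this question.

```python
import pandas as pd

# Assumed sample data with real Python lists in the name column.
df = pd.DataFrame({
    "empNo": [1234, 5678],
    "name": [["AB", "DE"], ["FG", "IJ"]],
})

# One row per list element; the original row label is repeated in the index.
exploded = df["name"].explode()

# One-hot encode the elements, then collapse back to one row per original label.
# max() keeps the values at 0/1 even if a list repeats an element.
s = (
    pd.get_dummies(exploded, dtype=int)
    .groupby(level=0)
    .max()
    .add_prefix("dummy_name_")
)
```

Using max() rather than sum() is a design choice for indicator columns; sum() would instead count how often each element occurs in a row.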


Then concatenate the result with the empNo column:

    pd.concat([df[['empNo']], s], axis=1)


The input data:

    df.to_dict()
    {'empNo': {0: 1234, 1: 5678}, 'name': {0: ['AB', 'DE'], 1: ['FG', 'IJ']}}
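For reference, here is a small end-to-end sketch that rebuilds the frame from the to_dict() output above and runs the command from the question on it. It assumes the name column holds real lists; the names dummies and result are hypothetical.

```python
import pandas as pd

# Rebuild the frame from the to_dict() output shown above.
df = pd.DataFrame({
    "empNo": {0: 1234, 1: 5678},
    "name": {0: ["AB", "DE"], 1: ["FG", "IJ"]},
})

# Join each list into a string such as 'AB|DE', then let str.get_dummies
# split on its default '|' separator into one indicator column per element.
dummies = df["name"].str.join("|").str.get_dummies().add_prefix("dummy_name_")

# Attach the indicator columns to the original frame.
result = df.join(dummies)
```

If the cells were strings instead of lists, str.join would place the separator between every character, which is consistent with the per-character columns reported in the question.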































  • Thanks, W-B. This command produces the columns dummy_name_['AB'], dummy_name_['DE'], dummy_name_[ 'AB','DE' ], dummy_name_['FG'], dummy_name_['IJ'], dummy_name_[ 'FG','IJ' ], plus one additional column, dummy_name_, for an empty list. Can you please help me achieve the expected result given above?

    – Jenny
    Nov 25 '18 at 19:30

  • @Jenny, please check the data input above to see the difference between mine and yours.

    – Wen-Ben
    Nov 25 '18 at 19:32

  • I found the issue, W-B. I had accidentally converted the column to strings with .astype(str). After removing that, my earlier command works fine, as you correctly pointed out. Thanks for the help.

    – Jenny
    Nov 25 '18 at 19:53



















edited Nov 25 '18 at 19:32

answered Nov 25 '18 at 19:23 by Wen-Ben
































