Extract categorical data from dummy features












I'm trying to combine 4 dummy (0/1) columns into 1 categorical column with pandas melt, but the result repeats each name once per Type column (3 extra rows per name, so 4x as many rows).



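import pandas as pd

dat = pd.DataFrame({'Name': ('Tom', 'Pete', 'Mark', 'Steve'),
                    'Type1': (0, 1, 0, 0),
                    'Type2': (1, 0, 0, 0),
                    'Type3': (0, 0, 0, 0),
                    'Type4': (0, 0, 0, 0)})
# melt stacks the four Type columns into long format: one row per (Name, Type) pair
dat = pd.melt(dat, id_vars='Name',
              value_vars=('Type1', 'Type2', 'Type3', 'Type4'), value_name='type')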



     Name variable  type
0     Tom    Type1     0
1    Pete    Type1     1
2    Mark    Type1     0
3   Steve    Type1     0
4     Tom    Type2     1
5    Pete    Type2     0
6    Mark    Type2     0
7   Steve    Type2     0
8     Tom    Type3     0
9    Pete    Type3     0
10   Mark    Type3     0
11  Steve    Type3     0
12    Tom    Type4     0
13   Pete    Type4     0
14   Mark    Type4     0
15  Steve    Type4     0


Another problem, which I guess can't be solved with melt, is replacing the type with 'None' for rows where the value is 0 for all Types. That can probably be done with a simple query, so fixing the duplicates is my main worry for now. Or should I not be using melt at all?



What I'm trying to get is a single column holding Type1, Type2, Type3 or Type4. So in this case:



    Name   Type
0    Tom  Type2
1   Pete  Type1
2   Mark  Type3
3  Steve  Type3


The last 2 would preferably become 'None', as these 2 names don't have a type assigned to them. (Hope I'm not going mad and it makes sense to someone other than just me).
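On the melt route asked about above, here is a minimal sketch (not code from the question or the answer) of how the long output could be collapsed back to one row per name. It assumes each name has at most one flag set to 1 and that Name is unique; long_df, hits and result are illustrative names only.

```python
import pandas as pd

dat = pd.DataFrame({'Name': ('Tom', 'Pete', 'Mark', 'Steve'),
                    'Type1': (0, 1, 0, 0),
                    'Type2': (1, 0, 0, 0),
                    'Type3': (0, 0, 0, 0),
                    'Type4': (0, 0, 0, 0)})

# Long format: one row per (Name, Type) pair, which is why melt alone gives 4x rows
long_df = dat.melt(id_vars='Name', var_name='Type', value_name='flag')

# Keep only the pairs where the flag is set
hits = long_df.loc[long_df['flag'] == 1, ['Name', 'Type']]

# Left-merge back onto the full list of names so untyped names stay, with NaN as Type
result = dat[['Name']].merge(hits, on='Name', how='left')
print(result)
```

The extra rows are not really duplicates: melt emits every (Name, Type) combination, so the filtering step is what brings the row count back down.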






























  • You need to provide your expected output.

    – piRSquared
    Nov 21 '18 at 18:54











  • Where are the duplicates?

    – Chris
    Nov 21 '18 at 18:54











  • Sorry, edited the main post

    – Coolkidscandie
    Nov 21 '18 at 18:59











  • @Coolkidscandie can each name have more than one type?

    – Chris
    Nov 21 '18 at 19:04











  • @Chris - No. I've got a dataframe of about 2000 dogs and only 5-10% of them actually have any category. So rather than having 4 columns where most of them are NaNs, I wanted to have 1 with either just the name of the category (if they have any) or NaN if they don't belong to any category.

    – Coolkidscandie
    Nov 21 '18 at 19:06























python-3.x pandas














edited Nov 21 '18 at 19:52









yatu











asked Nov 21 '18 at 18:51









Coolkidscandie




















1 Answer














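Use idxmax along the columns (axis=1), after replacing 0 with NaN so that names with no type come out as NaN: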



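import numpy as np
import pandas as pd

dat = pd.DataFrame({'Name': ('Tom', 'Pete', 'Mark', 'Steve'),
                    'Type1': (0, 1, 0, 0),
                    'Type2': (1, 0, 0, 0),
                    'Type3': (0, 0, 0, 0),
                    'Type4': (0, 0, 0, 0)})

# Keep only the Type columns and turn 0 into NaN, so rows without a type have no maximum
df = dat.loc[:, 'Type1':].replace(0, np.nan)
# For each row, take the label of the column that holds the maximum (the 1)
df.idxmax(axis=1).to_frame(name='Type').set_index(dat.Name)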

        Type
Name
Tom    Type2
Pete   Type1
Mark     NaN
Steve    NaN
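A variant of the same idea, offered as a sketch rather than as the answer's own method: instead of replacing 0 with NaN, call idxmax on the integer flags and then blank out the rows where no flag is set. To my understanding, recent pandas releases warn about (and may eventually reject) idxmax on rows that are entirely NaN, which this version avoids. The names types and result are illustrative, and Name is assumed to be unique.

```python
import pandas as pd

dat = pd.DataFrame({'Name': ('Tom', 'Pete', 'Mark', 'Steve'),
                    'Type1': (0, 1, 0, 0),
                    'Type2': (1, 0, 0, 0),
                    'Type3': (0, 0, 0, 0),
                    'Type4': (0, 0, 0, 0)})

# Index by Name and select the dummy columns explicitly
types = dat.set_index('Name').loc[:, 'Type1':'Type4']

# idxmax picks the first column holding the row maximum; for an all-zero row that
# would be Type1, so mask those rows with where(...), which leaves NaN instead
result = types.idxmax(axis=1).where(types.any(axis=1)).to_frame(name='Type')
print(result)
```

The mask types.any(axis=1) is True only for rows with at least one non-zero flag, so Mark and Steve end up as NaN while Tom and Pete keep their type.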































  • Will this give me a way of replacing any rows where the Name doesn't belong to any category with NaN or 0 or anything like that? If I replace it now, all Type1 will be replaced.

    – Coolkidscandie
    Nov 21 '18 at 19:10











  • It does now with the new edit

    – yatu
    Nov 21 '18 at 19:15











  • @Coolkidscandie please consider marking the answer as correct so I know it helped you

    – yatu
    Nov 21 '18 at 20:19






  • I will once I get to test it - sorry, I'm not able to access my file now. I'll have to go around setting the index, as there are more columns, but I'm sure I'll find something. For now, thank you. When I get to check it, I'll mark it

    – Coolkidscandie
    Nov 21 '18 at 20:35











  • so this code is giving me TypeError: reduction operation 'argmax' not allowed for this dtype

    – Coolkidscandie
    Nov 22 '18 at 16:01
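Regarding the last two comments, a hedged sketch: pandas raises this kind of TypeError when idxmax is applied to non-numeric (object) columns, so one plausible cause is that the real Type columns are stored as strings or mixed values. That is an assumption about data not shown here. The example below also keeps the other columns instead of setting the index. The Breed column, the string-valued flags, and the names type_cols, flags, has_type and out are all hypothetical.

```python
import pandas as pd

# Hypothetical frame: flags stored as strings (non-numeric dtype) plus an extra column
dat = pd.DataFrame({'Name': ['Tom', 'Pete', 'Mark', 'Steve'],
                    'Breed': ['Lab', 'Pug', 'Collie', 'Boxer'],
                    'Type1': ['0', '1', '0', '0'],
                    'Type2': ['1', '0', '0', '0'],
                    'Type3': ['0', '0', '0', '0'],
                    'Type4': ['0', '0', '0', '0']})

type_cols = ['Type1', 'Type2', 'Type3', 'Type4']

# Coerce every flag column to numbers; unparseable values become NaN, then 0
flags = dat[type_cols].apply(pd.to_numeric, errors='coerce').fillna(0)

# True for rows that have at least one flag set
has_type = flags.gt(0).any(axis=1)

# Drop the dummy columns and add a single Type column, NaN where no flag is set
out = dat.drop(columns=type_cols).assign(Type=flags.idxmax(axis=1).where(has_type))
print(out)
```

Checking dat.dtypes on the real data first would show whether the columns are in fact non-numeric before applying any coercion.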



















edited Nov 22 '18 at 17:03

























answered Nov 21 '18 at 19:08









yatu






























