Extract categorical data from dummy features

I'm trying to combine 4 categorical columns into 1 with pandas melt, but it creates 3 duplicates of each row (giving me 4x more rows).

import pandas as pd

dat = pd.DataFrame({'Name': ('Tom', 'Pete', 'Mark', 'Steve'),
                    'Type1': (0, 1, 0, 0),
                    'Type2': (1, 0, 0, 0),
                    'Type3': (0, 0, 0, 0),
                    'Type4': (0, 0, 0, 0)})

dat = pd.melt(dat, id_vars='Name',
              value_vars=('Type1', 'Type2', 'Type3', 'Type4'), value_name='type')

    Name    variable    type
0   Tom     Type1       0
1   Pete    Type1       1
2   Mark    Type1       0
3   Steve   Type1       0
4   Tom     Type2       1
5   Pete    Type2       0
6   Mark    Type2       0
7   Steve   Type2       0
8   Tom     Type3       0
9   Pete    Type3       0
10  Mark    Type3       0
11  Steve   Type3       0
12  Tom     Type4       0
13  Pete    Type4       0
14  Mark    Type4       0
15  Steve   Type4       0

Another problem, which I guess can't be solved with melt, is replacing the rows where every Type is 0 with 'None'. That can probably be done with a simple query, so fixing the duplicates is my main worry for now. Unless I shouldn't be using melt at all?

What I'm trying to get is a single column holding Type1, 2, 3 or 4. So in this case:

    Name    Type
0   Tom     Type2
1   Pete    Type1
2   Mark    Type3
3   Steve   Type3

The last two would preferably turn into 'None', as these two names don't have a type assigned to them. (Hope I'm not going mad and this makes sense to someone other than just me.)

asked Nov 21 '18 at 18:51 by Coolkidscandie
edited Nov 21 '18 at 19:52 by yatu

You need to provide your expected output.

– piRSquared
Nov 21 '18 at 18:54

Where are the duplicates?

– Chris
Nov 21 '18 at 18:54

Sorry, edited the main post

– Coolkidscandie
Nov 21 '18 at 18:59

@Coolkidscandie can each name have more than one type?

– Chris
Nov 21 '18 at 19:04

@Chris - No. I've got a dataframe of about 2000 dogs and only 5-10% of them actually have any category. So rather than having 4 columns where most of them are NaNs, I wanted to have 1 with either just the name of the category (if they have any) or NaN if they don't belong to any category.

– Coolkidscandie
Nov 21 '18 at 19:06
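
For anyone who wants to stay with melt: the extra rows are not really duplicates. melt emits one row per (Name, Type column) pair, so 4 names and 4 columns give 16 rows. A minimal sketch of one way to get back to one row per name, assuming each Name is unique and has at most one flag set (as stated in the comment above). The names `type_cols`, `long_df`, `assigned` and `result` are only illustrative:

```python
import pandas as pd

dat = pd.DataFrame({'Name': ['Tom', 'Pete', 'Mark', 'Steve'],
                    'Type1': [0, 1, 0, 0],
                    'Type2': [1, 0, 0, 0],
                    'Type3': [0, 0, 0, 0],
                    'Type4': [0, 0, 0, 0]})

type_cols = ['Type1', 'Type2', 'Type3', 'Type4']

# One row per (Name, Type column) pair: 4 names x 4 columns = 16 rows
long_df = dat.melt(id_vars='Name', value_vars=type_cols,
                   var_name='Type', value_name='flag')

# Keep only the pairs whose flag is actually set
assigned = long_df.loc[long_df['flag'] == 1, ['Name', 'Type']]

# Left-merge onto the original names so that names without a type stay, with NaN
result = dat[['Name']].merge(assigned, on='Name', how='left')
print(result)
```

If a name could carry several flags, the merge would return one row per flag, so this only collapses cleanly under the one-type-per-name assumption.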

python-3.x pandas


1 Answer

Use `idxmax` after replacing the zeros with NaN:

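import numpy as np
import pandas as pd

dat = pd.DataFrame({'Name': ('Tom', 'Pete', 'Mark', 'Steve'),
                    'Type1': (0, 1, 0, 0),
                    'Type2': (1, 0, 0, 0),
                    'Type3': (0, 0, 0, 0),
                    'Type4': (0, 0, 0, 0)})

# Turn the zeros into NaN so that idxmax only sees the flags that are set
df = dat.loc[:, 'Type1':].replace(0, np.nan)
# For each row, take the label of the column holding the maximum
df.idxmax(axis=1).to_frame(name='Type').set_index(dat.Name)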

        Type
Name
Tom    Type2
Pete   Type1
Mark     NaN
Steve    NaN
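
A variant that does not rely on how `idxmax` treats rows that are entirely NaN (as far as I know, recent pandas releases deprecate that case, so treat the NaN output above as version-dependent). This is only a sketch: it leaves the 0/1 columns untouched, calls `idxmax` on them, and then blanks out the rows where no flag is set. The names `type_cols`, `has_type` and `result` are mine, not part of the original answer:

```python
import pandas as pd

dat = pd.DataFrame({'Name': ['Tom', 'Pete', 'Mark', 'Steve'],
                    'Type1': [0, 1, 0, 0],
                    'Type2': [1, 0, 0, 0],
                    'Type3': [0, 0, 0, 0],
                    'Type4': [0, 0, 0, 0]})

type_cols = ['Type1', 'Type2', 'Type3', 'Type4']
flags = dat[type_cols]

# True for rows that have at least one flag set
has_type = flags.eq(1).any(axis=1)

# idxmax returns the first column holding the row maximum; for an all-zero
# row that is simply 'Type1', so mask those rows with NaN afterwards
dat['Type'] = flags.idxmax(axis=1).where(has_type)

result = dat[['Name', 'Type']]
print(result)
```

Writing the result into a new `Type` column also keeps any other columns of the frame, so the index does not have to be replaced.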

answered Nov 21 '18 at 19:08 by yatu
edited Nov 22 '18 at 17:03

Will this give me a way of replacing any rows where the Name doesn't belong to any category with NaN or 0 or anything like that? If I replace it now, all Type1 values will be replaced.

– Coolkidscandie
Nov 21 '18 at 19:10

It does now with the new edit

– yatu
Nov 21 '18 at 19:15

@Coolkidscandie please consider marking the answer as correct so I know it helped you

– yatu
Nov 21 '18 at 20:19

I will once I get to test it - sorry, I'm not able to access my file now. I'll have to go around setting the index, as there are more columns, but I'm sure I'll find something. For now, thank you. When I get to check it, I'll mark it

– Coolkidscandie
Nov 21 '18 at 20:35

so this code is giving me TypeError: reduction operation 'argmax' not allowed for this dtype

– Coolkidscandie
Nov 22 '18 at 16:01
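
The thread does not say what caused this TypeError. One common cause is that the Type columns were loaded as object dtype (for example strings or mixed values) rather than numbers, and that is purely an assumption here. A minimal sketch under that assumption, using a made-up frame `raw` whose flags are strings, converts the columns with `pd.to_numeric` before calling `idxmax`:

```python
import pandas as pd

# Hypothetical data: flags stored as strings, with one missing value
raw = pd.DataFrame({'Name': ['Tom', 'Pete', 'Mark', 'Steve'],
                    'Type1': ['0', '1', '0', None],
                    'Type2': ['1', '0', '0', '0'],
                    'Type3': ['0', '0', '0', '0'],
                    'Type4': ['0', '0', '0', '0']})

type_cols = ['Type1', 'Type2', 'Type3', 'Type4']

# Coerce every Type column to a numeric dtype; unparseable or missing
# entries become NaN and are then treated as "flag not set"
flags = raw[type_cols].apply(pd.to_numeric, errors='coerce').fillna(0)

# Rows with at least one flag set
has_type = flags.gt(0).any(axis=1)

# Column label of the set flag, NaN where the row has none
raw['Type'] = flags.idxmax(axis=1).where(has_type)
print(raw[['Name', 'Type']])
```

Checking `dat.dtypes` on the real data first would show whether the columns are in fact object dtype before applying any conversion.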
