Extract categorical data from dummy features
I'm trying to combine four categorical (dummy) columns into one with pandas melt, but it produces three extra copies of each row, so I end up with 4x as many rows.
import pandas as pd

dat = pd.DataFrame({'Name': ('Tom', 'Pete', 'Mark', 'Steve'),
                    'Type1': (0, 1, 0, 0),
                    'Type2': (1, 0, 0, 0),
                    'Type3': (0, 0, 0, 0),
                    'Type4': (0, 0, 0, 0)})
dat = pd.melt(dat, id_vars='Name',
              value_vars=('Type1', 'Type2', 'Type3', 'Type4'), value_name='type')
     Name variable  type
0     Tom    Type1     0
1    Pete    Type1     1
2    Mark    Type1     0
3   Steve    Type1     0
4     Tom    Type2     1
5    Pete    Type2     0
6    Mark    Type2     0
7   Steve    Type2     0
8     Tom    Type3     0
9    Pete    Type3     0
10   Mark    Type3     0
11  Steve    Type3     0
12    Tom    Type4     0
13   Pete    Type4     0
14   Mark    Type4     0
15  Steve    Type4     0
Another problem, which I suspect melt can't solve, is replacing the rows where every Type is 0 with 'None'. That can probably be done with a simple query, though; fixing the duplicates is my main worry for now. Or shouldn't I be using melt at all?
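melt is not really duplicating anything: it reshapes to long format, one row per (Name, TypeN) pair, so 4 names x 4 Type columns give 16 rows. A minimal sketch, assuming each name has at most one 1 across the Type columns and that Name values are unique (the merge below keys on Name), that keeps only the pairs whose value is 1 and then brings back the names with no type as the string 'None':

```python
import pandas as pd

dat = pd.DataFrame({'Name': ('Tom', 'Pete', 'Mark', 'Steve'),
                    'Type1': (0, 1, 0, 0),
                    'Type2': (1, 0, 0, 0),
                    'Type3': (0, 0, 0, 0),
                    'Type4': (0, 0, 0, 0)})

# Long format: one row per (Name, TypeN) pair, 4 x 4 = 16 rows
long = pd.melt(dat, id_vars='Name',
               value_vars=['Type1', 'Type2', 'Type3', 'Type4'],
               var_name='Type', value_name='flag')

# Keep only the pairs whose flag is actually set
hits = long.loc[long['flag'] == 1, ['Name', 'Type']]

# Left-merge onto every name so names without any type survive,
# then label those missing types with the string 'None'
result = dat[['Name']].merge(hits, on='Name', how='left')
result['Type'] = result['Type'].fillna('None')
print(result)
```

The left merge keeps the original order of names, so the filtered melt plus the merge covers both the "duplicates" and the all-zero rows in one pass.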
What I'm trying to get is a single column containing Type1, Type2, Type3 or Type4. In this case:
    Name   Type
0    Tom  Type2
1   Pete  Type1
2   Mark  Type3
3  Steve  Type3
The last two should preferably become 'None', since those names have no type assigned to them. (I hope this makes sense to someone other than me.)
python-3.x pandas
You need to provide your expected output.
– piRSquared
Nov 21 '18 at 18:54
Where are the duplicates?
– Chris
Nov 21 '18 at 18:54
Sorry, edited the main post
– Coolkidscandie
Nov 21 '18 at 18:59
@Coolkidscandie can each name have more than one type?
– Chris
Nov 21 '18 at 19:04
@Chris - No. I have a DataFrame of about 2000 dogs, and only 5-10% of them actually have a category. So rather than four columns that are mostly NaN, I want one column holding either the category name (if there is one) or NaN (if the dog doesn't belong to any category).
– Coolkidscandie
Nov 21 '18 at 19:06
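Since each dog has at most one type and the goal is a single column, newer pandas (1.5 and later) also provides pd.from_dummies, the inverse of get_dummies; its default_category argument labels rows in which every dummy is 0. A minimal sketch, assuming pandas >= 1.5, that missing flags should count as 0 (from_dummies rejects NaN), and using a hypothetical Breed column to stand in for the other columns of the real data:

```python
import pandas as pd  # pd.from_dummies requires pandas >= 1.5

dat = pd.DataFrame({'Name': ('Tom', 'Pete', 'Mark', 'Steve'),
                    'Breed': ('Pug', 'Collie', 'Boxer', 'Beagle'),  # hypothetical extra column
                    'Type1': (0, 1, 0, 0),
                    'Type2': (1, 0, 0, 0),
                    'Type3': (0, 0, 0, 0),
                    'Type4': (0, 0, 0, 0)})

type_cols = ['Type1', 'Type2', 'Type3', 'Type4']

# from_dummies raises on NaN, so treat missing flags as 0 first
dummies = dat[type_cols].fillna(0).astype(int)

# With the default sep=None every column name is a category; rows that are
# all 0 get default_category. A row with more than one 1 raises ValueError.
decoded = pd.from_dummies(dummies, default_category='None')

# The result has a single column; assign it positionally and drop the dummies
dat = dat.drop(columns=type_cols).assign(Type=decoded.iloc[:, 0].to_numpy())
print(dat)
```

Because from_dummies refuses rows with several 1s, it also acts as a check on the "at most one type per dog" assumption.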
edited Nov 21 '18 at 19:52
yatu
asked Nov 21 '18 at 18:51
Coolkidscandie
1 Answer
idxmax
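import numpy as np
import pandas as pd

dat = pd.DataFrame({'Name': ('Tom', 'Pete', 'Mark', 'Steve'),
                    'Type1': (0, 1, 0, 0),
                    'Type2': (1, 0, 0, 0),
                    'Type3': (0, 0, 0, 0),
                    'Type4': (0, 0, 0, 0)})
# Turn the zeros into NaN so that idxmax ignores them
df = dat.loc[:, 'Type1':].replace(0, np.nan)
# idxmax(axis=1) returns the label of the column holding each row's maximum
df.idxmax(axis=1).to_frame(name='Type').set_index(dat.Name)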
        Type
Name
Tom    Type2
Pete   Type1
Mark     NaN
Steve    NaN
Will this give me a way of replacing the rows where the Name doesn't belong to any category with NaN, 0 or something similar? At the moment those rows come out as Type1, so if I replace them, all the genuine Type1 rows will be replaced too.
– Coolkidscandie
Nov 21 '18 at 19:10
It does now with the new edit
– yatu
Nov 21 '18 at 19:15
@Coolkidscandie please consider marking the answer as accepted so I know it helped you
– yatu
Nov 21 '18 at 20:19
I will once I get to test it. Sorry, I can't access my file right now. I'll have to work around setting the index, as there are more columns, but I'm sure I'll find something. Thank you for now; I'll mark it once I've checked it.
– Coolkidscandie
Nov 21 '18 at 20:35
This code gives me TypeError: reduction operation 'argmax' not allowed for this dtype
– Coolkidscandie
Nov 22 '18 at 16:01
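The thread does not resolve the TypeError in the last comment. One plausible cause (an assumption, since the real file is not shown) is that the Type columns in the real data are not numeric, for example 0/1 read in as strings: replace(0, np.nan) then leaves them untouched and idxmax refuses the object dtype. Newer pandas releases also deprecate calling idxmax on rows that are entirely NaN, which the answer relies on. A minimal sketch that sidesteps both by coercing the flags to numbers and working on a boolean mask, and that writes the result back as a column so the other columns and the existing index are kept (the Age column and the string-typed flags are hypothetical):

```python
import pandas as pd

# Hypothetical messy input: flags stored as strings, a fully missing row,
# and an extra column that must be preserved
dat = pd.DataFrame({'Name': ['Tom', 'Pete', 'Mark', 'Steve'],
                    'Age': [3, 5, 2, 7],
                    'Type1': ['0', '1', None, '0'],
                    'Type2': ['1', '0', None, '0'],
                    'Type3': ['0', '0', None, '0'],
                    'Type4': ['0', '0', None, '0']})

type_cols = [c for c in dat.columns if c.startswith('Type')]

# Coerce every flag to a number (unparseable values and None become NaN),
# then mark the cells that are set; NaN compares as False
flags = dat[type_cols].apply(pd.to_numeric, errors='coerce')
mask = flags.eq(1)

# idxmax on booleans picks the first True column per row; where() then
# blanks out rows with no True at all, so they become NaN
dat['Type'] = mask.idxmax(axis=1).where(mask.any(axis=1))
dat = dat.drop(columns=type_cols)
print(dat)
```

If the literal string 'None' from the question is preferred over NaN, appending .fillna('None') to the Type assignment would give it.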
edited Nov 22 '18 at 17:03
answered Nov 21 '18 at 19:08
yatu