What is the right way to substitute column values in dataframe?
I want the following to happen:
for every column in df, check whether its type is numeric; if it is not, use a label encoder to map the str/object values to numeric classes (e.g. 0, 1, 2, 3, ...).
I am trying to do it in the following way:
    for col in df:
        if not np.issubdtype(df[col].dtype, np.number):
            df[col] = LabelEncoder().fit_transform(df[col])
I see a few problems here.
First, column names can repeat, in which case df[col] returns more than one column, which is not what I want.
Second, df[col].dtype throws an error:
    AttributeError: 'DataFrame' object has no attribute 'dtype'
which I assume arises from issue #1, i.e. multiple columns are returned, but I am not sure.
Third, would assigning df[col] = LabelEncoder().fit_transform(df[col]) substitute the column in df, or should I do some esoteric df partitioning and concatenation?
Thank you
python pandas dataframe
asked Nov 25 '18 at 21:51
YohanRoth
1 Answer
Since LabelEncoder supports only one column at a time, iteration over columns is your only option. You can make this a little more concise using select_dtypes to select the columns, and then df.apply to apply the LabelEncoder to each column.
cols = df.select_dtypes(exclude=[np.number]).columns
df[cols] = df[cols].apply(lambda x: LabelEncoder().fit_transform(x))
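To make the snippet above concrete, here is a minimal runnable sketch. The sample frame and its column names (`age`, `city`, `size`) are hypothetical and not from the question; it also assumes column names are unique.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample data: one numeric column and two string columns.
df = pd.DataFrame({
    "age": [31, 45, 27, 45],
    "city": ["Oslo", "Rome", "Oslo", "Lima"],
    "size": ["M", "L", "S", "M"],
})

# Names of all columns whose dtype is not numeric.
cols = df.select_dtypes(exclude=[np.number]).columns

# Fit a fresh encoder per column and assign the encoded columns back.
df[cols] = df[cols].apply(lambda s: LabelEncoder().fit_transform(s))

print(df)
```

Each column gets its own LabelEncoder, so the integer codes are only meaningful within a column. If you need to invert the encoding later, keep the fitted encoders (for example in a dict keyed by column name) instead of discarding them inside the lambda.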
Alternatively, you could build a boolean mask that selects object dtypes only (less robust, since it only catches object columns, but easy to extend):
m = df.dtypes == object
# m = [not np.issubdtype(d, np.number) for d in df.dtypes]
df.loc[:, m] = df.loc[:, m].apply(lambda x: LabelEncoder().fit_transform(x))
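Neither snippet above deals with the repeated column names mentioned in the question. With a repeated name, df[col] returns a DataFrame rather than a Series, which is where the AttributeError comes from. One possible way around it, sketched below under assumptions, is to walk the columns by position with iloc and rebuild the frame. The sample data and the name `result` are hypothetical, and `pd.api.types.is_numeric_dtype` is swapped in for `np.issubdtype` here because it also accepts pandas extension dtypes (note that it treats bool as numeric).

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical frame; the columns are renamed so that "tag" appears twice.
df = pd.DataFrame({
    "tag": ["a", "b", "a"],
    "n": [1, 2, 3],
    "tag2": ["x", "y", "x"],
})
df.columns = ["tag", "n", "tag"]

# With a repeated name, label-based selection yields a DataFrame,
# which has .dtypes but no .dtype attribute.
assert isinstance(df["tag"], pd.DataFrame)

columns = []
for i in range(df.shape[1]):
    col = df.iloc[:, i]  # positional selection always yields a single Series
    if not pd.api.types.is_numeric_dtype(col.dtype):
        codes = LabelEncoder().fit_transform(col)
        col = pd.Series(codes, index=df.index, name=col.name)
    columns.append(col)

# Reassemble the columns in their original order; repeated names are kept.
result = pd.concat(columns, axis=1)
print(result)
```

When column names are unique, none of this is needed: the plain assignment df[col] = ... from the question replaces that column.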
edited Nov 25 '18 at 22:08
answered Nov 25 '18 at 22:01
coldspeed