How to handle categorical data for preprocessing in Machine Learning
This may be a basic question. I have categorical data that I want to feed into my machine learning model, but the model accepts only numerical data. What is the correct way to convert this categorical data into numerical data?
My sample DataFrame:
  T-size Gender  Label
0      L      M      1
1      L      M      1
2      M      F      1
3      S      F      0
4      M      M      1
5      L      M      0
6      S      F      1
7      S      F      0
8      M      M      1
I know that the following code converts my categorical data into numerical data:
Type-1:
df['T-size'] = df['T-size'].cat.codes
The line above simply maps the categories to the integers 0 to N-1. It does not respect any relationship between them.
For this example I know that S < M < L. What should I do when I want to convert data like this?
Type-2:
In this type there is no ordering relationship between M and F, but I can tell that M has a higher probability of Label 1 than F, i.e. (number of samples with Label 1) / (total number of samples):
for Male,
(4/5)
for Female,
(2/4)
We know that
(4/5) > (2/4)
How should I encode this kind of column?
Can I replace M with (4/5) and F with (2/4) for this problem?
What is the proper way of dealing with such a column?
Please help me understand this better.
python pandas dataframe machine-learning feature-selection
edited Nov 26 '18 at 14:32 by Joe
asked Nov 26 '18 at 9:26 by Mohamed Thasin ah
4 Answers
There are many ways to encode categorical data, and the right one depends on exactly what you plan to do with it. For example, one-hot encoding, which is easily the most popular choice, is an extremely poor choice if you're planning on using a decision tree / random forest / GBM.
Regarding your T-shirts above, you can give a pandas categorical type an order:
df['T-size'] = df['T-size'].astype(pd.api.types.CategoricalDtype(['S', 'M', 'L'], ordered=True))
If you had set up your T-shirt categorical like that, then your .cat.codes method would work perfectly. It also means you can easily use scikit-learn's LabelEncoder, which fits neatly into pipelines.
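As a minimal sketch of the difference the explicit order makes, the snippet below rebuilds only the T-size column from the question and compares the codes from a plain `astype('category')` with the codes from an ordered `CategoricalDtype`. The variable names are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# T-size column copied from the sample DataFrame in the question.
df = pd.DataFrame({"T-size": ["L", "L", "M", "S", "M", "L", "S", "S", "M"]})

# Plain categorical: pandas sorts the categories lexically (L, M, S),
# so the integer codes do not reflect S < M < L.
unordered_codes = df["T-size"].astype("category").cat.codes

# Ordered categorical: the codes follow the order given explicitly.
size_dtype = pd.api.types.CategoricalDtype(["S", "M", "L"], ordered=True)
ordered_codes = df["T-size"].astype(size_dtype).cat.codes

print(unordered_codes.tolist())  # L=0, M=1, S=2
print(ordered_codes.tolist())    # S=0, M=1, L=2
```

Note that scikit-learn's LabelEncoder also sorts its classes, so on its own it would produce the lexical codes rather than the S < M < L order.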
Regarding your encoding of gender, you need to be very careful when using your target variable (your Label). You don't want to do this encoding before your train-test split, otherwise you're using knowledge of your unseen data, making it not truly unseen. This gets even more complicated if you're using cross-validation, as you'll need to do the encoding within each CV iteration (i.e. a new encoding per fold). If you want to do this, I recommend you check out TargetEncoder from scikit-learn-contrib's Category Encoders, but again, be sure to use this within an sklearn Pipeline or you will mess up the train-test splits and leak information from your test set into your training set.
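To make the leakage point concrete, here is a rough sketch of target (mean) encoding done by hand with pandas. It assumes a hypothetical split in which the first six rows are training data and the last three are test data; the split, the column name `Gender_enc` and the fallback to the global mean are illustrative choices, and this is not how the Category Encoders library implements it (which, among other things, applies smoothing).

```python
import pandas as pd

# Gender and Label columns copied from the sample DataFrame in the question.
df = pd.DataFrame({
    "Gender": ["M", "M", "F", "F", "M", "M", "F", "F", "M"],
    "Label": [1, 1, 1, 0, 1, 0, 1, 0, 1],
})

# Hypothetical split: first six rows for training, last three for testing.
train = df.iloc[:6].copy()
test = df.iloc[6:].copy()

# Learn the per-category mean of the target from the training rows only.
gender_means = train.groupby("Gender")["Label"].mean()
global_mean = train["Label"].mean()

# Apply the learned mapping to both parts. The test labels are never used.
train["Gender_enc"] = train["Gender"].map(gender_means)
# A category not seen in training would fall back to the training mean.
test["Gender_enc"] = test["Gender"].map(gender_means).fillna(global_mean)

print(gender_means)
print(test)
```

The values differ from the 4/5 and 2/4 in the question because they are computed from the training rows alone, which is exactly the point: the encoding must not see the held-out labels.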
What is the importance of ordered, i.e. what is the difference between df['T-size'].astype('category') and df['T-size'].astype(pd.api.types.CategoricalDtype(['S','M','L'],ordered=True))?
– Mohamed Thasin ah
Nov 26 '18 at 10:39
Well, you said you wanted to be sure that S<M<L, so that's what ordered does for you
– Dan
Nov 26 '18 at 10:39
yeah got it ;-)
– Mohamed Thasin ah
Nov 26 '18 at 10:40
To clarify: in the statement "You don't want to do this encoding before your train-test split otherwise you're using knowledge of your unseen data making it not truly unseen", which encoding does "this encoding" refer to? Do you mean my proposed encoding, or something else?
– Mohamed Thasin ah
Nov 26 '18 at 10:58
@MohamedThasinah Technically, any encoding, but particularly any encoding that uses the target variable or the distribution of the variable being encoded.
– Dan
Nov 26 '18 at 11:08
For the first question, if you have a small number of categories, you could map the column with a dictionary. In this way you can set an order:
d = {'L':2, 'M':1, 'S':0}
df['T-size'] = df['T-size'].map(d)
Output:
   T-size Gender  Label
0       2      M      1
1       2      M      1
2       1      F      1
3       0      F      0
4       1      M      1
5       2      M      0
6       0      F      1
7       0      F      0
8       1      M      1
For the second question, you can use the same method, but I would simply map the two values for males and females to 0 and 1. If you only need the category and do not have to perform operations on the values, one value is as good as another.
Thanks for the answer. Here you used the mapping S->0, M->1, L->2. May I know why you chose 0, 1, 2? If it represents a weight, can I use 10, 20, 30 respectively? Will it make any difference to my model?
– Mohamed Thasin ah
Nov 26 '18 at 9:55
I would start from the smallest value, 0, and increase up to the biggest.
– Joe
Nov 26 '18 at 9:56
I'm really confused about what value to assign. Is replacing the category with a scalar value sufficient to deal with categorical data?
– Mohamed Thasin ah
Nov 26 '18 at 10:02
Yes it is enough
– Joe
Nov 26 '18 at 10:03
Sorry for the repeated question, but I'm really wondering: don't I need to give the weight explicitly?
– Mohamed Thasin ah
Nov 26 '18 at 10:04
If you want to have a hierarchy in your size parameter, you may consider using a linear mapping for it. This would be:
size_mapping = {"S": 1, "M": 2, "L": 3}
# Map the sizes onto a new numeric column of the DataFrame
df['T-size_num'] = df['T-size'].map(size_mapping)
This allows you to treat the input as numerical data while preserving the hierarchy.
As for the gender, you are conflating the distribution of the labels with the preprocessing. If you feed that distribution in as an input, you will introduce a bias into your data. You must treat male and female as two distinct categories regardless of their existing distribution. You should map them to two different numbers, but without introducing proportions.
df['Gender_num'] = df['Gender'].map({'M':0 , 'F':1})
For a more detailed explanation that covers more cases than your question, I suggest reading this article regarding categorical data in Machine Learning.
Thanks for the answer. For the first question, you assigned 1, 2, 3. May I know the reason behind that?
– Mohamed Thasin ah
Nov 26 '18 at 10:04
The relationship between the values of your size parameter is unknown, so I assume it to be linear. And since sizes are positive entities, I took your minimum value to be greater than 0. It makes sure that the M size is a multiple of the S size (which is also true for the L size of course), and it would not be true for S=0, M=1 and L=3.
– SantiStSupery
Nov 26 '18 at 10:20
I'm curious to know: instead of 1, 2, 3, can I use 10, 20, 30 or 5, 10, 15?
– Mohamed Thasin ah
Nov 26 '18 at 10:22
Yes, you could. The most important part is the way you model the relationship between your elements. If it is set to be linear with c=3a and b=2a, the patterns you will find in your data will remain the same whether you use 1, 2 and 3 or any other triplet that preserves that same linear relationship.
– SantiStSupery
Nov 26 '18 at 10:29
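The claim in the comment above can be checked for one simple case: an ordinary least-squares line fitted with NumPy. The sketch below uses the T-size and Label columns from the question; the helper name `fitted_values` and the non-proportional mapping 1, 2, 10 are assumptions added for contrast. It illustrates plain linear fits only. Models that are regularized or distance-based can still be sensitive to the scale of the codes.

```python
import numpy as np

# T-size and Label columns copied from the sample DataFrame in the question.
sizes = ["L", "L", "M", "S", "M", "L", "S", "S", "M"]
labels = np.array([1, 1, 1, 0, 1, 0, 1, 0, 1], dtype=float)


def fitted_values(mapping):
    """Fit a least-squares line Label ~ size code and return its predictions."""
    x = np.array([mapping[s] for s in sizes], dtype=float)
    slope, intercept = np.polyfit(x, labels, deg=1)
    return slope * x + intercept


pred_small = fitted_values({"S": 1, "M": 2, "L": 3})
pred_scaled = fitted_values({"S": 10, "M": 20, "L": 30})
pred_uneven = fitted_values({"S": 1, "M": 2, "L": 10})

# Proportional codes give the same fitted values; uneven spacing does not.
same_when_scaled = bool(np.allclose(pred_small, pred_scaled))
same_when_uneven = bool(np.allclose(pred_small, pred_uneven))
print(same_when_scaled, same_when_uneven)
```

Rescaling the codes only rescales the fitted slope, whereas changing the spacing between the codes changes the relationship the model is asked to find.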
It might be overkill for the M/F example, since it's binary - but if you are ever concerned about mapping a categorical variable into a numerical form, then consider one-hot encoding. It basically stretches your single column containing n categories into n binary columns.
So a dataset of:
Gender
M
F
M
M
F
would become:
Gender_M  Gender_F
       1         0
       0         1
       1         0
       1         0
       0         1
This takes away any notion of one thing being more "positive" than another - an absolute must for categorical data with more than 2 options, where there's no transitive A > B > C relationship and you don't want to smear your results by forcing one into your encoding scheme.
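One way to produce the table above is with pandas' `get_dummies`. This is a minimal sketch on the five-row Gender example from this answer; the explicit column reordering is only there to match the layout shown, since pandas emits the new columns in sorted category order.

```python
import pandas as pd

# Five-row Gender example from the answer above.
df = pd.DataFrame({"Gender": ["M", "F", "M", "M", "F"]})

# One binary column per category; dtype=int gives 0/1 instead of booleans.
one_hot = pd.get_dummies(df, columns=["Gender"], dtype=int)

# Reorder the columns to match the table shown in the answer.
one_hot = one_hot[["Gender_M", "Gender_F"]]
print(one_hot)
```

For a binary column like this one, the two output columns are redundant (each is one minus the other), which is why a single 0/1 column is usually enough.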
Thanks for the answer. I understood the first part, but I failed to understand this point: "an absolute must for categorical data with more than 2 options, where there's no transitive A > B > C relationship and you don't want to smear your results by forcing one into your encoding scheme." Can you clarify it for me?
– Mohamed Thasin ah
Nov 26 '18 at 10:31
So you've outlined two types of categorical variable in your example. {S,M,L} is transitive, since you can assign them an order. {M,F} isn't transitive, but it is binary, so coding to {0,1} is probably OK. Another type of categorical variable is one where there are more than 2 options, but which are not orderable (i.e. not transitive) - so {Vanilla, Strawberry, Grape} might be the options in a categorical variable, but if transformed to {0,1,2} a skew is introduced, suggesting that Strawberry exists "between" Vanilla and Grape when no such relationship exists outside of the arbitrary coding.
– Thomas Kimber
Nov 26 '18 at 10:44
Great explanation, it's really appreciated. For this {Vanilla, Strawberry, Grape} case, one-hot encoding solves the problem, right?
– Mohamed Thasin ah
Nov 26 '18 at 10:52
Yes, exactly. There's nothing stopping you from using it for {M,F} too. One-hot encoding has the disadvantage of expanding the dimensionality of your data, which can be a problem when the number of categories in a field gets large, but bearing that in mind, it's usually a good general-use option.
– Thomas Kimber
Nov 26 '18 at 11:02
Answer by Dan (the pandas ordered categorical / TargetEncoder answer): answered Nov 26 '18 at 10:19, edited Nov 29 '18 at 5:52 by Mohamed Thasin ah
Answer by Joe (the dictionary mapping answer): answered Nov 26 '18 at 9:51, edited Nov 26 '18 at 9:58
thanks for the answer, here you used mapping values for s->0, M->2, L->2 May I know why did you choose 0,1,2. If it represents weight can I use 10, 20, 30 respectively. will it make any difference in my model
– Mohamed Thasin ah
Nov 26 '18 at 9:55
I would start from the smallest value0, and increase till the biggest
– Joe
Nov 26 '18 at 9:56
I'm really confused with what value to assign. Is replacing a scalar value to the category sufficient to deal with categorical data?
– Mohamed Thasin ah
Nov 26 '18 at 10:02
Yes it is enough
– Joe
Nov 26 '18 at 10:03
sorry for the repeated question, I'm really wondering, Don't I need to give weight explicitly?
– Mohamed Thasin ah
Nov 26 '18 at 10:04
|
show 4 more comments
thanks for the answer, here you used mapping values for s->0, M->2, L->2 May I know why did you choose 0,1,2. If it represents weight can I use 10, 20, 30 respectively. will it make any difference in my model
– Mohamed Thasin ah
Nov 26 '18 at 9:55
I would start from the smallest value0, and increase till the biggest
– Joe
Nov 26 '18 at 9:56
I'm really confused with what value to assign. Is replacing a scalar value to the category sufficient to deal with categorical data?
– Mohamed Thasin ah
Nov 26 '18 at 10:02
Yes it is enough
– Joe
Nov 26 '18 at 10:03
sorry for the repeated question, I'm really wondering, Don't I need to give weight explicitly?
– Mohamed Thasin ah
Nov 26 '18 at 10:04
thanks for the answer, here you used mapping values for s->0, M->2, L->2 May I know why did you choose 0,1,2. If it represents weight can I use 10, 20, 30 respectively. will it make any difference in my model
– Mohamed Thasin ah
Nov 26 '18 at 9:55
thanks for the answer, here you used mapping values for s->0, M->2, L->2 May I know why did you choose 0,1,2. If it represents weight can I use 10, 20, 30 respectively. will it make any difference in my model
– Mohamed Thasin ah
Nov 26 '18 at 9:55
I would start from the smallest value
0, and increase till the biggest– Joe
Nov 26 '18 at 9:56
I would start from the smallest value
0, and increase till the biggest– Joe
Nov 26 '18 at 9:56
I'm really confused with what value to assign. Is replacing a scalar value to the category sufficient to deal with categorical data?
– Mohamed Thasin ah
Nov 26 '18 at 10:02
I'm really confused with what value to assign. Is replacing a scalar value to the category sufficient to deal with categorical data?
– Mohamed Thasin ah
Nov 26 '18 at 10:02
Yes it is enough
– Joe
Nov 26 '18 at 10:03
Yes it is enough
– Joe
Nov 26 '18 at 10:03
sorry for the repeated question, I'm really wondering, Don't I need to give weight explicitly?
– Mohamed Thasin ah
Nov 26 '18 at 10:04
sorry for the repeated question, I'm really wondering, Don't I need to give weight explicitly?
– Mohamed Thasin ah
Nov 26 '18 at 10:04
|
show 4 more comments
If you want to have a hierarchy in your size parameter, you may consider using a linear mapping for it. This would be :
size_mapping = {"S": 1, "M":2 , "L":3}
#mapping to the DataFrame
df['T-size_num'] = df['T-size'].map(size_mapping)
This allows you to treat the input as numerical data while preserving the hierarchy
And as for the gender, you are misconceiving the repartition and the preproces. If you already put the repartition as an input, you will introduce a bias in your data. You must consider that Male and female as two distinct categories regardless of their existing repartition. You should map it with two different numbers, but without introducing proportions.
df['Gender_num'] = df['Gender'].map({'M':0 , 'F':1})
For a more detailed explanation that covers more cases than your question, I suggest reading this article on categorical data in machine learning.
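The two mappings above can be tried end to end on the sample data from the question. This is a minimal sketch, not the only way to do it. The column name `T-size_code` and the use of `pd.CategoricalDtype` are additions for illustration: they show how to keep using `.cat.codes` from the question while making the order S < M < L explicit.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "T-size": ["L", "L", "M", "S", "M", "L", "S", "S", "M"],
    "Gender": ["M", "M", "F", "F", "M", "M", "F", "F", "M"],
    "Label":  [1, 1, 1, 0, 1, 0, 1, 0, 1],
})

# Option A: explicit dictionary, as in the answer.
size_mapping = {"S": 1, "M": 2, "L": 3}
df["T-size_num"] = df["T-size"].map(size_mapping)

# Option B (illustrative alternative): declare the order once and let
# pandas assign the codes 0..N-1 in that order.
size_dtype = pd.CategoricalDtype(categories=["S", "M", "L"], ordered=True)
df["T-size_code"] = df["T-size"].astype(size_dtype).cat.codes

# Gender has no order, so two arbitrary distinct labels are enough.
df["Gender_num"] = df["Gender"].map({"M": 0, "F": 1})

print(df)
```

Option B differs from calling `.cat.codes` on a plain categorical column only in that the category order is stated explicitly instead of being inferred from sorting the labels.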
Thanks for the answer. For the first question, you assigned 1, 2, 3. May I know the reason behind that?
– Mohamed Thasin ah
Nov 26 '18 at 10:04
The relationship between the values of your size parameter is unknown, so I assumed it to be linear. And since sizes are positive quantities, I took the minimum value to be greater than 0. This ensures that the M size is a multiple of the S size (which is also true for the L size, of course); that would not hold for S=0, M=1 and L=3.
– SantiStSupery
Nov 26 '18 at 10:20
I'm curious to know: instead of 1, 2, 3, can I use 10, 20, 30 or 5, 10, 15?
– Mohamed Thasin ah
Nov 26 '18 at 10:22
Yes, you could. The most important part is how you model the relationship between your elements. If it is set to be linear, with c=3a and b=2a, the patterns you find in your data will remain the same whether you use 1, 2 and 3 or any other triplet that preserves the same linear relationship.
– SantiStSupery
Nov 26 '18 at 10:29
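One way to check the claim in the comment above is to compare a simple statistic under different encodings. The sketch below uses the Pearson correlation with the label as a stand-in for "patterns"; that choice, and the uneven mapping 1, 2, 10, are assumptions made for illustration. Rescaling 1, 2, 3 to 10, 20, 30 leaves the correlation unchanged, while changing the spacing does not.

```python
import pandas as pd

# T-size and Label columns from the question.
sizes = pd.Series(["L", "L", "M", "S", "M", "L", "S", "S", "M"])
label = pd.Series([1, 1, 1, 0, 1, 0, 1, 0, 1])

# Same linear relationship, different scale.
corr_small = sizes.map({"S": 1, "M": 2, "L": 3}).corr(label)
corr_large = sizes.map({"S": 10, "M": 20, "L": 30}).corr(label)

# Different spacing between categories (no longer the same relationship).
corr_uneven = sizes.map({"S": 1, "M": 2, "L": 10}).corr(label)

print(corr_small, corr_large, corr_uneven)
```

Note that this only covers scale-invariant measures. Models that depend on feature magnitude, such as distance-based or regularized ones, can still respond differently to 1, 2, 3 and 10, 20, 30 unless the feature is standardized first.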
answered Nov 26 '18 at 9:54
SantiStSupery
It might be overkill for the M/F example, since it's binary, but if you are ever concerned about mapping a categorical variable into numerical form, consider one-hot encoding. It stretches a single column containing n categories into n binary columns.
So a dataset of:
Gender
M
F
M
M
F
Would become
Gender_M Gender_F
1 0
0 1
1 0
1 0
0 1
This takes away any notion of one thing being more "positive" than another. That is an absolute must for categorical data with more than 2 options, where there's no transitive A > B > C relationship and you don't want to smear your results by forcing one into your encoding scheme.
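In pandas, the table above can be produced with `pd.get_dummies`. This is a minimal sketch; the `dtype=int` argument and the column reordering are only there so that the output matches the 0/1 table shown, and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"Gender": ["M", "F", "M", "M", "F"]})

# One binary column per category; dtype=int gives 0/1 instead of booleans.
encoded = pd.get_dummies(df, columns=["Gender"], dtype=int)

# get_dummies orders the new columns alphabetically (Gender_F, Gender_M);
# reorder them to match the table above.
encoded = encoded[["Gender_M", "Gender_F"]]

print(encoded)
```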
Thanks for the answer. I understood the first part, but I failed to understand this point: "an absolute must for categorical data with more than 2 options, where there's no transitive A > B > C relationship and you don't want to smear your results by forcing one into your encoding scheme." Can you make it clearer?
– Mohamed Thasin ah
Nov 26 '18 at 10:31
So you've outlined two types of categorical variable in your example. {S,M,L} is transitive, since you can assign them an order. {M,F} isn't transitive, but it is binary, so coding to {0,1} is probably OK. Another type of categorical variable has more than 2 options that are not orderable (i.e. not transitive). For example, {Vanilla, Strawberry, Grape} might be the options in a categorical variable, but if they are transformed to {0,1,2} a skew is introduced, suggesting that Strawberry exists "between" Vanilla and Grape when no such relationship exists outside of the arbitrary coding.
– Thomas Kimber
Nov 26 '18 at 10:44
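The "Strawberry sits between Vanilla and Grape" effect can be made concrete by looking at pairwise distances under each encoding. The sketch below is illustrative only; the variable names are hypothetical and the use of distances as the measure of skew is an assumption. With integer codes, Vanilla is twice as far from Grape as from Strawberry. With one-hot encoding, every pair of distinct flavours is equally far apart.

```python
import numpy as np
import pandas as pd

flavours = pd.Series(["Vanilla", "Strawberry", "Grape"])

# Arbitrary integer coding: distances depend on the code assignment.
codes = flavours.map({"Vanilla": 0, "Strawberry": 1, "Grape": 2}).to_numpy()
int_dist = np.abs(codes[:, None] - codes[None, :])

# One-hot coding: Euclidean distance between each pair of rows.
one_hot = pd.get_dummies(flavours, dtype=int).to_numpy()
hot_dist = np.linalg.norm(one_hot[:, None, :] - one_hot[None, :, :], axis=-1)

print(int_dist)  # pairwise distances under integer coding
print(hot_dist)  # pairwise distances under one-hot coding
```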
Great explanation, it's really appreciated. For {Vanilla, Strawberry, Grape}, one-hot encoding solves the problem, right?
– Mohamed Thasin ah
Nov 26 '18 at 10:52
Yes, exactly. There's nothing stopping you from using it for {M,F} too. One-hot encoding has the disadvantage of expanding the dimensionality of your data, which can be a problem when the number of categories in a field gets large, but bearing that in mind, it's usually a good general-use option.
– Thomas Kimber
Nov 26 '18 at 11:02
answered Nov 26 '18 at 10:23
Thomas Kimber