Why does my decoded Windows-1252 string show up as a unicode value in a dictionary but not the value,...
In my application, following Ned Batchelder's recommendation to make a Unicode sandwich, I first convert the data from Windows-1252 to UTF-8:
row[field] = row[field].decode('cp1252').encode('utf-8')
Later on, when I want to send my data to an endpoint, I decode the UTF-8:
row[field] = fld.decode('utf-8')
When I print just the field that has the offending Windows-1252 characters, they are displayed as expected:
print row['dash']
# as well – ... “the intent was”
But when I print the entire dictionary, I get escape sequences instead:
print row
# as well \xe2\x80\x93 ... \xe2\x80\x9cthe intent was\xe2\x80\x9d
I want the Windows-1252 characters themselves, or equivalents such as a straight quotation mark in place of the left or right quotation mark.
python python-2.7 utf-8 character-encoding cp1252
When you are printing out a dictionary, the internal representation is shown, which is UTF-8.
– Maurice Meyer
Nov 21 '18 at 18:13
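To illustrate the comment above: in Python 2.7, print on a single str writes its bytes to the terminal as they are, while print on a dict goes through repr() of each key and value, and repr() escapes non-ASCII bytes. Below is a minimal sketch, assuming a hypothetical row with a single 'dash' field that holds UTF-8 bytes. It is written so that it should also run on Python 3, where the same escaping applies to bytes objects.

```python
from __future__ import print_function

# Hypothetical sample: cp1252 bytes for an en dash (0x96) and curly quotes (0x93, 0x94).
raw = b"as well \x96 \x93the intent was\x94"

# Decode to Unicode text, then encode as UTF-8 bytes, as in the question.
utf8_value = raw.decode("cp1252").encode("utf-8")
row = {"dash": utf8_value}

# repr() is what a container uses for its items; non-ASCII bytes are escaped.
value_repr = repr(row["dash"])
print(value_repr)

# The dict's repr embeds the repr of each value, hence the escapes.
dict_repr = repr(row)
print(dict_repr)

# Decoding the bytes gives back the actual characters.
decoded = row["dash"].decode("utf-8")
```

The data itself is unchanged in both cases; only the display differs. To see readable characters for a whole row, one option is to print the fields one at a time rather than printing the dictionary.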
@MauriceMeyer you're right. Can you add this as an answer so I can accept it?
– Stepharr
Nov 21 '18 at 22:59
Your "sandwich" sounds backward. You .decode() to Unicode when reading in data to a program for processing, then .encode() to bytes to send it to a file or pipe. Databases "usually" can accept Unicode and are configured with an encoding that happens automatically when the database API puts the data in the database so you can skip the .encode() step if that's the case.
– Mark Tolonen
Nov 22 '18 at 7:52
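The order described in this comment can be sketched as follows: decode bytes to Unicode once at the input boundary, work only with Unicode text inside the program, and encode once at the output boundary. The helper names (read_field, normalize_punctuation, write_field) and the punctuation map are assumptions for illustration and are not part of the original application. The middle step also shows one possible way to swap curly quotes and dashes for ASCII equivalents, which the question asks about.

```python
from __future__ import print_function

# Hypothetical mapping from typographic characters to ASCII stand-ins.
PUNCTUATION_MAP = {
    u"\u2013": u"-",   # en dash
    u"\u2014": u"-",   # em dash
    u"\u2018": u"'",   # left single quotation mark
    u"\u2019": u"'",   # right single quotation mark
    u"\u201c": u'"',   # left double quotation mark
    u"\u201d": u'"',   # right double quotation mark
}


def read_field(raw_bytes, encoding="cp1252"):
    """Input boundary: bytes in, Unicode text out."""
    return raw_bytes.decode(encoding)


def normalize_punctuation(text):
    """Inside the sandwich: operate on Unicode text only."""
    for fancy, plain in PUNCTUATION_MAP.items():
        text = text.replace(fancy, plain)
    return text


def write_field(text, encoding="utf-8"):
    """Output boundary: Unicode text in, bytes out."""
    return text.encode(encoding)


raw = b"as well \x96 \x93the intent was\x94"  # sample cp1252 input
text = read_field(raw)                # Unicode from here on
plain = normalize_punctuation(text)   # processing happens on Unicode
payload = write_field(plain)          # bytes again, only at the edge
print(repr(payload))
```

If the endpoint accepts Unicode directly, the final write_field call can be skipped, as the comment notes for databases.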
asked Nov 21 '18 at 18:09
Stepharr