Why does my decoded Windows-1252 string show up as a unicode value in a dictionary but not the value,...












In my application, following Ned Batchelder's recommendation to build a "Unicode sandwich", I first convert from Windows-1252 to UTF-8 (decode as cp1252, then re-encode as UTF-8):



row[field] = row[field].decode('cp1252').encode('utf-8')  # cp1252 bytes -> unicode -> UTF-8 bytes
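To make the byte-level effect of that line concrete, here is a minimal Python 3 sketch. The question itself uses Python 2.7, where `str` is a byte string; Python 3's `bytes` plays that role here. The sample input is hypothetical cp1252 data containing an en dash (0x96) and curly double quotes (0x93, 0x94):

```python
# Python 3 sketch: bytes here corresponds to a Python 2 str.
# Hypothetical raw field as read from a Windows-1252 source.
raw = b'as well \x96 ... \x93the intent was\x94'

# Decode: cp1252 bytes -> text (code points U+2013, U+201C, U+201D).
text = raw.decode('cp1252')

# Encode: text -> UTF-8 bytes; each of those characters becomes 3 bytes.
utf8_bytes = text.encode('utf-8')

print(ascii(text))  # 'as well \u2013 ... \u201cthe intent was\u201d'
print(utf8_bytes)   # b'as well \xe2\x80\x93 ... \xe2\x80\x9cthe intent was\xe2\x80\x9d'
```

The result of the line is still a byte string, so anything that displays its repr shows escape sequences rather than characters.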


Later, when I want to send my data to an endpoint, I decode it from UTF-8:



row[field] = fld.decode('utf-8')  # UTF-8 bytes -> unicode


When I print only the field that contains the offending Windows-1252 characters, they display as the intended characters:



print row['dash']
# as well – ... “the intent was”


But when I print the entire dictionary, I get escaped byte values instead:



print row
# as well \xe2\x80\x93 ... \xe2\x80\x9cthe intent was\xe2\x80\x9d
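The difference comes from how Python formats containers: printing a single value uses str(), but printing a dict builds its text from repr() of each key and value. Below is a minimal Python 3 sketch of that rule with a hypothetical one-field row. Python 3 `bytes` again stands in for Python 2's UTF-8 encoded `str`:

```python
# Hypothetical row: the same value once as UTF-8 bytes, once as decoded text.
utf8_row = {'dash': 'as well \u2013 ...'.encode('utf-8')}
text_row = {'dash': 'as well \u2013 ...'}

# str(dict) is assembled from repr() of every key and value.
utf8_dict_text = str(utf8_row)
text_dict_text = str(text_row)

# The bytes repr escapes each non-ASCII byte.
print(utf8_dict_text)  # {'dash': b'as well \xe2\x80\x93 ...'}

# A Python 3 str repr leaves printable characters such as U+2013 unescaped.
print(text_dict_text == "{'dash': '" + text_row['dash'] + "'}")  # True
```

In Python 2.7 a decoded value would still appear escaped inside the dict, as `u'as well \u2013 ...'`, because the Python 2 `unicode` repr escapes every non-ASCII character. Only printing the value on its own shows the characters.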


I want to see the Windows-1252 characters themselves, or ASCII equivalents such as a straight quotation mark in place of the left and right curly quotation marks.
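If plain ASCII stand-ins are acceptable, one option is to translate the common Windows-1252 punctuation after decoding and before encoding. This is a minimal Python 3 sketch. The mapping `ASCII_EQUIVALENTS` and the helper `to_ascii_punctuation` are hypothetical names, and the table covers only a few characters:

```python
# Hypothetical mapping from common Windows-1252 punctuation (as Unicode
# characters) to plain ASCII stand-ins; extend it as needed.
ASCII_EQUIVALENTS = str.maketrans({
    '\u2018': "'",    # left single quotation mark
    '\u2019': "'",    # right single quotation mark
    '\u201c': '"',    # left double quotation mark
    '\u201d': '"',    # right double quotation mark
    '\u2013': '-',    # en dash
    '\u2014': '--',   # em dash
    '\u2026': '...',  # horizontal ellipsis
})


def to_ascii_punctuation(text):
    """Replace curly quotes, dashes and ellipses in already-decoded text."""
    return text.translate(ASCII_EQUIVALENTS)


raw = b'as well \x96 ... \x93the intent was\x94'  # hypothetical cp1252 input
print(to_ascii_punctuation(raw.decode('cp1252')))  # as well - ... "the intent was"
```

In Python 2.7 the same idea works with `unicode.translate`, which accepts a dict keyed by code point (for example `{0x201c: u'"'}`) in place of `str.maketrans`.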











python python-2.7 utf-8 character-encoding cp1252







asked Nov 21 '18 at 18:09









Stepharr

83





  • When you print a dictionary, Python shows the internal representation (repr()) of each value, which here is the UTF-8 encoded byte string.

    – Maurice Meyer
    Nov 21 '18 at 18:13











  • @MauriceMeyer you're right. Can you add this as an answer so I can accept it?

    – Stepharr
    Nov 21 '18 at 22:59











  • Your "sandwich" sounds backward. You .decode() to Unicode when reading data into a program for processing, then .encode() to bytes to send it to a file or pipe. Databases "usually" can accept Unicode and are configured with an encoding that is applied automatically when the database API puts the data in the database, so you can skip the .encode() step if that's the case.

    – Mark Tolonen
    Nov 22 '18 at 7:52
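The order that comment describes (decode once on input, work with text, encode once on output) can be sketched in Python 3 as follows. The function names, the sample input and the in-memory `io.BytesIO` sink are hypothetical stand-ins for the real data source and endpoint:

```python
import io


def read_rows(byte_lines):
    # Top slice: decode cp1252 bytes to text once, where data enters.
    return [line.decode('cp1252') for line in byte_lines]


def process(rows):
    # Filling: operate on text only; no decode/encode calls in here.
    return [row.strip() for row in rows]


def write_rows(rows, stream):
    # Bottom slice: encode text to UTF-8 once, where data leaves.
    for row in rows:
        stream.write(row.encode('utf-8') + b'\n')


source = [b'as well \x96 ...\r', b'\x93the intent was\x94\r']  # hypothetical cp1252 input
sink = io.BytesIO()
write_rows(process(read_rows(source)), sink)
print(sink.getvalue())
```

If the endpoint or database driver accepts text directly, `write_rows` would pass the `str` values through and skip the `.encode()` call, as the comment suggests.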




















