Why does my decoded Windows-1252 string show up as a unicode value in a dictionary but not the value,...

In my application, following Ned Batchelder's recommendation to make a Unicode sandwich, I first try to convert from Windows-1252 to UTF-8:

row[field] = row[field].decode('cp1252').encode('utf-8')

Later on, when I want to send my data to an endpoint I decode UTF-8:

row[field] = fld.decode('utf-8')

When I print just the field that has the offending Windows-1252 characters, it interprets them as such:

print row['dash']

# as well – ... “the intent was”

But when I try to print the entire dictionary I get unicode values:

print row

# as well \xe2\x80\x93 ... \xe2\x80\x9cthe intent was\xe2\x80\x9d

I want the cp1252 characters themselves, or equivalents such as a straight quotation mark in place of the left or right quotation mark.

asked Nov 21 '18 at 18:09

Stepharr
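For the "equivalents" part of the question, one possible approach is to decode the cp1252 bytes to Unicode text and then translate typographic punctuation to ASCII look-alikes. The sketch below is only an illustration: `PUNCTUATION_MAP` and `to_plain_punctuation` are hypothetical names, the mapping covers only a handful of characters, and the sample bytes are assumed from the output shown in the question.

```python
# -*- coding: utf-8 -*-
# Hypothetical mapping (an assumption, not exhaustive): Unicode code point
# of a typographic character -> ASCII look-alike.
PUNCTUATION_MAP = {
    0x2018: u"'",    # left single quotation mark
    0x2019: u"'",    # right single quotation mark
    0x201C: u'"',    # left double quotation mark
    0x201D: u'"',    # right double quotation mark
    0x2013: u"-",    # en dash
    0x2014: u"-",    # em dash
    0x2026: u"...",  # horizontal ellipsis
}


def to_plain_punctuation(text):
    """Replace typographic punctuation in Unicode text with ASCII equivalents."""
    return text.translate(PUNCTUATION_MAP)


# Sample cp1252 bytes assumed from the question: 0x96 is an en dash,
# 0x93 and 0x94 are the left and right double quotation marks.
raw = b"as well \x96 \x93the intent was\x94"
text = raw.decode("cp1252")          # bytes -> Unicode text
plain = to_plain_punctuation(text)   # typographic -> straight punctuation
```

The translation runs on decoded text rather than on bytes, so it does not depend on which encoding the data arrived in.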

When you print a dictionary, Python shows the repr() of each value, which here is the escaped form of the UTF-8 bytes.

– Maurice Meyer
Nov 21 '18 at 18:13
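The difference described in this comment can be reproduced with a small sketch. The sample text is assumed from the question, and the variable names are illustrative only. Printing a container uses `repr()` on each value, so non-ASCII bytes appear as `\x..` escapes, while printing a single value writes the value itself. The code runs under Python 3; under Python 2.7 the byte-string repr has no `b` prefix, but the escapes are the same.

```python
# -*- coding: utf-8 -*-
# Sample text assumed from the question, stored as UTF-8 encoded bytes.
text = u"as well \u2013 \u201cthe intent was\u201d"
encoded = text.encode("utf-8")
row = {"dash": encoded}

# repr() of the byte string escapes every non-ASCII byte as \x..
field_repr = repr(row["dash"])

# repr() of the dict embeds the repr() of each value, which is why
# printing the whole dictionary shows the escapes.
row_repr = repr(row)

print(field_repr)
print(row_repr)

# One way to build a readable line is to decode each value and format it
# yourself instead of relying on the dict's repr().
readable = u", ".join(
    u"{0}={1}".format(key, value.decode("utf-8"))
    for key, value in sorted(row.items())
)
```

Nothing is wrong with the data in either case; only the display differs.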

@MauriceMeyer you're right. Can you add this as an answer so I can accept it?

– Stepharr
Nov 21 '18 at 22:59

Your "sandwich" sounds backward. You .decode() to Unicode when reading in data to a program for processing, then .encode() to bytes to send it to a file or pipe. Databases "usually" can accept Unicode and are configured with an encoding that happens automatically when the database API puts the data in the database so you can skip the .encode() step if that's the case.

– Mark Tolonen
Nov 22 '18 at 7:52
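The ordering described in this comment could be sketched roughly as follows: decode once at the input boundary, work only with Unicode text inside the program, and encode once at the output boundary. `read_fields` and `serialize` are hypothetical helper names, and the cp1252 source and UTF-8 target encodings are assumptions taken from the question.

```python
# -*- coding: utf-8 -*-
def read_fields(raw_fields, source_encoding="cp1252"):
    """Input boundary: decode every byte-string value to Unicode text."""
    return {name: value.decode(source_encoding)
            for name, value in raw_fields.items()}


def serialize(row, target_encoding="utf-8"):
    """Output boundary: encode every text value to bytes for the endpoint."""
    return {name: value.encode(target_encoding)
            for name, value in row.items()}


# Bytes as they might arrive from a cp1252 source (assumed sample data).
raw_row = {"dash": b"  as well \x96 \x93the intent was\x94  "}

row = read_fields(raw_row)          # bottom slice of bread: bytes -> text
row["dash"] = row["dash"].strip()   # all processing happens on text
payload = serialize(row)            # top slice of bread: text -> bytes
```

If the endpoint or database API accepts Unicode directly, the `serialize` step may be unnecessary, as the comment notes.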


python python-2.7 utf-8 character-encoding cp1252
