UnicodeDecodeError: position of the error
I often run into a UnicodeDecodeError when I'm writing numerical programs. For example, it says:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 57: invalid continuation byte
My question is: how can I locate which part of my code causes the error? I have no idea where this 'position 57' is. And by the way, what does 'invalid continuation byte' mean?
Thanks for the answers.
PS: this error comes from the following code, in which I try to apply the 4th-order Runge-Kutta method to the Lorenz equations:
# Auteur : Bastien Massion
# NOMA : 13701700
# Date création fichier : 23 novembre 2018 18h24
# Date dernière modification : 23 novembre 2018 19h05
# Je jure que ceci est le fruit de mon travail personnel
from numpy import *

def lorfunction(t, u):  # u = [u_0, u_1, u_2] = [x, y, z]
    fx = 10*u[1] - 10*u[0]
    fy = 28*u[0] - u[0]*u[2] - u[1]
    fz = u[0]*u[1] - 8/3*u[2]
    return [fx, fy, fz]

def lorenz(Tstart, Tend, Ustart, n):
    T, h = linspace(Tstart, Tend, n+1, retstep=True)
    U = zeros((n+1, 3))
    U[0,:] = Ustart
    for i in range(0, n):
        Ka = lorfunction(T[i], U[i])
        Kb = lorfunction(T[i] + h/2, U[i] + h/2*Ka)
        Kc = lorfunction(T[i] + h/2, U[i] + h/2*Kb)
        Kd = lorfunction(T[i] + h, U[i] + h*Kc)
        U[i+1] = U[i] + h/6*(Ka + 2*Kb + 2*Kc + Kd)
    return T, U

print(lorenz(0.0, 100.0, [0,1,0,], 10000))
python python-unicode
edited Nov 24 '18 at 14:49 by usr2564301
asked Nov 24 '18 at 13:58 by Bastien Massion
Interestingly, while you can catch a UnicodeError and parse its str result for the position, that is not of much use. I tried rewind and then a loop using readline to get to the erroneous one, but readline is heavily buffered and no amount of tinkering made it read one line at a time on my system. Anyone else?
– usr2564301
Nov 24 '18 at 15:10
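A side note on the comment above: the exception object already carries the position, so parsing its str result should not be necessary. Here is a minimal sketch, assuming the sample line was saved as ISO-8859-1. The name describe_decode_error is a hypothetical helper; start, object and reason are standard attributes of UnicodeDecodeError.

```python
# Sample bytes: the asker's comment line, encoded as ISO-8859-1.
# (That the file was saved this way is an assumption.)
data = "# Date création fichier".encode("iso-8859-1")


def describe_decode_error(raw: bytes, encoding: str = "utf-8"):
    """Return (start, bad_byte, reason) for the first decode failure, or None."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.object holds the bytes being decoded,
        # exc.start is the 0-based index of the offending byte.
        return exc.start, exc.object[exc.start], exc.reason
    return None


info = describe_decode_error(data)
print(info)
```

Reading the file in binary mode and decoding the bytes yourself sidesteps the readline buffering issue, because the index then refers to the whole byte string rather than to a buffered chunk.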
(Saving your text sample as Latin-1 and then opening and reading this with the default UTF-8 encoding is enough to test with.)
– usr2564301
Nov 24 '18 at 15:10
(I must be in Random Mode today.) The proper way to avoid a UnicodeError is, of course, to make sure you open a text file with its proper encoding='...'. But your question is still valid when this proper encoding by all rights should have been UTF-8 and the file thus actually contains an invalid byte.
– usr2564301
Nov 24 '18 at 15:14
1 Answer
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 57: invalid continuation byte
The 57 is the byte offset (counted from 0) in your Python source file at which decoding failed. In your case it seems to be the position of the é in # Date création fichier.
0xe9 is the byte at that position. I recognize it as the ISO-8859-1 (a.k.a. ISO-Latin-1) representation of the character é.
So it seems your Python source file is actually encoded in ISO-8859-1, but the Python interpreter for some reason assumes it to be encoded in UTF-8.
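To turn such a byte offset into a line and column, one possible sketch is shown here. locate_offset is a hypothetical helper, and the header bytes are rebuilt from the question under the assumption of ISO-8859-1 and "\n" line endings. The exact offset in the real file depends on its actual whitespace and line endings, so it need not match 57.

```python
def locate_offset(raw: bytes, offset: int):
    """Map a 0-based byte offset to (line number, column, raw line bytes)."""
    line_no = raw.count(b"\n", 0, offset) + 1
    # rfind returns -1 when the offset is on the first line, so +1 gives 0.
    line_start = raw.rfind(b"\n", 0, offset) + 1
    line_end = raw.find(b"\n", offset)
    if line_end == -1:
        line_end = len(raw)
    return line_no, offset - line_start, raw[line_start:line_end]


# First lines of the question's file, assumed to be saved as ISO-8859-1.
header = (
    "# Auteur : Bastien Massion\n"
    "# NOMA : 13701700\n"
    "# Date création fichier : 23 novembre 2018 18h24\n"
).encode("iso-8859-1")

offset = header.index(b"\xe9")  # first 0xe9 byte in the sample
line_no, col, line = locate_offset(header, offset)
print(line_no, col, line.decode("iso-8859-1"))
```

For a real file, raw would come from open(path, "rb").read() and offset from the error message (or from the exception's start attribute).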
By the way: in UTF-8, characters with code points of 128 and above are encoded as 2 or more bytes, the first being called the start byte and the others continuation bytes. For more explanation see the UTF-8 examples.
To understand the error I need to elaborate a bit more. Consider the bytes 0xe9 0x61 0x74, as they occur in your Python file:
- Decoding the 3 bytes as ISO-8859-1 (a single-byte encoding) results in 3 characters: éat.
- Decoding the same bytes as UTF-8 is more complicated. The byte 0xe9 begins with the bits 1110, so it is a start byte that must be followed by 2 continuation bytes. Each continuation byte needs to begin with the bits 10, but the next 2 bytes (0x61 0x74) violate this condition. Thus, a UnicodeDecodeError saying invalid continuation byte is raised.
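The bit patterns can be checked with a few lines of Python. This is only an illustrative sketch: utf8_role is a made-up helper that looks at the leading bits alone and does not perform full UTF-8 validation (for instance, it does not reject start bytes that can never occur in valid UTF-8).

```python
def utf8_role(byte: int) -> str:
    """Classify a byte by its leading bits, as a UTF-8 decoder would see it."""
    if byte < 0x80:
        return "ascii"                      # 0xxxxxxx
    if byte < 0xC0:
        return "continuation"               # 10xxxxxx
    if byte < 0xE0:
        return "start of 2-byte sequence"   # 110xxxxx
    if byte < 0xF0:
        return "start of 3-byte sequence"   # 1110xxxx
    if byte < 0xF8:
        return "start of 4-byte sequence"   # 11110xxx
    return "invalid"


# The three bytes from the example above.
for b in b"\xe9at":
    print(f"{b:#04x} {b:08b} {utf8_role(b)}")

# The same character é as it looks when it really is encoded in UTF-8.
print([utf8_role(b) for b in "é".encode("utf-8")])

# Decoding as ISO-8859-1 always succeeds: one byte, one character.
print(b"\xe9at".decode("iso-8859-1"))
```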
To avoid this kind of problem you have several options:
- Keep your Python source encoded in ISO-8859-1 and add the line # -*- coding:iso-8859-1 -*- on the first or second line of the file, as described in PEP 263 -- Defining Python Source Code Encodings.
- Save your Python source in UTF-8 and rely on UTF-8 being the default source encoding of your Python interpreter (as it is in Python 3).
- Save your Python source in UTF-8 and add the line # -*- coding:utf-8 -*- on the first or second line of the file.
I would prefer the first or third option.
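If your editor has no "save as UTF-8" option, the conversion can also be done in Python. The following is a rough sketch under the assumption that the file really is ISO-8859-1. reencode_to_utf8 and the file name lorenz_header.py are invented for the example, which works on a temporary copy rather than on a real source file.

```python
import tempfile
from pathlib import Path


def reencode_to_utf8(path, source_encoding="iso-8859-1"):
    """Rewrite the file at `path` in UTF-8, assuming it is in `source_encoding`."""
    p = Path(path)
    # Work on bytes so that line endings are left untouched.
    text = p.read_bytes().decode(source_encoding)
    p.write_bytes(text.encode("utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "lorenz_header.py"  # hypothetical file name
    script.write_bytes("# Date création fichier\n".encode("iso-8859-1"))
    reencode_to_utf8(script)
    converted = script.read_bytes()

print(converted)
```

Be careful to run such a conversion only once per file: decoding as ISO-8859-1 never fails, so applying it to a file that is already UTF-8 would silently produce garbled characters.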
edited Nov 24 '18 at 21:50
answered Nov 24 '18 at 14:46 by Thomas Fritsch