UnicodeDecodeError: position of the error

I often run into a UnicodeDecodeError when writing numerical programs. For example:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 57: invalid continuation byte

My question is: how can I locate which word in my code causes the error? I have no idea where this 'position 57' is. And by the way, what does 'invalid continuation byte' mean?

Thanks for the answers.

PS: this error comes from the following code, where I try to apply the 4th-order Runge-Kutta method to the Lorenz equations:

# Auteur : Bastien Massion
# NOMA : 13701700
# Date création fichier : 23 novembre 2018 18h24
# Date dernière modification : 23 novembre 2018 19h05

# Je jure que ceci est le fruit de mon travail personnel

from numpy import *

def lorfunction(t, u): # u = [u_0, u_1, u_2] = [x, y, z]
    fx = 10*u[1] - 10*u[0]
    fy = 28*u[0] - u[0]*u[2] - u[1]
    fz = u[0]*u[1] - 8/3*u[2]
    # Return an array (not a list) so the Runge-Kutta arithmetic below is element-wise
    return array([fx, fy, fz])

def lorenz(Tstart, Tend, Ustart, n):
    T, h = linspace(Tstart, Tend, n+1, retstep=True)
    U = zeros((n+1, 3))
    U[0,:] = Ustart

    for i in range(0, n):
        Ka = lorfunction(T[i], U[i])
        Kb = lorfunction(T[i] + h/2, U[i] + h/2*Ka)
        Kc = lorfunction(T[i] + h/2, U[i] + h/2*Kb)
        Kd = lorfunction(T[i] + h, U[i] + h*Kc)
        U[i+1] = U[i] + h/6*(Ka + 2*Kb + 2*Kc + Kd)

    return T, U

print(lorenz(0.0, 100.0, [0,1,0], 10000))

asked Nov 24 '18 at 13:58 by Bastien Massion; edited Nov 24 '18 at 14:49 by usr2564301

Interestingly, while you can catch a UnicodeError and parse its str result for the position, that is not of much use. I tried rewinding and then looping with readline to get to the offending line, but readline is heavily buffered and no amount of tinkering made it read one line at a time on my system. Anyone else?

– usr2564301, Nov 24 '18 at 15:10

(Saving your text sample as Latin-1 and then opening and reading it with the default UTF-8 encoding is enough to test with.)

– usr2564301, Nov 24 '18 at 15:10

(I must be in Random Mode today.) The proper way to avoid a UnicodeError is, of course, to make sure you open a text file with its proper encoding='...'. But your question is still valid when the file by all rights should have been UTF-8 and yet contains an invalid byte.

– usr2564301, Nov 24 '18 at 15:14
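The exception object carries the offset itself, so the message string does not have to be parsed. Below is a minimal sketch, assuming the whole file is read as bytes first so that the offset is relative to the start of the file. The function name `locate_decode_error` and the sample text are made up for illustration.

```python
def locate_decode_error(data, encoding="utf-8"):
    """Return (line, column, offending_byte) for the first byte that fails
    to decode, or None if the data decodes cleanly.

    line is 1-based; column is the 0-based byte offset within that line.
    """
    try:
        data.decode(encoding)
    except UnicodeDecodeError as err:
        # err.start is the offset of the first undecodable byte in the data
        line = data.count(b"\n", 0, err.start) + 1
        line_start = data.rfind(b"\n", 0, err.start) + 1
        return line, err.start - line_start, data[err.start]
    return None


# Sample text saved as Latin-1, as suggested in the comments (illustrative data)
sample = "# NOMA : 13701700\n# Date création fichier\n".encode("latin-1")
print(locate_decode_error(sample))
```

For a real file, the bytes would come from something like `open(path, "rb").read()`; reading in binary mode sidesteps the buffering problem with readline, because no decoding happens until `decode` is called explicitly.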

python python-unicode


1 Answer
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 57: invalid continuation byte

The 57 is the byte offset (counted from 0) at which decoding of your Python source file failed.
In your case it seems to be the position of the é in # Date création fichier.

0xe9 is the byte at that position. I recognize it as the ISO-8859-1
(a.k.a. Latin-1) representation of the character é.

So it seems your Python source file is actually encoded in ISO-8859-1,
but the Python interpreter assumes it to be encoded in UTF-8 (the default source encoding in Python 3).
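To see what sits at a reported position, the file can be opened in binary mode, which bypasses decoding entirely. Here is a small sketch; the helper name `show_context`, the file name in the comment, and the sample bytes are illustrative and not taken from the actual file.

```python
def show_context(data, position, width=10):
    """Return the bytes surrounding `position`, decoded as Latin-1.

    Latin-1 maps every byte to a character, so this never fails. It is used
    here only for display and is an assumption about the real encoding.
    """
    start = max(0, position - width)
    return data[start:position + width + 1].decode("latin-1")


# For a real file: data = open("script.py", "rb").read()  (hypothetical file name)
data = "# Date création fichier : 23 novembre 2018".encode("latin-1")
position = data.index(b"\xe9")  # stands in for the position from the error message
print(hex(data[position]), repr(show_context(data, position)))
```

Plugging in the number from the error message as `position` shows the surrounding text, which is usually enough to recognize the offending word.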

By the way:

In UTF-8, characters with code points of 128 and above are encoded with 2 to 4 bytes,
the first being called the start byte, the others continuation bytes.
For more explanation see the UTF-8 examples.

To understand the error I need to elaborate a bit.
Consider the bytes 0xe9 0x61 0x74, as they occur in your Python file:

Decoding the 3 bytes as ISO-8859-1 (a single-byte encoding) results in 3 characters: éat.

Decoding the same bytes as UTF-8 is more complicated.
The byte 0xe9 begins with the bits 1110, so it is a start byte
that must be followed by 2 continuation bytes.
Each continuation byte needs to begin with the bits 10,
but the next 2 bytes (0x61 0x74) violate this condition.
Thus, a UnicodeDecodeError saying invalid continuation byte is raised.
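The reasoning above can be checked directly in the interpreter. This is only a sketch of the bit-level rule; the helper `is_continuation` is a made-up name, not a standard library function.

```python
raw = bytes([0xE9, 0x61, 0x74])

# Single-byte encoding: every byte becomes exactly one character
print(raw.decode("iso-8859-1"))


def is_continuation(byte):
    """True if the byte has the bit pattern 10xxxxxx."""
    return (byte & 0b11000000) == 0b10000000


# Bit pattern of the first byte: starts with 1110, so 2 continuation bytes must follow
print(f"{raw[0]:08b}")

# Neither 0x61 nor 0x74 starts with the bits 10
print([is_continuation(b) for b in raw[1:]])

try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    print(err.start, err.reason)

# The same character correctly encoded in UTF-8 takes two bytes
print("é".encode("utf-8"))
```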

To avoid this kind of problem you have several options:

1. Keep your Python source encoded in ISO-8859-1 and add the line
```
# -*- coding: iso-8859-1 -*-
```
as the first or second line of the file,
as described in PEP 263 -- Defining Python Source Code Encodings.

2. Save your Python source in UTF-8
and rely on UTF-8 being the default source encoding of your Python interpreter.

3. Save your Python source in UTF-8 and add the line
```
# -*- coding: utf-8 -*-
```
as the first or second line of the file.

I would prefer the first or third option.
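If the editor offers no "save as UTF-8" option, the re-encoding can also be scripted. The following is a rough sketch, assuming the file really is ISO-8859-1; the function name `convert_to_utf8` and the file name `lorenz.py` are hypothetical.

```python
import tempfile
from pathlib import Path


def convert_to_utf8(path, source_encoding="iso-8859-1"):
    """Rewrite the file at `path` as UTF-8, assuming it is currently
    encoded in `source_encoding`. Working on bytes leaves line endings alone."""
    path = Path(path)
    text = path.read_bytes().decode(source_encoding)
    path.write_bytes(text.encode("utf-8"))


# Demonstration on a throwaway file
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "lorenz.py"
    script.write_bytes("# Date création fichier\n".encode("iso-8859-1"))
    convert_to_utf8(script)
    print(script.read_text(encoding="utf-8"))
```

Running such a conversion twice on the same file would garble the accented characters, because the second pass would treat UTF-8 bytes as ISO-8859-1, so it is worth keeping a backup copy.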

answered Nov 24 '18 at 14:46 by Thomas Fritsch; edited Nov 24 '18 at 21:50
