UnicodeDecodeError: position of the error












I often run into a UnicodeDecodeError when I'm writing numerical programs. For example, it says:

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 57: invalid continuation byte

My question is: how can I locate which word in my code causes the error? I have no idea where this 'position 57' is. And by the way, what does 'invalid continuation byte' mean?

Thanks for the answers.

PS: this error comes from the following code, in which I try to apply the 4th-order Runge-Kutta method to the Lorenz equations:



# Auteur : Bastien Massion 
# NOMA : 13701700
# Date création fichier : 23 novembre 2018 18h24
# Date dernière modification : 23 novembre 2018 19h05

# Je jure que ceci est le fruit de mon travail personnel

from numpy import *

def lorfunction(t, u): # u = [u_0, u_1, u_2] = [x, y, z]
    fx = 10*u[1] - 10*u[0]
    fy = 28*u[0] - u[0]*u[2] - u[1]
    fz = u[0]*u[1] - 8/3*u[2]
    return array([fx, fy, fz]) # an array, so the RK4 sums below are element-wise

def lorenz(Tstart, Tend, Ustart, n):
    T, h = linspace(Tstart, Tend, n+1, retstep=True)
    U = zeros((n+1, 3))
    U[0,:] = Ustart

    for i in range(0, n):
        Ka = lorfunction(T[i], U[i])
        Kb = lorfunction(T[i] + h/2, U[i] + h/2*Ka)
        Kc = lorfunction(T[i] + h/2, U[i] + h/2*Kb)
        Kd = lorfunction(T[i] + h, U[i] + h*Kc)
        U[i+1] = U[i] + h/6*(Ka + 2*Kb + 2*Kc + Kd)

    return T, U

print(lorenz(0.0, 100.0, [0,1,0], 10000))









  • .. Interestingly, while you can catch a UnicodeError and parse its str result for the position, that is not of much use. I tried rewind and then a loop using readline to get to the erroneous one, but readline is heavily buffered and no amount of tinkering made it read one line at a time on my system. Anyone else?

    – usr2564301
    Nov 24 '18 at 15:10











  • (Saving your text sample as Latin-1 and then opening and reading this with the default UTF-8 encoding is enough to test with.)

    – usr2564301
    Nov 24 '18 at 15:10











  • (I must be in Random Mode today.) The proper way to avoid a UnicodeError is, of course, to make sure you open a text file with its proper encoding='...'. But your question is still valid when this proper encoding by all rights should have been UTF8 and thus actually contains an invalid byte.

    – usr2564301
    Nov 24 '18 at 15:14
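
On the readline difficulty raised in the comments above: one possible workaround is to open the file in binary mode and decode each raw line separately, so that text-mode buffering does not come into play. The following is only a sketch under that assumption. The helper name undecodable_lines and the in-memory sample (a stand-in for a file opened with mode "rb") are hypothetical.

```python
import io


def undecodable_lines(stream, encoding="utf-8"):
    """Yield (line_number, column, raw_line) for every line that fails to decode.

    `stream` must be a binary file object; iterating it yields bytes lines.
    """
    for line_number, raw_line in enumerate(stream, start=1):
        try:
            raw_line.decode(encoding)
        except UnicodeDecodeError as exc:
            # exc.start is the 0-based offset of the bad byte within this line
            yield line_number, exc.start, raw_line


# Hypothetical sample: Latin-1 bytes standing in for open("script.py", "rb")
sample = io.BytesIO(
    "# ok\n# Date cr\xe9ation\n# Date derni\xe8re modification\n".encode("latin-1")
)
bad = list(undecodable_lines(sample))
for line_number, column, raw_line in bad:
    print(line_number, column, raw_line)
```

Splitting on the newline byte before decoding is safe for UTF-8, because the byte 0x0a never occurs inside a multi-byte sequence.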
















python python-unicode






edited Nov 24 '18 at 14:49 by usr2564301
asked Nov 24 '18 at 13:58 by Bastien Massion













1 Answer
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 57: invalid continuation byte




The 57 is the byte offset (counted from 0) in your Python source file at which decoding failed.
In your case it seems to be the position of the é in # Date création fichier.
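
As a rough way to turn such a byte offset into a line and column, the sketch below decodes the raw bytes and reads the offset from the exception's start attribute. The function name locate_decode_error is hypothetical. The header bytes are a reconstruction of the first three lines of the script, assuming Latin-1 encoding, Windows (CRLF) line endings and a trailing space on the first line. Under those assumptions the é lands at offset 57.

```python
def locate_decode_error(data, encoding="utf-8"):
    """Return (offset, line_number, column, raw_line) for the first
    undecodable byte in `data`, or None if everything decodes."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        offset = exc.start  # 0-based byte offset, the "position" in the message
        line_start = data.rfind(b"\n", 0, offset) + 1
        line_end = data.find(b"\n", offset)
        if line_end == -1:
            line_end = len(data)
        line_number = data.count(b"\n", 0, offset) + 1
        column = offset - line_start
        raw_line = data[line_start:line_end].rstrip(b"\r")
        return offset, line_number, column, raw_line
    return None


# Assumed reconstruction of the file header (Latin-1, CRLF line endings).
header = (
    "# Auteur : Bastien Massion \r\n"
    "# NOMA : 13701700\r\n"
    "# Date cr\xe9ation fichier : 23 novembre 2018 18h24\r\n"
).encode("latin-1")

result = locate_decode_error(header)
print(result)
```

For a real file, the bytes would come from reading it in binary mode, for example with open(path, "rb").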



0xe9 is the byte at that position. I recognize it as the ISO-8859-1
(a.k.a. ISO-Latin-1) representation of the character é.



So it seems your Python source file is actually encoded in ISO-8859-1,
but Python assumes it to be encoded in UTF-8, which is the default
source encoding in Python 3 when the file declares no encoding.



By the way:

In UTF-8, code points from 128 upward are encoded with 2 or more bytes (up to 4):
the first is called the start byte, the others are called continuation bytes.
For more explanation see the UTF-8 examples.



To understand the error I need to elaborate more.
Consider the bytes 0xe9 0x61 0x74, as they occur in your Python file:

Decoding the 3 bytes as ISO-8859-1 (a single-byte encoding) results in 3 characters: éat.

Decoding the same bytes as UTF-8 is more complicated.
The byte 0xe9 begins with the bits 1110, which marks it as a start byte
that must be followed by 2 continuation bytes.
Each continuation byte needs to begin with the bits 10,
but the next 2 bytes (0x61 0x74) violate this condition.
Thus, a UnicodeDecodeError saying invalid continuation byte is raised.
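
The difference between the two decodings can be tried out in a few lines. This is just an illustrative sketch; the variable names are arbitrary.

```python
raw = bytes([0xE9, 0x61, 0x74])  # the three bytes discussed above

# ISO-8859-1 maps every single byte to exactly one character.
latin1_text = raw.decode("iso-8859-1")
print(latin1_text)

# Show the leading bits of each byte: 1110... announces a 3-byte sequence,
# while the two following bytes start with 0 instead of the required 10.
for b in raw:
    print(f"0x{b:02x} = {b:08b}")

# Decoding as UTF-8 fails on the very first byte.
caught = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    caught = exc
    print(exc.encoding, exc.start, exc.reason)

# For comparison: a valid UTF-8 encoding of é takes two bytes,
# a start byte (110.....) followed by one continuation byte (10......).
utf8_bytes = "\xe9".encode("utf-8")
print(utf8_bytes, [f"{b:08b}" for b in utf8_bytes])
```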



To avoid this kind of problem you have several options:

  • Keep your Python source encoded in ISO-8859-1
    and add the line

     # -*- coding:iso-8859-1 -*-

    as the first or second line of the file,
    as described in PEP 263 -- Defining Python Source Code Encodings.

  • Save your Python source in UTF-8
    and rely on UTF-8 being the default source encoding of your Python interpreter
    (this is the case for Python 3).

  • Save your Python source in UTF-8
    and add the line

     # -*- coding:utf-8 -*-

I would prefer the first or third option.
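
If the editor offers no "save as UTF-8" option, the conversion can also be done with a short script. The following is a minimal sketch, assuming the file really is ISO-8859-1. The function name reencode_to_utf8 and the file name lorenz.py are hypothetical, and the temporary directory is only there to make the example self-contained.

```python
import tempfile
from pathlib import Path


def reencode_to_utf8(path, source_encoding="iso-8859-1"):
    """Rewrite the file at `path` in UTF-8, assuming it is currently
    encoded in `source_encoding`. Line endings are left untouched."""
    p = Path(path)
    text = p.read_bytes().decode(source_encoding)
    p.write_bytes(text.encode("utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "lorenz.py"  # hypothetical file name
    script.write_bytes("# Date cr\xe9ation fichier\r\n".encode("iso-8859-1"))
    reencode_to_utf8(script)
    converted = script.read_bytes()

print(converted)
```

Note that decoding as ISO-8859-1 never fails, so a wrong guess about the original encoding would silently produce garbled characters. It is worth keeping a backup before overwriting a real file.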






edited Nov 24 '18 at 21:50
answered Nov 24 '18 at 14:46 by Thomas Fritsch































