ruamel condensing comments and injecting 0x07












Given the following code:



import sys
from ruamel.yaml import YAML

yaml = YAML()
# filename is the path to the YAML file shown below
with open(filename) as f:
    z = yaml.load(f)
yaml.dump(z, sys.stdout)


And the following file:



a: >
  Hello.<b>
  World.


When <b> is a space character (0x20), this produces the following YAML:



a: >
  Hello. <0x07> World.


Here <0x07> stands for the byte 0x07.
Trying to re-load this YAML using PyYAML results in an error, as 0x07 is an invalid character.



This does not happen when I remove the trailing space after "Hello." in the input YAML.



Any idea what can cause this?










  • You likely have a buggy ruamel version installed. I had the same issue, and it was fixed in version 0.15.66 (pypi.org/project/ruamel.yaml/0.15.66). See the comment in the changelog.

    – tinita
    Nov 25 '18 at 13:46













  • @tinita I'm using 0.15.79, which, AFAIK, is the latest. Also, I'm able to reproduce this with 0.15.66 as well.

    – Isaac
    Nov 25 '18 at 13:56
















yaml ruamel.yaml






edited Nov 25 '18 at 13:41

asked Nov 25 '18 at 13:34

Isaac













1 Answer
The BEL character (0x07, \a) is inserted during parsing of block-style folded scalars, so that the Python representation of that scalar (ruamel.yaml.scalarstring.FoldedScalarString) can register the positions where the original folds occurred. At dump time the reverse is done: the positions are translated back to BEL characters (if they correspond to spaces), which transmits the folding positions from the representer to the emitter. The emitter then outputs the scalar with the "folds" at the points where they originally occurred. This of course can, and should, only happen if the positions still represent "foldable" positions.
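The round trip described above can be illustrated with a small stand-alone sketch. This is not ruamel.yaml's actual code: the helper names split_fold_markers and insert_fold_markers are hypothetical, and the sketch assumes that a fold position is the index of the space that replaced the original line break.

```python
BEL = "\a"  # 0x07, used only as an internal fold marker


def split_fold_markers(raw):
    """Strip BEL markers and return (clean_text, fold_positions).

    Each position is the index, in the clean text, of the character
    that followed a BEL marker (assumed here to be the fold's space).
    """
    positions = []
    out = []
    for ch in raw:
        if ch == BEL:
            positions.append(len(out))
        else:
            out.append(ch)
    return "".join(out), positions


def insert_fold_markers(text, positions):
    """Re-insert BEL markers at positions that still hold a space."""
    # Work from the end so earlier indexes stay valid after each insert.
    for pos in sorted(positions, reverse=True):
        if pos < len(text) and text[pos] == " ":
            text = text[:pos] + BEL + text[pos:]
    return text


raw = "Hello." + BEL + " World.\n"
clean, folds = split_fold_markers(raw)
restored = insert_fold_markers(clean, folds)
```

Note that insert_fold_markers only checks that the position holds a space, which is the naive check that lets a marker slip through when the neighbouring character is also whitespace.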



The problem here is that the parser should, during loading, complain that your YAML is incorrect. It fails to do so, loads faulty data and then fails to properly dump the mess it allowed to be loaded in the first place, resulting in the BEL character ending up in the output.



The YAML specification states:




Folding allows long lines to be broken anywhere a single space character separates two non-space characters.




And as your line has not been folded between two non-space characters, this should result in a warning, if not in an immediate parser error.¹



Additionally, the representer should of course be smart enough not to replace a space with a BEL character if that space is adjacent to whitespace. That situation can also occur after changing a string that was loaded from correct YAML with a folded scalar. I essentially consider that a bug in the representer.
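A guard of the kind described above could look like the following sketch. It is an illustration under stated assumptions, not the fix that went into ruamel.yaml: is_foldable and mark_safe_folds are hypothetical names, and the rule applied is the one quoted from the specification (a single space separating two non-space characters).

```python
BEL = "\a"  # 0x07, internal fold marker


def is_foldable(text, pos):
    """True if text[pos] is a single space between two non-whitespace chars."""
    return (
        0 < pos < len(text) - 1
        and text[pos] == " "
        and not text[pos - 1].isspace()
        and not text[pos + 1].isspace()
    )


def mark_safe_folds(text, positions):
    """Insert BEL markers only at positions that are still foldable."""
    # Insert from the end so the remaining positions stay valid.
    for pos in sorted(positions, reverse=True):
        if is_foldable(text, pos):
            text = text[:pos] + BEL + text[pos:]
    return text


ok = mark_safe_folds("Hello. World.\n", [6])
skipped = mark_safe_folds("Hello.  World.\n", [7])
```

With the trailing space from the question the loaded string contains two consecutive spaces, so no position passes the check and no BEL character reaches the output.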



ruamel.yaml>0.15.80 has a fix for the incorrect representation. An implementation of the error/warning on loading is likely to follow soon.





¹ When only issuing a warning, my initial reaction is that I should strip the faulty trailing space (or spaces, if there are more), because it is invisible, and keep the fold.
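Until the loader handles this itself, a possible workaround on the user side is to strip trailing spaces before loading. The sketch below is only a minimal illustration: strip_trailing_spaces is a hypothetical helper, and it assumes that trailing whitespace is not significant anywhere in the document (which is not true in general, for example inside literal block scalars).

```python
import re


def strip_trailing_spaces(yaml_text):
    """Remove spaces and tabs at the end of every line, keeping line breaks."""
    # With re.MULTILINE, "$" matches just before each newline.
    return re.sub(r"[ \t]+$", "", yaml_text, flags=re.MULTILINE)


source = "a: >\n  Hello. \n  World.\n"
cleaned = strip_trailing_spaces(source)
```

The cleaned text can then be handed to the loader in place of the open file.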






edited Nov 26 '18 at 9:14

answered Nov 25 '18 at 19:45

Anthon































