Replace junk character apostrophe using regex























All the apostrophes in my HTML are being converted to junk characters by the UI engine. I need a regex for the pattern below so that I can replace the junk in Java.

The specific pattern is needed because some characters from the HTML are displayed as junk. The whole string can look like: company㝵20ac?s



[2 characters]+"20ac"+[1 character]


I need to replace this whole string with a single quote. Something like:



string.replaceAll(<regex>, "'");


It should not be like this, but once the junk characters are saved in the database they can no longer be parsed by Java or HTML.


































  • The Unicode code point written as \u20ac (U+20AC) is the EURO SIGN character. The best you can do is replace it with the sequence EUR, so you don't have any more problems with euros. A little more context, or a better description of your problem (better than calling the thing junk), could lead to a better solution. If the HTML generator used the Java escape sequence instead of the character itself, the browsers could probably display it properly. What are the two characters that precede the 20ac sequence?
    – Luis Colorado
    Nov 22 at 5:53
























java regex apostrophe














edited Nov 19 at 7:19 by Oram

asked Nov 19 at 3:23 by Riju Mahna


























1 Answer













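If you want any 2 characters followed by 20ac and then another character, you can do something like this:

string = string.replaceAll("..20ac.", "'");

The . means any character (by default, anything except a line terminator).
Parentheses capture part of the match so it can be reused in the replacement with $1. For example, the pattern "..(20ac)." with the replacement "'$1'" keeps the 20ac and wraps it in quotes. Since you want the whole match replaced by a single quote, no capture group is needed here.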
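To see the difference between replacing the whole match and reusing a captured group, here is a minimal sketch. The class and method names are made up for illustration, and the sample input uses ## as a stand-in for the two junk characters, which is an assumption rather than the real data.

```java
import java.util.regex.Pattern;

class JunkApostropheFix {
    // Two arbitrary characters, the literal text 20ac, then one more character.
    private static final Pattern JUNK = Pattern.compile("..20ac.");

    // Replaces every match with a single quote.
    // Strings are immutable, so the caller must use the returned value.
    static String fix(String input) {
        return JUNK.matcher(input).replaceAll("'");
    }

    // Variant with a capture group: $1 puts the captured 20ac back,
    // so the junk is wrapped in quotes rather than removed.
    static String wrapInQuotes(String input) {
        return input.replaceAll("..(20ac).", "'$1'");
    }

    public static void main(String[] args) {
        // "##" is a hypothetical placeholder for the two junk characters.
        String sample = "company##20ac?s";
        System.out.println(fix(sample));          // company's
        System.out.println(wrapInQuotes(sample)); // company'20ac's
    }
}
```

Because the dots accept any character, this pattern also rewrites ordinary text that happens to contain 20ac with two characters before it and one after, so it is only safe if that sequence never occurs legitimately in the data.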






If you want to replace only junk characters, you need to define them in the regex instead of the dots (.).

This can be something like [㝵] (put all the junk characters inside the brackets).

For multiple characters you can use * for zero or more, + for one or more, and {2} for exactly 2 characters.

So the end result can be something like this: [㝵]+(20ac)\? (the backslash makes the final ? match a literal question mark; without it, the ? would make the 20ac group optional).
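The stricter, character-class version can be sketched as follows. This is only an illustration under stated assumptions: the characters # and ~ stand in for whatever junk characters actually appear in your data, the trailing character is assumed to be a literal question mark, and the class name is hypothetical.

```java
import java.util.regex.Pattern;

class JunkClassFix {
    // One or more of the listed junk characters, then 20ac, then a literal "?".
    // In a Java string literal the regex \? is written as \\?.
    // '#' and '~' are placeholders; list the real junk characters here.
    private static final Pattern JUNK = Pattern.compile("[#~]+20ac\\?");

    static String fix(String input) {
        return JUNK.matcher(input).replaceAll("'");
    }

    public static void main(String[] args) {
        System.out.println(fix("company#~20ac?s")); // company's
        // No listed junk character precedes 20ac, so nothing is replaced.
        System.out.println(fix("ab20ac?"));         // ab20ac?
    }
}
```

Listing the junk characters explicitly means text that merely contains 20ac is left alone, at the cost of having to know the junk characters in advance.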

















edited Nov 19 at 7:23

answered Nov 19 at 7:00 by Oram






























             
