Python - how to add a new line every time a pattern is found in a string?

How can I add a new line every time a pattern from a list of regexes is found in a string?



I am using Python 3.6.



I have the following input:



12.13.14 Here is supposed to start a new line.



12.13.15 Here is supposed to start a new line.



Here is some text. It is written in one lines. 12.13. Here is some more text. 2.12.14. Here is even more text.



I want the following output:



12.13.14



Here is supposed to start a new line.



12.13.15



Here is supposed to start a new line.



Here is some text. It is written in one lines.



12.13.



Here is some more text.



2.12.14.



Here is even more text.



My first try just writes the input out unchanged:



import re

in_file2 = 'work1-T1.txt'
out_file2 = 'work2-T1.txt'

start_rx = re.compile('|'.join(
    [r'\d\d\.\d\d\.', r'\d\.\d\d\.\d\d', r'\d\d\.\d\d\.\d\d']))

with open(in_file2, 'r', encoding='utf-8') as fin2, open(out_file2, 'w', encoding='utf-8') as fout2:
    text_list = fin2.read().split()
    fin2.seek(0)

    for string in fin2:
        if re.match(start_rx, string):
            string = str.replace(start_rx, '\n\n' + start_rx + '\n')

        fout2.write(string)


My second try raises an error: TypeError: unsupported operand type(s) for +: '_sre.SRE_Pattern' and 'str'



import re

in_file2 = 'work1-T1.txt'
out_file2 = 'work2-T1.txt'

start_rx = re.compile('|'.join(
    [r'\d\d\.\d\d\.', r'\d\.\d\d\.\d\d', r'\d\d\.\d\d\.\d\d']))

with open(in_file2, "r") as fin2, open(out_file2, 'w') as fout3:
    for line in fin2:
        start = False
        if re.match(start_rx, line):
            start = True
        if start == False:
            print('do something')
        if start == True:
            line = '\n' + line  ## blank before the item number
            line = line.replace(start_rx, start_rx + '\n')
        fout3.write(line)
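
The TypeError in the second try comes from the expression start_rx + '\n': a compiled pattern object cannot be concatenated with a string. A minimal reproduction outside the file loop is sketched below; the pattern and the helper name concat_pattern_and_str are illustrative only, and on newer Python versions the type in the message may be reported as re.Pattern rather than _sre.SRE_Pattern.

```python
import re

start_rx = re.compile(r'\d\d\.\d\d\.')


def concat_pattern_and_str():
    # Mirrors `start_rx + '\n'` from the second try: a compiled
    # pattern object does not support + with a str.
    return start_rx + '\n'


error_message = None
try:
    concat_pattern_and_str()
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```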

regex python-3.x replace

asked Nov 23 '18 at 10:33
Mady

  • Note that you are trying to use the str.replace method with a regex, but it does not accept regexes; you need re.sub. Try text = fin2.read() and then fout2.write(re.sub(r'\s*(\d+(?:\.\d+)+\.?)\s*', r'\n\n\1\n', text)). See this demo.

    – Wiktor Stribiżew
    Nov 23 '18 at 10:54

  • This fixed the problem. Thank you.

    – Mady
    Nov 23 '18 at 11:05

2 Answers

First of all, to search and replace with a regex, you need to use re.sub, not str.replace.
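
To illustrate the difference: str.replace treats its first argument as a literal substring, so a regex string is searched for character by character, while re.sub interprets the same string as a pattern. A small sketch on a made-up sample string:

```python
import re

text = "Item 12.13. follows"

# str.replace looks for the literal characters backslash, 'd', 'd', ...
literal = text.replace(r"\d\d\.\d\d\.", "[num]")

# re.sub treats the same string as a regular expression
regex = re.sub(r"\d\d\.\d\d\.", "[num]", text)

print(literal)  # Item 12.13. follows
print(regex)    # Item [num] follows
```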



Second, with re.sub you can't put the regex pattern inside the replacement string. Instead, capture the parts of the match you want to keep in groups and refer to them with backreferences in the replacement. If you just want to refer to the whole match, use the \g<0> backreference; no capturing groups are required.
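
A sketch of both options on a made-up string: a capturing group referenced as \1, and the whole match referenced as \g<0> with no group at all. Both give the same result here:

```python
import re

text = "see 12.13. and 2.12.14."

# Option 1: capture the number in group 1 and refer to it as \1
with_group = re.sub(r'(\d+(?:\.\d+)+\.?)', r'[\1]', text)

# Option 2: no capturing group; \g<0> refers to the whole match
whole_match = re.sub(r'\d+(?:\.\d+)+\.?', r'[\g<0>]', text)

print(with_group)   # see [12.13.] and [2.12.14.]
print(whole_match)  # see [12.13.] and [2.12.14.]
```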



Third, when you build an unanchored alternation pattern, make sure the longer alternatives come first, i.e. start_rx = re.compile('|'.join([r'\d\d\.\d\d\.\d\d', r'\d\.\d\d\.\d\d', r'\d\d\.\d\d\.'])). However, here you can also write a more precise pattern by hand.
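
Why the order matters: Python's re module tries the alternatives of an alternation from left to right and takes the first one that matches at a given position, so a shorter alternative listed first can win even when a longer one would also match. The sketch below compares both orders on 12.13.14, assuming the list items in the question were meant as \d\d\.\d\d\. and so on:

```python
import re

sample = "12.13.14"

# Shorter alternative first (order used in the question)
short_first = re.compile('|'.join(
    [r'\d\d\.\d\d\.', r'\d\.\d\d\.\d\d', r'\d\d\.\d\d\.\d\d']))

# Longer alternatives first (order suggested above)
long_first = re.compile('|'.join(
    [r'\d\d\.\d\d\.\d\d', r'\d\.\d\d\.\d\d', r'\d\d\.\d\d\.']))

print(short_first.findall(sample))  # ['12.13.']
print(long_first.findall(sample))   # ['12.13.14']
```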



Here is how your code can be fixed:



import re

with open(in_file2, 'r', encoding='utf-8') as fin2, open(out_file2, 'w', encoding='utf-8') as fout2:
    text = fin2.read()
    fout2.write(re.sub(r'\s*(\d+(?:\.\d+)+\.?)\s*', r'\n\n\1\n', text))
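
For a quick check without files, the same substitution can be run on the question's sample input held in a string. This assumes the input file contains exactly the three lines shown in the question; the names NUMBER_RX and add_line_breaks are hypothetical:

```python
import re

# Same pattern and replacement as in the answer's code
NUMBER_RX = re.compile(r'\s*(\d+(?:\.\d+)+\.?)\s*')


def add_line_breaks(text):
    """Put each dotted number on its own line, preceded by a blank line."""
    return NUMBER_RX.sub(r'\n\n\1\n', text)


sample = (
    "12.13.14 Here is supposed to start a new line.\n"
    "12.13.15 Here is supposed to start a new line.\n"
    "Here is some text. It is written in one lines. 12.13. "
    "Here is some more text. 2.12.14. Here is even more text."
)

result = add_line_breaks(sample)
print(result)
```

Because the text starts with a number, the result begins with two newline characters; result.lstrip('\n') removes them if they are unwanted.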


See the Python demo



The pattern is



\s*(\d+(?:\.\d+)+\.?)\s*


See the regex demo



Details





  • \s* - 0+ whitespace characters
  • (\d+(?:\.\d+)+\.?) - Group 1 (\1 in the replacement pattern):
    • \d+ - 1+ digits
    • (?:\.\d+)+ - 1 or more repetitions of . followed by 1+ digits
    • \.? - an optional .
  • \s* - 0+ whitespace characters
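
To see exactly what Group 1 captures, re.findall can be run with the same pattern on the question's sample text. Because the pattern has one capturing group, findall returns that group's text rather than the whole match with its surrounding whitespace:

```python
import re

pattern = r'\s*(\d+(?:\.\d+)+\.?)\s*'

sample = ("12.13.14 Here is supposed to start a new line.\n"
          "12.13.15 Here is supposed to start a new line.\n"
          "Here is some text. It is written in one lines. 12.13. "
          "Here is some more text. 2.12.14. Here is even more text.")

numbers = re.findall(pattern, sample)
print(numbers)  # ['12.13.14', '12.13.15', '12.13.', '2.12.14.']

# (?:\.\d+)+ needs at least one ".digits" part, so a bare number is not matched
print(re.findall(pattern, "version 3 is out"))  # []
```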

          edited Nov 23 '18 at 11:12
          answered Nov 23 '18 at 11:06
Wiktor Stribiżew

Try this:

out_file2 = re.sub(r'(\d+) ', r'\1\n', in_file2)
out_file2 = re.sub(r'(\w+)\.', r'\1.\n', in_file2)
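
A sketch of what these two substitutions do when applied to text. Two assumptions are made here: the calls are applied to the file's contents (in_file2 in the question holds the file name 'work1-T1.txt'), and the second call is chained on the result of the first. The function name two_step_sub is hypothetical:

```python
import re


def two_step_sub(text):
    # Step 1: a run of digits followed by a space -> digits + newline
    step1 = re.sub(r'(\d+) ', r'\1\n', text)
    # Step 2: a word (letters, digits or _) followed by a dot -> word + dot + newline
    return re.sub(r'(\w+)\.', r'\1.\n', step1)


line = "12.13.14 Here is supposed to start a new line."
result = two_step_sub(line)
print(repr(result))
# '12.\n13.\n14\nHere is supposed to start a new line.\n'
```

Because \w also matches digits, the second substitution also breaks the line inside dotted numbers such as 12.13.14.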

                  answered Nov 23 '18 at 10:46
gocen