Using regex with list comprehension in Python












I have the following code, which stores the names of all the CSV files in a specific folder in a list:

import pandas as pd
import re
import os

files = os.listdir('.')
filename = [filename for filename in files if filename.endswith('.csv')]

However, my folder contains two types of CSV files: one type ends with two digits, for example _20.cvs (or maybe _18.csv, _01.csv), and the other ends with _Raw.csv.

I only need the first type stored in my list. I know a regular expression can help me with that, so I did some Google searching and came up with the following code, but it doesn't seem to work. Can anyone offer some advice?

filename = [re.search(r'^\d{2}.csv', filename).group(0) for filename in files]






regex python-3.x
















asked Nov 22 '18 at 8:00









Rowling













  • BTW, do you have _20.cvs or _20.csv?

    – Wiktor Stribiżew
    Nov 22 '18 at 8:10











  • _20.csv, thanks

    – Rowling
    Nov 22 '18 at 8:14































4 Answers
You need to remove ^ (as it matches the start-of-string position), add $ at the end of the pattern (to make sure the match is at the end of the string), and escape the dot (otherwise, . matches any character except a line break).

Note that you must check whether there is a match before accessing .group():

result = [f for f in files if re.search(r'_\d{2}\.csv$', f)]

Details

  • _ - an underscore
  • \d{2} - 2 digits
  • \. - a literal dot
  • csv - the text csv
  • $ - end of string.

See the regex demo.

Python demo:

import re
files = ["gfrt_32_20.csv", "wertf_18.csv", "12_01.csv", "ith_Raw.csv"]
result = [f for f in files if re.search(r'_\d{2}\.csv$', f)]
print(result)
# => ['gfrt_32_20.csv', 'wertf_18.csv', '12_01.csv']
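The point about checking for a match before calling .group() can be illustrated with a small sketch. It is not part of the original answer: it reuses the sample file names from the demo, and the helper name two_digit_suffix is made up for this illustration. re.search returns None when nothing matches, so calling .group() on the result without a check raises AttributeError for names such as ith_Raw.csv.

```python
import re

# Sample names reused from the demo; "ith_Raw.csv" must be rejected.
files = ["gfrt_32_20.csv", "wertf_18.csv", "12_01.csv", "ith_Raw.csv"]

# Compile once; the capturing group holds the two digits before ".csv".
PATTERN = re.compile(r'_(\d{2})\.csv$')

def two_digit_suffix(name):
    """Return the two-digit suffix of name, or None (hypothetical helper)."""
    m = PATTERN.search(name)
    # Guard against None before touching .group().
    return m.group(1) if m else None

suffixes = {f: two_digit_suffix(f) for f in files}
print(suffixes)
```

Names without a two-digit suffix map to None instead of raising an exception, which is the same guard the if clause provides in the list comprehension.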














edited Nov 22 '18 at 8:23
answered Nov 22 '18 at 8:17
Wiktor Stribiżew













  • @Borisu There is no need to add details about the difference between re.match and re.search to my answer, as the OP's problem is not related to it. Here is a good thread on that.

    – Wiktor Stribiżew
    Nov 22 '18 at 11:48


































re.match would not work here because it only matches at the beginning of the string. Use re.search instead.
Everything else in the re.match-based solution is fine.

import os
import re
files = os.listdir('.')
filenames = [f for f in files if re.search(r'(_\d+\.csv)', f)]
print(filenames)
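The difference between the two functions can be checked directly. The following is a minimal sketch rather than part of the original answer; it assumes the file name Data_100000_11_22.csv mentioned in the comments on this page, and it adds a $ anchor to the pattern so that only a suffix at the very end counts.

```python
import re

name = "Data_100000_11_22.csv"  # file name mentioned in the comments
pattern = r'_\d+\.csv$'

# re.match only tries position 0; the name starts with "D", so there is no match.
match_result = re.match(pattern, name)

# re.search scans the whole string and finds "_22.csv" at the end.
search_result = re.search(pattern, name)

# re.match can be made to work by letting the pattern consume the prefix first.
prefixed_result = re.match(r'.*' + pattern, name)

print(match_result)
print(search_result.group(0))
print(prefixed_result is not None)
```

For filtering file names by their ending, re.search with a $ anchor is usually the simpler choice, since the pattern does not have to describe the beginning of the name.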
















answered Nov 22 '18 at 8:18
AResem




































Try using the re.match method:

import os
import re
files = os.listdir('.')
filenames = [f for f in files if re.match(r'(_\d+\.csv)', f)]
print(filenames)
















answered Nov 22 '18 at 8:09
Rezvanov Maxim













  • It gives me an empty list. The filename is like Mydate_2018_11_22.csv, but we only care about 22.csv, right? For this instance, I think match and search are the same?

    – Rowling
    Nov 22 '18 at 8:13

  • @Frank Could you provide examples with the full names of the files you need to collect? Yes, they are the same. In this case re.search and re.match can replace each other.

    – Rezvanov Maxim
    Nov 22 '18 at 8:17

  • [f for f in files if re.search(r'(_\d+\.csv)', f)] works; [f for f in files if re.match(r'(_\d+\.csv)', f)] doesn't

    – Rowling
    Nov 22 '18 at 8:17

  • Data_100000_11_22.csv

    – Rowling
    Nov 22 '18 at 8:18

  • @Frank Try the regex here: pythex.org

    – Rezvanov Maxim
    Nov 22 '18 at 8:20
































You should put the regex operation in the if clause so that it filters out the files you don't want.

You should also escape the . in the regex, since a dot has a special meaning in regex (it matches any character except a line terminator).

[filename for filename in files if re.search(r'\d{2}\.csv$', filename)]

If you want only the matched part, you can take a simple substring:

[filename[-6:] for filename in files if re.search(r'\d{2}\.csv$', filename)]
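As an alternative to slicing with a fixed length, the matched text can be read from the match object itself, which keeps working if the pattern is later changed to allow a different number of digits. This is a minimal sketch, not part of the original answer; two of the sample names come from the comments on this page and the third (Mydate_Raw.csv) is made up.

```python
import re

# Two names from the comments plus one made-up "_Raw" name for contrast.
files = ["Mydate_2018_11_22.csv", "Data_100000_11_22.csv", "Mydate_Raw.csv"]
pattern = re.compile(r'\d{2}\.csv$')

# Run the search once per name, then keep only the successful matches.
matches = (pattern.search(f) for f in files)
matched_parts = [m.group(0) for m in matches if m]
print(matched_parts)
```

The generator expression avoids running the same search twice per file name, once for the filter and once for the extraction.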
















answered Nov 22 '18 at 8:16
Sweeper





























