Using regex with list comprehension in Python

I have the following code, which stores the names of all the CSV files in a specific folder in a list:

import pandas as pd

import re

import os



files = os.listdir('.')

filename=[filename for filename in files if filename.endswith('.csv')]

However, my folder contains two types of CSV files: one type ends with, for example, _20.csv (or maybe _18.csv, _01.csv), and the other ends with _Raw.csv.

I only need the first type stored in my list. I know a regular expression may help with that, so I searched Google and came up with the following code, but it doesn't seem to work. Can anyone offer some advice?

filename = [re.search(r'^\d{2}.csv'),filename).group(0) for filename in files]

asked Nov 22 '18 at 8:00

Rowling


BTW, do you have _20.cvs or _20.csv?

– Wiktor Stribiżew
Nov 22 '18 at 8:10

_20.csv, thanks

– Rowling
Nov 22 '18 at 8:14


regex python-3.x


4 Answers

You need to remove ^ (since it matches the start-of-string position), add $ at the end of the pattern (to make sure the match is at the end of the string), and escape the dot (otherwise . matches any character except a line break).

Note you must check if there is a match before accessing .group():

result = [f for f in files if re.search(r'_\d{2}\.csv$', f)]

Details

_ - an underscore

\d{2} - 2 digits

\. - a literal dot

csv - csv text

$ - end of string.

See the regex demo.

Python demo:

import re

files = ["gfrt_32_20.csv", "wertf_18.csv", "12_01.csv", "ith_Raw.csv"]

result = [f for f in files if re.search(r'_\d{2}\.csv$', f)]

print(result)

# => ['gfrt_32_20.csv', 'wertf_18.csv', '12_01.csv']
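To illustrate the note about checking for a match before calling .group(), here is a minimal sketch that collects only the matched suffix of each name. It reuses the sample file names from the demo above; the names pattern and suffixes are illustrative and not part of the original answer.

```python
import re

# Sample names from the demo above.
files = ["gfrt_32_20.csv", "wertf_18.csv", "12_01.csv", "ith_Raw.csv"]

# Compile once: underscore, two digits, a literal dot, "csv", end of string.
pattern = re.compile(r'_\d{2}\.csv$')

suffixes = []
for f in files:
    m = pattern.search(f)
    # search() returns None when nothing matches, so guard before .group().
    if m is not None:
        suffixes.append(m.group(0))

print(suffixes)
# => ['_20.csv', '_18.csv', '_01.csv']
```

Calling .group(0) directly on the result for "ith_Raw.csv" would raise AttributeError, because the result there is None.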

edited Nov 22 '18 at 8:23

answered Nov 22 '18 at 8:17

Wiktor Stribiżew


@Borisu There is no need to add details about the difference between re.match and re.search to my answer, as the OP's problem is not related to it. Here is a good thread on that.

– Wiktor Stribiżew
Nov 22 '18 at 11:48


re.match would not work here because it only matches at the beginning of the string. Use re.search instead.
Everything else in the previous solution is fine.

import os

import re

files = os.listdir('.')

filenames = [f for f in files if re.search(r'(_\d+\.csv)', f)]

print(filenames)
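A small sketch of the difference, using a file name of the kind mentioned in the comments on this page (the variable names are illustrative): re.match only tries the pattern at position 0, while re.search scans the whole string.

```python
import re

name = "Data_100000_11_22.csv"  # sample name, assumed for illustration
pattern = r'_\d+\.csv'

# re.match anchors at the start of the string, which begins with "D",
# so the pattern cannot match there.
at_start = re.match(pattern, name)
print(at_start)
# => None

# re.search looks for the pattern anywhere in the string.
anywhere = re.search(pattern, name)
print(anywhere.group())
# => _22.csv
```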

answered Nov 22 '18 at 8:18

AResem


Try using the re.match method:

import os

import re

files = os.listdir('.')

filenames = [f for f in files if re.match(r'(_\d+\.csv)', f)]

print(filenames)

answered Nov 22 '18 at 8:09

Rezvanov Maxim


It gives me an empty list. The file name is like Mydate_2018_11_22.csv, but we only care about 22.csv, right? For this instance, I think match and search are the same?

– Rowling
Nov 22 '18 at 8:13

@Frank could you provide examples with the full names of the files you need to collect? Yes, they are the same. In this case re.search and re.match can replace each other.

– Rezvanov Maxim
Nov 22 '18 at 8:17

[f for f in files if re.search(r'(_\d+\.csv)', f)] works; [f for f in files if re.match(r'(_\d+\.csv)', f)] doesn't

– Rowling
Nov 22 '18 at 8:17

Data_100000_11_22.csv

– Rowling
Nov 22 '18 at 8:18

@Frank try regex here: pythex.org

– Rezvanov Maxim
Nov 22 '18 at 8:20


You should put the regex operation in the if clause so as to filter out those you don't want.

You should also escape the . in the regex, since dots have special meaning in regex (match all non-line terminators).

[filename for filename in files if re.search(r'\d{2}\.csv$', filename)]

If you want only the matched bit, you can do a simple substring:

[filename[-6:] for filename in files if re.search(r'\d{2}\.csv$', filename)]
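As an alternative to the fixed-length slice, a capture group can pull out just the two digits. This is only a sketch: the sample file names and the names pattern and numbers are assumptions made for illustration, not part of the original answer.

```python
import re

# Hypothetical sample names, for illustration only.
files = ["gfrt_32_20.csv", "wertf_18.csv", "12_01.csv", "ith_Raw.csv"]

# The parentheses capture the two digits that precede ".csv" at the end.
pattern = re.compile(r'(\d{2})\.csv$')

# map() applies pattern.search to each name; "if m" skips the None results.
numbers = [m.group(1) for m in map(pattern.search, files) if m]

print(numbers)
# => ['20', '18', '01']
```

Unlike the slice, this does not depend on the matched part having a fixed length, so it keeps working if the pattern is later changed to \d+.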

answered Nov 22 '18 at 8:16

Sweeper

