Using regex with list comprehension in python
I have the following code, which stores the names of all CSV files in a specific folder in a list:
import pandas as pd
import re
import os
files = os.listdir('.')
filename = [filename for filename in files if filename.endswith('.csv')]
However, my folder contains two types of CSV files: one ends with, for example, _20.cvs (or maybe _18.csv, _01.csv), and the other ends with _Raw.csv.
I only need the first type stored in my list. I know a regular expression can help with that, so I searched on Google and came up with the following code, but it doesn't seem to work. Can anyone offer some advice?
filename = [re.search(r'^\d{2}.csv', filename).group(0) for filename in files]
regex python-3.x
asked Nov 22 '18 at 8:00 by Rowling
BTW, do you have _20.cvs or _20.csv? – Wiktor Stribiżew Nov 22 '18 at 8:10
_20.csv, thanks – Rowling Nov 22 '18 at 8:14
4 Answers
You need to remove ^ (as it matches the start-of-string position), add $ at the end of the pattern (to make sure the match is at the end of the string), and escape the dot (otherwise, . matches any character except a line break).
Note that you must check whether there is a match before accessing .group():
result = [f for f in files if re.search(r'_\d{2}\.csv$', f)]
Details
_ - an underscore
\d{2} - 2 digits
\. - a literal dot
csv - the text csv
$ - end of string.
See the regex demo.
Python demo:
import re
files = ["gfrt_32_20.csv", "wertf_18.csv", "12_01.csv", "ith_Raw.csv"]
result = [f for f in files if re.search(r'_\d{2}\.csv$', f)]
print(result)
# => ['gfrt_32_20.csv', 'wertf_18.csv', '12_01.csv']
answered Nov 22 '18 at 8:17 by Wiktor Stribiżew (edited Nov 22 '18 at 8:23)
@Borisu There is no need to add details about the difference between re.match and re.search to my answer, as the OP's problem is not related to it. Here is a good thread on that. – Wiktor Stribiżew Nov 22 '18 at 11:48
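As an illustration of the note above about checking for a match before calling .group(), here is a minimal sketch. The file names are hypothetical and chosen only for the demonstration. It shows that re.search returns None for a non-matching name, so an unguarded .group() call raises AttributeError, and then shows one way to guard against that.

```python
import re

# Hypothetical file names, used only for illustration.
files = ["gfrt_32_20.csv", "ith_Raw.csv"]
pattern = re.compile(r'_\d{2}\.csv$')

# Unguarded: pattern.search("ith_Raw.csv") returns None,
# and None has no .group() method, so this raises AttributeError.
try:
    suffixes = [pattern.search(f).group(0) for f in files]
except AttributeError as exc:
    print("unguarded version failed:", exc)

# Guarded: keep each match object and use it only when it is not None.
matches = [pattern.search(f) for f in files]
suffixes = [m.group(0) for m in matches if m is not None]
print(suffixes)
```

Putting the regex call in the if clause, as the answer does, avoids the problem entirely when only the full file name is needed.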
re.match would not work because it only matches at the beginning of the string. Use re.search instead.
But everything else in the re.match solution posted earlier is fine.
import os
import re
files = os.listdir('.')
# \d+ matches one or more digits; the dot is escaped so it matches a literal "."
filenames = [f for f in files if re.search(r'(_\d+\.csv)', f)]
print(filenames)
answered Nov 22 '18 at 8:18 by AResem
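The difference between re.match and re.search can be checked with a short sketch. It uses Data_100000_11_22.csv, a file name the asker gives in the comments of this thread; the variable names are only for illustration.

```python
import re

name = "Data_100000_11_22.csv"
pattern = r'_\d+\.csv'

# re.match only tries to match at position 0 of the string,
# and the name starts with "Data", not with an underscore.
at_start = re.match(pattern, name)

# re.search scans the whole string for the first position that matches.
anywhere = re.search(pattern, name)

print(at_start)           # None
print(anywhere.group(0))  # _22.csv
```

If re.match has to be used, the pattern itself must account for the leading part of the name, for example by starting with .* so that the match can begin at position 0.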
Try using the re.match method:
import os
import re
files = os.listdir('.')
filenames = [f for f in files if re.match(r'(_\d+.csv)', f)]
print(filenames)
answered Nov 22 '18 at 8:09 by Rezvanov Maxim
It gives me an empty list. The filename is like Mydate_2018_11_22.csv, but we only care about 22.csv, right? In this instance, I think match and search are the same? – Rowling Nov 22 '18 at 8:13
@Frank could you provide examples with the full names of the files you need to collect? Yes, they are the same. In this case re.search and re.match can replace each other. – Rezvanov Maxim Nov 22 '18 at 8:17
[f for f in files if re.search(r'(_\d+.csv)', f)] works; [f for f in files if re.match(r'(_\d+.csv)', f)] doesn't – Rowling Nov 22 '18 at 8:17
Data_100000_11_22.csv – Rowling Nov 22 '18 at 8:18
@Frank try regex here: pythex.org – Rezvanov Maxim Nov 22 '18 at 8:20
You should put the regex operation in the if clause so as to filter out the files you don't want.
You should also escape the . in the regex, since a dot has a special meaning in regex (it matches any character except a line terminator).
[filename for filename in files if re.search(r'\d{2}\.csv$', filename)]
If you want only the matched bit, you can take a simple substring:
[filename[-6:] for filename in files if re.search(r'\d{2}\.csv$', filename)]
answered Nov 22 '18 at 8:16 by Sweeper
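The fixed slice filename[-6:] works because the matched suffix is always exactly six characters long (two digits plus .csv). As a hedged sketch with hypothetical file names, the same result can also be taken from the match object, which keeps working if the pattern is later changed to match a different length:

```python
import re

# Hypothetical file names, used only for illustration.
files = ["gfrt_32_20.csv", "wertf_18.csv", "ith_Raw.csv"]
pattern = re.compile(r'\d{2}\.csv$')

# Substring approach: take the last six characters of each matching name.
by_slice = [f[-6:] for f in files if pattern.search(f)]

# Match-object approach: search once per name and keep the matched text.
by_group = [m.group(0) for m in map(pattern.search, files) if m]

print(by_slice)
print(by_group)
```

Both lists contain the same matched text for these names; which form reads better is a matter of taste.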