Regular expressions: Ensuring b doesn't come between a and c
Here's something I'm trying to do with regular expressions, and I can't figure out how. I have a big file, and the strings abc, 123 and xyz appear multiple times throughout it.
I want a regular expression that matches a substring of the big file that begins with abc, contains 123 somewhere in the middle, ends with xyz, and contains no other instances of abc or xyz besides the ones at the start and the end.
Is this possible with regular expressions?
regex python-2.7
Since regular expressions are not fully standardized, all questions with this tag should also include a tag specifying the applicable programming language or tool. That said, is there any particular reason you want to use regular expressions here? It's possible, but in most environments, it's more complicated than not using regexes.
– user743382
May 15 '16 at 15:56
Should line breaks be considered or not? Will the big file be read line by line or as one big string?
– Jorge Campos
May 15 '16 at 15:59
The regex flavor is Python 2.7, and newlines should be included.
– Ram Rachum
May 15 '16 at 16:16
edited May 15 '16 at 16:16 by Jorge Campos
asked May 15 '16 at 15:53 by Ram Rachum
4 Answers
You need a tempered greedy token:
abc(?:(?!abc|xyz|123).)*123(?:(?!abc|xyz).)*xyz
See the regex demo
To make sure it matches across lines, use the re.DOTALL flag when compiling the regex.
Note that with such a heavy pattern, you should consider unrolling it for better performance. This can be done with negated character classes and negative lookaheads.
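A minimal sketch of what an unrolled version might look like, assuming the same three literal strings. The unrolled pattern is my own rewrite, not taken from the answer: each tempered token becomes a negated character class for characters that can never start a forbidden sequence, plus an alternation that lets a, 1 or x through only when they do not begin abc, 123 or xyz. The loop only compares results on a few sample strings; it does not prove the two patterns equivalent.

```python
import re

# Tempered greedy token version from the answer.
TEMPERED = re.compile(
    r'abc(?:(?!abc|xyz|123).)*123(?:(?!abc|xyz).)*xyz', re.DOTALL)

# Unrolled sketch: [^a1x]* eats characters that cannot start abc/123/xyz;
# a, 1 or x are consumed only when they do not begin a forbidden sequence.
# A negated character class also matches newlines, so DOTALL is not needed.
UNROLLED = re.compile(
    r'abc'
    r'[^a1x]*(?:(?:a(?!bc)|1(?!23)|x(?!yz))[^a1x]*)*'
    r'123'
    r'[^ax]*(?:(?:a(?!bc)|x(?!yz))[^ax]*)*'
    r'xyz')

samples = [
    "abc 123 xyz\nabc abc 123 xyz\nabc text 123 xyz\nabc text xyz xyz",
    "abc1123xyz",
    "abcabc123xyz",
    "abc 12 a1x 123 ax xyz",
]
for s in samples:
    # Compare what both patterns find on the same input.
    print(UNROLLED.findall(s) == TEMPERED.findall(s))
```

The unrolled form avoids running a three-way lookahead before every single character, which is where the tempered version spends most of its time on long inputs.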
Pattern details:
abc - match abc
(?:(?!abc|xyz|123).)* - zero or more characters, none of which is the starting point of an abc, xyz or 123 character sequence
123 - the literal string 123
(?:(?!abc|xyz).)* - zero or more characters, none of which is the starting point of an abc or xyz character sequence
xyz - the trailing substring xyz
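The same breakdown can be kept inside the pattern itself with re.VERBOSE, which ignores unescaped whitespace and allows # comments in the pattern text. This is only a presentation sketch of the answer's pattern; since the compact pattern contains no whitespace, it should match exactly the same text.

```python
import re

# re.VERBOSE ignores whitespace and '#' comments inside the pattern;
# re.DOTALL lets '.' match newlines, as the question requires.
p = re.compile(r"""
    abc                    # opening abc
    (?:(?!abc|xyz|123).)*  # chars that do not start abc, xyz or 123
    123                    # the required 123
    (?:(?!abc|xyz).)*      # chars that do not start abc or xyz
    xyz                    # closing xyz
""", re.DOTALL | re.VERBOSE)

s = "abc 123 xyz\nabc abc 123 xyz\nabc text 123 xyz\nabc text xyz xyz"
print(p.findall(s))
```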
(Diagram of the pattern not reproduced here; when re.S is used, the . is drawn in it as AnyChar.)
See the Python demo:
import re
p = re.compile(r'abc(?:(?!abc|xyz|123).)*123(?:(?!abc|xyz).)*xyz', re.DOTALL)
s = "abc 123 xyz\nabc abc 123 xyz\nabc text 123 xyz\nabc text xyz xyz"
print(p.findall(s))
# => ['abc 123 xyz', 'abc 123 xyz', 'abc text 123 xyz']
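To apply the pattern to the big file itself, one option is to read the whole file into a single string, so that matches can span newlines, and use finditer to get each match together with its offset. This sketch assumes the file fits in memory and is UTF-8 encoded; scan_file is a hypothetical helper name, and the demo writes the sample text to a temporary file only so the snippet is self-contained.

```python
import io
import os
import re
import tempfile

PATTERN = re.compile(
    r'abc(?:(?!abc|xyz|123).)*123(?:(?!abc|xyz).)*xyz', re.DOTALL)

def scan_file(path):
    """Return (offset, matched_text) pairs for every match in the file."""
    # Read everything at once: matches may cross line boundaries.
    with io.open(path, encoding='utf-8') as f:
        text = f.read()
    return [(m.start(), m.group(0)) for m in PATTERN.finditer(text)]

# Demo: write the sample text to a temporary file, scan it, then clean up.
sample = u"abc 123 xyz\nabc abc 123 xyz\nabc text 123 xyz\nabc text xyz xyz"
fd, path = tempfile.mkstemp()
try:
    with io.open(fd, 'w', encoding='utf-8') as f:
        f.write(sample)
    matches = scan_file(path)
    print(matches)
finally:
    os.remove(path)
```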
Can you please link the site from where you generated that state machine? I know a site exists with similar UI but can't find it though. Sorry for the irrelevant comment. I'll delete it soon :)
– rafid059
Mar 18 '17 at 18:24
See jex.im/regulex
– Wiktor Stribiżew
Mar 18 '17 at 20:10
Why the |123?
– Stefan Pochmann
Jan 24 at 20:22
@StefanPochmann Well, you may get rid of that if you use a lazy quantifier in the first case: r'abc(?:(?!abc|xyz).)*?123(?:(?!abc|xyz).)*xyz'. It will work similarly.
– Wiktor Stribiżew
Jan 24 at 20:26
Both are just optimizations, though, right? Not necessary?
– Stefan Pochmann
Jan 24 at 20:26
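To check the point raised in these comments, the sketch below runs the answer's pattern, the lazy variant from the comment, and a plain greedy variant without |123 (added here only for comparison) on one sample string. It compares results on this input only; it suggests, but does not prove, that |123 and the lazy quantifier change how the engine backtracks rather than what is matched.

```python
import re

variants = {
    'original': r'abc(?:(?!abc|xyz|123).)*123(?:(?!abc|xyz).)*xyz',
    'lazy': r'abc(?:(?!abc|xyz).)*?123(?:(?!abc|xyz).)*xyz',
    # Added for comparison: greedy, without |123.
    'greedy': r'abc(?:(?!abc|xyz).)*123(?:(?!abc|xyz).)*xyz',
}

# The last line has two 123s between abc and xyz.
s = ("abc 123 xyz\nabc abc 123 xyz\nabc text 123 xyz\n"
     "abc text xyz xyz\nabc 1 123 9 123 xyz")

results = {}
for name, pattern in sorted(variants.items()):
    results[name] = re.findall(pattern, s, re.DOTALL)
    print('%s: %r' % (name, results[name]))
```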
Using PCRE, a solution would be the following. This uses the m flag; if you want to check only from the start to the end of a line, add ^ and $ at the beginning and end, respectively:
abc(?!.*(abc|xyz).*123).*123(?!.*(abc|xyz).*xyz).*xyz
Debuggex Demo
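This answer targets PCRE matching one line at a time, while the question asks for Python 2.7 with newlines included. As a sketch on a test string of my own, the same pattern can be run with Python's re, which supports these lookaheads, both with and without re.DOTALL. Because the pattern contains capturing groups, the sketch uses finditer and group(0); findall would return the group contents instead of the whole match.

```python
import re

# The answer's pattern, unchanged.
PCRE_STYLE = r'abc(?!.*(abc|xyz).*123).*123(?!.*(abc|xyz).*xyz).*xyz'

s = "abc 123 xyz\nabc abc 123 xyz"

# Without DOTALL, '.' stops at newlines, so each lookahead only sees its own line.
per_line = [m.group(0) for m in re.finditer(PCRE_STYLE, s)]
# With DOTALL, the lookaheads scan all the way to the end of the string.
whole_text = [m.group(0) for m in re.finditer(PCRE_STYLE, s, re.DOTALL)]

print(per_line)
print(whole_text)
```

On this input, the per-line run finds both abc 123 xyz substrings, while the DOTALL run finds only the second one: the lookahead after the first abc sees the later abc ... 123 on the next line and rejects the match.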
Which software was used to draw the diagram?
– Er. Amit Joshi
Jun 3 at 10:15
@Er.AmitJoshi It is not software; it is the Debuggex site. There is a link in the answer.
– Jorge Campos
Jun 4 at 1:50
The comment by hvd is quite appropriate, and this just provides an example. In SQL, for instance, I think it would be clearer to do:
where val like 'abc%123%xyz' and
val not like 'abc%abc%' and
val not like '%xyz%xyz'
I imagine something quite similar is simple to do in other environments.
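Along the same lines, a non-regex version in Python (the question's language) can scan the text with str.find. This is a sketch under my own assumptions: find_spans is a hypothetical helper, the whole text is already in memory, and it aims to return the same substrings as the tempered-token regex from the first answer. It relies on abc, 123 and xyz not overlapping one another; other delimiters may need extra care.

```python
def find_spans(text, start='abc', middle='123', end='xyz'):
    """Return substrings that begin with start, end at the first following
    end, contain middle, and contain no other start or end in between."""
    spans = []
    i = text.find(start)
    while i != -1:
        body_from = i + len(start)
        j = text.find(end, body_from)       # first end after this start
        if j == -1:
            break                           # no closing end left anywhere
        k = text.find(start, body_from)     # next start after this one
        if k != -1 and k < j:
            i = k                           # another start before the end: retry from it
            continue
        if middle in text[body_from:j]:
            spans.append(text[i:j + len(end)])
        i = text.find(start, j + len(end))  # resume after this end
    return spans


s = "abc 123 xyz\nabc abc 123 xyz\nabc text 123 xyz\nabc text xyz xyz"
print(find_spans(s))
```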
You could use lookaround.
/^abc(?!.*abc).*123.*(?<!xyz.*)xyz$/g
(I've not tested it.)
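The /.../g delimiters suggest a JavaScript-style flavor. Python's re, the flavor named in the question, only accepts fixed-width look-behind, so (?<!xyz.*) is rejected when the pattern is compiled. The sketch below shows that and adds a hypothetical lookahead-only variant (my own rewrite, not from the answer) which, like the original, validates one complete candidate string rather than searching inside a larger text.

```python
import re

def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True

# The answer's pattern without the /.../g delimiters. Python's re rejects it
# because (?<!xyz.*) is a variable-width look-behind.
LOOKBEHIND = r'^abc(?!.*abc).*123.*(?<!xyz.*)xyz$'
print(compiles(LOOKBEHIND))

# Hypothetical lookahead-only variant for checking ONE whole candidate string:
#   (?!.*abc)   no second abc after the leading one
#   (?!.*xyz.)  no xyz anywhere except as the final three characters
#   \Z          true end of string (unlike $, it allows no trailing newline)
WHOLE = re.compile(r'abc(?!.*abc)(?!.*xyz.).*123.*xyz\Z', re.DOTALL)

def is_valid(candidate):
    # match() anchors at the start, so no ^ is needed.
    return WHOLE.match(candidate) is not None

for candidate in ["abc 123 xyz", "abc abc 123 xyz", "abc 123 xyz xyz"]:
    print('%r -> %s' % (candidate, is_valid(candidate)))
```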
edited Dec 15 '17 at 9:14 by Eric Leschinski
answered May 15 '16 at 16:20 by Wiktor Stribiżew
answered May 15 '16 at 16:15 by Jorge Campos
edited May 15 '16 at 16:11 by Jonathan Leffler
answered May 15 '16 at 16:01 by Gordon Linoff
edited May 15 '16 at 16:13
answered May 15 '16 at 15:56 by Kenny Lau