
Post History

Q&A How can I write an egrep (grep -E) regexp that matches lines containing two stanzas in arbitrary order?

Can that even be done without having to repeat either or resorting to more advanced processing than pure regular expressions I don't think it can. If you don't want to repeat x1=y2 and c5=d6, you...

posted 3y ago by hkotsubo‭  ·  edited 3y ago by hkotsubo‭

Answer
#2: Post edited by user avatar hkotsubo‭ · 2020-12-17T11:51:58Z (over 3 years ago)
Fix link
  • > _Can that even be done without having to repeat either or resorting to more advanced processing than pure regular expressions_
  • I don't think it can. If you don't want to repeat `x1=y2` and `c5=d6`, you'll have to use more advanced features, such as [lookaheads](https://www.regular-expressions.info/lookaround.html):
  • grep -P "^(?=([^;]+; )*x1=y2)(?=([^;]+; )*c5=d6)" your_input
  • The `-P` option tells `grep` to use [PCRE](https://www.pcre.org/), which supports lookaheads (they are not supported by [`grep`'s default BRE](https://www.regular-expressions.info/gnu.html#bre)). You can check all the differences between regex flavors [in this table](https://www.regular-expressions.info/refadv.html) (the two combo boxes at the top let you choose which flavors to compare).
  • The idea of a lookahead is to look ahead of the current position, searching for whatever is between `(?=` and `)`. This regex has two lookaheads.
  • The first one, `(?=([^;]+; )*x1=y2)`, searches for zero or more occurrences of `([^;]+; )` (one or more characters that are not `;`, followed by a `;` and a space), followed by `x1=y2`.
  • The "trick" is that a lookahead only "takes a look": if it finds a match, it "comes back" to the position where it started (in this case, the [anchor](https://www.regular-expressions.info/anchors.html) `^`, the beginning of the string). So this lookahead checks whether there's an `x1=y2` anywhere in the string, then "comes back" to the beginning and proceeds to evaluate the rest of the expression.
  • The next part of the expression is another lookahead, which is very similar to the first and checks if anywhere in the string there's a `c5=d6`.
  • If both `x1=y2` and `c5=d6` exist, their respective lookaheads succeed and the regex reports a match. And this happens regardless of their relative order: `x1=y2` can be either before or after `c5=d6`. That's because both lookaheads start searching from the beginning of the string.
  • If one of them is not in the string, the respective lookahead fails and the regex doesn't match.
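  • As a quick check of the behaviour described above, here is a minimal sketch that runs the lookahead regex on a few made-up lines. The sample data and the function name `both_with_lookahead` are assumptions for illustration, and it needs a `grep` built with PCRE support (`-P`):

```bash
#!/usr/bin/env bash
# Hypothetical sample lines; the real format of your_input is an assumption.
sample='a3=b4; x1=y2; c5=d6; e7=f8;
c5=d6; a3=b4; x1=y2;
a3=b4; x1=y2; e7=f8;
a3=b4; c5=d6;'

# Both lookaheads start at ^, so the relative order of the two stanzas
# does not matter. Needs a grep with PCRE support (-P).
both_with_lookahead() {
  grep -P '^(?=([^;]+; )*x1=y2)(?=([^;]+; )*c5=d6)'
}

# Only the first two sample lines contain both stanzas.
printf '%s\n' "$sample" | both_with_lookahead
```

  • The third and fourth sample lines each lack one of the stanzas, so one of the lookaheads fails and they are not printed.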
  • ---
  • Unfortunately, with BRE or ERE, you'll have to repeat `x1=y2` and `c5=d6`: write one alternative where `x1=y2` comes first and another where it comes second. Something like this:
  • grep -E "^(([^;]+; )*x1=y2; ([^;]+; )*c5=d6;|([^;]+; )*c5=d6; ([^;]+; )*x1=y2;)" your_input
  • The regex suggested by the [other answer](https://software.codidact.com/posts/279593#answer-279597) doesn't work, because it doesn't require both `x1=y2` and `c5=d6` to be in the string: it also matches a line containing just one of them twice, such as `a3=b4; x1=y2; x1=y2; ...` ([see here](https://regex101.com/r/BRgSjG/1/)).
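  • The ERE version can be tried the same way. This is only a sketch: the sample lines and the function name `both_with_ere` are made up, and it assumes every stanza (including the last one) ends with `;`, as the regex above requires:

```bash
#!/usr/bin/env bash
# Hypothetical sample lines; every stanza, including the last, ends with ";".
sample='a3=b4; x1=y2; c5=d6;
c5=d6; a3=b4; x1=y2;
a3=b4; x1=y2; x1=y2;'

# One alternative per order: x1=y2 before c5=d6, or c5=d6 before x1=y2.
both_with_ere() {
  grep -E '^(([^;]+; )*x1=y2; ([^;]+; )*c5=d6;|([^;]+; )*c5=d6; ([^;]+; )*x1=y2;)'
}

# The third line has x1=y2 twice but no c5=d6, so it is not printed.
printf '%s\n' "$sample" | both_with_ere
```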
  • ---
  • Another solution is to use a script to read the lines and check if they contain everything you want:
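```bash
# read each line into an array, splitting on ";" and space
# (the "|| [ -n ... ]" part handles a last line without a trailing newline)
while IFS="; " read -r -a line || [ -n "$line" ]
do
    x=0
    c=0
    # quote the expansion so each token stays a single word
    for i in "${line[@]}"
    do
        if [ "$i" = "x1=y2" ]; then
            x=1
        elif [ "$i" = "c5=d6" ]; then
            c=1
        fi
    done
    # report only the lines that contain both tokens
    if [ "$x" -eq 1 ] && [ "$c" -eq 1 ]; then
        echo "both were found"
    fi
done < your_input
```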
  • It sets `IFS` so that both `;` and space act as delimiters (a `;` followed by a space counts as a single delimiter), so `read` creates an array containing all the `variable=value` tokens. We just loop through this array, checking whether it contains both `x1=y2` and `c5=d6`.
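  • To see the splitting on its own, here is a small sketch (bash only, since it uses arrays). The input line and the array name `tokens` are assumptions for illustration:

```bash
#!/usr/bin/env bash
# Hypothetical input line, used only to show how the splitting behaves.
input='a3=b4; x1=y2; c5=d6;'

# IFS holds a set of delimiter characters, not a multi-character separator:
# both ";" and " " split the line, and ";" followed by a space counts as
# a single delimiter.
IFS='; ' read -r -a tokens <<< "$input"

printf '%s\n' "${#tokens[@]}"   # number of tokens
printf '[%s]\n' "${tokens[@]}"  # one token per line, in brackets
```

  • The trailing `;` ends the last token without creating an extra empty one, so comparing each element against `x1=y2` or `c5=d6` works as intended.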
  • Just for the record, I'd use some other programming language to process the lines. Regex is cool, but [it's not always the best solution](https://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/).
#1: Post edited by user avatar hkotsubo‭ · 2020-12-01T11:15:44Z (over 3 years ago)
  • > _Can that even be done without having to repeat either or resorting to more advanced processing than pure regular expressions_
  • I don't think it can. If you don't want to repeat `x1=y2` and `c5=d6`, you'll have to use more advanced features, such as [lookaheads](https://www.regular-expressions.info/lookaround.html):
  • grep -P "^(?=([^;]+; )*x1=y2)(?=([^;]+; )*c5=d6)" your_input
  • The `-P` option tells `grep` to use [PCRE](https://www.pcre.org/), which supports lookaheads (they are not supported by [`grep`'s default BRE](https://www.regular-expressions.info/gnu.html#bre)). You can check all the differences between regex flavors [in this table](https://www.regular-expressions.info/refadv.html) (the two combo boxes at the top let you choose which flavors to compare).
  • The idea of a lookahead is to look ahead of the current position, searching for whatever is between `(?=` and `)`. This regex has two lookaheads.
  • The first one, `(?=([^;]+; )*x1=y2)`, searches for zero or more occurrences of `([^;]+; )` (one or more characters that are not `;`, followed by a `;` and a space), followed by `x1=y2`.
  • The "trick" is that a lookahead only "takes a look": if it finds a match, it "comes back" to the position where it started (in this case, the [anchor](https://www.regular-expressions.info/anchors.html) `^`, the beginning of the string). So this lookahead checks whether there's an `x1=y2` anywhere in the string, then "comes back" to the beginning and proceeds to evaluate the rest of the expression.
  • The next part of the expression is another lookahead, which is very similar to the first and checks if anywhere in the string there's a `c5=d6`.
  • If both `x1=y2` and `c5=d6` exist, their respective lookaheads succeed and the regex reports a match. And this happens regardless of their relative order: `x1=y2` can be either before or after `c5=d6`. That's because both lookaheads start searching from the beginning of the string.
  • If one of them is not in the string, the respective lookahead fails and the regex doesn't match.
  • ---
  • Unfortunately, with BRE or ERE, you'll have to repeat `x1=y2` and `c5=d6`: write one alternative where `x1=y2` comes first and another where it comes second. Something like this:
  • grep -E "^(([^;]+; )*x1=y2; ([^;]+; )*c5=d6;|([^;]+; )*c5=d6; ([^;]+; )*x1=y2;)" your_input
  • The regex suggested by the [other answer](https://software.codidact.com/a/279540/279544) doesn't work, because it doesn't require both `x1=y2` and `c5=d6` to be in the string: it also matches a line containing just one of them twice, such as `a3=b4; x1=y2; x1=y2; ...` ([see here](https://regex101.com/r/BRgSjG/1/)).
  • ---
  • Another solution is to use a script to read the lines and check if they contain everything you want:
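```bash
# read each line into an array, splitting on ";" and space
# (the "|| [ -n ... ]" part handles a last line without a trailing newline)
while IFS="; " read -r -a line || [ -n "$line" ]
do
    x=0
    c=0
    # quote the expansion so each token stays a single word
    for i in "${line[@]}"
    do
        if [ "$i" = "x1=y2" ]; then
            x=1
        elif [ "$i" = "c5=d6" ]; then
            c=1
        fi
    done
    # report only the lines that contain both tokens
    if [ "$x" -eq 1 ] && [ "$c" -eq 1 ]; then
        echo "both were found"
    fi
done < your_input
```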
  • It sets `IFS` so that both `;` and space act as delimiters (a `;` followed by a space counts as a single delimiter), so `read` creates an array containing all the `variable=value` tokens. We just loop through this array, checking whether it contains both `x1=y2` and `c5=d6`.
  • Just for the record, I'd use some other programming language to process the lines. Regex is cool, but [it's not always the best solution](https://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/).