Post History
#2: Post edited
I've got this sample regex:

```java
Pattern p = Pattern.compile("(?:([aeiou]+)[0-9]+|([123]+)[a-z]+)\\W+");
```

It basically has the following parts:

- one or more lowercase vowels (`[aeiou]+`), followed by one or more digits (`[0-9]+`), **or**
- one or more of the digits 1, 2 or 3 (`[123]+`), followed by one or more lowercase letters (`[a-z]+`)
- all of this followed by one or more non-word characters (`\W+`, which matches anything other than ASCII letters, digits and `_`)

There are also two [capturing groups][1]: one for the vowels, and another one for the digits 1, 2 or 3.

I'm using [alternation][2] (`|`), which means that only one of these groups will be captured. Example:

```java
Matcher m = p.matcher("ae123.");
if (m.find()) {
    // groupCount() does not include group 0 (the whole match)
    int n = m.groupCount();
    for (int i = 1; i <= n; i++) {
        System.out.format("group %d: %s\n", i, m.group(i));
    }
}
```

In this case, only the first group is captured, and the output is:

> group 1: ae<br>
> group 2: null

But if the input string is `"111abc!!"`, the second group is captured, and the output is:

> group 1: null<br>
> group 2: 111

Therefore, to know which group was captured, I need to loop through them and test whether each one is `null`.
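
The loop-and-test approach described above could be wrapped in a small helper so the check lives in one place. This is only a minimal sketch: the class name `FirstCapturedGroup` and the method `firstCaptured` are hypothetical names chosen for illustration and are not part of `java.util.regex`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstCapturedGroup {
    // Hypothetical helper (not a java.util.regex API): returns the first
    // capturing group that took part in the match, or null if none did.
    static String firstCaptured(Matcher m) {
        for (int i = 1; i <= m.groupCount(); i++) {
            String value = m.group(i);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("(?:([aeiou]+)[0-9]+|([123]+)[a-z]+)\\W+");
        for (String input : new String[] {"ae123.", "111abc!!"}) {
            Matcher m = p.matcher(input);
            if (m.find()) {
                // Prints the captured vowels for the first input and the
                // captured digits for the second one.
                System.out.println(input + " -> " + firstCaptured(m));
            }
        }
    }
}
```

This still loops over the groups internally, so it only hides the `null` checks; it does not remove them.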

---

Some regex engines support the [branch reset](https://www.rexegg.com/regex-disambiguation.html#branchreset) feature: when the expression is placed inside `(?|` and `)`, group numbering restarts at each alternative ([example](https://regex101.com/r/uaQYYi/1)). So the regex above *could* be written as:

```java
Pattern p = Pattern.compile("(?|([aeiou]+)[0-9]+|([123]+)[a-z]+)\\W+");
```

The branch reset (`(?|`) makes both `([aeiou]+)` and `([123]+)` group 1 (and because there's an alternation - just one or the other - only one of these expressions is captured). Using this feature, there would be no need to loop through the groups, testing whether each one is `null`. I could just get group 1 directly (`m.group(1)` would always have a value).

But Java doesn't support branch reset, and the code above throws an exception:

```none
java.util.regex.PatternSyntaxException: Unknown inline modifier near index 2
(?|([aeiou]+)[0-9]+|([123]+)[a-z]+)\W+
  ^
```

I'm using Java 8, and taking a look at the [Java 14 docs](https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/util/regex/Pattern.html), we can see that this feature is still not supported (in [Java 15 preview](https://download.java.net/java/early_access/jdk15/docs/api/java.base/java/util/regex/Pattern.html) there's also no mention of it).

I also checked an [alternative solution for .NET](https://stackoverflow.com/a/5378077): use [*named groups*][3] with the same name for all groups, but that didn't work in Java either:

```java
Pattern p = Pattern.compile("(?:(?<somename>[aeiou]+)[0-9]+|(?<somename>[123]+)[a-z]+)\\W+");
```

This code throws an exception, because in Java [you can't have two or more groups with the same name](https://www.regular-expressions.info/named.html#duplicate):

```none
java.util.regex.PatternSyntaxException: Named capturing group <somename> is already defined near index 36
(?:(?<somename>[aeiou]+)[0-9]+|(?<somename>[123]+)[a-z]+)\W+
                                    ^
```

Is there a way to emulate branch reset in Java, or is the only solution to loop through the groups, testing whether each one is `null`?

[1]: https://www.regular-expressions.info/brackets.html
[2]: https://www.regular-expressions.info/alternation.html
[3]: https://www.regular-expressions.info/named.html
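
One direction that might be worth exploring for this particular sample is to keep only the text to be captured inside each alternative and check what follows it with a lookahead, so that a single capturing group can wrap the whole alternation. The sketch below is an assumption on my part, not a general replacement for branch reset: the rewritten pattern and the names `LookaheadSingleGroup` and `capture` are illustrative only, and the rewrite relies on the two continuations (`[0-9]+` and `[a-z]+`) not overlapping.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadSingleGroup {
    // Assumption: this rewrite is specific to the sample regex. Each alternative
    // captures only its leading part and verifies the next character with a
    // lookahead; the continuation is then consumed outside the capturing group.
    static final Pattern P = Pattern.compile(
            "([aeiou]+(?=[0-9])|[123]+(?=[a-z]))(?:[0-9]+|[a-z]+)\\W+");

    // Returns group 1 of the first match, or null when nothing matches.
    static String capture(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(capture("ae123."));   // group 1 holds the vowels
        System.out.println(capture("111abc!!")); // group 1 holds the digits
    }
}
```

With this shape, `m.group(1)` is the only group to read whenever `find()` succeeds, but the pattern has to be restructured by hand for every regex, which is why it is at best a partial emulation.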
#1: Initial revision
How can I emulate regular expression's branch reset in Java?
I've got this sample regex:

```java
Pattern p = Pattern.compile("(?:([aeiou]+)[0-9]+|([123]+)[a-z]+)\\W+");
```

It basically has the following parts:

- one or more lowercase vowels (`[aeiou]+`), followed by one or more digits (`[0-9]+`), **or**
- one or more of the digits 1, 2 or 3 (`[123]+`), followed by one or more lowercase letters (`[a-z]+`)
- all of this followed by one or more non-word characters (`\W+`, which matches anything other than ASCII letters, digits and `_`)

There are also two [capturing groups][1]: one for the vowels, and another one for the digits 1, 2 or 3.

I'm using [alternation][2] (`|`), which means that only one of these groups will be captured. Example:

```java
Matcher m = p.matcher("ae123.");
if (m.find()) {
    // groupCount() does not include group 0 (the whole match)
    int n = m.groupCount();
    for (int i = 1; i <= n; i++) {
        System.out.format("group %d: %s\n", i, m.group(i));
    }
}
```

In this case, only the first group is captured, and the output is:

> group 1: ae<br>
> group 2: null

But if the input string is `"111abc!!"`, the second group is captured, and the output is:

> group 1: null<br>
> group 2: 111

Therefore, to know which group was captured, I need to loop through them and test whether each one is `null`.

---

Some regex engines support the [branch reset](https://www.rexegg.com/regex-disambiguation.html#branchreset) feature: when the expression is placed inside `(?|` and `)`, group numbering restarts at each alternative ([example](https://regex101.com/r/uaQYYi/1)). So the regex above *could* be written as:

```java
Pattern p = Pattern.compile("(?|([aeiou]+)[0-9]+|([123]+)[a-z]+)\\W+");
```

The branch reset (`(?|`) makes both `([aeiou]+)` and `([123]+)` group 1 (and because there's an alternation - just one or the other - only one of these expressions is captured). Using this feature, there would be no need to loop through the groups, testing whether each one is `null`. I could just get group 1 directly (`m.group(1)` would always have a value).

But Java doesn't support branch reset, and the code above throws an exception:

```none
java.util.regex.PatternSyntaxException: Unknown inline modifier near index 2
(?|([aeiou]+)[0-9]+|([123]+)[a-z]+)\W+
  ^
```

I'm using Java 8, and taking a look at the [Java 14 docs](https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/util/regex/Pattern.html), we can see that this feature is still not supported (in [Java 15 preview](https://download.java.net/java/early_access/jdk15/docs/api/java.base/java/util/regex/Pattern.html) there's also no mention of it).

I also checked an [alternative solution for .NET](https://stackoverflow.com/a/5378077): use [*named groups*][3] with the same name for all groups, but that didn't work in Java either:

```java
Pattern p = Pattern.compile("(?:(?<somename>[aeiou]+)[0-9]+|(?<somename>[123]+)[a-z]+)\\W+");
```

This code throws an exception, because in Java [you can't have two or more groups with the same name](https://www.regular-expressions.info/named.html#duplicate):

```none
java.util.regex.PatternSyntaxException: Named capturing group <somename> is already defined near index 36
(?:(?<somename>[aeiou]+)[0-9]+|(?<somename>[123]+)[a-z]+)\W+
                                    ^
```

Is there a way to emulate branch reset in Java, or is the only solution to loop through the groups, testing whether each one is `null`?

[1]: https://www.regular-expressions.info/brackets.html
[2]: https://www.regular-expressions.info/alternation.html
[3]: https://www.regular-expressions.info/named.html