Post History

How can I emulate regular expression's branch reset in Java?

2 answers · posted 4y ago by hkotsubo · last activity 3y ago by hkotsubo

Question java regex
#2: Post edited by hkotsubo · 2020-08-11T17:45:04Z (about 4 years ago)
#1: Initial revision by hkotsubo · 2020-08-11T16:21:57Z (about 4 years ago)
How can I emulate regular expression's branch reset in Java?
I've got this sample regex:

```java
Pattern p = Pattern.compile("(?:([aeiou]+)[0-9]+|([123]+)[a-z]+)\\W+");
```

It basically has the following parts:

- one or more lowercase vowels (`[aeiou]+`), followed by one or more digits (`[0-9]+`), **or**
- one or more digits 1, 2 or 3 (`[123]+`), followed by one or more lowercase letters (`[a-z]+`)
- all of this followed by one or more non-word characters (`\W+`), i.e. anything other than `[a-zA-Z0-9_]`
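To see the three parts at work, here is a minimal sketch that runs `find()` on a few inputs. The class name `PatternPartsDemo`, the helper `matches`, and the extra inputs are hypothetical, chosen only for illustration. Since `\W` excludes the underscore, an input like `"ae123_"` does not satisfy the final `\W+`.

```java
import java.util.regex.Pattern;

public class PatternPartsDemo {
    // The regex from the question: two alternatives, each with one capturing group,
    // both followed by one or more non-word characters.
    static final Pattern P = Pattern.compile("(?:([aeiou]+)[0-9]+|([123]+)[a-z]+)\\W+");

    // Hypothetical helper: true if the pattern is found anywhere in the input.
    public static boolean matches(String input) {
        return P.matcher(input).find();
    }

    public static void main(String[] args) {
        // Expected, in this order: true, true, false, false, false
        String[] inputs = {"ae123.", "111abc!!", "ae123", "ae123_", "abc"};
        for (String s : inputs) {
            System.out.format("%-10s -> %s%n", s, matches(s));
        }
    }
}
```

`"ae123"` fails because nothing follows the digits, and `"ae123_"` fails because `_` is a word character.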

There are also two [capturing groups][1]: one for the vowels, and another one for the digits 1, 2 or 3.
I'm using [alternation][2] (`|`), which means that only one of these groups will be captured. Example:

```java
Matcher m = p.matcher("ae123.");
if (m.find()) {
    int n = m.groupCount();
    for (int i = 1; i <= n; i++) {
        System.out.format("group %d: %s\n", i, m.group(i));
    }
}
```

In this case, only the first group is captured, and the output is:

> group 1: ae<br>
> group 2: null

But if the input string is `"111abc!!"`, the second group is captured, and the output is:

> group 1: null<br>
> group 2: 111

Therefore, to know which group was captured, I need to loop through the groups and check which one is not `null`.
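The loop-and-test workaround can at least be wrapped in a helper, so the calling code only deals with one value. This is a minimal sketch, not a branch reset: `firstNonNullGroup` and `capture` are hypothetical names. The helper returns the text of the first group that participated in the last match (or `null` if none did). For this particular regex, that is the value a branch reset would have put in group 1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstGroupDemo {
    static final Pattern P = Pattern.compile("(?:([aeiou]+)[0-9]+|([123]+)[a-z]+)\\W+");

    // Hypothetical helper: returns the first capturing group that participated
    // in the last match, or null if no group did.
    // Precondition: the matcher has just made a successful match (e.g. find() returned true).
    public static String firstNonNullGroup(Matcher m) {
        for (int i = 1; i <= m.groupCount(); i++) {
            String g = m.group(i);
            if (g != null) {
                return g;
            }
        }
        return null;
    }

    // Finds the first match in the input and returns the captured text,
    // or null if the pattern is not found.
    public static String capture(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? firstNonNullGroup(m) : null;
    }

    public static void main(String[] args) {
        System.out.println(capture("ae123."));   // ae
        System.out.println(capture("111abc!!")); // 111
    }
}
```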

---
Some regex engines support the [branch reset](https://www.rexegg.com/regex-disambiguation.html#branchreset) feature: when the alternation is wrapped in `(?|` and `)`, the numbering of capturing groups restarts in each alternative ([example](https://regex101.com/r/uaQYYi/1)). So the regex above *could* be written as:

```java
Pattern p = Pattern.compile("(?|([aeiou]+)[0-9]+|([123]+)[a-z]+)\\W+");
```

The branch reset (`(?|`) makes both `([aeiou]+)` and `([123]+)` group 1 (and because of the alternation, only one of them participates in any given match). With this feature, there would be no need to loop through the groups and test which one is not `null`: I could just get group 1 directly (after a successful match, `m.group(1)` would always have a value).

But Java doesn't support branch reset, and the code above throws an exception:

```none
java.util.regex.PatternSyntaxException: Unknown inline modifier near index 2
(?|([aeiou]+)[0-9]+|([123]+)[a-z]+)\W+
  ^
```
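To check this on a particular JDK, one can compile the pattern and catch `PatternSyntaxException`. A minimal sketch, where `BranchResetCheck` and `compiles` are hypothetical names. `getDescription()` and `getIndex()` are standard `PatternSyntaxException` methods, but the exact message text may vary between Java versions.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class BranchResetCheck {
    // Returns true if this JDK's java.util.regex accepts the given pattern.
    public static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // Error description and position, as reported by the JDK
            System.out.println(e.getDescription() + " near index " + e.getIndex());
            return false;
        }
    }

    public static void main(String[] args) {
        // Branch reset syntax, supported by some other engines (e.g. PCRE)
        System.out.println(compiles("(?|([aeiou]+)[0-9]+|([123]+)[a-z]+)\\W+"));
        // The original version with a non-capturing group, for comparison
        System.out.println(compiles("(?:([aeiou]+)[0-9]+|([123]+)[a-z]+)\\W+"));
    }
}
```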

I'm using Java 8, and according to the [Java 14 docs](https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/util/regex/Pattern.html), this feature is still not supported (the [Java 15 preview](https://download.java.net/java/early_access/jdk15/docs/api/java.base/java/util/regex/Pattern.html) docs don't mention it either).

I also tried an [alternative solution for .NET](https://stackoverflow.com/a/5378077): using [*named groups*][3] with the same name for all groups. But that doesn't work in Java either:

```java
Pattern p = Pattern.compile("(?:(?<somename>[aeiou]+)[0-9]+|(?<somename>[123]+)[a-z]+)\\W+");
```

This code throws an exception, because in Java [you can't have two or more groups with the same name](https://www.regular-expressions.info/named.html#duplicate):

```none
java.util.regex.PatternSyntaxException: Named capturing group <somename> is already defined near index 36
(?:(?<somename>[aeiou]+)[0-9]+|(?<somename>[123]+)[a-z]+)\W+
                                          ^
```
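Java does accept named groups as long as each name is unique. So one possible sketch gives each alternative its own name and picks whichever one is not `null`. The group names `vowels` and `digits` and the class `NamedGroupsDemo` are hypothetical. This still tests for `null`, as the loop above does, but by name instead of by index. `Matcher.group(String)` is available since Java 7.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupsDemo {
    // Each alternative gets its own, unique group name (duplicate names are rejected).
    static final Pattern P = Pattern.compile(
            "(?:(?<vowels>[aeiou]+)[0-9]+|(?<digits>[123]+)[a-z]+)\\W+");

    // Returns the text captured by whichever named group participated,
    // or null if the pattern is not found in the input.
    public static String capture(String input) {
        Matcher m = P.matcher(input);
        if (!m.find()) {
            return null;
        }
        String vowels = m.group("vowels");
        return vowels != null ? vowels : m.group("digits");
    }

    public static void main(String[] args) {
        System.out.println(capture("ae123."));   // ae
        System.out.println(capture("111abc!!")); // 111
    }
}
```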

Is there a way to emulate branch reset in Java, or is the only solution to loop through the groups, testing which one is not `null`?


  [1]: https://www.regular-expressions.info/brackets.html
  [2]: https://www.regular-expressions.info/alternation.html
  [3]: https://www.regular-expressions.info/named.html