Post History
Answer
#1: Initial revision
Currently, [Java 16 is the latest version][1], and there's no support for branch reset yet.

But one - still far from ideal - alternative is to use [*lookarounds*](https://www.regular-expressions.info/lookaround.html):

```
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern pattern = Pattern.compile("([aeiou]+(?=\\d+\\W+)|[123]+(?=[a-z]+\\W+))");
Matcher matcher = pattern.matcher("ae123. 111abc!!");
while (matcher.find()) {
    System.out.println(matcher.group(1)); // prints "ae", then "111"
}
```

In the code above, `ae` and `111` are both in group 1, which *kinda* simulates what a branch reset does.

Basically, I use [alternation](https://www.regular-expressions.info/alternation.html) (the `|` character, which means "or") with two options.

The first option searches for the vowels, followed by a lookahead that verifies that they're followed by `\\d+\\W+` (one or more digits, then one or more non-word characters). As this last part is inside a lookahead - inside `(?= )` - it won't be part of the match. Lookarounds are zero-length assertions: they just check if something exists (hence, "assertion"), but their contents aren't returned as part of the match (hence, "zero length").

The second option searches for 1, 2 or 3, and the part that comes next (letters and `\W+`) is inside another lookahead.

Everything is inside parentheses, forming a single capturing group. This way, either the vowels or the digits 1/2/3 (but not what comes after them) will be in this group. Hence, the `Matcher` just needs to check group 1.

---

This might solve the simpler cases, but what if I needed two groups? For example, if the numbers after the vowels, or the letters after 1/2/3, also need to be in one group (in this case, in group 2). With branch reset, all we need is:

    (?|([aeiou]+)([0-9]+)|([123]+)([a-z]+))\W+

But using lookarounds, I have to do something similar to what I did before, using another alternation:

```
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern pattern = Pattern.compile("([aeiou]+(?=\\d+\\W+)|[123]+(?=[a-z]+\\W+))(\\d+|[a-z]+)(?=\\W+)");
Matcher matcher = pattern.matcher("ae123. 111abc!!");
while (matcher.find()) {
    // prints "ae" and "123" for the first match, "111" and "abc" for the second
    System.out.println(matcher.group(1) + "\t" + matcher.group(2));
}
```

In this case, group 2 is simpler than group 1, as it has only the digits or the letters.

The problem here is the redundancy: I have to repeat the digits and letters in the group 1 lookaheads, and again in group 2. That's because the lookahead just checks what's ahead, and then it "comes back" to where it was (in this case, to the point immediately after group 1). In order to have these characters in group 2, I need to put them in the expression again.

And if I needed more groups, the regex would become even more complex and redundant, with parts of the expression being repeated multiple times, turning it into a maintenance nightmare.

Also, this isn't a good solution for cases where each branch of the alternation can have a different number of groups (which would make the regex even more complicated).

---

Therefore, there's no good solution yet, at least not one that solves all the cases that a branch reset would, in a "clean and smooth" way. Perhaps there's no way to perfectly emulate it at all, and the only solution is to iterate over the groups, checking if they are set.

[1]: https://openjdk.java.net/projects/jdk/16/
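
For reference, that last fallback (iterating over the groups and checking which ones are set) could look roughly like the sketch below. It isn't a definitive implementation: the class name `BranchResetFallback`, the `extract` method and the `GROUPS_PER_BRANCH` constant are made up for illustration, and it assumes every branch of the alternation has the same number of groups. The regex is the branch reset one from above with `(?|` swapped for a plain non-capturing group `(?:`, so the first branch fills groups 1 and 2 and the second fills groups 3 and 4.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BranchResetFallback {
    // Plain alternation: branch 1 uses groups 1-2, branch 2 uses groups 3-4.
    private static final Pattern PATTERN =
            Pattern.compile("(?:([aeiou]+)([0-9]+)|([123]+)([a-z]+))\\W+");

    // Assumption of this sketch: every branch has exactly this many groups.
    private static final int GROUPS_PER_BRANCH = 2;

    // For each match, returns the groups of whichever branch matched,
    // renumbered from 0 as if both branches shared the same group numbers.
    static List<String[]> extract(String input) {
        List<String[]> results = new ArrayList<>();
        Matcher matcher = PATTERN.matcher(input);
        while (matcher.find()) {
            // Walk the branches; groups of a branch that didn't match are null.
            for (int start = 1; start <= matcher.groupCount(); start += GROUPS_PER_BRANCH) {
                if (matcher.group(start) != null) {
                    String[] groups = new String[GROUPS_PER_BRANCH];
                    for (int i = 0; i < GROUPS_PER_BRANCH; i++) {
                        groups[i] = matcher.group(start + i);
                    }
                    results.add(groups);
                    break;
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        for (String[] groups : extract("ae123. 111abc!!")) {
            System.out.println(groups[0] + "\t" + groups[1]);
        }
    }
}
```

The regex itself stays free of repetition, but the bookkeeping moves into the Java code, and it would need to change if the branches had different numbers of groups.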