Why do Python regexps seem to expect a match at the beginning, but not at the end?
My impression is that the regexps behave a little oddly:
>>> import re
>>> r=re.compile("test")
>>> r.match("test")
<re.Match object; span=(0, 4), match='test'>
>>> r.match("1test")
>>> r.match("test2")
<re.Match object; span=(0, 4), match='test'>
>>> r.match("1test2")
>>>
I have also tried python-pcre, and it behaves the same way. If the "regexp" is only a single word, I would expect it to behave as a substring match (or a full-line match). Instead, it seems to match lines starting with "test", but not the ones ending with it (or containing it somewhere).
Why?
2 answers
The documentation for re.match(...) is explicit that it only matches at position 0.
If you're asking this question, what you probably want is re.search(...) to match at any point within the string.
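As a quick illustration, here is a minimal sketch that runs the four strings from the question through both calls. The variable names are only for illustration.

```python
import re

pattern = re.compile("test")
samples = ["test", "1test", "test2", "1test2"]

# match() only tries position 0; search() scans forward for the first hit.
match_hits = [s for s in samples if pattern.match(s)]
search_hits = [s for s in samples if pattern.search(s)]

print(match_hits)   # ['test', 'test2']
print(search_hits)  # ['test', '1test', 'test2', '1test2']
```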
It seems matching lines starting with "test", but not the ones ending with it (or containing it somewhere).
Yes, re.match matches lines that start with the described pattern:
If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding Match.
To check that the pattern describes the entire line, use re.fullmatch; to do a substring match, use re.search.
There's also a separate heading in the documentation describing these differences.
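To make the three operations easier to compare side by side, here is a small sketch. The helper name classify is hypothetical and exists only for this example.

```python
import re

pattern = re.compile("test")

def classify(text):
    """Return which of the three operations succeed for `text`."""
    return {
        "match": pattern.match(text) is not None,          # start of the string only
        "fullmatch": pattern.fullmatch(text) is not None,  # must cover the whole string
        "search": pattern.search(text) is not None,        # anywhere in the string
    }

print(classify("test"))   # all three succeed
print(classify("test2"))  # match and search succeed, fullmatch does not
print(classify("1test"))  # only search succeeds
```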
Besides choosing the operation to do with the regex, you can also use "anchors" in the pattern that only match at the beginning or end of the string. To match at the beginning, use ^ as the first character in the regex; to match at the end, use $ as the last character (by default, $ also matches just before a trailing newline). These are "zero-width" matches; when the regex engine checks for them, it doesn't associate them with any characters from the input string. It only checks the current position as it's matching.
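A short sketch of the anchors in use, combined with search so that the anchor alone decides where the pattern may match. The names starts and ends are just for this example.

```python
import re

starts = re.compile(r"^test")  # anchored at the beginning
ends = re.compile(r"test$")    # anchored at the end

print(bool(starts.search("test2")))  # True
print(bool(starts.search("1test")))  # False
print(bool(ends.search("1test")))    # True
print(bool(ends.search("test2")))    # False

# The anchors are zero-width: the reported span covers only "test".
print(ends.search("1test").span())   # (1, 5)
```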
Why?
Of course we should have different functions to do different things. The remaining question is why we should have a match at all.
The simplest explanation I can think of is that it's easy to implement efficiently, and often useful. In particular, you can easily and efficiently implement fullmatch in terms of match: first match, and then see whether there is anything left in the input string after matching the pattern. But we can't do it the other way around: if we only have fullmatch and want to get the match effect, we can only modify the regex pattern to have "also match any characters after that" (.*), and matching against that takes extra time (or special work to optimize the regex engine).
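Both directions can be sketched roughly as follows. The helper names are hypothetical, and the simple "anything left?" check is only an approximation: for patterns with alternation or non-greedy quantifiers, a backtracking engine may have to try a longer alternative, so the naive check can disagree with the real re.fullmatch, as the last two lines show.

```python
import re

def fullmatch_via_match(pattern, text):
    """Hypothetical helper: match at the start, then check nothing is left."""
    m = re.match(pattern, text)
    return m is not None and m.end() == len(text)

def match_via_fullmatch(pattern, text):
    """Hypothetical helper: allow any trailing characters after the pattern."""
    # (?s:.*) lets the trailing part include newlines without changing
    # the flags that apply to the caller's pattern.
    return re.fullmatch(f"(?:{pattern})(?s:.*)", text) is not None

print(fullmatch_via_match("test", "test"))   # True
print(fullmatch_via_match("test", "test2"))  # False
print(match_via_fullmatch("test", "test2"))  # True
print(match_via_fullmatch("test", "1test"))  # False

# Caveat: match() stops at the first alternative that works ("a"),
# while fullmatch() goes on to try "ab".
print(fullmatch_via_match("a|ab", "ab"))        # False
print(re.fullmatch("a|ab", "ab") is not None)   # True
```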
Meanwhile, a search for a substring must be slower: in the worst case, you basically need to check at every position in the input.
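The "check at every position" idea can be sketched like this. It is only an illustration of the worst case, not how the re module actually implements search; the helper name is hypothetical.

```python
import re

def search_by_repeated_match(pattern, text):
    """Hypothetical helper: try an anchored match at each start position."""
    compiled = re.compile(pattern)
    for start in range(len(text) + 1):
        m = compiled.match(text, start)  # second argument is the start position
        if m is not None:
            return m
    return None

m = search_by_repeated_match("test", "1test2")
print(m.span())  # (1, 5)
```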
but not the ones ending with it
It should be noted that there isn't an efficient way to check whether the input ends with a regex. That's because matching a regex requires scanning forwards in the string from some starting point, but the regex pattern doesn't match a fixed amount of data. Therefore, if the pattern matches at the end, we don't know where to start looking - we need to do the slow searching operation, and then see if one of those matches is at the end.
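One way to express "ends with this pattern" is to let search do the scanning and anchor the pattern at the end. This is a sketch with a hypothetical helper name; it is still the slow searching operation underneath, just written as a single call.

```python
import re

def ends_with_pattern(pattern, text):
    """Hypothetical helper: search for the pattern followed by end of string."""
    # \Z matches only at the very end of the string.
    return re.search(rf"(?:{pattern})\Z", text) is not None

print(ends_with_pattern("test", "1test"))     # True
print(ends_with_pattern("test", "test2"))     # False
print(ends_with_pattern(r"\d+", "abc123"))    # True
```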