
Comments on Python Regex to parse multiple "word. word. word."

Post

Python Regex to parse multiple "word. word. word."

+6
−0

I'm trying to parse lines like "THIS. THAT..OTHER " so that the leading "THIS. THAT." is matched. The part I want is one or more <word><dot> items separated by single spaces, with no space after the last one. Based on the initial feedback, I've added more example lines to the lines array below.

My current results are:

re.compile('\\s?(?P<name>(?:\\w+\\.\\s(?=[\\.\\s]))+)', re.VERBOSE)
REGEX FAILED: [THIS. THAT..OTHER 

Here is the current code. I have tried several regexes, looking for one that requires a space after the dot for every word/dot combination except the last.

import re

lines = [
    'THIS. THAT..OTHER                  ',
    'THIS. THAT. ANOTHER..OTHER         ',
    'THIS..OTHER                        ',
]

pat = re.compile(r"\s?(?P<name>(?:\w+\.\s(?=[\.\s]))+)", re.VERBOSE)

print(pat)

for line in lines:
    x = pat.match(line)

    if x:

        if hasattr(x, 'groupdict'):
            print(x.groupdict())
        else:
            print(x.groups())
    else:
        print("REGEX FAILED: [{}]".format(line))
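For comparison, here is a minimal sketch of one pattern that appears to fit the three sample lines above. It is not taken from the post or from any answer; the pattern and the helper name `leading_words` are assumptions made for illustration. The posted pattern requires `\s` after every dot and then a lookahead for another dot or whitespace, so it already fails at "THIS. " followed by "T". The sketch instead makes the space part of each repeated item after the first, and uses a lookahead to require that a second dot follows the last item.

```python
import re

# Hypothetical pattern (an assumption, not from the post): one or more "WORD."
# items joined by a single whitespace character, ending where another dot follows.
PREFIX = re.compile(r"\s?(?P<name>\w+\.(?:\s\w+\.)*)(?=\.)")


def leading_words(line):
    """Return the leading 'WORD. WORD.' run, or None if the line has none."""
    m = PREFIX.match(line)
    return m.group("name") if m else None


lines = [
    "THIS. THAT..OTHER                  ",
    "THIS. THAT. ANOTHER..OTHER         ",
    "THIS..OTHER                        ",
]

for line in lines:
    print(repr(leading_words(line)))
```

Whether a line without the double dot should be rejected, as this sketch does, depends on the real data, so treat the trailing lookahead as an assumption to adjust.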

1 comment thread

General comments (3 comments)
ArtOfCode wrote almost 4 years ago

Not entirely clear on what parts of the line you want the regex to match?

CodeFarmer wrote almost 4 years ago · edited almost 4 years ago

Okay, I added code quotes around the section I'm trying to find. It's 'THIS. THAT.'


Patol75 wrote over 3 years ago

CodeFarmer, do you consider the existing answers unsatisfactory?