Communities

Writing
Writing
Codidact Meta
Codidact Meta
The Great Outdoors
The Great Outdoors
Photography & Video
Photography & Video
Scientific Speculation
Scientific Speculation
Cooking
Cooking
Electrical Engineering
Electrical Engineering
Judaism
Judaism
Languages & Linguistics
Languages & Linguistics
Software Development
Software Development
Mathematics
Mathematics
Christianity
Christianity
Code Golf
Code Golf
Music
Music
Physics
Physics
Linux Systems
Linux Systems
Power Users
Power Users
Tabletop RPGs
Tabletop RPGs
Community Proposals
Community Proposals
tag:snake search within a tag
answers:0 unanswered questions
user:xxxx search by author id
score:0.5 posts with 0.5+ score
"snake oil" exact phrase
votes:4 posts with 4+ votes
created:<1w created < 1 week ago
post_type:xxxx type of post
Search help
Notifications
Mark all as read See all your notifications »
Code Reviews

Welcome to Software Development on Codidact!

Will you help us build our independent community of developers helping developers? We're small and trying to grow. We welcome questions about all aspects of software development, from design to code to QA and more. Got questions? Got answers? Got code you'd like someone to review? Please join us.

Comments on Detecting balanced parentheses in Python

Parent

Detecting balanced parentheses in Python

+7
−0

The problem

Given a string s containing just the characters '(', ')', '{', '}', '[' and ']', determine if the input string is valid.

An input string is valid if:

  1. Open brackets are closed by the same type of brackets.
  2. Open brackets are closed in the correct order.

The solution

def isValid(s: str) -> bool:
    if len(s) % 2 != 0:
        return False

    for i in range(len(s) // 2):
        s = s.replace("()", "").replace("[]", "").replace("{}", "")

    return len(s) == 0

My approach to the problem is replacing the pairs. The string is balanced if the string is empty after replacing len(str) // 2 times. Is this a good approach? How can I improve my algorithm?

History
Why does this post require moderator attention?
You might want to add some details to your flag.
Why should this post be closed?

0 comment threads

Post
+5
−0

Instead of replacing the brackets, you could do just one loop, and keep a stack with the opening brackets. Every time you find a closing bracket, check if it corresponds to the stack top: if it's not, the string is invalid, else, pop the opening bracket and proceed to the next character.

Do it until the end of the string, and the stack will be empty if it's a valid one:

def is_balanced(s, brackets={ '(': ')', '[': ']', '{': '}' }):
    if len(s) % 2 != 0:
        return False

    open_brackets = set(brackets.keys())
    close_brackets = set(brackets.values())
    stack = []
    for c in s:
        if c in open_brackets:
            stack.append(c)
        elif c in close_brackets:
            # close bracket without respective opening, string is invalid
            if len(stack) == 0:
                return False
            # close bracket doesn't correspond to last open bracket, string is invalid
            if c != brackets[stack[-1]]:
                return False
            stack.pop()
        else: # invalid char
            return False

    return len(stack) == 0

I put the keys and values inside set's (because searching with the in operator is O(1)), but it didn't make much difference. You can remove it if you want.


I admit I initially thought that calling replace 3 times per iteration, and always do it N / 2 times, would be slower, but to my surprise I found out that it depends on the string being tested. A quick test with timeit module (using your isValid function and is_balanced as defined above):

from timeit import timeit

# run 10 times each test
params = { 'number' : 10, 'globals': globals() }

s = '{[(())]{()[()]}}' * 1000
s += '))'
s += '{[(())]{()[()]}}' * 1000
print('--------------\nUnbalanced in the middle')
print(timeit('isValid(s)', **params))
print(timeit('is_balanced(s)', **params))

s = '))'
s += '{[(())]{()[()]}}' * 1000
s += '{[(())]{()[()]}}' * 1000
print('--------------\nUnbalanced in the beginning')
print(timeit('isValid(s)', **params))
print(timeit('is_balanced(s)', **params))

s = '{[(())]{()[()]}}' * 2000
s += '))'
print('--------------\nUnbalanced in the end')
print(timeit('isValid(s)', **params))
print(timeit('is_balanced(s)', **params))

s = '{[(())]{()[()]}}' * 2000
print('--------------\nBalanced')
print(timeit('isValid(s)', **params))
print(timeit('is_balanced(s)', **params))

In my machine, I've got these results:

--------------
Unbalanced in the middle
0.03587988700019196
0.02324946300359443
--------------
Unbalanced in the beginning
0.026359478004451375
1.2465003237593919e-05
--------------
Unbalanced in the end
0.025080425999476574
0.041097565001109615
--------------
Balanced
0.022329219995299354
0.04195229199831374

For a balanced string, surprisingly - at least to me - the replace approach is faster.

For unbalanced strings, it depends on where the unbalanced brackets are. If they are in the beginning or in the middle, the stack approach wins (with a huge margin if the problem is in the beginning, due to the else clause that checks an invalid char as soon as it finds one). But if they are in the end, the replace approach wins.


I really thought that using replace would always be slower, because each call to replace needs to scan the whole string, searching for the substring to be replaced (and return a new string). Even if you're looping N / 2 times, you're calling replace 3 times per iteration, which I really thought it'd have a huge cost. But that wasn't the case, and this really surprised me.

The only case where the difference is really huge is the aforementioned case, when the unbalanced bracket is in the beginning. One particular case is curious:

s = '{[((ab))]{()[()]}}' * 2000

With this, the replace approach took 4.5 seconds, while the stack approach took around 0.00001 seconds. Not sure if there's an implementation detail for the replace method, that makes this particular case very slow (tested on Python 3.8.5, Ubuntu 20.04).

Another case where the stack approach wins by one order of magnitude is:

# lots of opening brackets, followed by lots of closing brackets
s = ('(' * 2000) + (')' * 2000)

But as I said, I was expecting this difference in all cases. Anyway, that's why we should always test everything, instead of trusting on what we believe to be "obvious".


BTW, testing in another environments, the results vary. In IdeOne.com, the stack approach is always faster for unbalanced strings, and almost the same for balanced ones. And in Repl.it, the results were similar to my machine's.

History
Why does this post require moderator attention?
You might want to add some details to your flag.

1 comment thread

General comments (7 comments)
General comments
elgonzo‭ wrote almost 3 years ago

Your numbers don't appear to be "clean". Look at the "Unbalanced in the beginning" test. How could "is_balanced(s)" take more than 1 sec? Of course, it did. But this is not the performance characteristic of the alogrithm but rather some "disturbance" in your runtime environment. In a clean runtime environment, the execution time of this test run must be much lower than for example for "Unbalanced in the middle", as the algorithm will already bail out at the very first character.

elgonzo‭ wrote almost 3 years ago · edited almost 3 years ago

To ameliorate such effects of uncontrolled disturbing runtime effects and get "better" numbers without resorting to a full benchmark suite, run each individual test at least a couple hundred/thousand times and average the execution time. Also, prealloacte the stack to a size of len(s)/2 and performance numbers should improve (hopefully)

hkotsubo‭ wrote almost 3 years ago

@elgonzo The "unbalanced in the beginning" took 1.2465003237593919e-05 seconds - note the scientific notation, with "e-05" in the end, which actually means 0.0000124... seconds.

elgonzo‭ wrote almost 3 years ago

@hkotsubo‭ oops, i have to admit i didn't notice the scientific notation. My bad, i am sorry... :-)

hkotsubo‭ wrote almost 3 years ago · edited almost 3 years ago

And the timeit module already takes care of most "disturbances", and does the tests as clean as possible, running each test lots of times and doing some other stuff that I admit I don't fully know (but I've read some docs mentioning some things it does to eliminate any disturbances). Anyway, I increased the number argument and the results were similar

elgonzo‭ wrote almost 3 years ago · edited almost 3 years ago

Yeah, my false perception of an outlier due to me not noticing the E notation led me to a misbelief that the test functions where only executed once. Guess i need some shut-eye...

hkotsubo‭ wrote almost 3 years ago

@elgonzo AFAIK, lists in Python can't be preallocated. You can create one with N elements, and then use those already-created spaces. I've tested here, but that didn't make the code much faster.