Code Reviews

+6
−0

# The problem

Given a string s containing just the characters `'('`, `')'`, `'{'`, `'}'`, `'['` and `']'`, determine if the input string is valid.

An input string is valid if:

1. Open brackets are closed by the same type of brackets.
2. Open brackets are closed in the correct order.

# The solution

``````def isValid(s: str) -> bool:
if len(s) % 2 != 0:
return False

for i in range(len(s) // 2):
s = s.replace("()", "").replace("[]", "").replace("{}", "")

return len(s) == 0
``````

My approach to the problem is replacing the pairs. The string is balanced if the string is empty after replacing `len(str) // 2` times. Is this a good approach? How can I improve my algorithm?

Why does this post require moderator attention?
Why should this post be closed?

+6
−0

Use a stack while just scanning your string once from left to right. No need for multiple (performance-wise) expensive string replacements. If implemented right, the stack will only ever contain at maximum `len(s)/2` elements.

The algorithm is rather straightforward:

Check first whether the string is of odd or even length. If the string is of odd length, it must be an unbalanced string (Your code already does this part, i just mentioned it here again simply for having a full description of the stack-based alogrithm.)

For any opening paranthesis/bracket/brace in the string, push some value/token that represents the type of this opening bracket onto the stack (that can be the opening bracket characters themselves, the associated closing brackets, or some other identifying values such as from an enum). If before doing a push the count of elements in the stack is already `len(s)/2` you know the string must be unbalanced and the routine can abort now.

For any closing bracket in the string, pop a value/token from the stack and check whether that value/token represents the same bracket type as the closing bracket. If the stack is empty before attempting to pop, or the popped value/token mismatches the type of the closing bracket in the string, you know the string must be unbalanced and the routine can abort now.

When you finished scanning the string from left to right and the stack is not empty, you again know that the string must be unbalanced.

When the stack is empty after completing the scan of the text string, you know the string was balanced (or the string was empty).

I leave it to you as an exercise to translate this algorithm into concrete Python code.

P.S.: As a further exercise, you might try implementing the same stack-based algorithm while avoiding using an explicit stack data structure. This can be achieved by implementing the algorithm in a recursive manner, with the call stack conceptually taking the role of the stack that's keeping track of the (nested) bracket pairs.

Why does this post require moderator attention?

+4
−0

Instead of replacing the brackets, you could do just one loop, and keep a stack with the opening brackets. Every time you find a closing bracket, check if it corresponds to the stack top: if it's not, the string is invalid, else, pop the opening bracket and proceed to the next character.

Do it until the end of the string, and the stack will be empty if it's a valid one:

``````def is_balanced(s, brackets={ '(': ')', '[': ']', '{': '}' }):
if len(s) % 2 != 0:
return False

open_brackets = set(brackets.keys())
close_brackets = set(brackets.values())
stack = []
for c in s:
if c in open_brackets:
stack.append(c)
elif c in close_brackets:
# close bracket without respective opening, string is invalid
if len(stack) == 0:
return False
# close bracket doesn't correspond to last open bracket, string is invalid
if c != brackets[stack[-1]]:
return False
stack.pop()
else: # invalid char
return False

return len(stack) == 0
``````

I put the `keys` and `values` inside `set`'s (because searching with the `in` operator is O(1)), but it didn't make much difference. You can remove it if you want.

I admit I initially thought that calling `replace` 3 times per iteration, and always do it N / 2 times, would be slower, but to my surprise I found out that it depends on the string being tested. A quick test with `timeit` module (using your `isValid` function and `is_balanced` as defined above):

``````from timeit import timeit

# run 10 times each test
params = { 'number' : 10, 'globals': globals() }

s = '{[(())]{()[()]}}' * 1000
s += '))'
s += '{[(())]{()[()]}}' * 1000
print('--------------\nUnbalanced in the middle')
print(timeit('isValid(s)', **params))
print(timeit('is_balanced(s)', **params))

s = '))'
s += '{[(())]{()[()]}}' * 1000
s += '{[(())]{()[()]}}' * 1000
print('--------------\nUnbalanced in the beginning')
print(timeit('isValid(s)', **params))
print(timeit('is_balanced(s)', **params))

s = '{[(())]{()[()]}}' * 2000
s += '))'
print('--------------\nUnbalanced in the end')
print(timeit('isValid(s)', **params))
print(timeit('is_balanced(s)', **params))

s = '{[(())]{()[()]}}' * 2000
print('--------------\nBalanced')
print(timeit('isValid(s)', **params))
print(timeit('is_balanced(s)', **params))
``````

In my machine, I've got these results:

``````--------------
Unbalanced in the middle
0.03587988700019196
0.02324946300359443
--------------
Unbalanced in the beginning
0.026359478004451375
1.2465003237593919e-05
--------------
Unbalanced in the end
0.025080425999476574
0.041097565001109615
--------------
Balanced
0.022329219995299354
0.04195229199831374
``````

For a balanced string, surprisingly - at least to me - the `replace` approach is faster.

For unbalanced strings, it depends on where the unbalanced brackets are. If they are in the beginning or in the middle, the stack approach wins (with a huge margin if the problem is in the beginning, due to the `else` clause that checks an invalid char as soon as it finds one). But if they are in the end, the `replace` approach wins.

I really thought that using `replace` would always be slower, because each call to `replace` needs to scan the whole string, searching for the substring to be replaced (and return a new string). Even if you're looping N / 2 times, you're calling `replace` 3 times per iteration, which I really thought it'd have a huge cost. But that wasn't the case, and this really surprised me.

The only case where the difference is really huge is the aforementioned case, when the unbalanced bracket is in the beginning. One particular case is curious:

``````s = '{[((ab))]{()[()]}}' * 2000
``````

With this, the `replace` approach took 4.5 seconds, while the stack approach took around 0.00001 seconds. Not sure if there's an implementation detail for the `replace` method, that makes this particular case very slow (tested on Python 3.8.5, Ubuntu 20.04).

Another case where the stack approach wins by one order of magnitude is:

``````# lots of opening brackets, followed by lots of closing brackets
s = ('(' * 2000) + (')' * 2000)
``````

But as I said, I was expecting this difference in all cases. Anyway, that's why we should always test everything, instead of trusting on what we believe to be "obvious".

BTW, testing in another environments, the results vary. In IdeOne.com, the stack approach is always faster for unbalanced strings, and almost the same for balanced ones. And in Repl.it, the results were similar to my machine's.

Why does this post require moderator attention?

+1
−0

You've got an inefficiency in your code, as you always do replacements 3/2 times the length of the string. That is unnecessarily expensive.

By instead testing in each iteration whether the length actually changed, you get a much improved performance:

``````def is_valid(s: str) -> bool:
if len(s) % 1 != 0:
return False

# this initial value causes the loop to be skipped entirely for empty strings
prevlen = 0

# stop as soon as no further replacements have been made
while len(s) != prevlen:
prevlen = len(s)
s = s.replace("()", "").replace("[]", "").replace("{}", "")

return len(s) == 0
``````

I've put it together with your code and hkotsubo's timing code on tio.run, and got the following:

``````--------------
Unbalanced in the middle
0.04957069695228711
0.002779866976197809
--------------
Unbalanced in the beginning
0.05233071401016787
0.0026999289984814823
--------------
Unbalanced in the end
0.05092682002577931
0.0026755660073831677
--------------
Balanced
0.047405752004124224
0.002398615994025022
``````

Try it online!

Why does this post require moderator attention? 