Welcome to Software Development on Codidact!
Will you help us build our independent community of developers helping developers? We're small and trying to grow. We welcome questions about all aspects of software development, from design to code to QA and more. Got questions? Got answers? Got code you'd like someone to review? Please join us.
Post History
Instead of replacing the brackets, you could do just one loop, and keep a stack with the opening brackets. Every time you find a closing bracket, check if it corresponds to the stack top: if it's n...
Answer
#5: Post edited
- Instead of replacing the brackets, you could do just one loop, and keep a stack with the opening brackets. Every time you find a closing bracket, check if it corresponds to the stack top: if it's not, the string is invalid, else, pop the opening bracket and proceed to the next character.
- Do it until the end of the string, and the stack will be empty if it's a valid one:
- ```python
- def is_balanced(s, brackets={ '(': ')', '[': ']', '{': '}' }):
- if len(s) % 2 != 0:
- return False
- open_brackets = set(brackets.keys())
- close_brackets = set(brackets.values())
- stack = []
- for c in s:
- if c in open_brackets:
- stack.append(c)
- elif c in close_brackets:
- # close bracket without respective opening, string is invalid
- if len(stack) == 0:
- return False
- # close bracket doesn't correspond to last open bracket, string is invalid
- if c != brackets[stack[-1]]:
- return False
- stack.pop()
- else: # invalid char
- return False
- return len(stack) == 0
- ```
- I put the `keys` and `values` inside `set`'s (because searching with the `in` operator is O(1)), but it didn't make much difference. You can remove it if you want.
- ---
- I admit I initially thought that calling `replace` 3 times per iteration, and always do it N / 2 times, would be slower, but to my surprise I found out that it depends on the string being tested. A quick test with [`timeit` module](https://docs.python.org/3/library/timeit.html) (using your `isValid` function and `is_balanced` as defined above):
- ```python
- from timeit import timeit
- # run 10 times each test
- params = { 'number' : 10, 'globals': globals() }
- s = '{[(())]{()[()]}}' * 1000
- s += '))'
- s += '{[(())]{()[()]}}' * 1000
- print('--------------\nUnbalanced in the middle')
- print(timeit('isValid(s)', **params))
- print(timeit('is_balanced(s)', **params))
- s = '))'
- s += '{[(())]{()[()]}}' * 1000
- s += '{[(())]{()[()]}}' * 1000
- print('--------------\nUnbalanced in the beginning')
- print(timeit('isValid(s)', **params))
- print(timeit('is_balanced(s)', **params))
- s = '{[(())]{()[()]}}' * 2000
- s += '))'
- print('--------------\nUnbalanced in the end')
- print(timeit('isValid(s)', **params))
- print(timeit('is_balanced(s)', **params))
- s = '{[(())]{()[()]}}' * 2000
- print('--------------\nBalanced')
- print(timeit('isValid(s)', **params))
- print(timeit('is_balanced(s)', **params))
- ```
- In my machine, I've got these results:
- ```none
- --------------
- Unbalanced in the middle
- 0.03587988700019196
- 0.02324946300359443
- --------------
- Unbalanced in the beginning
- 0.026359478004451375
- 1.2465003237593919e-05
- --------------
- Unbalanced in the end
- 0.025080425999476574
- 0.041097565001109615
- --------------
- Balanced
- 0.022329219995299354
- 0.04195229199831374
- ```
- For a balanced string, surprisingly - at least to me - the `replace` approach is faster.
- For unbalanced strings, it depends on where the unbalanced brackets are. If they are in the beginning or in the middle, the stack approach wins (with a huge margin if the problem is in the beginning, due to the `else` clause that checks an invalid char as soon as it finds one). But if they are in the end, the `replace` approach wins.
- ---
I really thought that using `replace` would always be slower, because each call to `replace` needs to scan the whole string, searching for the substring to be replaced (and return a new string). Even if you're looping N / 2 times, you're calling `replace` 3 times per iteration, which I really thought it'd have a huge cost. But that wasn' the case, and this really surprised me.- The only case where the difference is really huge is the aforementioned case, when the unbalanced bracket is in the beginning. One particular case is curious:
- ```python
- s = '{[((ab))]{()[()]}}' * 2000
- ```
- With this, the `replace` approach took 4.5 seconds, while the stack approach took around 0.00001 seconds. Not sure if there's an implementation detail for the `replace` method, that makes this particular case very slow (tested on Python 3.8.5, Ubuntu 20.04).
- Another case where the stack approach wins by one order of magnitude is:
- ```python
- # lots of opening brackets, followed by lots of closing brackets
- s = ('(' * 2000) + (')' * 2000)
- ```
- But as I said, I was expecting this difference in all cases. Anyway, that's why we should always test everything, instead of trusting on what we believe to be "obvious".
- ---
- BTW, testing in another environments, the results vary. [In IdeOne.com](https://ideone.com/vaKzgX), the stack approach is always faster for unbalanced strings, and almost the same for balanced ones. And [in Repl.it](https://replit.com/@hkotsubo/TautHollowStructure#main.py), the results were similar to my machine's.
- Instead of replacing the brackets, you could do just one loop, and keep a stack with the opening brackets. Every time you find a closing bracket, check if it corresponds to the stack top: if it's not, the string is invalid, else, pop the opening bracket and proceed to the next character.
- Do it until the end of the string, and the stack will be empty if it's a valid one:
- ```python
- def is_balanced(s, brackets={ '(': ')', '[': ']', '{': '}' }):
- if len(s) % 2 != 0:
- return False
- open_brackets = set(brackets.keys())
- close_brackets = set(brackets.values())
- stack = []
- for c in s:
- if c in open_brackets:
- stack.append(c)
- elif c in close_brackets:
- # close bracket without respective opening, string is invalid
- if len(stack) == 0:
- return False
- # close bracket doesn't correspond to last open bracket, string is invalid
- if c != brackets[stack[-1]]:
- return False
- stack.pop()
- else: # invalid char
- return False
- return len(stack) == 0
- ```
- I put the `keys` and `values` inside `set`'s (because searching with the `in` operator is O(1)), but it didn't make much difference. You can remove it if you want.
- ---
- I admit I initially thought that calling `replace` 3 times per iteration, and always do it N / 2 times, would be slower, but to my surprise I found out that it depends on the string being tested. A quick test with [`timeit` module](https://docs.python.org/3/library/timeit.html) (using your `isValid` function and `is_balanced` as defined above):
- ```python
- from timeit import timeit
- # run 10 times each test
- params = { 'number' : 10, 'globals': globals() }
- s = '{[(())]{()[()]}}' * 1000
- s += '))'
- s += '{[(())]{()[()]}}' * 1000
- print('--------------\nUnbalanced in the middle')
- print(timeit('isValid(s)', **params))
- print(timeit('is_balanced(s)', **params))
- s = '))'
- s += '{[(())]{()[()]}}' * 1000
- s += '{[(())]{()[()]}}' * 1000
- print('--------------\nUnbalanced in the beginning')
- print(timeit('isValid(s)', **params))
- print(timeit('is_balanced(s)', **params))
- s = '{[(())]{()[()]}}' * 2000
- s += '))'
- print('--------------\nUnbalanced in the end')
- print(timeit('isValid(s)', **params))
- print(timeit('is_balanced(s)', **params))
- s = '{[(())]{()[()]}}' * 2000
- print('--------------\nBalanced')
- print(timeit('isValid(s)', **params))
- print(timeit('is_balanced(s)', **params))
- ```
- In my machine, I've got these results:
- ```none
- --------------
- Unbalanced in the middle
- 0.03587988700019196
- 0.02324946300359443
- --------------
- Unbalanced in the beginning
- 0.026359478004451375
- 1.2465003237593919e-05
- --------------
- Unbalanced in the end
- 0.025080425999476574
- 0.041097565001109615
- --------------
- Balanced
- 0.022329219995299354
- 0.04195229199831374
- ```
- For a balanced string, surprisingly - at least to me - the `replace` approach is faster.
- For unbalanced strings, it depends on where the unbalanced brackets are. If they are in the beginning or in the middle, the stack approach wins (with a huge margin if the problem is in the beginning, due to the `else` clause that checks an invalid char as soon as it finds one). But if they are in the end, the `replace` approach wins.
- ---
- I really thought that using `replace` would always be slower, because each call to `replace` needs to scan the whole string, searching for the substring to be replaced (and return a new string). Even if you're looping N / 2 times, you're calling `replace` 3 times per iteration, which I really thought it'd have a huge cost. But that wasn't the case, and this really surprised me.
- The only case where the difference is really huge is the aforementioned case, when the unbalanced bracket is in the beginning. One particular case is curious:
- ```python
- s = '{[((ab))]{()[()]}}' * 2000
- ```
- With this, the `replace` approach took 4.5 seconds, while the stack approach took around 0.00001 seconds. Not sure if there's an implementation detail for the `replace` method, that makes this particular case very slow (tested on Python 3.8.5, Ubuntu 20.04).
- Another case where the stack approach wins by one order of magnitude is:
- ```python
- # lots of opening brackets, followed by lots of closing brackets
- s = ('(' * 2000) + (')' * 2000)
- ```
- But as I said, I was expecting this difference in all cases. Anyway, that's why we should always test everything, instead of trusting on what we believe to be "obvious".
- ---
- BTW, testing in another environments, the results vary. [In IdeOne.com](https://ideone.com/vaKzgX), the stack approach is always faster for unbalanced strings, and almost the same for balanced ones. And [in Repl.it](https://replit.com/@hkotsubo/TautHollowStructure#main.py), the results were similar to my machine's.
#4: Post edited
Instead of replacing the brackets, you could do just one loop, and keep a stack with the opening brackets. Every time you find a closing bracket, check if it corresponds to the stack top: if it's not, the string is valid, else, pop the opening bracket.- Do it until the end of the string, and the stack will be empty if it's a valid one:
- ```python
- def is_balanced(s, brackets={ '(': ')', '[': ']', '{': '}' }):
- if len(s) % 2 != 0:
- return False
- open_brackets = set(brackets.keys())
- close_brackets = set(brackets.values())
- stack = []
- for c in s:
- if c in open_brackets:
- stack.append(c)
- elif c in close_brackets:
- # close bracket without respective opening, string is invalid
- if len(stack) == 0:
- return False
- # close bracket doesn't correspond to last open bracket, string is invalid
- if c != brackets[stack[-1]]:
- return False
- stack.pop()
- else: # invalid char
- return False
- return len(stack) == 0
- ```
- I put the `keys` and `values` inside `set`'s (because searching with the `in` operator is O(1)), but it didn't make much difference. You can remove it if you want.
- ---
- I admit I initially thought that calling `replace` 3 times per iteration, and always do it N / 2 times, would be slower, but to my surprise I found out that it depends on the string being tested. A quick test with [`timeit` module](https://docs.python.org/3/library/timeit.html) (using your `isValid` function and `is_balanced` as defined above):
- ```python
- from timeit import timeit
- # run 10 times each test
- params = { 'number' : 10, 'globals': globals() }
- s = '{[(())]{()[()]}}' * 1000
- s += '))'
- s += '{[(())]{()[()]}}' * 1000
- print('--------------\nUnbalanced in the middle')
- print(timeit('isValid(s)', **params))
- print(timeit('is_balanced(s)', **params))
- s = '))'
- s += '{[(())]{()[()]}}' * 1000
- s += '{[(())]{()[()]}}' * 1000
- print('--------------\nUnbalanced in the beginning')
- print(timeit('isValid(s)', **params))
- print(timeit('is_balanced(s)', **params))
- s = '{[(())]{()[()]}}' * 2000
- s += '))'
- print('--------------\nUnbalanced in the end')
- print(timeit('isValid(s)', **params))
- print(timeit('is_balanced(s)', **params))
- s = '{[(())]{()[()]}}' * 2000
- print('--------------\nBalanced')
- print(timeit('isValid(s)', **params))
- print(timeit('is_balanced(s)', **params))
- ```
- In my machine, I've got these results:
- ```none
- --------------
- Unbalanced in the middle
- 0.03587988700019196
- 0.02324946300359443
- --------------
- Unbalanced in the beginning
- 0.026359478004451375
- 1.2465003237593919e-05
- --------------
- Unbalanced in the end
- 0.025080425999476574
- 0.041097565001109615
- --------------
- Balanced
- 0.022329219995299354
- 0.04195229199831374
- ```
- For a balanced string, surprisingly - at least to me - the `replace` approach is faster.
- For unbalanced strings, it depends on where the unbalanced brackets are. If they are in the beginning or in the middle, the stack approach wins (with a huge margin if the problem is in the beginning, due to the `else` clause that checks an invalid char as soon as it finds one). But if they are in the end, the `replace` approach wins.
- ---
- I really thought that using `replace` would always be slower, because each call to `replace` needs to scan the whole string, searching for the substring to be replaced (and return a new string). Even if you're looping N / 2 times, you're calling `replace` 3 times per iteration, which I really thought it'd have a huge cost. But that wasn' the case, and this really surprised me.
- The only case where the difference is really huge is the aforementioned case, when the unbalanced bracket is in the beginning. One particular case is curious:
- ```python
- s = '{[((ab))]{()[()]}}' * 2000
- ```
- With this, the `replace` approach took 4.5 seconds, while the stack approach took around 0.00001 seconds. Not sure if there's an implementation detail for the `replace` method, that makes this particular case very slow (tested on Python 3.8.5, Ubuntu 20.04).
- Another case where the stack approach wins by one order of magnitude is:
- ```python
- # lots of opening brackets, followed by lots of closing brackets
- s = ('(' * 2000) + (')' * 2000)
- ```
- But as I said, I was expecting this difference in all cases. Anyway, that's why we should always test everything, instead of trusting on what we believe to be "obvious".
- ---
- BTW, testing in another environments, the results vary. [In IdeOne.com](https://ideone.com/vaKzgX), the stack approach is always faster for unbalanced strings, and almost the same for balanced ones. And [in Repl.it](https://replit.com/@hkotsubo/TautHollowStructure#main.py), the results were similar to my machine's.
- Instead of replacing the brackets, you could do just one loop, and keep a stack with the opening brackets. Every time you find a closing bracket, check if it corresponds to the stack top: if it's not, the string is invalid, else, pop the opening bracket and proceed to the next character.
- Do it until the end of the string, and the stack will be empty if it's a valid one:
- ```python
- def is_balanced(s, brackets={ '(': ')', '[': ']', '{': '}' }):
- if len(s) % 2 != 0:
- return False
- open_brackets = set(brackets.keys())
- close_brackets = set(brackets.values())
- stack = []
- for c in s:
- if c in open_brackets:
- stack.append(c)
- elif c in close_brackets:
- # close bracket without respective opening, string is invalid
- if len(stack) == 0:
- return False
- # close bracket doesn't correspond to last open bracket, string is invalid
- if c != brackets[stack[-1]]:
- return False
- stack.pop()
- else: # invalid char
- return False
- return len(stack) == 0
- ```
- I put the `keys` and `values` inside `set`'s (because searching with the `in` operator is O(1)), but it didn't make much difference. You can remove it if you want.
- ---
- I admit I initially thought that calling `replace` 3 times per iteration, and always do it N / 2 times, would be slower, but to my surprise I found out that it depends on the string being tested. A quick test with [`timeit` module](https://docs.python.org/3/library/timeit.html) (using your `isValid` function and `is_balanced` as defined above):
- ```python
- from timeit import timeit
- # run 10 times each test
- params = { 'number' : 10, 'globals': globals() }
- s = '{[(())]{()[()]}}' * 1000
- s += '))'
- s += '{[(())]{()[()]}}' * 1000
- print('--------------\nUnbalanced in the middle')
- print(timeit('isValid(s)', **params))
- print(timeit('is_balanced(s)', **params))
- s = '))'
- s += '{[(())]{()[()]}}' * 1000
- s += '{[(())]{()[()]}}' * 1000
- print('--------------\nUnbalanced in the beginning')
- print(timeit('isValid(s)', **params))
- print(timeit('is_balanced(s)', **params))
- s = '{[(())]{()[()]}}' * 2000
- s += '))'
- print('--------------\nUnbalanced in the end')
- print(timeit('isValid(s)', **params))
- print(timeit('is_balanced(s)', **params))
- s = '{[(())]{()[()]}}' * 2000
- print('--------------\nBalanced')
- print(timeit('isValid(s)', **params))
- print(timeit('is_balanced(s)', **params))
- ```
- In my machine, I've got these results:
- ```none
- --------------
- Unbalanced in the middle
- 0.03587988700019196
- 0.02324946300359443
- --------------
- Unbalanced in the beginning
- 0.026359478004451375
- 1.2465003237593919e-05
- --------------
- Unbalanced in the end
- 0.025080425999476574
- 0.041097565001109615
- --------------
- Balanced
- 0.022329219995299354
- 0.04195229199831374
- ```
- For a balanced string, surprisingly - at least to me - the `replace` approach is faster.
- For unbalanced strings, it depends on where the unbalanced brackets are. If they are in the beginning or in the middle, the stack approach wins (with a huge margin if the problem is in the beginning, due to the `else` clause that checks an invalid char as soon as it finds one). But if they are in the end, the `replace` approach wins.
- ---
- I really thought that using `replace` would always be slower, because each call to `replace` needs to scan the whole string, searching for the substring to be replaced (and return a new string). Even if you're looping N / 2 times, you're calling `replace` 3 times per iteration, which I really thought it'd have a huge cost. But that wasn' the case, and this really surprised me.
- The only case where the difference is really huge is the aforementioned case, when the unbalanced bracket is in the beginning. One particular case is curious:
- ```python
- s = '{[((ab))]{()[()]}}' * 2000
- ```
- With this, the `replace` approach took 4.5 seconds, while the stack approach took around 0.00001 seconds. Not sure if there's an implementation detail for the `replace` method, that makes this particular case very slow (tested on Python 3.8.5, Ubuntu 20.04).
- Another case where the stack approach wins by one order of magnitude is:
- ```python
- # lots of opening brackets, followed by lots of closing brackets
- s = ('(' * 2000) + (')' * 2000)
- ```
- But as I said, I was expecting this difference in all cases. Anyway, that's why we should always test everything, instead of trusting on what we believe to be "obvious".
- ---
- BTW, testing in another environments, the results vary. [In IdeOne.com](https://ideone.com/vaKzgX), the stack approach is always faster for unbalanced strings, and almost the same for balanced ones. And [in Repl.it](https://replit.com/@hkotsubo/TautHollowStructure#main.py), the results were similar to my machine's.
#3: Post edited
- Instead of replacing the brackets, you could do just one loop, and keep a stack with the opening brackets. Every time you find a closing bracket, check if it corresponds to the stack top: if it's not, the string is valid, else, pop the opening bracket.
- Do it until the end of the string, and the stack will be empty if it's a valid one:
- ```python
- def is_balanced(s, brackets={ '(': ')', '[': ']', '{': '}' }):
- if len(s) % 2 != 0:
- return False
- open_brackets = set(brackets.keys())
- close_brackets = set(brackets.values())
- stack = []
- for c in s:
- if c in open_brackets:
- stack.append(c)
- elif c in close_brackets:
- # close bracket without respective opening, string is invalid
- if len(stack) == 0:
- return False
- # close bracket doesn't correspond to last open bracket, string is invalid
- if c != brackets[stack[-1]]:
- return False
- stack.pop()
- else: # invalid char
- return False
- return len(stack) == 0
- ```
- I put the `keys` and `values` inside `set`'s (because searching with the `in` operator is O(1)), but it didn't make much difference. You can remove it if you want.
- ---
- I admit I initially thought that calling `replace` 3 times per iteration, and always do it N / 2 times, would be slower, but to my surprise I found out that it depends on the string being tested. A quick test with [`timeit` module](https://docs.python.org/3/library/timeit.html) (using your `isValid` function and `is_balanced` as defined above):
- ```python
- from timeit import timeit
- # run 10 times each test
- params = { 'number' : 10, 'globals': globals() }
- s = '{[(())]{()[()]}}' * 1000
- s += '))'
- s += '{[(())]{()[()]}}' * 1000
- print('--------------\nUnbalanced in the middle')
- print(timeit('isValid(s)', **params))
- print(timeit('is_balanced(s)', **params))
- s = '))'
- s += '{[(())]{()[()]}}' * 1000
- s += '{[(())]{()[()]}}' * 1000
- print('--------------\nUnbalanced in the beginning')
- print(timeit('isValid(s)', **params))
- print(timeit('is_balanced(s)', **params))
- s = '{[(())]{()[()]}}' * 2000
- s += '))'
- print('--------------\nUnbalanced in the end')
- print(timeit('isValid(s)', **params))
- print(timeit('is_balanced(s)', **params))
- s = '{[(())]{()[()]}}' * 2000
- print('--------------\nBalanced')
- print(timeit('isValid(s)', **params))
- print(timeit('is_balanced(s)', **params))
- ```
- In my machine, I've got these results:
- ```none
- --------------
- Unbalanced in the middle
- 0.03587988700019196
- 0.02324946300359443
- --------------
- Unbalanced in the beginning
- 0.026359478004451375
- 1.2465003237593919e-05
- --------------
- Unbalanced in the end
- 0.025080425999476574
- 0.041097565001109615
- --------------
- Balanced
- 0.022329219995299354
- 0.04195229199831374
- ```
- For a balanced string, surprisingly - at least to me - the `replace` approach is faster.
- For unbalanced strings, it depends on where the unbalanced brackets are. If they are in the beginning or in the middle, the stack approach wins (with a huge margin if the problem is in the beginning, due to the `else` clause that checks an invalid char as soon as it finds one). But if they are in the end, the `replace` approach wins.
- ---
- I really thought that using `replace` would always be slower, because each call to `replace` needs to scan the whole string, searching for the substring to be replaced (and return a new string). Even if you're looping N / 2 times, you're calling `replace` 3 times per iteration, which I really thought it'd have a huge cost. But that wasn' the case, and this really surprised me.
- The only case where the difference is really huge is the aforementioned case, when the unbalanced bracket is in the beginning. One particular case is curious:
- ```python
- s = '{[((ab))]{()[()]}}' * 2000
- ```
With this, the `replace` approach took 4.5 seconds, while the stack approach took around 0.00001 seconds. Not sure if there's an implementation detail for the `replace` method, that makes this particular case very slow (tested on Python 3.8.5, Ubuntu 20.04), but that's the difference I was expecting in all cases. Anyway, that's why we should always test everything, instead of trusting on what we believe to be "obvious".- ---
- BTW, testing in another environments, the results vary. [In IdeOne.com](https://ideone.com/vaKzgX), the stack approach is always faster for unbalanced strings, and almost the same for balanced ones. And [in Repl.it](https://replit.com/@hkotsubo/TautHollowStructure#main.py), the results were similar to my machine's.
- Instead of replacing the brackets, you could do just one loop, and keep a stack with the opening brackets. Every time you find a closing bracket, check if it corresponds to the stack top: if it's not, the string is valid, else, pop the opening bracket.
- Do it until the end of the string, and the stack will be empty if it's a valid one:
- ```python
- def is_balanced(s, brackets={ '(': ')', '[': ']', '{': '}' }):
- if len(s) % 2 != 0:
- return False
- open_brackets = set(brackets.keys())
- close_brackets = set(brackets.values())
- stack = []
- for c in s:
- if c in open_brackets:
- stack.append(c)
- elif c in close_brackets:
- # close bracket without respective opening, string is invalid
- if len(stack) == 0:
- return False
- # close bracket doesn't correspond to last open bracket, string is invalid
- if c != brackets[stack[-1]]:
- return False
- stack.pop()
- else: # invalid char
- return False
- return len(stack) == 0
- ```
- I put the `keys` and `values` inside `set`'s (because searching with the `in` operator is O(1)), but it didn't make much difference. You can remove it if you want.
- ---
- I admit I initially thought that calling `replace` 3 times per iteration, and always do it N / 2 times, would be slower, but to my surprise I found out that it depends on the string being tested. A quick test with [`timeit` module](https://docs.python.org/3/library/timeit.html) (using your `isValid` function and `is_balanced` as defined above):
- ```python
- from timeit import timeit
- # run 10 times each test
- params = { 'number' : 10, 'globals': globals() }
- s = '{[(())]{()[()]}}' * 1000
- s += '))'
- s += '{[(())]{()[()]}}' * 1000
- print('--------------\nUnbalanced in the middle')
- print(timeit('isValid(s)', **params))
- print(timeit('is_balanced(s)', **params))
- s = '))'
- s += '{[(())]{()[()]}}' * 1000
- s += '{[(())]{()[()]}}' * 1000
- print('--------------\nUnbalanced in the beginning')
- print(timeit('isValid(s)', **params))
- print(timeit('is_balanced(s)', **params))
- s = '{[(())]{()[()]}}' * 2000
- s += '))'
- print('--------------\nUnbalanced in the end')
- print(timeit('isValid(s)', **params))
- print(timeit('is_balanced(s)', **params))
- s = '{[(())]{()[()]}}' * 2000
- print('--------------\nBalanced')
- print(timeit('isValid(s)', **params))
- print(timeit('is_balanced(s)', **params))
- ```
- In my machine, I've got these results:
- ```none
- --------------
- Unbalanced in the middle
- 0.03587988700019196
- 0.02324946300359443
- --------------
- Unbalanced in the beginning
- 0.026359478004451375
- 1.2465003237593919e-05
- --------------
- Unbalanced in the end
- 0.025080425999476574
- 0.041097565001109615
- --------------
- Balanced
- 0.022329219995299354
- 0.04195229199831374
- ```
- For a balanced string, surprisingly - at least to me - the `replace` approach is faster.
- For unbalanced strings, it depends on where the unbalanced brackets are. If they are in the beginning or in the middle, the stack approach wins (with a huge margin if the problem is in the beginning, due to the `else` clause that checks an invalid char as soon as it finds one). But if they are in the end, the `replace` approach wins.
- ---
- I really thought that using `replace` would always be slower, because each call to `replace` needs to scan the whole string, searching for the substring to be replaced (and return a new string). Even if you're looping N / 2 times, you're calling `replace` 3 times per iteration, which I really thought it'd have a huge cost. But that wasn' the case, and this really surprised me.
- The only case where the difference is really huge is the aforementioned case, when the unbalanced bracket is in the beginning. One particular case is curious:
- ```python
- s = '{[((ab))]{()[()]}}' * 2000
- ```
- With this, the `replace` approach took 4.5 seconds, while the stack approach took around 0.00001 seconds. Not sure if there's an implementation detail for the `replace` method, that makes this particular case very slow (tested on Python 3.8.5, Ubuntu 20.04).
- Another case where the stack approach wins by one order of magnitude is:
- ```python
- # lots of opening brackets, followed by lots of closing brackets
- s = ('(' * 2000) + (')' * 2000)
- ```
- But as I said, I was expecting this difference in all cases. Anyway, that's why we should always test everything, instead of trusting on what we believe to be "obvious".
- ---
- BTW, testing in another environments, the results vary. [In IdeOne.com](https://ideone.com/vaKzgX), the stack approach is always faster for unbalanced strings, and almost the same for balanced ones. And [in Repl.it](https://replit.com/@hkotsubo/TautHollowStructure#main.py), the results were similar to my machine's.
#2: Post edited
- Instead of replacing the brackets, you could do just one loop, and keep a stack with the opening brackets. Every time you find a closing bracket, check if it corresponds to the stack top: if it's not, the string is valid, else, pop the opening bracket.
- Do it until the end of the string, and the stack will be empty if it's a valid one:
- ```python
- def is_balanced(s, brackets={ '(': ')', '[': ']', '{': '}' }):
- if len(s) % 2 != 0:
- return False
- open_brackets = set(brackets.keys())
- close_brackets = set(brackets.values())
- stack = []
- for c in s:
- if c in open_brackets:
- stack.append(c)
- elif c in close_brackets:
- # close bracket without respective opening, string is invalid
- if len(stack) == 0:
- return False
- # close bracket doesn't correspond to last open bracket, string is invalid
- if c != brackets[stack[-1]]:
- return False
- stack.pop()
- else: # invalid char
- return False
- return len(stack) == 0
- ```
- ---
- I admit I initially thought that calling `replace` 3 times per iteration, and always do it N / 2 times, would be slower, but to my surprise I found out that it depends on the string being tested. A quick test with [`timeit` module](https://docs.python.org/3/library/timeit.html) (using your `isValid` function and `is_balanced` as defined above):
- ```python
- from timeit import timeit
- # run 10 times each test
- params = { 'number' : 10, 'globals': globals() }
- s = '{[(())]{()[()]}}' * 1000
- s += '))'
- s += '{[(())]{()[()]}}' * 1000
- print('--------------\nUnbalanced in the middle')
- print(timeit('isValid(s)', **params))
- print(timeit('is_balanced(s)', **params))
- s = '))'
- s += '{[(())]{()[()]}}' * 1000
- s += '{[(())]{()[()]}}' * 1000
- print('--------------\nUnbalanced in the beginning')
- print(timeit('isValid(s)', **params))
- print(timeit('is_balanced(s)', **params))
- s = '{[(())]{()[()]}}' * 2000
- s += '))'
- print('--------------\nUnbalanced in the end')
- print(timeit('isValid(s)', **params))
- print(timeit('is_balanced(s)', **params))
- s = '{[(())]{()[()]}}' * 2000
- print('--------------\nBalanced')
- print(timeit('isValid(s)', **params))
- print(timeit('is_balanced(s)', **params))
- ```
- In my machine, I've got these results:
- ```none
- --------------
- Unbalanced in the middle
- 0.03587988700019196
- 0.02324946300359443
- --------------
- Unbalanced in the beginning
- 0.026359478004451375
- 1.2465003237593919e-05
- --------------
- Unbalanced in the end
- 0.025080425999476574
- 0.041097565001109615
- --------------
- Balanced
- 0.022329219995299354
- 0.04195229199831374
- ```
- For a balanced string, surprisingly - at least to me - the `replace` approach is faster.
- For unbalanced strings, it depends on where the unbalanced brackets are. If they are in the beginning or in the middle, the stack approach wins (with a huge margin if the problem is in the beginning, due to the `else` clause that checks an invalid char as soon as it finds one). But if they are in the end, the `replace` approach wins.
- ---
- I really thought that using `replace` would always be slower, because each call to `replace` needs to scan the whole string, searching for the substring to be replaced (and return a new string). Even if you're looping N / 2 times, you're calling `replace` 3 times per iteration, which I really thought it'd have a huge cost. But that wasn' the case, and this really surprised me.
- The only case where the difference is really huge is the aforementioned case, when the unbalanced bracket is in the beginning. One particular case is curious:
- ```python
- s = '{[((ab))]{()[()]}}' * 2000
- ```
- With this, the `replace` approach took 4.5 seconds, while the stack approach took around 0.00001 seconds. Not sure if there's an implementation detail for the `replace` method, that makes this particular case very slow (tested on Python 3.8.5, Ubuntu 20.04), but that's the difference I was expecting in all cases. Anyway, that's why we should always test everything, instead of trusting on what we believe to be "obvious".
- ---
- BTW, testing in another environments, the results vary. [In IdeOne.com](https://ideone.com/vaKzgX), the stack approach is always faster for unbalanced strings, and almost the same for balanced ones. And [in Repl.it](https://replit.com/@hkotsubo/TautHollowStructure#main.py), the results were similar to my machine's.
- Instead of replacing the brackets, you could do just one loop, and keep a stack with the opening brackets. Every time you find a closing bracket, check if it corresponds to the stack top: if it's not, the string is valid, else, pop the opening bracket.
- Do it until the end of the string, and the stack will be empty if it's a valid one:
- ```python
- def is_balanced(s, brackets={ '(': ')', '[': ']', '{': '}' }):
- if len(s) % 2 != 0:
- return False
- open_brackets = set(brackets.keys())
- close_brackets = set(brackets.values())
- stack = []
- for c in s:
- if c in open_brackets:
- stack.append(c)
- elif c in close_brackets:
- # close bracket without respective opening, string is invalid
- if len(stack) == 0:
- return False
- # close bracket doesn't correspond to last open bracket, string is invalid
- if c != brackets[stack[-1]]:
- return False
- stack.pop()
- else: # invalid char
- return False
- return len(stack) == 0
- ```
- I put the `keys` and `values` inside `set`'s (because searching with the `in` operator is O(1)), but it didn't make much difference. You can remove it if you want.
- ---
- I admit I initially thought that calling `replace` 3 times per iteration, and always do it N / 2 times, would be slower, but to my surprise I found out that it depends on the string being tested. A quick test with [`timeit` module](https://docs.python.org/3/library/timeit.html) (using your `isValid` function and `is_balanced` as defined above):
- ```python
- from timeit import timeit
- # run 10 times each test
- params = { 'number' : 10, 'globals': globals() }
- s = '{[(())]{()[()]}}' * 1000
- s += '))'
- s += '{[(())]{()[()]}}' * 1000
- print('--------------\nUnbalanced in the middle')
- print(timeit('isValid(s)', **params))
- print(timeit('is_balanced(s)', **params))
- s = '))'
- s += '{[(())]{()[()]}}' * 1000
- s += '{[(())]{()[()]}}' * 1000
- print('--------------\nUnbalanced in the beginning')
- print(timeit('isValid(s)', **params))
- print(timeit('is_balanced(s)', **params))
- s = '{[(())]{()[()]}}' * 2000
- s += '))'
- print('--------------\nUnbalanced in the end')
- print(timeit('isValid(s)', **params))
- print(timeit('is_balanced(s)', **params))
- s = '{[(())]{()[()]}}' * 2000
- print('--------------\nBalanced')
- print(timeit('isValid(s)', **params))
- print(timeit('is_balanced(s)', **params))
- ```
- In my machine, I've got these results:
- ```none
- --------------
- Unbalanced in the middle
- 0.03587988700019196
- 0.02324946300359443
- --------------
- Unbalanced in the beginning
- 0.026359478004451375
- 1.2465003237593919e-05
- --------------
- Unbalanced in the end
- 0.025080425999476574
- 0.041097565001109615
- --------------
- Balanced
- 0.022329219995299354
- 0.04195229199831374
- ```
- For a balanced string, surprisingly - at least to me - the `replace` approach is faster.
- For unbalanced strings, it depends on where the unbalanced brackets are. If they are in the beginning or in the middle, the stack approach wins (with a huge margin if the problem is in the beginning, due to the `else` clause that checks an invalid char as soon as it finds one). But if they are in the end, the `replace` approach wins.
- ---
- I really thought that using `replace` would always be slower, because each call to `replace` needs to scan the whole string, searching for the substring to be replaced (and return a new string). Even if you're looping N / 2 times, you're calling `replace` 3 times per iteration, which I really thought it'd have a huge cost. But that wasn' the case, and this really surprised me.
- The only case where the difference is really huge is the aforementioned case, when the unbalanced bracket is in the beginning. One particular case is curious:
- ```python
- s = '{[((ab))]{()[()]}}' * 2000
- ```
- With this, the `replace` approach took 4.5 seconds, while the stack approach took around 0.00001 seconds. Not sure if there's an implementation detail for the `replace` method, that makes this particular case very slow (tested on Python 3.8.5, Ubuntu 20.04), but that's the difference I was expecting in all cases. Anyway, that's why we should always test everything, instead of trusting on what we believe to be "obvious".
- ---
- BTW, testing in another environments, the results vary. [In IdeOne.com](https://ideone.com/vaKzgX), the stack approach is always faster for unbalanced strings, and almost the same for balanced ones. And [in Repl.it](https://replit.com/@hkotsubo/TautHollowStructure#main.py), the results were similar to my machine's.
#1: Initial revision
Instead of replacing the brackets, you could do just one loop, and keep a stack with the opening brackets. Every time you find a closing bracket, check if it corresponds to the stack top: if it's not, the string is valid, else, pop the opening bracket. Do it until the end of the string, and the stack will be empty if it's a valid one: ```python def is_balanced(s, brackets={ '(': ')', '[': ']', '{': '}' }): if len(s) % 2 != 0: return False open_brackets = set(brackets.keys()) close_brackets = set(brackets.values()) stack = [] for c in s: if c in open_brackets: stack.append(c) elif c in close_brackets: # close bracket without respective opening, string is invalid if len(stack) == 0: return False # close bracket doesn't correspond to last open bracket, string is invalid if c != brackets[stack[-1]]: return False stack.pop() else: # invalid char return False return len(stack) == 0 ``` --- I admit I initially thought that calling `replace` 3 times per iteration, and always do it N / 2 times, would be slower, but to my surprise I found out that it depends on the string being tested. A quick test with [`timeit` module](https://docs.python.org/3/library/timeit.html) (using your `isValid` function and `is_balanced` as defined above): ```python from timeit import timeit # run 10 times each test params = { 'number' : 10, 'globals': globals() } s = '{[(())]{()[()]}}' * 1000 s += '))' s += '{[(())]{()[()]}}' * 1000 print('--------------\nUnbalanced in the middle') print(timeit('isValid(s)', **params)) print(timeit('is_balanced(s)', **params)) s = '))' s += '{[(())]{()[()]}}' * 1000 s += '{[(())]{()[()]}}' * 1000 print('--------------\nUnbalanced in the beginning') print(timeit('isValid(s)', **params)) print(timeit('is_balanced(s)', **params)) s = '{[(())]{()[()]}}' * 2000 s += '))' print('--------------\nUnbalanced in the end') print(timeit('isValid(s)', **params)) print(timeit('is_balanced(s)', **params)) s = '{[(())]{()[()]}}' * 2000 print('--------------\nBalanced') print(timeit('isValid(s)', **params)) print(timeit('is_balanced(s)', **params)) ``` In my machine, I've got these results: ```none -------------- Unbalanced in the middle 0.03587988700019196 0.02324946300359443 -------------- Unbalanced in the beginning 0.026359478004451375 1.2465003237593919e-05 -------------- Unbalanced in the end 0.025080425999476574 0.041097565001109615 -------------- Balanced 0.022329219995299354 0.04195229199831374 ``` For a balanced string, surprisingly - at least to me - the `replace` approach is faster. For unbalanced strings, it depends on where the unbalanced brackets are. If they are in the beginning or in the middle, the stack approach wins (with a huge margin if the problem is in the beginning, due to the `else` clause that checks an invalid char as soon as it finds one). But if they are in the end, the `replace` approach wins. --- I really thought that using `replace` would always be slower, because each call to `replace` needs to scan the whole string, searching for the substring to be replaced (and return a new string). Even if you're looping N / 2 times, you're calling `replace` 3 times per iteration, which I really thought it'd have a huge cost. But that wasn' the case, and this really surprised me. The only case where the difference is really huge is the aforementioned case, when the unbalanced bracket is in the beginning. One particular case is curious: ```python s = '{[((ab))]{()[()]}}' * 2000 ``` With this, the `replace` approach took 4.5 seconds, while the stack approach took around 0.00001 seconds. Not sure if there's an implementation detail for the `replace` method, that makes this particular case very slow (tested on Python 3.8.5, Ubuntu 20.04), but that's the difference I was expecting in all cases. Anyway, that's why we should always test everything, instead of trusting on what we believe to be "obvious". --- BTW, testing in another environments, the results vary. [In IdeOne.com](https://ideone.com/vaKzgX), the stack approach is always faster for unbalanced strings, and almost the same for balanced ones. And [in Repl.it](https://replit.com/@hkotsubo/TautHollowStructure#main.py), the results were similar to my machine's.