Simultaneous comparison in Python
I want to make multiple comparisons at once, of the same type, in a Python program. For example, to check whether all of a certain group of strings are in a longer test string; or whether a specific variable is equal to any of some test values; etc.
I discovered that these naive approaches won't work:
if my_name and your_name in email:
    print("The email is about both of us")
if cheese == "cheddar" or "edam" or "havarti":
    print("Yum!")
The other Q&A explains why not. The question now is: what should I do instead, in Python? How can I write code that does these sorts of comparisons - and more generally, how can I figure out how to write the code?
And what if I want multiple possibilities on both sides of the comparison, or have the possibilities stored in a list (or other sequence)? Are there any special cases or other tricks?
1 answer
Using `and` and `or` correctly
The left-hand side and right-hand side of these operators must each be a complete expression in its own right; here, each side should be a full comparison. Thus, we should repeat the comparison on each side:
if my_name in email and your_name in email:
    print("The email is about both of us")
if cheese == "cheddar" or cheese == "edam" or cheese == "havarti":
    print("Yum!")
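To see the difference concretely, here is a small sketch that wraps the naive and the corrected cheese checks in functions. The function names and the sample values are made up for illustration only.

```python
def naive_is_yummy(cheese):
    # Python reads this as: (cheese == "cheddar") or "edam" or "havarti".
    # A non-empty string is truthy, so the whole expression is always truthy.
    return bool(cheese == "cheddar" or "edam" or "havarti")


def is_yummy(cheese):
    # The comparison is repeated on each side of every `or`.
    return cheese == "cheddar" or cheese == "edam" or cheese == "havarti"


print(naive_is_yummy("gouda"))  # True, even though gouda is not in the list
print(is_yummy("gouda"))        # False
print(is_yummy("edam"))         # True
```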
The general approach: `all` and `any`
If you have many values to test against, it could get annoying to repeat the comparison each time. Also, sometimes you don't know in advance how many values there are to compare against - for example, you might want to create a list of those values somewhere else in the program, and then compare to everything in the list (without knowing its length).
Python provides built-in functions called `all` and `any` which help with this task.
`all` is analogous to `and` (`all([a, b, c])` works like `a and b and c`):
Help on built-in function all in module builtins:
all(iterable, /)
Return True if bool(x) is True for all values x in the iterable.
If the iterable is empty, return True.
`any` is analogous to `or` (`any([a, b, c])` works like `a or b or c`):
Help on built-in function any in module builtins:
any(iterable, /)
Return True if bool(x) is True for any x in the iterable.
If the iterable is empty, return False.
The described results for empty inputs might seem counterintuitive, but they're mathematically valid. If unicorns don't exist, then we can make any generalization we like about "all unicorns", because there won't be anything to contradict us; but we can't say "there is some unicorn that..." because we are already defeated before we consider the restriction.
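The empty-input behaviour is easy to check directly. In this sketch, `no_unicorns` is a hypothetical empty list standing in for "no values to test":

```python
no_unicorns = []  # hypothetical: there is nothing to check

# "Every unicorn is pink": nothing can contradict it, so the result is True.
every_unicorn_is_pink = all(u == "pink" for u in no_unicorns)

# "Some unicorn is pink": there is no unicorn to point at, so the result is False.
some_unicorn_is_pink = any(u == "pink" for u in no_unicorns)

print(every_unicorn_is_pink)  # True
print(some_unicorn_is_pink)   # False
```

If an empty input should count as a failure in your program, check for it separately, for example by testing the sequence's truthiness or length first.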
Naively, this only lets us avoid repeating the operator:
if all([my_name in email, your_name in email]):
    print("The email is about both of us")
if any([cheese == "cheddar", cheese == "edam", cheese == "havarti"]):
    print("Yum!")
This probably doesn't seem like any benefit at all, since the operators had to be replaced with commas anyway.
But the real power comes when we use a generator expression to create the input sequences:
if all(name in email for name in (my_name, your_name)):
    print("The email is about both of us")
if any(cheese == kind for kind in ("cheddar", "edam", "havarti")):
    print("Yum!")
Python gives us this expressive way to describe, abstractly, the group of values that need to be checked with `all` or `any` (i.e., as if they had `and`s or `or`s, respectively, written in between them).
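The same pattern works when the values to compare against live in a list built elsewhere, with a length that isn't known in advance. Here is a minimal sketch; the helper names `mentions_all` and `is_one_of` and the sample data are invented for this example.

```python
def mentions_all(text, names):
    """Return True if every string in `names` occurs somewhere in `text`."""
    return all(name in text for name in names)


def is_one_of(value, options):
    """Return True if `value` equals at least one element of `options`."""
    return any(value == option for option in options)


email = "Hi Alice, Bob asked me to forward this."
people = ["Alice", "Bob"]
print(mentions_all(email, people))  # True

people.append("Carol")              # the list can grow without changing the check
print(mentions_all(email, people))  # False

print(is_one_of("edam", ["cheddar", "edam", "havarti"]))  # True
```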
Meanwhile, the inner logic of `all` and `any` will automatically return as soon as the answer is known - it can "short circuit" the same way that hard-coded `and` and `or` operators do. (TODO: make sure the Q&A about `and` and `or` discusses this!) Of course, if we pass lists (whether we create them "manually" or with a list comprehension), then the whole list has to be created first anyway, which defeats the purpose. But with generator expressions, we can preserve the short-circuiting. Python will only evaluate the generator as far as is needed. (TODO: links for more Q&A about these concepts)
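One way to observe the short-circuiting is to record which elements actually get tested. This is only an illustrative sketch: `is_red` and the `calls` log are hypothetical helpers, not part of any library.

```python
calls = []  # records every ball that is_red() is asked about


def is_red(ball):
    calls.append(ball)
    return ball == "red"


balls = ["red", "blue", "red", "red"]

# Generator expression: all() stops at the first False result.
calls.clear()
result_with_generator = all(is_red(ball) for ball in balls)
generator_calls = list(calls)

# List comprehension: the whole list is built before all() even starts.
calls.clear()
result_with_list = all([is_red(ball) for ball in balls])
list_calls = list(calls)

print(result_with_generator, generator_calls)  # False ['red', 'blue']
print(result_with_list, list_calls)            # False ['red', 'blue', 'red', 'red']
```

Both versions give the same answer; the difference is only in how much work gets done before the answer is known.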
De Morgan's laws with `all` and `any`
As noted, `all` is analogous to `and`, and `any` is analogous to `or`. This entails that a form of de Morgan's laws applies to them. We can:
- negate each input element;
- swap `all` for `any` or vice-versa;
- and then negate the result

to get an equivalent expression.
For example, if we want to know whether `any` of our `balls` is `not red()`, this is the same as finding out whether `not all` of them are `red()`. And with the generator-expression trick, there's only one place to write the negation for the inputs:
# one way
any(not red(ball) for ball in balls)
# equivalent!
not all(red(ball) for ball in balls)
Notice how naturally it reads.
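If you'd like to convince yourself of the equivalence, a quick check over a few sample inputs (including the empty one) can help. The `red` function here is a stand-in that I've defined for the example; it simply compares against the string "red".

```python
def red(ball):
    # Stand-in predicate for this example only.
    return ball == "red"


def de_morgan_holds(balls):
    """Check both forms of the law for one collection of balls."""
    first = any(not red(b) for b in balls) == (not all(red(b) for b in balls))
    second = all(not red(b) for b in balls) == (not any(red(b) for b in balls))
    return first and second


samples = [[], ["red"], ["blue"], ["red", "blue"], ["red", "red"], ["blue", "green"]]
for balls in samples:
    print(balls, de_morgan_holds(balls))  # True for every sample
```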
Using `all` and `any` "two-dimensionally"
A generator expression can also use multiple `for` clauses to iterate over all the pairs from the left-hand side and right-hand side of the comparison:
botanical_fruits = ['raspberry', 'strawberry', 'tomato']
culinary_vegetables = ['radish', 'spinach', 'tomato']
# is any fruit a vegetable (i.e., equal to some vegetable)?
any(f == v for f in botanical_fruits for v in culinary_vegetables)
However, it generally doesn't make sense to do this for an equality comparison.
In the above example, what we're really trying to figure out is whether the two lists have any overlap - i.e., whether there's some value that's in both.
A simpler and more efficient approach is to use `set`s instead, and see about their intersection:
botanical_fruits = {'raspberry', 'strawberry', 'tomato'}
culinary_vegetables = {'radish', 'spinach', 'tomato'}
# Empty sets are "falsey"; all others are "truthy".
bool(botanical_fruits.intersection(culinary_vegetables))
# (Or we could e.g. check the `len()` of the result.)
This can't short-circuit, but it typically takes O(N) time instead of O(N^2) (Python's hash-based sets can be built and intersected in linear time on average). And of course empty sets are automatically handled correctly.
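A related option, if you only need a yes/no answer rather than the shared values themselves, is the built-in `set.isdisjoint` method. The sketch below reuses the example data; the variable names `shared` and `has_overlap` are my own.

```python
botanical_fruits = {'raspberry', 'strawberry', 'tomato'}
culinary_vegetables = {'radish', 'spinach', 'tomato'}

# The intersection tells us *which* values are shared.
shared = botanical_fruits & culinary_vegetables
print(shared)  # {'tomato'}

# isdisjoint() only answers whether there is *no* overlap, so negate it.
has_overlap = not botanical_fruits.isdisjoint(culinary_vegetables)
print(has_overlap)  # True

# isdisjoint() accepts any iterable, so only one side has to be a set.
vegetable_list = ['radish', 'spinach']
print(not botanical_fruits.isdisjoint(vegetable_list))  # False
```

Since `isdisjoint` doesn't need to build a result set, it is free to stop as soon as it finds a common element (as far as I know, CPython does this), but treat that as an implementation detail.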
On the other hand, checking whether every value is equal to every other value is... for normal types, just checking whether all the values are the same. So you could just reorganize them to put a single element on one side of the comparison, and use `all` "normally". (Be careful of the case where there are no values on either side! Then you won't have a comparison value to use.) Or you could put them all in a `set` and check if the result has at most a single value (as an exercise, convince yourself that "at most" is correct).
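Both ideas can be sketched as small helper functions. The names are hypothetical, and treating an empty input as "all equal" is a choice I've made here to match how `all` behaves; the set-based version also assumes the values are hashable.

```python
def all_equal_via_all(values):
    """Compare every value against the first one."""
    values = list(values)
    if not values:
        # No first element to compare against; treat as vacuously true.
        return True
    first = values[0]
    return all(value == first for value in values[1:])


def all_equal_via_set(values):
    """Duplicates collapse in a set, so equal values leave at most one element."""
    return len(set(values)) <= 1


print(all_equal_via_all([3, 3, 3]))  # True
print(all_equal_via_all([3, 4, 3]))  # False
print(all_equal_via_set([3, 3, 3]))  # True
print(all_equal_via_set([]))         # True: an empty set has zero elements
```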
Other special cases
Depending on the operation and on the types of the individual values being compared, there may be a more efficient, clever or simple way to do it. These could involve replacing comparisons with `in`, using `set` operations, using regular expressions, and more.
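A few such shortcuts, as a rough sketch rather than a complete catalogue. All the variable names and sample data below are invented for illustration; the calls themselves (`in`, `str.endswith` with a tuple, `re.escape`, the `<=` subset test) are standard Python.

```python
import re

cheese = "edam"
kinds = ("cheddar", "edam", "havarti")

# Equality against several values is just a membership test.
is_known_cheese = cheese in kinds

# str.startswith() and str.endswith() accept a tuple of candidates.
filename = "photo_0042.jpg"
is_image = filename.endswith((".jpg", ".png"))

# "Does the text contain any of these words?" as a single regex search.
text = "We served havarti and crackers."
pattern = re.compile("|".join(re.escape(kind) for kind in kinds))
mentions_cheese = pattern.search(text) is not None

# "Are all the required items present?" as a subset test on sets.
required = {"flour", "eggs"}
pantry = {"flour", "eggs", "sugar"}
can_bake = required <= pantry

print(is_known_cheese, is_image, mentions_cheese, can_bake)  # True True True True
```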
(TODO: start a table of common tricks - perhaps with links to other Q&A as they're asked).