Understanding mutable default arguments in Python
Consider this code example:
def example(param=[]):
    param.append('value')
    print(param)
When example is repeatedly called with an existing list, it repeatedly appends to the list, as one might expect:
>>> my_list = []
>>> example(my_list)
['value']
>>> example(my_list)
['value', 'value']
>>> example(my_list)
['value', 'value', 'value']
>>> my_list
['value', 'value', 'value']
If it's called with an empty list each time, it seems that each empty list is considered separately, which also makes sense:
>>> example([])
['value']
>>> example([])
['value']
>>> example([])
['value']
However, if called without passing an argument - using the default - it seems to "accumulate" the appended values as if the same list were being reused:
>>> example()
['value']
>>> example()
['value', 'value']
>>> example()
['value', 'value', 'value']
Some IDEs will warn that param is a "mutable default argument" and suggest changes.
Exactly what does "mutable default argument" mean, and what are the consequences of defining param this way? How is this functionality implemented, and what is the design decision behind it? Is it ever useful? How can I avoid bugs caused this way - in particular, how can I make the function work with a new list each time?
4 answers
Terminology
"Mutable default argument" means exactly what the individual words would suggest: it's an argument which is supplied as the default value for that parameter, which also is mutable. To mutate an object means to change its state - for a list, that includes adding, removing, replacing or mutating elements of the list. A mutable object is one that provides any way to mutate it - so, lists are mutable, because there is an interface to add and remove elements, etc.
On the other hand, a tuple is usually not thought of as "mutable" - even one that contains a list object. If such a tuple-containing-a-list were used as a default argument, it could cause the same problem. For our purposes, that also counts as "mutable".
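A quick sketch of that edge case, reusing the example function from the question but with a tuple default that contains a list - the tuple itself never changes, yet the values still accumulate:
>>> def example(param=([],)):
...     param[0].append('value')
...     print(param)
...
>>> example()
(['value'],)
>>> example()
(['value', 'value'],)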
However, it's important to understand that the problem does not occur simply because the object is mutable; it occurs because the function's code actually mutates that object.
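For instance, a function that only reads its mutable default never shows the surprising accumulation, even though an IDE may still warn about it (the show function here is just for illustration):
>>> def show(values=[]):
...     print(len(values), 'values:', values)
...
>>> show()
0 values: []
>>> show()
0 values: []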
Implementation and Rationale
Exactly as the results would suggest, the same list object is used every time that example is called without an explicit argument. What happens is that when the function is created, an empty list is created - just that one time - and set up as the default value to use whenever an argument isn't passed explicitly.
So, the first time that the code calls example(), the previously-made empty list gets an element appended. It's still the same object, but its contents have changed. So the next time that the code calls example(), that list - which still contains the element from last time - is used again, and another element is appended.
Essentially, [] is code that runs ahead of time in order to create a default argument object.
We can verify this by using another function to create the default argument:
>>> def setup():
...     print('calling setup')
...     return []
...
>>> def example(param=setup()):
...     param.append('test')
...     print('The list is now:', param)
...
calling setup
The setup function runs while creating the example function. A key observation here is that Python's def statement is a code statement, not just a definition used by a compiler: the code runs when it is encountered, and running that code involves calling the setup function. After this, repeatedly calling example() will build up the list, and won't call the setup function again:
>>> example()
The list is now: ['test']
>>> example()
The list is now: ['test', 'test']
>>> example()
The list is now: ['test', 'test', 'test']
This actually more or less explains why it works that way: since Python's functions are first-class objects, created "on the fly" by the def statement, it makes sense that the code for the default values is evaluated right away. There isn't anything in the syntax that would suggest that the code execution should be deferred.
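The stored default object can even be inspected directly through the function's __defaults__ attribute, which shows the same list object being carried along between calls:
>>> def example(param=[]):
...     param.append('value')
...
>>> example.__defaults__
([],)
>>> example()
>>> example.__defaults__
(['value'],)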
One practical consequence of this behaviour is that a default value can be determined by a config file, and the file won't need to be read and parsed again every time the function is called.
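A sketch of that idea, assuming a hypothetical settings.json with a timeout entry - the file is read once, when the def statement runs, not on every call:
import json

def load_timeout(path='settings.json'):
    # Hypothetical config file and key; parsed only once, when the
    # def statement below is executed.
    with open(path) as f:
        return json.load(f)['timeout']

def connect(timeout=load_timeout()):
    print('connecting with timeout', timeout)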
Workarounds
Avoiding mutation
Because problems are only caused by actually mutating the default argument, the simplest way to avoid problems is to... not do that. Pythonic code obeys command-query separation; a function should take its effect either by mutating one or more parameters (a "command") or by returning a useful value (a "query") - not both. To implement a "command" in this paradigm, there should always be an explicitly passed argument to mutate. Python doesn't support "out parameters" - everything passed to the function must actually exist before the function is called. If the default value were used, the calling code wouldn't be able to access it directly unless it were returned, which would violate the command-query separation.
Therefore, if the code needs to work by modifying one of the provided arguments, the corresponding parameter should not have a default value at all.
On the other hand, sometimes carelessly written code simply modifies the provided arguments for convenience. Avoid this; it leads to more subtle errors even without using mutable default arguments. Keep in mind that the passed-in objects might be used in other places, which might not expect the modification made by your function. If you need to determine the result of, say, adding an element to a provided list, make a new list:
This way can modify the caller's x list unnecessarily:
def join_lists(x=[], y=[]):
    x.extend(y)
    return x
This way preserves the inputs:
def join_lists(x=(), y=()):
    result = list(x)
    result.extend(y)
    return result
Specifying immutable defaults
The above example also shows a useful safeguard: the default values for x and y are changed to empty tuples, which are immutable. This serves two purposes:
- Anyone who reads the code can deduce that the function is not intended to mutate the provided x and y values, and that other tuple values will be acceptable. (Of course, this can also be hinted using type annotations.)
- If the code is mistakenly written to try something like x.append(y), a call that uses the default arguments will raise an exception rather than silently producing the wrong result (as shown below). This makes it easier to debug the problem.
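For instance, if the mutating version from above were given tuple defaults, calling it with the defaults fails loudly instead of quietly modifying shared state:
>>> def join_lists(x=(), y=()):
...     x.extend(y)    # mistake: mutates a parameter
...     return x
...
>>> join_lists()
Traceback (most recent call last):
  ...
AttributeError: 'tuple' object has no attribute 'extend'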
Sentinels as default arguments
The established practice in the Python community is to use the special value None for default arguments.
For example, this function can either make a tuple from two provided values, or take a single value and make a tuple that uses the same value twice:
def pair(x, y=None):
    if y is None:
        y = x
    return (x, y)
The special value None has its own type, and there is special logic to make sure that there can only be that one instance of the type. Therefore, by convention, the is operator is used to check for None, to guard against types implementing __eq__ to make themselves "equal to" None.
There are some problems with this approach, however. For example, if None could be a legitimate value that is passed explicitly, then some other scheme will need to be used to create a sentinel value. There isn't a clear one-size-fits-all solution for this yet. The idea has been discussed quite a bit and there is a draft PEP (661), but no clear resolution so far.
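One common workaround - shown here as a sketch, with an arbitrary name for the marker - is to create a dedicated sentinel object that nothing else can compare equal to:
_MISSING = object()    # the name is arbitrary; any private, unique object works

def pair(x, y=_MISSING):
    if y is _MISSING:
        y = x
    return (x, y)

# Now None can be passed as an ordinary value:
# pair(3, None) gives (3, None), while pair(3) gives (3, 3).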
More importantly, though, the idiom is arguably overused. If something like () or '' makes sense as a default value, and provides the necessary functionality required for the algorithm, it would be better to use that rather than treating the default argument as a special case. As the Zen of Python says, "Simple is better than complex", and "special cases aren't special enough to break the rules".
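For example (the function names here are only for illustration), an empty tuple or empty string often works directly, with no special-case branch needed:
def total(values=()):
    return sum(values)    # sum(()) is simply 0

def shout(text, suffix=''):
    return text.upper() + suffix    # adding '' is a no-op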
Possible future enhancement: PEP 671
PEP 671 describes a new syntax for default arguments. Parameters that use a => operator (an "arrow" symbol instead of an equals sign) for a default argument will treat the code there as a recipe for calculating a default value each time the function is called without that argument, instead of creating a single value ahead of time.
In the future (hopefully), this code will work, and repeated calls to example() will show a single-element list each time. However, this syntax was not implemented for 3.12 as hoped by the PEP author, and there is currently no indication that implementation is planned at all.
def example(param=>[]):
    param.append('test')
    print('The list is now', param)
This also allows default argument values to be defined in terms of other arguments that were provided:
In the future (hopefully), this code will work, and pair(3) will result in (3, 3).
# Currently (without PEP 671), implementing `pair` requires
# a sentinel value, as described in the previous section.
def pair(x, y=>x):
    return (x, y)
Possible justifications
It may make sense to use a mutable default argument in the following situations:
For simplicity
Consider for example an argument that should be some kind of mapping, where the function will only use it for lookup without actually mutating the provided object:
_default_config = {'value': 1}
def display_value(config=_default_config):
    print(config['value'])
It's unwieldy to describe an "immutable dictionary" in Python, and the calling code is unlikely to take those extra steps anyway; so the implementation might as well use an ordinary dict.
To create a unique sentinel
Credit to Martijn Pieters on Stack Overflow for this interesting bit of trivia.
For example, the standard library copy.deepcopy algorithm needs a truly unique sentinel object for certain parts of its logic that cannot appear anywhere else in the program. So it can't use None (that could be a valid value) or obvious sorts of immutable "empty" objects like () or '' or 0 (since it could end up with an already-existing object with that value). By using [] (and then never actually mutating the object), it can be sure that other code will never have access to the same object (unless it deliberately "breaks the seal" by reaching into the function's internals).
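A sketch of the same pattern outside the standard library (the function and parameter names are invented for illustration):
def first_match(iterable, predicate, _sentinel=[]):
    # The list is never mutated; it only provides an object with a unique
    # identity that calling code cannot accidentally supply or recreate.
    found = next((item for item in iterable if predicate(item)), _sentinel)
    if found is _sentinel:
        raise ValueError('no matching item')
    return found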
As a cache
There are usually better ways to accomplish this.
In particular, the standard library provides functools.lru_cache (and, in 3.9 and up, functools.cache) for memoization.
However, for a quick-and-dirty approach, the "accumulating" effect of a mutable default argument can be used deliberately to keep track of results - for example, from a recursive helper function defined on the fly (it's hard to give a good example of this), or to implement the "registry" of a decorator used for registering functions.
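As a sketch of the quick-and-dirty memoization idea (the _cache parameter is never meant to be passed explicitly):
def fib(n, _cache={}):
    # The dict is created once, at def time, and deliberately persists
    # between calls, serving as a lightweight memo table.
    if n not in _cache:
        _cache[n] = n if n < 2 else fib(n - 1) + fib(n - 2)
    return _cache[n]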
Example
registry = {}

def invoke(name, registry=registry):
    return registry[name]()

def register(func, registry=registry):
    registry[func.__name__] = func
    return func
Functions "registered" with the decorator in the normal way will use the global registry:
@register
def test():
    print('test function')

invoke('test')
But it can also be used explicitly to register a function into a different registry:
my_registry = {}

def example():
    print('example function')

register(example, my_registry)
invoke('example', my_registry)
Optimization
The benefits here are marginal at best, and a local assignment is almost as good.
However, in performance-critical code, a default argument that is "never supposed to be supplied explicitly" can be used to avoid repeatedly looking up a global name.
Example
import math

# The naive approach:
def global_trigonometry():
    return [math.sin(i) for i in range(1000000)]

# Optimized:
def default_trigonometry(sin=math.sin):
    return [sin(i) for i in range(1000000)]
On my machine, the optimization reduces the runtime by about 24% under Python 2.7 (which I keep around just for testing these sorts of legacy behaviours), 26% under Python 3.8, and 6% under Python 3.11.
Of course, the difference is considerably smaller if sin is dumped directly into the global namespace. Simply making the same assignment inside the function also avoids the need for repeated lookup when the function is called (although it still needs to be looked up once per call).
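For comparison, a sketch of that local-assignment version, which pays the lookup cost once per call instead of once per loop iteration, without touching the signature:
def local_trigonometry():
    # Bind the global name locally, once per call.
    sin = math.sin
    return [sin(i) for i in range(1000000)]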
To "bind" arguments or "partially apply" a function
This is a common and well-recognized idiom, but there are generally better ways.
Arguably, this technique uses one confusing "gotcha" to work around another, which some may find very inelegant. The standard library provides functools.partial, which should normally be used instead.
Often, mutable default arguments are used to work around the default late binding of values from an outer scope. Usually, this technique does not actually use mutable default arguments, but it does take advantage of the reason why mutable default arguments work the way that they do. That is to say, default arguments are early-binding, so they are used as a way to avoid the usual late-binding result.
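The late-binding effect can be seen without any GUI at all - a quick sketch at the REPL, where every callback ends up seeing the final value of i:
>>> callbacks = [lambda: print(i) for i in range(3)]
>>> for callback in callbacks:
...     callback()
...
2
2
2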
Example
This comes up when trying to use a loop to create callback functions, for example to define button behaviours in a GUI (using Tkinter or something similar).
People often naively expect each Button created this way to print the value that i had when the Button was created, but they don't:
def make_buttons(window):
    for i in range(9):
        window.add(Button(command=lambda: print(i)))
The problem is that i is not looked up until the button is actually clicked (and thus the callback provided as a command gets called). However, since default arguments are early-binding, the problem can be avoided by using i to set a default argument value.
This is a popular hack:
def make_buttons(window):
    for i in range(9):
        window.add(Button(command=lambda i=i: print(i)))
By adding i=i, the callback function gets a default value for its i parameter (which is a separate name from the local i defined in make_buttons). This default is never overridden; when the button is clicked, it uses the value of i that was determined ahead of time.
The standard library functools.partial solves the problem more elegantly:
from functools import partial

def make_buttons(window):
    for i in range(9):
        window.add(Button(command=partial(print, i)))
This way is explicit about the binding, and doesn't exploit a tricky feature of the language. It also doesn't needlessly expose a default argument that could in principle be overridden (but isn't supposed to be).
Everything you put on the line with def is global (global to the file, not as in the global keyword), so the (initially empty) list you create with param=[] persists and gets reused between calls to example(). You probably want to create a local list for each invocation instead. For that, you have to find a way to put param = [] inside the function.
The normal way is:
def example(param: list | None = None):
    if param is None:
        param = []
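Applied to the original example, each call now starts from a fresh list:
>>> def example(param=None):
...     if param is None:
...         param = []
...     param.append('value')
...     print(param)
...
>>> example()
['value']
>>> example()
['value']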
Python is powerful and fun, so you can come up with more pithy ways of doing it. However, you will then have to add comments for the people who don't expect it and work around the IDEs that don't understand it, so it won't be so pithy in the end. You're best off just using the normal way above.
I think other details are best addressed in separate questions, such as:
- Mutable vs. immutable in Python
- Implementation details of the interpreter
- The history of Python design decisions
- Why not just def (param=None):
- Why not just if param:
Please comment on this question if you do ask these separately, and want me to answer them.