Why is the new auto keyword from C++11 or C23 dangerous?
In older C and C++ standards, the auto keyword simply meant automatic storage duration: the compiler automatically handles where the variable is stored, typically on the stack or in a register. It was a pretty useless keyword, since it could only be used at local scope, where all variables default to automatic storage duration anyway.
The C++11 committee decided to change the meaning of this keyword so that, in a declaration, the type is picked based on the initializer(s) provided. For example, auto i=0; will result in int because the integer constant 0 is of type int.
As I understand it, the main rationale was to get rid of cumbersome declarations, in for loops in particular.
for(auto i = cont.begin(); ...
is admittedly easier for the eye than
for(std::vector<std::string>::iterator i = cont.begin(); ...
However, veteran programmers seem to raise concerns about auto being unsafe. It seems to be a topic where there are plenty of personal opinions, as seen over at SO: How much is too much with C++11 auto keyword? Some people happily encourage "go for it everywhere". Others, including various well-known C++ gurus, speak in favour of using it with caution.
Now C too is adopting the same functionality of auto as C++, as per C23.
What exactly is dangerous with the auto keyword?
2 answers
A pitfall in C++ that I didn't see mentioned in the other answer is that it might give unexpected results with libraries using expression templates.
What are expression templates?
In a nutshell, expression templates are a technique that allows writing efficient numeric code with intuitive notation.
Consider for example a matrix library with straightforward implementation using operator overloading. Then when you write e.g.
Matrix A = B + C + D;
where B, C and D are also of type Matrix, what will happen is that B + C will generate a temporary matrix, which is then passed to the second operator+ as its first argument, with D as the second argument; this will then generate yet another temporary matrix that is used to initialise A. Now with move semantics, one may actually get rid of the temporary storage (I've not checked if that is actually possible), but the fact remains that the order of accesses will be very cache-unfriendly.
Now one way to solve this is to have instead a function that directly implements the optimal access sequence (and also ensures no temporary storage even without optimisations):
add_three_matrices(A, B, C, D);
however that doesn't give the nice intuitive syntax. What expression templates do is that the expression B + C + D does not actually calculate the sum, but creates an object built from templates that represents the expression, and initialising the Matrix A then triggers the actual, optimised code. That is, you can now write
Matrix A = B + C + D;
and still get the optimised code.
How does auto affect this?
One might think that
auto A = B + C + D;
gives equivalent code to the one above, but that is not the case. Instead, the type deduced for auto is the expression template type describing the operation. This is particularly bad if the expression contains some actual temporary that will have been destroyed at the end of the statement; say you are scaling D with a double returned from a function:
auto A = B + C + D*f(x);
The return value of f is bound to a reference inside the expression, but since that reference is not A itself but a member buried inside the expression object, it won't extend the lifetime of the temporary. So if A is ever used later (in a way that actually triggers the calculation), it will access a dangling reference.
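To make the deduction visible, here is a minimal sketch of an expression template. Vec3, Sum and first_element_of_sum are names made up for this illustration and are not taken from any real matrix library; real libraries are far more elaborate, but the type that auto would pick up follows the same pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Expression node: remembers its operands by reference and computes nothing yet.
template <typename L, typename R>
struct Sum {
    const L& lhs;
    const R& rhs;
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
};

// Hypothetical 3-element vector, standing in for the Matrix type of the text.
struct Vec3 {
    double v[3];

    Vec3(double x, double y, double z) : v{x, y, z} {}

    // Constructing a Vec3 from an expression node is what runs the actual sums.
    template <typename L, typename R>
    Vec3(const Sum<L, R>& e) : v{e[0], e[1], e[2]} {}

    double operator[](std::size_t i) const { return v[i]; }
};

inline Sum<Vec3, Vec3> operator+(const Vec3& l, const Vec3& r) {
    return Sum<Vec3, Vec3>{l, r};
}

template <typename L, typename R>
Sum<Sum<L, R>, Vec3> operator+(const Sum<L, R>& l, const Vec3& r) {
    return Sum<Sum<L, R>, Vec3>{l, r};
}

// Evaluates b + c + d the safe way and returns the first element.
inline double first_element_of_sum() {
    Vec3 b(1, 2, 3), c(10, 20, 30), d(100, 200, 300);

    Vec3 a = b + c + d;  // the conversion to Vec3 evaluates the expression right here
    assert(a[1] == 222.0);

    // The expression itself is not a Vec3, so auto would deduce the node type.
    static_assert(std::is_same<decltype(b + c + d), Sum<Sum<Vec3, Vec3>, Vec3>>::value,
                  "auto would deduce the expression type, not Vec3");

    // auto lazy = b + c + d;  // compiles, but lazy.lhs refers to the temporary Sum
    //                         // created for b + c, which is gone after this statement

    return a[0];
}
```

Note that in this sketch the commented-out auto line would already dangle without any f(x) involved, because the inner node for b + c is itself a temporary held by reference.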
The auto feature was indeed mainly meant to solve long cumbersome template container declarations in C++. But when introduced to C23, where there are no templates let alone template containers, it just ends up as a solution without a problem to solve.
auto can create new problems just fine, however! And that goes for C and C++ both, although this answer will mainly focus on C, where the feature is just about to get introduced. In C++ you can use auto as long as you know what you are doing and use it with caution.
The only problem that the language committee(s) seem to have considered was backwards compatibility with the previous use of auto. C++20 (annex C), for example, notes about compatibility that using auto as a classic storage class specifier when no initializers are present is problematic. But I think that scenario is the least concerning use of auto. The main problem lies in how it behaves as a new feature.
The problem with the new use of the auto keyword is that the actual type of the initializer is often not obvious. In many cases you won't even know which type you actually ended up with, which is often something very important to know. A lot of these problems are caused by well-known design mistakes and old language bugs in C, where adding auto to the pot makes things even worse.
In general, when we write an initializer which is wrong for whatever reason, we like to be informed by the compiler that we messed up, rather than having the code silently accepted. This is the very reason why horribly dangerous language features like "implicit int" were removed from C ages ago.
Old, well-known language problems in C colliding with new language problems in C23
auto is particularly problematic in C23 because C has not come as far as C++ in correcting old sins of the past. For example, auto ch = 'A' will give you a char in C++ but an int in C.
Or when dealing with boolean logic, something like auto a = b && c; will give you a bool in C++ but an int in C, even if b and c happen to be bool operands.
Similarly, auto ptr = NULL may give you an int rather than a void* in both languages. Both languages supposedly encourage the use of nullptr instead, but there's a whole lot of old code out there using NULL.
Re-writing the old malloc(n * sizeof(*ptr)) trick will also suffer, as it can't be written as auto ptr = malloc(n * sizeof(*ptr));
Having some typedef enum { A } a; and then auto x = A; will result in an int and not an a, where a may be a smaller integer type than int.
Except when you use the new enum feature in C23 and write typedef enum : int8_t { A } a; instead. Now auto x = A; suddenly results in an a type.
Const/qualifier correctness
Another sin of the past would be that auto ptr = "hello" leads to a char* in C and not a const char* as in C++.
Well, we can fix that easily enough, we just write const auto ptr or auto const ptr, right? Not quite... Just as in the case of hiding a pointer behind a typedef, we end up with a char* const and not a const char* as was the intention.
So it simply turns out that you can't meaningfully combine auto and const in C, meaning you can't have auto and const correctness at the same time.
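The typedef comparison can be checked directly. The sketch below is C++ rather than C, so it starts from a char array instead of a string literal (in C++ the literal is already const char); with an array, the deduction is char* just like the C case described above. char_ptr and overwrite_through_const_auto are hypothetical names used only here.

```cpp
#include <cassert>
#include <type_traits>

typedef char* char_ptr;  // hypothetical typedef, only for comparison

// const applied to the typedef name makes the pointer const, not the pointee.
static_assert(std::is_same<const char_ptr, char* const>::value,
              "const on a typedef'd pointer gives char* const");

inline char overwrite_through_const_auto() {
    char buf[] = "hello";
    const auto p = buf;  // buf decays to char*, so p is char* const
    static_assert(std::is_same<decltype(p), char* const>::value,
                  "const auto gives a const pointer to non-const char");
    p[0] = 'j';          // still allowed: only the pointer itself is const
    return buf[0];
}
```

The write through p compiles without complaint, which is exactly the const correctness that was supposed to be gained.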
Subtle type rules
auto is particularly nasty when used in low-level programming, together with certain operators, resulting in another type and/or signedness than expected.
Consider something like this:
unsigned int i = 1+1;
i = ~i;
printf("%#x\n", i); // prints 0xfffffffd
i += 3;
printf("%#x\n", i); // prints 0
That's well-defined code. Now how about auto...
auto i = 1+1;
i = ~i;
printf("%#x\n", i); // undefined behavior, wrong conversion specifier
i += 3; // signed arithmetic now: -3 + 3 gives 0, no unsigned wrap-around involved
printf("%#x\n", i);
Oops. Well how about this?
auto i = 0xFFFFFFFF;
i = ~i;
printf("%#x\n", i); // well-defined, prints 0
i -= 3; // well-defined
printf("%#x\n", i); // well-defined, prints 0xfffffffd
A slip of the type used by the initializer can obviously have major consequences and tracking down the root cause of that bug may not be easy.
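The types behind those three snippets can be pinned down at compile time. This is a small sketch in C++ (the literal rules shown are the same in C), assuming a 32-bit int and two's complement; the function names are made up for the illustration.

```cpp
#include <cassert>
#include <type_traits>

static_assert(std::is_same<decltype(1 + 1), int>::value,
              "1 + 1 is a signed int");
static_assert(std::is_same<decltype(1u + 1u), unsigned int>::value,
              "the u suffix keeps the expression unsigned");
// Assumption: 32-bit int. Only then does 0xFFFFFFFF not fit in int,
// which is what turns the literal into an unsigned int.
static_assert(sizeof(int) != 4 ||
              std::is_same<decltype(0xFFFFFFFF), unsigned int>::value,
              "hex literal that does not fit in int becomes unsigned int");

// Same operations as in the text, but with an initializer that is unsigned.
inline unsigned int complement_then_add_unsigned() {
    auto i = 1u + 1u;  // unsigned int, value 2
    i = ~i;            // every bit set except bit 1
    i += 3;            // wraps around to 0, which is well-defined for unsigned
    return i;
}

// With the plain literal the whole computation silently becomes signed.
inline int complement_signed() {
    auto i = 1 + 1;    // int, value 2
    i = ~i;            // -3 on a two's complement machine
    return i;
}
```

So a single missing u suffix decides whether the variable is signed or unsigned, and nothing in the declaration itself shows it.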
auto f = true ? 1.0f : 0.0;
is another example of the subtle type promotion rules of C. Here f ends up as double, which might not have been expected.
Something like auto c = a | b; where a and b are bool, char or unsigned short etc. will result in c becoming an int in both C and C++ due to integer promotion.
In case of short a = 1; auto b = -a; we might have expected b to also become short and not int.
And so on.
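These promotions are shared by C and C++, so they can be verified with a few compile-time checks in C++. A minimal sketch, assuming a mainstream platform where int is wider than short and char; sum_of_two_bytes is a hypothetical helper, and it uses + where the text uses |, since the same promotion applies and the result makes the effect easier to see.

```cpp
#include <cassert>
#include <type_traits>

// The conditional operator converts both branches to a common type first.
static_assert(std::is_same<decltype(true ? 1.0f : 0.0), double>::value,
              "float and double branches give double");

// Unary minus promotes its operand before negating it.
static_assert(std::is_same<decltype(-short(1)), int>::value,
              "negating a short gives int");

inline int sum_of_two_bytes() {
    unsigned char a = 200, b = 100;
    auto c = a + b;  // both operands are promoted to int before the addition
    static_assert(std::is_same<decltype(c), int>::value,
                  "unsigned char + unsigned char is int");
    return c;        // 300, a value that would not fit in unsigned char
}
```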
Wrong initializer by mistake
When dealing with more complex declarations like 2D arrays and pointers to them, a simple slip of the finger can silently result in the wrong type.
int arr [2][2];
auto p1 = arr;
auto p2 = *arr;
auto p3 = &arr;
Here p1 is int(*)[2] (array decayed), p2 is int* (array decayed) and p3 is int (*)[2][2] (array did not decay). A simple miss of * or & will lead to a very different type.
Now had we typed this out explicitly like int (*p1)[2] = &arr, then we would get a compiler message informing us that we typed & when we shouldn't have. In case of auto anything goes and the program might compile cleanly, but with a different result.
Also throw type qualifiers into the declaration on top of that and we are guaranteed to have a complete mess if we use auto.
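The three deduced types can be confirmed with static_assert. The sketch is written in C++, where the decay rules for this example match C; read_through_deduced_pointers is a made-up name.

```cpp
#include <cassert>
#include <type_traits>

inline int read_through_deduced_pointers() {
    int arr[2][2] = {{1, 2}, {3, 4}};

    auto p1 = arr;   // decays to a pointer to the first row
    auto p2 = *arr;  // first row, decayed to a pointer to its first element
    auto p3 = &arr;  // no decay: pointer to the whole 2D array

    static_assert(std::is_same<decltype(p1), int (*)[2]>::value, "p1 is int (*)[2]");
    static_assert(std::is_same<decltype(p2), int*>::value, "p2 is int *");
    static_assert(std::is_same<decltype(p3), int (*)[2][2]>::value, "p3 is int (*)[2][2]");

    // int (*bad)[2] = &arr;  // with an explicit type the compiler objects:
    //                        // int (*)[2][2] does not convert to int (*)[2]

    return p1[1][0] + p2[1] + (*p3)[1][1];  // 3 + 2 + 4
}
```

All three auto lines compile cleanly, while the explicitly typed version of the same slip is diagnosed.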
Known problems in C23
The C23 standard notes under the 6.7.10 Type inference chapter that using auto together with anonymous struct/union declarations would cause implementation-defined behavior, as the declared variable and its members may end up in the tag namespace, rather than the ordinary namespace as may have been expected.
The (lack of) rationale why auto was added to C23
auto was added as per proposal N3007. The main reason appears to be keeping C in sync with C++. However, in C++ auto is somewhat handy and actually solves a few problems, as previously mentioned, whereas the "rationale" in N3007, if there ever was one, boils down to subjective statements like:
However when the definition includes an initializer, it makes sense to derive this type directly from the type of the expression used to initialize the variable.
As we can see from the numerous examples I made above, deriving the type from the initializer does not obviously make sense. At all.
Or worse:
...obvious convenience for programmers who are perhaps too lazy to lookup the type
Oh come on! If they are too lazy for proper engineering they should maybe consider a different career. Maybe their boss ought to help them out with a swift career change even!
Or just maybe they should start using a programming IDE that does this for them, with a single keystroke or a few mouse clicks. Such IDEs became popular in the 1990s; it's hardly a new tool for the average programmer out there.
Recommended usage
In C++, it is recommended to use auto to make long object type declarations readable, where you don't really care about the exact type. Particularly when reaching for an iterator or a returned type from a member function in some verbose template class.
In C, it is not recommended to use auto at all, because it only serves to create problems. It is a poorly researched and poorly implemented feature.
If anyone can actually give a non-subjective example of when it makes sense to use auto to clearly improve everyday C code, I will certainly reconsider.