Behavior of Pointer Arithmetic on the Stack
Consider the following code:
    #include <stdio.h>

    int main() {
        int a = 5;
        int b;
        ++*(&b + 1);
        printf("%d\n", a);
        return 0;
    }
The output is as expected:
6
By creating and incrementing a pointer to b, I'm able to access a, since b is below a on the stack. Is this behavior guaranteed by the C language, or is this undefined/unspecified behavior? If UB, what does the standard have to say that disallows this? For example, does C guarantee that the stack grows downwards, or that arithmetic with pointers into the stack is valid?
4 answers
Generally speaking, pointer arithmetic is undefined behavior unless carried out on arrays. This is how the additive operators behave, C17 6.5.6:
For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
/--/
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
So your code has undefined behavior - during pointer arithmetic, b is to be regarded as an array int[1], and you access it out of bounds.
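To make the "array of length one" rule concrete, here is a minimal sketch (the function name span_of_single_int is made up for illustration). It suggests where the line sits: forming &b + 1 and subtracting or comparing it is fine, and only dereferencing it is undefined.

```c
#include <stddef.h>

/* Hypothetical helper: treats a lone int as an array of length one. */
ptrdiff_t span_of_single_int(void) {
    int b = 0;
    int *first = &b;         /* points at the only element               */
    int *one_past = &b + 1;  /* valid to form: one past the last element */

    /* Subtracting or comparing the two pointers is well-defined.
       Evaluating *one_past would be undefined behavior, so it is
       never done here. */
    return one_past - first;
}
```

The function returns 1, the length of the imaginary array. Adding a dereference of one_past is exactly what the question's code does.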
Furthermore, the C standard doesn't know or mention anything about stacks and stack frames - there are no guarantees from the standard about the underlying memory layout at all. For example, a compiler for a 64-bit CPU with 32-bit int might decide to insert padding between the two integers, and that's perfectly fine as far as C goes - there's no guarantee about adjacent allocation unless you use structs or arrays.
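If the goal is simply to reach a neighbouring int through pointer arithmetic, one way to get defined behavior is to put both values in a single array. The following is a sketch only; increment_neighbor and the array name v are hypothetical and not from the question.

```c
/* Hypothetical rewrite of the question's program with defined behavior. */
int increment_neighbor(void) {
    int v[2] = { 0, 5 };  /* v[0] stands in for b, v[1] for a */
    int *p = &v[0];

    ++*(p + 1);           /* p + 1 points at v[1], inside the same array */
    return v[1];
}
```

Because both elements belong to one array object, the result is 6 regardless of compiler, optimization level, or stack direction.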
Also, while down-counting stacks are most common, some CPUs, such as Microchip PIC, have up-counting stacks. Other CPUs don't even have stacks! For example, I've written C for a very low-end MCU (Freescale RS08) which didn't have a stack. It was painful but perfectly possible. There are similar stackless, extremely low-end 4-bit MCUs used in some consumer electronics.
This is absolutely undefined behavior.
The C standard doesn't say anything about stacks or how they should behave or how local variables should be allocated on them. The word "stack" doesn't even occur in the C standard[1].
C does say (for arrays) that a pointer may point at most one past the end of them, e.g. given int x[2], x+2 is a valid pointer but it is undefined behavior to dereference that pointer. If we view &b as an array of length 1, then you would be invoking undefined behavior when you dereference &b+1.
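The usual legitimate use of a one-past-the-end pointer is as a loop bound. A minimal sketch, assuming a made-up helper named sum_ints:

```c
#include <stddef.h>

/* Hypothetical helper: sums n ints, using a one-past-the-end pointer
   purely as the loop bound. */
int sum_ints(const int *first, size_t n) {
    const int *end = first + n;  /* may be one past the last element */
    int total = 0;

    for (const int *p = first; p != end; ++p) {
        total += *p;             /* p is strictly before end here */
    }
    return total;
}
```

With int x[2] = {3, 4}, calling sum_ints(x, 2) computes x+2 as the bound and returns 7 without ever dereferencing it.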
Ultimately, there is absolutely nothing that states how the locations of two separate variables are related to each other in memory. This isn't surprising, as a variable doesn't even need to be allocated in memory. It would be a completely valid and not uncommon optimization for the compiler to register allocate the variable a or even just completely eliminate it by constant propagation/folding. In that case, your code would most likely be mutating something like a return address or a stack frame pointer, which will almost certainly lead to a crash and/or erratic behavior.
[1] Feel free to do a string search on this working draft version of the 2018 C standard.
"I'm able to access a, since b is below a on the stack."
No, it's not!
You have no guarantee in what order the compiler allocates temporary variables on the stack, and even whether it does so at all. You don't even have a guarantee which way (towards high or low addresses) the stack grows. Different compilers on the same machine might do it differently. I've seen this on a PIC 18, for example.
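If you are curious what a particular compiler actually did, you can inspect the addresses without invoking undefined behavior. This is only a sketch: relative_order_of_locals is a hypothetical name, uintptr_t is an optional type, and the pointer-to-integer mapping is implementation-defined, so the answer applies to one compiler, one target, and one set of flags.

```c
#include <stdint.h>

/* Hypothetical probe: reports how two locals happen to be laid out.
   Returns +1 if a's address is numerically higher than b's, -1 if lower. */
int relative_order_of_locals(void) {
    int a = 5;
    int b = 0;

    /* Comparing &a < &b directly would be undefined, because the two
       pointers refer to different objects. Converting to uintptr_t and
       comparing the integers is allowed; the meaning of the integers
       is implementation-defined. */
    uintptr_t addr_a = (uintptr_t)(void *)&a;
    uintptr_t addr_b = (uintptr_t)(void *)&b;

    return (addr_a > addr_b) - (addr_a < addr_b);
}
```

Either result is conforming, and it may flip when you change the optimization level or switch compilers, which is why code must not depend on it.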
On machines with a lot of registers, both variables might be kept solely in registers when there is little other demand for those registers.
The worst scenario is that whatever is one address past "b" isn't a general memory location, and reading it has side-effects. That's unlikely in this simplified example, but nothing in the standard rules it out. For example, if "b" happened to be allocated to the last register, and that register was mapped into data memory, then something completely unexpected could be at the next address.
Consider an architecture like a Microchip dsPIC. This machine has 16 16-bit registers that are also mapped to addresses 00h to 1Fh in data memory. If the compiler happened to allocate "b" to W14, then the next word is W15, which is the stack pointer. Reading it wouldn't have any side effects in this case, but writing it surely would. On a different architecture, you might end up reading the UART input data register, thereby clearing the last received data. That's unlikely, but you don't know it's not the case without specific knowledge of the machine and the compiler.
Not only does the C language not guarantee it, it will also fail on actual compilers as soon as you enable optimisation (which you'll generally want to do, because you want your code to run fast, after all).