Is it undefined behaviour to just make a pointer point outside boundaries of an array without dereferencing it?
I have heard that it is undefined behaviour to make a pointer point outside the boundaries of an array, even without dereferencing it. Can that really be true? Consider this code:
int main(void)
{
    char arr[10];
    char *ptr = &arr[-1];
    char c = *ptr;
}
The line char c = *ptr; is obviously bad, because it accesses the array out of bounds. But I have heard that even the second line, char *ptr = &arr[-1];, invokes undefined behaviour. Is this true? What does the standard say?
Yes, the second line invokes undefined behavior.
First of all, according to C17 6.5.2.1 regarding array subscripting, an expression E1[E2] is just "syntactic sugar" for (*((E1)+(E2))). So what actually applies here is the binary + operator. This is also why [] never really operates on an array operand: by the time the subscript is applied, the array has already decayed into a pointer.
So your example is equivalent to char *ptr = &*((arr) + (-1));, where arr "decays" into a pointer to the first element. The arr operand ends up with a pointer type and the -1 operand has an integer type.
C17 6.5.6/8 then provides the following text for additive operators:
When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. /--/
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
In this case, the result points neither inside arr nor one past its last element, so "otherwise, the behavior is undefined".
That is, the evaluation of the + operator is what first causes undefined behavior, before any dereferencing takes place.
We can demonstrate this UB with gcc x86 -O3 by first compiling:
int arr[3] = {1,2,3};
printf("%d\n", arr[0]);
which disassembles into
mov edi, offset .L.str
mov esi, 1
That is, ESI, which carries the second argument under the x86-64 System V calling convention, gets loaded directly with the value 1, which is what would have been stored in arr[0] had the array actually been allocated. If I change this to printf("%d\n", arr[-1]);, the instruction that sets up ESI is simply removed from the disassembly, and presumably the program prints whatever garbage value happens to be in ESI. The compiler doesn't even attempt to dereference anything by fetching a value from the stack at address arr - 1.