
Is it undefined behaviour to just make a pointer point outside boundaries of an array without dereferencing it?

+5
−0

I have heard that it is undefined behaviour to make a pointer point outside boundaries of an array even without dereferencing it. Can that really be true? Consider this code:

int main(void) 
{
    char arr[10];
    char *ptr = &arr[-1];
    char c = *ptr;
}

The line char c = *ptr is obviously bad, because it's accessing out of bounds. But I heard something that even the second line char *ptr = &arr[-1] invokes undefined behaviour? Is this true? What does the standard say?


5 comments

Your example might be better if you referenced arr[10] instead of arr[-1]. That eliminates issues of signed versus unsigned arithmetic of array subscripts, which it sounds like is not what you are trying to ask about. ‭Olin Lathrop‭ about 1 month ago

@olinlathrop Except that arr[10] is legal to point at, but not arr[11] ;) I mostly posted this question to get this site going. ‭klutt‭ about 1 month ago

@kami Nope, that line is exactly as it should be. ‭klutt‭ about 1 month ago

Actually, char *ptr = arr[-1]; is not valid, there needs to be an & or it's a constraint violation of simple assignment. I didn't think of it when I originally answered the question. ‭Lundin‭ about 1 month ago

@klutt sneaky edit is sneaky :P I really miss a note when the last edit was... ‭Kami‭ about 1 month ago

2 answers

+10
−0

Yes, the second line invokes undefined behavior.

First of all, according to C17 6.5.2.1 regarding array subscripting, an expression E1[E2] is just "syntactic sugar" for (*((E1)+(E2))). So what applies here is actually the binary + operator. More info regarding why [] is actually never used with an array operand here.

So your example is equivalent to char *ptr = &*((arr) + (-1));, where arr "decays" into a pointer to the first element. The arr operand ends up as a pointer type and the -1 operand is an integer type.
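A minimal sketch of that equivalence, using an in-bounds index so that every form is well defined. The function name subscript_forms_agree and the array contents are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the three spellings of "address of element i" agree.
   The caller must pass an index in the range 0..4 (hypothetical helper). */
int subscript_forms_agree(size_t i)
{
    static int arr[5] = {10, 20, 30, 40, 50};

    assert(i < 5);                 /* stay inside the array */

    int *a = &arr[i];              /* subscript form */
    int *b = &*(arr + i);          /* what 6.5.2.1 says arr[i] means */
    int *c = arr + i;              /* & and * cancel, leaving only the + */

    return a == b && b == c;
}
```

All three forms go through the same pointer addition, which is why the rules for the + operator decide whether &arr[-1] is allowed.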

C17 6.5.6/8 then provides the following text for additive operators:

When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. /--/
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

In this case, the result does not point inside arr so "otherwise, the behavior is undefined".

That is, the evaluation of the + operator is what first causes undefined behavior, before the de-referencing.
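For contrast, here is a sketch of pointer arithmetic that stays within what 6.5.6/8 allows: the pointer may reach one past the last element, as long as it is not dereferenced there. The functions sum_bytes and sum_of_ten_ones and their parameter names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Sums the elements in [first, one_past_last).
   Both pointers must point into the same array, or one past its end. */
int sum_bytes(const char *first, const char *one_past_last)
{
    int total = 0;

    assert(first <= one_past_last);

    /* p may become equal to one_past_last, but is never dereferenced there. */
    for (const char *p = first; p != one_past_last; ++p)
        total += *p;

    return total;
}

/* Builds a 10-element array and sums it using arr + 10 as the end marker. */
int sum_of_ten_ones(void)
{
    char arr[10];

    for (size_t i = 0; i < 10; ++i)
        arr[i] = 1;

    return sum_bytes(arr, arr + 10);   /* arr + 10 is valid to form */
}
```

The loop never needs arr - 1. A reverse loop can likewise start at arr + 10 and decrement before dereferencing.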


We can demonstrate this UB with gcc x86 -O3 by first compiling:

int arr[3] = {1,2,3};
printf("%d\n", arr[0]);

which disassembles into

    mov     edi, offset .L.str
    mov     esi, 1

That is, as part of the calling convention, ESI is loaded directly with the value 1, which is what would have been stored in arr[0] had the array been allocated. If I change this to printf("%d\n", arr[-1]);, the instruction that sets up ESI is simply removed from the disassembly, and I suppose the program prints whatever garbage value happens to be in ESI. The compiler doesn't even attempt to de-reference anything by fetching a value from the stack at the memory address corresponding to arr - 1.

1 comment

You were assuming right. I have edited the question, so you might want to remove that part. ‭klutt‭ about 1 month ago

+1
−0

When a compiler encounters the statements

char arr[10];
char *ptr = &arr[-1];

there are three things that it could reasonably do:

  1. It can raise an error.

  2. It can compile the statements and raise a warning.

  3. It can compile the statements silently.

I think that, in cases 2 and 3, everyone would agree that the value placed in ptr should be such that

ptr + i == &arr[i - 1]

whenever i - 1 is a valid index of arr.

I assume that when the language specification says that the compiler's behaviour is undefined it means that the compiler designer is free to choose from these three options.

Although ptr would hold an invalid pointer value there are plausible situations where this would be useful. One example is simulating an array with a non-zero lower bound:

&ptr[1] == &arr[0]
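If a non-zero lower bound is the goal, one way to get it without ever forming arr - 1 is to keep the real base pointer and subtract the lower bound from the index at access time. This is a sketch of an alternative, not something proposed in the text above; the based_array type and its functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* View of an int array whose first valid index is `lower` instead of 0. */
struct based_array {
    int   *data;    /* points at the real first element */
    int    lower;   /* index that maps to data[0] */
    size_t length;  /* number of elements */
};

/* Returns a pointer to element i, where lower <= i < lower + length. */
int *based_at(struct based_array a, int i)
{
    assert(i >= a.lower);
    size_t offset = (size_t)(i - a.lower);   /* integer arithmetic only */
    assert(offset < a.length);
    return a.data + offset;                  /* always inside the array */
}

/* Fills a 1-based view of a 3-element array and returns element 1 + element 3. */
int based_array_demo(void)
{
    int storage[3] = {0, 0, 0};
    struct based_array a = { storage, 1, 3 };

    *based_at(a, 1) = 7;    /* writes storage[0] */
    *based_at(a, 3) = 35;   /* writes storage[2] */

    return storage[0] + storage[2];
}
```

The adjustment happens on the integer index, so no pointer outside the array is ever computed, and an optimising compiler can usually fold the subtraction into the address calculation.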

I can think of two reasons for the compiler to generate a warning or error. The first is to draw the programmer's attention to a simple mistake in the "typing error" category. Perhaps he (or she) meant arr[1] or arr[N-1].

Here I think it is worth comparing &arr[-1] with the equivalent arr - 1. Although these are technically equivalent they are conceptually distinct. The former applies an invalid index to an array, then takes the address of the (non-existent) element. The latter is just a normal pointer arithmetic expression.

The fact that clang gives a warning for the former but passes the latter silently indicates that the clang designers recognised this distinction.

The second, more serious, reason for rejecting this code is that it may result in genuinely undefined behaviour when the program is run. In a general-purpose computer with a large address space this is unlikely, but in a microcontroller system the address of arr may be close to zero. (It can never actually be zero as this has a special meaning.) In that situation subtracting from a pointer could cause an arithmetic overflow, even if the pointer is never dereferenced.
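The wrap-around can be modelled with unsigned integers standing in for addresses. This only illustrates the arithmetic on an assumed flat address space; it does not use real pointers, and the function names and addresses are invented.

```c
#include <stdint.h>

/* Models "address minus n bytes" using uintptr_t values.
   The result is reduced modulo the range of uintptr_t, so this is
   well defined in C, unlike the equivalent out-of-bounds pointer subtraction. */
uintptr_t model_step_back(uintptr_t address, uintptr_t bytes)
{
    return address - bytes;
}

/* Returns 1 if stepping back would go below address zero and wrap. */
int model_wraps(uintptr_t address, uintptr_t bytes)
{
    return bytes > address;
}
```

With an array modelled at address 2 and a 4-byte element, model_wraps(2, 4) reports a wrap and model_step_back(2, 4) lands near the top of the address range instead of below the array.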

To sum up, although the compiler may accept it and the resulting program may do what you expect, it is better to avoid assigning illegal values to pointers. Even if it appears that you could make your code more efficient this way, the compiler's optimiser will probably do a better job, and it will take account of the vagaries of the target hardware.

9 comments

You can't demonstrate that something isn't undefined behavior by running the code; all you prove with that is that you got lucky. You can, however, in some cases demonstrate that something is UB by disassembling the code and watching where it went wrong. Lundin about 1 month ago

If the C standard doesn't convince you, then think of common real-world scenarios: many architectures have memory protection traps from reading data from executable memory or executing code from data memory. Suppose your array is located at the very border of data memory on a certain machine and by going -1 you go outside that area. The machine will likely generate a hardware exception, possibly by just examining the index register and finding an invalid address there. ‭Lundin‭ about 1 month ago

Are you saying that my code would not compile because it's not in a function? I just did not include that code, but the question is changed now. ‭klutt‭ about 1 month ago

@klutt No. I am saying that it will not compile because you are trying to assign a value of type char to a variable of type pointer. However, even if the types were made to match (by making arr an array of pointers to char, rather than an array of char), your code sample would not illustrate your question because "= arr[-1]" is dereferencing a (non-existent) array element. chris-barry about 1 month ago

