

Comments on Is it undefined behaviour to just make a pointer point outside boundaries of an array without dereferencing it?

Parent

Is it undefined behaviour to just make a pointer point outside boundaries of an array without dereferencing it?

+6
−0

I have heard that it is undefined behaviour to make a pointer point outside boundaries of an array even without dereferencing it. Can that really be true? Consider this code:

int main(void) 
{
    char arr[10];
    char *ptr = &arr[-1];
    char c = *ptr;
}

The line char c = *ptr is obviously bad, because it accesses memory out of bounds. But I have heard that even the second line, char *ptr = &arr[-1], invokes undefined behaviour. Is this true? What does the standard say?


Post
+1
−1

When a compiler encounters the statements

char arr[10];
char *ptr = &arr[-1];

there are three things that it could reasonably do:

  1. It can raise an error.

  2. It can compile the statements and raise a warning.

  3. It can compile the statements silently.

I think that, in cases 2 and 3, everyone would agree that the value placed in ptr should be such that

ptr + i == &arr[i - 1]

whenever i - 1 is a valid index of arr.

I assume that when the language specification says that the compiler's behaviour is undefined it means that the compiler designer is free to choose from these three options.

Although ptr would hold an invalid pointer value there are plausible situations where this would be useful. One example is simulating an array with a non-zero lower bound:

&ptr[1] == &arr[0]
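As a hedged aside, the same 1-based indexing can be had without ever forming a pointer that points before arr[0], by doing the subtraction on the integer index rather than on the pointer. The sketch below assumes a hypothetical helper named at_one_based; it is an illustration, not something from the original post.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: maps a 1-based index onto a 0-based array.
   The "- 1" is applied to the integer index, so every intermediate
   pointer stays inside the array. */
static const char *at_one_based(const char *base, size_t len, size_t i)
{
    assert(i >= 1 && i <= len);   /* reject indices outside 1..len */
    return base + (i - 1);
}
```

With this, at_one_based(arr, 10, 1) yields &arr[0], which is the effect the ptr trick is after.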

I can think of two reasons for the compiler to generate a warning or error. The first is to draw the programmer's attention to a simple mistake in the "typing error" category. Perhaps he (or she) meant arr[1] or arr[N-1].

Here I think it is worth comparing &arr[-1] with the equivalent arr - 1. Although these are technically equivalent they are conceptually distinct. The former applies an invalid index to an array, then takes the address of the (non-existent) element. The latter is just a normal pointer arithmetic expression.

The fact that clang gives a warning for the former but passes the latter silently indicates that the clang designers recognised this distinction.

The second, more serious, reason for rejecting this code is that it may result in genuinely undefined behaviour when the program is run. In a general-purpose computer with a large address space this is unlikely, but in a microcontroller system the address of arr may be close to zero. (It can never actually be zero as this has a special meaning.) In that situation subtracting from a pointer could cause an arithmetic overflow, even if the pointer is never dereferenced.
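The wrap-around concern can be sketched with plain integers. The model below assumes a hypothetical 16-bit address space in which addresses are just uint16_t values; it says nothing about how any particular microcontroller or compiler treats real pointers. Unsigned integer arithmetic is well defined here, which is exactly what pointer arithmetic outside an array is not.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: the address one element before addr in a
   16-bit address space. The result wraps modulo 65536. */
static uint16_t addr_before(uint16_t addr, uint16_t elem_size)
{
    assert(elem_size != 0);       /* a zero-sized step models nothing */
    return (uint16_t)(addr - elem_size);
}

/* Nonzero when stepping back one element would go below address 0. */
static int steps_below_zero(uint16_t addr, uint16_t elem_size)
{
    return addr < elem_size;
}
```

In this model, stepping back one byte from address 0x0000 lands on 0xFFFF, the opposite end of the address space.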

To sum up, although the compiler may accept it and the resulting program may do what you expect, it is better to avoid assigning invalid values to pointers. Even if it appears that you could make your code more efficient, the compiler's optimiser will probably do a better job, and it will take account of the vagaries of the target hardware.
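One way to follow that advice when walking an array backwards is to start from the one-past-the-end pointer, which C does permit forming (though not dereferencing), and to decrement before each access. The function name count_backwards below is a hypothetical example, offered as a sketch rather than a definitive pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Counts occurrences of target, scanning from the last element to
   the first. p starts at arr + len (one past the end) and stops at
   arr, so it never points before the first element. */
static size_t count_backwards(const char *arr, size_t len, char target)
{
    assert(arr != NULL);
    size_t n = 0;
    for (const char *p = arr + len; p != arr; ) {
        --p;                      /* step back first, then read */
        if (*p == target)
            n++;
    }
    return n;
}
```

The loop condition compares against arr itself, so no pointer equivalent to arr - 1 is ever computed.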


General comments (11 comments)
Lundin‭ wrote about 4 years ago

You can't demonstrate that something isn't undefined behavior by running the code; all you prove with that is that you got lucky. You can, however, in some cases demonstrate that something is UB by disassembling the code and watching where it went wrong.

Lundin‭ wrote about 4 years ago · edited about 4 years ago

If the C standard doesn't convince you, then think of common real-world scenarios: many architectures have memory protection traps from reading data from executable memory or executing code from data memory. Suppose your array is located at the very border of data memory on a certain machine and by going -1 you go outside that area. The machine will likely generate a hardware exception, possibly by just examining the index register and finding an invalid address there.

klutt‭ wrote about 4 years ago

Are you saying that my code would not compile because it's not in a function? I just did not include that code, but the question is changed now.

chris-barry‭ wrote about 4 years ago

@klutt No. I am saying that it will not compile because you are trying to assign a value of type char to a variable of type pointer. However, even if the types were made to match (by making arr an array of pointers to char, rather than an array of char) your code sample would not illustrate your question because "= arr[-1]" is dereferencing a (non-existent) array element.

chris-barry‭ wrote about 4 years ago

@lundin True. The only way to determine that something is undefined is to look at the definition. Your example, however, is inappropriate as it attempts to dereference an out-of-bounds element and the OP says: "even without dereferencing it".

chris-barry‭ wrote about 4 years ago

@lundin Interestingly, 6.5.6/8 places restrictions on pointer arithmetic. As a not-too-contrived example, imagine a user selects from a list numbered from one up. This is used to index a zero-based array. A simple approach would be to create a pointer to element -1 then add the user's selection to it. If this were done in a single expression "ptr = arr - 1 + choice" then it would be acceptable, but if it were split, "ptr = arr - 1; ptr += choice" then its behaviour would be undefined.

Lundin‭ wrote about 4 years ago

Not really. The order of evaluation of + and - is unspecified, so you can't reliably tell if that example evaluates as (arr - 1) + choice or arr - (1 + choice). At least the former sub-expression is undefined behavior. In practice the compiler will likely replace it all with hard-coded addresses, but in theory it can break.

klutt‭ wrote about 4 years ago

@chris-barry You're correct. That was just a typo that I missed several times. The question is corrected with a &. Sorry for any inconvenience.

Martin Bonner‭ wrote almost 4 years ago

@chris barry - arr - 1 is undefined behaviour. On some architectures/machines it can trap at run-time. (I'm not sure any such machines are widely used these days, but they have existed, and they may do so in the future.)

Incnis Mrsi‭ wrote over 3 years ago

klutt's question is not (and never was) about the behavior of the compiler. It is about the behavior of a program. It is easy to make a C program do dangerous and foolish things without causing any complaints at compile time, and a qualified C programmer understands that no verification of the source code can catch every instance of the anomaly. What any C compiler says about the various cases where a pointer is likely to go out of array bounds is irrelevant.