Post History

Q&A Is it undefined behaviour to just make a pointer point outside boundaries of an array without dereferencing it?

When a compiler encounters the statements char arr[10]; char *ptr = &arr[-1] there are three things that it could reasonably do: It can raise an error. It can compile the statements and ra...

posted 4y ago by chris-barry‭  ·  edited 4y ago by hkotsubo‭

Answer
#4: Post edited by hkotsubo · 2020-08-21T23:08:20Z (about 4 years ago)
  • When a compiler encounters the statements
  • ```
  • char arr[10];
  • char *ptr = &arr[-1];
  • ```
  • there are three things that it could reasonably do:
  • 1. It can raise an error.
  • 2. It can compile the statements and raise a warning.
  • 3. It can compile the statements silently.
  • I think that, in cases 2 and 3, everyone would agree that the value placed in `ptr` should be such that
  • ptr + i == &arr[i - 1]
  • whenever `i - 1` is a valid index of `arr`.
  • I assume that when the language specification says that the compiler's behaviour is undefined it means that the compiler designer is free to choose from these three options.
  • Although `ptr` would hold an invalid pointer value, there are plausible situations where this would be useful. One example is simulating an array with a non-zero lower bound:
  • &ptr[1] == &arr[0]
  • I can think of two reasons for the compiler to generate a warning or error. The first is to draw the programmer's attention to a simple mistake in the "typing error" category. Perhaps he (or she) meant
  • `arr[1]` or `arr[N-1]`.
  • Here I think it is worth comparing `&arr[-1]` with the equivalent `arr - 1`.
  • Although these are technically equivalent they are conceptually distinct. The former applies an invalid index to an array, then takes the address of the (non-existent) element. The latter is just a normal pointer arithmetic expression.
  • The fact that clang gives a warning for the former but passes the latter silently indicates that the clang designers recognised this distinction.
  • The second, more serious, reason for rejecting this code is that it may result in genuinely undefined behaviour when the program is run. In a general-purpose computer with a large address space this is unlikely, but in a microcontroller system the address of arr may be close to zero. (It can never actually be zero as this has a special meaning.) In that situation subtracting from a pointer could cause an arithmetic overflow, even if the pointer is never dereferenced.
  • To sum up, although the compiler may accept it and the resulting program may do what you expect, it is better to avoid assigning illegal values to pointers. Even if it appears that you could make your code more efficient, the compiler's optimiser will probably do a better job, and it will take account of the vagaries of the target hardware.
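One way to get the effect of a non-zero lower bound without ever forming `arr - 1` is to apply the offset to the index rather than to the pointer. The following is only a sketch under that assumption; the names `ARR_LEN`, `LOWER_BOUND`, `elem_at` and `fill_one_based` are made up for illustration and do not come from the answer.

```c
#include <stddef.h>

/* Hypothetical names, chosen for illustration only. */
#define ARR_LEN     10
#define LOWER_BOUND 1   /* first valid logical index */

/* Map a logical index in [LOWER_BOUND, LOWER_BOUND + ARR_LEN - 1]
 * to the address of the matching element. The offset is applied to
 * the index, so no pointer outside the array is ever computed. */
static char *elem_at(char *base, size_t logical_index)
{
    return &base[logical_index - LOWER_BOUND];
}

/* Store 'a', 'b', 'c', ... using logical indices 1..ARR_LEN. */
static void fill_one_based(char *base)
{
    for (size_t i = LOWER_BOUND; i < LOWER_BOUND + ARR_LEN; i++) {
        *elem_at(base, i) = (char)('a' + (i - LOWER_BOUND));
    }
}
```

With `char arr[ARR_LEN];`, calling `fill_one_based(arr)` writes through logical indices 1 to 10, and `elem_at(arr, 1)` is the same address as `&arr[0]`. An optimising compiler will usually fold the constant offset, so this need not cost anything at run time, though that is worth checking on the target in question.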
#3: Post edited by chris-barry · 2020-08-19T08:30:06Z (about 4 years ago)
Having given more consideration to the implications of the question, I have completely rewritten my answer.
  • When a compiler encounters the statements
  • char arr[10];
  • char *ptr = &arr[-1];
  • there are three things that it could reasonably do:
  • 1. It can raise an error.
  • 2. It can compile the statements and raise a warning.
  • 3. It can compile the statements silently.
  • I think that, in cases 2 and 3, everyone would agree that the value placed in ptr should be such that
  • ptr + i == &arr[i - 1]
  • whenever i - 1 is a valid index of arr.
  • I assume that when the language specification says that the compiler's behaviour is undefined it means that the compiler designer is free to choose from these three options.
  • Although ptr would hold an invalid pointer value there are plausible situations where this would be useful. One example is simulating an array with a non-zero lower bound:
  • &ptr[1] == &arr[0]
  • I can think of two reasons for the compiler to generate a warning or error. The first is to draw the programmer's attention to a simple mistake in the "typing error" category. Perhaps he (or she) meant
  • arr[1] or arr[N-1].
  • Here I think it is worth comparing "&arr[-1]" with the equivalent "arr - 1".
  • Although these are technically equivalent they are conceptually distinct. The former applies an invalid index to an array, then takes the address of the (non-existent) element. The latter is just a normal pointer arithmetic expression.
  • The fact that clang gives a warning for the former but passes the latter silently indicates that the clang designers recognised this distinction.
  • The second, more serious, reason for rejecting this code is that it may result in genuinely undefined behaviour when the program is run. In a general-purpose computer with a large address space this is unlikely, but in a microcontroller system the address of arr may be close to zero. (It can never actually be zero as this has a special meaning.) In that situation subtracting from a pointer could cause an arithmetic overflow, even if the pointer is never dereferenced.
  • To sum up, although the compiler may accept it and the resulting program may do what you expect, it is better to avoid assigning illegal values to pointers. Even if it appears that you could make your code more efficient, the compiler's optimiser will probably do a better job, and it will take account of the vagaries of the target hardware.
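To make the overflow concern about low addresses concrete, here is a small model that uses plain unsigned integers rather than real pointers. It assumes a hypothetical 16-bit address space; `addr16`, `addr_minus` and `would_wrap` are illustrative names, and real pointer arithmetic on any given target is not guaranteed to behave like this model.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of a 16-bit address (an assumption for illustration). */
typedef uint16_t addr16;

/* Subtract `bytes` from `addr` with 16-bit wraparound, as unsigned
 * integer arithmetic does. This is a model, not pointer arithmetic. */
static addr16 addr_minus(addr16 addr, addr16 bytes)
{
    return (addr16)(addr - bytes);
}

/* True when stepping back `bytes` from `addr` would pass address 0. */
static bool would_wrap(addr16 addr, addr16 bytes)
{
    return addr < bytes;
}
```

In this model `addr_minus(0x0001, 4)` gives `0xFFFD`, which lands at the top of the address space instead of just below the array, while `addr_minus(0x0100, 1)` gives the unsurprising `0x00FF`. That is the kind of result the paragraph about microcontrollers is warning about.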
#2: Post edited by hkotsubo · 2020-08-18T12:23:51Z (about 4 years ago)
  • First, the behaviour that is undefined is that of the compiler. Unless there is a Klutt manual that specifies precisely the type of code that you will submit to a C compiler you may do anything you like in this regard without being guilty of undefined behaviour.
  • As Kami pointed out, your code sample will not compile and would not be a valid illustration of your query even if it did compile.
  • Consider, instead, the following code:
  • ```c
  • #include <stdio.h>
  • int
  • main(int argc, char **argv)
  • {
  • char arr[10];
  • char *p1 = &arr[0];
  • char *p2 = &arr[-1];
  • char *p3 = arr - 1;
  • printf(
  • "arr: %p \np1: %p\np2: %p\np3: %p\n", (void *)arr, (void *)p1, (void *)p2, (void *)p3);
  • return 0;
  • }
  • ```
  • The assignment to `p1` is completely legal. It places in `p1` the address of the first element of `arr`.
  • Compiling with clang, the assignment to `p2` generates a warning. This should be expected as it can be considered to be taking the address of element -1, which does not exist. However, if it did exist (some languages will allow the definition of the lower bound of an array) then its address would be the address of element zero minus the size of one array element.
  • The assignment to `p3` is completely legal. It simply assigns a char address to a char pointer, then decrements that pointer by the size of a char. This could even be useful. For example, you might be initialising a pointer before entering a loop that increments the pointer before using it.
  • When I compile and run the code above I get this output:
  • ```
  • arr: 0x7fff0ee422ce
  • p1: 0x7fff0ee422ce
  • p2: 0x7fff0ee422cd
  • p3: 0x7fff0ee422cd
  • ```
  • which is exactly what I expected.
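For the loop idiom mentioned above, the same traversal can be written so that the pointer never leaves the range from the first element to one past the last. As far as I read C11 6.5.6 paragraph 8, that range is the only one in which the result of pointer arithmetic is defined, so `arr - 1` falls outside the guarantee even when it prints the expected address. This is a minimal sketch; `sum_chars` is a hypothetical helper, not something from the answer.

```c
#include <stddef.h>

/* Sum n chars, visiting each element through a pointer that starts
 * at the first element and stops at one past the last. */
static int sum_chars(const char *arr, size_t n)
{
    int total = 0;
    const char *end = arr + n;      /* one past the last element */

    for (const char *p = arr; p != end; ++p) {
        total += *p;                /* use first, then increment */
    }
    return total;
}
```

The only change from the pre-increment style is the order of "use" and "increment", so the starting value can be `arr` itself instead of `arr - 1`.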
#1: Initial revision by chris-barry · 2020-08-17T22:12:07Z (about 4 years ago)
First, the behaviour that is undefined is that of the compiler. Unless there is a Klutt manual that specifies precisely the type of code that you will submit to a C compiler you may do anything you like in this regard without being guilty of undefined behaviour.

As Kami pointed out, your code sample will not compile and would not be a valid illustration of your query even if it did compile.

Consider, instead, the following code:

#include <stdio.h>

int
main(int argc, char **argv)
{
    char    arr[10];
    char    *p1 = &arr[0];
    char    *p2 = &arr[-1];
    char    *p3 = arr - 1;

    printf("arr: %p \np1:  %p\np2:  %p\np3:  %p\n",
           (void *)arr, (void *)p1, (void *)p2, (void *)p3);
    return 0;
}
 
The assignment to p1 is completely legal. It places in p1 the address of the first element of arr.

Compiling with clang, the assignment to p2 generates a warning. This should be expected as it can be considered to be taking the address of element -1, which does not exist. However, if it did exist (some languages will allow the definition of the lower bound of an array) then its address would be the address of element zero minus the size of one array element.

The assignment to p3 is completely legal. It simply assigns a char address to a char pointer, then decrements that pointer by the size of a char. This could even be useful. For example, you might be initialising a pointer before entering a loop that increments the pointer before using it.

When I compile and run the code above I get this output:

arr: 0x7fff0ee422ce 
p1:  0x7fff0ee422ce
p2:  0x7fff0ee422cd
p3:  0x7fff0ee422cd

which is exactly what I expected.
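The "size of one array element" step between neighbouring addresses can also be checked with indices that stay inside the array. The sketch below uses an `int` array so that the step is larger than one byte; `byte_offset` is a hypothetical helper, and `index` is assumed to be between 0 and the array length inclusive.

```c
#include <stddef.h>

/* Byte distance from the start of an int array to element `index`,
 * measured through char pointers. `index` must not exceed the
 * array length (one past the end is allowed). */
static ptrdiff_t byte_offset(const int *base, size_t index)
{
    return (const char *)(base + index) - (const char *)base;
}
```

For `int a[4];`, `byte_offset(a, 1)` equals `sizeof(int)` and `byte_offset(a, 4)` equals `4 * sizeof(int)`, which is the same scaling that makes `p2` and `p3` print one byte below `arr` in the `char` case.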