Post History

Q&A Can I access an array element from a pointer to an object contiguous with but outside the array?

posted 1y ago by Lundin‭ · edited 1y ago by Lundin‭

Answer

#2: Post edited by

Lundin‭ · 2024-04-05T07:58:40Z (about 1 year ago)


Undefined behavior from out-of-bounds array access arises whenever we use pointer arithmetic, which is only defined within the bounds of an array. For the purposes of pointer arithmetic, plain variables ("scalars") are defined to behave just like arrays of 1 item.

So in your example, `y` is to be regarded as an array like `int y[1]`, and therefore `(&s.y)[-1]` is just as out of bounds as `s.x[2]` and is also undefined behavior.

The relevant parts of the C standard can be found under "additive operators". This is because `arr[i]` is guaranteed to be equivalent to `*(arr + i)`, so the rules for the `+` operator are what matter.

From C17 6.5.6:

> For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
>
> /--/
>
> If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

This means you can't point at array item `[-1]` either.

---
Other forms of UB we might encounter here are misaligned access and "strict aliasing" violations.

In this case alignment is not an issue, and the static assert ensures there's no padding. Naturally there _won't_ be any padding in practice on any known real-world system; padding can only happen in theory here.

Strict aliasing is not an issue either as long as we use pointers to `int` for the access, since the actual type of the object stored in memory is `int`.

---
There is a possible work-around which is well-defined: we can always inspect the struct through a pointer to character. This is well-defined due to a special rule in 6.3.2.3 that allows any object in C to be inspected byte by byte.

So this code is well-defined:

```c
#include <assert.h>  /* static_assert (C11 and later) */
#include <stddef.h>  /* offsetof */
#include <stdio.h>

struct MyStruct {
    int x[2];
    int y, z;
};

int main(void) {
    struct MyStruct s = { {1,2},3,4 };
    static_assert(sizeof(struct MyStruct) == sizeof(int[4]), "Unexpected Padding");

    /* View the struct as an array of bytes and step to member y. */
    unsigned char* ptr = (unsigned char*) &s;
    ptr += offsetof(struct MyStruct, y);
    printf("y: %d\n", *(int*) ptr);
    /* Step back one int, landing on x[1]. */
    ptr -= sizeof(int);
    printf("x[1]: %d\n", *(int*) ptr);

    return 0;
}
```
Output:
```text
y: 3
x[1]: 2
```
In this case the whole struct is to be regarded as type `unsigned char [sizeof(struct MyStruct)]` for the purpose of out-of-bounds checks, and within this array we can access any byte. Given that the character pointer points at memory properly aligned for an `int`, we can safely cast to `int*` from there and dereference, because the actual type of the data (the "effective type") is indeed `int`.

#1: Initial revision by

Lundin‭ · 2024-04-05T07:57:49Z (about 1 year ago)


Undefined behavior from out-of-bounds array access arises whenever we use pointer arithmetic, which is only defined within the bounds of an array. For the purposes of pointer arithmetic, plain variables ("scalars") are defined to behave just like arrays of 1 item.

So in your example, `y` is to be regarded as an item of `int y[1]` and therefore `(&s.y)[-1]` would be just as out of bounds as `s.x[2]` and therefore also undefined behavior.

The relevant parts of the C standard can be found under "additive operators". This is because `arr[i]` is guaranteed to be equivalent to `*(arr + i)`, so the rules for the `+` operator are what matter.

From C17 6.5.6:

> For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its
element type.  
> /--/  
> If both the pointer operand and the result point
to elements of the same array object, or one past the last element of the array object, the evaluation
shall not produce an overflow; otherwise, the behavior is undefined.

Meaning you can't point at array item `[-1]` either.

---




Other forms of UB we might encounter here are misaligned access and "strict aliasing" violations.

In this case alignment is not an issue, and the static assert ensures there's no padding. Naturally there _won't_ be any padding in practice on any known real-world system; padding can only happen in theory here.

Strict aliasing is not an issue either as long as we use pointers to `int` for the access, since the actual type of the object stored in memory is `int`.

---

There is a possible work-around which is well-defined: we can always inspect the struct through a pointer to character. This is well-defined due to a special rule in 6.3.2.3 that allows any object in C to be inspected byte by byte.

So this code is well-defined:

```c
#include <assert.h>  /* static_assert (C11 and later) */
#include <stdio.h>
#include <stddef.h>

struct MyStruct {
    int x[2];
    int y, z;
};

int main() {
    struct MyStruct s = { {1,2},3,4 };
    static_assert(sizeof(struct MyStruct) == sizeof(int[4]), "Unexpected Padding");

    unsigned char* ptr = (unsigned char*) &s;
    ptr += offsetof(struct MyStruct, y);
    printf("y: %d\n", *(int*) ptr);
    ptr -= sizeof(int);
    printf("x[1]: %d\n", *(int*) ptr);

    return 0;
}
```

Output:

```text
y: 3
x[1]: 2
```

In this case the whole struct is to be regarded as type `unsigned char [sizeof(struct MyStruct)]` for the purpose of out-of-bounds checks, and within this array we can access any byte. Given that the character pointer points at memory properly aligned for an `int`, we can safely cast to `int*` from there and dereference, because the actual type of the data (the "effective type") is indeed `int`.
