
Post History

77%
+5 −0
Q&A Can I access an array element from a pointer to an object contiguous with but outside the array?

The problem with undefined behavior due to array out of bounds happens whenever we use pointer arithmetic, which is only defined to work within the bounds of an array. Where plain variables, "scala...

posted 9mo ago by Lundin‭  ·  edited 9mo ago by Lundin‭

Answer
#2: Post edited by Lundin‭ · 2024-04-05T07:58:40Z (9 months ago)
  • Undefined behavior from out-of-bounds array access arises whenever we use pointer arithmetic, which is only defined to work within the bounds of an array. Plain variables, "scalars", are defined to behave just the same as arrays of 1 item as far as pointer arithmetic is concerned.
  • So in your example, `y` is to be regarded as an array like `int y[1]`, which makes `(&s.y)[-1]` just as out of bounds as `s.x[2]`, and therefore also undefined behavior.
  • Relevant parts of the C standard can be found under "additive operators". This is because `arr[i]` is guaranteed to be equivalent to `*(arr + i)`, so the rules for the `+` operator are what matter.
  • From C17 6.5.6:
  • > For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
  • > /--/
  • > If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
  • This means you can't point at array item `[-1]` either.
  • ---
  • Other forms of UB we might encounter here are misaligned access or "strict aliasing" violations.
  • In this case alignment is not an issue, and the static assert ensures there's no padding. In practice there _won't_ be any padding on any known real-world system; padding can only happen in theory here.
  • Strict aliasing is not an issue either as long as we use pointers to `int` for the access, since the actual type of the object stored in memory is `int`.
  • ---
  • There is a possible, well-defined work-around: we can always inspect the struct through a pointer to character. This is allowed by a special rule in C17 6.3.2.3 that lets any object in C be inspected byte by byte.
  • So this code is well-defined:
  • ```c
  • #include <assert.h>  /* static_assert (a macro before C23) */
  • #include <stdio.h>
  • #include <stddef.h>
  • struct MyStruct {
  •     int x[2];
  •     int y, z;
  • };
  • int main(void) {
  •     struct MyStruct s = { {1,2},3,4 };
  •     static_assert(sizeof(struct MyStruct) == sizeof(int[4]), "Unexpected Padding");
  •     unsigned char* ptr = (unsigned char*) &s;  /* byte-wise view of the whole struct */
  •     ptr += offsetof(struct MyStruct, y);       /* now at the first byte of y */
  •     printf("y: %d\n", *(int*) ptr);
  •     ptr -= sizeof(int);                        /* one int back: x[1], given no padding */
  •     printf("x[1]: %d\n", *(int*) ptr);
  •     return 0;
  • }
  • ```
  • Output:
  • ```text
  • y: 3
  • x[1]: 2
  • ```
  • In this case the whole struct is to be regarded as type `unsigned char [sizeof(struct MyStruct)]` for the purpose of out-of-bounds checks, and within this array we can access any byte. Given that the character pointer points at memory properly aligned for an `int`, we can safely cast it to `int*` and dereference, because the actual type of the data (the "effective type") is indeed `int`.
#1: Initial revision by Lundin‭ · 2024-04-05T07:57:49Z (9 months ago)
Undefined behavior from out-of-bounds array access arises whenever we use pointer arithmetic, which is only defined to work within the bounds of an array. Plain variables, "scalars", are defined to behave just the same as arrays of 1 item as far as pointer arithmetic is concerned.

So in your example, `y` is to be regarded as an item of `int y[1]`, which makes `(&s.y)[-1]` just as out of bounds as `s.x[2]`, and therefore also undefined behavior.

Relevant parts of the C standard can be found under "additive operators". This is because `arr[i]` is guaranteed to be equivalent to `*(arr + i)`, so the rules for the `+` operator are what matter.

From C17 6.5.6:

> For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.  
> /--/  
> If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

This means you can't point at array item `[-1]` either.

---

Other forms of UB we might encounter here are misaligned access or "strict aliasing" violations.

In this case alignment is not an issue, and the static assert ensures there's no padding. In practice there _won't_ be any padding on any known real-world system; padding can only happen in theory here.

Strict aliasing is not an issue either as long as we use pointers to `int` for the access, since the actual type of the object stored in memory is `int`.

---

There is a possible, well-defined work-around: we can always inspect the struct through a pointer to character. This is allowed by a special rule in C17 6.3.2.3 that lets any object in C be inspected byte by byte.

So this code is well-defined:

```c
#include <assert.h>  /* static_assert (a macro before C23) */
#include <stdio.h>
#include <stddef.h>

struct MyStruct {
    int x[2];
    int y, z;
};

int main(void) {
    struct MyStruct s = { {1,2},3,4 };
    static_assert(sizeof(struct MyStruct) == sizeof(int[4]), "Unexpected Padding");

    unsigned char* ptr = (unsigned char*) &s;  /* byte-wise view of the whole struct */
    ptr += offsetof(struct MyStruct, y);       /* now at the first byte of y */
    printf("y: %d\n", *(int*) ptr);
    ptr -= sizeof(int);                        /* one int back: x[1], given no padding */
    printf("x[1]: %d\n", *(int*) ptr);

    return 0;
}
```

Output:

```text
y: 3
x[1]: 2
```

In this case the whole struct is to be regarded as type `unsigned char [sizeof(struct MyStruct)]` for the purpose of out-of-bounds checks, and within this array we can access any byte. Given that the character pointer points at memory properly aligned for an `int`, we can safely cast it to `int*` and dereference, because the actual type of the data (the "effective type") is indeed `int`.