Comments on Storing more bytes than a union member has, but less than the union size, with memcpy(3)

Storing more bytes than a union member has, but less than the union size, with memcpy(3)

+5
−0

Let's say we have an object, we store it in a union (into some other narrower type, but with memcpy(3), so it's allowed, I guess), and then read it from the union via its original type (so no alignment issues or anything).

$ cat union.c 
#include <string.h>

struct s { int       a;  int       b; };
struct t { int       a;               };
union u  { struct s  s;  struct t  t; };

int
main(void)
{
	struct s  x = {42, 53};
	union u   y;
	int       z;

	memcpy(&y.t, &x, sizeof(x));  // y.t has declared/effective type of 'struct t'
	z = y.s.b;  // Is this UB?

	return z;
}

I would guess the above is undefined behavior, exactly at the point of the read of y.s.b.

The reason is that, since we created an object of type struct t via memcpy(3), the compiler is free to assume that the object is no wider than sizeof(struct t), so y.s.b (which lies beyond that) would be "uninitialized" (even though we really wrote bytes to it).

Is it UB as I expect?

However, neither GCC nor Clang complains about this program:

$ gcc-13 -Wall -Wextra -Wpedantic -pedantic-errors union.c -O3 -fanalyzer -fsanitize=undefined -fsanitize=address
$ ./a.out; echo $?
53
$ clang-16 -Wall -Wextra -Wpedantic -pedantic-errors union.c -O3 -fsanitize=undefined -fsanitize=address
$ ./a.out; echo $?
53

BTW, does the answer change if I use allocated memory instead?

int
main(void)
{
	struct s  x = {42, 53};
	union u   *y = xmalloc(sizeof(union u));  // No declared/effective type
	int       z;

	memcpy(&y->t, &x, sizeof(x));  // This sets the effective type to 'struct s'
	z = y->s.b;  // No UB?

	return z;
}

Post
+2
−0

memcpy(&y.t, &x, sizeof(x)); is a bit fishy since it would have made more sense to copy into &y or &y.s. None of this is necessarily UB however.

Regarding strict aliasing, it doesn't really matter. If you allocate with a malloc-like function then the data has no declared type and the effective type rules of C17 6.5 §6 apply, which cover memcpy. Either way the effective type becomes struct s, but inside a union that doesn't really matter.


More relevant is the rule about union type punning, C17 6.5.2.3 §3 (normative) and footnote 97 (informative):

If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called “type punning”). This might be a trap representation.

Here is the only case where UB might kick in - when you read a union member and the data gets reinterpreted as another type and matches a trap representation. Not relevant for 2's complement int but could be relevant for pointer members and maybe floating point.


Also noteworthy is the oddball common initial sequence rule (C17 6.5.2.3 §6):

One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.

Since both your structs are present in a translation unit where the union is visible - and only then - it is formally well-defined to inspect any member in this common initial sequence no matter through which type. In your example s.a and t.a are guaranteed to overlap.

However, lots of compilers have a poor track record of conformance here, so I wouldn't count on code relying on common initial sequence to be portable, even though the C standard marks it as well-defined.

1 comment thread

Common initial sequence rules (2 comments)
GrantMoyer wrote over 1 year ago

"if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them". The way I read it, "any of them" means any of the union members, not any struct object with the same type as a union member. So struct s x = {1, 2}; union u y = { .s = x }; y.t.a; is legal, but not the example you give.

Lundin wrote over 1 year ago

GrantMoyer Ah yeah, the example I made is plain wrong. I'll remove it.