Communities

Writing
Writing
Codidact Meta
Codidact Meta
The Great Outdoors
The Great Outdoors
Photography & Video
Photography & Video
Scientific Speculation
Scientific Speculation
Cooking
Cooking
Electrical Engineering
Electrical Engineering
Judaism
Judaism
Languages & Linguistics
Languages & Linguistics
Software Development
Software Development
Mathematics
Mathematics
Christianity
Christianity
Code Golf
Code Golf
Music
Music
Physics
Physics
Linux Systems
Linux Systems
Power Users
Power Users
Tabletop RPGs
Tabletop RPGs
Community Proposals
Community Proposals
tag:snake search within a tag
answers:0 unanswered questions
user:xxxx search by author id
score:0.5 posts with 0.5+ score
"snake oil" exact phrase
votes:4 posts with 4+ votes
created:<1w created < 1 week ago
post_type:xxxx type of post
Search help
Notifications
Mark all as read See all your notifications »
Code Reviews

Welcome to Software Development on Codidact!

Will you help us build our independent community of developers helping developers? We're small and trying to grow. We welcome questions about all aspects of software development, from design to code to QA and more. Got questions? Got answers? Got code you'd like someone to review? Please join us.

Comments on stpecpy(): Design a better string copy function that truncates

Post

stpecpy(): Design a better string copy function that truncates

+5
−1

I was directed a few days ago to a post about a string copy function, which IMO improves the commonly known string copy functions, including strlcpy(3BSD), strlcat(3BSD), and strscpy(9).

It defines a function, char *strecopy(char *dst, char *src, char *end), where end really means one past the end of the array, and the function always returns the pointer to the NUL byte of the dst string.

It is to be used as:

past_end = buf + sizeof(buf);
len = strecopy(strecopy(buf, "Hello", past_end), " world", past_end) - buf;

It improves chaining compared to the other mentioned functions.

However, I found some details that I think could be improved in that function. It can't detect overflow (one needs to calculate it as len != (strlen("Hello") + strlen(" world"))).

So, I designed the following function, which I'd like to get reviewed. I named it stpecpy(); the 'p' as in stpcpy(3), since both return a pointer to the end of the string (except for truncation); the 'e' as in strecopy(), after the extra parameter for the end of the array (except that this one really means the end of the array).

char *
stpecpy(char *dst, char *src, char *end)
{
	for (/* void */; dst <= end; dst++) {
		*dst = *src++;
		if (!*dst)
			return dst;
	}
	/* truncation detected */
	*end = '\0';
	return dst;
}

To be used as:

end = buf + sizeof(buf) - 1;
len = stpecpy(stpecpy(buf, "Hello", end), " world", end) - buf;
if (len == sizeof(buf)) {
	len--;
	handle_trunc();
}

dst and src are the same as in strcpy(3) and all other string copy functions. end is a pointer to the end of the array (i.e., to the last element of the array pointed to by dst).

It allows chaining without needing to recalculate the length of the string after each call (improvement already present in strecopy()), and allows for an easy way to detect truncation (the function returns end + 1).

When designing it, I had some doubts if I should return NULL or end + 1, but I decided to use end + 1 for the following reasons:

  • Don't need to add a check for NULL at the beginning of the function.

  • Callers can cause undefined behavior easily if the function returns NULL: len = stpecpy(dst, src, end) - buf;. NULL - buf is Undefined Behavior.

However, it has a minor problem: the returned pointer minus the beginning of the buffer is not always the length of the string (it is len + 1 in case of truncation); that may be unintuitive for some... But I think it's still best to do that.

A test for the function:

int
main(void)
{
	ptrdiff_t size = 10;
	char      buf[size];
	char     *end;
	ptrdiff_t len;

	end = &buf[size - 1];

	len = stpecpy(stpecpy(buf, "Hello", end), " world", end) - buf;
	if (len == size) {
		len--;
		puts("Following string is truncated.");
	}
	printf("%ti: %s\n", len, buf);

	len = stpecpy(stpecpy(stpecpy(buf, "Hello", end), " foo", end), "", end) - buf;
	if (len == size) {
		len--;
		puts("Following string is truncated.");
	}
	printf("%ti: %s\n", len, buf);

	len = stpecpy(stpecpy(stpecpy(buf, "Hello", end), " baar", end), "", end) - buf;
	if (len == size) {
		len--;
		puts("Following string is truncated.");
	}
	printf("%ti: %s\n", len, buf);

	len = stpecpy(stpecpy(buf, "H", end), "W", end) - buf;
	if (len == size) {
		len--;
		puts("Following string is truncated.");
	}
	printf("%ti: %s\n", len, buf);
}

Which prints:

$ ./a.out 
Following string is truncated.
9: Hello wor
9: Hello foo
Following string is truncated.
9: Hello baa
2: HW

So, what are your thoughts about this copy function?

As a similar function, stpsecpy() could be also added, if it is needed, for copying from non-strings (character arrays), while ensuring that a true string is created in dst, but that's out of this code review.


EDIT

After applying Lundin's advice, and also extending it with useful Clang extensions (_Nonnull), the prototype of the function would be:

char *_Nonnull
stpecpy(char *_Nonnull dst, const char *_Nonnull restrict src, char *_Nonnull end);

EDIT

Add implementation in terms of simple libc functions (memccpy(3)):

char *_Nonnull
stpecpy(char *_Nonnull dst,
        char *_Nonnull restrict src,
        char *_Nonnull end)
{
	char *p;

	p = memccpy(dst, src, '\0', end + 1 - dst);
	if (p != NULL)
		return p - 1;
	/* truncation detected */
	*end = '\0';
	return end + 1;
}
History
Why does this post require attention from curators or moderators?
You might want to add some details to your flag.
Why should this post be closed?

1 comment thread

_Nonnull (4 comments)
_Nonnull
Lundin‭ wrote over 2 years ago · edited over 2 years ago

_Nonnull don't use this. First of all, convention gives that this ought to be the caller's responsibility. Second, if you insist on using such features, the standard already has support for this through static. And lately compilers also have added support for it. So if you wish to protect against the caller passing null, you could do this with (char dst[static 1], const char src[static restrict 1], char end[static 1]).

alx‭ wrote over 2 years ago · edited over 2 years ago
  1. Convention is that, yes, but _Nonnull/static can help be more precise about that in the prototype. Helping the caller is not a bad thing to do. 2) static is problematic in that it means too much. It means that the pointer is not NULL, and that the underlying storage points to at least N elements (maybe more), N being at least 1. This is a perfect example of why static is wrong: dst may perfectly be a pointer to one past the end (consider the case that a previous chained call has truncated), and therefore it would be [static 0], which of course is illegal in ISO C. There's also a negative readability issue; for that, see my rant about static in the Linux man-pages mailing list: some pointers need not represent arrays, and still one may want to mark them as not NULL; using [static 1] for them is unintuitive and unreadable.
alx‭ wrote over 2 years ago · edited over 2 years ago

I want _Nonnull in the standard (I'm preparing a draft, but I'm going slow on that; want it to be perfect), static then becomes irrelevant/obsolete. I'd like [_Nonnull 3] to mean [static 3], *_Nonnull / [_Nonnull] to mean [static 0] and [3] to mean maybe [static 3] but maybe NULL. A more flexible language.

alx‭ wrote over 2 years ago · edited over 2 years ago

Of course, if one has portability issues, it can just #define _Nonnull to nothing for some compilers that lack support (right now only Clang supports it, AFAIK). Or simply manually remove the text. But for this showcase implementation, I'd like to be as explicit as possible, especially in the interface.