Communities

Writing
Writing
Codidact Meta
Codidact Meta
The Great Outdoors
The Great Outdoors
Photography & Video
Photography & Video
Scientific Speculation
Scientific Speculation
Cooking
Cooking
Electrical Engineering
Electrical Engineering
Judaism
Judaism
Languages & Linguistics
Languages & Linguistics
Software Development
Software Development
Mathematics
Mathematics
Christianity
Christianity
Code Golf
Code Golf
Music
Music
Physics
Physics
Linux Systems
Linux Systems
Power Users
Power Users
Tabletop RPGs
Tabletop RPGs
Community Proposals
Community Proposals
tag:snake search within a tag
answers:0 unanswered questions
user:xxxx search by author id
score:0.5 posts with 0.5+ score
"snake oil" exact phrase
votes:4 posts with 4+ votes
created:<1w created < 1 week ago
post_type:xxxx type of post
Search help
Notifications
Mark all as read See all your notifications »
Code Reviews

Welcome to Software Development on Codidact!

Will you help us build our independent community of developers helping developers? We're small and trying to grow. We welcome questions about all aspects of software development, from design to code to QA and more. Got questions? Got answers? Got code you'd like someone to review? Please join us.

Comments on stpecpy(): Design a better string copy function that truncates

Parent

stpecpy(): Design a better string copy function that truncates

+5
−0

I was directed a few days ago to a post about a string copy function, which IMO improves the commonly known string copy functions, including strlcpy(3BSD), strlcat(3BSD), and strscpy(9).

It defines a function, char *strecopy(char *dst, char *src, char *end), where end really means one past the end of the array, and the function always returns the pointer to the NUL byte of the dst string.

It is to be used as:

past_end = buf + sizeof(buf);
len = strecopy(strecopy(buf, "Hello", past_end), " world", past_end) - buf;

It improves chaining compared to the other mentioned functions.

However, I found some details that I think could be improved in that function. It can't detect overflow (one needs to calculate it as len != (strlen("Hello") + strlen(" world"))).

So, I designed the following function, which I'd like to get reviewed. I named it stpecpy(); the 'p' as in stpcpy(3), since both return a pointer to the end of the string (except for truncation); the 'e' as in strecopy(), after the extra parameter for the end of the array (except that this one really means the end of the array).

char *
stpecpy(char *dst, char *src, char *end)
{
	for (/* void */; dst <= end; dst++) {
		*dst = *src++;
		if (!*dst)
			return dst;
	}
	/* truncation detected */
	*end = '\0';
	return dst;
}

To be used as:

end = buf + sizeof(buf) - 1;
len = stpecpy(stpecpy(buf, "Hello", end), " world", end) - buf;
if (len == sizeof(buf)) {
	len--;
	handle_trunc();
}

dst and src are the same as in strcpy(3) and all other string copy functions. end is a pointer to the end of the array (i.e., to the last element of the array pointed to by dst).

It allows chaining without needing to recalculate the length of the string after each call (improvement already present in strecopy()), and allows for an easy way to detect truncation (the function returns end + 1).

When designing it, I had some doubts if I should return NULL or end + 1, but I decided to use end + 1 for the following reasons:

  • Don't need to add a check for NULL at the beginning of the function.

  • Callers can cause undefined behavior easily if the function returns NULL: len = stpecpy(dst, src, end) - buf;. NULL - buf is Undefined Behavior.

However, it has a minor problem: the returned pointer minus the beginning of the buffer is not always the length of the string (it is len + 1 in case of truncation); that may be unintuitive for some... But I think it's still best to do that.

A test for the function:

int
main(void)
{
	ptrdiff_t size = 10;
	char      buf[size];
	char     *end;
	ptrdiff_t len;

	end = &buf[size - 1];

	len = stpecpy(stpecpy(buf, "Hello", end), " world", end) - buf;
	if (len == size) {
		len--;
		puts("Following string is truncated.");
	}
	printf("%ti: %s\n", len, buf);

	len = stpecpy(stpecpy(stpecpy(buf, "Hello", end), " foo", end), "", end) - buf;
	if (len == size) {
		len--;
		puts("Following string is truncated.");
	}
	printf("%ti: %s\n", len, buf);

	len = stpecpy(stpecpy(stpecpy(buf, "Hello", end), " baar", end), "", end) - buf;
	if (len == size) {
		len--;
		puts("Following string is truncated.");
	}
	printf("%ti: %s\n", len, buf);

	len = stpecpy(stpecpy(buf, "H", end), "W", end) - buf;
	if (len == size) {
		len--;
		puts("Following string is truncated.");
	}
	printf("%ti: %s\n", len, buf);
}

Which prints:

$ ./a.out 
Following string is truncated.
9: Hello wor
9: Hello foo
Following string is truncated.
9: Hello baa
2: HW

So, what are your thoughts about this copy function?

As a similar function, stpsecpy() could be also added, if it is needed, for copying from non-strings (character arrays), while ensuring that a true string is created in dst, but that's out of this code review.


EDIT

After applying Lundin's advice, and also extending it with useful Clang extensions (_Nonnull), the prototype of the function would be:

char *_Nonnull
stpecpy(char *_Nonnull dst, const char *_Nonnull restrict src, char *_Nonnull end);

EDIT

Add implementation in terms of simple libc functions (memccpy(3)):

char *_Nonnull
stpecpy(char *_Nonnull dst,
        char *_Nonnull restrict src,
        char *_Nonnull end)
{
	char *p;

	p = memccpy(dst, src, '\0', end + 1 - dst);
	if (p != NULL)
		return p - 1;
	/* truncation detected */
	*end = '\0';
	return end + 1;
}
History
Why does this post require moderator attention?
You might want to add some details to your flag.
Why should this post be closed?

1 comment thread

_Nonnull (4 comments)
Post
+4
−0
  • Performance-wise, I'd benchmark this vs if(memchr(src,'\0',n)==src+n) memcpy(dst, src, n); because it isn't obvious at least to me if that's faster or slower than your custom function.
  • Regarding where end is pointing, I think that's the right call since it's convenient to have a pointer to the null terminator for many reasons, such as when determining the string length or indeed some manner of "chaining".
  • Bug: no const correctness. Okay so it isn't an actual bug, but it's something with overwhelming consensus of being good practice. In this case it means that you should do const char* src.
  • Minor: No restrict correctness. As written,dst/end and src may not overlap (and you need to document this). Therefore, src can be const char* restrict. You can't restrict dst and end though since if used correctly, they point at the same buffer.
  • Aligned word copy improvements are possible. If you truly wish to write a library-quality function, it would do the copying on word-basis. This means handling misalignment at the beginning, then making an assumption of how far word copying can be done without reading out-of-bounds. Not exactly trivial - you can look at the Github project for glibc implementation of strcpy for inspiration. Note that glibc might rely on non-standard extensions however.
  • Style remark: if (!*dst) looks a bit weird and there's no obvious reason why you wouldn't write if (*dst != '\0').
History
Why does this post require moderator attention?
You might want to add some details to your flag.

3 comment threads

Readability (1 comment)
const and restrict correctness (1 comment)
Performance (1 comment)
Readability
alx‭ wrote about 2 years ago · edited about 2 years ago

Hmm, yes, !*dst is not very readable for a review, except for those used to the idiom. The equivalent (*dst == '\0') would probably be better for someone to review it.