Welcome to Software Development on Codidact!

Will you help us build our independent community of developers helping developers? We're small and trying to grow. We welcome questions about all aspects of software development, from design to code to QA and more. Got questions? Got answers? Got code you'd like someone to review? Please join us.

stpecpy(): Design a better string copy function that truncates

−1

I was directed a few days ago to a post about a string copy function, which IMO improves the commonly known string copy functions, including strlcpy(3BSD), strlcat(3BSD), and strscpy(9).

It defines a function, char *strecopy(char *dst, char *src, char *end), where end really means one past the end of the array, and the function always returns the pointer to the NUL byte of the dst string.

It is to be used as:

past_end = buf + sizeof(buf);
len = strecopy(strecopy(buf, "Hello", past_end), " world", past_end) - buf;

It improves chaining compared to the other mentioned functions.

However, I found some details that I think could be improved in that function. It can't detect overflow (one needs to calculate it as len != (strlen("Hello") + strlen(" world"))).

So, I designed the following function, which I'd like to get reviewed. I named it stpecpy(); the 'p' as in stpcpy(3), since both return a pointer to the end of the string (except for truncation); the 'e' as in strecopy(), after the extra parameter for the end of the array (except that this one really means the end of the array).

char *
stpecpy(char *dst, char *src, char *end)
{
	for (/* void */; dst <= end; dst++) {
		*dst = *src++;
		if (!*dst)
			return dst;
	}
	/* truncation detected */
	*end = '\0';
	return dst;
}

To be used as:

end = buf + sizeof(buf) - 1;
len = stpecpy(stpecpy(buf, "Hello", end), " world", end) - buf;
if (len == sizeof(buf)) {
	len--;
	handle_trunc();
}

dst and src are the same as in strcpy(3) and all other string copy functions. end is a pointer to the end of the array (i.e., to the last element of the array pointed to by dst).

It allows chaining without needing to recalculate the length of the string after each call (improvement already present in strecopy()), and allows for an easy way to detect truncation (the function returns end + 1).

When designing it, I had some doubts if I should return NULL or end + 1, but I decided to use end + 1 for the following reasons:

Don't need to add a check for NULL at the beginning of the function.
Callers can cause undefined behavior easily if the function returns NULL: len = stpecpy(dst, src, end) - buf;. NULL - buf is Undefined Behavior.

However, it has a minor problem: the returned pointer minus the beginning of the buffer is not always the length of the string (it is len + 1 in case of truncation); that may be unintuitive for some... But I think it's still best to do that.

A test for the function:

int
main(void)
{
	ptrdiff_t size = 10;
	char      buf[size];
	char     *end;
	ptrdiff_t len;

	end = &buf[size - 1];

	len = stpecpy(stpecpy(buf, "Hello", end), " world", end) - buf;
	if (len == size) {
		len--;
		puts("Following string is truncated.");
	}
	printf("%ti: %s\n", len, buf);

	len = stpecpy(stpecpy(stpecpy(buf, "Hello", end), " foo", end), "", end) - buf;
	if (len == size) {
		len--;
		puts("Following string is truncated.");
	}
	printf("%ti: %s\n", len, buf);

	len = stpecpy(stpecpy(stpecpy(buf, "Hello", end), " baar", end), "", end) - buf;
	if (len == size) {
		len--;
		puts("Following string is truncated.");
	}
	printf("%ti: %s\n", len, buf);

	len = stpecpy(stpecpy(buf, "H", end), "W", end) - buf;
	if (len == size) {
		len--;
		puts("Following string is truncated.");
	}
	printf("%ti: %s\n", len, buf);
}

Which prints:

$ ./a.out 
Following string is truncated.
9: Hello wor
9: Hello foo
Following string is truncated.
9: Hello baa
2: HW

So, what are your thoughts about this copy function?

As a similar function, stpsecpy() could be also added, if it is needed, for copying from non-strings (character arrays), while ensuring that a true string is created in dst, but that's out of this code review.

EDIT

After applying Lundin's advice, and also extending it with useful Clang extensions (_Nonnull), the prototype of the function would be:

char *_Nonnull
stpecpy(char *_Nonnull dst, const char *_Nonnull restrict src, char *_Nonnull end);

EDIT

Add implementation in terms of simple libc functions (memccpy(3)):

char *_Nonnull
stpecpy(char *_Nonnull dst,
        char *_Nonnull restrict src,
        char *_Nonnull end)
{
	char *p;

	p = memccpy(dst, src, '\0', end + 1 - dst);
	if (p != NULL)
		return p - 1;
	/* truncation detected */
	*end = '\0';
	return end + 1;
}

c string string-concatenation strcpy

posted over 3 years ago

CC BY-NC-SA 4.0

2y ago

alx‭

663 reputation 17 10 97 129

Raw

Markdown

History

is a duplicate

This question has been asked before and has already been answered. It should be marked as a duplicate.

Please enter the URL of the proposed duplicate in the details field below.

not constructive

This question cannot be answered in a way that is helpful to anyone. It's not possible to learn something from possible answers, except for the solution for the specific problem of the asker.

1 comment thread

_Nonnull (4 comments)

2 answers

Score Active Age

You are accessing this answer with a direct link, so it's being shown above all other answers regardless of its score. You can return to the normal view.

−1

After addressing @Lundin 's suggestions:

Implemented in terms of libc functions for performance.
const correctness
style improvements

I have a few more that I realized after trying to replace some existing code:

Much code out there has char *end = buf + nitems(buf); (although that should really be called past_end). That's certainly easier to calculate than the real end that I used here, since you don't need to stick that -1 all around, and therefore has less chance of an accident.

It's more readable at call site (due to alignment) when the past_end pointer is the second parameter to the function.

So, I tweaked it to have the following definition:

char *
stpecpy(char *dst, char past_end[0], const char *restrict src)
{
	char *p;

	if (dst == past_end)
		return past_end;

	p = memccpy(dst, src, '\0', past_end - dst);
	if (p != NULL)
		return p - 1;

	/* truncation detected */
	past_end[-1] = '\0';
	return past_end;
}

To be clear, the return pointer is still the same: the pointer to the NUL byte on success, or one-past-the-array on truncation. It's only the last parameter that has changed to be a pointer one-past-the-array, which is simpler to get for users.

The name of the function stpecpy() seems to also have nice mnemonics for past_end.

Now the use is slightly simpler:

int
main(void)
{
	ptrdiff_t  size = 10;
	char       buf[size];
	char       *past_end, *p;
	ptrdiff_t  len;

	past_end = &buf[size];

	p = buf;
	p = stpecpy(p, past_end, "Hello");
	p = stpecpy(p, past_end, " world");
	if (p == past_end) {
		p--;
		puts("Following string is truncated.");
	}
	len = p - buf;
	printf("%ti: %s\n", len, buf);

	p = buf;
	p = stpecpy(p, past_end, "Hello");
	p = stpecpy(p, past_end, " foo");
	p = stpecpy(p, past_end, "");
	if (p == past_end) {
		p--;
		puts("Following string is truncated.");
	}
	len = p - buf;
	printf("%ti: %s\n", len, buf);

	p = buf;
	p = stpecpy(p, past_end, "Hello");
	p = stpecpy(p, past_end, " baar");
	p = stpecpy(p, past_end, "");
	if (p == past_end) {
		p--;
		puts("Following string is truncated.");
	}
	len = p - buf;
	printf("%ti: %s\n", len, buf);

	p = buf;
	p = stpecpy(p, past_end, "H");
	p = stpecpy(p, past_end, "W");
	if (p == past_end) {
		p--;
		puts("Following string is truncated.");
	}
	len = p - buf;
	printf("%ti: %s\n", len, buf);
}

Which prints:

$ ./a.out 
Following string is truncated.
9: Hello wor
9: Hello foo
Following string is truncated.
9: Hello baa
2: HW

posted over 2 years ago

CC BY-NC-SA 4.0

2y ago

alx‭

663 reputation 17 10 97 129

Copy Link

Raw

Markdown

History

0 comment threads

−0

Worked for alx‭

The following users marked this post as Works for me:

User	Comment	Date
alx‭	(no comment)	Dec 1, 2022 at 23:41

Performance-wise, I'd benchmark this vs if(memchr(src,'\0',n)==src+n) memcpy(dst, src, n); because it isn't obvious at least to me if that's faster or slower than your custom function.
Regarding where end is pointing, I think that's the right call since it's convenient to have a pointer to the null terminator for many reasons, such as when determining the string length or indeed some manner of "chaining".
Bug: no const correctness. Okay so it isn't an actual bug, but it's something with overwhelming consensus of being good practice. In this case it means that you should do const char* src.
Minor: No restrict correctness. As written,dst/end and src may not overlap (and you need to document this). Therefore, src can be const char* restrict. You can't restrict dst and end though since if used correctly, they point at the same buffer.
Aligned word copy improvements are possible. If you truly wish to write a library-quality function, it would do the copying on word-basis. This means handling misalignment at the beginning, then making an assumption of how far word copying can be done without reading out-of-bounds. Not exactly trivial - you can look at the Github project for glibc implementation of strcpy for inspiration. Note that glibc might rely on non-standard extensions however.
Style remark: if (!*dst) looks a bit weird and there's no obvious reason why you wouldn't write if (*dst != '\0').

posted over 3 years ago

CC BY-NC-SA 4.0

Lundin‭

9429 reputation 40 128 1088 177

Copy Link

Raw

Markdown

History

3 comment threads

Readability (1 comment)

const and restrict correctness (1 comment)

Performance (1 comment)

Communities

stpecpy(): Design a better string copy function that truncates

1 comment thread

2 answers

0 comment threads

3 comment threads