Welcome to Software Development on Codidact!
Will you help us build our independent community of developers helping developers? We're small and trying to grow. We welcome questions about all aspects of software development, from design to code to QA and more. Got questions? Got answers? Got code you'd like someone to review? Please join us.
Comments on stpecpy(): Design a better string copy function that truncates
Parent
stpecpy(): Design a better string copy function that truncates
I was directed a few days ago to a post about a string copy function, which IMO improves the commonly known string copy functions, including strlcpy(3BSD), strlcat(3BSD), and strscpy(9).
It defines a function, char *strecopy(char *dst, char *src, char *end)
, where end really means one past the end of the array, and the function always returns the pointer to the NUL byte of the dst string.
It is to be used as:
past_end = buf + sizeof(buf);
len = strecopy(strecopy(buf, "Hello", past_end), " world", past_end) - buf;
It improves chaining compared to the other mentioned functions.
However, I found some details that I think could be improved in that function. It can't detect overflow (one needs to calculate it as len != (strlen("Hello") + strlen(" world"))
).
So, I designed the following function, which I'd like to get reviewed. I named it stpecpy(); the 'p' as in stpcpy(3), since both return a pointer to the end of the string (except for truncation); the 'e' as in strecopy(), after the extra parameter for the end of the array (except that this one really means the end of the array).
char *
stpecpy(char *dst, char *src, char *end)
{
for (/* void */; dst <= end; dst++) {
*dst = *src++;
if (!*dst)
return dst;
}
/* truncation detected */
*end = '\0';
return dst;
}
To be used as:
end = buf + sizeof(buf) - 1;
len = stpecpy(stpecpy(buf, "Hello", end), " world", end) - buf;
if (len == sizeof(buf)) {
len--;
handle_trunc();
}
dst and src are the same as in strcpy(3) and all other string copy functions. end is a pointer to the end of the array (i.e., to the last element of the array pointed to by dst).
It allows chaining without needing to recalculate the length of the string after each call (improvement already present in strecopy()), and allows for an easy way to detect truncation (the function returns end + 1
).
When designing it, I had some doubts if I should return NULL
or end + 1
, but I decided to use end + 1
for the following reasons:
-
Don't need to add a check for
NULL
at the beginning of the function. -
Callers can cause undefined behavior easily if the function returns
NULL
:len = stpecpy(dst, src, end) - buf;
.NULL - buf
is Undefined Behavior.
However, it has a minor problem: the returned pointer minus the beginning of the buffer is not always the length of the string (it is len + 1
in case of truncation); that may be unintuitive for some... But I think it's still best to do that.
A test for the function:
int
main(void)
{
ptrdiff_t size = 10;
char buf[size];
char *end;
ptrdiff_t len;
end = &buf[size - 1];
len = stpecpy(stpecpy(buf, "Hello", end), " world", end) - buf;
if (len == size) {
len--;
puts("Following string is truncated.");
}
printf("%ti: %s\n", len, buf);
len = stpecpy(stpecpy(stpecpy(buf, "Hello", end), " foo", end), "", end) - buf;
if (len == size) {
len--;
puts("Following string is truncated.");
}
printf("%ti: %s\n", len, buf);
len = stpecpy(stpecpy(stpecpy(buf, "Hello", end), " baar", end), "", end) - buf;
if (len == size) {
len--;
puts("Following string is truncated.");
}
printf("%ti: %s\n", len, buf);
len = stpecpy(stpecpy(buf, "H", end), "W", end) - buf;
if (len == size) {
len--;
puts("Following string is truncated.");
}
printf("%ti: %s\n", len, buf);
}
Which prints:
$ ./a.out
Following string is truncated.
9: Hello wor
9: Hello foo
Following string is truncated.
9: Hello baa
2: HW
So, what are your thoughts about this copy function?
As a similar function, stpsecpy()
could be also added, if it is needed, for copying from non-strings (character arrays), while ensuring that a true string is created in dst, but that's out of this code review.
EDIT
After applying Lundin's advice, and also extending it with useful Clang extensions (_Nonnull
), the prototype of the function would be:
char *_Nonnull
stpecpy(char *_Nonnull dst, const char *_Nonnull restrict src, char *_Nonnull end);
EDIT
Add implementation in terms of simple libc functions (memccpy(3)):
char *_Nonnull
stpecpy(char *_Nonnull dst,
char *_Nonnull restrict src,
char *_Nonnull end)
{
char *p;
p = memccpy(dst, src, '\0', end + 1 - dst);
if (p != NULL)
return p - 1;
/* truncation detected */
*end = '\0';
return end + 1;
}
Post
The following users marked this post as Works for me:
User | Comment | Date |
---|---|---|
alx | (no comment) | Dec 1, 2022 at 23:41 |
- Performance-wise, I'd benchmark this vs
if(memchr(src,'\0',n)==src+n) memcpy(dst, src, n);
because it isn't obvious at least to me if that's faster or slower than your custom function. - Regarding where
end
is pointing, I think that's the right call since it's convenient to have a pointer to the null terminator for many reasons, such as when determining the string length or indeed some manner of "chaining". - Bug: no const correctness. Okay so it isn't an actual bug, but it's something with overwhelming consensus of being good practice. In this case it means that you should do
const char* src
. - Minor: No restrict correctness. As written,
dst
/end
andsrc
may not overlap (and you need to document this). Therefore,src
can beconst char* restrict
. You can't restrictdst
andend
though since if used correctly, they point at the same buffer. - Aligned word copy improvements are possible. If you truly wish to write a library-quality function, it would do the copying on word-basis. This means handling misalignment at the beginning, then making an assumption of how far word copying can be done without reading out-of-bounds. Not exactly trivial - you can look at the Github project for glibc implementation of
strcpy
for inspiration. Note that glibc might rely on non-standard extensions however. - Style remark:
if (!*dst)
looks a bit weird and there's no obvious reason why you wouldn't writeif (*dst != '\0')
.
1 comment thread