Welcome to Software Development on Codidact!
How can I manage multiple consecutive strings in a buffer (and add more later)?
This question is inspired by If I have a char array containing strings with a null byte (\0) terminating each string, how would I add another string onto the end? on Stack Overflow.
Suppose I have a char[] buffer that I'm using to represent multiple null-terminated (ASCII) strings, one after the other. I can easily set up an initial state that has two strings and sufficient room to add a third:
/* The exact amount of space is not critical to the question; it's enough
to store these strings and leave room for more. */
char buffer[80] = {'o', 'n', 'e', '\0', 't', 'w', 'o', '\0'};
Now suppose I also have char* another_string = "three"; declared. How can I append or concatenate another_string to the buffer, generally? I do not want to concatenate the "three" text with the "two", but instead put it in the buffer as a separate string.
I already know that the <string.h> library functions expect a string to be null-terminated, so it seems like they won't help here. For example, strcat would find the first null in the array instead of the second, and overwrite it; and strncpy would need a pointer to where to start writing.
2 answers
The fundamental problem here is that it is already ambiguous where the "end" of the data in the buffer is. Strings can be empty (have zero length as reported by strlen); as such, buffer could equally well be interpreted as containing three strings, where the last is empty. Or more than that - up to what the buffer can hold.
The situation is even worse if we start with uninitialized memory; then there's no way to tell whether the byte after the last intentionally-written null is just uninitialized garbage, or the start of another actual string.
If we don't need to be able to store empty strings, one way around the problem is to mimic how null-termination works, but at a string level rather than a byte level. That is to say, we can establish a convention that the sequence of strings is "empty-string-terminated", and use strlen repeatedly to search for this string. That will tell us where to copy the new string.
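A minimal sketch of that convention, assuming the sequence really is terminated by an empty string somewhere inside the buffer. The names find_sequence_end and append_string are made up for illustration:

```c
#include <stddef.h>
#include <string.h>

/* Returns a pointer to the terminating empty string of a sequence of
   non-empty strings stored back to back, i.e. the first free position.
   Assumes an empty string really does terminate the sequence somewhere
   inside the buffer. */
char *find_sequence_end(char *buffer)
{
    char *p = buffer;
    while (*p != '\0')          /* a non-empty string starts here */
        p += strlen(p) + 1;     /* skip it and its null terminator */
    return p;                   /* p points at the empty string */
}

/* Appends src as a new string; returns 0 on success, -1 if it does not
   fit or is empty. One extra byte is reserved so that the sequence
   stays terminated by an empty string. */
int append_string(char *buffer, size_t capacity, const char *src)
{
    char *end = find_sequence_end(buffer);
    size_t used = (size_t)(end - buffer);
    size_t size = strlen(src) + 1;      /* length plus null terminator */

    if (size == 1)                      /* empty strings cannot be stored */
        return -1;
    if (size + 1 > capacity - used)     /* string plus final empty string */
        return -1;
    memcpy(end, src, size);
    end[size] = '\0';                   /* new empty-string terminator */
    return 0;
}
```

With the buffer from the question, find_sequence_end(buffer) would point at buffer + 8, since the implicitly zeroed byte there acts as the empty string.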
However, it will be both simpler and more flexible to just remember where the end of the string sequence is, and update it whenever another string is added. For example, we could do this using an integer index:
/* the lengths of the two initial strings and their null terminators */
int used = 8;
int usable = sizeof(buffer);
strncpy(buffer + used, another_string, usable - used);
buffer[usable - 1] = '\0';
used += strlen(another_string) + 1;
if (used > usable) used = usable;
This code takes care of a few important issues. Note the pointer arithmetic: buffer decays to a pointer to the start of the array, so buffer + used is the desired destination pointer. We need to restrict strncpy to the amount of space that remains in the buffer - between buffer + used and the end of the buffer - to avoid writing beyond the end of the array. Note that strncpy avoids writing more than the declared amount of room, but does not null-terminate if it reaches that limit. To avoid ending up with non-null-terminated data at the end of the array, we can just unconditionally add a null to the last spot in the buffer each time, as shown. (A more sophisticated approach might detect this situation and report an error somehow.) After writing, we need to update the record of how much space is used. (When the buffer is full, used will be limited to the array length; future attempts at strncpy will see that zero bytes are available.)
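One way the "more sophisticated approach" could look is sketched here, under the assumption that refusing to truncate is the desired behaviour. The helper name append_checked is hypothetical; it checks for room first and only then copies:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends src (including its null terminator) at
   buffer + *used. Returns false and leaves everything untouched if the
   string does not fit in the remaining space. */
bool append_checked(char *buffer, size_t usable, size_t *used, const char *src)
{
    size_t size = strlen(src) + 1;      /* length plus null terminator */

    if (*used > usable || size > usable - *used)
        return false;                   /* not enough room: report it */

    memcpy(buffer + *used, src, size);  /* size is known, memcpy suffices */
    *used += size;
    return true;
}
```

A caller would keep used next to the buffer (here it starts at 8) and check the return value instead of silently storing a truncated string.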
Also keep in mind that a representation like this is not convenient for modifying the strings later. In particular, anything that tries to change the length of a string that isn't at the end of the sequence will cause a major headache, because every string after it will need to be shifted around to make room or close a gap. (This is the same reason that you can't easily modify a single line of a text file "in place".)
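To illustrate the shifting involved, here is a rough sketch of replacing one string in the middle of the sequence. The function replace_string and its parameters are assumptions made for this example, and at must be the offset of the start of an existing string:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: replaces the string that starts at offset `at`
   with repl, moving every later string to make room or close the gap.
   Returns false (changing nothing) if the result would not fit. */
bool replace_string(char *buffer, size_t usable, size_t *used,
                    size_t at, const char *repl)
{
    size_t old_size = strlen(buffer + at) + 1;
    size_t new_size = strlen(repl) + 1;
    size_t tail = *used - (at + old_size);  /* bytes after the old string */

    if (new_size > old_size && new_size - old_size > usable - *used)
        return false;

    /* memmove, not memcpy: source and destination may overlap */
    memmove(buffer + at + new_size, buffer + at + old_size, tail);
    memcpy(buffer + at, repl, new_size);
    *used = *used - old_size + new_size;
    return true;
}
```

Every byte after the replaced string is moved, and any pointers or offsets into the later strings become stale afterwards.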
When looking at this, we might pretty soon note that storing strings in the same buffer by using null terminators as separators is quite clunky. It blocks us from using handy functions like strtok, bsearch or qsort. And there's no obvious way to tell where all of it ends. To know where it ends, we have to keep track of the used size in bytes separately.
On the positive side, this sort of allocation is both fast and cache-friendly, so in raw performance it will easily beat anything based on a pointer table with malloc/strdup. Generally we should pick readability/maintainability over such micro-optimization considerations, however.
Most commonly, arrays of strings are accessed through a look-up table formed through a separate array of pointers, char* str[n]. That's a convenient, flexible format and enables bsearch/qsort on the pointer table itself. We could have these pointers point at dynamically allocated strings, at read-only string literals (in which case const char* should be used), or we could point them into this pre-allocated buffer.
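As a sketch of what "bsearch/qsort on the pointer table itself" could look like, assuming the table entries point at valid C strings: the helper names compare_strings, sort_table and find_in_table are invented here. Only the pointers get reordered; the bytes in the underlying buffer stay where they are.

```c
#include <stdlib.h>
#include <string.h>

/* Comparison callback for qsort/bsearch on an array of char*: each
   argument points at an element of the table, i.e. at a char*. */
static int compare_strings(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Sorts the pointer table; the string data itself is not moved. */
void sort_table(char **table, size_t count)
{
    qsort(table, count, sizeof *table, compare_strings);
}

/* Looks up key in a sorted table; returns the matching string or NULL. */
char *find_in_table(char **table, size_t count, const char *key)
{
    char **hit = bsearch(&key, table, count, sizeof *table, compare_strings);
    return hit ? *hit : NULL;
}
```

Note that bsearch gets the address of the key pointer, so the key and the table elements reach the callback in the same form.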
With the pre-allocated buffer method, we can also start counting the used size at the same time as we initialize the pointers. Example:
#include <stdio.h>
#include <string.h>

#define BUFFER_SIZE 80
#define MAX_STRINGS_N 10

int main(void)
{
    char buffer[BUFFER_SIZE] = {'o', 'n', 'e', '\0', 't', 'w', 'o', '\0'};
    size_t used_size = 0;
    size_t strings_allocated = 2;
    char* str[MAX_STRINGS_N];

    /* initialize the pointers */
    char* next = buffer;
    for(size_t i=0; i<strings_allocated; i++)
    {
        size_t next_size = strlen(next) + 1;
        used_size += next_size;
        str[i] = next;
        next += next_size;
        printf("%s (total size: %zu)\n", str[i], used_size);
    }
}
As for how to add new strings to this buffer, it kind of depends on where they are coming from. Strings taken as input from stdin or command line arguments ought to be sanitized before we use them in our program, but that's another story. Let us assume they are proper, sanitized C strings. Then we need not worry about using them, and we have some alternatives for copying them:
- The most obvious choice for copying a string is strcpy. This looks for the null terminator during the copy, so we need not know the size of the string in advance. It also adds a null terminator to the end of the copied string.
- But in this case we do want to know the size of the new string before we add it to the buffer, or otherwise we can't check for overflow. So we want to call strlen on the new string and check if there is room before we copy anything.
- Note: we need to copy the size of the new string, not the length. Size meaning string length + 1 for the null terminator. The new string must be null terminated or it is not a C string. But if we copy the size of the new string, that includes copying the null terminator.
- And once the size of a string is known, we may as well use memcpy, for a slight performance boost over strcpy, as the former doesn't check for null termination.
- With a new compiler, we can also use the new memccpy function from C23 (What is C23 and why should I care?). This can even be used on non-sanitized data, as it takes a fixed size as input but can be told to stop once it finds a null terminator.
Conclusion: any of strcpy, memcpy or memccpy is fine. In the example below I went with memccpy just because this is a new function in standard C and not everyone is familiar with it yet.
If we for whatever reason wished to copy raw unsanitized data, we could have used strcpy_s (part of the optional Annex K of the C standard) or the non-standard strlcpy. These work much like memccpy (or the dangerous, obsolete strncpy) but explicitly add a null terminator to the end of the new string. See Is strcpy dangerous and what should be used instead?
/* add new strings (continues inside main from the previous example;
   exit and EXIT_FAILURE additionally require #include <stdlib.h>) */
char new_str[] = "three"; // a new string from somewhere
size_t next_size = strlen(new_str)+1; /* size = length + null terminator */
if(strings_allocated >= MAX_STRINGS_N || used_size + next_size > BUFFER_SIZE)
{ /* some manner of error handling here */
    fprintf(stderr, "String buffer full.\n");
    exit(EXIT_FAILURE);
}
/*
  Since next from the previous example is equivalent to &buffer[used_size],
  either could be used here.
  Copy the string using any of the functions previously mentioned:
*/
memccpy(next, new_str, '\0', next_size);
str[strings_allocated] = next;
next += next_size; /* keep next pointing at the first free byte */
used_size += next_size;
printf("%s (total size: %zu)\n", str[strings_allocated], used_size);
strings_allocated++;
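For the unsanitized case mentioned earlier, the return value of memccpy can be used to detect that no null terminator was found within the given size. The following is only a sketch; copy_bounded is a hypothetical helper, and the feature-test macro is an assumption that is only needed on pre-C23 toolchains that hide the POSIX declaration in strict mode:

```c
#define _XOPEN_SOURCE 700   /* assumption: exposes memccpy before C23 */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dest, which has room for `room`
   bytes. Returns the number of bytes written including the null
   terminator, or 0 if src did not fit (dest is then left holding an
   empty string). */
size_t copy_bounded(char *dest, size_t room, const char *src)
{
    if (room == 0)
        return 0;

    /* memccpy stops after copying '\0', or after room bytes. */
    char *after = memccpy(dest, src, '\0', room);
    if (after == NULL)      /* no '\0' within room bytes: truncated */
    {
        dest[0] = '\0';     /* discard the partial copy */
        return 0;
    }
    return (size_t)(after - dest);
}
```

Called as copy_bounded(next, BUFFER_SIZE - used_size, input), a return value of 0 would signal that the buffer is full, without a separate strlen pass over the input.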
But hold on, why all of this fuss regarding adding null terminators... what does the initialization
char buffer[80] = {'o', 'n', 'e', '\0', 't', 'w', 'o', '\0'};
actually mean, more precisely? If we are attentive here, that's a buffer of 80 bytes but we only initialized 8 explicitly. C does actually guarantee that the rest of them are set to zeroes. In the current C17 standard 6.7.9 §21:
If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.
Brace-enclosed list meaning {}, "aggregate" being standardese for array or struct, and "same as objects that have static storage duration" referring to a previous part of the same chapter, C17 6.7.9 §10:
If an object that has static or thread storage duration is not initialized explicitly, then: /--/
- if it has arithmetic type, it is initialized to (positive or unsigned) zero
In plain English, C guarantees that after our initial 8 bytes of data, there are 72 zeroes following. So we needn't actually worry about copying the null terminator, it turns out. Though doing so explicitly is of course best practice, and relying on the zero-initialization would have been both sloppy and dangerous.
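The guarantee is easy to check with a small loop. This is just an illustrative sketch; tail_is_zero is a made-up name:

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if every byte from index `from` up to the end of the
   buffer is zero. */
bool tail_is_zero(const char *buffer, size_t size, size_t from)
{
    for (size_t i = from; i < size; i++)
        if (buffer[i] != '\0')
            return false;
    return true;
}
```

For the 80-byte buffer initialized as above, tail_is_zero(buffer, sizeof buffer, 8) holds. It would not be guaranteed for a buffer declared without any initializer at all.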