
How can I manage multiple consecutive strings in a buffer (and add more later)?

+2
−0

This question is inspired by "If I have a char array containing strings with a null byte (\0) terminating each string, how would I add another string onto the end?" on Stack Overflow.

Suppose I have a char[] buffer that I'm using to represent multiple null-terminated (ASCII) strings, one after the other. I can easily set up an initial state that has two strings and sufficient room to add a third:

/* The exact amount of space is not critical to the question; it's enough
   to store these strings and leave room for more. */
char buffer[80] = {'o', 'n', 'e', '\0', 't', 'w', 'o', '\0'};

Now suppose I have char* another_string = "three";. How can I append another_string to the buffer, in general? I do not want to concatenate the text "three" onto "two"; I want to put it in the buffer as a separate string.

I already know that the <string.h> library functions expect a string to be null-terminated, so it seems like they won't help here. For example, strcat would find the first null in the array instead of the second, and overwrite it; and strncpy would need a pointer to where to start writing.


2 answers

+3
−0

The fundamental problem here is that it is already ambiguous where the "end" of the data in the buffer is. Strings can be empty (have zero length as reported by strlen); as such, buffer could equally well be interpreted as containing three strings, where the last is empty. Or more than that - up to what the buffer can hold.

The situation is even worse if we start with uninitialized memory; then there's no way to tell whether the byte after the last intentionally-written null is just uninitialized garbage, or the start of another actual string.

If we don't need to be able to store empty strings, one way around the problem is to mimic how null-termination works, but at a string level rather than a byte level. That is to say, we can establish a convention that the sequence of strings is "empty-string-terminated", and use strlen repeatedly to search for this string. That will tell us where to copy the new string.
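
As a rough sketch of that convention (the names find_sequence_end and append_string are made up for illustration, and the code assumes the buffer already ends with an empty string, as a zero-initialized buffer does), it could look something like this:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the empty string that
   terminates the sequence, i.e. where the next string should go.
   Assumes an empty string exists somewhere inside the buffer. */
char *find_sequence_end(char *buffer)
{
    assert(buffer != NULL);
    char *p = buffer;
    while (*p != '\0') {        /* a non-empty string starts here */
        p += strlen(p) + 1;     /* skip it and its null terminator */
    }
    return p;
}

/* Hypothetical helper: appends s as a separate string.
   Returns 0 on success, -1 if s is empty or does not fit. */
int append_string(char *buffer, size_t capacity, const char *s)
{
    size_t len = strlen(s);
    char *end = find_sequence_end(buffer);
    size_t used = (size_t)(end - buffer);

    if (len == 0) return -1;    /* empty strings mark the end, so can't be stored */
    /* room needed: len + 1 for s, plus 1 for the terminating empty string */
    if (len + 2 > capacity - used) return -1;

    memcpy(end, s, len + 1);    /* copies the null terminator too */
    end[len + 1] = '\0';        /* re-establish the empty-string terminator */
    return 0;
}
```

Every append has to walk the whole sequence again to find the end, which is one reason the bookkeeping approach described next tends to be preferable.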

However, it will be both simpler and more flexible to just remember where the end of the string sequence is, and update it whenever another string is added. For example, we could do this using an integer index:

/* the lengths of the two initial strings and their null terminators */
int used = 8;
int usable = sizeof(buffer);
strncpy(buffer + used, another_string, usable - used);
buffer[usable - 1] = '\0';
used += strlen(another_string) + 1;
if (used > usable) used = usable;

This code takes care of a few important issues. Note the pointer arithmetic: buffer decays to a pointer to the start of the array, so buffer + used is the desired destination pointer. We need to restrict strncpy to the amount of space that remains in the buffer - between buffer + used and the end of the buffer - to avoid writing beyond the end of the array. Note that strncpy avoids writing more than the declared amount of room, but does not null-terminate if it reaches that limit. To avoid ending up with non-null-terminated data at the end of the array, we can just unconditionally add a null to the last spot in the buffer each time, as shown. (A more sophisticated approach might detect this situation and report an error somehow.) After writing, we need to update the record of how much space is used. (When the buffer is full, used will be limited to the array length; future attempts at strncpy will see that zero bytes are available.)
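
For the "more sophisticated approach" mentioned in passing, one possible sketch is to check the remaining room first and refuse to copy at all when the string does not fit, instead of truncating. Note that this swaps strncpy for memcpy, since the size is known after the check. The function name append_tracked and its return convention are assumptions made for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends s after the *used bytes already in use.
   Returns 0 on success. Returns -1 and leaves the buffer and *used
   untouched if s (including its null terminator) does not fit. */
int append_tracked(char *buffer, size_t capacity, size_t *used, const char *s)
{
    assert(*used <= capacity);            /* caller must keep this invariant */
    size_t size = strlen(s) + 1;          /* length plus null terminator */

    if (size > capacity - *used) return -1;

    memcpy(buffer + *used, s, size);      /* size is known, so memcpy suffices */
    *used += size;
    return 0;
}
```

With the buffer from the question, this would be called as append_tracked(buffer, sizeof(buffer), &used, another_string), and the caller decides what to do when it returns -1.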

Also keep in mind that a representation like this is not convenient for modifying the strings later. In particular, anything that tries to change the length of a string that isn't at the end of the sequence will cause a major headache, because every string after it will need to be shifted to make room or close a gap. (This is the same reason that you can't easily modify a single line of a text file "in place".)

+3
−0

When looking at this, we might pretty soon note that storing strings in the same buffer, with null terminators as separators, is quite clunky. It blocks us from using handy functions like strtok, bsearch or qsort. And there's no obvious way to tell where all of it ends: to know that, we have to keep track of the used size in bytes separately.

On the positive side, this sort of allocation is both fast and cache-friendly, so in raw performance it will easily beat anything based on a pointer table with malloc/strdup. Generally we should pick readability/maintainability over such micro-optimization considerations, however.

Most commonly, arrays of strings are accessed through a look-up table formed by a separate array of pointers, char* str[n]. That's a convenient, flexible format, and it enables bsearch/qsort on the pointer table itself. We could have these pointers point at dynamically allocated strings, at read-only string literals (in which case const char* should be used), or into this pre-allocated buffer.
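
To illustrate the qsort point, here is a small sketch that builds a pointer table over strings packed in one buffer and sorts only the table. The names compare_strings and build_sorted_table are invented for this example, and the caller is assumed to know how many strings the buffer holds:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort callback: the table elements are char*, so the callback
   receives pointers to those pointers. */
static int compare_strings(const void *a, const void *b)
{
    char *const *sa = a;
    char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Hypothetical helper: points table[0..n-1] at the first n strings
   packed in buffer, then sorts the table. The buffer contents are
   never moved; only the pointers are reordered. */
void build_sorted_table(char *buffer, char **table, size_t n)
{
    assert(buffer != NULL && table != NULL);
    char *next = buffer;
    for (size_t i = 0; i < n; i++) {
        table[i] = next;
        next += strlen(next) + 1;   /* step over the string and its terminator */
    }
    qsort(table, n, sizeof *table, compare_strings);
}
```

Once the table is sorted, bsearch can be used on it with the same kind of comparison callback.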

With the pre-allocated buffer method, we can also count the used size at the same time as we initialize the pointers. Example:

#include <stdio.h>
#include <stdlib.h>  /* exit, used when adding strings further down */
#include <string.h>

#define BUFFER_SIZE    80
#define MAX_STRINGS_N  10

int main(void)
{
  char buffer[BUFFER_SIZE] = {'o', 'n', 'e', '\0', 't', 'w', 'o', '\0'};
  size_t used_size = 0;
  size_t strings_allocated = 2;
  char* str[MAX_STRINGS_N];
  
  /* initialize the pointers */
  char* next = buffer;
  for(size_t i=0; i<strings_allocated; i++)
  {
    size_t next_size = strlen(next) + 1;
    used_size += next_size;
    str[i] = next;
    next += next_size;

    printf("%s (total size: %zu)\n", str[i], used_size);
  }
}

As for how to add new strings to this buffer, it depends on where they come from. Strings taken as input from stdin or command-line arguments ought to be sanitized before we use them in our program, but that's another story. Let us assume they are proper, sanitized C strings. Then we can use them without further checks, and we have some alternatives for copying them:

  • The most obvious choice for copying a string is strcpy. It looks for the null terminator while copying, so we need not know the size of the string in advance. It also adds a null terminator to the end of the copied string.
  • But in this case we do want to know the size of the new string before we add it to the buffer, or else we can't check for overflow. So we want to call strlen on the new string and check that there is room before we copy anything.
  • Note: we need to copy the size of the new string, not the length. Size here means string length + 1 for the null terminator. The new string must be null-terminated or it is not a C string, and copying size bytes includes copying the null terminator.
  • And once the size of a string is known, we may as well use memcpy for a slight performance boost over strcpy, as the former doesn't check for null termination.
  • With a new compiler, we can also use the memccpy function, new in C23 (see "What is C23 and why should I care?"). This can even be used on non-sanitized data, as it takes a fixed maximum size as input but can be told to stop copying once it finds a null terminator.

Conclusion: either strcpy, memcpy or memccpy are fine. In the example below I went with memccpy just because this is a new function in standard C and not everyone is familiar with it yet.

If we for whatever reason wished to copy raw unsanitized data, we could have used strcpy_s (part of the optional Annex K bounds-checking interfaces) or the non-standard strlcpy. These work much like memccpy (or the dangerous, obsolete strncpy) but explicitly add a null terminator to the end of the new string. See "Is strcpy dangerous and what should be used instead?"


  /* add new strings (this continues inside main, after the loop above) */
  char new_str[] = "three"; // a new string from somewhere
  size_t next_size = strlen(new_str)+1; /* size = length + null terminator */

  if(strings_allocated >= MAX_STRINGS_N || used_size + next_size > BUFFER_SIZE)
  { /* some manner of error handling here */
    fprintf(stderr, "String buffer full.\n");
    exit(EXIT_FAILURE); /* requires #include <stdlib.h> */
  }

  /* 
    Since next from the previous example is equivalent to &buffer[used_size], 
    either could be used here.
    Copy the string using any of the functions previously mentioned:
  */
  memccpy(next, new_str, '\0', next_size); 
  str[strings_allocated] = next;
  next += next_size; /* keep next pointing at the first free byte */
  used_size += next_size;
  printf("%s (total size: %zu)\n", str[strings_allocated], used_size);
  strings_allocated++;
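
If this bookkeeping is needed in more than one place, the buffer, the pointer table and the counters from the example above could be bundled together. The following is only a sketch under my own assumptions: the struct string_pool type and the pool_add function are hypothetical names, and it uses memcpy rather than memccpy so that it also builds with pre-C23 compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_BYTES   80
#define POOL_STRINGS 10

/* Hypothetical container: one buffer, one pointer table, two counters. */
struct string_pool {
    char   buffer[POOL_BYTES];
    char  *str[POOL_STRINGS];
    size_t used_size;   /* bytes of buffer in use, terminators included */
    size_t count;       /* number of strings stored */
};

/* Copies s into the pool as a separate string. Returns a pointer to
   the stored copy, or NULL if the buffer or the pointer table is full. */
char *pool_add(struct string_pool *pool, const char *s)
{
    assert(pool != NULL);
    size_t size = strlen(s) + 1;

    if (pool->count >= POOL_STRINGS) return NULL;
    if (size > POOL_BYTES - pool->used_size) return NULL;

    char *dest = pool->buffer + pool->used_size;
    memcpy(dest, s, size);              /* includes the null terminator */
    pool->str[pool->count++] = dest;
    pool->used_size += size;
    return dest;
}
```

A pool declared as struct string_pool pool = {0}; starts out empty. One caveat of this layout is that the table points into the struct itself, so copying the struct by value would leave the copy's pointers referring to the original buffer.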

But hold on, why all of this fuss about adding null terminators... what does the initialization

char buffer[80] = {'o', 'n', 'e', '\0', 't', 'w', 'o', '\0'};

actually mean, more precisely? If we are attentive here, that's a buffer of 80 bytes but we only initialized 8 explicitly. C does actually guarantee that the rest of them are set to zeroes. In the current C17 standard 6.7.9 §21:

If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.

Brace-enclosed list meaning {}, "aggregate" being standardese for array or struct, and "same as objects that have static storage duration" referring to a previous part of the same chapter, C17 6.7.9 §10:

If an object that has static or thread storage duration is not initialized explicitly, then: /--/

  • if it has arithmetic type, it is initialized to (positive or unsigned) zero

In plain English, C guarantees that our initial 8 bytes of data are followed by 72 zeroes. So it turns out we needn't actually worry about copying the null terminator. Doing so explicitly is of course best practice, though, and relying on the zero-initialization would have been both sloppy and dangerous.
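
For anyone who wants to see this guarantee in action, a tiny check along these lines could be used. The helper name tail_is_zero is made up for this example:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if every byte from index `from`
   up to the end of the array is zero, otherwise 0. */
int tail_is_zero(const char *buffer, size_t size, size_t from)
{
    assert(from <= size);
    for (size_t i = from; i < size; i++) {
        if (buffer[i] != '\0') return 0;
    }
    return 1;
}
```

Called as tail_is_zero(buffer, sizeof(buffer), 8) right after the initialization shown above, this should return 1. It would stop being a reliable statement about the buffer once strings have been written and overwritten, which is part of why relying on the implicit zeroes is fragile.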
