
Q&A How can I manage multiple consecutive strings in a buffer (and add more later)?


posted 1y ago by Lundin‭ · edited 1y ago by Lundin‭

Answer

#2: Post edited by

Lundin‭ · 2024-04-30T11:57:47Z (about 1 year ago)



#1: Initial revision by

Lundin‭ · 2024-03-22T10:15:47Z (about 1 year ago)


When looking at this, we might pretty soon note that storing strings in the same buffer by using null terminators as separators is quite clunky. It blocks us from using handy functions like `strtok`, `bsearch` or `qsort`. And there's no obvious way to tell where all of it ends. To know where it ends, we have to keep track of the used size in bytes separately.

On the positive side, this sort of allocation is both fast and cache-friendly, so in raw performance it will easily beat anything based on a pointer table with `malloc`/`strdup`. Generally we should pick readability/maintainability over such micro-optimization considerations, however.

Most commonly, arrays of strings are accessed through a look-up table formed through a separate array of pointers, `char* str[n]`. That's a convenient, flexible format that enables `bsearch`/`qsort` on the pointer table itself. We could have these pointers point at dynamically allocated strings, at read-only string literals (in which case `const char*` should be used) or we could point them into this pre-allocated buffer.

With the pre-allocated buffer method, we can also start counting the used size at the same time as we initialize the pointers. Example:

```c
#include <stdio.h>
#include <string.h>

#define BUFFER_SIZE    80
#define MAX_STRINGS_N  10

int main(void)
{
  char buffer[BUFFER_SIZE] = {'o', 'n', 'e', '\0', 't', 'w', 'o', '\0'};
  size_t used_size = 0;
  size_t strings_allocated = 2;
  char* str[MAX_STRINGS_N];
  
  /* initialize the pointers */
  char* next = buffer;
  for(size_t i=0; i<strings_allocated; i++)
  {
    size_t next_size = strlen(next) + 1;
    used_size += next_size;
    str[i] = next;
    next += next_size;

    printf("%s (total size: %zu)\n", str[i], used_size);
  }
}
```

---

As for how to add new strings to this buffer, it kind of depends on where they are coming from. Strings taken as input from `stdin` or command line arguments ought to be sanitized before we use them in our program, but that's another story. Let us assume they are proper, sanitized C strings. Then we need not worry about using them and we then have some alternatives for copying them:

- The most obvious choice for copying a string is `strcpy`. This looks for the null terminator during copy so we need not know the size of the string in advance. It also adds a null terminator to the end of the copied string.
- But in this case we do want to know the size of the new string before we add it to the buffer. Otherwise we can't check for overflow. So we want to call `strlen` on the new string and check if there is room before we copy anything.
- Note: we need to copy the _size_ of the new string, not the _length_. Size meaning string length + 1 for the null terminator. The new string must be null terminated or it is not a C string. But if we copy the _size_ of the new string, that includes copying the null terminator.
- And once the size of a string is known, we may as well use `memcpy`, for an itty bit of a performance boost over `strcpy`, as the former doesn't check for null termination.
- With a new compiler, we can also use the new `memccpy` function from C23 ([What is C23 and why should I care?](https://software.codidact.com/posts/289414)). This can even be used on non-sanitized data as it comes with a fixed size as input but can be told to stop looking once it finds a null terminator.

Conclusion: any of `strcpy`, `memcpy` or `memccpy` is fine. In the example below I went with `memccpy` just because this is a new function in standard C and not everyone is familiar with it yet.

If we for whatever reason wished to copy raw unsanitized data, we could have used `strcpy_s` (part of the optional bounds-checking interfaces in C11 Annex K) or the non-standard `strlcpy`. These work much like `memccpy` (or the dangerous, obsolete `strncpy`) but explicitly add a null terminator to the end of the new string. See [Is strcpy dangerous and what should be used instead?](https://software.codidact.com/posts/281518)

---

```c
  /* add new strings (continues inside main; exit() needs #include <stdlib.h>) */
  char new_str[] = "three"; // a new string from somewhere
  size_t next_size = strlen(new_str)+1; // the loop's next_size is out of scope here

  if(strings_allocated >= MAX_STRINGS_N || used_size + next_size > BUFFER_SIZE)
  { /* some manner of error handling here */
    fprintf(stderr, "String buffer full.\n");
    exit(EXIT_FAILURE);
  }

  /* 
    Since next from the previous example is equivalent to &buffer[used_size], 
    either could be used here.
    Copy the string using any of the functions previously mentioned:
  */
  memccpy(next, new_str, '\0', next_size); 
  str[strings_allocated] = next;
  next += next_size; // keep next pointing at the first free byte
  used_size += next_size;
  printf("%s (total size: %zu)\n", str[strings_allocated], used_size);
  strings_allocated++;
```

---

But hold on, why all of this fuss regarding adding null terminators... what does the initialization

    char buffer[80] = {'o', 'n', 'e', '\0', 't', 'w', 'o', '\0'};

actually mean, more precisely? If we are attentive here, that's a buffer of 80 bytes but we only initialized 8 explicitly. C does actually guarantee that the rest of them are set to zeroes. In the current C17 standard 6.7.9 §21:

> If there are fewer initializers in a brace-enclosed list than there are elements or members of an
aggregate, or fewer characters in a string literal used to initialize an array of known size than there
are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as
objects that have static storage duration.

Brace-enclosed list meaning `{}`, "aggregate" being standardese for array or struct, and "same as objects that have static storage duration" referring to a previous part of the same chapter, C17 6.7.9 §10:

> If an object that has static or thread storage duration is not initialized explicitly, then:
> /--/  
> - if it has arithmetic type, it is initialized to (positive or unsigned) zero

In plain English, C guarantees that after our initial 8 bytes of data, there are 72 zeroes following. So we needn't actually worry about copying the null terminator, it turns out. Though doing so explicitly is of course best practice and relying on the zero-initialization would have been both sloppy and dangerous.
