array of arrays vs array of pointers to store array of string literals
Let's consider the following code:
const char a[][4] = {"aa", "aaa"};
const char *b[] = {"bb", "bbb"};
const char *const c[] = {"cc", "ccc"};
For shared libraries, both the b and c arrays require the pointer table to be filled in at load time (relocations applied by the dynamic linker), which implies performance costs.
See https://www.akkadia.org/drepper/dsohowto.pdf 2.4.3
But for a standalone program, does the same issue exist, or is the array generated at link time (ld(1))?
If the array for c can be read-only in a standalone program, it could be even better than a, since it doesn't waste an unnecessary padding byte on the "cc" row.
Although... since the array of pointers approach requires a pointer table separate from the strings, it might cost even more than that one extra byte:
Array of arrays:
aa\0\0aaa\0 // total 8 bytes
Array of pointers:
cc\0ccc\0ppppppppqqqqqqqq // total 23 bytes
p and q being pointers (64-bit) to the strings
So the strings would need to vary a lot in length (by more than 8 bytes on average relative to the longest string, with 64-bit pointers) for the pointer table to pay for itself. Unless I'm missing something.
Edit: This issue was raised in a patch submitted to NGINX Unit, which contains some real numbers coming from real code: https://github.com/nginx/unit/pull/721
Edit2: Experimentally, an array of pointers seems to be much worse than an array of arrays (see that link). It more or less confirms my expectation of a large pointer table being put in the (initialized) data section of the binary, which I expect to slow down startup.
The most relevant part of that link is the following:
$ git switch array_of_pointers
$ git clean -dffx
$ ./configure
$ make -j
$ size build/unitd
text data bss dec hex filename
374088 29640 1224 404952 62dd8 build/unitd
$ git switch array_of_arrays
$ git clean -dffx
$ ./configure
$ make -j
$ size build/unitd
text data bss dec hex filename
375266 29000 1224 405490 62ff2 build/unitd