array of arrays vs array of pointers to store array of string literals

Let's consider the following code:

const char a[][4] = {"aa", "aaa"};
const char *b[] = {"bb", "bbb"};
const char *const c[] = {"cc", "ccc"};

For shared libraries, both b and c need their pointers filled in at load time (each one requires a relocation by the dynamic linker), which has a performance cost.

See section 2.4.3 of https://www.akkadia.org/drepper/dsohowto.pdf

But for a standalone program, does the same issue exist, or is the array generated at link time (ld(1))?

If the array can be read-only for c in a standalone program, it could be even better than a, since it doesn't consume an unnecessary byte for "cc".

Although... since the array of pointers approach requires an array of pointers that is separate from the strings, it might use even more than an extra byte:

Array of arrays:

aa\0\0aaa\0                 // total 8 bytes

Array of pointers:

cc\0ccc\0ppppppppqqqqqqqq   // total 23 bytes

p and q being pointers (64-bit) to the strings

So the strings would have to differ in length a lot (on average, more than 8 bytes shorter than the longest string) for the array of pointers to make up for the extra pointer array. Unless I'm missing something.

Edit: This issue was raised in a patch submitted to NGINX Unit. It may help to look at it, since it contains real numbers from real code: https://github.com/nginx/unit/pull/721

Edit 2: Experimentally, an array of pointers seems to be much worse than an array of arrays (see that link). It more or less confirms my expectation that a large array gets put in the (initialized) data section of the binary. I expect that to slow down startup.

The most relevant part of that link is the following:

$ git switch array_of_pointers
$ git clean -dffx
$ ./configure
$ make -j
$ size build/unitd
   text	   data	    bss	    dec	    hex	filename
 374088	  29640	   1224	 404952	  62dd8	build/unitd
$ git switch array_of_arrays
$ git clean -dffx
$ ./configure
$ make -j
$ size build/unitd
   text	   data	    bss	    dec	    hex	filename
 375266	  29000	   1224	 405490	  62ff2	build/unitd
