Where to place digit separators in C23?
C23 introduces the digit separator ', which can be placed between the digits of an integer constant for the purpose of clarity and self-documenting code. Separators are otherwise ignored by the compiler when determining the value of the number.
However, the language standard provides no guidance on how to use digit separators sensibly. They were introduced to C by proposal N2626, which provides no guidance either. For example, it suggests that 2'3434'5323 might be clearer to read than 234345323, which I, as a frequent user of engineering notation, don't quite agree with. I believe the same feature was introduced in C++14, but with no guidance there either.
Are we to add ' at a whim, or are there any recommended practices to follow?
2 answers
This probably makes for a thoroughly unsatisfying answer, but there's probably far stronger cultural pressure than technical pressure. In European-derived cultures, we mostly group numbers by powers of one thousand as you suggest, and anybody doing something else should either have an extremely good reason ("it actually represents a series of decimal values, but we store them together, because we learned programming in 1967") or would get laughed out of any code review.
However, people in East Asian cultures group digits by powers of ten thousand in speech, even though they'll (usually) write numbers following European conventions. And India generally does pairwise separation, except for the final three digits. I assume that other approaches exist, but those are the ones commonly cited.
So, for power-of-two bases, it's probably safe to assume that some natural grouping (eight digits in binary, three in octal, and four in hexadecimal seem consistent in what I've seen over the years, though as another answer points out, architecture may easily figure in here) could and should become a strongly encouraged convention. For decimal representation, however, we'd want to take care that a convention doesn't ask billions of people to write code that's less readable for them.
Since this is all new, there might still be time to establish a consensus before this style feature too ends up "all over the place" (like upper/lower case hex, upper/lower case integer constant suffixes, etc.).
Luckily, we can lean on established computer science in this case: there are already best engineering practices for how to write numbers in various bases. If we follow those existing practices, we end up with something like this:
Decimal integer/floating point constants (base 10)
Since programming sorts under the domain of engineering, these should respect engineering notation, which means that decimal values are conveniently expressed in multiples of 10^3 or 10^-3. That is: tera, giga, mega, kilo, milli, micro, nano, pico and so on.
// appropriate style examples:
1'000'000
1'000'000.0
0.000'000
.000'000
// BAD style examples, do not use:
1'0000'0000
1'2'3
12'34'56
12.34'56'78
Binary constants (base 2)
Binary numbers are by convention always grouped either by nibbles or by bytes. Grouping them by any larger unit becomes unreadable. Grouping them as anything other than groups of 4 or 8 is senseless, except for cases where the number of bits is not divisible by 4. In that case, the remaining bits are placed to the left.
// appropriate style examples:
0b0000'0000'0000'0000
0b00000000'00000000
0b10'1010'1010
// BAD style examples, do not use:
0b00'00'00'00
0b0000000000000000'0000000000000000
0b1010'1010'10
Hexadecimal constants (base 16)
Hex might be grouped in several different ways. Sometimes it makes sense to group it at byte level, sometimes as 16-bit words. 32-bit words without digit separators are harder to read. Breaking up bytes into single nibbles doesn't make sense either. For numbers whose width isn't a multiple of the chosen group size, the remaining digits are placed to the left.
// appropriate style examples:
0x00'00'00'00
0x0000'0000
0xAA'BBCC
// BAD style examples, do not use:
0x0'0'0'0
0x0000000000000000'0000000000000000
0xAABB'CC