What is CPU endianness?
I was fooling around with the following C code on my trusty old x86 PC:
#include <stdint.h>
#include <stdio.h>

int main (void)
{
    uint32_t u32 = 0xAABBCCDD;
    uint8_t* ptr = (uint8_t*)&u32;

    for(size_t i=0; i<sizeof(uint32_t); i++)
    {
        printf("%.2X", ptr[i]);
    }
}
To my surprise, this prints DDCCBBAA, with all bytes backwards. Someone told me this was because of "endianness" and that my x86 is "little endian". What is the meaning of this?
1 answer
This goes back to the CPU architecture "wars" of the 1970s and 1980s between competitors such as Intel and Motorola (for example the Intel 8086 vs the Motorola 68000). For various reasons, CPUs from these two manufacturers ended up with different byte ordering relative to each other. Byte ordering refers to which byte of an integer (or floating point) "word" or "double word" is stored first in memory.
Given some address 0x0000 where a 32 bit integer variable is allocated, most Motorola CPUs would store the integer value 0xAABBCCDD as:
Big Endian
Address Byte
0x0000 AA
0x0001 BB
0x0002 CC
0x0003 DD
That is, the most significant data byte is stored at the lowest address. Intel did it the other way around:
Little Endian
Address Byte
0x0000 DD
0x0001 CC
0x0002 BB
0x0003 AA
All CPU manufacturers ended up subscribing either to the "Motorola camp" or the "Intel camp". One can easily come up with a rationale for either format: the Motorola one stores data in the order numbers are read in English, whereas the Intel one stores the least significant byte at the lowest address, making byte significance consistent with addresses.
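You can check which camp a given CPU belongs to at run time. A minimal sketch (just an illustration, not something specific to any one compiler or OS): store a known value and inspect the byte at the lowest address:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main (void)
{
    uint32_t u32 = 1;
    uint8_t first_byte;

    /* Copy the byte stored at the lowest address of u32. */
    memcpy(&first_byte, &u32, 1);

    /* Little Endian stores the least significant byte (0x01) first,
       Big Endian stores the most significant byte (0x00) first. */
    puts(first_byte == 1 ? "little endian" : "big endian");
}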
There was a lot of debate over which way was best, even though neither format is objectively better, and there was obviously a lot of prestige between the competitors involved. The computer scientist Danny Cohen therefore humorously compared the conflict (here) with the classic story Gulliver's Travels by Jonathan Swift. In that story, two factions of Lilliputians fight a fierce but pointless conflict over which end of a boiled egg should be cracked open first: the "big end" or the "little end".
From Cohen's article:
It is the question of which bit should travel first, the bit from the little end of the word, or the bit from the big end of the word?
This definition isn't easy to grasp technically, because its purpose is to make a joke. In the Motorola version, the "big end" of the word comes first (lowest address), so the Motorola style was named Big Endian and the Intel style Little Endian. The byte order of words is called endianness. What started as a joke has become formal terminology.
Practically, endianness applies to all larger data types: 2-, 4- or 8-byte integers and floating point numbers. It does not apply to single-byte data. Nor does it apply to text strings, which are almost always stored with the first letter at the lowest address, reflecting the left-to-right reading order.
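To illustrate, here is a small sketch contrasting a 4-character string with a 4-byte integer. The string prints the same on every CPU, while the integer's byte order depends on the host (on the Little Endian x86 from the question, the last line prints DDCCBBAA):

#include <stdint.h>
#include <stdio.h>

int main (void)
{
    const char str[] = "ABCD";      /* single-byte elements: no endianness */
    uint32_t u32 = 0xAABBCCDD;      /* multi-byte: endianness applies */
    const uint8_t* ptr = (const uint8_t*)&u32;

    printf("%s\n", str);            /* always ABCD, on any CPU */

    for(size_t i=0; i<sizeof(uint32_t); i++)
    {
        printf("%.2X", ptr[i]);     /* byte order depends on the host */
    }
    printf("\n");
}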
Addresses themselves also have an endianness when they are accessed as data: when stored in CPU index registers, used as C language pointers and so on.
Whenever we write code that relies on the byte order of an integer type, we must take endianness into account, or the code might not be portable. Similarly, various network protocols specify a network endianness to guarantee portability: every CPU communicating over a given standardized protocol must convert between its internal endianness and the endianness of the protocol.
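One common way to write such code portably is to avoid pointer casts and build the bytes with shifts instead. Shifts operate on values, not on memory, so the same code works on any host. A sketch (the helper names put_u32_be/get_u32_be are made up for this example):

#include <stdint.h>

/* Store a 32 bit value as Big Endian bytes, regardless of host endianness. */
void put_u32_be (uint8_t buf[4], uint32_t v)
{
    buf[0] = (uint8_t)(v >> 24);    /* most significant byte first */
    buf[1] = (uint8_t)(v >> 16);
    buf[2] = (uint8_t)(v >>  8);
    buf[3] = (uint8_t)(v);
}

/* Read the bytes back into a 32 bit value. */
uint32_t get_u32_be (const uint8_t buf[4])
{
    return ((uint32_t)buf[0] << 24) |
           ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] <<  8) |
            (uint32_t)buf[3];
}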
Many network protocols (for example TCP/IP) use Big Endian, in some cases for historical reasons: once upon a time, CRC checksums were often calculated in hardware using XOR gates, and to do so the data had to be clocked in left-to-right, most significant bit first.
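POSIX systems ship conversion helpers for exactly this purpose: htonl/ntohl ("host to network long" and back) from <arpa/inet.h>. On a Big Endian host they are no-ops; on a Little Endian host they swap the bytes. A minimal usage sketch, assuming a POSIX system:

#include <stdint.h>
#include <stdio.h>
#include <arpa/inet.h>  /* htonl/ntohl, POSIX rather than standard C */

int main (void)
{
    uint32_t host = 0xAABBCCDD;
    uint32_t net = htonl(host);     /* host order -> network (Big Endian) */
    uint8_t* ptr = (uint8_t*)&net;

    /* Prints AABBCCDD on any host, since network order is Big Endian. */
    for(size_t i=0; i<sizeof(uint32_t); i++)
    {
        printf("%.2X", ptr[i]);
    }
    printf("\n");

    /* ntohl converts back; the round trip is the identity. */
    printf("%s\n", ntohl(net) == host ? "round trip ok" : "mismatch");
}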