What is CPU endianness?

+6
−0

I was fooling around with the following C code on my trusty old x86 PC:

#include <stdint.h>
#include <stdio.h>

int main (void)
{
  uint32_t u32 = 0xAABBCCDD;
  uint8_t* ptr = (uint8_t*)&u32;
  for(size_t i=0; i<sizeof(uint32_t); i++)
  {
    printf("%.2X", ptr[i]);
  }
}

To my surprise, this prints DDCCBBAA with all bytes backwards. Someone told me this was because of "endianness" and that my x86 is "little endian". What is the meaning of this?


7 comments

The code snippet in the question violates the strict aliasing rule, and is undefined behaviour. :-( The safe C way to type-pun is to use a union. I'll post a suggested edit with this (feel free to reformat/edit to taste). Chris Jester-Young‭ about 1 month ago

@Chris Jester-Young‭ No, that's wrong. There is a special rule allowing us to inspect any type in C by using a character type (uint8_t is always a character type if supported). C17 6.3.2.3/7. "When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object." Lundin‭ 30 days ago

@Chris Jester-Young‭ Furthermore, if you read the actual "strict aliasing rule", it has an explicit exception for lvalue access through a character type. C17 6.5/7: "An object shall have its stored value accessed only by an lvalue expression that has one of the following types: ... - a character type." Lundin‭ 30 days ago

Anyone who regularly crosses the line between C and C++ should keep in mind that C++'s strict aliasing rule is stricter than C's. Punning in place is possible in a couple of ways in C and essentially forbidden in C++. Drives me bats because the committee could have made an exception for POD types, but there you have it. dmckee‭ 29 days ago

@dmckee you can use attributes to make specific types exempt from strict aliasing; also note that MSVC has no strict aliasing. Lundin, would you add this attribute to your post behind a "not MSVC" guard? Or mention -fno-strict-aliasing? I've found that very few people know about type punning, and I've found some real messes over the years. jrh‭ 24 days ago


1 answer

+9
−0

This goes back to the CPU architecture "wars" of the 1970s and 1980s between the competitors Intel and Motorola (for example Intel 8086 vs Motorola 68000). For various reasons, CPUs from these two manufacturers ended up with opposite byte ordering. Byte ordering refers to which byte of an integer (or float) "word" or "double word" is stored first in memory, that is, at the lowest address.

Given some address 0x0000 where a 32 bit integer variable is allocated, most Motorola CPUs would store the integer value 0xAABBCCDD as:

Big Endian

Address  Byte
0x0000   AA
0x0001   BB
0x0002   CC
0x0003   DD

That is, the most significant byte is stored at the lowest address. Intel did it the other way around:

Little Endian

Address  Byte
0x0000   DD
0x0001   CC
0x0002   BB
0x0003   AA

Most other CPU manufacturers ended up subscribing to either the "Motorola camp" or the "Intel camp". One can easily come up with a rationale for either format: the Motorola one stores data in the order numbers are read in English, whereas the Intel one stores the least significant byte at the lowest address, making byte significance consistent with addresses.
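
The two layouts in the tables above can be observed at run time. The following is a minimal sketch, not a definitive implementation; the function names `is_little_endian` and `bytes_in_memory` are hypothetical helpers made up for this illustration, and the sketch assumes the CPU uses one of the two layouts shown above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the byte at the lowest address
   of a multi-byte integer is its least significant byte. */
int is_little_endian(void)
{
  const uint32_t probe = 1;        /* only the least significant byte is non-zero */
  uint8_t first_byte;
  memcpy(&first_byte, &probe, 1);  /* copy the byte at the lowest address */
  return first_byte == 0x01;
}

/* Hypothetical helper: copies the bytes of value into out[0..3]
   in the order they are stored in memory, lowest address first. */
void bytes_in_memory(uint32_t value, uint8_t out[4])
{
  memcpy(out, &value, sizeof value);
}
```

Calling `bytes_in_memory(0xAABBCCDD, buf)` would be expected to fill `buf` with DD CC BB AA on a Little Endian CPU and AA BB CC DD on a Big Endian one, matching the two tables.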

There was a lot of debate over which way was best, even though the answer isn't obvious. A lot of prestige between the competitors was obviously involved. The computer scientist Danny Cohen therefore humorously compared the conflict, in his article "On Holy Wars and a Plea for Peace", with the classic story Gulliver's Travels by Jonathan Swift. In that story, two factions of the Lilliputians are fighting a fierce but pointless conflict over which end of a boiled egg should be cracked open first: the "big end" or the "little end".

From Cohen's article:

It is the question of which bit should travel first, the bit from the little end of the word, or the bit from the big end of the word?

This definition isn't easy to grasp technically, because its purpose is to make a joke. In the Motorola version, the "big end of the word" comes first (lowest address), so the Motorola style was named Big Endian and the Intel style Little Endian. The byte order of words is called endianness. What started as a joke has become formal terminology.

Practically, endianness applies to all larger data types: 2, 4 or 8 byte integers or floating point numbers. It does not apply to single-byte data. Nor does it apply to text strings of single-byte characters, which are almost always stored with the first letter at the lowest address, to reflect the left-to-right reading order.

Similarly, data addresses in a computer also have an endianness if accessed as variables: when stored in CPU index registers, used as C language pointers and so on.
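
As a small illustration of that difference, the sketch below compares a 2 byte integer with a short string. The names `u16_bytes` and `string_is_in_reading_order` are hypothetical, chosen only for this example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies the in-memory bytes of a 16 bit integer
   to out[0..1], lowest address first. The result depends on the CPU. */
void u16_bytes(uint16_t value, uint8_t out[2])
{
  memcpy(out, &value, sizeof value);
}

/* Hypothetical helper: returns 1 if the characters of "ABCD" sit at
   ascending addresses. Array elements are always laid out this way,
   so the result does not depend on the CPU's endianness. */
int string_is_in_reading_order(void)
{
  const char text[] = "ABCD";
  const uint8_t* p = (const uint8_t*)text;
  return p[0] == 'A' && p[1] == 'B' && p[2] == 'C' && p[3] == 'D';
}
```

For the value 0x1234, `u16_bytes` would be expected to give 34 12 on a Little Endian CPU and 12 34 on a Big Endian one, while the string check gives the same answer on both.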

Whenever we write code that relies on the byte order of an integer type, we must take endianness into account or such code might not be portable. Similarly, various network protocols specify a network endianness to guarantee portability. All CPUs communicating over a certain standardized protocol must convert between their internal endianness and the endianness of the protocol.

Many network protocols (for example TCP/IP) follow Big Endian, which in some cases could be for historical reasons: once upon a time CRC checksums were often calculated in hardware using logical XOR gates, and in order to do so, you must clock in the data left-to-right.
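
One common way to write such a conversion portably is to build the byte sequence with shifts, which operate on the value rather than on its memory layout and therefore give the same result on any CPU. This is only a sketch of that idea; `store_be32` and `load_be32` are hypothetical names, not functions from any particular protocol library.

```c
#include <stdint.h>

/* Hypothetical helper: writes value to out[0..3] in Big Endian order,
   most significant byte first, regardless of the CPU's own endianness. */
void store_be32(uint8_t out[4], uint32_t value)
{
  out[0] = (uint8_t)(value >> 24);
  out[1] = (uint8_t)(value >> 16);
  out[2] = (uint8_t)(value >> 8);
  out[3] = (uint8_t)(value);
}

/* Hypothetical helper: reads a Big Endian 32 bit value from in[0..3]. */
uint32_t load_be32(const uint8_t in[4])
{
  return ((uint32_t)in[0] << 24) |
         ((uint32_t)in[1] << 16) |
         ((uint32_t)in[2] << 8)  |
          (uint32_t)in[3];
}
```

Because the code never inspects the integer through a byte pointer, it needs no knowledge of the host's endianness: storing 0xAABBCCDD always produces AA BB CC DD in the buffer, and loading that buffer always gives back 0xAABBCCDD.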


0 comments
