What is undefined behavior and how does it work?

I have created this sensational program:

#include <stdio.h>

int* func (void)
{
  int local=5;
  return &local;
}

int main (void)
{
  printf("%d\n", *func());
}

This prints 5 even though I'm returning a pointer to a local variable. It did not produce an error!

And then I made another sensational program:

#include <stdio.h>

int main (void)
{
  int arr[5] = {1,2,3,4,5};
  for(int i=0; i<10; i++)
  {
    printf("%d ", arr[i]);
  }
}

It did not produce an error! But it prints 1 2 3 4 5 and then some other stuff.

I was told that these cases were examples of "undefined behavior" and that we aren't allowed to write code like that.

But it works! The compiler did not protest, the OS didn't give me a "segmentation fault" and nothing bad happened. Isn't some sort of error supposed to happen when we do things like these?

What is this "undefined behavior" and who is responsible for taking care of it?

1 answer

Undefined behavior (informally "UB") is a formal term in the C language, defined in C17 3.4.3:

undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements

This means that anything can happen when you invoke undefined behavior: no guarantees, no standardization, no deterministic behavior. Some examples of what might happen when you invoke it include:

  • The program crashes, misbehaves or gives strange results, or
  • Well-defined, non-standard compiler extensions handle the case gracefully, or
  • The program seems to work fine and deterministically, but might crash or behave erratically:
    • when you run it again, or
    • after it has been up and running fine for ages, or
    • when you provide different compiler options, or
    • when you enable optimization/release build, or
    • after code maintenance of completely unrelated parts elsewhere in the program, or
    • when you port it to a different compiler or system, or
    • after a compiler or system update.
  • Or something else entirely, including an old boring meme about demons flying out of the programmer's nose.

This means that undefined behavior is always to be regarded as a bug. A particularly nasty one, since it might be dormant in a program that seems to work just fine, then suddenly pop up and crash things later.

It is important to investigate the cause of the undefined behavior - not so much the symptoms. Beginners often ask why a certain behavior occurred when they did something undefined, but there is often not much to learn from investigating the outcome. Time is better spent on learning what caused the undefined behavior.

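The C standard has a summary list of the various forms of undefined behavior (Annex J.2) that is roughly 10 pages long. Then on top of that, anything that goes beyond the scope of the language, meaning anything for which the standard gives no explicit definition of behavior, is also regarded as undefined behavior.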

Most forms of undefined behavior are run-time bugs and it is generally not the compiler's job to find them. Some compilers are nice enough to point out various common forms of undefined behavior when they can be spotted at compile-time, but compilers aren't required to do so.

So mostly this burden falls upon the human C programmer, who must know what they are doing and what they shouldn't be doing. The system is not "required" to produce segmentation faults, access violations etc.
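
As a rough sketch of what taking on that burden could look like for the two programs in the question, the versions below avoid the undefined behavior. The first returns the value instead of a pointer to a local variable, and the second takes the real element count from the caller. The names `get_value`, `sum_array` and `ARRAY_LEN` are made up for illustration and are not from the question.

```c
#include <stddef.h>

/* Number of elements in a true array (not a pointer). Hypothetical helper. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Return the value itself, so no pointer to a dead local escapes. */
int get_value (void)
{
  int local=5;
  return local;
}

/* Visit exactly n elements; the caller supplies the real length. */
int sum_array (const int* arr, size_t n)
{
  int sum=0;
  for(size_t i=0; i<n; i++)
  {
    sum += arr[i];
  }
  return sum;
}
```

A caller would pass the length along with the array, for example `sum_array(arr, ARRAY_LEN(arr))` for `int arr[5]`. Run-time checkers such as the `-fsanitize=address,undefined` option in GCC and Clang can catch many of these bugs while testing, but they are diagnostic aids and not a guarantee.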

In some cases compilers may provide well-defined non-standard extensions. Take for example the case of signed integer overflow - this is formally undefined behavior, but pretty much every existing CPU in the real world has instruction support for this at the assembler level. The CPU will simply wrap around according to 2's complement and set an overflow flag. Since that functionality is available in the hardware, it doesn't make much sense for the compiler to run away screaming just because the C standard says that something is undefined behavior. Might as well use the deterministic functionality available, although in this example the guarantee comes from the CPU instruction set, not from the C standard or C compiler.

But still, we should not write programs relying on undefined behavior. Some compilers might decide to make aggressive optimizations under the assumption that undefined behavior will never occur. They might for example simply remove instructions from the code.
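
To make that concrete, here is a minimal sketch. A check such as `if (a + 1 < a)` tries to detect signed overflow after it has already happened. Since that overflow is undefined, an optimizing compiler is permitted to assume the condition is never true and drop the check entirely. The hypothetical helper `add_overflows` below tests the operands before adding, so the overflowing addition is never performed.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true if a + b would overflow int, without ever computing a + b. */
bool add_overflows (int a, int b)
{
  if(b > 0)
  {
    return a > INT_MAX - b; /* INT_MAX - b cannot overflow when b > 0 */
  }
  if(b < 0)
  {
    return a < INT_MIN - b; /* INT_MIN - b cannot overflow when b < 0 */
  }
  return false; /* adding zero never overflows */
}
```

Because every intermediate result stays inside the range of `int`, the function does not depend on how a particular compiler or CPU treats overflow.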


Note that undefined behavior does not mean "compiler-specific". There are also two other formal terms for other kinds of poorly-defined behavior: unspecified behavior and implementation-defined behavior. The differences between these are:

  • Undefined behavior means anything can happen, including program crashes. There might not be any reliable behavior by any compiler.

    Examples: accessing a variable after its lifetime has ended (such as through a returned pointer to a local variable), accessing an array out of bounds, signed integer overflow.

  • Unspecified behavior means that C allows multiple implementations of the same thing and a compiler need not document how it does those things. Or that an operation might produce any random value. Unspecified behavior need not be consistent and should not be relied upon by the program, but will not cause crashes etc.

    Examples: order of evaluation of operands or function arguments, the value of struct padding bytes/bits, function inlining.

  • Implementation-defined behavior means compiler-specific behavior, which unlike unspecified behavior must be documented by the compiler. These are cases where a language allows multiple different implementations by different compilers, but requires the behavior to be deterministic and documented.

    Examples: the size of the various integer types, which signedness format is used, byte ordering (endianness).
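
The last two categories can be illustrated with a small sketch, which is an illustration only and not taken from the standard. The function names `next`, `difference` and `is_little_endian` are made up for the example. `difference` shows unspecified behavior: the two calls may run in either order, so two results are possible, but the program cannot crash because of it. `is_little_endian` shows implementation-defined behavior: the answer depends on the system, but it is the same every time on that system.

```c
static int counter = 0;

/* Each call returns the next number: 1, 2, 3 and so on. */
int next (void)
{
  return ++counter;
}

/* Unspecified: the two calls to next() may be evaluated in either order,
   so the result is either 1 - 2 or 2 - 1. Both outcomes are valid. */
int difference (void)
{
  return next() - next();
}

/* Implementation-defined: returns 1 if the least significant byte is stored
   first (little endian), otherwise 0. Assumes sizeof(unsigned int) > 1. */
int is_little_endian (void)
{
  unsigned int x = 1;
  return *(unsigned char*)&x == 1;
}
```

A program that only needs to know that `difference()` returned something is fine. A program that relies on it returning -1 specifically is depending on unspecified behavior and may break with another compiler or optimization level.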
