
Comments on What is undefined behavior and how does it work?

Parent

What is undefined behavior and how does it work?

+11
−0

I have created this sensational program:

#include <stdio.h>

int* func (void)
{
  int local=5;
  return &local;
}

int main (void)
{
  printf("%d\n", *func());
}

This prints 5 even though I'm returning a pointer to a local variable. It did not produce an error!

And then I made another sensational program:

#include <stdio.h>

int main (void)
{
  int arr[5] = {1,2,3,4,5};
  for(int i=0; i<10; i++)
  {
    printf("%d ", arr[i]);
  }
}

It did not produce an error! But it prints 1 2 3 4 5 and then some other stuff.

I was told that these cases were examples of "undefined behavior" and that we aren't allowed to write code like that.

But it works! The compiler did not protest, the OS didn't give me a "segmentation fault" and nothing bad happened. Isn't some sort of error supposed to happen when we do things like these?

What is this "undefined behavior" and who is responsible for taking care of it?


Post
+13
−0

Undefined behavior (informally "UB") is a formal term in the C language, defined in C17 3.4.3:

undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements

This means that anything can happen when you invoke undefined behavior: there are no guarantees, no standardization and no deterministic behavior. Some examples of what might happen include:

  • The program crashes, misbehaves or gives strange results, or
  • Well-defined, non-standard compiler extensions handle the case gracefully, or
  • The program seems to work fine and deterministically, but might crash or behave in random ways:
    • when you run it again, or
    • after it has been up and running fine for ages, or
    • when you provide different compiler options, or
    • when you enable optimization/release build, or
    • after code maintenance of completely unrelated parts elsewhere in the program, or
    • when you port it to a different compiler or system, or
    • after a compiler or system update.
  • Or something else entirely, including an old boring meme about demons flying out of the programmer's nose.

This means that undefined behavior is always to be regarded as a bug. A particularly nasty one, since it might be dormant in a program that seems to work just fine, then suddenly pop up and crash things later.

It is important to investigate the cause of the undefined behavior - not so much the symptoms. Beginners often ask why a certain behavior occurred when they did something undefined, but there is often not much to learn from investigating the outcome. Time is better spent on learning what caused the undefined behavior.

The C standard contains a summary list of the various forms of undefined behavior that runs to roughly 10 pages (Annex J.2). On top of that, any construct for which the standard gives no explicit definition of behavior is also regarded as undefined behavior.

Most forms of undefined behavior are run-time bugs and it is generally not the compiler's job to find them. Some compilers are nice enough to point out various common forms of undefined behavior when they can be spotted at compile-time, but compilers aren't required to do so.

So mostly this burden falls upon the human C programmer, who must know what they are doing and what they shouldn't be doing. The system is not "required" to produce segmentation faults, access violations etc.
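As a rough illustration of what "knowing what you shouldn't be doing" looks like for the two programs in the question, here is a minimal sketch of well-defined alternatives. The names `get_value`, `sum_array`, `demo` and `ARRAY_LEN` are made up for this example and are not part of the original code. One fix writes the result into storage owned by the caller instead of returning the address of a local; the other bounds the loop by the real element count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Number of elements in a true array (does not work on a pointer). */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Instead of returning the address of a local variable, write the result
   into storage owned by the caller, which outlives this call. */
void get_value(int *out)
{
  assert(out != NULL);
  *out = 5;
}

/* Sum exactly n elements. The caller passes the real element count,
   so the loop never reads past the end of the array. */
int sum_array(const int *arr, size_t n)
{
  assert(arr != NULL);
  int sum = 0;
  for (size_t i = 0; i < n; i++)
  {
    sum += arr[i];
  }
  return sum;
}

/* Hypothetical driver showing both fixes together. */
int demo(void)
{
  int value;
  get_value(&value);

  int arr[5] = {1, 2, 3, 4, 5};
  int total = sum_array(arr, ARRAY_LEN(arr));

  printf("%d %d\n", value, total);
  return value + total;
}
```

Calling `demo()` from `main` prints `5 15`. Compilers such as GCC and Clang can often flag the original mistakes when given options like `-Wall -Wextra` or when built with `-fsanitize=address,undefined`, but as noted above, nothing obliges them to.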

In some cases compilers may provide well-defined non-standard extensions. Take for example the case of signed integer overflow - this is formally undefined behavior, but pretty much every existing CPU in the real world has instruction support for this at the assembler level. The CPU will simply wrap around according to 2's complement and set an overflow flag. Since that functionality is available in the hardware, it doesn't make much sense for the compiler to run away screaming just because the C standard says that something is undefined behavior. Might as well use the deterministic functionality available, although in this example the guarantee comes from the CPU instruction set, not from the C standard or C compiler.

But still, we should not write programs relying on undefined behavior. Some compilers might decide to make aggressive optimizations under the assumption that undefined behavior will never occur. They might for example simply remove instructions from the code.
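To make the optimization point concrete, here is a small sketch using signed addition. The helper name `checked_add` is invented for this example. The commented-out pattern is the kind of overflow test that a compiler is allowed to delete, because it can only be true after undefined behavior has already happened. The helper instead checks the range before adding, so the overflow never occurs.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Broken pattern, shown only as a comment: since signed overflow is
   undefined, a compiler may assume x + 1 never overflows, treat the
   condition as always false and remove the handling code entirely.

     if (x + 1 < x) { handle_overflow(); }
*/

/* Hypothetical helper: adds a and b only if the result fits in an int.
   The range is checked BEFORE the addition, so no overflow ever happens. */
bool checked_add(int a, int b, int *result)
{
  assert(result != NULL);
  if ((b > 0 && a > INT_MAX - b) ||
      (b < 0 && a < INT_MIN - b))
  {
    return false; /* would overflow: report it instead of computing it */
  }
  *result = a + b;
  return true;
}
```

A caller would test the return value and only use `*result` when it is `true`. Whether a particular compiler actually removes the broken check depends on the compiler and optimization level, which is exactly why it cannot be relied upon.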


Note that undefined behavior does not mean "compiler-specific". There are also two other formal terms for other kinds of poorly-defined behavior: unspecified behavior and implementation-defined behavior. The differences between these are:

  • Undefined behavior means anything can happen, including program crashes. There might not be any reliable behavior by any compiler.

    Examples: accessing variables that have gone out of scope, accessing an array out of bounds, signed integer overflow.

  • Unspecified behavior means that C allows multiple implementations of the same thing and a compiler need not document how it does those things, or that an operation might produce any random value. Unspecified behavior need not be consistent and should not be relied upon by the program, but will not cause crashes etc.

    Examples: order of evaluation of operands or function arguments, the value of struct padding bytes/bits, function inlining.

  • Implementation-defined behavior means compiler-specific behavior, which unlike unspecified behavior must be documented by the compiler. These are cases where a language allows multiple different implementations by different compilers, but requires the behavior to be deterministic and documented.

    Examples: the size of the various integer types, which signedness format is used, byte ordering (endianness).
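The last two categories in the list above can be sketched in code as well. This is only an illustration, and every function name in it (`next_id`, `make_pair`, `is_little_endian`, `print_platform_facts`) is made up for the example. The first part shows an unspecified evaluation order and one way to stop depending on it. The second part queries implementation-defined properties rather than assuming them.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Unspecified behavior: in a call like f(next_id(), next_id()) the two
   argument expressions may be evaluated in either order, so f could
   receive (1, 2) or (2, 1). Neither outcome is a bug in the compiler. */
static int counter = 0;

int next_id(void)
{
  return ++counter;
}

/* Putting the calls in separate statements fixes their order. */
void make_pair(int *first, int *second)
{
  assert(first != NULL && second != NULL);
  *first = next_id();  /* always evaluated first */
  *second = next_id(); /* always evaluated second */
}

/* Implementation-defined behavior: byte order. Inspecting an object's
   bytes through unsigned char is permitted for any object. */
bool is_little_endian(void)
{
  const uint16_t probe = 0x0102;
  const unsigned char *bytes = (const unsigned char *)&probe;
  return bytes[0] == 0x02;
}

/* Implementation-defined behavior: type sizes. Ask the implementation
   instead of hard-coding an assumption. */
void print_platform_facts(void)
{
  printf("CHAR_BIT     = %d\n", CHAR_BIT);
  printf("sizeof(int)  = %zu\n", sizeof(int));
  printf("sizeof(long) = %zu\n", sizeof(long));
  printf("byte order   = %s\n", is_little_endian() ? "little" : "big");
}
```

The output of `print_platform_facts()` differs between systems, but on any one implementation it is stable and documented, which is what separates it from the undefined cases. When a program needs an exact width, the fixed-width types from `<stdint.h>` such as `int32_t` avoid the question altogether.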


General comments
Derek Elkins wrote over 3 years ago

The K Framework project, a system for specifying operational semantics for programming languages, made a semantics for C that would get stuck when undefined behavior was encountered. The implementation is here: https://github.com/kframework/c-semantics and it's described in the paper Defining the Undefinedness of C. In other words, it is a C compiler that produces an executable which stops and reports an error if the program invokes undefined behavior.

Incnis Mrsi wrote almost 3 years ago

What is a “signedness format”? You might assume formats of signed integer types.

Lundin wrote almost 3 years ago

@Incnis Mrsi It means either 2's complement, 1's complement or signed magnitude. C unfortunately allows all of these still. In practice, the vast majority of all real-world computers use 2's complement.