
Why can't we mix increment operators like i++ with other operators?


I'm experimenting with different operators and have a hard time understanding the outcome of certain expressions. I try to combine the ++ operators with other operators such as assignment in the same expression. But I get mighty strange results when I use the same variable more than once. One example:

int i=0;
i=i++;
printf("%d\n",i);

This prints 0 on gcc and clang compilers, but 1 when I use icc. And so on, different behavior with different compilers.

clang tells me:

warning: multiple unsequenced modifications to 'i' [-Wunsequenced]

gcc -Wall tells me:

warning: operation on 'i' may be undefined [-Wsequence-point]

I don't understand these warnings, what's the meaning of "unsequenced" and "sequence points"?


Various other attempts give similarly strange and unpredictable results, with the behavior changing when I switch compilers. Some other examples that fail to behave deterministically:

  • i = i++ + ++i;
  • i = array[i++];
  • func(i, i++);
  • *ptr++ = *ptr++;

What is the reason behind all my problems with expressions like the ones above? I thought that operator precedence guaranteed a certain order of execution?


1 answer


These examples have undefined behavior and unspecified behavior all at once!

Operator precedence has nothing to do with the order of execution, see What is the difference between operator precedence and order of evaluation? From that post we can also learn that most of the above examples have unspecified behavior, since they rely on a certain order of evaluation. This is true even for the assignment operator, which only guarantees that the value storage happens after evaluation of left and right operands, but not in which order the operands are evaluated.

But aside from the unspecified order of evaluation, these examples have an even more fundamental problem: they also have undefined behavior, which I will address further below.


Side effects and sequence points

There are two formal C standard terms we need to understand to get to the bottom of this: side effect and sequence point.

  • Side effects are defined in the C standard as (C17 5.1.2.3):

    Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression in general includes both value computations and initiation of side effects.

    In plain English: updating the value of a variable is a side effect.

  • Sequence points are best described in an older version of the standard (C99 5.1.2.3):

    At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place.

    So sequence points are places where the compiler must be done with all previous execution. The most obvious example of a sequence point is the ; semicolon that ends an expression statement.

This means that when we have code like this:

something();
i = i++ + ++i;

then the two ; are the sequence points. Nothing between them has a well-defined execution order: the operands are unsequenced. This is a problem if we place one or more side effects there, because the compiler then has no required order in which to read and update each variable. And this (horrible) code has no fewer than three side effects: the update of i as a result of the assignment, the i++ and the ++i.


Undefined behavior and sequence points

The C standard states that all such expressions are undefined behavior, see What is undefined behavior and how does it work? This means the result is not merely compiler-specific: there are no deterministic or guaranteed results at all, and the code may hide a subtle but severe bug. The C99 version of the standard expresses it in the most readable way (C99 6.5/2):

Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored. 73)

(Later standards use the much more confusing wording "sequenced before" and "sequenced after", but the meaning is the same.)

From the above quote we can learn that it is also undefined behavior to use the same variable several times in an expression for unrelated purposes when one of those uses modifies it. Note 73 gives some examples:

73) This paragraph renders undefined statement expressions such as

i = ++i + 1;
a[i++] = i;

while allowing

i = i + 1;
a[i] = i;

In the latter, well-defined cases, the variable i occurs several times in the same expression, but each object is modified at most once, and wherever i itself is modified, its old value is read only to calculate the new value to store.


How to fix all these expressions, so that they are well-defined, portable and safe

The best and strictest rule of thumb (recommended by yours truly) is to never mix ++ or -- operators with other operators at all, ever. If we stick to that rule, we also don't need to ponder the difference between ++i and i++, since prefix vs postfix only matters when those operators are mixed with other operators.

A slightly more lenient rule could be to follow the MISRA-C guidelines for the use of C in critical systems (an industry standard, see MISRA-C), which have this rule (MISRA-C:2012 Rule 13.3):

A full expression containing an increment (++) or decrement (--) operator should have no other potential side effects other than that caused by the increment or decrement operator.

This still allows us to mix these operators with others, as long as we make sure that no other side effects are present, so it is quite a reasonable rule.

In addition, we should never write code that relies on the unspecified order of evaluation. Again, MISRA-C provides a sensible rule (MISRA-C:2012 Rule 13.2):

The value of an expression and its persistent side effects shall be the same under all permitted evaluation orders

This means that writing code containing unspecified order of evaluation is fine, as long as the results don't depend on that order and we get the same results regardless of compiler.

1 comment

This was about C, but C++ behaves the same way up to and including C++14. In C++17 and later, as well as in all versions of Java, the assignment operator has well-defined sequencing, making i=i++; etc. well-defined. But C++17 and later still do not allow completely wild stuff like i++ + ++i. Lundin 24 days ago
