What are statements and expressions?
When I read technical explanations of the syntax rules for programming languages, or try to decipher error messages, I often encounter the terms expression and statement. The two seem to be related to each other somehow.
I understand that these terms have something to do with the actual code written in a programming language - not, for example, special sorts of values calculated by the program when it runs - right? But what do they mean exactly? How can I use these concepts to improve my understanding of a programming language?
4 answers
In computer programming, an expression is something that yields a value.
A statement performs an action.
For example, let us look at some pseudocode. Let's assume that we want to calculate the sum of 3 variables:
sum = a + b + c;
print(sum);
The line print(sum); is a statement: it performs an action.
The part a + b + c is an expression: it yields a value.
Now you may be wondering: is sum = a + b + c a statement, or an expression?
The answer is that it's a statement, but it contains an expression. The expression a + b + c yields a value, and then an action is taken: the value is assigned to a variable.
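If you want to check this split for a real language, Python's standard ast module is handy: it parses source code into a tree in which every node is either a statement (a subclass of ast.stmt) or an expression (a subclass of ast.expr). The sketch below assumes Python syntax rather than the pseudocode above (the two agree for this line, minus the semicolon), and it only parses the code, so a, b and c never need values.

```python
import ast

# Parse (but do not run) the line from the example; a, b and c need not exist.
tree = ast.parse("sum = a + b + c")
stmt = tree.body[0]

# The whole line is a statement node...
print(type(stmt).__name__, isinstance(stmt, ast.stmt))              # Assign True
# ...and its right-hand side is an expression node nested inside it.
print(type(stmt.value).__name__, isinstance(stmt.value, ast.expr))  # BinOp True
```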
In this example, we have an arithmetic expression. But most operations on strings and booleans are also expressions!
For example, we could have a conditional:
if (a > 3 && p == 5) { ... }
In this conditional statement, the part a > 3 && p == 5 is an expression: a boolean expression.
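Another way to feel the difference, again using Python as a stand-in for the pseudocode (so && is spelled and): the built-in eval() accepts only a single expression and returns its value, while the whole if construct is a statement that eval() rejects and that exec() can only run for its effect. The values a = 4 and p = 5 are made up for the illustration.

```python
# Hypothetical values for the variables used in the condition.
env = {"a": 4, "p": 5}

# The condition on its own is an expression: eval() hands back its value.
value = eval("a > 3 and p == 5", env)
print(value)  # True

# The whole if construct is a statement, so eval() refuses it...
try:
    eval("if a > 3 and p == 5: result = 'yes'", env)
    eval_accepted_statement = True
except SyntaxError:
    eval_accepted_statement = False
print(eval_accepted_statement)  # False

# ...while exec() runs it for its effect and itself returns None.
outcome = exec("if a > 3 and p == 5: result = 'yes'", env)
print(outcome, env["result"])  # None yes
```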
Or, we might be concatenating strings:
fullName = firstName + " " + lastName;
In this line of code, firstName + " " + lastName is a string expression. It yields a value. The line as a whole is a statement: it evaluates an expression and stores the result in a new variable.
In general, expressions occur inside statements. An expression yields a value, but after you have your value, you'll want to do something with it - store it somewhere, or output it, or send it as an argument to another function.
Statements and expressions are two syntactic categories used by many programming languages. Since they are syntactic, they depend on the programming language's syntax. In a real sense, a statement is "anything that can be used as a statement" and an expression is "anything that can be used as an expression".
Let me explain what I mean by this.
Expressions
In most programming languages, the idea that expressions evaluate to a value and statements are actions does make intuitive sense. However, which things evaluate to values, or even what values are, can be quite complicated and language-specific.
For example, consider void functions in C, that is, functions that do not return a value. Intuitively, a function that returns void produces no value. However, this is only half the story. Take a function declared as void foo();. When we write the statement foo();, this is actually an expression statement that contains the expression foo(), even though this expression, semantically, has type void and no real value that it evaluates to.
This is why, when we talk about expressions or statements, the real answer is just "whatever can be used as an expression, according to the programming language's grammar". And of course, what exactly counts as an expression depends on the language. C defines a function call as an expression, regardless of the function's return type.
This doesn't have to be the case: Free Pascal, for instance, distinguishes "functions", which return values, from "procedures", which don't. In Free Pascal, function calls are expressions, while procedure calls are statements.[1]
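Python treats calls the same way C does, with one twist. A call to a function that returns nothing is still an expression, and on a line by itself it is wrapped in an expression statement (ast.Expr in Python's ast module, roughly the counterpart of C's expression statement); evaluating it yields Python's placeholder value None rather than no value at all. The function foo here is a hypothetical stand-in for the void foo(); above.

```python
import ast

def foo():
    """Hypothetical function with no return statement, Python's closest analog of a void function."""

# On a line of its own, the call parses as an expression statement wrapping a Call expression.
line = ast.parse("foo()").body[0]
print(type(line).__name__, type(line.value).__name__)  # Expr Call

# Evaluating the call still produces a value: None.
result = foo()
print(result)  # None
```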
As a better-known example, Python differs from most mainstream C-family languages in that it has both assignment statements using = and assignment expressions using :=.
C
int a;
int b = (a = 1);
Python (SyntaxError)
a = 0
b = (a = 1)
Python
a = 0
b = (a := 1)
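The ast module can confirm what the two Python snippets above show (Python 3.8 or later is assumed, since := does not exist before that): := produces a NamedExpr expression node that may sit inside an ordinary assignment statement, whereas = inside the parentheses is rejected by the parser.

```python
import ast

# Assignment expression (:=) nested inside an assignment statement (=).
node = ast.parse("b = (a := 1)").body[0]
print(type(node).__name__, type(node.value).__name__)  # Assign NamedExpr

# Using = where an expression is expected fails at parse time.
try:
    ast.parse("b = (a = 1)")
    rejected = False
except SyntaxError:
    rejected = True
print(rejected)  # True

# Running the legal version binds both names to 1.
scope = {}
exec("a = 0\nb = (a := 1)", scope)
print(scope["a"], scope["b"])  # 1 1
```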
Statements
Again, most people have an intuition for what a statement is, and languages do try to follow that intuition when naming their syntax. Generally speaking, statements do something rather than evaluate to a value. This is not universal, though: what counts as a statement varies considerably between languages, and some languages decide that statements evaluate to values anyway (more on that later).
As mentioned previously, in C, assignment is an expression rather than a statement, despite primarily being used for its side effect. In contrast, in Python, assignment can be either a statement or an expression, with different syntax for each.
All of this is just to say, again, that what counts as a statement is defined more by syntax than by any fundamental property, and different languages do the same thing differently.
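Since the categories come from the grammar, a practical way to learn what counts as an expression in a given language is to ask its parser. A minimal Python sketch follows: compile() in "eval" mode accepts exactly one expression, while "exec" mode accepts a sequence of statements. The helper name classify is made up for this illustration, and it checks the expression grammar first because, in Python as in C, an expression on a line of its own is also a valid statement.

```python
def classify(source: str) -> str:
    """Report how Python's grammar categorizes a snippet (hypothetical helper)."""
    try:
        compile(source, "<snippet>", "eval")  # succeeds only for a single expression
        return "expression"
    except SyntaxError:
        pass
    try:
        compile(source, "<snippet>", "exec")  # succeeds for one or more statements
        return "statement"
    except SyntaxError:
        return "neither"

# compile() only parses and compiles; none of these snippets is executed.
for snippet in ["x + 1", "x = 1", "(x := 1)", "pass", "print('hi')", "x = = 1"]:
    print(f"{snippet!r:16} -> {classify(snippet)}")
```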
Languages where everything is an expression (or almost everything)
Functional languages and those influenced by them tend to have an everything-is-an-expression system, where what would traditionally be statements are expressions as well. In these kinds of languages, statements might not even exist, and you might only have declarations and expressions (for instance, Haskell falls into this category).
For example, in Rust, unlike C, if-else is an expression that evaluates to the result of the branch taken, making it essentially the traditional ternary operator.
Rust
let y = if x == 1 {
"one"
} else {
"not one"
};
Even while loops are expressions in Rust, although they always evaluate to the unit value ().
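Python, which otherwise treats if as a statement, also has a separate conditional expression with its own syntax that fills the same role as the Rust if-else above. The input value x = 2 is arbitrary.

```python
x = 2  # arbitrary input for the illustration

# Statement form: each branch performs its own assignment.
if x == 1:
    y_stmt = "one"
else:
    y_stmt = "not one"

# Expression form: the whole construct evaluates to one of the two values.
y_expr = "one" if x == 1 else "not one"

print(y_stmt, "|", y_expr)  # not one | not one
```

The statement form repeats the assignment in every branch; the expression form states it once, which is exactly what Rust's if-else expression allows.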
Languages without expressions
At the extreme opposite end, assembly/machine/byte code only contains statements. Evaluating "expressions" (in the mathematical sense) relies on using statements. A hypothetical addition of 1 + 2 might look like this (in pseudo-code):
load_const reg1, 1 # Set reg1 to 1
load_const reg2, 2 # Set reg2 to 2
add reg3, reg1, reg2 # Add reg1 and reg2 and store the result in reg3
There is no expression to be seen here, just pure state manipulation, unless you consider reg1 and the like to be expressions. At this level, though, the distinction becomes somewhat meaningless, since the syntactic categories are instead 'instructions', 'registers', and 'constants'.
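You can see a real instance of this lowering in CPython, whose bytecode targets a stack machine rather than the registers of the pseudo-code above. The standard dis module lists the instructions generated for an expression. Opcode names are an implementation detail that changes between versions (for example, additions compile to BINARY_ADD before Python 3.11 and BINARY_OP from 3.11 on), so treat the printed listing as illustrative. The sketch adds two names instead of the literals 1 + 2 because CPython folds constant arithmetic into a single constant at compile time.

```python
import dis

# Compile (but do not run) an expression; the names a and b need not exist.
code = compile("a + b", "<expr>", "eval")

# Each bytecode instruction is one small imperative step on a value stack:
# load a, load b, add the two, return the result.
opnames = [instr.opname for instr in dis.get_instructions(code)]
print(opnames)

# dis.dis() prints a fuller listing with offsets and arguments.
dis.dis(code)
```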
-
[1] Interestingly, in Free Pascal, using an expression as a statement is actually a feature that you can turn on and off.
To add to the excellent explanation by FractionalRadix, it's worth mentioning that the line between expressions and statements can sometimes seem a little blurry (at least to the observer; the language specification will almost certainly define the boundaries clearly and precisely).
For example, in C we have the expression ++i, which means "add one to the value of i and return the new value". This is an expression because it yields a value, but it also changes the program state. So you can use it for its value and assign that value to something else:
int iNew = ++i; // increment i, put incremented value into iNew
or you can use it as a statement in its own right:
++i; // increment i and do nothing else
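Python has no ++ operator, but the same dual role can be sketched with its assignment expression (Python 3.8+): (i := i + 1) both changes i and yields the new value, much like ++i, while the augmented assignment i += 1 is a statement only and is rejected where a value is expected. The starting value i = 5 is arbitrary.

```python
import ast

i = 5

# Like C's  int iNew = ++i;  the assignment expression changes i AND yields the new value.
i_new = (i := i + 1)
print(i, i_new)  # 6 6

# Like C's  ++i;  on a line of its own: a statement used only for its effect.
i += 1
print(i)  # 7

# Augmented assignment is a statement only, so it cannot appear where a value is expected.
try:
    ast.parse("j = (i += 1)")
    augmented_is_expression = True
except SyntaxError:
    augmented_is_expression = False
print(augmented_is_expression)  # False
```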
Also, in C, a simple assignment can be used as a statement as well as an expression, e.g.:
int x, y;
x = 2;       // assignment used as a statement
y = (x = 3); // assignment used as an expression: y and x are now both equal to 3
Informally, you can think of an expression as something you can (but don't necessarily have to) put on the right-hand side of an assignment operation.
Languages may also differ in which constructs they treat as statements or expressions. In Rust, an if construct is actually an expression, not a statement, and its value can be assigned:
let x = if y == 2 { 5 } else { 10 };
If you try this in C, you'll get a compilation error, because if is a statement and does not evaluate to a value:
int x = if (y == 2) { 5; } else { 10; } // INVALID; won't compile
But in Python, you can't treat an assignment as an expression like you can in C, unless you use a special, recently introduced syntax (the := operator, added in Python 3.8):
x = 2
y = (x = 3)
^
SyntaxError: invalid syntax. Maybe you meant '==' or ':=' instead of '='?
y = (x := 3) # x and y are now both 3
The use of the terms expression and statement can vary between programming languages. However, the following distinction is widely used:
Expressions are syntactic forms that allow software authors to describe computations. They intentionally impose only a few constraints on ordering.
Statements are syntactic forms that allow software authors to define the order / sequence and conditions in which computations and state changes take place.
An early description of this distinction between expressions and statements and their respective uses is found in the Report on the Algorithmic Language ALGOL 60:
The basic concept used for the description of calculating rules is the well-known arithmetic expression containing as constituents numbers, variables, and functions. From such expressions are compounded, by applying rules of arithmetic composition, self-contained units of the language - explicit formulae - called assignment statements.
To show the flow of computational processes, certain nonarithmetic statements and statement clauses are added which may describe, e.g., alternatives, or iterative repetitions of computing statements.
The concepts of ALGOL have influenced many later languages (https://en.wikipedia.org/wiki/ALGOL, https://en.wikipedia.org/wiki/Generational_list_of_programming_languages).
Looking at the ordering aspect first, assume the following expression (which could be syntactically valid in C, Java, Python, Haskell and many other programming languages if the respective names a to h are properly declared):
f(a + b) + g(c + d) * (e - h)
In many languages, C among them, the language specification does not define whether a+b has to be computed before c+d or e-h. It also leaves unspecified whether the function f will be called before g. This is intentionally left open to give the compiler or interpreter the freedom to choose an order with good performance. Some rules do exist: operator precedence is defined (* binds more tightly than +, parentheses group sub-expressions, ...), but these are not really ordering rules; rather, they define the meaning of the expression. In fact, a compiler could rearrange the computation by applying valid algebraic transformations.
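Precedence decides how the expression is grouped into a tree, not when each part is computed. Since the expression is also valid Python syntax, Python's ast module can make the grouping visible; the code is only parsed, never run, so f, g and a to h need no definitions. The tree records the grouping only: the order in which f and g are actually called is governed by separate evaluation-order rules in each language's specification.

```python
import ast

# Parse only; nothing here is evaluated.
tree = ast.parse("f(a + b) + g(c + d) * (e - h)", mode="eval")
expr = tree.body

print(type(expr.op).__name__)              # Add: the addition is the root of the tree
print(type(expr.left).__name__)            # Call: f(a + b) is its left operand
print(type(expr.right.op).__name__)        # Mult: the product is its right operand
print(type(expr.right.right.op).__name__)  # Sub: the parenthesized (e - h)
```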
For purely mathematical computations, the ordering has no impact on the result. However, as soon as state changes are possible, the order becomes important. Consider the following expression from C (and a few other languages that adopted the concept from C), where ++i increments the variable i by one and yields the incremented value:
++i * 3 + (e - h)
In this example, the expression modifies the internal state of the program: whatever value i had before, afterwards the value stored in i is larger by one. This causes no problem in the expression above. Due to C's precedence rules, ++ binds more tightly than *, so the meaning of ++i*3 is clearly defined, and it does not matter whether ++i*3 is computed before or after e-h. But what if the expression looks as follows:
++i * 3 + (e - h - i)
Which value of i would be used in e-h-i: the one before or the one after incrementing i? Suddenly, ordering becomes important. (In C, this expression in fact has undefined behavior because of this ambiguity: i is modified by ++i and read again in e-h-i without any defined order between the two.)
Since ordering matters once state changes are involved, and since the ordering of computations within an expression is only partially defined, most programming languages define statements in addition to expressions; among other things, statements handle the ordering. The order of computations (which can involve state changes) across statements is clearly defined, so the order of state changes, whether made by the statements themselves or from within their expressions, becomes defined as well.
Going back to the example ++i * 3 + (e - h - i) from above: in C, this expression with undefined behavior can be rewritten in the following ways:
++i; // a statement consisting of a single state-changing expression
z = i * 3 + (e - h - i); // another statement, executed afterwards
Here it is clearly defined that both occurrences of i in the expression i * 3 + (e - h - i) use the incremented value, because the sequence of statements defines the ordering. Alternatively, the example expression ++i * 3 + (e - h - i) could also have been rewritten as:
y = e - h - i; // still using the old value of i
z = ++i * 3 + y; // second statement with the state change
Here, the ordering makes it clear that e-h-i uses the value of i before the state change, while ++i*3 uses the new value.
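The two C rewrites can be mirrored in Python, with i += 1 standing in for ++i since Python has no ++ operator; the function names and the values i = 5, e = 20 and h = 3 are made up for the illustration. Because each version fixes the order with separate statements, each produces one well-defined result, and the two results differ.

```python
def first_rewrite(i, e, h):
    # C:  ++i;  z = i * 3 + (e - h - i);
    i += 1                      # the state change happens first
    return i * 3 + (e - h - i)  # both reads of i see the new value

def second_rewrite(i, e, h):
    # C:  y = e - h - i;  z = ++i * 3 + y;
    y = e - h - i               # reads the old value of i
    i += 1                      # the state change happens afterwards
    return i * 3 + y

print(first_rewrite(5, 20, 3))   # 29  (6 * 3 + (20 - 3 - 6))
print(second_rewrite(5, 20, 3))  # 30  (6 * 3 + (20 - 3 - 5))
```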
As mentioned initially, statements are often also about the conditions under which computations (including state changes) are performed. In C and related languages, there are conditional statements (if, switch, ...) and loop statements (for, while, do, ...) that define whether and how often certain computations are performed.
As others have mentioned, some languages solve these problems differently, for example by having if expressions rather than if statements, or even by solving the ordering problem in completely different ways (such as the IO monad in Haskell, which defines the order of changes to state outside the program).