Benchmarking is hard, and there are a lot of potential issues with your code. I agree with Olin Lathrop that you want at least a few seconds worth of run time, not just due to the potential for a low-precision clock but also to deal with noise from various sources. However, there are still other problems.

One problem is the proportion of what you're measuring compared to what you *want* to be measuring. Adding three `int`s can be (and you'd hope is) an extremely simple operation that could quite possibly take less than 2 cycles on average, possibly even less than 1. Particularly if `GetSum` isn't inlined, it's quite possible that the (unchecked) addition takes less than 1% of the execution time of an iteration of the loop in `Compute`. If it were 0.1%, for example, then even a 10x difference would barely register: the total time would change by less than 1%. With enough data you'd be able to detect that there's a statistically significant difference, but that difference would still look minuscule.

The second problem is systematic errors, which can bias the result and are especially bad when, per the previous paragraph, the effect size is so small. Some sources of bias may be handled by the benchmarking framework, such as warming up caches so that whatever is benchmarked first doesn't eat a bunch of cache misses. A subtle, but likely minor, example is the branch in `GetSum`. Usually one branch of an `if` will be slightly faster than the other due to instruction cache and branch prediction issues. In this case, the CPU branch predictor should figure out which branch is taken very early in the loop, so the effect should be negligible, though there may still be an effect. FoggyFinder points out another potential, and much larger, source of bias *depending on how you intend to do checked arithmetic*.
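For concreteness, here is a minimal sketch (not from the original code) of the semantic difference being benchmarked, ignoring performance:

```csharp
using System;

class CheckedVsUnchecked
{
    static void Main()
    {
        int a = int.MaxValue;

        // Unchecked (the C# default): compiles to a plain `add` and wraps around.
        Console.WriteLine(unchecked(a + 1));   // -2147483648

        // Checked: compiles to `add.ovf` and throws on overflow.
        try
        {
            Console.WriteLine(checked(a + 1));
        }
        catch (OverflowException)
        {
            Console.WriteLine("overflow");     // this path is taken
        }
    }
}
```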
**If** you intend to execute a handful of arithmetic operations, then catch any potential overflow exceptions and apply fallback logic, **then** you should do as FoggyFinder suggests and include such fallback logic, but I would not bother including actual overflow cases, assuming your assumption about their frequency in practice holds. (Again, even several orders of magnitude of difference for the actual overflow cases isn't going to change the overall result by much.) If, instead, you are going to do larger-scale operations and catch an overflow only for the whole result, then you should NOT include the `try`-`catch` block.

A third problem is missed opportunities. Even if you did a completely perfect job of computing how much costlier `add.ovf` is compared to `add` (and frankly this would be easy to do, easier than benchmarking, by just looking at the assembly), it would quite likely be meaningless in practice. The real cost of `add.ovf` may be in missed optimizations. You may have wondered how performing two add operations could take less than a cycle to execute. The answer is superscalar and/or SIMD execution, or, more generally, what's known as instruction-level parallelism. If a SIMD opcode lets you execute 4 adds at once in a single cycle, then each only costs a quarter of a cycle. The cost could even be 0 cycles if the compiler is able to eliminate the add entirely. A trivial but illustrative example: `(a + b) - b` is equivalent to just `a` in `int` arithmetic even if `a + b` overflows. The need to ensure `a + b` doesn't overflow would block this optimization in checked arithmetic.

So now some actual advice.

1. Most importantly, benchmark ***real*** and representative code. This is what you care about, and this will incorporate any relevant second-order effects.
2. Minimize overhead. In this case, this means inlining *everything* and avoiding heap allocation in the inner loop. In other contexts, these may be negligible overhead.
3. Get enough samples.
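To make the two checked-arithmetic scenarios above concrete, here is a sketch of where the `try`-`catch` would sit in each case. The method names and the widening fallback are hypothetical, not from the original code:

```csharp
using System;

static class OverflowScenarios
{
    // Scenario 1: fallback applied per operation. The try-catch is part of
    // the real work, so it belongs inside the benchmarked code.
    public static long SumWithFallback(int a, int b, int c)
    {
        try
        {
            return checked(a + b + c);
        }
        catch (OverflowException)
        {
            // Hypothetical fallback: widen to long, which cannot overflow
            // for a sum of three ints.
            return (long)a + b + c;
        }
    }

    // Scenario 2: one overflow check for a whole batch of work. No
    // per-operation try-catch; the caller catches OverflowException
    // around the entire call instead.
    public static int[] SumRowsChecked(int[][] data)
    {
        var results = new int[data.Length];
        checked
        {
            for (int i = 0; i < data.Length; i++)
            {
                results[i] = data[i][0] + data[i][1] + data[i][2];
            }
        }
        return results;
    }
}
```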
The higher the precision you need and the noisier the environment, the more samples you should get. More is always better. So add, at least, another factor of 1,000 to `Loops`.
4. If possible, measure the overhead. In terms of your original code, this might involve a variant of `GetSum` that does no addition and, perhaps, just returns `numbers[1]`, say. Admittedly, there's a risk that the compiler does something smart. However, if the result of doing no additions is roughly the same as with additions, then you know most of what you're measuring is overhead.
5. Don't put any benchmark-choosing logic in the benchmark itself. This is overhead, but, more importantly, it is an easily avoidable source of potential bias.
6. Use an AOT compiler and check the *assembly*, NOT (just) the bytecode. This is less important when benchmarking real code as opposed to a microbenchmark, because the different assembly output is part of the cost that you *want* to measure. Nevertheless, you want the difference to be due to the different code-under-test and not the benchmarking setup.

Within the context of your current code, the result of the above advice, assuming you *don't* need fallback code every few arithmetic operations, would be:

```csharp
public void ComputeBaseline()
{
    for (int i = 0; i < Loops; i++)
    {
        int[] row = Data[i];
        Results[i] = row[0] + row[1] + row[2];
    }
}
```

and similarly for `ComputeChecked_`.

As an aside that I won't go into in detail here, as it's not the topic of this question: I find it unlikely that a benchmark is needed to make the decision here. It's fine as a matter of curiosity. It also seems unlikely to me that checked arithmetic is the right solution to this problem (assuming it's a problem that needs solving) as opposed to larger or even unbounded numeric types.
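For completeness, the `ComputeChecked_` counterpart would presumably wrap the same loop in a `checked` block, and an overhead probe per point 4 might look like the following. The `ComputeOverhead` name is my own hypothetical addition; `Loops`, `Data`, and `Results` are assumed to be the fields from the original benchmark class:

```csharp
public void ComputeChecked_()
{
    // Same loop body; the checked block makes the adds compile to add.ovf.
    checked
    {
        for (int i = 0; i < Loops; i++)
        {
            int[] row = Data[i];
            Results[i] = row[0] + row[1] + row[2];
        }
    }
}

// Hypothetical overhead probe: same loop, no addition. If it benchmarks at
// roughly the same speed as ComputeBaseline, the measurement is dominated by
// loop and array-access overhead rather than by the adds themselves.
public void ComputeOverhead()
{
    for (int i = 0; i < Loops; i++)
    {
        int[] row = Data[i];
        Results[i] = row[1];
    }
}
```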