
Welcome to Software Development on Codidact!


Measuring arithmetic overflow checking overhead in C#

+4
−0

Overflow checking for integral-type arithmetic operations is disabled by default. It can be enabled explicitly with the checked keyword or the -checked compiler switch.
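For illustration, a minimal standalone sketch of the two contexts (not part of the benchmark below):

```csharp
using System;

class CheckedDemo
{
    static void Main()
    {
        int max = int.MaxValue;

        // default (unchecked) context: the addition silently wraps around
        int wrapped = unchecked(max + 1);
        Console.WriteLine(wrapped); // -2147483648

        // checked context: the same addition throws instead of wrapping
        try
        {
            int overflowed = checked(max + 1);
            Console.WriteLine(overflowed);
        }
        catch (OverflowException)
        {
            Console.WriteLine("overflow detected");
        }
    }
}
```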

Since I mainly develop business applications where the main focus is correctness, I am wondering whether globally switching to checked makes sense and what the performance impact would be.
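For the "globally" part: the -checked compiler switch corresponds to the CheckForOverflowUnderflow MSBuild property, so project-wide checking can be turned on in the .csproj (a sketch; verify the property name against your SDK's compiler-options documentation):

```xml
<PropertyGroup>
  <!-- compile every arithmetic operation in this project as checked -->
  <CheckForOverflowUnderflow>true</CheckForOverflowUnderflow>
</PropertyGroup>
```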

My benchmarking code is the following:

using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser]
public class Program
{
    private const int Loops = 100 * 1000;
    private static readonly int[][] Data = new int[Loops][];
    private static readonly int[] Results = new int[Loops];
    private const int MaxValue = int.MaxValue / 3 + 5 * 1000 * 1000;

    [GlobalSetup]
    // populate data to avoid wasting benchmarking time with this
    public static void CreateData()
    {
        Random r = new();
        for (int i = 0; i < Loops; i++)
        {
            int n1 = r.Next(MaxValue);
            int n2 = r.Next(MaxValue);
            int n3 = r.Next(MaxValue);
            Data[i] = new[] { n1, n2, n3 };
        }
    }

    // checked is applied based on flag
    // since checked(function_call) does not seem to work
    private static int GetSum(bool isChecked, int[] numbers)
    {
        if (isChecked)
            return checked (numbers[0] + numbers[1] + numbers[2]);

        return numbers[0] + numbers[1] + numbers[2];
    }

    // execute the sum in a loop to get a significant computation time
    private static void Compute(bool isChecked)
    {
        for (int i = 0; i < Loops; i++)
        {
            int[] row = Data[i];
        int result = GetSum(isChecked, new[] { row[0], row[1], row[2] });
            Results[i] = result;
        }
    }

    [Benchmark(Baseline = true)]
    public void ComputeBaseline()
    {
        Compute(false);
    }

    [Benchmark(Baseline = false)]
    public void ComputeChecked_()
    {
        Compute(true);
    }

    static void Main(string[] args)
    {
        var summary = BenchmarkRunner.Run<Program>();
    }
}

My results are as follows (release build, run without a debugger attached):

|          Method |     Mean |     Error |    StdDev | Ratio | RatioSD |    Gen 0 | Allocated |
|---------------- |---------:|----------:|----------:|------:|--------:|---------:|----------:|
| ComputeBaseline | 1.257 ms | 0.0127 ms | 0.0106 ms |  1.00 |    0.00 | 849.6094 |      4 MB |
| ComputeChecked_ | 1.153 ms | 0.0213 ms | 0.0209 ms |  0.92 |    0.02 | 849.6094 |      4 MB |

I am not sure I am reading this correctly: either the checked version really is slightly faster than the unchecked one, or my benchmark code is incorrectly written.

I would appreciate a code review to understand whether I am doing something wrong in my assessment.


I have made some changes based on the received feedback:

  • fixed a bug related to initialization - the checked version gets closer to the unchecked (baseline) one
  • removed inlining - did not notice any change
  • increased the runtime - baseline and checked get closer still
  • reduced bias by introducing a small chance of an actual overflow - the checked version now clearly takes longer, even though overflow occurs only rarely (a couple of dozen times per 10M loops, which is reasonable to expect in a real-life scenario)
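One way such a small overflow chance could be introduced into the data generation (a hypothetical sketch, not the exact change made; the 1-in-a-million rate is an assumption tuned to hit "a couple of dozen per 10M rows"):

```csharp
using System;

class BiasedData
{
    const int Loops = 10_000_000;

    static void Main()
    {
        Random r = new();
        int overflows = 0;
        for (int i = 0; i < Loops; i++)
        {
            // mostly rows whose three-way sum cannot overflow; roughly one
            // row in a million draws from the full int range instead
            int cap = r.Next(1_000_000) == 0 ? int.MaxValue : int.MaxValue / 3;
            int a = r.Next(cap), b = r.Next(cap), c = r.Next(cap);
            if ((long)a + b + c > int.MaxValue) overflows++;
        }
        Console.WriteLine($"overflowing rows: {overflows} of {Loops:N0}");
    }
}
```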

The final result looks like this:

|          Method |     Mean |   Error |  StdDev | Ratio | RatioSD |      Gen 0 | Allocated |
|---------------- |---------:|--------:|--------:|------:|--------:|-----------:|----------:|
| ComputeBaseline | 218.0 ms | 4.34 ms | 7.82 ms |  1.00 |    0.00 | 85000.0000 |    381 MB |
| ComputeChecked_ | 229.5 ms | 2.51 ms | 2.34 ms |  1.02 |    0.04 | 85000.0000 |    381 MB |

For those interested in what the benchmark is actually outputting, I will add the relevant parts here:

// Validating benchmarks:
// ***** BenchmarkRunner: Start   *****
// ***** Found 2 benchmark(s) in total *****
// ***** Building 1 exe(s) in Parallel: Start   *****
// start dotnet restore  /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1 /p:Deterministic=true /p:Optimize=true in C:\Users\Alex\source\repos\CheckedOperationsBenchmark\bin\Release\net5.0\8844cf42-ce2d-4210-8746-304950dee939
// command took 2.27s and exited with 0
// start dotnet build -c Release  --no-restore /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1 /p:Deterministic=true /p:Optimize=true in C:\Users\Alex\source\repos\CheckedOperationsBenchmark\bin\Release\net5.0\8844cf42-ce2d-4210-8746-304950dee939
// command took 3.78s and exited with 0
// ***** Done, took 00:00:06 (6.21 sec)   *****
// Found 2 benchmarks:
//   Program.ComputeBaseline: DefaultJob
//   Program.ComputeChecked_: DefaultJob

// **************************
// Benchmark: Program.ComputeBaseline: DefaultJob
// *** Execute ***
// Launch: 1 / 1
// Execute: dotnet "8844cf42-ce2d-4210-8746-304950dee939.dll" --benchmarkName "CheckedOperationsBenchmark.Program.ComputeBaseline" --job "Default" --benchmarkId 0 in C:\Users\Alex\source\repos\CheckedOperationsBenchmark\bin\Release\net5.0\8844cf42-ce2d-4210-8746-304950dee939\bin\Release\net5.0
// BeforeAnythingElse

// Benchmark Process Environment Information:
// Runtime=.NET 5.0.10 (5.0.1021.41214), X64 RyuJIT
// GC=Concurrent Workstation
// Job: DefaultJob

OverheadJitting  1: 1 op, 405100.00 ns, 405.1000 us/op
WorkloadJitting  1: 1 op, 271396400.00 ns, 271.3964 ms/op

WorkloadPilot    1: 2 op, 423324000.00 ns, 211.6620 ms/op
WorkloadPilot    2: 3 op, 646507700.00 ns, 215.5026 ms/op

WorkloadWarmup   1: 3 op, 650490500.00 ns, 216.8302 ms/op
...
WorkloadWarmup   6: 3 op, 671575300.00 ns, 223.8584 ms/op

// BeforeActualRun
WorkloadActual   1: 3 op, 641281200.00 ns, 213.7604 ms/op
...
WorkloadActual  49: 3 op, 634473700.00 ns, 211.4912 ms/op

// AfterActualRun
WorkloadResult   1: 3 op, 641281200.00 ns, 213.7604 ms/op
...
WorkloadResult  41: 3 op, 634473700.00 ns, 211.4912 ms/op
GC:  255 0 0 1200000000 3
Threading:  3 0 3

// AfterAll
// Benchmark Process 18616 has exited with code 0.

Mean = 218.037 ms, StdErr = 1.221 ms (0.56%), N = 41, StdDev = 7.817 ms
Min = 208.001 ms, Q1 = 212.466 ms, Median = 216.227 ms, Q3 = 221.466 ms, Max = 239.382 ms
IQR = 8.999 ms, LowerFence = 198.967 ms, UpperFence = 234.965 ms
ConfidenceInterval = [213.702 ms; 222.372 ms] (CI 99.9%), Margin = 4.335 ms (1.99% of Mean)
Skewness = 1.24, Kurtosis = 3.89, MValue = 2

// **************************
// Benchmark: Program.ComputeChecked_: DefaultJob
// *** Execute ***
// Launch: 1 / 1
// Execute: dotnet "8844cf42-ce2d-4210-8746-304950dee939.dll" --benchmarkName "CheckedOperationsBenchmark.Program.ComputeChecked_" --job "Default" --benchmarkId 1 in C:\Users\Alex\source\repos\CheckedOperationsBenchmark\bin\Release\net5.0\8844cf42-ce2d-4210-8746-304950dee939\bin\Release\net5.0
// BeforeAnythingElse

// Benchmark Process Environment Information:
// Runtime=.NET 5.0.10 (5.0.1021.41214), X64 RyuJIT
// GC=Concurrent Workstation
// Job: DefaultJob

OverheadJitting  1: 1 op, 377800.00 ns, 377.8000 us/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadJitting  1: 1 op, 221698200.00 ns, 221.6982 ms/op

Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadPilot    1: 2 op, 468579600.00 ns, 234.2898 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadPilot    2: 3 op, 690754700.00 ns, 230.2516 ms/op

Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadWarmup   1: 3 op, 681204300.00 ns, 227.0681 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadWarmup   2: 3 op, 687082100.00 ns, 229.0274 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadWarmup   3: 3 op, 694201200.00 ns, 231.4004 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadWarmup   4: 3 op, 688857400.00 ns, 229.6191 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadWarmup   5: 3 op, 688555700.00 ns, 229.5186 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadWarmup   6: 3 op, 691014900.00 ns, 230.3383 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadWarmup   7: 3 op, 685841900.00 ns, 228.6140 ms/op

// BeforeActualRun
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadActual   1: 3 op, 690004100.00 ns, 230.0014 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadActual   2: 3 op, 687871600.00 ns, 229.2905 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadActual   3: 3 op, 697458600.00 ns, 232.4862 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadActual   4: 3 op, 697258400.00 ns, 232.4195 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadActual   5: 3 op, 695430000.00 ns, 231.8100 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadActual   6: 3 op, 686219800.00 ns, 228.7399 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadActual   7: 3 op, 689060900.00 ns, 229.6870 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadActual   8: 3 op, 690860200.00 ns, 230.2867 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadActual   9: 3 op, 699231000.00 ns, 233.0770 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadActual  10: 3 op, 686350000.00 ns, 228.7833 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadActual  11: 3 op, 682036900.00 ns, 227.3456 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadActual  12: 3 op, 675138400.00 ns, 225.0461 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadActual  13: 3 op, 688420500.00 ns, 229.4735 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadActual  14: 3 op, 679475100.00 ns, 226.4917 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadActual  15: 3 op, 681246000.00 ns, 227.0820 ms/op

// AfterActualRun
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadResult   1: 3 op, 690004100.00 ns, 230.0014 ms/op
...
WorkloadResult  15: 3 op, 681246000.00 ns, 227.0820 ms/op
GC:  255 0 0 1200007200 3
Threading:  3 0 3

// AfterAll
// Benchmark Process 2072 has exited with code 0.

Mean = 229.468 ms, StdErr = 0.605 ms (0.26%), N = 15, StdDev = 2.345 ms
Min = 225.046 ms, Q1 = 228.043 ms, Median = 229.474 ms, Q3 = 231.048 ms, Max = 233.077 ms
IQR = 3.006 ms, LowerFence = 223.534 ms, UpperFence = 235.557 ms
ConfidenceInterval = [226.961 ms; 231.975 ms] (CI 99.9%), Margin = 2.507 ms (1.09% of Mean)
Skewness = -0.12, Kurtosis = 1.93, MValue = 2

// ***** BenchmarkRunner: Finish  *****

// * Export *
  BenchmarkDotNet.Artifacts\results\CheckedOperationsBenchmark.Program-report.csv
  BenchmarkDotNet.Artifacts\results\CheckedOperationsBenchmark.Program-report-github.md
  BenchmarkDotNet.Artifacts\results\CheckedOperationsBenchmark.Program-report.html

// * Detailed results *
Program.ComputeBaseline: DefaultJob
Runtime = .NET 5.0.10 (5.0.1021.41214), X64 RyuJIT; GC = Concurrent Workstation
Mean = 218.037 ms, StdErr = 1.221 ms (0.56%), N = 41, StdDev = 7.817 ms
Min = 208.001 ms, Q1 = 212.466 ms, Median = 216.227 ms, Q3 = 221.466 ms, Max = 239.382 ms
IQR = 8.999 ms, LowerFence = 198.967 ms, UpperFence = 234.965 ms
ConfidenceInterval = [213.702 ms; 222.372 ms] (CI 99.9%), Margin = 4.335 ms (1.99% of Mean)
Skewness = 1.24, Kurtosis = 3.89, MValue = 2
-------------------- Histogram --------------------
[205.025 ms ; 210.959 ms) | @@@@
[210.959 ms ; 216.910 ms) | @@@@@@@@@@@@@@@@@@@@@
[216.910 ms ; 222.730 ms) | @@@@@@@
[222.730 ms ; 229.247 ms) | @@@@@@
[229.247 ms ; 235.592 ms) |
[235.592 ms ; 242.357 ms) | @@@
---------------------------------------------------

Program.ComputeChecked_: DefaultJob
Runtime = .NET 5.0.10 (5.0.1021.41214), X64 RyuJIT; GC = Concurrent Workstation
Mean = 229.468 ms, StdErr = 0.605 ms (0.26%), N = 15, StdDev = 2.345 ms
Min = 225.046 ms, Q1 = 228.043 ms, Median = 229.474 ms, Q3 = 231.048 ms, Max = 233.077 ms
IQR = 3.006 ms, LowerFence = 223.534 ms, UpperFence = 235.557 ms
ConfidenceInterval = [226.961 ms; 231.975 ms] (CI 99.9%), Margin = 2.507 ms (1.09% of Mean)
Skewness = -0.12, Kurtosis = 1.93, MValue = 2
-------------------- Histogram --------------------
[223.798 ms ; 234.325 ms) | @@@@@@@@@@@@@@@
---------------------------------------------------

// * Summary *

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.1466 (21H2)
Intel Core i7-8750H CPU 2.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET SDK=6.0.100
  [Host]     : .NET 5.0.10 (5.0.1021.41214), X64 RyuJIT
  DefaultJob : .NET 5.0.10 (5.0.1021.41214), X64 RyuJIT


|          Method |     Mean |   Error |  StdDev | Ratio | RatioSD |      Gen 0 | Allocated |
|---------------- |---------:|--------:|--------:|------:|--------:|-----------:|----------:|
| ComputeBaseline | 218.0 ms | 4.34 ms | 7.82 ms |  1.00 |    0.00 | 85000.0000 |    381 MB |
| ComputeChecked_ | 229.5 ms | 2.51 ms | 2.34 ms |  1.02 |    0.04 | 85000.0000 |    381 MB |

// * Hints *
Outliers
  Program.ComputeBaseline: Default -> 8 outliers were removed (259.93 ms..346.44 ms)

// * Legends *
  Mean      : Arithmetic mean of all measurements
  Error     : Half of 99.9% confidence interval
  StdDev    : Standard deviation of all measurements
  Ratio     : Mean of the ratio distribution ([Current]/[Baseline])
  RatioSD   : Standard deviation of the ratio distribution ([Current]/[Baseline])
  Gen 0     : GC Generation 0 collects per 1000 operations
  Allocated : Allocated memory per single operation (managed only, inclusive, 1KB = 1024B)
  1 ms      : 1 Millisecond (0.001 sec)

// * Diagnostic Output - MemoryDiagnoser *


// ***** BenchmarkRunner: End *****
// ** Remained 0 benchmark(s) to run **
Run time: 00:02:23 (143.72 sec), executed benchmarks: 2

Global total time: 00:02:29 (149.95 sec), executed benchmarks: 2
// * Artifacts cleanup *

2 comment threads

Try to establish a control... (4 comments)
Biased data (1 comment)

2 answers


+2
−0

Most general-purpose operating systems can't be counted on for accurate timing at durations as short as 1 to 2 ms. Any test case should run for at least a few seconds. Those runs should then be repeated a few times to get a sense of how much noise is in the data.

Either increase the complexity of the problem or run more iterations. I'd target the shortest run time to be in the 3-5 second range.
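As a crude illustration of that guideline (a sketch, not a replacement for a proper benchmarking framework), one can keep repeating fixed-size batches of work until a few seconds have elapsed:

```csharp
using System;
using System.Diagnostics;

class Harness
{
    static long _sink; // consumed below so the JIT can't drop the loop

    static void Main()
    {
        const int batch = 10_000_000;
        var sw = Stopwatch.StartNew();
        long iterations = 0;

        // keep running fixed-size batches until a few seconds have elapsed
        while (sw.Elapsed < TimeSpan.FromSeconds(3))
        {
            long sum = 0;
            for (int i = 0; i < batch; i++)
                sum += i;
            _sink = sum;
            iterations += batch;
        }
        sw.Stop();

        Console.WriteLine($"{iterations / sw.Elapsed.TotalSeconds:N0} adds/sec (sink={_sink})");
    }
}
```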


0 comment threads

+2
−0

Benchmarking is hard and there are a lot of potential issues with your code. I agree with Olin Lathrop that you want at least a few seconds' worth of run time, not just because of a potentially low-precision clock but also to deal with noise from various sources.

However, there are still other problems. One problem is the proportion of what you're measuring compared to what you want to be measuring. Adding three ints can be (and you'd hope is) an extremely simple operation that could quite possibly take less than 2 cycles on average, possibly even less than 1. Particularly if GetSum isn't inlined, it's quite possible that the (unchecked) addition takes less than 1% of the execution time of an iteration of the loop in Compute. If it was 0.1%, for example, then even a 10x difference would barely register. With enough data you'd be able to detect that there's a statistically significant difference, but that difference would still look minuscule.

The second problem is systematic errors, which can bias the result and are especially bad when, per the previous paragraph, the effect size is so small. Some sources of bias may be handled by the benchmarking framework, such as warming up caches so that whatever is benchmarked first doesn't eat a bunch of cache misses. A subtle, but likely minor, example is the branch in GetSum. Usually one branch of an if will be slightly faster than the other due to instruction cache and branch prediction issues. In this case, the CPU's branch predictor should figure out which branch is taken very early in the loop, so the effect should be negligible, though there may still be one.

FoggyFinder points out another potential, and much larger, source of bias, depending on how you intend to do checked arithmetic. If you intend to execute a handful of arithmetic operations, then catch any potential overflow exceptions and apply fallback logic, you should do as FoggyFinder suggests and include such fallback logic; but I would not bother producing actual overflow cases, assuming your estimate of their real-world frequency holds. (Again, even several orders of magnitude of difference on the actual overflow cases isn't going to change the overall result by much.) If, instead, you are going to do larger-scale operations and catch an overflow only for the whole result, then you should NOT include the try-catch block.

A third problem is missed opportunities. Even if you did a completely perfect job of computing how much costlier add.ovf is compared to add (and frankly this would be easy to do, easier than benchmarking, by just looking at the assembly), it would quite likely be meaningless in practice. The real cost of add.ovf may be in missed optimizations. You may have wondered how performing two add operations could take less than a cycle to execute. The answer is superscalar and/or SIMD execution, or more generally what's known as instruction-level parallelism. If a SIMD opcode lets you execute 4 adds at once in a single cycle, then each only costs a quarter of a cycle. The cost could even be 0 cycles if the compiler is able to eliminate the add entirely. A trivial but illustrative example: (a + b) - b is equivalent to just a in int arithmetic even if a + b overflows. The need to ensure a + b doesn't overflow would block this optimization in checked arithmetic.
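The (a + b) - b example can be demonstrated directly:

```csharp
using System;

class WrapDemo
{
    static void Main()
    {
        int a = 2_000_000_000, b = 1_000_000_000;

        // unchecked: a + b wraps around, subtracting b wraps back, so the
        // whole expression may legally be folded down to just a
        int r = unchecked((a + b) - b);
        Console.WriteLine(r == a); // True

        // checked: the intermediate a + b must not overflow, so the same
        // folding is not allowed and this throws
        try
        {
            int s = checked((a + b) - b);
            Console.WriteLine(s);
        }
        catch (OverflowException)
        {
            Console.WriteLine("overflow");
        }
    }
}
```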

So now some actual advice.

  1. Most importantly, benchmark real and representative code. This is what you care about and this will incorporate any relevant second-order effects.
  2. Minimize overhead. In this case, this means inlining everything and avoiding heap allocation in the inner loop. In other contexts, these may be negligible overhead.
  3. Get enough samples. The higher the precision you need and the noisier the environment, the more samples you should get. More is always better. So add, at least, another factor of 1,000 to Loops.
  4. If possible, measure the overhead. In terms of your original code, this might involve a variant of GetSum that does no addition and, perhaps, just returns numbers[1], say. Admittedly, there's a risk that the compiler does something smart. However, if the result of doing no additions is roughly the same as with additions, then you know most of what you're measuring is overhead.
  5. Don't put any benchmark-choosing logic in the benchmark itself. This is overhead, but, more importantly, it is an easily avoidable source of potential bias.
  6. Use an AOT compiler and check the assembly NOT (just) the bytecode. This is less important when benchmarking real code as opposed to a microbenchmark because the different assembly output is part of the cost that you want to measure. Nevertheless, you want the difference to be due to the different code-under-test and not the benchmarking setup.
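Point 4 could look like this (GetSumOverhead is a hypothetical name, not from the original post):

```csharp
using System;

class OverheadDemo
{
    // same shape as the question's GetSum, but with the additions removed:
    // timing this variant estimates the call/array overhead by itself
    static int GetSumOverhead(int[] numbers) => numbers[1];

    static int GetSum(int[] numbers) => numbers[0] + numbers[1] + numbers[2];

    static void Main()
    {
        int[] data = { 1, 2, 3 };
        Console.WriteLine(GetSum(data));         // 6
        Console.WriteLine(GetSumOverhead(data)); // 2
    }
}
```

If the no-addition variant benchmarks at roughly the same speed as the real one, most of what is being measured is overhead, not the additions.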

Within the context of your current code, the result of the above advice assuming you don't need fallback code every few arithmetic operations would be:

public void ComputeBaseline()
{
    for (int i = 0; i < Loops; i++)
    {
        int[] row = Data[i];
        Results[i] = row[0] + row[1] + row[2];
    }
}

and similarly for ComputeChecked_.
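Spelled out, that checked counterpart would differ only in the wrapped expression (a sketch following the same pattern as the method above):

```csharp
public void ComputeChecked_()
{
    for (int i = 0; i < Loops; i++)
    {
        int[] row = Data[i];
        // identical loop body, with only the operation under test changed
        Results[i] = checked(row[0] + row[1] + row[2]);
    }
}
```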

As an aside that I won't go into in detail here, as it's not the topic of this question: I find it unlikely that a benchmark is needed to make the decision here. It's fine as a matter of curiosity. It also seems unlikely to me that checked arithmetic is the right solution to this problem (assuming it's a problem that needs solving), as opposed to larger or even unbounded numeric types.
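For concreteness, a minimal sketch of those alternatives (widening to long, or System.Numerics.BigInteger for unbounded values):

```csharp
using System;
using System.Numerics;

class WidenDemo
{
    static void Main()
    {
        int a = int.MaxValue, b = int.MaxValue;

        // widening to long before the addition: the sum of any two ints
        // always fits, so no check and no exception is needed
        long sum64 = (long)a + b;
        Console.WriteLine(sum64); // 4294967294

        // BigInteger for truly unbounded arithmetic
        BigInteger big = (BigInteger)long.MaxValue + long.MaxValue;
        Console.WriteLine(big); // 18446744073709551614
    }
}
```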


0 comment threads
