Comments on Measuring arithmetic overflow checking overhead in C#

Post

Measuring arithmetic overflow checking overhead in C#

+4
−0

Overflow checking for integral-type arithmetic operations is disabled by default. It can be enabled explicitly with the checked keyword or the -checked compiler option.
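As a minimal sketch of the difference (the class and method names here are illustrative, not from the post): in an unchecked context an overflowing int addition wraps around, while in a checked context it throws an OverflowException. At project level, the MSBuild property CheckForOverflowUnderflow corresponds to the -checked option.

```csharp
using System;

public static class CheckedDemo
{
    // Unchecked context: an overflowing addition silently wraps around.
    public static int AddUnchecked(int a, int b) => unchecked(a + b);

    // Checked context: an overflowing addition throws OverflowException.
    public static int AddChecked(int a, int b) => checked(a + b);

    public static void Main()
    {
        // Wraps from int.MaxValue to int.MinValue.
        Console.WriteLine(AddUnchecked(int.MaxValue, 1)); // -2147483648

        try
        {
            AddChecked(int.MaxValue, 1);
        }
        catch (OverflowException e)
        {
            // Prints the exception message instead of crashing.
            Console.WriteLine(e.Message);
        }
    }
}
```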

Since I mainly develop business applications where the main focus is correctness, I am wondering whether switching to checked globally makes sense and what the performance impact would be.

My benchmarking code is the following:

using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser]
public class Program
{
    private const int Loops = 100 * 1000;
    private static readonly int[][] Data = new int[Loops][];
    private static readonly int[] Results = new int[Loops];
    private const int MaxValue = int.MaxValue / 3 + 5 * 1000 * 1000;

    [GlobalSetup]
    // populate data to avoid wasting benchmarking time with this
    public static void CreateData()
    {
        Random r = new();
        for (int i = 0; i < Loops; i++)
        {
            int n1 = r.Next(MaxValue);
            int n2 = r.Next(MaxValue);
            int n3 = r.Next(MaxValue);
            Data[i] = new[] { n1, n2, n3 };
        }
    }

    // checked is applied based on flag
    // since checked(function_call) does not seem to work
    private static int GetSum(bool isChecked, int[] numbers)
    {
        if (isChecked)
            return checked (numbers[0] + numbers[1] + numbers[2]);

        return numbers[0] + numbers[1] + numbers[2];
    }

    // execute the sum in a loop to get a significant computation time
    private static void Compute(bool isChecked)
    {
        for (int i = 0; i < Loops; i++)
        {
            int[] row = Data[i];
            int result = GetSum(isChecked , new [] {row[0], row[1], row[2]});
            Results[i] = result;
        }
    }

    [Benchmark(Baseline = true)]
    public void ComputeBaseline()
    {
        Compute(false);
    }

    [Benchmark(Baseline = false)]
    public void ComputeChecked_()
    {
        Compute(true);
    }

    static void Main(string[] args)
    {
        var summary = BenchmarkRunner.Run<Program>();
    }
}

My results are as follows (Release build, run without a debugger attached):

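|          Method |     Mean |     Error |    StdDev | Ratio | RatioSD |    Gen 0 | Allocated |
|---------------- |---------:|----------:|----------:|------:|--------:|---------:|----------:|
| ComputeBaseline | 1.257 ms | 0.0127 ms | 0.0106 ms |  1.00 |    0.00 | 849.6094 |      4 MB |
| ComputeChecked_ | 1.153 ms | 0.0213 ms | 0.0209 ms |  0.92 |    0.02 | 849.6094 |      4 MB |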

I am not sure I am reading this correctly: either the checked version really is slightly faster than the unchecked one, or my benchmark code is written incorrectly.

I would appreciate a code review to understand whether I am doing something wrong in my assessment.
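The comment in GetSum notes that checked(function_call) does not seem to work. That matches how the language defines it: checked and unchecked only affect operations that appear textually inside the expression or block, not code in the methods being called. A small sketch of this, assuming the project is compiled without -checked (the names below are hypothetical):

```csharp
using System;

public static class CheckedScopeDemo
{
    // No checked keyword here, so without -checked this addition wraps.
    public static int Add(int a, int b) => a + b;

    public static void Main()
    {
        // checked(...) surrounds only the call; the addition inside Add is unaffected.
        int viaCall = checked(Add(int.MaxValue, 1));
        Console.WriteLine(viaCall); // -2147483648 when compiled without -checked

        try
        {
            int x = int.MaxValue;
            // Here the addition itself is textually inside checked(...), so it throws.
            int inline = checked(x + 1);
            Console.WriteLine(inline);
        }
        catch (OverflowException)
        {
            Console.WriteLine("overflow detected");
        }
    }
}
```

This is why the flag-based GetSum in the benchmark has to repeat the arithmetic expression in both branches rather than wrap a shared helper call.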


I have made some changes based on the feedback received:

  • fixed a bug related to initialization: the checked version gets closer to the unchecked (baseline) one
  • disabled inlining: no noticeable change
  • increased the runtime: the baseline and checked results get even closer
  • reduced bias by introducing a small chance of an actual overflow: the checked version now takes clearly longer, even though overflows are very rare (a couple of dozen per 10M loops, which is plausible in a real-life scenario).

The final result looks like this:

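|          Method |     Mean |   Error |  StdDev | Ratio | RatioSD |      Gen 0 | Allocated |
|---------------- |---------:|--------:|--------:|------:|--------:|-----------:|----------:|
| ComputeBaseline | 218.0 ms | 4.34 ms | 7.82 ms |  1.00 |    0.00 | 85000.0000 |    381 MB |
| ComputeChecked_ | 229.5 ms | 2.51 ms | 2.34 ms |  1.02 |    0.04 | 85000.0000 |    381 MB |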
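The Allocated column can be cross-checked with a little arithmetic. The sketch below assumes that each temporary 3-element int[] created in Compute occupies 40 bytes on a 64-bit .NET runtime (array overhead plus 12 bytes of data, rounded up for alignment). Under that assumption, 100,000 loops give roughly the 4 MB of the first run and 10M loops give roughly the 381 MB of the second one.

```csharp
using System;
using System.Globalization;

public static class AllocationEstimate
{
    // Assumption: one int[3] costs 40 bytes on a 64-bit runtime.
    public const long BytesPerArray = 40;

    // Estimated allocation per benchmark operation, in MB (1 MB = 1024 * 1024 bytes).
    public static double AllocatedMegabytes(long loops)
        => loops * BytesPerArray / (1024.0 * 1024.0);

    public static void Main()
    {
        CultureInfo inv = CultureInfo.InvariantCulture;
        Console.WriteLine(AllocatedMegabytes(100_000).ToString("F2", inv));    // 3.81
        Console.WriteLine(AllocatedMegabytes(10_000_000).ToString("F2", inv)); // 381.47
    }
}
```

Both benchmarks pay for these allocations equally, so part of the measured time is likely allocation and garbage collection rather than arithmetic. Passing the existing row to GetSum instead of creating a new array would presumably remove that shared cost and make any difference between add and add.ovf easier to see.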

For those interested in what the benchmark is actually outputting, I will add the relevant parts here:

// Validating benchmarks:
// ***** BenchmarkRunner: Start   *****
// ***** Found 2 benchmark(s) in total *****
// ***** Building 1 exe(s) in Parallel: Start   *****
// start dotnet restore  /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1 /p:Deterministic=true /p:Optimize=true in C:\Users\Alex\source\repos\CheckedOperationsBenchmark\bin\Release\net5.0\8844cf42-ce2d-4210-8746-304950dee939
// command took 2.27s and exited with 0
// start dotnet build -c Release  --no-restore /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1 /p:Deterministic=true /p:Optimize=true in C:\Users\Alex\source\repos\CheckedOperationsBenchmark\bin\Release\net5.0\8844cf42-ce2d-4210-8746-304950dee939
// command took 3.78s and exited with 0
// ***** Done, took 00:00:06 (6.21 sec)   *****
// Found 2 benchmarks:
//   Program.ComputeBaseline: DefaultJob
//   Program.ComputeChecked_: DefaultJob

// **************************
// Benchmark: Program.ComputeBaseline: DefaultJob
// *** Execute ***
// Launch: 1 / 1
// Execute: dotnet "8844cf42-ce2d-4210-8746-304950dee939.dll" --benchmarkName "CheckedOperationsBenchmark.Program.ComputeBaseline" --job "Default" --benchmarkId 0 in C:\Users\Alex\source\repos\CheckedOperationsBenchmark\bin\Release\net5.0\8844cf42-ce2d-4210-8746-304950dee939\bin\Release\net5.0
// BeforeAnythingElse

// Benchmark Process Environment Information:
// Runtime=.NET 5.0.10 (5.0.1021.41214), X64 RyuJIT
// GC=Concurrent Workstation
// Job: DefaultJob

OverheadJitting  1: 1 op, 405100.00 ns, 405.1000 us/op
WorkloadJitting  1: 1 op, 271396400.00 ns, 271.3964 ms/op

WorkloadPilot    1: 2 op, 423324000.00 ns, 211.6620 ms/op
WorkloadPilot    2: 3 op, 646507700.00 ns, 215.5026 ms/op

WorkloadWarmup   1: 3 op, 650490500.00 ns, 216.8302 ms/op
...
WorkloadWarmup   6: 3 op, 671575300.00 ns, 223.8584 ms/op

// BeforeActualRun
WorkloadActual   1: 3 op, 641281200.00 ns, 213.7604 ms/op
...
WorkloadActual  49: 3 op, 634473700.00 ns, 211.4912 ms/op

// AfterActualRun
WorkloadResult   1: 3 op, 641281200.00 ns, 213.7604 ms/op
...
WorkloadResult  41: 3 op, 634473700.00 ns, 211.4912 ms/op
GC:  255 0 0 1200000000 3
Threading:  3 0 3

// AfterAll
// Benchmark Process 18616 has exited with code 0.

Mean = 218.037 ms, StdErr = 1.221 ms (0.56%), N = 41, StdDev = 7.817 ms
Min = 208.001 ms, Q1 = 212.466 ms, Median = 216.227 ms, Q3 = 221.466 ms, Max = 239.382 ms
IQR = 8.999 ms, LowerFence = 198.967 ms, UpperFence = 234.965 ms
ConfidenceInterval = [213.702 ms; 222.372 ms] (CI 99.9%), Margin = 4.335 ms (1.99% of Mean)
Skewness = 1.24, Kurtosis = 3.89, MValue = 2

// **************************
// Benchmark: Program.ComputeChecked_: DefaultJob
// *** Execute ***
// Launch: 1 / 1
// Execute: dotnet "8844cf42-ce2d-4210-8746-304950dee939.dll" --benchmarkName "CheckedOperationsBenchmark.Program.ComputeChecked_" --job "Default" --benchmarkId 1 in C:\Users\Alex\source\repos\CheckedOperationsBenchmark\bin\Release\net5.0\8844cf42-ce2d-4210-8746-304950dee939\bin\Release\net5.0
// BeforeAnythingElse

// Benchmark Process Environment Information:
// Runtime=.NET 5.0.10 (5.0.1021.41214), X64 RyuJIT
// GC=Concurrent Workstation
// Job: DefaultJob

OverheadJitting  1: 1 op, 377800.00 ns, 377.8000 us/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadJitting  1: 1 op, 221698200.00 ns, 221.6982 ms/op

Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadPilot    1: 2 op, 468579600.00 ns, 234.2898 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadPilot    2: 3 op, 690754700.00 ns, 230.2516 ms/op

Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadWarmup   1: 3 op, 681204300.00 ns, 227.0681 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadWarmup   2: 3 op, 687082100.00 ns, 229.0274 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadWarmup   3: 3 op, 694201200.00 ns, 231.4004 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadWarmup   4: 3 op, 688857400.00 ns, 229.6191 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadWarmup   5: 3 op, 688555700.00 ns, 229.5186 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadWarmup   6: 3 op, 691014900.00 ns, 230.3383 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadWarmup   7: 3 op, 685841900.00 ns, 228.6140 ms/op

// BeforeActualRun
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadActual   1: 3 op, 690004100.00 ns, 230.0014 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadActual   2: 3 op, 687871600.00 ns, 229.2905 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadActual   3: 3 op, 697458600.00 ns, 232.4862 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadActual   4: 3 op, 697258400.00 ns, 232.4195 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadActual   5: 3 op, 695430000.00 ns, 231.8100 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadActual   6: 3 op, 686219800.00 ns, 228.7399 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadActual   7: 3 op, 689060900.00 ns, 229.6870 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadActual   8: 3 op, 690860200.00 ns, 230.2867 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadActual   9: 3 op, 699231000.00 ns, 233.0770 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadActual  10: 3 op, 686350000.00 ns, 228.7833 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadActual  11: 3 op, 682036900.00 ns, 227.3456 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadActual  12: 3 op, 675138400.00 ns, 225.0461 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadActual  13: 3 op, 688420500.00 ns, 229.4735 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadActual  14: 3 op, 679475100.00 ns, 226.4917 ms/op
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadActual  15: 3 op, 681246000.00 ns, 227.0820 ms/op

// AfterActualRun
Arithmetic operation resulted in an overflow.
...
Arithmetic operation resulted in an overflow.
WorkloadResult   1: 3 op, 690004100.00 ns, 230.0014 ms/op
...
WorkloadResult  15: 3 op, 681246000.00 ns, 227.0820 ms/op
GC:  255 0 0 1200007200 3
Threading:  3 0 3

// AfterAll
// Benchmark Process 2072 has exited with code 0.

Mean = 229.468 ms, StdErr = 0.605 ms (0.26%), N = 15, StdDev = 2.345 ms
Min = 225.046 ms, Q1 = 228.043 ms, Median = 229.474 ms, Q3 = 231.048 ms, Max = 233.077 ms
IQR = 3.006 ms, LowerFence = 223.534 ms, UpperFence = 235.557 ms
ConfidenceInterval = [226.961 ms; 231.975 ms] (CI 99.9%), Margin = 2.507 ms (1.09% of Mean)
Skewness = -0.12, Kurtosis = 1.93, MValue = 2

// ***** BenchmarkRunner: Finish  *****

// * Export *
  BenchmarkDotNet.Artifacts\results\CheckedOperationsBenchmark.Program-report.csv
  BenchmarkDotNet.Artifacts\results\CheckedOperationsBenchmark.Program-report-github.md
  BenchmarkDotNet.Artifacts\results\CheckedOperationsBenchmark.Program-report.html

// * Detailed results *
Program.ComputeBaseline: DefaultJob
Runtime = .NET 5.0.10 (5.0.1021.41214), X64 RyuJIT; GC = Concurrent Workstation
Mean = 218.037 ms, StdErr = 1.221 ms (0.56%), N = 41, StdDev = 7.817 ms
Min = 208.001 ms, Q1 = 212.466 ms, Median = 216.227 ms, Q3 = 221.466 ms, Max = 239.382 ms
IQR = 8.999 ms, LowerFence = 198.967 ms, UpperFence = 234.965 ms
ConfidenceInterval = [213.702 ms; 222.372 ms] (CI 99.9%), Margin = 4.335 ms (1.99% of Mean)
Skewness = 1.24, Kurtosis = 3.89, MValue = 2
-------------------- Histogram --------------------
[205.025 ms ; 210.959 ms) | @@@@
[210.959 ms ; 216.910 ms) | @@@@@@@@@@@@@@@@@@@@@
[216.910 ms ; 222.730 ms) | @@@@@@@
[222.730 ms ; 229.247 ms) | @@@@@@
[229.247 ms ; 235.592 ms) |
[235.592 ms ; 242.357 ms) | @@@
---------------------------------------------------

Program.ComputeChecked_: DefaultJob
Runtime = .NET 5.0.10 (5.0.1021.41214), X64 RyuJIT; GC = Concurrent Workstation
Mean = 229.468 ms, StdErr = 0.605 ms (0.26%), N = 15, StdDev = 2.345 ms
Min = 225.046 ms, Q1 = 228.043 ms, Median = 229.474 ms, Q3 = 231.048 ms, Max = 233.077 ms
IQR = 3.006 ms, LowerFence = 223.534 ms, UpperFence = 235.557 ms
ConfidenceInterval = [226.961 ms; 231.975 ms] (CI 99.9%), Margin = 2.507 ms (1.09% of Mean)
Skewness = -0.12, Kurtosis = 1.93, MValue = 2
-------------------- Histogram --------------------
[223.798 ms ; 234.325 ms) | @@@@@@@@@@@@@@@
---------------------------------------------------

// * Summary *

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.1466 (21H2)
Intel Core i7-8750H CPU 2.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET SDK=6.0.100
  [Host]     : .NET 5.0.10 (5.0.1021.41214), X64 RyuJIT
  DefaultJob : .NET 5.0.10 (5.0.1021.41214), X64 RyuJIT


|          Method |     Mean |   Error |  StdDev | Ratio | RatioSD |      Gen 0 | Allocated |
|---------------- |---------:|--------:|--------:|------:|--------:|-----------:|----------:|
| ComputeBaseline | 218.0 ms | 4.34 ms | 7.82 ms |  1.00 |    0.00 | 85000.0000 |    381 MB |
| ComputeChecked_ | 229.5 ms | 2.51 ms | 2.34 ms |  1.02 |    0.04 | 85000.0000 |    381 MB |

// * Hints *
Outliers
  Program.ComputeBaseline: Default -> 8 outliers were removed (259.93 ms..346.44 ms)

// * Legends *
  Mean      : Arithmetic mean of all measurements
  Error     : Half of 99.9% confidence interval
  StdDev    : Standard deviation of all measurements
  Ratio     : Mean of the ratio distribution ([Current]/[Baseline])
  RatioSD   : Standard deviation of the ratio distribution ([Current]/[Baseline])
  Gen 0     : GC Generation 0 collects per 1000 operations
  Allocated : Allocated memory per single operation (managed only, inclusive, 1KB = 1024B)
  1 ms      : 1 Millisecond (0.001 sec)

// * Diagnostic Output - MemoryDiagnoser *


// ***** BenchmarkRunner: End *****
// ** Remained 0 benchmark(s) to run **
Run time: 00:02:23 (143.72 sec), executed benchmarks: 2

Global total time: 00:02:29 (149.95 sec), executed benchmarks: 2
// * Artifacts cleanup *

2 comment threads

Try to establish a control... (4 comments)
Biased data (1 comment)
Try to establish a control...
elgonzo‭ wrote over 2 years ago · edited over 2 years ago

Just a stupid idea: What if the observed results are not due to the use of checked, but an artifact of the JIT compiler or due to external factors influencing your benchmark run (like, do you have other stuff running on your box that might possibly intermittently load the CPU and chew cycles)?

To get meaningful results for benchmark runs where only rather small runtime discrepancies are expected, I would first make sure to run them on a machine that is disturbed by other processes as little as possible.

Then, I would also establish a control to increase confidence in (or disprove) checked actually being responsible for the observed difference in runtime, regardless of whether one or the other benchmark run is slower or faster.

elgonzo‭ wrote over 2 years ago · edited over 2 years ago

To do so, the benchmarks should be altered so that the code for the checked and unchecked cases is isolated into separate methods. Keeping as much of your current code base as possible, I would do this (not sure whether you need to add [MethodImpl(MethodImplOptions.NoInlining)] to prevent inlining, but it doesn't hurt, I guess):

[MethodImpl(MethodImplOptions.NoInlining)]
private static int GetSumChecked(int[] numbers)
    => checked (numbers[0] + numbers[1] + numbers[2]);

[MethodImpl(MethodImplOptions.NoInlining)]
private static int GetSumUnchecked(int[] numbers)
    => numbers[0] + numbers[1] + numbers[2];

private static int GetSum(bool isChecked, int[] numbers)
    => (isChecked) ? GetSumChecked(numbers) : GetSumUnchecked(numbers) ;

Run it and note the test results.

Also make a (readable) copy of the IL produced by the C# compiler.

elgonzo‭ wrote over 2 years ago · edited over 2 years ago

Then remove the checked keyword from the GetSumChecked method:

[MethodImpl(MethodImplOptions.NoInlining)]
private static int GetSumChecked(int[] numbers)
    => numbers[0] + numbers[1] + numbers[2];

and run the benchmark again. (Now it should be apparent why I suggest separate GetSumChecked and GetSumUnchecked methods. Simply removing the checked from GetSum might allow the JIT compiler to optimize the GetSum method in a way that could spoil the benchmark result.)

Does the result differ from the previous run using checked?

Also compare the IL produced by the C# compiler for this code variant with the IL of the code variant I mentioned in my previous comment. Verify that the IL generated by the C# compiler is identical between the two code variants except for add/add.ovf.

Not sure whether doing this will yield any new and meaningful insights, but I guess it's worth a try...?
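One rough way to check the add/add.ovf difference without a disassembler is to scan the method bodies via reflection. This is only a sketch: the IlCheck class and the ContainsOpCode helper are made up for illustration, and the scan naively looks for a single opcode byte without decoding operands, which is acceptable for tiny methods like these but not in general. A tool such as ildasm or ILSpy gives the full, readable IL.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;
using System.Runtime.CompilerServices;

public static class IlCheck
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int GetSumChecked(int[] numbers)
        => checked(numbers[0] + numbers[1] + numbers[2]);

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int GetSumUnchecked(int[] numbers)
        => unchecked(numbers[0] + numbers[1] + numbers[2]);

    // Naive scan: true if the IL bytes of the method contain the given one-byte opcode.
    public static bool ContainsOpCode(string methodName, OpCode opCode)
    {
        MethodInfo method = typeof(IlCheck).GetMethod(methodName);
        if (method == null) throw new ArgumentException("Unknown method: " + methodName);

        MethodBody body = method.GetMethodBody();
        if (body == null) throw new InvalidOperationException("Method has no IL body.");

        byte[] il = body.GetILAsByteArray();
        return Array.IndexOf(il, (byte)opCode.Value) >= 0;
    }

    public static void Main()
    {
        Console.WriteLine(ContainsOpCode(nameof(GetSumChecked), OpCodes.Add_Ovf));   // True
        Console.WriteLine(ContainsOpCode(nameof(GetSumUnchecked), OpCodes.Add_Ovf)); // False
        Console.WriteLine(ContainsOpCode(nameof(GetSumUnchecked), OpCodes.Add));     // True
    }
}
```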

Alexei‭ wrote over 2 years ago

elgonzo Ref. to "Just a stupid idea: What if the observed results are not due to the use of checked, but an artifact of the JIT compiler or due to external factors influencing your benchmark run (like, do you have other stuff running on your box that might possibly intermittently load the CPU and chew cycles)?" - I think most of these are tackled by the benchmarking library, which I have noticed performs some sort of warm-up and other "fake" activities between the actual tests. Also, the outliers are excluded from the results.

I will check the other suggestions later and let you know about the results. Thanks.