Communities

Writing
Writing
Codidact Meta
Codidact Meta
The Great Outdoors
The Great Outdoors
Photography & Video
Photography & Video
Scientific Speculation
Scientific Speculation
Cooking
Cooking
Electrical Engineering
Electrical Engineering
Judaism
Judaism
Languages & Linguistics
Languages & Linguistics
Software Development
Software Development
Mathematics
Mathematics
Christianity
Christianity
Code Golf
Code Golf
Music
Music
Physics
Physics
Linux Systems
Linux Systems
Power Users
Power Users
Tabletop RPGs
Tabletop RPGs
Community Proposals
Community Proposals
tag:snake search within a tag
answers:0 unanswered questions
user:xxxx search by author id
score:0.5 posts with 0.5+ score
"snake oil" exact phrase
votes:4 posts with 4+ votes
created:<1w created < 1 week ago
post_type:xxxx type of post
Search help
Notifications
Mark all as read See all your notifications »
Q&A

Welcome to Software Development on Codidact!

Will you help us build our independent community of developers helping developers? We're small and trying to grow. We welcome questions about all aspects of software development, from design to code to QA and more. Got questions? Got answers? Got code you'd like someone to review? Please join us.

Why static code analyzers such as SonarQube indicate a high code complexity for switch statements?

+3
−0

During a presentation of a pipeline configuration, a colleague showed a SonarQube integration and one of its reports. A warning was caused by overrunning the max value for the code complexity threshold in a function containing a fairly large switch statement.

Why are switch statements considered too complex and what to do about them?

The problematic code was similar to the one below, but containing about double of the case statements:

enum FooStasus
{
    ToBeStarted = 1,
    Starting = 2,
    OnGoing = 3,
    OnHold = 4,
    Stopped = 5
}

private static double GetWeight(FooStasus status)
{
    switch (status)
    {
        case FooStasus.ToBeStarted:
        case FooStasus.Starting: return 1.0;
        case FooStasus.OnGoing: return 2.0;
        case FooStasus.OnHold: return 0.2;
        case FooStasus.Stopped: return 0;
        default: return 0.5;
    }
}

History
Why does this post require attention from curators or moderators?
You might want to add some details to your flag.
Why should this post be closed?

1 comment thread

Cyclomatic complexity = the number of execution paths (2 comments)

2 answers

+5
−0

It depends on why you have a large switch statement. I'm going to assume it's because you have a large enum. If not, then you probably have some complicated logic, and you should endeavor to break it into a more hierarchical form. For example, if you 12 cases, you might be able to do a check to split it into 3 classes of 4 cases. (The cases don't need to split evenly.) Most likely, you'd also want to have a new method for each of the classes.

From what I can tell, there is no complicated logic that SonarQube is using. It simply has a rule that complains about switch statements with many cases. Traditional code complexity metrics like cyclomatic complexity may be motivations for this rule, but the rule itself is not sophisticated.

If you have a large enum, then the natural way to consume it is with a large switch. The question becomes: why do you have a large enum and what are your alternatives?

First, large enums do arise fairly naturally in many situations. Some fairly generic CS examples are: opcodes of a (virtual) machine, states in a state machine, node types of an abstract syntax tree, packet types in network protocols, field types in various file formats/protocols, and error/status codes. Nevertheless, it's worthwhile to think about what the enum represents and whether it's natural or should be split into different things or represented some other way.

In fact, many of the examples I've listed are not best represented by a (traditional) enum type as each case usually has some naturally associated data that varies with the case. For example, a node of an abstract syntax tree will have a different number of children depending on the node type and often additional details. Similarly, different types of errors will often have different data that could be associated with them.

The natural representation of these is what is known as a sum type aka tagged/disjoint union and is also a special case of algebraic data types. The natural way to consume a sum type is pattern matching. Sum types and pattern matching are common features in (typed) functional languages but are only recently being included in mainstream OO languages; often in limited or unsatisfying ways for those familiar with the concept from FP.

Sum types are a fundamental way of structuring data, so, of course, OO has some story for it though it is rarely framed in terms of sum types.

The OO Solution

Returning to the simple switch and enum for a moment, an OO purist would say that we should never do case analysis of any sort, and all such case analysis should be replaced with dynamic dispatch. Using the example from the question would produce:

interface FooStatus {
   double GetWeight();
}

class ToBeStarted : FooStatus {
   public double GetWeight() {
      return 1.0;
   }
}

// etc.

This completely resolves this issue, allows us to add additional data on a case-by-case basis, i.e. it allows supporting sum types, and it adds extensibility. We lose code locality – the implementation of GetWeight is now spread amongst several classes – and we can no longer leverage fall-through or default cases. We also lose access to the local scope.

While extensibility is the usual reason given for this approach and is often compelling, there are plenty of cases where it's not compelling and we lose extensibility along a different axis. For example, taking the OO purist completely seriously would mean every if statement in our program should instead be represented by a method on a Bool interface. Few languages even allow you to do this nor would any sane developer want to do this as a matter of course.

While the OO representation allows you to add cases easily, it makes it difficult to add new case analyses. Adding new cases analyses was trivial with the switch approach: it's just another switch statement.

The FP Solution (encoded into OO)

In type theory, a type isn't defined by how it's represented. Instead, it is defined by how it is constructed and how it is consumed. The previous representation doesn't fully define a sum type as it doesn't support being consumed as a sum type. The solution to this is to add a method that encodes case analysis. In the OO world this more or less corresponds to the Visitor Pattern. For small sum types, I prefer to just pass the cases in as higher-order functions. This leads to code like:

optionalNumber.Match(
    () => -1,
    x  => x + 1);

This works at a theoretical level because a function consuming a (binary) sum is the same as a pair of functions. Symbolically, A + B -> C is isomorphic to (A -> C, B -> C).[1]

For large sum type, I'd hew more toward the traditional Visitor Pattern and bundle up the parameters of the Match method into a class. This would produce:

interface FooVisitor<C> {
   C applyToBeStarted();
   // etc.
}

interface FooStatus {
    C Match<C>(FooVisitor<C> visitor);
}


class ToBeStarted : FooStatus {
    C Match<C>(FooVisitor<C> visitor) {
        return visitor.applyToBeStarted();
    }
}
// etc.

double GetWeight(FooStatus status) {
    class GetWeightVisitor : FooVisitor<double> {
        public double applyToBeStarted() {
            return 1.0;
        }
        // etc.
    }
    return status.Match(new GetWeightVisitor());
}

Again, this readily scales to more general sum types. Java's anonymous inner classes work rather nicely here, but the result is a bit verbose and clunky in C# hence my preference for tuples of higher-order functions which work even more nicely. This pattern also works rather nicely with C# extension methods.

The FooVisitor class encodes the options which eliminates the extensibility of the previous solution: adding a new status means updating all the visitors. The gain is that analyzing FooStatus in a different way is just a matter of making a new FooVisitor. We can still directly add methods to the FooStatus interface combining this approach with the previous approach.

This pattern works best when consumers of a class might add ad-hoc analyses regularly which you don't want to pollute the class with. This pattern is exactly what "pure OO" languages like Smalltalk do with things like Bool.

The Table-Driven Solution

One way of implementing a switch statement, especially large switches, is what historically has been called a jump table. To do this, the compiler would make an array of code addresses each pointing at a branch of the switch, then the switch would index into that array and then perform an indirect jump. We can easily do the same thing manually given higher-order functions (or objects). Java's EnumMap fits this quite nicely. This is the approach used in the other answer by Alexei, so I won't repeat it here.

The benefit of this approach is that it can add a lot of dynamism. Manipulating these tables at run-time is quite easy and natural. It's also quite extensible.

The downsides are that it is too dynamic most of the time. There is no exhaustiveness checking, and you'll get no warning to update these tables when you add a new case. This also inhibits other code analysis tools and compiler optimizations. This approach also doesn't easily scale to more general sum types, i.e. there isn't a clean way to pass in differently typed data based on the case.

The switch Solution

If after careful consideration of the alternatives and their costs and benefits, you still think the large enum is the best choice, then the best solution is likely to just tell SonarQube to shut up. Most static analysis tools have a way to locally disable a rule, e.g. just for a single method. It might be possible to disable the rule for the enum as you'll run into the issue basically any time you interact with the enum non-trivially.

There are certainly benefits of the enum/switch solution such as code locality, access to local scope, and exhaustiveness checking, and these aren't worth discarding just to make a tool happy. With the pattern matching features added in recent versions of C#, you can handle arbitrary sum types albeit at the cost of exhaustiveness checking due to the way it works.


  1. This isomorphism and several like it are very easy to remember if we write A -> C as CA. The isomorphism just discussed is then the rule for multiplying exponentials: CA+B = CACB which you might remember from grade school. All the so-called "high school algebra" rules have type theoretic analogues. Indeed, these rules generalize far beyond that. ↩︎

History
Why does this post require attention from curators or moderators?
You might want to add some details to your flag.

0 comment threads

+2
−0

I would skip the theoretical part of actually computing the cyclomatic complexity of a switch statement and mention that it can see as a bunch of if statements. Since each if adds to the complexity, the higher the number of cases, the higher the complexity.

While a simple switch such as the one in the question is simple to read and maintain, if it grows way larger, it becomes easier to introduce bugs, even if C# prevents the flow control from passing from one case to another if the case has at least one instruction (i.e. each nonempty case must use break or return).

One way to mitigate this relies on the actual code complexity. If this is a simple mapping, a simple Dictionary can be used. Example:

private static Dictionary<FooStasus, double> _statusMap = new()
{
    { FooStasus.ToBeStarted, 1.0 },
    { FooStasus.Starting, 1.0 },
    { FooStasus.OnGoing, 2.0 },
    { FooStasus.OnHold, 0.1 },
    { FooStasus.Stopped, 0 },
};

private static double GetWeightSimple(FooStasus status) =>
    _statusMap.GetValueOrDefault(status, 0.5);

If the logic is more complex, the dictionary can hold Actions or Funcs that might rely on the case value to perform some logic. If they are more than in the following example, it would be wise to define them as separate functions:

private static Dictionary<FooStasus, Func<FooStasus, double>> GetWeightActMap = new()
{
    { FooStasus.ToBeStarted, _ => { Console.WriteLine("To be started"); return 1.0;} },
    { FooStasus.Starting, status => { Console.WriteLine($"Status is {status}"); return 1.0; } },
    { FooStasus.OnGoing, _ => 2.0 },
    { FooStasus.OnHold, _ => 0.1 },
    { FooStasus.Stopped, _ => default },
};

private static double GetWeightAct(FooStasus status) =>
    GetWeightActMap.GetValueOrDefault(status)?.Invoke(status) ?? default;

However, as suggested by elgonzo this should be avoided for anything that is a little more complex than my trivial example and use (local) functions instead, since they provide meaning to each case.

Also, whenever the mapping list is fairly long, a table might be more suitable for storage.

History
Why does this post require attention from curators or moderators?
You might want to add some details to your flag.

1 comment thread

Nitpicker Central... :-) (4 comments)

Sign up to answer this question »