Q&A Why static code analyzers such as SonarQube indicate a high code complexity for switch statements?

Answer posted 2022-02-07 by Derek Elkins‭

It depends on why you have a large `switch` statement. I'm going to assume it's because you have a large `enum`. If not, then you probably have some complicated logic, and you should endeavor to break it into a more hierarchical form. For example, if you have 12 cases, you might be able to do a check to split them into 3 classes of 4 cases. (The cases don't need to split evenly.) Most likely, you'd also want a new method for each of the classes.
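
For instance, a minimal sketch of that kind of split, with hypothetical opcode names:

```csharp
// Hypothetical 12-value enum, split into 3 classes of 4 cases each.
enum Opcode { Add, Sub, Mul, Div, Eq, Ne, Lt, Gt, Jmp, Call, Ret, Halt }

void Execute(Opcode op) {
    // One coarse check classifies the opcode; each class then gets its
    // own method containing a much smaller switch.
    if (op <= Opcode.Div)     ExecuteArithmetic(op); // Add..Div
    else if (op <= Opcode.Gt) ExecuteComparison(op); // Eq..Gt
    else                      ExecuteControl(op);    // Jmp..Halt
}

void ExecuteArithmetic(Opcode op) { /* 4-case switch */ }
void ExecuteComparison(Opcode op) { /* 4-case switch */ }
void ExecuteControl(Opcode op)    { /* 4-case switch */ }
```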

From what I can tell, there is no complicated logic behind what SonarQube is doing. It simply has a rule that complains about `switch` statements with many cases. Traditional code complexity metrics like cyclomatic complexity may be the motivation for this rule, but the rule itself is not sophisticated.

If you have a large `enum`, then the natural way to consume it is with a large `switch`. The question becomes: why do you have a large `enum` and what are your alternatives?

First, large `enum`s do arise fairly naturally in many situations. Some fairly generic CS examples are: opcodes of a (virtual) machine, states in a state machine, node types of an abstract syntax tree, packet types in network protocols, field types in various file formats/protocols, and error/status codes. Nevertheless, it's worthwhile to think about what the `enum` represents and whether it's natural or should be split into different things or represented some other way.

In fact, many of the examples I've listed are not best represented by a (traditional) `enum` type as each case usually has some naturally associated data that varies with the case. For example, a node of an abstract syntax tree will have a different number of children depending on the node type and often additional details. Similarly, different types of errors will often have different data that could be associated with them.

The natural representation of these is what is known as a **sum type**, also known as a tagged or disjoint union, and is a special case of algebraic data types. The natural way to consume a sum type is pattern matching. Sum types and pattern matching are common features in (typed) functional languages but have only recently been included in mainstream OO languages, often in limited or unsatisfying ways for those familiar with the concept from FP.
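
To make that concrete, here is the AST-node example sketched in recent C# (hypothetical names), where records provide the per-case data and a `switch` expression provides the pattern matching:

```csharp
// Each node type carries different data -- something a plain enum can't express.
abstract record Expr;
record Literal(double Value) : Expr;
record Negate(Expr Operand) : Expr;
record Add(Expr Left, Expr Right) : Expr;

double Eval(Expr e) => e switch {
    Literal(var v)    => v,
    Negate(var x)     => -Eval(x),
    Add(var l, var r) => Eval(l) + Eval(r),
    _ => throw new ArgumentException("unknown node", nameof(e))
};
```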

Sum types are a fundamental way of structuring data, so, of course, OO has some story for them, though it is rarely framed in terms of sum types.

### The OO Solution

Returning to the simple `switch` and `enum` for a moment, an OO purist would say that we should never do case analysis of any sort, and all such case analysis should be replaced with dynamic dispatch. Using the example from the question would produce:

```csharp
interface FooStatus {
   double GetWeight();
}

class ToBeStarted : FooStatus {
   public double GetWeight() {
      return 1.0;
   }
}

// etc.
```
This completely resolves the issue, allows us to add additional data on a case-by-case basis, i.e. it allows supporting sum types, and it adds extensibility. We lose code locality – the implementation of `GetWeight` is now spread amongst several classes – and we can no longer leverage fall-through or default cases. We also lose access to the local scope.

While extensibility is the usual reason given for this approach and is often compelling, there are plenty of cases where it's not compelling, and we lose extensibility along a different axis. For example, taking the OO purist completely seriously would mean *every* `if` statement in our program should instead be represented by a method on a `Bool` interface. Few languages even allow you to do this, nor would any sane developer want to as a matter of course.

While the OO representation allows you to add cases easily, it makes it difficult to add new case analyses. Adding a new case analysis is trivial with the `switch` approach: it's just another `switch` statement.

### The FP Solution (encoded into OO)

In type theory, a type isn't defined by how it's represented. Instead, it is defined by how it is constructed *and how it is consumed*. The previous representation doesn't fully define a sum type as it doesn't support being consumed as a sum type. The solution to this is to add a method that encodes case analysis. In the OO world this more or less corresponds to the Visitor Pattern. For small sum types, I prefer to just pass the cases in as higher-order functions. This leads to code like:

```csharp
optionalNumber.Match(
    () => -1,
    x  => x + 1);
```
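
The `Match` here isn't built in; a minimal sketch of an `Option`-like type providing it (hypothetical, for illustration) might look like:

```csharp
// The two cases of the sum type are consumed by passing in one
// higher-order function per case.
abstract class Option<T> {
    public abstract C Match<C>(Func<C> none, Func<T, C> some);
}

class None<T> : Option<T> {
    public override C Match<C>(Func<C> none, Func<T, C> some) => none();
}

class Some<T> : Option<T> {
    private readonly T value;
    public Some(T value) { this.value = value; }
    public override C Match<C>(Func<C> none, Func<T, C> some) => some(value);
}
```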

This works at a theoretical level because a function consuming a (binary) sum is the same as a pair of functions. Symbolically, `A + B -> C` is isomorphic to `(A -> C, B -> C)`.^[This isomorphism and several like it are very easy to remember if we write `A -> C` as C<sup>A</sup>. The isomorphism just discussed is then the rule for multiplying exponentials: C<sup>A+B</sup> = C<sup>A</sup>C<sup>B</sup> which you might remember from grade school. All the so-called "high school algebra" rules have type theoretic analogues. Indeed, these rules generalize far beyond that.]
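
The isomorphism can be written out directly in code (a sketch, using a hypothetical `Either` type standing in for `A + B`):

```csharp
// A + B, as a minimal tagged union.
abstract record Either<A, B>;
record Left<A, B>(A Value) : Either<A, B>;
record Right<A, B>(B Value) : Either<A, B>;

static class Iso {
    // A + B -> C   gives   (A -> C, B -> C).
    public static (Func<A, C>, Func<B, C>) Split<A, B, C>(Func<Either<A, B>, C> f) =>
        (a => f(new Left<A, B>(a)), b => f(new Right<A, B>(b)));

    // (A -> C, B -> C)   gives   A + B -> C.
    public static Func<Either<A, B>, C> Join<A, B, C>(Func<A, C> f, Func<B, C> g) =>
        e => e switch {
            Left<A, B>(var a)  => f(a),
            Right<A, B>(var b) => g(b),
            _ => throw new InvalidOperationException("unreachable")
        };
}
```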

For large sum types, I'd hew more toward the traditional Visitor Pattern and bundle up the parameters of the `Match` method into a class. This would produce:

```csharp
interface FooVisitor<C> {
    C applyToBeStarted();
    // etc.
}

interface FooStatus {
    C Match<C>(FooVisitor<C> visitor);
}

class ToBeStarted : FooStatus {
    public C Match<C>(FooVisitor<C> visitor) {
        return visitor.applyToBeStarted();
    }
}
// etc.

// C# doesn't allow a class declaration inside a method body, so the
// visitor lives alongside GetWeight rather than inside it.
class GetWeightVisitor : FooVisitor<double> {
    public double applyToBeStarted() {
        return 1.0;
    }
    // etc.
}

double GetWeight(FooStatus status) {
    return status.Match(new GetWeightVisitor());
}
```

Again, this readily scales to more general sum types. Java's anonymous inner classes work rather nicely here, but the result is a bit verbose and clunky in C#, hence my preference for tuples of higher-order functions. The pattern also combines well with C# extension methods.

The `FooVisitor` class encodes the options, which eliminates the extensibility of the previous solution: adding a new status means updating every visitor. The gain is that analyzing `FooStatus` in a different way is just a matter of writing a new `FooVisitor`. We can still add methods directly to the `FooStatus` interface, combining this approach with the previous one.

This pattern works best when consumers of a class might regularly add ad-hoc analyses that you don't want to pollute the class with. It is exactly what "pure OO" languages like Smalltalk do with things like `Bool`.

### The Table-Driven Solution

One way of implementing a `switch` statement, especially a large one, is what has historically been called a jump table. The compiler makes an array of code addresses, each pointing at a branch of the `switch`; the `switch` then indexes into that array and performs an indirect jump. We can easily do the same thing manually given higher-order functions (or objects), and Java's `EnumMap` fits this quite nicely. This is the approach used in the other answer by Alexei, so I'll only sketch its shape here.
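
A minimal sketch of that shape, with hypothetical names (Alexei's answer has the full version):

```csharp
enum FooStatus { ToBeStarted, Started /* etc. */ }

// The dictionary plays the role of the jump table: one entry per case.
static readonly Dictionary<FooStatus, Func<double>> Weights = new() {
    [FooStatus.ToBeStarted] = () => 1.0,
    [FooStatus.Started]     = () => 2.0,
    // etc. -- note that nothing warns you if an entry is missing.
};

double GetWeight(FooStatus status) => Weights[status]();
```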

The benefit of this approach is that it can add a lot of dynamism. Manipulating these tables at run-time is quite easy and natural. It's also quite extensible.

The downsides are that it is *too* dynamic most of the time. There is no exhaustiveness checking, and you'll get no warning to update these tables when you add a new case. This also inhibits other code analysis tools and compiler optimizations. This approach also doesn't easily scale to more general sum types, i.e. there isn't a clean way to pass in differently typed data based on the case.

### The `switch` Solution

If after careful consideration of the alternatives and their costs and benefits, you still think the large `enum` is the best choice, then the best solution is likely to just tell SonarQube to shut up. Most static analysis tools have a way to locally disable a rule, e.g. just for a single method. It might be possible to disable the rule for the `enum` as you'll run into the issue basically any time you interact with the `enum` non-trivially.
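
For example, with Sonar's C# analyzer you can scope the suppression to a single method with a `#pragma` (a sketch; `S1479` is my assumption for the relevant rule ID, so substitute whatever ID your scanner actually reports):

```csharp
#pragma warning disable S1479 // assumed rule ID: "too many 'case' clauses"
double GetWeight(FooStatus status) {
    switch (status) {
        case FooStatus.ToBeStarted: return 1.0;
        // ... many more cases ...
        default: throw new ArgumentOutOfRangeException(nameof(status));
    }
}
#pragma warning restore S1479
```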

There are certainly benefits to the `enum`/`switch` solution – code locality, access to local scope, and exhaustiveness checking – and these aren't worth discarding just to make a tool happy. With the pattern matching features added in recent versions of C#, you can handle arbitrary sum types, albeit at the cost of exhaustiveness checking due to the way it works.
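
For instance, a sketch of that trade-off with hypothetical status types: the `switch` expression handles per-case data directly, but the compiler can't know the hierarchy is closed, so a catch-all arm stands in for exhaustiveness checking.

```csharp
abstract record FooStatus;
record ToBeStarted : FooStatus;
record Started(double Progress) : FooStatus;

double GetWeight(FooStatus status) => status switch {
    ToBeStarted => 1.0,
    Started s   => s.Progress,  // case-specific data comes along for free
    // Adding a new FooStatus subclass compiles fine without a new arm here;
    // you only find out at run time via the fallback.
    _ => throw new ArgumentException("unhandled status", nameof(status))
};
```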