How does the switch statement execute?

  • A+
Category:Languages

I am stepping through two different blocks of code to learn the difference in execution between the switch statement and if.

While if seems to check every single condition, switch seems to skip directly to the condition that evaluates to true. How does the compiler know which statement will evaluate to true before pre-check? Please see below code:

public void IFvSwitch()     {         string x = "f";         switch (x)  // Compiler skips directly to case "f":         {             case "a":                 MessageBox.Show(x);                 break;             case "b":                 MessageBox.Show(x);                 break;             case "f":                 MessageBox.Show(x);                 break;         }          if (x == "a") // Compiler checks all conditions          {             MessageBox.Show(x);         }         else if (x == "b")         {             MessageBox.Show(x);         }         else if (x == "f")         {             MessageBox.Show(x);         }     } 


While if seems to check every single condition, switch seems to skip directly to the condition that evaluates to true. How does the compiler know which statement will evaluate to true before pre-check?

Let's start by correcting your use of jargon. The C# compiler translates C# into IL. The runtime has a jit compiler that translates IL into machine code. The C# compiler also generates information that informs the debugger where it should be causing breaks when you are single-stepping.

So it's not exactly clear what your question is. Is the question "how does the C# compiler generate IL to avoid linear search?" or is it "how does the C# compiler inform the debugger that it should skip showing the search portion of the switch codegen?" Or is it "how does the jit compiler generate machine code for a switch?"

Tell you what, I'll handwave a little here and give you vague answers that describe the strategy but not the details. If you have questions about the details then post a more specific question. In particular, I'll use "the compiler" to mean either the C# compiler or the jit compiler, without saying which. Sometimes work is done by the C# compiler, sometimes it defers that work to the jitter, but either way, the work gets done.


Switch codegen is complicated. Let's start with the easy one. The compiler can choose to compile a switch as though it was a bunch of if-else statements, and in practice it often does. It particularly does so if the switch is very small, as in your example.

How does the debugger know to not show you the comparison steps that are really happening? The compiler also generates debugging information that informs the debugger what instructions should be skipped over and which ones should generate automatic breakpoints when single-stepping. So the C# compiler, the jit compiler and the debugger all work together to ensure that the developer has the right experience when stepping on a switch.


What happens if the switch is big? Doing a huge if-else to find the right switch section is potentially slow. Let's again do an easy case and look at a big switch full of strings. In that case the compiler will do the following:

  • Figure out the address of each switch section
  • Generate code which builds a static dictionary from string to int
  • The switch becomes "look up the string in the dictionary; if it is not there, go to the default case or the end of the switch if there is no default. If it is there, go to the address that is in the dictionary"

Now the work of quickly finding the string without checking every one of them is left up to the dictionary, which uses a hash table to cut down on the number of string comparisons by a huge factor.


What if we have a large switch but the cases are integers?

switch(whatever) {     case 1000: … break;   case 1001: … break;   … many more …   case 1999: … break;   default: … break; } 

In this situation the compiler could make a static array of a thousand switch case addresses, and then generate code like this:

  • If the value is less than 1000 or more than 1999, go to the default case
  • Otherwise, look up the address by dereferencing the array, and go there.

So now we are down to two comparisons and an array dereference; this really does jump directly to the right section.


What about this?

switch(whatever) {     case 1000: … break;   case 1001: … break;   … many more …   case 1999: … break;   case 2001: … break;   default: … break; } 

There's no case 2000. In that case the compiler can generate a jump table as before, with 2002 entries, and in the 2000 slot, it just puts the address of the default section.

What about this?

switch(whatever) {     case 1000: … break;   case 1001: … break;   … many more …   case 1999: … break;   case 1000001: … break;   default: … break; } 

Now we are in an interesting case! The compiler will not generate a jump table with 1000002 entries, most of which are "go to the default case". In this case the compiler could generate a hybrid switch:

  • If the value is less than 1000 go to the default case
  • If the value is 1000001, go to its case
  • If the value is greater than 1999 go to the default case
  • Otherwise, check the jump table

And so on. I'm sure you can see how this can get very complicated when the switches are large and there are many dense ranges of values.

There has been a lot of work in the optimization community on what precisely is the best balance for creating hybrid switch strategies.

There is also lots of opportunity for things to go wrong. I'll end with a war story from the 1990s. The MSVC compiler back then -- I think it was version 4 -- had a bug where in my particular program, the hybrid jump table was generated incorrectly. Basically, it broke up a bunch of contiguous ranges into individual jump tables when it didn't need to.

Normally that would not be a problem, but this particular switch was the innermost loop of the logic in the Microsoft VBScript and JScript engines that dispatched the next instruction in the script. (VBScript and JScript at the time compiled to a proprietary bytecode language.) The bad switch was killing performance of the script engines.

The MSVC team could not prioritize fixing the bug fast enough for us, so we ended up writing heroic macro code that would let us express the switch statement in a way that looked reasonable to read, but behind the scenes the macros would generate the appropriate jump table!

Fortunately you can rely on the C# compiler and the jitter to generate good code for you. You shouldn't have to worry about what code is generated for a switch.

Comment

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen: