* * * * *

          Some more useless µbenchmarks checking for integer overflow

Using the INTO instruction to check for overflow was dog slow [1], so what
about using JO (Jump on Overflow)? Will that be slow?

The results speak for themselves (as a reminder, each expression is compiled
and run 1,000,000 times):

Table: x = 1 - 0
overflow check  method  time         result
-------------------------------------------
true            INTO    0.009080000  1
true            JO      0.006808000  1
false           -       0.005938000  1

Table: x = 1 + 1 + 1 + 1 + 1 + 1 * 100 / 13
overflow check  method  time         result
-------------------------------------------
true            INTO    0.079844000  46
true            JO      0.030274000  46
false           -       0.030245000  46
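
A result of 46 implies the second expression is evaluated strictly left to
right, with no operator precedence: ((1 + 1 + 1 + 1 + 1 + 1) * 100) / 13 =
600 / 13 = 46. Here is a minimal C sketch of that evaluation with a check
after every step, in 16-bit signed arithmetic. It is not the benchmark code.
The name eval_checked and its first-operand parameter are made up for
illustration, and it assumes a compiler that provides the GCC/Clang builtins
__builtin_add_overflow and __builtin_mul_overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: evaluates first + 1 + 1 + 1 + 1 + 1 * 100 / 13
   strictly left to right in 16-bit signed arithmetic, testing for overflow
   after every step. Returns true and stores the result in *out on success,
   false as soon as any step overflows. */
bool eval_checked(int16_t first, int16_t *out)
{
    assert(out != NULL);

    int16_t acc = first;

    /* Five additions of 1, each followed by its own overflow test. */
    for (int i = 0; i < 5; i++) {
        if (__builtin_add_overflow(acc, 1, &acc))
            return false;
    }

    /* The multiplication can overflow 16 bits, so it is tested as well. */
    if (__builtin_mul_overflow(acc, 100, &acc))
        return false;

    /* Dividing by the constant 13 cannot overflow, so no test is needed. */
    *out = (int16_t)(acc / 13);
    return true;
}
```

Calling eval_checked(1, &x) corresponds to the expression in the table. A
larger first operand, such as 323, makes the multiplication step overflow
and the function return false.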

Even though the code using the JO instruction is longer than either of the
other two versions:

>       xor     eax,eax
>       mov     ax,0x1
>       add     ax,1
>       jo      error
>       add     ax,1
>       jo      error
>       add     ax,1
>       jo      error
>       add     ax,1
>       jo      error
>       add     ax,1
>       jo      error
>       imul    100
>       jo      error
>       mov     bx,13
>       cwd
>       idiv    bx
>       jo      error
>       mov     [$0804F58E],ax
>       ret
> error:        into
>       ret
>

it performed about the same as the version without overflow checking. That's
probably because the JO branches are never taken here, so they are predicted
correctly and add very little overhead. One thing to notice, however, is that
were a compiler to go down this path and check explicitly for overflow, not
only would the code be larger, but overall it might be a bit slower than
normal, as there are commonly used optimizations (at least on the x86
architecture) that cannot be used. For instance, a cheap way to multiply a
value by 5 is to skip the IMUL instruction and instead do LEA EAX,[EAX*4 +
EAX], but the LEA (Load Effective Address) instruction does not set the
overflow flag. Doing two INC EAX in a row is smaller than (and about as fast
as) doing ADD EAX,2, since in 32-bit code each INC EAX is one byte and the
ADD is three. But while the INC (INCrement) instruction does set the overflow
flag, you have to check the flag after each INC or you could miss an actual
overflow, which defeats the purpose of using INC to generate smaller code.
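
To make the INC problem concrete, here is a small C sketch (not x86, and not
from the benchmark) that models a run of increments on a 16-bit signed
value. The names inc_check_last and inc_check_each are made up for
illustration, and the sketch assumes the GCC/Clang builtin
__builtin_add_overflow. The first function keeps only the overflow status of
the final increment, the way a single JO after a run of INCs would. The
second tests after every increment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Increment *value n times and report only the overflow status of the
   final increment. Each step overwrites the status of the previous one,
   modelling an overflow flag that is rewritten by every INC. */
bool inc_check_last(int16_t *value, int n)
{
    assert(n > 0);

    bool overflow = false;
    for (int i = 0; i < n; i++)
        overflow = __builtin_add_overflow(*value, 1, value);
    return overflow;
}

/* Increment *value up to n times, stopping and reporting as soon as any
   single increment overflows (a test after each INC). */
bool inc_check_each(int16_t *value, int n)
{
    assert(n > 0);

    for (int i = 0; i < n; i++) {
        if (__builtin_add_overflow(*value, 1, value))
            return true;
    }
    return false;
}
```

Starting from 32766 and incrementing three times, inc_check_last reports no
overflow even though the value wrapped around on the second step, while
inc_check_each catches it.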

And one more thing before I go, and this is about DynASM [2]—it's not stated
anywhere, but if you use local labels, you have to call dasm_setupglobal()
[3] or else the program will crash. I found this out the hard way.

[1] gopher://gopher.conman.org/0Phlog:2015/09/05.2
[2] http://luajit.org/dynasm.html
[3] http://corsix.github.io/dynasm-doc/reference.html#dasm_setupglobal

Email author at [email protected]