* * * * *
Of course it's slower, but I didn't expect it to be quite that bad
Time for another useless µbenchmark! This time, the overhead of trapping
integer overflow!
So, inspired by this post about trapping integer overflow [1], I thought it
might be interesting to see how bad the overhead is of using the x86 [2]
instruction INTO [3] to catch integer overflow. To do this, I'm using DynASM
[4] to generate code from an expression that uses INTO after every operation.
There are other ways of doing this, but the simplest way is to use INTO. I'm
also using 16-bit operations, as the numbers involved (between -32,768 and
32,767) are reasonable (for a human) to deal with (unlike the 32-bit range of
-2,147,483,648 to 2,147,483,647 or the insane 64-bit range of
-9,223,372,036,854,775,808 to 9,223,372,036,854,775,807).
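For reference, the condition being trapped on a 16-bit add can be spelled out
in C by doing the arithmetic in a wider type and checking the 16-bit range.
This is only a sketch of the check itself, not the code DynASM generates, and
the name add16_checked is made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: add two 16-bit values and report overflow instead of
 * wrapping. Returns true and stores the sum on success; returns false (and
 * leaves *out untouched) when the true sum is outside -32,768..32,767. */
bool add16_checked(int16_t a, int16_t b, int16_t *out)
{
    int32_t wide = (int32_t)a + (int32_t)b;   /* cannot overflow in 32 bits */

    if (wide < INT16_MIN || wide > INT16_MAX)
        return false;                         /* the case INTO would trap on */

    *out = (int16_t)wide;
    return true;
}
```

Calling add16_checked(32767, 1, &sum) returns false, while
add16_checked(32766, 1, &sum) returns true and leaves 32767 in sum.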
The one surprising result was that Linux treats the INTO trap as a segfault!
Even requesting additional information (passing the SA_SIGINFO flag with
sigaction()) doesn't tell you anything. But that in itself tells you it's not
a real segfault, as a real segfault will report a memory mapping error.
Personally, I would have expected a floating point fault, even though it's
not a floating point operation, because on Linux, integer division by 0
results in a floating point fault (and oddly enough, a floating point division
by 0 results in ∞ but no fault)!
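The division half of that is easy to poke at from C. The sketch below installs
a SIGFPE handler with SA_SIGINFO, does an integer divide, and hands back the
signal number and si_code if a fault arrives; a second helper does the same
divide in floating point. The names trapped_idiv, float_div_is_infinite and
on_sigfpe are made up for illustration, and it assumes a POSIX system. On x86
Linux the integer case should come back as SIGFPE with si_code FPE_INTDIV;
other architectures may not trap at all. It says nothing about INTO itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <math.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf fault_return;            /* where the handler jumps back to */
static volatile sig_atomic_t fault_code;   /* si_code seen by the handler */

static void on_sigfpe(int signo, siginfo_t *info, void *context)
{
    (void)signo;
    (void)context;
    fault_code = info->si_code;
    siglongjmp(fault_return, 1);           /* abandon the faulting divide */
}

/* Divide num by den with SIGFPE caught. Returns 0 and stores the quotient if
 * no signal arrived; otherwise returns SIGFPE and stores the si_code. */
int trapped_idiv(int num, int den, int *quotient, int *si_code)
{
    struct sigaction sa, old;
    volatile int n = num, d = den;         /* stop the compiler folding the divide */
    int signo = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigfpe;
    sa.sa_flags = SA_SIGINFO;              /* ask for the siginfo_t details */
    sigemptyset(&sa.sa_mask);
    sigaction(SIGFPE, &sa, &old);

    if (sigsetjmp(fault_return, 1) == 0) {
        *quotient = n / d;
    } else {
        signo = SIGFPE;
        *si_code = fault_code;
    }

    sigaction(SIGFPE, &old, NULL);         /* put the old handler back */
    return signo;
}

/* Returns nonzero if num / den, done in floating point, is an infinity. */
int float_div_is_infinite(double num, double den)
{
    volatile double n = num, d = den;
    return isinf(n / d) != 0;
}
```

Calling trapped_idiv(1, 0, &q, &code) exercises the integer fault, and
float_div_is_infinite(1.0, 0.0) exercises the floating point case that
produces ∞ without any signal.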
But, aside from that, some results. I basically run the expression one
million times and simply record how long it takes. The first is just setting
a variable to a fixed value (and the “- 0” bit is there just to ensure an
overflow check is included):
Table: x = 1 - 0
overflow  time         expression result
--------  -----------  -----------------
true      0.009080000  1
false     0.006820000  1
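A timing loop of the kind described above (call the code a fixed number of
times, record how long it took) might look like the following sketch. The
clock used here (CLOCK_MONOTONIC via clock_gettime()) is an assumption, as are
the names time_runs, sample_workload and sample_calls; sample_workload is just
a stand-in for the generated function.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

volatile long sample_calls;     /* bumped by the stand-in workload below */

/* Stand-in for the generated code; it only counts how often it was called. */
void sample_workload(void)
{
    sample_calls++;
}

/* Call fn() `runs` times and return the elapsed time in seconds. */
double time_runs(void (*fn)(void), long runs)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < runs; i++)
        fn();                   /* the function call overhead is included */
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Something like time_runs(sample_workload, 1000000) then gives one number per
variant to compare.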
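Okay, not terribly bad. But how about a longer expression? (And remember, the
expression isn't optimized; it's also evaluated strictly left to right, with
no operator precedence, which is why the result is 46.)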
Table: x = 1 + 1 + 1 + 1 + 1 + 1 * 100 / 13
overflow  time         expression result
--------  -----------  -----------------
true      0.079528000  46
false     0.030125000  46
Yikes! (But this also includes the function call overhead.) For the
curious, the last example compiled down to:
> xor eax,eax
> mov ax,1
> add ax,1
> into
> add ax,1
> into
> add ax,1
> into
> add ax,1
> into
> add ax,1
> into
> imul 100
> into
> mov bx,13
> cwd
> idiv bx
> into
> mov [$0804f50E],ax
> ret
The non-overflow version just had the INTO instructions missing; otherwise it
was the same code.
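To make the listing easier to follow, here is roughly the same computation in
C: one 16-bit accumulator, each operation applied in order, and a check after
every step that can overflow. It's a sketch, not what DynASM emits. It leans
on the GCC/Clang builtins __builtin_add_overflow() and
__builtin_mul_overflow(), and the function name and the `first` parameter
(which stands in for the leading 1) are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Evaluate first + 1 + 1 + 1 + 1 + 1 * 100 / 13 strictly left to right in
 * 16 bits, the same order as the generated code. Returns false at the first
 * step whose result does not fit in 16 bits. */
bool eval_left_to_right(int16_t first, int16_t *result)
{
    int16_t acc = first;                               /* mov ax,1 */

    for (int i = 0; i < 5; i++)                        /* add ax,1 / into, x5 */
        if (__builtin_add_overflow(acc, (int16_t)1, &acc))
            return false;

    if (__builtin_mul_overflow(acc, (int16_t)100, &acc))   /* imul / into */
        return false;

    acc = (int16_t)(acc / 13);   /* idiv; dividing by 13 cannot overflow */

    *result = acc;                                     /* mov [x],ax */
    return true;
}
```

With first set to 1 this works through 6, then 600, then 46, matching the
table. Starting from 323 instead, the multiply would produce 32,800, which
doesn't fit in 16 bits, so the function returns false at that step.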
I think what surprises me the most here is how much slower it is, given that
the INTO instruction just checks the overflow flag and causes a trap only if
it's set. The timings I have (and I'll admit, the figures I have are old and
for the 80486) show that INTO only has a three-cycle overhead if not taken.
I'm guessing things are worse with the newer multipipelined multiscalar
multiprocessor monstrosities we use these days.
Next I'll have to try using the JO instruction [5] and see how well that
fares.
[1] http://blog.regehr.org/archives/1154
[2] https://en.wikipedia.org/wiki/X86
[3] http://x86.renejeschke.de/html/file_module_x86_id_142.html
[4] gopher://gopher.conman.org/0Phlog:2015/09/05.1
[5] gopher://gopher.conman.org/0Phlog:2015/09/07.1