* * * * *

                       Stupid multithreaded benchmarks

Dan Lyke ran a stupid benchmark [1]—just incrementing a variable to a
billion, one just flat out, and one between some pthreads locking primitives.

I wanted to see what stupid results I would get if I used spin locks [2].
First, the relevant bit of code (used nasm [3] to compile it under a 2.6GHz
(gigaHertz) dual-core Pentium running Linux 2.6):

>               bits    32
>               global  t1
>               global  t2
>
>               section .data
> gv            dd      0
> glock         dd      0
>
>               section .text
>
>               align   16
> t1:           mov     eax,[gv]
>               inc     eax
>               mov     [gv],eax
>               cmp     eax,1000000000
>               jl      t1
>
>               ; ret
>
>       ; the following implements _exit()
>       ; for the multithreaded version
>
>               xor     ebx,ebx
>               mov     eax,252
>               int     $80
>               mov     eax,1
>               int     $80
>               hlt
>
>               align   16
> t2:           mov     al,1
> t2.wait:      xchg    al,[glock]
>               or      al,al
>               jne     t2.wait
> t2.here:      mov     eax,[gv]
>               inc     eax
>               mov     [gv],eax
>               mov     byte [glock],0
>               cmp     eax,1000000000
>               jl      t2
>
>               ;ret
>
>               xor     ebx,ebx
>               mov     eax,252
>               int     $80
>               mov     eax,1
>               int     $80
>               hlt
>

Straightforward implementations here. t1() is the straight through counting
routine, while t2() is the one with the spin lock. Running single threaded
yielded these results:

Table: Counting, single threaded
routine time to execute
------------------------------
t1()    2.454s
t2()    39.752s

While I expected the spin lock to be faster than the pthread locking, I
wasn't expecting it to be this slow. But maybe, just maybe, I'll get some of
that speed back by running dual threads. At the very least, it should be a
bit faster than single core, right?

Right?

Bueller? [4] Bueller? [5]

Table: Counting, dual-threaded
routine time to execute
------------------------------
t1()    0m10.334s
t2()    2m31.307s

Um …

Wow.

I didn't expect spinlocks to be so expensive.

Ouch.

[1] http://www.flutterby.com/archives/comments/10723.html
[2] gopher://gopher.conman.org/0Phlog:1999/12/08.2
[3] http://nasm.sourceforge.net/
[4] http://www.imdb.com/title/tt0091042/
[5] http://www.idiotsavant.com/ftp/sounds/bueller.wav

Email author at [email protected]