* * * * *
Stupid multithreaded benchmarks
Dan Lyke ran a stupid benchmark [1]—just incrementing a variable to a
billion, one just flat out, and one between some pthreads locking primitives.
I wanted to see what stupid results I would get if I used spin locks [2].
First, the relevant bit of code (used nasm [3] to compile it under a 2.6GHz
(gigaHertz) dual-core Pentium running Linux 2.6):
> bits 32
> global t1
> global t2
>
> section .data
> gv dd 0
> glock dd 0
>
> section .text
>
> align 16
> t1: mov eax,[gv]
> inc eax
> mov [gv],eax
> cmp eax,1000000000
> jl t1
>
> ; ret
>
> ; the following implements _exit()
> ; for the multithreaded version
>
> xor ebx,ebx
> mov eax,252
> int $80
> mov eax,1
> int $80
> hlt
>
> align 16
> t2: mov al,1
> t2.wait: xchg al,[glock]
> or al,al
> jne t2.wait
> t2.here: mov eax,[gv]
> inc eax
> mov [gv],eax
> mov byte [glock],0
> cmp eax,1000000000
> jl t2
>
> ;ret
>
> xor ebx,ebx
> mov eax,252
> int $80
> mov eax,1
> int $80
> hlt
>
Straightforward implementations here. t1() is the straight through counting
routine, while t2() is the one with the spin lock. Running single threaded
yielded these results:
Table: Counting, single threaded
routine time to execute
------------------------------
t1() 2.454s
t2() 39.752s
While I expected the spin lock to be faster than the pthread locking, I
wasn't expecting it to be this slow. But maybe, just maybe, I'll get some of
that speed back by running dual threads. At the very least, it should be a
bit faster than single core, right?
Right?
Bueller? [4] Bueller? [5]
Table: Counting, dual-threaded
routine time to execute
------------------------------
t1() 0m10.334s
t2() 2m31.307s
Um …
Wow.
I didn't expect spinlocks to be so expensive.
Ouch.
[1]
http://www.flutterby.com/archives/comments/10723.html
[2]
gopher://gopher.conman.org/0Phlog:1999/12/08.2
[3]
http://nasm.sourceforge.net/
[4]
http://www.imdb.com/title/tt0091042/
[5]
http://www.idiotsavant.com/ftp/sounds/bueller.wav
Email author at
[email protected]