Introduction
Introduction Statistics Contact Development Disclaimer Help
STATUS - libzahl - big integer library
git clone git://git.suckless.org/libzahl
Log
Files
Refs
README
LICENSE
---
STATUS (7064B)
---
1 Optimisation progress for libzahl. Benchmarks are done in the
2 range 1 to 4097 bits. So far all benchmarks on libzahl are
3 done with the following combinations of cc and libc:
4
5 gcc + glibc
6 gcc + musl
7 clang + glibc
8
9 Benchmarks on the other libraries are done with gcc and glibc.
10
11 All benchmarks are done on an x86-64 (specifically an Intel
12 Core 2 Quad CPU Q9300), without any extensions turned on
13 during compilation, and without any use of extensions in
14 assembly code. The benchmarks are performed with Linux as
15 the OS's kernel with 50 µs timer slack, and the benchmarking
16 processes are fixed to one CPU.
17
18
19 The following functions are probably implemented optimally:
20
21 zset .................... always fastest
22 zseti(a, +) ............. tomsfastmath is faster
23 zseti(a, -) ............. tomsfastmath is faster
24 zsetu ................... tomsfastmath is faster
25 zswap ................... always fastest
26 zzero ................... always fastest (shared with gmp)
27 zsignum ................. always fastest (shared with gmp)
28 zeven ................... always fastest (shared with gmp)
29 zodd .................... always fastest (shared with gmp)
30 zeven_nonzero ........... always fastest (shared with gmp)
31 zodd_nonzero ............ always fastest (shared with gmp)
32 zbtest .................. always fastest
33 zsave ................... always fastest
34 zload ................... always fastest
35
36
37 The following functions are probably implemented optimally, but
38 depends on other functions or call-cases for better performance:
39
40 zneg(a, b) .............. always fastest
41 zabs(a, b) .............. always fastest
42 ztrunc(a, b, c) ......... always fastest
43 zbset(a, b, 1) .......... always fastest
44 zbset(a, b, 0) .......... always fastest
45 zbset(a, b, -1) ......... always fastest
46 zsplit .................. alternating with gmp for fastest, but gmp is a…
47
48
49 The following functions are probably implemented close to
50 optimally, further optimisation should not be a priority:
51
52 zadd_unsigned ........... fastest after ~140 (depends on cc and libc) co…
53 ztrunc(a, a, b) ......... fastest until ~100, then 77 % (gcc) or 68 % (c…
54 zbset(a, a, 1) .......... always fastest
55 zbset(a, a, 0) .......... always fastest
56 zbset(a, a, -1) ......... always fastest (only marginally faster than gm…
57 zlsb .................... always fastest <<suspicious>>
58 zlsh .................... not too fast anymore
59 zand .................... fastest after ~400, tomsfastmath before
60 zor ..................... fastest after ~1150, tomsfastmath before
61 zxor .................... alternative with gmp after ~700, tomsfastmath …
62 znot .................... always fastest
63
64
65 The following functions require structural changes for
66 further optimisations:
67
68 zneg(a, a) .............. always fastest (shared with gmp (gcc))
69 zabs(a, a) .............. 34 % (clang) or 8 % (gcc) of tomsfastmath
70
71
72 The following functions are probably implemented optimally
73 or close to optimally, except they contain some code that
74 should not be necessary after some bugs have been fixed:
75
76 zbits ................... always fastest
77 zcmpi(a, +) ............. always fastest
78 zcmpi(a, -) ............. always fastest
79 zcmpu ................... always fastest
80
81
82 It may be possible optimise the following functions
83 further:
84
85 zadd .................... fastest after ~90 (clang), ~260 (gcc+musl), or…
86 zcmp .................... almost always fastest (musl), almost always sl…
87 zcmpmag ................. always fastest <<suspicious, see zcmp>>
88
89
90 The following functions could be optimised further:
91
92 zrsh .................... gmp is almost always faster; also tomsfastmath…
93 zsub_unsigned ........... always fastest (compared against zsub too)
94 zsub .................... always fastest (slower with gcc+glibc than gcc…
95
96
97 The following functions could probably be optimised further,
98 but their performance can be significantly improved by
99 optimising their dependencies:
100
101 zmul .................... slowest
102 zsqr .................... slowest
103 zstr_length(a, 10) ...... gmp is faster (clang is faster than gcc, musl …
104 zstr(a, b, n) ........... slowest
105
106
107 musl has more stable performance than glibc. clang is better at
108 inlining than gcc. (Which is better at optimising must be judged
109 on a per-function basis.)
110
111
112
113 {{{ [out of date legacy area, this being phased out]
114 Optimisation progress for libzahl, compared to other big integer
115 libraries. These comparisons are for 152-bit integers. Functions
116 in parenthesis the right column are functions that needs
117 optimisation to improve the peformance of the function in the
118 left column. Double-parenthesis means there may be a better way
119 to do it. Inside square-brackets, there are some comments on
120 multi-bit comparisons.
121
122 zgcd .................... 21 % of gmp (zcmpmag)
123 zmodmul(big mod) ........ slowest ((zmul, zmod))
124 zmodsqr(big mod) ........ slowest ((zmul, zmod))
125 zmodmul(tiny mod) ....... slowest ((zmul))
126 zmodsqr(tiny mod) ....... slowest ((zmul))
127 zpow .................... slowest (zmul, zsqr)
128 zpowu ................... slowest (zmul, zsqr)
129 zmodpow ................. slowest (zmul, zsqr. zmod)
130 zmodpowu ................ slowest (zmul, zsqr, zmod)
131 zsets ................... 13 % of gmp
132 zrand(default uniform) .. 51 % of gmp
133 zptest .................. slowest (zrand, zmodpow, zsqr, zmod)
134 zdiv(big denum) ......... tomsfastmath is faster (zdivmod)
135 zmod(big denum) ......... fastest (zdivmod)
136 zdivmod(big denum) ...... fastest
137 zdiv(tiny denum) ........ slowest
138 zmod(tiny denum) ........ slowest
139 zdivmod(tiny denum) ..... slowest
140 }}}
141
142
143
144 Note, some corresponding functions are not implemented in
145 some other libraries. In such cases, they have been implemented
146 in the translation layers (found under bench/). Those
147 implementations are often suboptimal, but probably in style
148 with what you would write if you need that functionality.
149 Note further, if for example, you want do perform addition
150 and you know that your operands are non-negative, you would
151 choose zadd_unsigned in libzahl, but if you are using a
152 library that does not have the corrsponding function, you are
153 better of with the regular addition (zadd). This is however
154 reflected in the comment column.
155
156 Also note, TomsFastMath does not support arbitrarily large
157 integers, the limit is set at compile-time, which gives is
158 a significant performance advantage. Furthermore, no failure
159 check is done with GMP. Additionally, hebimath has been
160 excluded from these comparison because it is not fully
161 operational.
162
163 Also note, NOT does not mean the same thing in all libraries,
164 for example in GMP it means (-x - 1), thus, znot does not
165 use GMP's NOT in the translations layer.
166
167
168 The following optimisation flags have been tested:
169
170 -O0 ...................... Bad
171 -O1 ...................... Bad
172 -O2 ...................... Not so good
173 -O3 ...................... Good
174 -fno-builtin ............. Bad
You are viewing proxied material from suckless.org. The copyright of proxied material belongs to its original authors. Any comments or complaints in relation to proxied material should be directed to the original authors of the content concerned. Please see the disclaimer for more details.