STATUS - libzahl - big integer library | |
git clone git://git.suckless.org/libzahl | |
Log | |
Files | |
Refs | |
README | |
LICENSE | |
--- | |
STATUS (7064B) | |
--- | |
1 Optimisation progress for libzahl. Benchmarks are done in the | |
2 range 1 to 4097 bits. So far all benchmarks on libzahl are | |
3 done with the following combinations of cc and libc: | |
4 | |
5 gcc + glibc | |
6 gcc + musl | |
7 clang + glibc | |
8 | |
9 Benchmarks on the other libraries are done with gcc and glibc. | |
10 | |
11 All benchmarks are done on an x86-64 (specifically an Intel | |
12 Core 2 Quad CPU Q9300), without any extensions turned on | |
13 during compilation, and without any use of extensions in | |
14 assembly code. The benchmarks are performed with Linux as | |
15 the OS's kernel with 50 µs timer slack, and the benchmarking | |
16 processes are fixed to one CPU. | |
17 | |
18 | |
19 The following functions are probably implemented optimally: | |
20 | |
21 zset .................... always fastest | |
22 zseti(a, +) ............. tomsfastmath is faster | |
23 zseti(a, -) ............. tomsfastmath is faster | |
24 zsetu ................... tomsfastmath is faster | |
25 zswap ................... always fastest | |
26 zzero ................... always fastest (shared with gmp) | |
27 zsignum ................. always fastest (shared with gmp) | |
28 zeven ................... always fastest (shared with gmp) | |
29 zodd .................... always fastest (shared with gmp) | |
30 zeven_nonzero ........... always fastest (shared with gmp) | |
31 zodd_nonzero ............ always fastest (shared with gmp) | |
32 zbtest .................. always fastest | |
33 zsave ................... always fastest | |
34 zload ................... always fastest | |
35 | |
36 | |
37 The following functions are probably implemented optimally, but | |
38 depends on other functions or call-cases for better performance: | |
39 | |
40 zneg(a, b) .............. always fastest | |
41 zabs(a, b) .............. always fastest | |
42 ztrunc(a, b, c) ......... always fastest | |
43 zbset(a, b, 1) .......... always fastest | |
44 zbset(a, b, 0) .......... always fastest | |
45 zbset(a, b, -1) ......... always fastest | |
46 zsplit .................. alternating with gmp for fastest, but gmp is a… | |
47 | |
48 | |
49 The following functions are probably implemented close to | |
50 optimally, further optimisation should not be a priority: | |
51 | |
52 zadd_unsigned ........... fastest after ~140 (depends on cc and libc) co… | |
53 ztrunc(a, a, b) ......... fastest until ~100, then 77 % (gcc) or 68 % (c… | |
54 zbset(a, a, 1) .......... always fastest | |
55 zbset(a, a, 0) .......... always fastest | |
56 zbset(a, a, -1) ......... always fastest (only marginally faster than gm… | |
57 zlsb .................... always fastest <<suspicious>> | |
58 zlsh .................... not too fast anymore | |
59 zand .................... fastest after ~400, tomsfastmath before | |
60 zor ..................... fastest after ~1150, tomsfastmath before | |
61 zxor .................... alternative with gmp after ~700, tomsfastmath … | |
62 znot .................... always fastest | |
63 | |
64 | |
65 The following functions require structural changes for | |
66 further optimisations: | |
67 | |
68 zneg(a, a) .............. always fastest (shared with gmp (gcc)) | |
69 zabs(a, a) .............. 34 % (clang) or 8 % (gcc) of tomsfastmath | |
70 | |
71 | |
72 The following functions are probably implemented optimally | |
73 or close to optimally, except they contain some code that | |
74 should not be necessary after some bugs have been fixed: | |
75 | |
76 zbits ................... always fastest | |
77 zcmpi(a, +) ............. always fastest | |
78 zcmpi(a, -) ............. always fastest | |
79 zcmpu ................... always fastest | |
80 | |
81 | |
82 It may be possible optimise the following functions | |
83 further: | |
84 | |
85 zadd .................... fastest after ~90 (clang), ~260 (gcc+musl), or… | |
86 zcmp .................... almost always fastest (musl), almost always sl… | |
87 zcmpmag ................. always fastest <<suspicious, see zcmp>> | |
88 | |
89 | |
90 The following functions could be optimised further: | |
91 | |
92 zrsh .................... gmp is almost always faster; also tomsfastmath… | |
93 zsub_unsigned ........... always fastest (compared against zsub too) | |
94 zsub .................... always fastest (slower with gcc+glibc than gcc… | |
95 | |
96 | |
97 The following functions could probably be optimised further, | |
98 but their performance can be significantly improved by | |
99 optimising their dependencies: | |
100 | |
101 zmul .................... slowest | |
102 zsqr .................... slowest | |
103 zstr_length(a, 10) ...... gmp is faster (clang is faster than gcc, musl … | |
104 zstr(a, b, n) ........... slowest | |
105 | |
106 | |
107 musl has more stable performance than glibc. clang is better at | |
108 inlining than gcc. (Which is better at optimising must be judged | |
109 on a per-function basis.) | |
110 | |
111 | |
112 | |
113 {{{ [out of date legacy area, this being phased out] | |
114 Optimisation progress for libzahl, compared to other big integer | |
115 libraries. These comparisons are for 152-bit integers. Functions | |
116 in parenthesis the right column are functions that needs | |
117 optimisation to improve the peformance of the function in the | |
118 left column. Double-parenthesis means there may be a better way | |
119 to do it. Inside square-brackets, there are some comments on | |
120 multi-bit comparisons. | |
121 | |
122 zgcd .................... 21 % of gmp (zcmpmag) | |
123 zmodmul(big mod) ........ slowest ((zmul, zmod)) | |
124 zmodsqr(big mod) ........ slowest ((zmul, zmod)) | |
125 zmodmul(tiny mod) ....... slowest ((zmul)) | |
126 zmodsqr(tiny mod) ....... slowest ((zmul)) | |
127 zpow .................... slowest (zmul, zsqr) | |
128 zpowu ................... slowest (zmul, zsqr) | |
129 zmodpow ................. slowest (zmul, zsqr. zmod) | |
130 zmodpowu ................ slowest (zmul, zsqr, zmod) | |
131 zsets ................... 13 % of gmp | |
132 zrand(default uniform) .. 51 % of gmp | |
133 zptest .................. slowest (zrand, zmodpow, zsqr, zmod) | |
134 zdiv(big denum) ......... tomsfastmath is faster (zdivmod) | |
135 zmod(big denum) ......... fastest (zdivmod) | |
136 zdivmod(big denum) ...... fastest | |
137 zdiv(tiny denum) ........ slowest | |
138 zmod(tiny denum) ........ slowest | |
139 zdivmod(tiny denum) ..... slowest | |
140 }}} | |
141 | |
142 | |
143 | |
144 Note, some corresponding functions are not implemented in | |
145 some other libraries. In such cases, they have been implemented | |
146 in the translation layers (found under bench/). Those | |
147 implementations are often suboptimal, but probably in style | |
148 with what you would write if you need that functionality. | |
149 Note further, if for example, you want do perform addition | |
150 and you know that your operands are non-negative, you would | |
151 choose zadd_unsigned in libzahl, but if you are using a | |
152 library that does not have the corrsponding function, you are | |
153 better of with the regular addition (zadd). This is however | |
154 reflected in the comment column. | |
155 | |
156 Also note, TomsFastMath does not support arbitrarily large | |
157 integers, the limit is set at compile-time, which gives is | |
158 a significant performance advantage. Furthermore, no failure | |
159 check is done with GMP. Additionally, hebimath has been | |
160 excluded from these comparison because it is not fully | |
161 operational. | |
162 | |
163 Also note, NOT does not mean the same thing in all libraries, | |
164 for example in GMP it means (-x - 1), thus, znot does not | |
165 use GMP's NOT in the translations layer. | |
166 | |
167 | |
168 The following optimisation flags have been tested: | |
169 | |
170 -O0 ...................... Bad | |
171 -O1 ...................... Bad | |
172 -O2 ...................... Not so good | |
173 -O3 ...................... Good | |
174 -fno-builtin ............. Bad |