GopherProxy

	STATUS - libzahl - big integer library
	git clone git://git.suckless.org/libzahl
	Log
	Files
	Refs
	README
	LICENSE
	---
	STATUS (7064B)
	---
	1 Optimisation progress for libzahl. Benchmarks are done in the
	2 range 1 to 4097 bits. So far all benchmarks on libzahl are
	3 done with the following combinations of cc and libc:
	4
	5 gcc + glibc
	6 gcc + musl
	7 clang + glibc
	8
	9 Benchmarks on the other libraries are done with gcc and glibc.
	10
	11 All benchmarks are done on an x86-64 (specifically an Intel
	12 Core 2 Quad CPU Q9300), without any extensions turned on
	13 during compilation, and without any use of extensions in
	14 assembly code. The benchmarks are performed with Linux as
	15 the OS's kernel with 50 µs timer slack, and the benchmarking
	16 processes are fixed to one CPU.
	17
	18
	19 The following functions are probably implemented optimally:
	20
	21 zset .................... always fastest
	22 zseti(a, +) ............. tomsfastmath is faster
	23 zseti(a, -) ............. tomsfastmath is faster
	24 zsetu ................... tomsfastmath is faster
	25 zswap ................... always fastest
	26 zzero ................... always fastest (shared with gmp)
	27 zsignum ................. always fastest (shared with gmp)
	28 zeven ................... always fastest (shared with gmp)
	29 zodd .................... always fastest (shared with gmp)
	30 zeven_nonzero ........... always fastest (shared with gmp)
	31 zodd_nonzero ............ always fastest (shared with gmp)
	32 zbtest .................. always fastest
	33 zsave ................... always fastest
	34 zload ................... always fastest
	35
	36
	37 The following functions are probably implemented optimally, but
	38 depends on other functions or call-cases for better performance:
	39
	40 zneg(a, b) .............. always fastest
	41 zabs(a, b) .............. always fastest
	42 ztrunc(a, b, c) ......... always fastest
	43 zbset(a, b, 1) .......... always fastest
	44 zbset(a, b, 0) .......... always fastest
	45 zbset(a, b, -1) ......... always fastest
	46 zsplit .................. alternating with gmp for fastest, but gmp is a…
	47
	48
	49 The following functions are probably implemented close to
	50 optimally, further optimisation should not be a priority:
	51
	52 zadd_unsigned ........... fastest after ~140 (depends on cc and libc) co…
	53 ztrunc(a, a, b) ......... fastest until ~100, then 77 % (gcc) or 68 % (c…
	54 zbset(a, a, 1) .......... always fastest
	55 zbset(a, a, 0) .......... always fastest
	56 zbset(a, a, -1) ......... always fastest (only marginally faster than gm…
	57 zlsb .................... always fastest <<suspicious>>
	58 zlsh .................... not too fast anymore
	59 zand .................... fastest after ~400, tomsfastmath before
	60 zor ..................... fastest after ~1150, tomsfastmath before
	61 zxor .................... alternative with gmp after ~700, tomsfastmath …
	62 znot .................... always fastest
	63
	64
	65 The following functions require structural changes for
	66 further optimisations:
	67
	68 zneg(a, a) .............. always fastest (shared with gmp (gcc))
	69 zabs(a, a) .............. 34 % (clang) or 8 % (gcc) of tomsfastmath
	70
	71
	72 The following functions are probably implemented optimally
	73 or close to optimally, except they contain some code that
	74 should not be necessary after some bugs have been fixed:
	75
	76 zbits ................... always fastest
	77 zcmpi(a, +) ............. always fastest
	78 zcmpi(a, -) ............. always fastest
	79 zcmpu ................... always fastest
	80
	81
	82 It may be possible optimise the following functions
	83 further:
	84
	85 zadd .................... fastest after ~90 (clang), ~260 (gcc+musl), or…
	86 zcmp .................... almost always fastest (musl), almost always sl…
	87 zcmpmag ................. always fastest <<suspicious, see zcmp>>
	88
	89
	90 The following functions could be optimised further:
	91
	92 zrsh .................... gmp is almost always faster; also tomsfastmath…
	93 zsub_unsigned ........... always fastest (compared against zsub too)
	94 zsub .................... always fastest (slower with gcc+glibc than gcc…
	95
	96
	97 The following functions could probably be optimised further,
	98 but their performance can be significantly improved by
	99 optimising their dependencies:
	100
	101 zmul .................... slowest
	102 zsqr .................... slowest
	103 zstr_length(a, 10) ...... gmp is faster (clang is faster than gcc, musl …
	104 zstr(a, b, n) ........... slowest
	105
	106
	107 musl has more stable performance than glibc. clang is better at
	108 inlining than gcc. (Which is better at optimising must be judged
	109 on a per-function basis.)
	110
	111
	112
	113 {{{ [out of date legacy area, this being phased out]
	114 Optimisation progress for libzahl, compared to other big integer
	115 libraries. These comparisons are for 152-bit integers. Functions
	116 in parenthesis the right column are functions that needs
	117 optimisation to improve the peformance of the function in the
	118 left column. Double-parenthesis means there may be a better way
	119 to do it. Inside square-brackets, there are some comments on
	120 multi-bit comparisons.
	121
	122 zgcd .................... 21 % of gmp (zcmpmag)
	123 zmodmul(big mod) ........ slowest ((zmul, zmod))
	124 zmodsqr(big mod) ........ slowest ((zmul, zmod))
	125 zmodmul(tiny mod) ....... slowest ((zmul))
	126 zmodsqr(tiny mod) ....... slowest ((zmul))
	127 zpow .................... slowest (zmul, zsqr)
	128 zpowu ................... slowest (zmul, zsqr)
	129 zmodpow ................. slowest (zmul, zsqr. zmod)
	130 zmodpowu ................ slowest (zmul, zsqr, zmod)
	131 zsets ................... 13 % of gmp
	132 zrand(default uniform) .. 51 % of gmp
	133 zptest .................. slowest (zrand, zmodpow, zsqr, zmod)
	134 zdiv(big denum) ......... tomsfastmath is faster (zdivmod)
	135 zmod(big denum) ......... fastest (zdivmod)
	136 zdivmod(big denum) ...... fastest
	137 zdiv(tiny denum) ........ slowest
	138 zmod(tiny denum) ........ slowest
	139 zdivmod(tiny denum) ..... slowest
	140 }}}
	141
	142
	143
	144 Note, some corresponding functions are not implemented in
	145 some other libraries. In such cases, they have been implemented
	146 in the translation layers (found under bench/). Those
	147 implementations are often suboptimal, but probably in style
	148 with what you would write if you need that functionality.
	149 Note further, if for example, you want do perform addition
	150 and you know that your operands are non-negative, you would
	151 choose zadd_unsigned in libzahl, but if you are using a
	152 library that does not have the corrsponding function, you are
	153 better of with the regular addition (zadd). This is however
	154 reflected in the comment column.
	155
	156 Also note, TomsFastMath does not support arbitrarily large
	157 integers, the limit is set at compile-time, which gives is
	158 a significant performance advantage. Furthermore, no failure
	159 check is done with GMP. Additionally, hebimath has been
	160 excluded from these comparison because it is not fully
	161 operational.
	162
	163 Also note, NOT does not mean the same thing in all libraries,
	164 for example in GMP it means (-x - 1), thus, znot does not
	165 use GMP's NOT in the translations layer.
	166
	167
	168 The following optimisation flags have been tested:
	169
	170 -O0 ...................... Bad
	171 -O1 ...................... Bad
	172 -O2 ...................... Not so good
	173 -O3 ...................... Good
	174 -fno-builtin ............. Bad