	Implement the Unicode Bidirectional Algorithm (UAX #9) - libgrapheme - unicode …
	git clone git://git.suckless.org/libgrapheme
	---
	commit 5998352d2d2e6e37531548f8e986abae5ff8ef02
	parent dd15fea026c3e0b389381ae8cc08e0f39fa1a8f7
	Author: Laslo Hunhold <[email protected]>
	Date: Tue, 25 Oct 2022 13:20:47 +0200

	Implement the Unicode Bidirectional Algorithm (UAX #9)

	To be frank, I had never heard of this until I started learning more
	about Unicode, but it is an absolute must for all languages written
	from right to left (Hebrew, Arabic, Farsi, etc.) and for any case where
	you mix RTL and LTR text.

	The Unicode Bidirectional Algorithm is the normative procedure you apply
	to a string to obtain embedding levels, which can then be used to reorder
	the string such that you obtain the proper reading direction. The
	central aspect is that strings are always stored LTR in memory and only
	reordered for presentation on the screen.
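
	To illustrate how embedding levels drive the reordering, here is a
	minimal sketch of rule L2 of UAX #9 (reverse runs from the highest
	level down to the lowest odd level). The function name and the index
	array are hypothetical and not part of this commit; the levels are
	assumed to be already resolved and non-negative.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch of UAX #9 rule L2 (hypothetical helper, not libgrapheme API).
 * levels[] holds one resolved, non-negative level per character.
 * order[] is filled so that order[k] is the logical index of the
 * character shown at visual position k.
 */
static void
reorder_visual(const int8_t *levels, size_t *order, size_t len)
{
	int lev, max = 0, min_odd = INT8_MAX;
	size_t i, j, lo, hi, tmp;

	for (i = 0; i < len; i++) {
		order[i] = i;
		if (levels[i] > max)
			max = levels[i];
		if ((levels[i] & 1) && levels[i] < min_odd)
			min_odd = levels[i];
	}

	/* without any odd level min_odd exceeds max and nothing is reversed */
	for (lev = max; lev >= min_odd; lev--) {
		for (i = 0; i < len; i = j) {
			if (levels[i] < lev) {
				j = i + 1;
				continue;
			}

			/* find the end of the maximal run with level >= lev */
			for (j = i; j < len && levels[j] >= lev; j++)
				;

			/* reverse the run [i, j) in place */
			for (lo = i, hi = j - 1; lo < hi; lo++, hi--) {
				tmp = order[lo];
				order[lo] = order[hi];
				order[hi] = tmp;
			}
		}
	}
}
```

	With the levels 0 1 2 2 1 0, the level-2 run is reversed first and
	then the enclosing level-1 run, which gives the visual order
	0 4 2 3 1 5 in terms of logical indices.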

	Currently, only ICU and GNU fribidi implement the algorithm, and as
	usual it's pretty convoluted to use them. There are many memory
	allocations, kitchen-sink-madness and legacy cruft, but the demand is
	there (there's even a bidi-patch for dwm[0]).

	What's special about this implementation? There are no memory
	allocations at runtime. The user provides an array of 32-bit integers
	which is then filled with the embedding levels. The levels themselves
	only range from -1 to 125 (by the standard!) and would fit in a signed
	8-bit integer, but the algorithm naturally needs a scratchpad to store
	processing data.
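
	One way to picture a caller-provided buffer that serves both as
	scratchpad and as level output is to pack both into each 32-bit cell.
	The layout below (level in the low 8 bits, 23 bits of scratch state
	above it) and all function names are assumptions made for
	illustration only; the commit does not describe the actual layout.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical cell layout, NOT taken from libgrapheme:
 *   bits  0..7   embedding level as two's-complement 8-bit value
 *   bits  8..30  scratch state used while the algorithm runs
 */
static int_least32_t
cell_pack(uint_least32_t scratch, int level)
{
	uint_least32_t lv = (uint_least32_t)(level & 0xFF);

	return (int_least32_t)(((scratch & 0x7FFFFF) << 8) | lv);
}

static int
cell_level(int_least32_t cell)
{
	int v = (int)(cell & 0xFF);

	/* undo the two's-complement encoding without relying on casts */
	return (v >= 128) ? v - 256 : v;
}

static uint_least32_t
cell_scratch(int_least32_t cell)
{
	return ((uint_least32_t)cell >> 8) & 0x7FFFFF;
}

/* drop the scratch state so that only the plain levels remain */
static void
cells_finalize(int_least32_t *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		buf[i] = cell_level(buf[i]);
}
```

	The point of such a design is that the caller allocates exactly one
	array of known size and the library never has to call malloc().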

	A complication of the algorithm is that at some point you have to
	break the paragraph into lines, and the line breaks affect the level
	determination. GNU fribidi and ICU make this very complicated and hard
	to understand. The API as you see it here is not final, but the final
	process will be as follows (each number corresponding to a function):

	1) "preprocessing" the string up to the part where the algorithm
	   does not depend on the line breaks
	2) determining line embedding levels for a line
	   (by specifying the preprocessed data buffer and an output
	   level-buffer)
	3) reordering a line (by specifying the preprocessed data buffer
	   and an output string that is allowed to be the input string)

	Conformance is obviously a high priority: There are literally over a
	million automatic conformance tests for the bidirectional algorithm,
	split across the files BidiTest.txt and BidiCharacterTest.txt, which
	are automatically parsed into the header gen/bidirectional-test.h.
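
	As a rough idea of what parsing such test data involves, here is a
	minimal sketch that reads one expected-levels line. It assumes the
	"@Levels:" line format of BidiTest.txt, where levels are separated by
	whitespace and an "x" marks a character without a level (stored as -1
	here). The function name is hypothetical; this is not the generator
	used in this commit.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse a line like "@Levels: x 0 1 1" into out[] (hypothetical helper).
 * Returns the number of levels read, or (size_t)-1 if the line is not
 * a levels line, is malformed or does not fit into out[].
 */
static size_t
parse_levels(const char *line, int8_t *out, size_t outlen)
{
	static const char prefix[] = "@Levels:";
	size_t n = 0;
	char *end;
	long v;

	if (strncmp(line, prefix, sizeof(prefix) - 1) != 0)
		return (size_t)-1;
	line += sizeof(prefix) - 1;

	for (;;) {
		/* skip separators and a possible trailing newline */
		line += strspn(line, " \t\r\n");
		if (*line == '\0')
			return n;
		if (n == outlen)
			return (size_t)-1;

		if (*line == 'x') {
			/* character has no level assigned */
			out[n++] = -1;
			line++;
		} else {
			v = strtol(line, &end, 10);
			if (end == line || v < 0 || v > INT8_MAX)
				return (size_t)-1;
			out[n++] = (int8_t)v;
			line = end;
		}
	}
}
```

	Storing "x" as -1 lets the expected levels be compared directly with
	a level array in which removed characters carry that same value.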

	Currently, only BidiTest.txt is used for tests (all of which pass),
	because bracket pairs have not been implemented yet. Bracket pairs and
	(maybe) Arabic shaping are what is left to be implemented, but this is
	already a big step.

	One more note: Yes, the data files are very large, but they compress
	down very well and the tarball stays below 800K. For obvious reasons,
	it's very important to me that there's no need to pull any data from
	the web for compilation or testing.

	[0]:https://dwm.suckless.org/patches/bidi/

	Signed-off-by: Laslo Hunhold <[email protected]>


	Diff is too large, output suppressed.