Update README - libgrapheme - unicode string library
git clone git://git.suckless.org/libgrapheme
---
commit 1774b5430fe46d8d5511075d3cd644716ad4c3c8
parent 5939cf21cdb050e1c9bce964a30c9ad94f7440b9
Author: Laslo Hunhold <[email protected]>
Date: Thu, 6 Oct 2022 22:57:31 +0200
Update README
Signed-off-by: Laslo Hunhold <[email protected]>
Diffstat:
M README | 55 +++++++++++++++++------------…
1 file changed, 30 insertions(+), 25 deletions(-)
---
diff --git a/README b/README
@@ -1,25 +1,34 @@
libgrapheme
===========
-The libgrapheme library provides functions to properly handle Unicode
-strings according to the Unicode specification. Unicode strings are made
-up of user-perceived characters (so-called "grapheme clusters") that are
-made up of one or more Unicode codepoints, which in turn are encoded in
-one or more bytes in an encoding like UTF-8.
-
-There is a widespread misconception that it was enough to simply
-determine codepoints in a string and treat them as user-perceived
-characters to be Unicode compliant. While this may work in some cases,
-this assumption quickly breaks, especially for non-Western languages and
-decomposed Unicode strings where user-perceived characters are usually
-represented using multiple codepoints.
-
-Despite the complicated multilevel structure of Unicode strings,
-libgrapheme provides methods to work with them at the byte-level (i.e.
-UTF-8 ‘char’ arrays) while also providing codepoint-level methods.
-
-See libgrapheme(7) to get started and try out the self-contained examples
-given on the manual pages for each function.
+libgrapheme is an extremely simple freestanding C99 library providing
+utilities for properly handling strings according to the latest Unicode
+standard 15.0.0. It offers fully Unicode compliant
+
+ - grapheme cluster (i.e. user-perceived character) segmentation
+ - word segmentation
+ - sentence segmentation
+ - detection of permissible line break opportunities
+ - case detection (lower-, upper- and title-case)
+ - case conversion (to lower-, upper- and title-case)
+
+on UTF-8 strings and codepoint arrays, which both can also be
+null-terminated.
+
+The necessary lookup-tables are automatically generated from the Unicode
+standard data (contained in the tarball) and heavily compressed. Over
+10,000 automatically generated conformance tests and over 150 unit tests
+ensure conformance and correctness.
+
+There is no complicated build-system involved and it's all done using one
+POSIX-compliant Makefile. All you need is a C99 compiler, given the
+lookup-table-generators and compressors are also written in C99. The
+resulting library is freestanding and thus not even dependent on a
+standard library to be present at runtime, making it a suitable choice
+for bare metal applications.
+
+It is also way smaller and much faster than the other established
+Unicode string libraries (ICU, GNU's libunistring, libutf8proc).
Requirements
------------
@@ -38,15 +47,11 @@ Afterwards enter the following command to build and install…
Conformance
-----------
The libgrapheme library is compliant with the Unicode 15.0.0
-specification (September 2022).
-
-To ensure conformance, libgrapheme includes hundreds of tests including
-all provided with the standard-provided test-data that is parsed
-automatically. The tests can be run with
+specification (September 2022). The tests can be run with
make test
-to check standard conformance.
+to check standard conformance and correctness.
Usage
-----