Update README - libgrapheme - unicode string library
git clone git://git.suckless.org/libgrapheme
---
commit 1774b5430fe46d8d5511075d3cd644716ad4c3c8
parent 5939cf21cdb050e1c9bce964a30c9ad94f7440b9
Author: Laslo Hunhold <[email protected]>
Date: Thu, 6 Oct 2022 22:57:31 +0200
Update README
Signed-off-by: Laslo Hunhold <[email protected]>
Diffstat:
M README | 55 +++++++++++++++++------------…
1 file changed, 30 insertions(+), 25 deletions(-)
---
diff --git a/README b/README
@@ -1,25 +1,34 @@
libgrapheme
===========
-The libgrapheme library provides functions to properly handle Unicode
-strings according to the Unicode specification. Unicode strings are made
-up of user-perceived characters (so-called "grapheme clusters") that are
-made up of one or more Unicode codepoints, which in turn are encoded in
-one or more bytes in an encoding like UTF-8.
-
-There is a widespread misconception that it was enough to simply
-determine codepoints in a string and treat them as user-perceived
-characters to be Unicode compliant. While this may work in some cases,
-this assumption quickly breaks, especially for non-Western languages and
-decomposed Unicode strings where user-perceived characters are usually
-represented using multiple codepoints.
-
-Despite the complicated multilevel structure of Unicode strings,
-libgrapheme provides methods to work with them at the byte-level (i.e.
-UTF-8 ‘char’ arrays) while also providing codepoint-level methods.
-
-See libgrapheme(7) to get started and try out the self-contained examples
-given on the manual pages for each function.
+libgrapheme is an extremely simple freestanding C99 library providing
+utilities for properly handling strings according to the latest Unicode
+standard 15.0.0. It offers fully Unicode compliant
+
+ - grapheme cluster (i.e. user-perceived character) segmentation
+ - word segmentation
+ - sentence segmentation
+ - detection of permissible line break opportunities
+ - case detection (lower-, upper- and title-case)
+ - case conversion (to lower-, upper- and title-case)
+
+on UTF-8 strings and codepoint arrays, which both can also be
+null-terminated.
+
+The necessary lookup-tables are automatically generated from the Unicode
+standard data (contained in the tarball) and heavily compressed. Over
+10,000 automatically generated conformance tests and over 150 unit tests
+ensure conformance and correctness.
+
+There is no complicated build-system involved and it's all done using one
+POSIX-compliant Makefile. All you need is a C99 compiler, given the
+lookup-table-generators and compressors are also written in C99. The
+resulting library is freestanding and thus not even dependent on a
+standard library to be present at runtime, making it a suitable choice
+for bare metal applications.
+
+It is also way smaller and much faster than the other established
+Unicode string libraries (ICU, GNU's libunistring, libutf8proc).
Requirements
------------
@@ -38,15 +47,11 @@ Afterwards enter the following command to build and install…
Conformance
-----------
The libgrapheme library is compliant with the Unicode 15.0.0
-specification (September 2022).
-
-To ensure conformance, libgrapheme includes hundreds of tests including
-all provided with the standard-provided test-data that is parsed
-automatically. The tests can be run with
+specification (September 2022). The tests can be run with
make test
-to check standard conformance.
+to check standard conformance and correctness.
Usage
-----