SpecialCasing.txt - libgrapheme - unicode string library
git clone git://git.suckless.org/libgrapheme
---
SpecialCasing.txt (16832B)
---
# SpecialCasing-15.1.0.txt
# Date: 2023-01-05, 20:35:03 GMT
# © 2023 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc…
# For terms of use, see https://www.unicode.org/terms_of_use.html
#
# Unicode Character Database
# For documentation, see https://www.unicode.org/reports/tr44/
#
# Special Casing
#
# This file is a supplement to the UnicodeData.txt file. It does not def…
# properties, but rather provides additional information about the casin…
# Unicode characters, for situations when casing incurs a change in stri…
# or is dependent on context or locale. For compatibility, the UnicodeDa…
# file only contains simple case mappings for characters where they are …
# and independent of context and language. The data in this file, combin…
# the simple case mappings in UnicodeData.txt, defines the full case map…
# Lowercase_Mapping (lc), Titlecase_Mapping (tc), and Uppercase_Mapping …
#
# Note that the preferred mechanism for defining tailored casing operati…
# the Unicode Common Locale Data Repository (CLDR). For more information…
# discussion of case mappings and case algorithms in the Unicode Standar…
#
# All code points not listed in this file that do not have a simple case…
# in UnicodeData.txt map to themselves.
# ======================================================================…
# Format
# ======================================================================…
# The entries in this file are in the following machine-readable format:
#
# <code>; <lower>; <title>; <upper>; (<condition_list>;)? # <comment>
#
# <code>, <lower>, <title>, and <upper> provide the respective full case…
# of <code>, expressed as character values in hex. If there is more than…
# they are separated by spaces. Other than as used to separate elements,…
# to be ignored.
#
# The <condition_list> is optional. Where present, it consists of one or…
# or casing contexts, separated by spaces. In these conditions:
# - A condition list overrides the normal behavior if all of the listed …
# - The casing context is always the context of the characters in the or…
# NOT in the resulting string.
# - Case distinctions in the condition list are not significant.
# - Conditions preceded by "Not_" represent the negation of the conditio…
# The condition list is not represented in the UCD as a formal property.
#
# A language ID is defined by BCP 47, with '-' and '_' treated equivalen…
#
# A casing context for a character is defined by Section 3.13 Default Ca…
# of The Unicode Standard.
#
# Parsers of this file must be prepared to deal with future additions to…
# * Additional contexts
# * Additional fields
# ======================================================================…
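
The entry format described above can be read with a few lines of code. The following is a minimal sketch in Python, not the parser libgrapheme itself uses; the names `SpecialCasingEntry`, `parse_hex_sequence` and `parse_line` are hypothetical. It lowercases the condition list because case distinctions there are not significant, and it ignores any fields beyond the fifth on the assumption that this is an acceptable way to tolerate future additional fields.

```python
from typing import NamedTuple, Optional, Tuple


class SpecialCasingEntry(NamedTuple):
    """One data line of SpecialCasing.txt (hypothetical container)."""
    code: int
    lower: Tuple[int, ...]
    title: Tuple[int, ...]
    upper: Tuple[int, ...]
    conditions: Tuple[str, ...]


def parse_hex_sequence(field: str) -> Tuple[int, ...]:
    """Convert space-separated hex code points; an empty field gives ()."""
    return tuple(int(token, 16) for token in field.split())


def parse_line(line: str) -> Optional[SpecialCasingEntry]:
    """Parse one line; return None for blank and comment-only lines."""
    data = line.split("#", 1)[0].strip()  # drop the trailing comment
    if not data:
        return None
    fields = [field.strip() for field in data.split(";")]
    if fields[-1] == "":
        fields.pop()  # every field ends in ';', so the last piece is empty
    if len(fields) < 4:
        raise ValueError("expected at least four fields: " + line)
    codes = parse_hex_sequence(fields[0])
    if len(codes) != 1:
        raise ValueError("expected exactly one code point: " + line)
    # Assumption: fields after the condition list are unknown and skipped.
    conditions = tuple(fields[4].lower().split()) if len(fields) > 4 else ()
    return SpecialCasingEntry(
        code=codes[0],
        lower=parse_hex_sequence(fields[1]),
        title=parse_hex_sequence(fields[2]),
        upper=parse_hex_sequence(fields[3]),
        conditions=conditions,
    )


if __name__ == "__main__":
    print(parse_line("00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S"))
```

Empty mapping fields, as in the Lithuanian and Turkic rules for U+0307, come out as empty tuples, which a caller can treat as "delete this character".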

# ======================================================================…
# Unconditional mappings
# ======================================================================…

# The German es-zed is special--the normal mapping is to SS.
# Note: the titlecase should never occur in practice. It is equal to tit…

00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S

# Preserve canonical equivalence for I with dot. Turkic is handled below.

0130; 0069 0307; 0130; 0130; # LATIN CAPITAL LETTER I WITH DOT ABOVE

# Ligatures

FB00; FB00; 0046 0066; 0046 0046; # LATIN SMALL LIGATURE FF
FB01; FB01; 0046 0069; 0046 0049; # LATIN SMALL LIGATURE FI
FB02; FB02; 0046 006C; 0046 004C; # LATIN SMALL LIGATURE FL
FB03; FB03; 0046 0066 0069; 0046 0046 0049; # LATIN SMALL LIGATURE FFI
FB04; FB04; 0046 0066 006C; 0046 0046 004C; # LATIN SMALL LIGATURE FFL
FB05; FB05; 0053 0074; 0053 0054; # LATIN SMALL LIGATURE LONG S T
FB06; FB06; 0053 0074; 0053 0054; # LATIN SMALL LIGATURE ST

0587; 0587; 0535 0582; 0535 0552; # ARMENIAN SMALL LIGATURE ECH YIWN
FB13; FB13; 0544 0576; 0544 0546; # ARMENIAN SMALL LIGATURE MEN NOW
FB14; FB14; 0544 0565; 0544 0535; # ARMENIAN SMALL LIGATURE MEN ECH
FB15; FB15; 0544 056B; 0544 053B; # ARMENIAN SMALL LIGATURE MEN INI
FB16; FB16; 054E 0576; 054E 0546; # ARMENIAN SMALL LIGATURE VEW NOW
FB17; FB17; 0544 056D; 0544 053D; # ARMENIAN SMALL LIGATURE MEN XEH
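
The mappings above change the length of the string, which is why they cannot be expressed as simple one-to-one mappings. As a rough illustration, the sketch below prints the uppercase of a few of these characters using Python's built-in `str.upper()`, which to my knowledge applies the full case mappings; the helper name `upper_as_hex` is hypothetical.

```python
def upper_as_hex(ch: str) -> str:
    """Return the uppercase of ch as space-separated hex code points."""
    return " ".join(f"{ord(c):04X}" for c in ch.upper())


if __name__ == "__main__":
    # Sharp s, ligature fi, ligature ffi, Armenian ligature ech yiwn.
    for ch in ("\u00df", "\ufb01", "\ufb03", "\u0587"):
        print(f"{ord(ch):04X} -> {upper_as_hex(ch)}")
```

Code that uppercases into a fixed-size buffer therefore has to allow for up to three output code points per input code point.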

# No corresponding uppercase precomposed character

0149; 0149; 02BC 004E; 02BC 004E; # LATIN SMALL LETTER N PRECEDED BY APO…
0390; 0390; 0399 0308 0301; 0399 0308 0301; # GREEK SMALL LETTER IOTA WI…
03B0; 03B0; 03A5 0308 0301; 03A5 0308 0301; # GREEK SMALL LETTER UPSILON…
01F0; 01F0; 004A 030C; 004A 030C; # LATIN SMALL LETTER J WITH CARON
1E96; 1E96; 0048 0331; 0048 0331; # LATIN SMALL LETTER H WITH LINE BELOW
1E97; 1E97; 0054 0308; 0054 0308; # LATIN SMALL LETTER T WITH DIAERESIS
1E98; 1E98; 0057 030A; 0057 030A; # LATIN SMALL LETTER W WITH RING ABOVE
1E99; 1E99; 0059 030A; 0059 030A; # LATIN SMALL LETTER Y WITH RING ABOVE
1E9A; 1E9A; 0041 02BE; 0041 02BE; # LATIN SMALL LETTER A WITH RIGHT HALF…
1F50; 1F50; 03A5 0313; 03A5 0313; # GREEK SMALL LETTER UPSILON WITH PSILI
1F52; 1F52; 03A5 0313 0300; 03A5 0313 0300; # GREEK SMALL LETTER UPSILON…
1F54; 1F54; 03A5 0313 0301; 03A5 0313 0301; # GREEK SMALL LETTER UPSILON…
1F56; 1F56; 03A5 0313 0342; 03A5 0313 0342; # GREEK SMALL LETTER UPSILON…
1FB6; 1FB6; 0391 0342; 0391 0342; # GREEK SMALL LETTER ALPHA WITH PERISP…
1FC6; 1FC6; 0397 0342; 0397 0342; # GREEK SMALL LETTER ETA WITH PERISPOM…
1FD2; 1FD2; 0399 0308 0300; 0399 0308 0300; # GREEK SMALL LETTER IOTA WI…
1FD3; 1FD3; 0399 0308 0301; 0399 0308 0301; # GREEK SMALL LETTER IOTA WI…
1FD6; 1FD6; 0399 0342; 0399 0342; # GREEK SMALL LETTER IOTA WITH PERISPO…
1FD7; 1FD7; 0399 0308 0342; 0399 0308 0342; # GREEK SMALL LETTER IOTA WI…
1FE2; 1FE2; 03A5 0308 0300; 03A5 0308 0300; # GREEK SMALL LETTER UPSILON…
1FE3; 1FE3; 03A5 0308 0301; 03A5 0308 0301; # GREEK SMALL LETTER UPSILON…
1FE4; 1FE4; 03A1 0313; 03A1 0313; # GREEK SMALL LETTER RHO WITH PSILI
1FE6; 1FE6; 03A5 0342; 03A5 0342; # GREEK SMALL LETTER UPSILON WITH PERI…
1FE7; 1FE7; 03A5 0308 0342; 03A5 0308 0342; # GREEK SMALL LETTER UPSILON…
1FF6; 1FF6; 03A9 0342; 03A9 0342; # GREEK SMALL LETTER OMEGA WITH PERISP…

# IMPORTANT-when iota-subscript (0345) is uppercased or titlecased,
# the result will be incorrect unless the iota-subscript is moved to th…
# of any sequence of combining marks. Otherwise, the accents will go on…
# This process can be achieved by first transforming the text to NFC be…
# E.g. <alpha><iota_subscript><acute> is uppercased to <ALPHA><acute><I…
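
The iota-subscript note above can be tried out directly. The sketch below assumes the truncated note means that NFC is applied before the case conversion, and it relies on Python's `unicodedata.normalize` and `str.upper()`; the helper names `upper_after_nfc` and `as_hex` are hypothetical.

```python
import unicodedata


def upper_after_nfc(text: str) -> str:
    """Normalize to NFC first, then apply the full uppercase mapping."""
    return unicodedata.normalize("NFC", text).upper()


def as_hex(text: str) -> str:
    """Render a string as space-separated hex code points."""
    return " ".join(f"{ord(c):04X}" for c in text)


if __name__ == "__main__":
    # alpha, iota-subscript (0345), acute (0301), in that order
    decomposed = "\u03b1\u0345\u0301"
    # Without normalization the acute ends up after the capital iota.
    print("naive:", as_hex(decomposed.upper()))
    # NFC composes the sequence to U+1FB4, whose entry appears later in this file.
    print("nfc:  ", as_hex(upper_after_nfc(decomposed)))
```

In the naive result the acute follows the capital iota, so it attaches to the wrong base letter; after NFC the accent stays with the alpha.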

# The following cases are already in the UnicodeData.txt file, so are on…

# 0345; 0345; 0399; 0399; # COMBINING GREEK YPOGEGRAMMENI

# All letters with YPOGEGRAMMENI (iota-subscript) or PROSGEGRAMMENI (iot…
# have special uppercases.
# Note: characters with PROSGEGRAMMENI are actually titlecase, not upper…

1F80; 1F80; 1F88; 1F08 0399; # GREEK SMALL LETTER ALPHA WITH PSILI AND Y…
1F81; 1F81; 1F89; 1F09 0399; # GREEK SMALL LETTER ALPHA WITH DASIA AND Y…
1F82; 1F82; 1F8A; 1F0A 0399; # GREEK SMALL LETTER ALPHA WITH PSILI AND V…
1F83; 1F83; 1F8B; 1F0B 0399; # GREEK SMALL LETTER ALPHA WITH DASIA AND V…
1F84; 1F84; 1F8C; 1F0C 0399; # GREEK SMALL LETTER ALPHA WITH PSILI AND O…
1F85; 1F85; 1F8D; 1F0D 0399; # GREEK SMALL LETTER ALPHA WITH DASIA AND O…
1F86; 1F86; 1F8E; 1F0E 0399; # GREEK SMALL LETTER ALPHA WITH PSILI AND P…
1F87; 1F87; 1F8F; 1F0F 0399; # GREEK SMALL LETTER ALPHA WITH DASIA AND P…
1F88; 1F80; 1F88; 1F08 0399; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND…
1F89; 1F81; 1F89; 1F09 0399; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND…
1F8A; 1F82; 1F8A; 1F0A 0399; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND…
1F8B; 1F83; 1F8B; 1F0B 0399; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND…
1F8C; 1F84; 1F8C; 1F0C 0399; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND…
1F8D; 1F85; 1F8D; 1F0D 0399; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND…
1F8E; 1F86; 1F8E; 1F0E 0399; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND…
1F8F; 1F87; 1F8F; 1F0F 0399; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND…
1F90; 1F90; 1F98; 1F28 0399; # GREEK SMALL LETTER ETA WITH PSILI AND YPO…
1F91; 1F91; 1F99; 1F29 0399; # GREEK SMALL LETTER ETA WITH DASIA AND YPO…
1F92; 1F92; 1F9A; 1F2A 0399; # GREEK SMALL LETTER ETA WITH PSILI AND VAR…
1F93; 1F93; 1F9B; 1F2B 0399; # GREEK SMALL LETTER ETA WITH DASIA AND VAR…
1F94; 1F94; 1F9C; 1F2C 0399; # GREEK SMALL LETTER ETA WITH PSILI AND OXI…
1F95; 1F95; 1F9D; 1F2D 0399; # GREEK SMALL LETTER ETA WITH DASIA AND OXI…
1F96; 1F96; 1F9E; 1F2E 0399; # GREEK SMALL LETTER ETA WITH PSILI AND PER…
1F97; 1F97; 1F9F; 1F2F 0399; # GREEK SMALL LETTER ETA WITH DASIA AND PER…
1F98; 1F90; 1F98; 1F28 0399; # GREEK CAPITAL LETTER ETA WITH PSILI AND P…
1F99; 1F91; 1F99; 1F29 0399; # GREEK CAPITAL LETTER ETA WITH DASIA AND P…
1F9A; 1F92; 1F9A; 1F2A 0399; # GREEK CAPITAL LETTER ETA WITH PSILI AND V…
1F9B; 1F93; 1F9B; 1F2B 0399; # GREEK CAPITAL LETTER ETA WITH DASIA AND V…
1F9C; 1F94; 1F9C; 1F2C 0399; # GREEK CAPITAL LETTER ETA WITH PSILI AND O…
1F9D; 1F95; 1F9D; 1F2D 0399; # GREEK CAPITAL LETTER ETA WITH DASIA AND O…
1F9E; 1F96; 1F9E; 1F2E 0399; # GREEK CAPITAL LETTER ETA WITH PSILI AND P…
1F9F; 1F97; 1F9F; 1F2F 0399; # GREEK CAPITAL LETTER ETA WITH DASIA AND P…
1FA0; 1FA0; 1FA8; 1F68 0399; # GREEK SMALL LETTER OMEGA WITH PSILI AND Y…
1FA1; 1FA1; 1FA9; 1F69 0399; # GREEK SMALL LETTER OMEGA WITH DASIA AND Y…
1FA2; 1FA2; 1FAA; 1F6A 0399; # GREEK SMALL LETTER OMEGA WITH PSILI AND V…
1FA3; 1FA3; 1FAB; 1F6B 0399; # GREEK SMALL LETTER OMEGA WITH DASIA AND V…
1FA4; 1FA4; 1FAC; 1F6C 0399; # GREEK SMALL LETTER OMEGA WITH PSILI AND O…
1FA5; 1FA5; 1FAD; 1F6D 0399; # GREEK SMALL LETTER OMEGA WITH DASIA AND O…
1FA6; 1FA6; 1FAE; 1F6E 0399; # GREEK SMALL LETTER OMEGA WITH PSILI AND P…
1FA7; 1FA7; 1FAF; 1F6F 0399; # GREEK SMALL LETTER OMEGA WITH DASIA AND P…
1FA8; 1FA0; 1FA8; 1F68 0399; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND…
1FA9; 1FA1; 1FA9; 1F69 0399; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND…
1FAA; 1FA2; 1FAA; 1F6A 0399; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND…
1FAB; 1FA3; 1FAB; 1F6B 0399; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND…
1FAC; 1FA4; 1FAC; 1F6C 0399; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND…
1FAD; 1FA5; 1FAD; 1F6D 0399; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND…
1FAE; 1FA6; 1FAE; 1F6E 0399; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND…
1FAF; 1FA7; 1FAF; 1F6F 0399; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND…
1FB3; 1FB3; 1FBC; 0391 0399; # GREEK SMALL LETTER ALPHA WITH YPOGEGRAMME…
1FBC; 1FB3; 1FBC; 0391 0399; # GREEK CAPITAL LETTER ALPHA WITH PROSGEGRA…
1FC3; 1FC3; 1FCC; 0397 0399; # GREEK SMALL LETTER ETA WITH YPOGEGRAMMENI
1FCC; 1FC3; 1FCC; 0397 0399; # GREEK CAPITAL LETTER ETA WITH PROSGEGRAMM…
1FF3; 1FF3; 1FFC; 03A9 0399; # GREEK SMALL LETTER OMEGA WITH YPOGEGRAMME…
1FFC; 1FF3; 1FFC; 03A9 0399; # GREEK CAPITAL LETTER OMEGA WITH PROSGEGRA…

# Some characters with YPOGEGRAMMENI also have no corresponding titlecas…

1FB2; 1FB2; 1FBA 0345; 1FBA 0399; # GREEK SMALL LETTER ALPHA WITH VARIA …
1FB4; 1FB4; 0386 0345; 0386 0399; # GREEK SMALL LETTER ALPHA WITH OXIA A…
1FC2; 1FC2; 1FCA 0345; 1FCA 0399; # GREEK SMALL LETTER ETA WITH VARIA AN…
1FC4; 1FC4; 0389 0345; 0389 0399; # GREEK SMALL LETTER ETA WITH OXIA AND…
1FF2; 1FF2; 1FFA 0345; 1FFA 0399; # GREEK SMALL LETTER OMEGA WITH VARIA …
1FF4; 1FF4; 038F 0345; 038F 0399; # GREEK SMALL LETTER OMEGA WITH OXIA A…

1FB7; 1FB7; 0391 0342 0345; 0391 0342 0399; # GREEK SMALL LETTER ALPHA W…
1FC7; 1FC7; 0397 0342 0345; 0397 0342 0399; # GREEK SMALL LETTER ETA WIT…
1FF7; 1FF7; 03A9 0342 0345; 03A9 0342 0399; # GREEK SMALL LETTER OMEGA W…

# ======================================================================…
# Conditional Mappings
# The remainder of this file provides conditional casing data used to pr…
# full case mappings.
# ======================================================================…
# Language-Insensitive Mappings
# These are characters whose full case mappings do not depend on languag…
# depend on context (which characters come before or after). For more in…
# see the header of this file and the Unicode Standard.
# ======================================================================…

# Special case for final form of sigma

03A3; 03C2; 03A3; 03A3; Final_Sigma; # GREEK CAPITAL LETTER SIGMA

# Note: the following cases for non-final are already in the UnicodeData…

# 03A3; 03C3; 03A3; 03A3; # GREEK CAPITAL LETTER SIGMA
# 03C3; 03C3; 03A3; 03A3; # GREEK SMALL LETTER SIGMA
# 03C2; 03C2; 03A3; 03A3; # GREEK SMALL LETTER FINAL SIGMA

# Note: the following cases are not included, since they would case-fold…

# 03C3; 03C2; 03A3; 03A3; Final_Sigma; # GREEK SMALL LETTER SIGMA
# 03C2; 03C3; 03A3; 03A3; Not_Final_Sigma; # GREEK SMALL LETTER FINAL SI…
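
The Final_Sigma rule above depends on the characters around the capital sigma, so it can only be applied to whole strings. As an illustration, Python's built-in `str.lower()` handles this context, as far as I know; the helper name `lower_as_hex` is hypothetical and only formats the result.

```python
def lower_as_hex(text: str) -> str:
    """Return the lowercase of text as space-separated hex code points."""
    return " ".join(f"{ord(c):04X}" for c in text.lower())


if __name__ == "__main__":
    # Capital sigma at the end of a word: becomes final sigma (03C2).
    print(lower_as_hex("\u039f\u0394\u039f\u03a3"))
    # Capital sigma at the start of a word: becomes regular sigma (03C3).
    print(lower_as_hex("\u03a3\u039f\u03a3"))
    # A lone capital sigma has no preceding cased letter: regular sigma.
    print(lower_as_hex("\u03a3"))
```

Lowercasing the same text one character at a time would lose this context and always produce U+03C3.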

# ======================================================================…
# Language-Sensitive Mappings
# These are characters whose full case mappings depend on language and p…
# context (which characters come before or after). For more information
# see the header of this file and the Unicode Standard.
# ======================================================================…

# Lithuanian

# Lithuanian retains the dot in a lowercase i when followed by accents.

# Remove DOT ABOVE after "i" with upper or titlecase

0307; 0307; ; ; lt After_Soft_Dotted; # COMBINING DOT ABOVE

# Introduce an explicit dot above when lowercasing capital I's and J's
# whenever there are more accents above.
# (of the accents used in Lithuanian: grave, acute, tilde above, and ogo…

0049; 0069 0307; 0049; 0049; lt More_Above; # LATIN CAPITAL LETTER I
004A; 006A 0307; 004A; 004A; lt More_Above; # LATIN CAPITAL LETTER J
012E; 012F 0307; 012E; 012E; lt More_Above; # LATIN CAPITAL LETTER I WIT…
00CC; 0069 0307 0300; 00CC; 00CC; lt; # LATIN CAPITAL LETTER I WITH GRAVE
00CD; 0069 0307 0301; 00CD; 00CD; lt; # LATIN CAPITAL LETTER I WITH ACUTE
0128; 0069 0307 0303; 0128; 0128; lt; # LATIN CAPITAL LETTER I WITH TILDE
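
The Lithuanian lowercasing rules above can be sketched as follows. This is an illustration only: the function names `more_above` and `lt_lower` are hypothetical, and `more_above` follows my understanding of the More_Above context in the Unicode Standard (the character is followed by a mark of combining class 230, with no intervening character of combining class 0 or 230). Other characters are lowercased one at a time, so context rules such as Final_Sigma are deliberately not handled here.

```python
import unicodedata

# Precomposed capitals that always lowercase with an explicit dot ("lt;").
LT_PRECOMPOSED = {
    "\u00cc": "i\u0307\u0300",
    "\u00cd": "i\u0307\u0301",
    "\u0128": "i\u0307\u0303",
}
# Capitals that gain a dot only in the More_Above context.
LT_MORE_ABOVE = {"I": "i\u0307", "J": "j\u0307", "\u012e": "\u012f\u0307"}


def more_above(text: str, index: int) -> bool:
    """True if text[index] is followed by a class-230 mark before any starter."""
    for ch in text[index + 1:]:
        ccc = unicodedata.combining(ch)
        if ccc == 230:
            return True
        if ccc == 0:
            return False
    return False


def lt_lower(text: str) -> str:
    """Lowercase text with the Lithuanian rules for I, J and I with ogonek."""
    out = []
    for i, ch in enumerate(text):
        if ch in LT_PRECOMPOSED:
            out.append(LT_PRECOMPOSED[ch])
        elif ch in LT_MORE_ABOVE and more_above(text, i):
            out.append(LT_MORE_ABOVE[ch])
        else:
            out.append(ch.lower())  # simplification: no context rules here
    return "".join(out)


if __name__ == "__main__":
    result = lt_lower("I\u0301")  # capital I followed by combining acute
    print(" ".join(f"{ord(c):04X}" for c in result))
```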

# ======================================================================…

# Turkish and Azeri

# I and i-dotless; I-dot and i are case pairs in Turkish and Azeri
# The following rules handle those cases.

0130; 0069; 0130; 0130; tr; # LATIN CAPITAL LETTER I WITH DOT ABOVE
0130; 0069; 0130; 0130; az; # LATIN CAPITAL LETTER I WITH DOT ABOVE

# When lowercasing, remove dot_above in the sequence I + dot_above, whic…
# This matches the behavior of the canonically equivalent I-dot_above

0307; ; 0307; 0307; tr After_I; # COMBINING DOT ABOVE
0307; ; 0307; 0307; az After_I; # COMBINING DOT ABOVE

# When lowercasing, unless an I is before a dot_above, it turns into a d…

0049; 0131; 0049; 0049; tr Not_Before_Dot; # LATIN CAPITAL LETTER I
0049; 0131; 0049; 0049; az Not_Before_Dot; # LATIN CAPITAL LETTER I

# When uppercasing, i turns into a dotted capital I

0069; 0069; 0130; 0130; tr; # LATIN SMALL LETTER I
0069; 0069; 0130; 0130; az; # LATIN SMALL LETTER I

# Note: the following case is already in the UnicodeData.txt file.

# 0131; 0131; 0049; 0049; tr; # LATIN SMALL LETTER DOTLESS I
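
The Turkish and Azeri rules above can be approximated by rewriting the affected characters before handing the string to the default case conversion. The sketch below is a simplification, and the names `tr_lower` and `tr_upper` are hypothetical: it treats a capital I as "before dot" only when U+0307 follows it immediately, whereas the casing context in the Unicode Standard also allows other combining marks in between.

```python
def tr_lower(text: str) -> str:
    """Lowercase text with the Turkish/Azeri rules for I (simplified)."""
    # I + combining dot above: the dot is removed and the I becomes "i".
    text = text.replace("I\u0307", "i")
    # Capital I with dot above lowercases to plain "i".
    text = text.replace("\u0130", "i")
    # Any remaining capital I is not before a dot: it becomes dotless i.
    text = text.replace("I", "\u0131")
    return text.lower()


def tr_upper(text: str) -> str:
    """Uppercase text with the Turkish/Azeri rule that i maps to U+0130."""
    # Dotless i already uppercases to "I" through the default mapping.
    return text.replace("i", "\u0130").upper()


if __name__ == "__main__":
    print(tr_lower("ISTANBUL"))
    print(tr_lower("\u0130STANBUL"))
    print(tr_upper("istanbul"))
```

Doing the replacements first and calling the built-in conversion afterwards keeps whole-string context, such as the final sigma rule, intact for the rest of the text.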

# EOF