Locales mini-HOWTO
 Peeter Joot
 v1.5, 21 July 1997

 This document describes how to set up your Linux machine to use
 locales.

 1.  Introduction

 This is really a description of what I had to do to get localedef
 installed, compile some locales, and try them out.  I did this just
 for fun, and thought that perhaps some people would be interested in
 trying it out themselves.  Once it is set up you should be able to use
 NLS enabled applications with the locale of your choice.  After a
 while, locale support should be part of the standard distributions,
 and most of this mini-HOWTO will be redundant.

 2.  What is a "locale" anyhow?

 Locales encapsulate some of the language/culture specific things that
 you shouldn't hard code in your programs.

 If you have various locales installed on your computer, you can use
 the following environment variables to select how a locale-sensitive
 program will behave.  The default locale is the C, or POSIX, locale,
 which is hard coded in libc.

    LANG
       This sets the locale, but can be overridden by any of the other
       LC_xxxx environment variables.

    LC_COLLATE
       Sort order.

    LC_CTYPE
       Character definitions, uppercase, lowercase, ...  These are used
       by the functions like toupper, tolower, islower, isdigit, ...

    LC_MONETARY
       Contains the information necessary to format money in the
       fashion expected.  It has the definitions of things like the
       thousands separator, decimal separator, and what the monetary
       symbol is and how to position it.

    LC_NUMERIC
       Thousands, and decimal separators, and the numeric grouping
       expected.

    LC_TIME
       How to specify the time, and date.  This has the things like the
       days of the week, and months of the year in abbreviated, and non
       abbreviated form.

    LC_MESSAGES
       Yes and no expressions.  This category also selects the
       language of the message catalogs that a program loads.

    LC_ALL
       This sets the locale, and overrides any other LC_xxxx
       environment variables.
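
 The precedence among these variables can be sketched in C.  This is
 only an illustration of the lookup order described above (LC_ALL
 first, then the category's own variable, then LANG); the helper name
 effective_locale_name is made up for this example and is not a libc
 function, and treating empty variables as unset is an assumption
 taken from the usual POSIX behaviour.

```c
#include <stdlib.h>

/* Return the value of an environment variable, or NULL if it is
 * unset or empty (an empty value is treated like an unset one). */
static const char *nonempty_env(const char *name)
{
    const char *value = getenv(name);
    return (value != NULL && value[0] != '\0') ? value : NULL;
}

/* Return the locale name that would apply to one category, given the
 * name of that category's variable (for example "LC_TIME").
 * Hypothetical helper: it only mirrors the lookup order, it does not
 * check whether the locale is actually installed. */
const char *effective_locale_name(const char *category_var)
{
    const char *value;

    if ((value = nonempty_env("LC_ALL")) != NULL)
        return value;               /* LC_ALL overrides everything */
    if ((value = nonempty_env(category_var)) != NULL)
        return value;               /* then the category itself */
    if ((value = nonempty_env("LANG")) != NULL)
        return value;               /* LANG is the general default */
    return "C";                     /* built-in fallback */
}
```

 For example, with LANG=en_CA and LC_TIME=fr_CA in the environment,
 effective_locale_name("LC_TIME") picks fr_CA while
 effective_locale_name("LC_NUMERIC") picks en_CA.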

 Here are some example locale names; there are many more.

    en_CA
       English Canadian.

    en_US
       US English.

    de_DE
       Germany's German.

    fr_FR
       France's French.

 If you are writing a program and want it to be usable internationally,
 you should use locales.  The most glaring reason for this is that
 not everybody is going to use the same character set/code page as you.

 Make sure in your programs that you don't do things like:

      /* check for alphabetic characters */
      if ( (( c >= 'a') && ( c <= 'z' )) ||
           (( c >= 'A') && ( c <= 'Z' )) ) { ... }

 If you write that type of code, your program assumes that the
 user/file/... is ASCII and nothing but ASCII, and it does not respect
 the code page definitions of the user's locale.  For example, it
 precludes characters such as a-umlaut, which would be used in a German
 environment.  What you should do instead is use the locale sensitive
 functions like isalpha().  If your program does explicitly require use
 of only US-ASCII alphabetics, you can still use the isalpha() function,
 but you must also either call setlocale(LC_CTYPE,"C") or set the LANG,
 LC_CTYPE, or LC_ALL environment variables to "C".

 Locales allow a large degree of flexibility and make certain
 assumptions that a programmer may have made in ASCII based C programs
 invalid.

 For instance, you cannot assume the code positions of characters.
 There is nothing stopping you from creating a charmap file that
 defines the code position of 'A' to be 0xC1 rather than 0x41.  0xC1 is
 in fact the code point for 'A' in IBM code page 37, used on
 mainframes, while 0x41 is used for US-ASCII, iso8859-x, and
 others.
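
 One common place where this assumption sneaks in is arithmetic such
 as c - 'A'.  In EBCDIC code pages like code page 37 the letters A-I,
 J-R and S-Z sit in three separate ranges, so that arithmetic gives
 wrong answers there.  The sketch below avoids it with a lookup table;
 the function name uppercase_letter_index is made up for this example.

```c
#include <string.h>

/* Return the 0-based position of an upper-case Latin letter in the
 * alphabet ('A' gives 0, 'Z' gives 25), or -1 if c is not one.
 * The table lookup does not care where the letters sit in the
 * execution character set. */
int uppercase_letter_index(char c)
{
    static const char alphabet[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const char *found;

    if (c == '\0')      /* strchr() would match the terminator */
        return -1;

    found = strchr(alphabet, c);
    return found != NULL ? (int)(found - alphabet) : -1;
}
```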

 The basic idea is different people speak different languages, expect
 different sorting orders, use different code pages, and live in
 different countries.  Locales and locale sensitive programs give one a
 means to respect such things, and handle them accordingly.  It is not
 really much extra work to do so, it just requires a slightly different
 frame of mind when writing programs.

 3.  Notes.

 *  In order to set up locales on my machine I had to upgrade a few
    things.  Apparently ftp.tu-clausthal.de:/pub/linux/SLT/nls contains
    an a.out version of locale and localedef (in the file
    nlsutils-0.5.tar.gz), so if you don't have an ELF system, or don't
    want one, you can use that instead.  There is probably a copy of
    the nlsutils package some other place, but I have not looked for
    it.  I hadn't known that there was a stand-alone version of locale
    and localedef, and figured that you would have to have the
    corresponding libc installed.  Because of this, a lot of this HOWTO
    is just a log of what I had to do to upgrade libc and family.  If
    you do this as I have, you will need to be running an ELF system,
    or upgrade to one as you set up your locales.

 *  The system upgrades that I did are the same sort of upgrades that
    have to be done to upgrade from a.out to ELF.  If you haven't done
    this, or if you have upgraded to ELF by reinstalling Linux, then
    you should get the recent ELF HOWTO from a sunsite mirror.  This
    is an excellent guide, and gives additional guidance for
    installing libc, ld.so, and other ELF system upgrades.

 *  For anything that you install, read the appropriate release notes,
    or README type files.  If you mess up your system by
    misinterpreting something that I say here, or (hopefully not) by
    doing something that I say in here, please don't blame me.

 *  Mis-installing a new libc and ld.so could leave you with an
    unbootable system.  You probably ought to have a boot disk handy,
    and make sure any critical, non-replaceable data is backed up.

 4.  What you need.

 A few things need to be downloaded from various places.  Everything
 here except for the locale source files can be obtained from
 sunsite.unc.edu, tsx-11.mit.edu, or, preferably, a local mirror of
 these sites.  When I did this originally I used libc-5.2.18, which is
 now quite out of date.  As of now I have been told that the current
 libc is 5.4.17, and this substitution has been made below.  However,
 libc 5.4.17 will likely be old before you can blink, so just use the
 latest version when you do this.

 You may want to consider using glibc (GNU libc) rather than Linux libc
 5 for any internationalization work.  As of now glibc 2.0.4 is
 available, but no distributions have started using it as the
 standard libc yet (at least for Intel based Linux distributions).  As
 well as being fully reentrant and having built in threading support,
 glibc is fully internationalized and has excellent
 internationalization support for programming.  What
 internationalization has been done in libc 5 has been mostly taken
 from glibc.  The locales and charmaps for glibc are bundled with the
 glibc locale add on.

 If you opt for using glibc then you can ignore this mini-HOWTO.
 Including the locale add on in the glibc compilation and installation
 is trivial, and is covered in the glibc installation documentation.
 Be warned that a full upgrade is not a trivial job!  I am hoping that
 Red Hat (which I use) will have a glibc based release soon, as I am
 not inclined to recompile my entire system.

 *  locale and charmap sources --- These are what you compile using
    localedef.

 *  libc-5.4.17.bin.tar.gz --- the ELF shared libraries for the c and
    math libraries.  Note that the precompiled program localedef for
    libc-5.4.17 is apparently corrupt and creates LC_CTYPE with an
    invalid magic number.  This probably means that an older localedef
    got into the binary distribution.

 *  libc-5.4.17.tar.gz --- the source code for the ELF shared
    libraries.  You may need this to compile localedef.

 *  make-3.74.tar.gz --- you may need to compile make to incorporate a
    patch to fix the dirent bug.

 *  release.libc-5.2.18 --- these release notes have the patch to make
    make.  It's been a while since this make bug happened, and it is
    likely that you don't have to worry about it.

 *  ld.so-1.7.12+ --- the dynamic linker.

 *  ELF gcc-2.7.2+ --- to compile things.

 *  an ELF kernel (e.g. 2.0.xx) --- to compile things.

 *  binutils 2.6.0.2+ --- to compile things.

 There are probably lots of places that you can get locale sources.  I
 have found public domain locale and charmap sources at
 dkuug.dk:/i18n/WG15-collection/locales
 <ftp://dkuug.dk/i18n/WG15-collection/locales> and
 dkuug.dk:/i18n/WG15-collection/charmaps
 <ftp://dkuug.dk/i18n/WG15-collection/charmaps>  respectively.

 5.  Installing everything.

 This is what I did to install everything.  I already had an ELF system
 ( compiler, kernel, ... ) installed before I did this.

 1. First I installed the binutils package.

      tar xzf binutils-2.6.0.2.bin.tar.gz -C /

 2. Next I installed the dynamic linker:

      tar zxf ld.so-1.7.12.tar.gz -C /usr/src
      cd /usr/src/ld.so-1.7.12
      sh instldso.sh

 3. Next I installed the libc binaries.  See release.libc-5.4.17 for
    more instructions.

      rm -f /usr/lib/libc.so /usr/lib/libm.so
      rm -f /usr/include/iolibio.h /usr/include/iostdio.h
      rm -f /usr/include/ld_so_config.h /usr/include/localeinfo.h
      rm -rf /usr/include/netinet /usr/include/net /usr/include/pthread
      tar -xzf libc-5.4.17.bin.tar.gz -C /

 4. Now ldconfig must be run to locate the new shared libraries.

      ldconfig -v

 5. There is a bug that was fixed in libc that breaks make, and some
    other programs.  Here is what I did in order to rebuild and install
    make.

      tar zxf make-3.74.tar.gz -C /usr/src
      cd /usr/src/make-3.74
      patch < /wherever_you_put_it/release.libc-5.4.17
      ./configure --prefix=/usr
      sh build.sh
      ./make install
      cd ..
      rm -rf make-3.74

 6. Now localedef can be compiled and installed.

      mkdir /usr/src/libc
      tar zxf libc-5.4.17.tar.gz -C /usr/src/libc
      cd /usr/src/libc
      cd include
      ln -s /usr/src/linux/include/asm .
      ln -s /usr/src/linux/include/linux .
      cd ../libc
      ./configure
      # I am not sure if these two makes are necessary, but just to be safe :
      make clean ; make depend
      cd locale
      make programs
      mv localedef /usr/local/bin
      mv locale /usr/local/bin

 7. Put the charmaps where localedef will find them.  This uses the
    charmaps and locale sources which I downloaded from the dkuug.dk
    ftp site as charmaps.tar and locales.tar respectively.  The older
    localedef (5.2.18) looked in /usr/share/nls/charmap for charmap
    sources, but now localedef looks in /usr/share/i18n/charmaps and
    /usr/share/i18n/locales by default for the charmap and locale
    sources:

      mkdir /usr/share/i18n
      mkdir /usr/share/i18n/charmaps
      mkdir /usr/share/i18n/locales
      tar xf charmaps.tar -C /usr/share/i18n/charmaps
      tar xf locales.tar -C /usr/share/i18n/locales

 The newer localedef (5.4.17) has been made smarter and will look for
 other locale source files when handling the `copy' statement, whereas
 the older localedef needed to have the locale objects already created
 in order to handle the copy statement.  The following list of commands
 has the dependencies sorted out and can be used to generate all the
 locale objects regardless of which libc version is being used, but
 with the newer localedef you can create only the ones that you wish.

      localedef -ci en_DK -f ISO_8859-1:1987 en_DK
      localedef -ci sv_SE -f ISO_8859-1:1987 sv_SE
      localedef -ci fi_FI -f ISO_8859-1:1987 fi_FI
      localedef -ci sv_FI -f ISO_8859-1:1987 sv_FI
      localedef -ci ro_RO -f ISO_8859-1:1987 ro_RO
      localedef -ci pt_PT -f ISO_8859-1:1987 pt_PT
      localedef -ci no_NO -f ISO_8859-1:1987 no_NO
      localedef -ci nl_NL -f ISO_8859-1:1987 nl_NL
      localedef -ci fr_BE -f ISO_8859-1:1987 fr_BE
      localedef -ci nl_BE -f ISO_8859-1:1987 nl_BE
      localedef -ci da_DK -f ISO_8859-1:1987 da_DK
      localedef -ci kl_GL -f ISO_8859-1:1987 kl_GL
      localedef -ci it_IT -f ISO_8859-1:1987 it_IT
      localedef -ci is_IS -f ISO_8859-1:1987 is_IS
      localedef -ci fr_LU -f ISO_8859-1:1987 fr_LU
      localedef -ci fr_FR -f ISO_8859-1:1987 fr_FR
      localedef -ci de_DE -f ISO_8859-1:1987 de_DE
      localedef -ci de_CH -f ISO_8859-1:1987 de_CH
      localedef -ci fr_CH -f ISO_8859-1:1987 fr_CH
      localedef -ci en_CA -f ISO_8859-1:1987 en_CA
      localedef -ci fr_CA -f ISO_8859-1:1987 fr_CA
      localedef -ci fo_FO -f ISO_8859-1:1987 fo_FO
      localedef -ci et_EE -f ISO_8859-1:1987 et_EE
      localedef -ci es_ES -f ISO_8859-1:1987 es_ES
      localedef -ci en_US -f ISO_8859-1:1987 en_US
      localedef -ci en_GB -f ISO_8859-1:1987 en_GB
      localedef -ci en_IE -f ISO_8859-1:1987 en_IE
      localedef -ci de_LU -f ISO_8859-1:1987 de_LU
      localedef -ci de_BE -f ISO_8859-1:1987 de_BE
      localedef -ci de_AT -f ISO_8859-1:1987 de_AT
      localedef -ci sl_SI -f ISO_8859-2:1987 sl_SI
      localedef -ci ru_RU -f ISO_8859-5:1988 ru_RU
      localedef -ci pl_PL -f ISO_8859-2:1987 pl_PL
      localedef -ci lv_LV -f BALTIC lv_LV
      localedef -ci lt_LT -f BALTIC lt_LT
      localedef -ci iw_IL -f ISO_8859-8:1988 iw_IL
      localedef -ci hu_HU -f ISO_8859-2:1987 hu_HU
      localedef -ci hr_HR -f ISO_8859-4:1988 hr_HR
      localedef -ci gr_GR -f ISO_8859-7:1987 gr_GR
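
 Once some locale objects have been generated, a program can check
 whether a particular one loads.  The following is a rough sketch
 under my own assumptions: the helper name locale_is_installed is
 invented, and "installed" here simply means that setlocale() accepts
 the name for the given category.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if setlocale() can load the named locale for the category,
 * 0 if it cannot, and -1 if the check itself could not be made.
 * The program's previous setting for the category is restored. */
int locale_is_installed(int category, const char *name)
{
    const char *current = setlocale(category, NULL);
    char *saved;
    size_t length;
    int loaded;

    if (current == NULL)
        return -1;                  /* invalid category */

    /* The string returned by setlocale() may be overwritten by the
     * next call, so keep a private copy of it. */
    length = strlen(current) + 1;
    saved = malloc(length);
    if (saved == NULL)
        return -1;
    memcpy(saved, current, length);

    loaded = (setlocale(category, name) != NULL);

    setlocale(category, saved);     /* put things back as they were */
    free(saved);
    return loaded;
}
```

 For instance, locale_is_installed(LC_TIME, "fr_CA") should report 1
 after the fr_CA object above has been created successfully.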

 6.  Now what.

 After doing all the stuff above you should now be able to use the
 locales that have been created.  Here is a simple example program.

      /* test.c : a simple test to see if the locales can be loaded, and
       * used */
      #include <locale.h>
      #include <stdio.h>
      #include <time.h>

      int main(void)
      {
        time_t t;
        struct tm *now;
        char buf[256];

        time(&t);
        now = gmtime(&t);

        /* take the LC_TIME setting from the environment */
        setlocale(LC_TIME, "");
        strftime(buf, sizeof buf, "%c", now);

        printf("%s\n", buf);
        return 0;
      }

 You can use the locale program to see what your current locale
 environment variable settings are.

      $ # compile the simple test program above, and run it with
      $ # some different locale settings
      $ gcc -s -o Test test.c
      $ # see what the current locale is :
      $ locale
      LANG=POSIX
      LC_COLLATE="POSIX"
      LC_CTYPE="POSIX"
      LC_MONETARY="POSIX"
      LC_NUMERIC="POSIX"
      LC_TIME="POSIX"
      LC_MESSAGES="POSIX"
      LC_ALL=
      $ # Ho, hum... we're using the boring C locale
      $ # let's change to English Canadian:
      $ export LC_TIME=en_CA
      $ ./Test
      Sat 23 Mar 1996 07:51:49 PM
      $ # let's try French Canadian:
      $ export LC_TIME=fr_CA
      $ ./Test
      sam 23 mar 1996 19:55:27
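
 The test program above does not look at the return value of
 setlocale(), so a missing locale goes unnoticed and the output simply
 stays in the C locale format.  A variation that reports the failure
 might look like the sketch below; the function name
 format_time_in_locale and its calling convention are my own choices
 for the example.

```c
#include <locale.h>
#include <stddef.h>
#include <time.h>

/* Format the time `when` (taken as UTC) with "%c" under the named
 * LC_TIME locale.  Returns 0 on success, or -1 if the locale cannot
 * be loaded or the result does not fit in buf.  On success LC_TIME
 * stays set to the requested locale. */
int format_time_in_locale(const char *locale_name, time_t when,
                          char *buf, size_t bufsize)
{
    struct tm *utc;

    if (setlocale(LC_TIME, locale_name) == NULL)
        return -1;              /* locale is not installed */

    utc = gmtime(&when);
    if (utc == NULL)
        return -1;

    if (strftime(buf, bufsize, "%c", utc) == 0)
        return -1;              /* buffer too small */

    return 0;
}
```

 Passing "" as the locale name takes the setting from the environment,
 as the original program does.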

 7.  catopen bug fix.

 Installing the locales fixes a bug (feature?) that is in the catopen
 function in Linux libc.  Say you create a program that uses message
 catalogs, and you create a German catalog and put it in
 /home/peeter/catalogs/de_DE.

 Now upon doing the following, without the de_DE locale installed :

      export LC_MESSAGES=de_DE
      export NLSPATH=/home/peeter/catalogs/%L/%N.cat:$NLSPATH

 the German message catalog does not get opened, and the default
 messages in the catgets calls are used.

 This is because catopen does a setlocale call to get the right message
 category, and that setlocale call fails even though the environment
 variable has been set.  catopen then attempts to load the message
 catalog substituting "C" for all the "%L"'s in the NLSPATH.

 You can still use your message catalog without installing the locale,
 but you would have to explicitly set the "%L" part of the NLSPATH like

      export NLSPATH=/home/peeter/catalogs/de_DE/%N.cat:$NLSPATH

 However, this defeats the whole purpose of the locale category
 environment variables.
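
 A program can at least make this failure visible rather than silently
 falling back to the default messages.  The sketch below is one way to
 do that; the function names open_catalog_checked and
 message_or_default are invented for the example, and it uses the
 POSIX flag NL_CAT_LOCALE rather than the libc 5 specific MCLoadBySet
 seen elsewhere in this document.

```c
#include <locale.h>
#include <nl_types.h>
#include <stdio.h>

/* Set LC_MESSAGES from the environment, warn if that fails, and then
 * try to open the named message catalog.  Returns (nl_catd)-1 if the
 * catalog could not be opened. */
nl_catd open_catalog_checked(const char *name)
{
    if (setlocale(LC_MESSAGES, "") == NULL) {
        fprintf(stderr, "warning: setlocale(LC_MESSAGES, \"\") failed; "
                        "is the locale installed?\n");
    }
    return catopen(name, NL_CAT_LOCALE);
}

/* Look a message up, falling back to the built-in text when the
 * catalog is not open or has no such message. */
const char *message_or_default(nl_catd catd, int set_id, int msg_id,
                               const char *fallback)
{
    if (catd == (nl_catd)-1)
        return fallback;
    return catgets(catd, set_id, msg_id, fallback);
}
```

 A catalog that was opened successfully should still be released with
 catclose() when the program is done with it.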

 8.  Questions and Answers.

 This section could grow into a FAQ, but isn't really one yet.

 8.1.  msgcat question

 I am a user of Linux, and have written the following test program:

      --------------------------------------------------------------------
      #include <stdio.h>
      #include <locale.h>
      #include <features.h>
      #include <nl_types.h>

      int main(int argc, char ** argv)
      {
       nl_catd catd;

       setlocale(LC_MESSAGES, "");
       catd = catopen("msg", MCLoadBySet);
       /* print the message as data, not as the format string */
       fprintf(stderr, "%s", catgets(catd, 1, 1, "locale message fail\n"));
       catclose(catd);
       return 0;
      }
      --------------------------------------------------------------------
      $ msg.m
      $set 1

      1 locale message pass\n
      --------------------------------------------------------------------

 If I use an absolute path in catopen, as in
 catopen("/etc/locale/msg.cat",MCLoadBySet);, I get the right result.
 But if I use the example above, catopen returns -1 (failure).

 8.2.  msgcat answer

 This question is sort of answered in the previous section, but here is
 some additional information.

 There are a number of valid places where you can put your message
 catalogs.  Even though you may not have NLSPATH explicitly defined in
 your environment settings it is defined in libc as follows :

      $ strings /lib/libc.so.5.4.17 | grep locale | grep %L
      /etc/locale/%L/%N.cat:/usr/lib/locale/%L/%N.cat:/usr
      /lib/locale/%N/%L:/usr/share/locale/%L/%N.cat:/usr/
      local/share/locale/%L/%N.cat

 Suppose you have done one of:

      $ export LC_MESSAGES=en_CA
      $ export LC_ALL=en_CA
      $ export LANG=en_CA

 With the NLSPATH above and one of these environment settings, the call
 catopen("msg", MCLoadBySet); should work if your message catalog has
 been copied to any one of:

      /etc/locale/en_CA/msg.cat
      /usr/lib/locale/en_CA/msg.cat
      /usr/lib/locale/msg/en_CA
      /usr/share/locale/en_CA/msg.cat
      /usr/local/share/locale/en_CA/msg.cat

 This, however, will not work if you don't have the en_CA locale
 installed because the setlocale will fail, and "C" will be substituted
 for "%L" in the catopen routine ( rather than "en_CA" ).
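
 To make the substitution concrete, here is a small sketch of how one
 NLSPATH entry turns into a file name.  It is not the libc
 implementation: the function name expand_nlspath_entry is invented,
 and it handles only the %L and %N fields used above (POSIX defines
 further fields such as %l, %t, %c and %%, which are copied through
 unchanged here).

```c
#include <stddef.h>
#include <string.h>

/* Expand %L (locale name) and %N (catalog name) in one NLSPATH entry.
 * Returns 0 on success, or -1 if the result does not fit in out. */
int expand_nlspath_entry(const char *entry, const char *locale,
                         const char *name, char *out, size_t outsize)
{
    size_t used = 0;

    if (outsize == 0)
        return -1;

    while (*entry != '\0') {
        const char *piece = entry;  /* default: copy one literal char */
        size_t length = 1;

        if (entry[0] == '%' && entry[1] == 'L') {
            piece = locale;
            length = strlen(locale);
            entry += 2;
        } else if (entry[0] == '%' && entry[1] == 'N') {
            piece = name;
            length = strlen(name);
            entry += 2;
        } else {
            entry += 1;
        }

        if (used + length >= outsize)
            return -1;              /* keep room for the terminator */
        memcpy(out + used, piece, length);
        used += length;
    }

    out[used] = '\0';
    return 0;
}
```

 Expanding "/etc/locale/%L/%N.cat" with the locale "en_CA" and the
 name "msg" gives /etc/locale/en_CA/msg.cat, whereas after a failed
 setlocale the locale part would be "C" and the result would be
 /etc/locale/C/msg.cat.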

 9.  More information.

 Well, that's it.  Hopefully this guide has been some help to you.
 There are probably lots of places that you can look for additional
 information on writing locale sensitive programs, and documents on
 internationalization and localization in general.  I'll bet that if
 you browse the web a bit you will be able to find a lot of info.
 Ulrich Drepper, who implemented much of the GNU internationalization
 code, has some information about internationalization and localization
 on his home page <http://i44www.info.uni-karlsruhe.de/~drepper>, and
 you can look there to start.  There is also some information in the
 info pages for libc, and of course, there are always man pages.