kconv version 0.3 -- readme

Copyright (C) 1996, Eccosys Co. Ltd. (Sen Nagata <[email protected]>)

kconv is a module for Python (tested on 1.3 and 1.4) that converts text
from one Japanese encoding scheme (JIS, SJIS (Shift JIS), or EUC) to
another scheme in the same group.  In other words, you should be able to
convert from JIS to SJIS or EUC, from SJIS to JIS or EUC, and from EUC to
JIS or SJIS.  It is based on the Kconv kanji conversion package for perl,
which in turn is based on nkf (network kanji filter).
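
For readers on a current Python 3, the set of conversions described above
can be sketched with the codecs bundled in the standard library.  This is
not kconv's interface (this readme does not document it); the convert()
helper and the CODECS table are hypothetical names used only for
illustration, and the mapping of JIS/SJIS/EUC to the iso2022_jp, shift_jis
and euc_jp codecs is an assumption about what each label means.

```python
# Illustration only: NOT the kconv API.  Uses codecs shipped with Python 3.
# Assumed mapping from the labels used in this readme to stdlib codec names.
CODECS = {"JIS": "iso2022_jp", "SJIS": "shift_jis", "EUC": "euc_jp"}


def convert(data: bytes, src: str, dst: str) -> bytes:
    """Re-encode data from one of JIS/SJIS/EUC into another of the three."""
    text = data.decode(CODECS[src])   # bytes in the source scheme -> text
    return text.encode(CODECS[dst])   # text -> bytes in the target scheme


# Round trip: EUC -> SJIS -> JIS -> EUC.
sample = "日本語".encode("euc_jp")
sjis = convert(sample, "EUC", "SJIS")
jis = convert(sjis, "SJIS", "JIS")
back = convert(jis, "JIS", "EUC")
```

Going through decoded text keeps the helper short, at the cost of
rejecting input that is not valid in the scheme named by src.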

nkf was originally written by Itaru Ichikawa of Fujitsu LTD.  See nkf.c
for his copyright notice.

This code uses a modified nkf.c written by Ikuo Nakagawa for the perl
Kconv package.  (ftp://ftp.intec.co.jp/pub/utils/ -- the file name should
look like Kconv-1.1.tar.gz)

I am not sure of what terms to apply to this software, but since it would
be silly not to specify any, I will temporarily use a notice similar to
the one I found in the original nkf.c.

       Copyright (C) 1996, Eccosys Co. Ltd. (Sen Nagata)
       For the portion of the software written by Sen Nagata
       (this excludes the included files nkf.c and kconv.h),
       everyone is permitted to do anything on this program
       including copying, modifying, improving as long as you don't
       try to make money off it, or pretend that you wrote it.
       i.e., the above copyright notice has to appear in all copies.
       THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE.

I think this is consistent with the previous copyright statements.
(Note: I'm not firm on the above copyright statement holding for future
versions (> 0.3) of the software.)

Look in the file INSTALL for installation instructions.

Sen Nagata ([email protected])

P.S.  If anyone knows of other Japanese conversion packages for Python,
     please let me know!

--------------------------------------------------------------------------

Some Questions
--------------

- It's clear how to define python 'functions' using C libraries...so,
 how do you define 'constants' or 'variables-with-values'?
 In particular, how do I get 'AUTO', 'JIS', etc. to have appropriate
 values through the process of 'import kconv'?

- How useful would it be to have a 'python-native' kanji conversion
 ability?  I guess it would save people the trouble of trying to
 get the C code to compile on non-UNIX platforms.  Practically
 speaking, how good would a port of jcode for perl be?

- Am I dealing with reference counting correctly in this module?

- Would a code-detection function be useful?  What kind of results
 should such a beast return?  It seems like a good idea for it
 to make a guess if necessary and if it does guess, to indicate that
 it is guessing, and if possible to give some measure of how confident
 it is of its guess.

- Is there a good way to deal with half-width katakana?
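
One possible shape for the code-detection function asked about above, as a
rough sketch rather than anything kconv provides: it returns the guessed
scheme, a flag saying whether it had to guess, and a crude confidence
value.  The names Detection and detect() are hypothetical, the check is
simply "does the input decode cleanly" using Python 3's bundled codecs,
and the confidence figure (1 divided by the number of schemes that fit) is
an assumption chosen for simplicity, not a real statistical measure.

```python
# Sketch only: NOT part of kconv.  Relies on Python 3's bundled codecs.
from typing import NamedTuple


class Detection(NamedTuple):
    encoding: str      # "JIS", "SJIS", "EUC", "ASCII" or "UNKNOWN"
    guessed: bool      # True when the answer is not the only possibility
    confidence: float  # 1.0 if unambiguous, 1/n if n schemes fit, 0.0 if none


# 8-bit schemes that cannot be told apart by escape sequences.
EIGHT_BIT_CODECS = {"SJIS": "shift_jis", "EUC": "euc_jp"}


def _decodes(data: bytes, codec: str) -> bool:
    """Return True if data is valid in the given codec."""
    try:
        data.decode(codec)
    except UnicodeDecodeError:
        return False
    return True


def detect(data: bytes) -> Detection:
    # JIS (ISO-2022-JP) is 7-bit and announces itself with escape sequences.
    if b"\x1b" in data and _decodes(data, "iso2022_jp"):
        return Detection("JIS", False, 1.0)
    # Plain 7-bit text is valid in every scheme, so report it separately.
    if all(byte < 0x80 for byte in data):
        return Detection("ASCII", False, 1.0)
    candidates = [name for name, codec in EIGHT_BIT_CODECS.items()
                  if _decodes(data, codec)]
    if not candidates:
        return Detection("UNKNOWN", True, 0.0)
    # With several candidates, pick the first and admit it is a guess.
    return Detection(candidates[0], len(candidates) > 1,
                     1.0 / len(candidates))
```

The bytes 0xB1 0xB2 show why half-width katakana make this hard: they are
two half-width katakana in SJIS and also a valid two-byte character in
EUC, so detect(b"\xb1\xb2") can only report a guess.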

--------------------------------------------------------------------------

Statement of Thanks
-------------------

Thanks go out to (in no particular order):

- Ted Crossman ([email protected]) for his help in getting this working.
 You were a great help, Ted!  I really appreciate it!

- Guido van Rossum for Python - and everyone else for making Python what
 it is today and contributing to the Python home page at
 http://www.python.org/

- KDD Labs for their mirror of the Python home page at
 http://w3.lab.kdd.co.jp/www.python.org/

- Mark Lutz for the 'Programming Python' book.

- Ken Lunde for the 'Japanese Information Processing' book a.k.a. 'the
 Blowfish book'.

- Itaru Ichikawa for nkf.

- Ikuo Nakagawa for Kconv (and modifications to nkf).

- Shohei Takeuchi for his helpful comments about Kconv.c.

- People who document their code well!!!!  There aren't enough of you out
 there!


--------------------------------------------------------------------------


Various Encoding Scheme Wishes:
-------------------------------

- Single encoding scheme for all languages.
- Abolishment of half-width katakana.
- Existence of an algorithm for telling EUC apart from SJIS 99.9% of
 the time.
- Existence of a similarly reliable algorithm for the detection of
 half-width katakana.