EPUB2TXT by Ben Collver
=======================
* Description
* DOS Instructions
* Unix Instructions
* Requirements
* Known Limitations
* Regression testing
* Source code

Description
===========
epub2txt.awk converts EPUB files to plain text.  It can run on DOS
and Unix.

Output:

* index.html         - HTML content, restructured for readability.
                      HTML content requires LFN support on DOS.
* plaintxt/index.txt - Plain text with UTF-8 encoding
* plaintxt/index.dos - Plain text with CP437 encoding, hard wrapped

DOS Instructions
================

* LFN (Long File Name) support is required to run this script on DOS.
 If your DOS lacks LFN support, run:

   doslfn.com

* Make sure the EPUB2TXT environment variable is set to the directory
 where epub2txt is installed.

   SET EPUB2TXT=C:\epub2txt

* Change to the directory where you want the output to go

   cd \book

* Run the script

   \epub2txt\epub2txt.bat \Documents\book.epub

It can take a long time to process, depending on the size of the EPUB
file and how puny the DOS machine is.  Be patient and perhaps go do
something else for a while.

WARNING: Don't use GAWK.EXE to run this script on DOS.

When I tested this script using DJGPP GAWK.EXE on FreeDOS, it was
prone to FAT corruption.  It seemed to be affected by the BUFFERS=
setting and the memory manager, but I could not find a stable,
working configuration.  So I used NAWK32.EXE instead.

Unix Instructions
=================

* Make sure all the required utilities are in your path

* Change to the directory where you want the output to go

   cd ~/book

* Run the script

   ~/epub2txt/epub2txt.awk ~/Documents/book.epub

It can take a while to process.

Requirements
============
I have already included the utilities required to run this script on
DOS.  They are listed in parentheses below.

This script requires the following commands in your path:

* cp (gnucp.exe, from DJGPP cp.exe)
* awk (nawk32.exe)
* find (gnufind.exe, from DJGPP find.exe)
* unzip (https://infozip.sf.net/)
* utf8tocp (gopher://tilde.pink/1/~bencollver/files/dos/util/utf8tocp/)
* webdump (gopher://codemadness.org/1/phlog/webdump/)
* xml2tsv.awk (xml2tsv.bat)
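
On Unix, a quick way to confirm that the commands above are reachable
is a small pre-flight check.  The following is only a sketch and is
not part of epub2txt; the function name check_cmds is made up for
this example, and it assumes a POSIX shell.

```shell
#!/bin/sh
# Print each command that cannot be found in PATH.
# Returns 0 if all were found, 1 otherwise.
check_cmds() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

# The Unix command names from the list above.
check_cmds cp awk find unzip utf8tocp webdump xml2tsv.awk ||
    echo "Install the missing commands before running epub2txt.awk"
```

If nothing is printed, every listed command was found.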

On DOS, this script also requires the following in your path:

* comp.com (from FreeDOS)
* deltree.com (from FreeDOS)
* doslfn.com (from FreeDOS)
* redir.exe (from DJGPP)

Notes:

* This script uses a modified version of utf8tocp in order to
 transliterate "unknown" Unicode codepoints to meaningful CP437
 and ASCII equivalents.
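
To give a rough idea of what transliteration means here, the sketch
below replaces a few typographic characters with ASCII equivalents
using sed.  It is only an illustration and is not how the modified
utf8tocp works; the function name to_ascii and the handful of
mappings are assumptions chosen for the example.

```shell
#!/bin/sh
# Replace curly quotes and the ellipsis character with plain ASCII.
# Each pattern is a literal UTF-8 character, so sed matches it as a
# fixed byte sequence.
to_ascii() {
    sed -e 's/“/"/g' -e 's/”/"/g' \
        -e "s/‘/'/g" -e "s/’/'/g" \
        -e 's/…/.../g'
}

printf '%s\n' '“It’s done…”' | to_ascii
```

Characters without a mapping pass through unchanged, so a real
converter needs a much larger table plus a fallback for anything it
does not recognize.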

Known Limitations
=================
* No support for SVG images
* No precautions against malicious/pathological EPUB files

Regression testing
==================
See tests/readme.txt for details on regression testing.

On DOS, the test scripts rely on deltree.com.

Source code
===========
Download or view the source code at:

https://chiselapp.com/user/bencollver/repository/epub2txt