EPUB2TXT by Ben Collver <[email protected]>
===============================================
* Description
* DOS Instructions
* Unix Instructions
* Requirements
* Known Limitations
* Regression testing
* Source code
Description
===========
epub2txt.awk converts EPUB files to plain text. It can run on DOS
and Unix.
Output:
* index.html - HTML content, restructured for readability.
HTML content requires LFN support on DOS.
* plaintxt/index.txt - Plain text with UTF-8 encoding
* plaintxt/index.dos - Plain text with CP437 encoding, hard wrapped
DOS Instructions
================
* LFN (Long File Name) support is required to run this script on DOS.
If your DOS lacks LFN support, run:
doslfn.com
* Make sure the EPUB2TXT environment variable matches your directory
path.
SET EPUB2TXT=C:\epub2txt
* Change to the directory where you want the output to go
cd \book
* Run the script
\epub2txt\epub2txt.bat \Documents\book.epub
It can take a long time to process, depending on the size of the EPUB
file and how puny the DOS machine is. Be patient and perhaps go do
something else for a while.
WARNING: Don't use GAWK.EXE to run this script on DOS.
When I tested this script using DJGPP GAWK.EXE on FreeDOS, it was
prone to FAT corruption. It seemed to be affected by the BUFFERS=
setting and the memory manager, but I could not find a stable,
working configuration. So I used NAWK32.EXE instead.
Unix Instructions
=================
* Make sure all the required utilities are in your path
* Change to the directory where you want the output to go
cd ~/book
* Run the script
~/epub2txt/epub2txt.awk ~/Documents/book.epub
It can take a while to process.
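Once the run finishes, one way to confirm it produced the files
listed under Description is a small shell check like the sketch
below. The function name missing_outputs is hypothetical and is not
part of epub2txt; the file names come from the Output list above.

```sh
#!/bin/sh
# Hypothetical helper, not part of epub2txt.
# Prints each expected output file that is missing from a directory.
# The directory defaults to the current one.
missing_outputs() {
    dir=${1:-.}
    for f in index.html plaintxt/index.txt plaintxt/index.dos; do
        [ -f "$dir/$f" ] || printf '%s\n' "$f"
    done
}

# Check the current directory. No output means all three files exist.
missing_outputs .
```

Run it from the directory you changed to before starting the script.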
Requirements
============
I have already included the required utilities to run this script on
DOS. I list them in parentheses below.
This script requires the following commands in your path:
* cp (gnucp.exe, from DJGPP cp.exe)
* awk (nawk32.exe)
* find (gnufind.exe, from DJGPP find.exe)
* unzip (https://infozip.sf.net/)
* utf8tocp (gopher://tilde.pink/1/~bencollver/files/dos/util/utf8tocp/)
* webdump (gopher://codemadness.org/1/phlog/webdump/)
* xml2tsv.awk (xml2tsv.bat)
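On Unix, a quick way to see whether these commands are in your path
is a loop over "command -v", as in the sketch below. The function
name missing_tools is hypothetical and is not part of epub2txt. The
command names are taken from the list above; adjust them if your
system installs any of them under a different name.

```sh
#!/bin/sh
# Hypothetical helper, not part of epub2txt.
# Prints the name of each given command that is not found in PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Command names from the list above (Unix names, not the DOS ones).
missing=$(missing_tools cp awk find unzip utf8tocp webdump xml2tsv.awk)
if [ -n "$missing" ]; then
    echo "Not found in PATH:"
    echo "$missing"
else
    echo "All required commands were found."
fi
```

The sketch only reports what is missing. It does not check versions
or whether utf8tocp is the modified build.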
On DOS, this script also requires the following in your path:
* comp.com (from FreeDOS)
* deltree.com (from FreeDOS)
* doslfn.com (from FreeDOS)
* redir.exe (from DJGPP)
Notes:
* This script uses a modified version of utf8tocp in order to
transliterate "unknown" Unicode codepoints to meaningful CP437
and ASCII equivalents.
Known Limitations
=================
* No support for SVG images
* No precautions against malicious/pathological EPUB files
Regression testing
==================
See tests/readme.txt for details on regression testing.
On DOS, the test scripts rely on deltree.com
Source code
===========
Download or view the source code at:
https://chiselapp.com/user/bencollver/repository/epub2txt