2024-08-13 - EPUB2TXT, Convert EPUB To Plain Text On DOS
========================================================

Recently I wrote an XML parser in AWK.

<gopher://tilde.pink/1/~bencollver/log/2024-08-01-xml2tsv>

I let it loose into the black abyss of DOS.  It came back with books
in a bag of holding.

Now i am announcing epub2txt.awk, a script to convert EPUB files to
plain text for reading on DOS.  It relies on a bunch of utilities
to do this conversion.  The only one i modified was UTF8TOCP.COM from
FreeDOS.  I changed it to decode unknown Unicode codepoints into
\uHHHH and \UHHHHHHHH formats instead of just '?'.
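
To illustrate the escape format, here is a minimal Python sketch.  It
is not the actual UTF8TOCP patch.  The function name escape_unknown,
the CP437 default, and the uppercase hex digits are assumptions made
for the example.

```python
def escape_unknown(text, codepage="cp437"):
    """Encode text for a DOS code page, escaping what does not fit.

    Characters missing from the code page become \\uHHHH (up to
    U+FFFF) or \\UHHHHHHHH (above U+FFFF) instead of '?'.
    """
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(codepage)
        except UnicodeEncodeError:
            cp = ord(ch)
            # Assumption: uppercase hex, 4 or 8 digits wide.
            if cp <= 0xFFFF:
                esc = "\\u%04X" % cp
            else:
                esc = "\\U%08X" % cp
            out += esc.encode("ascii")
    return bytes(out)
```

The escapes stay readable in a DOS text viewer, and the original
codepoint can still be looked up later.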

It takes patience because the script can run for a long time.
Once completed, it produces several outputs.

The plaintxt/ directory contains short 8.3 filenames for all of the
images used in the EPUB; index.txt is UTF-8 encoded; index.dos is
CP437 encoded, has hard wrapped paragraphs, and has ASCII-art
transliterations for some Unicode codepoints.
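
As a rough idea of what producing index.dos involves, here is a small
Python sketch, not the AWK code itself.  The TRANSLIT table, the
70-column width, the CR LF line endings, and the function name to_dos
are all assumptions for illustration.

```python
import textwrap

# Hypothetical transliteration table; the real script's table differs.
TRANSLIT = {
    "\u2014": "--",   # em dash
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2026": "...",  # ellipsis
}

def to_dos(text, width=70):
    """Transliterate, hard wrap each paragraph, and encode as CP437."""
    for src, dst in TRANSLIT.items():
        text = text.replace(src, dst)
    wrapped = []
    for para in text.split("\n\n"):
        # Collapse internal whitespace, then wrap to the given width.
        wrapped.append(textwrap.fill(" ".join(para.split()), width=width))
    dos_text = "\n\n".join(wrapped).replace("\n", "\r\n")
    # Anything still unmappable becomes '?' in this sketch.
    return dos_text.encode("cp437", errors="replace")
```

For example, to_dos(open("index.txt", encoding="utf-8").read()) would
return bytes suitable for writing to a file like index.dos.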

The plaintxt/ directory would be a good one to zip up for viewing
on any DOS machine.
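
The short 8.3 image filenames could be derived along these lines.
This is a Python sketch under my own assumptions (the ~N collision
suffix, the IMG fallback stem, and the function name short_names are
hypothetical), not the scheme epub2txt.awk actually uses.

```python
import os
import re

def short_names(names):
    """Map long filenames to unique DOS 8.3 names."""
    used = set()
    mapping = {}
    for name in names:
        stem, ext = os.path.splitext(os.path.basename(name))
        # Keep only characters that are safe in DOS filenames.
        stem = re.sub(r"[^A-Z0-9_-]", "", stem.upper()) or "IMG"
        ext = re.sub(r"[^A-Z0-9]", "", ext.upper())[:3]
        n = 0
        while True:
            # On a collision, trade trailing characters for ~1, ~2, ...
            suffix = "~%d" % n if n else ""
            short = stem[:8 - len(suffix)] + suffix
            if ext:
                short += "." + ext
            if short not in used:
                break
            n += 1
        used.add(short)
        mapping[name] = short
    return mapping
```

Each image reference in the text would then be rewritten to point at
the short name from the mapping.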

index.html gives a table of contents to read the EPUB content in
a web browser.  In FreeDOS i can view this with inline images by
running:

   links -g -mode 1024x768x256 index.html

I tested epub2txt with nawk on FreeDOS 1.3 and MS-DOS 6.22.  I tested
it with gawk, mawk, and nawk on Slackware 15.  I advise running on
Unix if possible, then copying plaintxt over to DOS for viewing.

To download the DOS package:

<gopher://tilde.pink/1/~bencollver/files/dos386/util/epub2txt/>

To view the source code:

<https://chiselapp.com/user/bencollver/repository/epub2txt>

See the directions in README.TXT and have fun!

* * *

I found a very similar project written in C.  It is very fast.  It
does not include references to images and external links.  See link
below for code and details.

<gopher://tilde.club/1/~freet/gophhub/?https://github.com/kevinboone/epub2txt2>

tags: bencollver,retrocomputing,technical

Tags
====

bencollver
<gopher://tilde.pink/1/~bencollver/log/tag/bencollver/>
retrocomputing
<gopher://tilde.pink/1/~bencollver/log/tag/retrocomputing/>
technical
<gopher://tilde.pink/1/~bencollver/log/tag/technical/>