2024-08-13 - EPUB2TXT, Convert EPUB To Plain Text On DOS
========================================================
Recently I wrote an XML parser in AWK.
<gopher://tilde.pink/1/~bencollver/log/2024-08-01-xml2tsv>
I let it loose into the black abyss of DOS. It came back with books
in a bag of holding.
Now I am announcing epub2txt.awk, a script to convert EPUB files to
plain text for reading on DOS. It relies on a bunch of utilities
to do this conversion. The only one I modified was UTF8TOCP.COM from
FreeDOS. I changed it to write Unicode codepoints it cannot map as
\uHHHH and \UHHHHHHHH escapes instead of just '?'.
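
To illustrate the escape format, here is a minimal Python sketch. It is
not the UTF8TOCP.COM code. The function names are hypothetical, and the
use of uppercase hex digits is an assumption. It only shows the idea of
falling back to \uHHHH for codepoints up to U+FFFF and \UHHHHHHHH for
anything larger.

```python
def escape_codepoint(cp: int) -> str:
    """Return a \\uHHHH or \\UHHHHHHHH escape for a Unicode codepoint."""
    if cp <= 0xFFFF:
        return "\\u%04X" % cp   # four hex digits cover U+0000..U+FFFF
    return "\\U%08X" % cp       # eight hex digits for everything above


def to_cp437(text: str) -> bytes:
    """Encode text as CP437, escaping characters CP437 cannot represent."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("cp437")
        except UnicodeEncodeError:
            # Assumption: keep the codepoint visible instead of writing '?'.
            out += escape_codepoint(ord(ch)).encode("ascii")
    return bytes(out)
```

For example, to_cp437("caf\u00e9 \u2603") keeps the e-acute as a single
CP437 byte and writes the snowman as the six ASCII characters \u2603.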
It takes patience because the script can run for a long time.
Once completed, it produces several outputs.
The plaintxt/ directory contains all of the images used in the EPUB
under short 8.3 filenames, plus two versions of the text. index.txt
is UTF-8 encoded. index.dos is CP437 encoded, has hard-wrapped
paragraphs, and has ASCII-art transliterations for some Unicode
codepoints.
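
As a rough illustration of how a UTF-8 text could be turned into
something like index.dos, here is a small Python sketch. It is not how
epub2txt.awk does it. The transliteration table, the 72-column width,
the CRLF line endings, and the function name are all assumptions made
for the example.

```python
import textwrap

# Hypothetical transliteration table; the real script's mappings may differ.
TRANSLIT = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201C": '"',    # left double quotation mark
    "\u201D": '"',    # right double quotation mark
    "\u2014": "--",   # long dash
    "\u2026": "...",  # horizontal ellipsis
}


def to_dos_text(text: str, width: int = 72) -> bytes:
    """Transliterate, hard wrap, and encode text as CP437 with CRLF."""
    wrapped = []
    # Paragraphs are assumed to be separated by one blank line.
    for para in text.split("\n\n"):
        for src, dst in TRANSLIT.items():
            para = para.replace(src, dst)
        wrapped.append(textwrap.fill(para, width=width))
    dos = "\r\n\r\n".join(p.replace("\n", "\r\n") for p in wrapped)
    # Anything still outside CP437 becomes '?' in this sketch.
    return dos.encode("cp437", errors="replace")
```

Transliterating before wrapping matters here, because a replacement
such as "..." is longer than the character it stands in for and would
otherwise push a line past the chosen width.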
The plaintxt/ directory would be a good one to zip up for viewing
on any DOS machine.
index.html gives a table of contents to read the EPUB content in
a web browser. In FreeDOS I can view this with inline images by
running:
links -g -mode 1024x768x256 index.html
I tested epub2txt with nawk on FreeDOS 1.3 and MS-DOS 6.22. I tested
it with gawk, mawk, and nawk on Slackware 15. I advise running on
Unix if possible, then copying plaintxt over to DOS for viewing.
To download the DOS package:
<gopher://tilde.pink/1/~bencollver/files/dos386/util/epub2txt/>
To view the source code:
<https://chiselapp.com/user/bencollver/repository/epub2txt>
See the directions in README.TXT and have fun!
* * *
I found a very similar project written in C. It is very fast, but it
does not include references to images and external links. See the
link below for code and details.
<gopher://tilde.club/1/~freet/gophhub/?https://github.com/kevinboone/epub2txt2>
tags: bencollver,retrocomputing,technical
Tags
====
bencollver
<gopher://tilde.pink/1/~bencollver/log/tag/bencollver/>
retrocomputing
<gopher://tilde.pink/1/~bencollver/log/tag/retrocomputing/>
technical
<gopher://tilde.pink/1/~bencollver/log/tag/technical/>