# 2024-08-13 - EPUB2TXT, Convert EPUB To Plain Text On DOS
Recently I wrote an XML parser in AWK.
gopher://tilde.pink/1/~bencollver/log/2024-08-01-xml2tsv
I let it loose into the black abyss of DOS. It came back with books
in a bag of holding.
Now I am announcing epub2txt.awk, a script to convert EPUB files to
plain text for reading on DOS. It relies on a bunch of utilities to
do this conversion. The only one I modified was UTF8TOCP.COM from
FreeDOS. I changed it to output unknown Unicode codepoints as
\uHHHH and \UHHHHHHHH escapes instead of just '?'.
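The Python sketch below illustrates the escaping behaviour described above. It is not the modified UTF8TOCP.COM itself. It assumes valid UTF-8 input and uses Python's built-in cp437 codec with a custom error handler. The handler name "uhex" and both function names are hypothetical. Python's stock backslashreplace handler writes \xHH for codepoints below U+0100, so this handler always uses the four- or eight-digit forms instead.

```python
import codecs

def escape_unknown(err):
    """Codec error handler: replace each unencodable character with a
    \\uHHHH escape (BMP) or a \\UHHHHHHHH escape (beyond the BMP)."""
    if not isinstance(err, UnicodeEncodeError):
        raise err
    pieces = []
    for ch in err.object[err.start:err.end]:
        cp = ord(ch)
        if cp <= 0xFFFF:
            pieces.append("\\u%04X" % cp)
        else:
            pieces.append("\\U%08X" % cp)
    # Return the ASCII replacement and the index at which encoding resumes.
    return "".join(pieces), err.end

# "uhex" is a hypothetical handler name chosen for this sketch.
codecs.register_error("uhex", escape_unknown)

def utf8_to_cp437(data: bytes) -> bytes:
    """Decode UTF-8 input and re-encode it as CP437, escaping codepoints
    that CP437 cannot represent instead of replacing them with '?'."""
    return data.decode("utf-8").encode("cp437", errors="uhex")
```

For example, utf8_to_cp437("caf\u00e9 \u201chi\u201d \U0001F600".encode("utf-8")) returns b'caf\x82 \\u201Chi\\u201D \\U0001F600'. The é exists in CP437 as byte 0x82, while the curly quotes and the emoji do not, so they come out as escapes.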
The script takes patience, because it can run for a long time.
Once it completes, it produces several outputs.
The plaintxt/ directory contains all of the images used in the
EPUB, under short 8.3 filenames. index.txt is encoded in UTF-8.
index.dos is encoded in CP437, has hard-wrapped paragraphs, and uses
ASCII-art transliterations for some Unicode codepoints.
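The post does not say how the 8.3 names are chosen or at what width index.dos is wrapped. The Python sketch below shows one plausible approach to each, under stated assumptions: a hypothetical naming scheme of "IMG" plus a five-digit counter, and a 79-column wrap width for an 80-column DOS screen. The function names short_names and hard_wrap are illustrative and are not taken from epub2txt.awk.

```python
import os
import re
import textwrap

def short_names(paths):
    """Assign each distinct image path a unique DOS 8.3 name.

    Hypothetical scheme: "IMG" + 5-digit counter (8 characters) plus the
    original extension, upper-cased and cut to 3 characters. Supports up
    to 99999 distinct images."""
    mapping = {}
    for path in paths:
        if path in mapping:
            continue  # this path already has a short name
        ext = os.path.splitext(path)[1].lstrip(".").upper()
        ext = {"JPEG": "JPG"}.get(ext, ext)[:3]
        base = "IMG%05d" % (len(mapping) + 1)
        mapping[path] = base + "." + ext if ext else base
    return mapping

def hard_wrap(text, width=79):
    """Re-flow each blank-line-separated paragraph so that no line is
    longer than `width` columns (79 is an assumed default)."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return "\n\n".join(textwrap.fill(p, width=width) for p in paragraphs)
```

For example, short_names(["OEBPS/images/cover-image.jpeg", "OEBPS/images/figure_1.png"]) maps the two files to IMG00001.JPG and IMG00002.PNG. A sequential counter avoids the collisions that simply truncating long names to eight characters could cause; any image references in the text would then need to use the same mapping.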
The plaintxt/ directory is a good one to zip up for viewing on any
DOS machine.
index.html provides a table of contents for reading the EPUB content
in a web browser. In FreeDOS I can view it with inline images by
running:
links -g -mode 1024x768x256 index.html
I tested epub2txt with nawk on FreeDOS 1.3 and MS-DOS 6.22, and with
gawk, mawk, and nawk on Slackware 15. I advise running it on Unix if
possible, then copying the plaintxt/ directory over to DOS for viewing.
To download the DOS package:
gopher://tilde.pink/1/~bencollver/files/dos386/util/epub2txt/
To view the source code:
https://chiselapp.com/user/bencollver/repository/epub2txt
See the directions in README.TXT and have fun!
* * *
I found a very similar project written in C. It is very fast, but it
does not include references to images or external links. See the
link below for code and details.
gopher://tilde.club/1/~freet/gophhub/?https://github.com/kevinboone/epub2txt2
tags: bencollver,retrocomputing,technical