# 2024-08-13 - EPUB2TXT, Convert EPUB To Plain Text On DOS
Recently I wrote an XML parser in AWK.
gopher://tilde.pink/1/~bencollver/log/2024-08-01-xml2tsv
I let it loose into the black abyss of DOS. It came back with books
in a bag of holding.
Now I am announcing epub2txt.awk, a script to convert EPUB files to
plain text for reading on DOS. It relies on a bunch of utilities
to do this conversion. The only one I modified was UTF8TOCP.COM from
FreeDOS. I changed it to decode unknown Unicode codepoints into
\uHHHH and \UHHHHHHHH formats instead of just '?'.
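The escaping idea can be illustrated with a minimal Python sketch.
This is not the UTF8TOCP.COM code, only an approximation of the
behavior described above. The function name to_cp437 and the choice
of uppercase hex digits are assumptions made for illustration.

```python
def to_cp437(text):
    """Encode text as CP437 bytes.

    Characters that CP437 cannot represent become \\uHHHH (codepoints
    up to U+FFFF) or \\UHHHHHHHH (codepoints above that) instead of '?'.
    Uppercase hex is an assumption, not confirmed by the original tool.
    """
    out = []
    for ch in text:
        try:
            out.append(ch.encode("cp437"))
        except UnicodeEncodeError:
            cp = ord(ch)
            if cp <= 0xFFFF:
                escape = "\\u%04X" % cp
            else:
                escape = "\\U%08X" % cp
            out.append(escape.encode("ascii"))
    return b"".join(out)
```

Unlike '?', an escape of this kind keeps the original codepoint
recoverable from the DOS text.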
It takes patience because the script can run for a long time.
Once completed, it produces several outputs.
The plaintxt/ directory holds all of the images used in the EPUB
under short 8.3 filenames; index.txt is UTF-8 encoded; index.dos is
CP437 encoded, has hard-wrapped paragraphs, and has ASCII-art
transliterations for some Unicode codepoints.
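One way to give every image a unique 8.3 filename is sketched below
in Python. The numbering scheme (IMG00001 and so on) and the function
name short_names are hypothetical; epub2txt.awk may well name its
files differently.

```python
import os

def short_names(paths):
    """Map each image path to a unique DOS 8.3 filename.

    Hypothetical scheme: an 8-character base "IMGnnnnn" plus the first
    three characters of the original extension, uppercased.
    """
    mapping = {}
    for n, path in enumerate(paths, start=1):
        ext = os.path.splitext(path)[1][1:4].upper()  # at most 3 chars
        name = "IMG%05d" % n                          # exactly 8 chars
        mapping[path] = name + "." + ext if ext else name
    return mapping
```

Numbering the files sidesteps collisions that plain truncation would
cause, for example when two long names share their first eight
characters.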
The plaintxt/ directory would be a good one to zip up for viewing
on any DOS machine.
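If the conversion was run on Unix, the directory could be packed
there before copying it over. A minimal sketch using Python's
standard zipfile module follows. The function name zip_dir and the
archive name in the usage line are assumptions.

```python
import os
import zipfile

def zip_dir(src_dir, zip_path):
    """Pack src_dir (for example plaintxt/) into a deflate-compressed zip.

    Entries are stored relative to the parent of src_dir, so the
    archive unpacks into a single top-level directory.
    """
    parent = os.path.dirname(os.path.abspath(src_dir))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                full = os.path.join(root, name)
                zf.write(full, os.path.relpath(full, parent))
```

A call such as zip_dir("plaintxt", "PLAINTXT.ZIP") would then produce
one file to transfer.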
index.html gives a table of contents to read the EPUB content in
a web browser. In FreeDOS I can view this with inline images by
running:
links -g -mode 1024x768x256 index.html
I tested epub2txt with nawk on FreeDOS 1.3 and MS-DOS 6.22. I tested
it with gawk, mawk, and nawk on Slackware 15. I advise running on
Unix if possible, then copying plaintxt over to DOS for viewing.
To download the DOS package:
gopher://tilde.pink/1/~bencollver/files/dos386/util/epub2txt/
To view the source code:
https://chiselapp.com/user/bencollver/repository/epub2txt
See the directions in README.TXT and have fun!
* * *
I found a very similar project written in C. It is very fast. It
does not include references to images and external links. See link
below for code and details.
gopher://tilde.club/1/~freet/gophhub/?https://github.com/kevinboone/epub2txt2
tags: bencollver,retrocomputing,technical