| # 2024-08-13 - EPUB2TXT, Convert EPUB To Plain Text On DOS | |
| Recently I wrote an XML parser in AWK. | |
| gopher://tilde.pink/1/~bencollver/log/2024-08-01-xml2tsv | |
| I let it loose into the black abyss of DOS. It came back with books | |
| in a bag of holding. | |
| Now I am announcing epub2txt.awk, a script to convert EPUB files to | |
| plain text for reading on DOS. It relies on a bunch of utilities | |
| to do this conversion. The only one I modified was UTF8TOCP.COM from | |
| FreeDOS. I changed it to decode unknown Unicode codepoints into | |
| \uHHHH and \UHHHHHHHH formats instead of just '?'. | |
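The escape fallback described above can be illustrated with a minimal Python sketch. It is not the modified UTF8TOCP.COM: the function name `to_cp437` is hypothetical, the uppercase hex digits are an assumption, and Python's built-in `cp437` codec stands in for the FreeDOS codepage tables.

```python
def to_cp437(text):
    """Encode text as CP437, escaping anything CP437 cannot represent.

    Hypothetical helper for illustration; not the FreeDOS utility.
    """
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("cp437")
        except UnicodeEncodeError:
            cp = ord(ch)
            # Codepoints up to U+FFFF become \uHHHH, larger ones \UHHHHHHHH.
            # Uppercase hex is an assumption; the post only gives the shape.
            if cp <= 0xFFFF:
                esc = "\\u%04X" % cp
            else:
                esc = "\\U%08X" % cp
            out += esc.encode("ascii")
    return bytes(out)
```

The point of escaping instead of emitting '?' is that the original codepoint stays recoverable from the DOS text.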
| It takes patience because the script can run for a long time. | |
| Once completed, it produces several outputs. | |
| The plaintxt/ directory contains all of the images used in the | |
| EPUB under short 8.3 filenames; index.txt is UTF-8 encoded; | |
| index.dos is CP437 encoded, has hard-wrapped paragraphs, and has | |
| ASCII-art transliterations for some Unicode codepoints. | |
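As a rough illustration of what hard wrapping means for index.dos, here is a Python sketch using the standard `textwrap` module. The function name `hard_wrap` and the 72-column width are assumptions; the post does not say which width or method epub2txt.awk uses.

```python
import textwrap

def hard_wrap(text, width=72):
    """Re-flow each blank-line-separated paragraph to at most `width` columns.

    Hypothetical helper; the width is an assumed value.
    """
    paragraphs = text.split("\n\n")
    wrapped = []
    for p in paragraphs:
        # Collapse existing line breaks and runs of spaces, then re-wrap.
        wrapped.append(textwrap.fill(" ".join(p.split()), width=width))
    return "\n\n".join(wrapped)
```

Hard-wrapped text displays sensibly in DOS viewers that do not re-flow long lines themselves.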
| The plaintxt/ directory would be a good one to zip up for viewing | |
| on any DOS machine. | |
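DOS limits filenames to eight characters plus a three-character extension, which is why the images need short names. The sketch below shows one possible renaming scheme in Python. The `IMGnnnnn` pattern and the `short_names` function are hypothetical and are not taken from epub2txt.awk.

```python
import os

def short_names(paths):
    """Map each image path to a unique 8.3 name such as IMG00001.PNG.

    Hypothetical scheme for illustration only.
    """
    mapping = {}
    for n, path in enumerate(paths, start=1):
        # Keep at most three extension characters, without the dot.
        ext = os.path.splitext(path)[1][1:4].upper()
        name = "IMG%05d" % n  # eight characters as long as n < 100000
        mapping[path] = name + ("." + ext if ext else "")
    return mapping
```

A sequential counter avoids the collisions that simple truncation of long names would cause.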
| index.html gives a table of contents to read the EPUB content in | |
| a web browser. In FreeDOS I can view this with inline images by | |
| running: | |
| links -g -mode 1024x768x256 index.html | |
| I tested epub2txt with nawk on FreeDOS 1.3 and MS-DOS 6.22. I tested | |
| it with gawk, mawk, and nawk on Slackware 15. I advise running it on | |
| Unix if possible, then copying plaintxt over to DOS for viewing. | |
| To download the DOS package: | |
| gopher://tilde.pink/1/~bencollver/files/dos386/util/epub2txt/ | |
| To view the source code: | |
| https://chiselapp.com/user/bencollver/repository/epub2txt | |
| See the directions in README.TXT and have fun! | |
| * * * | |
| I found a very similar project written in C. It is very fast, but | |
| it does not include references to images and external links. See | |
| the link below for code and details. | |
| gopher://tilde.club/1/~freet/gophhub/?https://github.com/kevinboone/epub2txt2 | |
| tags: bencollver,retrocomputing,technical | |