With a bit of Python, lynx, and tidy I was able to pull very clean
  plain text versions of my WordPress posts. The sparse HTML can be found
  at [1]http://tokyogringo.myjp.net and the plain text version can be
  found on my gopher site at [2]gopher://sdf.org:70/0/users/tokyogringo/

  How did I do it? This site has full text RSS for everyone's enjoyment.
  No one has to actually visit https://www.prjorgensen.com in order to
  consume the high value content I generate. The feed contains everything
  needed for this plain text life. How to make use of it?

  I fumbled through my first Python script in a long time, relying
  heavily on the very powerful feedparser module.
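
  The core of a feedparser-based approach can be sketched roughly as
  below. This is a minimal sketch, not the actual script: the names
  post_to_html and fetch_posts are made up for illustration, and it
  assumes each entry carries its full text in content (falling back to
  summary), which is how full text WordPress feeds usually show up in
  feedparser.

```python
import html


def post_to_html(entry):
    """Build a bare-bones HTML page from one feed entry.

    `entry` is anything dict-like with the keys feedparser exposes
    (title, link, published, content, summary, tags).
    """
    title = entry.get("title", "Untitled")
    # Full text feeds put the post body in content[0]; fall back to summary.
    content = entry.get("content") or [{"value": entry.get("summary", "")}]
    body = content[0]["value"]
    tags = ", ".join(tag["term"] for tag in entry.get("tags", []))
    template = (
        "<html><head><title>{title}</title></head><body>\n"
        "<h1>{title}</h1>\n{body}\n<hr>\n"
        '<p>My original entry is here: <a href="{link}">{title}</a>. '
        "It posted {date}.</p>\n"
        "<p>Filed under: {tags}</p>\n"
        "</body></html>\n"
    )
    return template.format(
        title=html.escape(title),
        body=body,  # already HTML, so it is passed through untouched
        link=html.escape(entry.get("link", "")),
        date=html.escape(entry.get("published", "")),
        tags=html.escape(tags),
    )


def fetch_posts(feed_url):
    """Yield (entry, html_page) for every entry in the feed."""
    import feedparser  # third-party: pip install feedparser

    feed = feedparser.parse(feed_url)
    for entry in feed.entries:
        yield entry, post_to_html(entry)
```

  Keeping post_to_html free of any feedparser import means it can be
  tried out on a hand-written dict before pointing it at a live feed.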

  This Just In: Python's documentation is terse almost to the point of
  incomprehensibility. While accurate, the documentation does not help
  beginning (and maybe middling) Python coders get to solving problems.
  Oddly, the Reddits and StackExchange sites are also of limited utility
  as the answers there often point back to or copy the documentation.

  Anyway, taking a very Unix approach, I decided not to do everything in
  Python. I know tidy for making valid HTML. I know lynx for
  terminal-based web browsing, and its '-dump' option produces plain
  text versions of web pages with the links collected in a numbered
  list at the end.
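
  The tidy-then-lynx hand-off can be sketched as two small subprocess
  calls. Treat this as an illustration under assumptions: the helper
  names run_filter and html_to_text are hypothetical, the particular
  flags are one plausible choice rather than what the real script
  passes, and both tidy and lynx need to be on the PATH.

```python
import subprocess

# One plausible set of flags; adjust to taste.
TIDY_CMD = ["tidy", "-quiet", "-utf8", "--show-warnings", "no"]
LYNX_CMD = ["lynx", "-dump", "-stdin", "-force_html",
            "-assume_charset=utf-8", "-display_charset=utf-8"]


def run_filter(cmd, text, ok_codes=(0,)):
    """Feed `text` to an external command on stdin and return its stdout."""
    proc = subprocess.run(cmd, input=text, capture_output=True,
                          encoding="utf-8")
    if proc.returncode not in ok_codes:
        raise RuntimeError("{} failed: {}".format(cmd[0], proc.stderr.strip()))
    return proc.stdout


def html_to_text(raw_html):
    """Clean HTML with tidy, then render it to plain text with lynx.

    Returns a (clean_html, plain_text) pair.
    """
    # tidy exits with status 1 when it only had warnings, so accept that.
    clean_html = run_filter(TIDY_CMD, raw_html, ok_codes=(0, 1))
    plain_text = run_filter(LYNX_CMD, clean_html)
    return clean_html, plain_text
```

  Each tool stays a plain stdin-to-stdout filter, so either one can be
  swapped out without touching the rest of the script.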

  Once the script was delivering the website data in a reliable and
  eventually parse-able way, I turned to getting all my posts.

  I cranked the item limit of the prjorgensen.com RSS feed up to 20,000
  to make sure the feed briefly included all of my posts. I moved my
  parsing script to my MacBook Pro because I didn't want to choke the
  sdf.org servers with my madness. I installed modules and localized the
  script to run on the MBP.

  I ran the script. I checked my email. I then got up to … hmmm. The
  script finished in under two minutes. Suddenly I had all of my posts
  back to 2011 in both very clean HTML and in plain text. I synced them
  to their proper home. I reset my website feed back to a more reasonable
  number.
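
  Writing each post out in both flavors might look something like the
  sketch below. The naming scheme (publication date plus a slug of the
  title) and the function names post_basename and save_post are
  assumptions for illustration; the real script may name its files
  differently.

```python
import re
from email.utils import parsedate_to_datetime
from pathlib import Path


def post_basename(title, published):
    """Turn a title and an RFC 822 feed date into a file name stem."""
    date = parsedate_to_datetime(published).strftime("%Y-%m-%d")
    # Collapse every run of non-alphanumerics into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return "{}-{}".format(date, slug or "untitled")


def save_post(out_dir, title, published, html_page, text_page):
    """Write the HTML and plain text versions side by side."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    base = post_basename(title, published)
    html_path = out_dir / (base + ".html")
    text_path = out_dir / (base + ".txt")
    html_path.write_text(html_page, encoding="utf-8")
    text_path.write_text(text_page, encoding="utf-8")
    return html_path, text_path
```

  Date-prefixed names keep a directory listing in chronological order,
  which is handy on a gopher site where the listing is the navigation.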

  There are any number of improvements I can make:
    * My script does not grab images
    * I capture categories and tags from WordPress but don't do anything
      useful with them
    * I need the script to update my gophermap and my index.html (as
      appropriate)
    * A full text RSS feed of the plain HTML site
    * A full text RSS feed of the gopher site
    * Maybe use a static web site generator like Jekyll for the plain
      HTML site
    * Maybe use this for tokyogringo.com and PVCSec.com? If so, then I
      need to handle …
         + Media enclosures
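
  For the gophermap item in the list above, generating the menu could
  be sketched like this. It emits the classic tab-separated gopher menu
  line (item type and title, selector, host, port). The host, port, and
  selector prefix are taken from the gopher URL of this site; whether
  the server wants full lines like these or a shorter form is an
  assumption to check, and the function names are hypothetical.

```python
def gophermap_line(title, selector, host="sdf.org", port=70, item_type="0"):
    """Return one gopher menu line; item type "0" means a text file."""
    # Tabs separate the fields, so squeeze them out of the title.
    clean_title = " ".join(title.split())
    return "{}{}\t{}\t{}\t{}".format(item_type, clean_title, selector,
                                     host, port)


def build_gophermap(posts, base_selector="/users/tokyogringo/"):
    """Build menu text from (title, filename) pairs, in the order given."""
    lines = [gophermap_line(title, base_selector + filename)
             for title, filename in posts]
    return "\n".join(lines) + "\n"
```

  Regenerating the whole file from the post list on every run is
  simpler than trying to edit an existing gophermap in place.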

  Watch this space for the link to my script on GitHub. Which is [3]here!
    __________________________________________________________________

  My original entry is here: [4]Plain text life, including gopher. It
  posted Fri, 08 Jun 2018 11:51:38 +0000.
  Filed under: administrivia, tech

References

  1. http://tokyogringo.myjp.net/
  2. gopher://sdf.org/0/users/tokyogringo/
  3. https://github.com/zenshinji/gopher-parser
  4. https://www.prjorgensen.com/?p=1203