* * * * *
Changing the historical record of my blog
Twenty-one years ago I was worried about losing the historical presentation
of my blog [1] both because it was template driven, and through the use of
CSS (Cascading Style Sheets). Changes that affect everything at once
certainly appeared quite Orwellian to me, although I might be in a very small
minority in worrying about this.
And yet, I've tweaked the CSS quite a bit since I wrote that. I
figure I'm not changing the content, so it's okay, right?
It was over a year ago when I noticed that a lot of my earlier entries had
the initial paragraph shifted over to the left, due to a change in the
template file I made around 2003. The old template had an initial <P> tag so
I didn't have to type it, and the new one removed said tag. That left maybe a
thousand posts (give or take) that needed fixing. I started doing the job
manually at first, then gave up at the sheer number of posts to fix. Again,
it was not changing the content but fixing the presentation. And it bothered
me that there were posts that weren't formatted correctly.
About a week or two ago, I realized that the markup I used for foreign words:
-----[ HTML ]-----
<span lang="de" title="My hovercraft is full of eels">Mein Luftkissenfahrzeug ist voller Aale</span>
-----[ END OF LINE ]-----
is probably not semantically sound HTML (HyperText Markup Language). I even
wrote about that issue twenty years ago [2], and now realize it should be:
-----[ HTML ]-----
<i lang="de" title="My hovercraft is full of eels">Mein Luftkissenfahrzeug ist voller Aale</i>
-----[ END OF LINE ]-----
Around the same time, I read up on the “proper” use of <BLOCKQUOTE> [3] and
learned that the attribution should appear outside the blockquote, not inside
as I've been doing for years. I was doing The Right Thing™ when I first
started blogging, but changed for some reason I have long forgotten.
And then several days ago, I noticed the sample BASIC code [4] was incorrect
and it was bugging me—the keyword THEN would always show up as THENNOT. How
that happened is a topic for another post [5], but in the meantime, I decided
to fix the issue without mentioning it. The fix didn't alter the intended
meaning of the post; it corrected faulty output, which is not the same as
saying we were always at war with Eastasia.
After that, I decided to go back and fix the “formatting” issues in the blog.
I have code that will read entries and parse the HTML I use into an AST
(Abstract Syntax Tree) (or should it be a DOM (Document Object Model), even
though I'm using Lua, not JavaScript?) which I use to generate the Gopher [6]
and Gemini [7] versions. To fix the initial paragraph issue, all I needed to
do was identify the entries that didn't start with a <P> tag and just prefix
the raw text with said tag.
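As an illustration only, here is a minimal sketch of that check in Python (not the Lua code actually used for the blog). The function name ensure_leading_p is made up for the example, and it assumes each entry is available as a single raw string.

```python
import re

# An opening <p> tag (any case, optional attributes) at the very start of
# the entry, ignoring leading whitespace. "<pre>" does not match because
# the character after "p" must be whitespace or ">".
STARTS_WITH_P = re.compile(r'\s*<p[\s>]', re.IGNORECASE)

def ensure_leading_p(raw: str) -> str:
    """Return raw unchanged if it already opens with <p>, else prefix one."""
    if STARTS_WITH_P.match(raw):
        return raw
    return "<p>" + raw
```

Note that this follows the description literally: any entry that opens with something other than <P> gets the prefix, so entries that begin with another block element would need a look by hand.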
To update the HTML for foreign words, it was enough to identify entries with
<SPAN LANG="language"> and with some sed magic, switch it to read <I
LANG="language"> (and fix the corresponding closing tags). It's just fixing
the semantics of the HTML, not changing the past, right?
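The actual change was done with sed, and the exact expressions aren't shown here. As a rough Python sketch of the same idea, the function below (span_lang_to_i is a name invented for this example) rewrites only those <SPAN> tags that carry a LANG attribute, and uses a stack so each closing tag is paired with the right opening tag. It assumes no attribute value contains a literal ">" or the text "lang=".

```python
import re

# Any opening or closing <span> tag, case-insensitive.
SPAN_TAG = re.compile(r'<(/?)span\b([^>]*)>', re.IGNORECASE)
# A bare lang attribute (not xml:lang or data-lang).
HAS_LANG = re.compile(r'(?<![\w:-])lang\s*=', re.IGNORECASE)

def span_lang_to_i(html: str) -> str:
    """Turn <span lang=...>...</span> into <i lang=...>...</i>."""
    stack = []  # one bool per open <span>: was it rewritten to <i>?

    def swap(match):
        closing, attrs = match.group(1), match.group(2)
        if closing:
            # Emit </i> only if the matching opening tag was rewritten.
            rewritten = stack.pop() if stack else False
            return "</i>" if rewritten else match.group(0)
        rewritten = bool(HAS_LANG.search(attrs))
        stack.append(rewritten)
        return "<i" + attrs + ">" if rewritten else match.group(0)

    return SPAN_TAG.sub(swap, html)
```

The stack is what keeps a plain <SPAN> that wraps a foreign phrase from having its own closing tag rewritten by mistake.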
The fix for the <BLOCKQUOTE> issue wasn't quite so easy—I still had over 700
entries that needed to be fixed, so I ended up writing code that would spit
out the parsed HTML back into HTML. It would have been easy to output it as:
-----[ HTML ]-----
<p>I've been following the various Linux <abbr title="Initial Public Offering">IPO</abbr>s and today I see that <a class="external" href="http://www.valinux.com/">VA Linux Systems</a> had their <a class="external" href="http://dailynews.yahoo.com/h/nm/19991209/bs/markets_valinux_1.html">IPO today.</a>. Briefly, it IPOed (can you verb a TLA? Can you verb the word “verb?” Whatever … ) at US$30 and opened at US$299. Inbloodysane.</p><p><a class="external" href="http://www.andover.net/">Andover.Net</a> wasn't nearly as inbloodysane.</p>
-----[ END OF LINE ]-----
one long line—the browsers don't care, but I do if I ever have to go back and
edit this. Instead, I want the output to still be editable:
-----[ HTML ]-----
<p>I've been following the various Linux <abbr title="Initial Public
Offering">IPO</abbr>s and today I see that <a class="external"
href="http://www.valinux.com/">VA Linux Systems</a> had their <a
class="external"
href="http://dailynews.yahoo.com/h/nm/19991209/bs/markets_valinux_1.html">IPO
today.</a>. Briefly, it IPOed (can you verb a TLA? Can you verb the word
“verb?” Whatever … ) at US$30 and opened at US$299. Inbloodysane.</p>
<p><a class="external" href="http://www.andover.net/">Andover.Net</a> wasn't
nearly as inbloodysane.</p>
-----[ END OF LINE ]-----
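One way to get output in that shape, sketched in Python with the standard textwrap module (the blog's own writer is in Lua and is not shown, so this is only an approximation). The name wrap_paragraphs and the 76-column width are assumptions for the example, and it only knows about lowercase </p> as a block boundary.

```python
import textwrap

def wrap_paragraphs(html: str, width: int = 76) -> str:
    """Put each <p>...</p> on its own lines, wrapped at whitespace."""
    parts = html.split("</p>")
    # Re-attach the closing tag that split() removed.
    blocks = [part.strip() + "</p>" for part in parts[:-1]]
    if parts[-1].strip():
        blocks.append(parts[-1].strip())  # trailing text with no </p>
    return "\n".join(
        textwrap.fill(
            block,
            width=width,
            break_long_words=False,   # never split a long URL
            break_on_hyphens=False,   # break at whitespace only
        )
        for block in blocks
    )
```

Because long words are never broken, a token wider than the limit (a long href, say) simply ends up on a line of its own that runs past the margin. Preformatted text would have to bypass this step entirely, since wrapping changes its whitespace.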
That meant handling not only <P> but all the block-level tags in HTML:
<BLOCKQUOTE>, <TABLE>, <DL> (which I use for emails [8] and screenplay dialog
[9]), <UL>, <OL>, and <PRE>. Now that I have that working, I can identify the
citation paragraphs for blockquotes, and move them to the appropriate
location.
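A rough sketch of what moving the attribution could look like once the HTML is in tree form, written in Python rather than Lua. Everything here is an assumption made for illustration: the Node class, the hoist_citations name, and especially the idea that a citation is marked as <p class="cite">, since how the entries actually mark attributions isn't described.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)  # Node or str items

def is_citation(node) -> bool:
    # Hypothetical marker: a <p class="cite"> paragraph.
    return (
        isinstance(node, Node)
        and node.tag == "p"
        and node.attrs.get("class") == "cite"
    )

def hoist_citations(nodes: list) -> list:
    """Move a trailing citation paragraph to just after its blockquote."""
    out = []
    for node in nodes:
        out.append(node)
        if isinstance(node, Node) and node.tag == "blockquote":
            if node.children and is_citation(node.children[-1]):
                # Detach from the blockquote and make it the next sibling.
                out.append(node.children.pop())
    return out
```

This only looks at one level of siblings and does not descend into nested blockquotes, which would need more care so a citation isn't hoisted twice.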
I'm about to do that, yet I'm still a bit hesitant. Yes, it's just fixing the
semantic presentation, but now that I have the code to read and write HTML,
future mass changes are easy to do.
I'm probably thinking too much on this.
I think.
[1] gopher://gopher.conman.org/0Phlog:2002/07/23.1
[2] gopher://gopher.conman.org/0Phlog:2003/02/05.2
[3] https://html.spec.whatwg.org/#the-blockquote-element
[4] gopher://gopher.conman.org/0Phlog:2023/05/10.1
[5] gopher://gopher.conman.org/0Phlog:2023/09/27.1
[6] gopher://gopher.conman.org:70/1Phlog:
[7] gemini://gemini.conman.org/boston/
[8] gopher://gopher.conman.org/0Phlog:2023/03/01.1
[9] gopher://gopher.conman.org/0Phlog:2008/06/20.1