* <<G15.1092>> Good lord, where have my meditations brought me?
Reading the subtext of my last couple entries, it's becoming clear to
me that I am actually suggesting that semantically tagged text, with
those tags interleaved in the text stream, is preferable to the
semantically barren chaos that is plain-text. That is to say, I
believe something like HTML is *better* for the recording of human
language than plain-text. A shocking turn of events since I have
cited Nelson's "Embedded Markup Considered Harmful" more than a few
times, and been in support of it.

The fact is, plain-text is full of explicit semantic markup already
in the form of punctuation, and it's full of implicit semantic markup
in the form of space.

One of the troubles with punctuation is that many important marks
have been tasked with multiple semantic functions over the centuries.
The period, for example, in the context of a sentence may be an
end-of-sentence signifier or an abbreviation marker, and when
embedded inside a number it indicates the decimal point; the single
quotation mark is infamously double-tasked with the function of an
apostrophe in addition to its nominal purpose; numerous characters
are used for different functions when part of a mathematical
expression, and so on.
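
To make the period's triple duty concrete, here is a minimal Python
sketch. The ABBREVIATIONS set and the period_roles function are
illustrative assumptions, not part of any existing library, and the
heuristics are deliberately crude.

```python
import re

# Assumed, far-from-complete list of abbreviations, for illustration only.
ABBREVIATIONS = {"Dr", "Mr", "Mrs", "etc"}

def period_roles(text):
    """Guess, from context alone, what each period in `text` is doing."""
    roles = []
    for match in re.finditer(r"\.", text):
        i = match.start()
        before, after = text[:i], text[i + 1:]
        word = re.search(r"\w+$", before)  # the word the period is attached to
        if before[-1:].isdigit() and after[:1].isdigit():
            roles.append("decimal point")
        elif word and word.group(0) in ABBREVIATIONS:
            roles.append("abbreviation")
        else:
            roles.append("end of sentence")
    return roles

print(period_roles("Dr. Smith paid 3.50 for tea. He left at noon."))
# ['abbreviation', 'decimal point', 'end of sentence', 'end of sentence']
```

The same character gets three different labels, and each label is a
guess made from its neighbours rather than something the text states.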

Despite these problems with the explicit semantics of punctuation, it
is a far greater problem to deal with the implicit semantics of
space.  In the absence of explicit markers for them(1), paragraphs
must be implicitly indicated by the presence of empty space. This may
be blank lines above and below the paragraph, or it may be an
indentation of the first line. Likewise, section headings and titles
are indicated by the presence of empty visual space.
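
As a rough illustration of what software must do to recover paragraphs
from empty space, here is a Python sketch. The split_paragraphs
function is hypothetical, and it assumes only the two conventions
named above: blank lines, or an indented first line.

```python
def split_paragraphs(text):
    """Recover paragraphs from blank-line or first-line-indent conventions."""
    paragraphs, current = [], []
    for line in text.splitlines():
        blank = not line.strip()
        indented = line.startswith((" ", "\t"))
        # A blank line or an indented line is taken to open a new paragraph.
        if (blank or indented) and current:
            paragraphs.append(" ".join(current))
            current = []
        if not blank:
            current.append(line.strip())
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs

blank_line_style = "First paragraph,\nsecond line.\n\nSecond paragraph."
indent_style = "  First paragraph,\nsecond line.\n  Second paragraph."

print(split_paragraphs(blank_line_style) == split_paragraphs(indent_style))
# True
```

Both inputs yield the same two paragraphs, but only because the guess
happens to be right. An indented quotation or code listing would be
misread as a run of new paragraphs.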

When writing in plain-text, authors are constantly devising their own
idiosyncratic conventions of heading placement, section breaks,
asterisms, and so on.  Human readers can, of course, work out the
meanings of these inventions, but software isn't necessarily so
clever.  Is it important to the author that his asterism is one
asterisk in the center of the page, or three asterisks separated by
spaces at the beginning of a line?  Or is it simply important that
there be an asterism?  If the latter, shouldn't there be a code?
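
If only the existence of the asterism matters, the variants could be
folded into one code. A Python sketch, assuming that a line holding
nothing but one to three asterisks (however spaced or centered) is an
asterism, and using the Unicode asterism character U+2042 as the code.
The pattern and the normalize_asterisms function are illustrative
assumptions.

```python
import re

ASTERISM = "\u2042"  # the Unicode ASTERISM character

# A line made up solely of one to three asterisks, with any spacing.
ASTERISM_LINE = re.compile(r"^[ \t]*\*(?:[ \t]*\*){0,2}[ \t]*$")

def normalize_asterisms(text):
    """Replace each ad hoc asterism line with the single asterism code."""
    return "\n".join(
        ASTERISM if ASTERISM_LINE.match(line) else line
        for line in text.split("\n")
    )

sample = "end of scene\n\n            *\n\nnext scene\n* * *\n* a bullet item"
print(normalize_asterisms(sample))
```

The centered single asterisk and the "* * *" line both become the same
character, while the line that merely begins with an asterisk is left
alone. Whether the author cared about the layout that was thrown away
is exactly what the software cannot know.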

To some extent, ambiguities could be alleviated by the use of
typographical marks to make the implicit explicit: stick a pilcrow at
the beginning of every paragraph, a section-sign at the beginning of
every heading, use an ellipsis character instead of three periods,
use an asterism character instead of asterisks, and so on.  If one
were to do this, the whitespace in a plain-text document could be
collapsed, and the structure of the document would be retained –
unless, that is, the author wanted to put a pilcrow in the middle of
a paragraph illustratively: ¶.  What about that, then?  That is why
you'd need control codes, or markers, or tokens, or whatever you want
to call them: "characters" that are not meant to be printed as such,
but which are intended to trigger behaviours in the software
rendering/processing the document.
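
The difference between a printable mark and a control code can be
sketched in a few lines of Python. The choice of ASCII RECORD SEPARATOR
(0x1E) as the paragraph code is purely an assumption for illustration,
as are the to_stream and from_stream helpers.

```python
PARAGRAPH = "\x1e"  # ASCII RECORD SEPARATOR, assumed here as a paragraph code

def to_stream(paragraphs, marker):
    """Collapse all whitespace and prefix each paragraph with `marker`."""
    return "".join(marker + " ".join(p.split()) for p in paragraphs)

def from_stream(stream, marker):
    """Recover the paragraphs by splitting on `marker`."""
    return [p for p in stream.split(marker) if p]

paragraphs = ["A pilcrow looks\n   like this: ¶.", "It is still printable."]

# With a visible pilcrow as the marker, the illustrative ¶ is mistaken
# for structure and the first paragraph is cut in two.
print(len(from_stream(to_stream(paragraphs, "¶"), "¶")))  # 3

# With a non-printing code as the marker, the structure survives.
print(from_stream(to_stream(paragraphs, PARAGRAPH), PARAGRAPH))
# ['A pilcrow looks like this: ¶.', 'It is still printable.']
```

The stream contains no line breaks or indentation at all, yet the
paragraph structure is retained, and the pilcrow remains free to be
used as an ordinary printed character.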

The thrust of what needs to be said about "plain" text is this: in
order to have true plaintext, in order to escape the reality of
plain-text as a teletypewriter operation language, we need to
eradicate any assumption of a PAGE.  Text is a STREAM, it is
one-dimensional – plaintext needs to be readable on a single-line
LCD display, it needs to work on screen readers for the blind or
deaf-blind,…

On the space character itself: its most common and important function
is as word-separator. But spaces also separate sentences, clauses,
and other structures, despite this being the explicit function of
other characters such as the comma, parenthesis, semicolon, and
terminal punctuation.

--
Excerpted from:

PUBLIC NOTES (G)
http://alph.laemeur.com/txt/PUBNOTES-G
©2016 Adam C. Moore (LÆMEUR)