= Text has styles

Or: Gopher content isn't as simple as it first appears!


Plain text isn't plain.

I'm not talking about character encodings - that's a whole 'nother
ball of wax.

I mean, well-structured plain text has *visual* cues to indicate
title, author, section titles, lists, paragraphs, block quotes,
code listings, etc.

In that sense, it has 'styles'.

What the heck are you talking about today, Ratfactor, you
crazy animal?

Let's consider a concrete example: the RFC.



== RFC

Request For Comments (RFCs) are the publications by which
Internet Standards are proposed (in truth, they usually *become*
the canonical standards).  They are (almost) always submitted
as ASCII text and follow the rules described in RFC 2223.
Here are a few of its formatting rules:

       - pages are limited to 58 lines

       - lines are limited to 72 chars (cols)

       - do not attempt to justify right margin

       - single-space between words

       - separate paragraphs with one blank line

It goes on to specify headers and footers (single-line, RFC ####,
centered title, date, page number, etc.), the format and content
of the first (cover) page, and standard and required sections
in the body of the document.
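To make a few of those rules concrete, here's a minimal Python sketch of the paragraph-filling and pagination parts. It's my own illustration, not the nroff/fix.pl toolchain RFC authors actually used. It also ignores headers, footers, and page numbers, which (as I read RFC 2223) count against the same 58-line and 72-column limits.

```python
import textwrap

# Two of the RFC 2223 limits listed above.
LINE_WIDTH = 72   # characters per line
PAGE_LENGTH = 58  # lines per page


def format_paragraphs(paragraphs, width=LINE_WIDTH):
    """Fill each paragraph to `width` columns with a ragged right margin.

    Runs of whitespace collapse to a single space, nothing is padded to
    justify the right margin, and paragraphs are separated by exactly
    one blank line.
    """
    lines = []
    for i, para in enumerate(paragraphs):
        if i > 0:
            lines.append("")  # one blank line between paragraphs
        single_spaced = " ".join(para.split())
        # Never split or hyphenate words: lines only break at spaces
        # (a word longer than `width` simply gets a line to itself).
        lines.extend(textwrap.wrap(single_spaced, width=width,
                                   break_long_words=False,
                                   break_on_hyphens=False))
    return lines


def paginate(lines, page_length=PAGE_LENGTH):
    """Split the filled lines into pages of at most `page_length` lines."""
    return [lines[i:i + page_length]
            for i in range(0, len(lines), page_length)]


if __name__ == "__main__":
    text = ["Plain text isn't plain.", "It has   styles, even   here."]
    for page in paginate(format_paragraphs(text)):
        print("\n".join(page))
```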

RFCs are plain text, but it would be an unbelievable pain
to format them by hand.  So they are (almost always) authored
using a member of the *roff typesetting family.  (Fun aside,
RFC 2223 was written in 1997 and at that time, even the Lords of
the Internet could not get nroff to do the right thing between
page breaks, so they describe piping the nroff output through
a Perl script called 'fix.pl' to complete the formatting process!)

RFCs are formatted plain text.  But they aren't formatted the
way I would format a plain text email message to a friend.
And they're not how I'd format a post to this phlog.  So RFCs
have a 'style' all their own - even though they're what we refer
to as "plain text".

So text has styles.

Let's explore this.

But first, a bit of a meander through the world of document markup
languages - starting with the one we've just been discussing,
troff.



== Troff

I became quite interested in *roffs (which I'll refer to as
just 'troff' for the remainder of this post) a while back
because they are part of the standard Unix text formatting and
publishing/printing ecosystem.

(By the way, 'roff' comes from the phrase "run off" as in "run
the document off on the printer" and the original programs
were created to generate a typeset document for very specific
(and very expensive) printers back in the 1960s and 1970s.
The programs became increasingly generalized and powerful as they
progressed from nroff ("newer") to troff ("typesetter") to ditroff
("device independent"), and were finally taken under the GNU wing as groff.)

For a while, I had a good time sending really high quality
documents through my laser printer using various groff macro packages.
I could also produce PDFs, PostScript files, and even formatted
text documents.  I kinda thought I'd finally discovered the One
True Plain Text Document Format.

But after actually *using* raw troff formatting and then the ms,
mm, and mom macros for a while, I realized that I hated it.
Yeah, the quality was great and the tool was fast and ubiquitous.
But in the age of Markdown, troff is a freakishly old-fashioned
and painful format to type.  It's noisy and cryptic and really
gets in the way.

The macros help (mom, in particular, is really nice if your
document is a good fit for what it provides), but not enough.

The _real_ problem with troff, though, is that it's more about
the appearance of things than about the semantic structure of
documents.  (The macro packages do help with this.)
This makes perfect sense given its origins.



== Markdown

I believe that the document source file should be as
human-readable (and typeable! and memorable!) as possible.

Markdown provides this.  But Markdown is limited and has some
serious shortcomings, so while I vastly prefer it to, say,
Microsoft Word, I was never happy with using it as the One True
Plain Text Document Format.

The real positive impact of Markdown's popularity, above all
else, I think, is that it has re-introduced the idea
that "plain" text _has_ structure, even if it's entirely ad hoc.
By taking unwritten rules for text formatting from sources such
as email and Usenet, John Gruber and Aaron Swartz codified and
popularized in Markdown the rules we were already playing by.
The reason for the popularity isn't important - it only matters
that it *is* popular.

Whatever Markdown's failings and limitations, it has undoubtedly
made the Web a better place.

I've made several mentions of Markdown's problems, but I
should probably make it a little clearer what I mean before
continuing on.  I could make a list, but it mostly comes down
to three categories:

       1. The spec is ambiguous, so we have many different
          interpretations (how to make part of a word italic,
          for example)

       2. It's _too_ limited for "real world" documents (tables,
          footnotes), so we have many different extensions to
          Markdown to fill the gaps

       3. The link syntax is stupid and I hate it

Okay, that 3rd one is mine.  But you'll find plenty of supporting
material for the first two if you look hard enough on the Web.
The point is, Markdown is simply _not_ enough, on its own,
for documents of any complexity.  That's a strength as well as
a weakness.

Having said all that, and to be completely clear, let me state again that
I really like Markdown and (especially) what it has done for
popularizing a readable source format for textual documents!



== Alternatives to Markdown

Having established an actual need for an alternative to Markdown,
what do we have?

A curated and opinionated list:

       troff        no fun to read/write, emphasis on typesetting

       HTML         perfect except no fun to read/write

       TeX          (and LaTeX) powerful; noisy and cryptic

       RTF          TeX without any of the advantages

       Creole       a bit noisy and wiki-centric

       reST         reStructuredText, fine but a bit eccentric

       EtText       inspiration for Markdown, emphasis on HTML

       Textile      another language from the Markdown era

       Org-mode     essentially tied to Emacs, specialized

       Texinfo      great for 1986, never took off outside GNU

       DocBook      comprehensive format for books!; XML :-(

       AsciiDoc     DocBook without the XML!

(Keep in mind that these comments and comparisons are based on
ONE use-case: having a source format that is nearly effortless to
type (like Markdown) and yet is capable of enough complexity to
satisfy _most_ document needs: a memo, an article, a blog/phlog
post, a novel, a technical book, etc.  When I dismiss HTML,
it is not because it isn't capable (it's _very_ capable) but
because it's no fun to type.  I've been hand-writing the stuff for over
twenty years now and I know it well, but it's NO FUN TO TYPE
when I just want to create some dang content.)

When you look at the initial release date for a lot of these,
you find that the idea really came to a head in the early 2000s
with reStructuredText in 2001, AsciiDoc and Textile in 2002,
Org-mode in 2003, Texy and Markdown in 2004 (and a huge number
of less notable others throughout - heck, I had my own little
goofy line-based format to generate parts of my website around
that time...).

It's so interesting how ideas for inventions occur almost
simultaneously in different parts of the world across human
history: math concepts, cars, radios, etc.

       "It takes a thousand men to invent a telegraph, or a steam
       engine, or a phonograph, or a photograph, or a telephone
       or any other important thing - and the last man gets the
       credit and we forget the others. He added his little mite
       - that is all he did. These object lessons should teach
       us that ninety-nine parts of all things that proceed from
       the intellect are plagiarisms, pure and simple; and the
       lesson ought to make us modest. But nothing can do that."
               -- Mark Twain

Okay, enough asides.

So you may have noticed that I listed AsciiDoc last and did not
list any negatives.  Savvy readers may have surmised that I have
picked my horse in the race.

You would be right.



== AsciiDoc

So what makes AsciiDoc so compelling?

Without going into an exhaustive history and waxing on and on
about AsciiDoc, I'd like to mention some highlights:

       - high-quality implementations are available (Asciidoctor)

       - they support a large number of output document types

       - it's equivalent to DocBook, made for authoring books!

       - O'Reilly books are (or were) authored with it

       - it's as readable as Markdown

That final statement is, of course, completely subjective.
I don't *love* every little bit of AsciiDoc syntax.  But it
doesn't get in my way.

The biggest factors for me are the high quality implementations
for generating HTML and the fact that it has all of the semantic
structure needed to create honest-to-goodness published technical
books!

These, to me, are the hallmarks of a One True Plain Text Document
Format.  So I've been slowly-but-surely converting just about
everything I've written to AsciiDoc.

(To that end, I picked Hugo as a static generator for my website
(scrapping my 4th generation home-grown static site generator)
because it can use Asciidoctor as the backend HTML renderer
(and with some modifications it can render HTML that doesn't
make me want to stab my hand with a fork when I "view source".))

What I'd like to do is use the AsciiDoc format for generating
Gopher content:

       1. It's a proven format

       2. I can start with a subset of AsciiDoc and add as needed

       3. I can seamlessly publish my Web site content to Gopher

But wait, everything we've been talking about so far is for
producing HTML output (or PDF or PostScript, etc.).  Why would
I need to do anything at all to my AsciiDoc content?  Couldn't I
just upload the raw page source to Gopher and call it a day?

Yeah, kinda.  You could, in theory, read the content just fine.
But there's a difference between text that is marginally readable
in a Gopher client and text that was purpose-crafted to be
beautifully readable at a fixed column width, with ASCII art,
cute little section breaks, and other niceties.

Let's come full circle and see if we can tie this all together.



== Back to "plain text has styles" and let's talk about Gopher

In my _Gopher Logging in Eleven Lines of Shell_ phlog post [1]
I made an ill-fated attempt to create a tiny streamlined process
for publishing entries.

(There were a couple of problems with it, but the biggest one
was that using GNU 'fmt' completely wrecked my code example
and ASCII directory tree.  fmt does a beautiful job with prose
(including indents), but doesn't know how to leave other types of
things alone.  'fold' has the same problem.  'par', on the other
hand, is a replacement for 'fmt' and seems to be far more capable.
I'll be experimenting with it to format this post today.)
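Here's a rough Python sketch of the behavior I actually want from a formatter: re-fill prose, but pass any indented paragraph through untouched. This is *not* how 'par' works internally. The rule "any indented line means preformatted" is just my simplifying assumption, and it has a cost: a prose paragraph with an indented first line also gets left alone.

```python
import re
import textwrap


def reflow(text, width=72):
    """Re-fill prose paragraphs to `width` columns, but pass through
    verbatim any paragraph containing an indented line (code listings,
    ASCII directory trees, and the like)."""
    out = []
    # Paragraphs are separated by one or more blank (or whitespace-only) lines.
    for block in re.split(r"\n(?:[ \t]*\n)+", text.strip("\n")):
        lines = block.split("\n")
        if any(line[:1] in (" ", "\t") for line in lines):
            out.append(block)  # looks preformatted: leave it alone
        else:
            # Collapse whitespace, then fill to the target width.
            out.append(textwrap.fill(" ".join(block.split()), width=width))
    return "\n\n".join(out)


if __name__ == "__main__":
    post = ("fmt does a beautiful job with prose but wrecks code.\n\n"
            "\tfunction clown(){\n\t\treturn 1;\n\t}")
    print(reflow(post, width=30))
```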

One thing that became clear to me is that I have already defined
certain 'styles' for my Gopher phlog posts, such as tab-indented
source code examples, and that I'm still experimenting with
the 'style' of other elements such as section titles, lists,
title/header, etc.

And Gopher's tricky because many clients assume fixed-column
layouts, so we have to be really careful about long lines.  Now I
have some opinions about that state of affairs, but airing them
would take me way off topic, and I think this is getting long
enough for a phlog entry as it is.
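One defensive habit is to check for over-long lines before publishing. Here's a minimal Python sketch. The 70-column default is my own assumption (the Gopher protocol doesn't mandate a width), and tabs are expanded to 8 columns, as most terminals do.

```python
def long_lines(text, limit=70, tabsize=8):
    """Return (line_number, display_width) for every line that is wider
    than `limit` columns once tabs are expanded.

    Note: this counts characters, so wide or combining Unicode
    characters would be mismeasured.
    """
    report = []
    for n, line in enumerate(text.splitlines(), start=1):
        width = len(line.expandtabs(tabsize))
        if width > limit:
            report.append((n, width))
    return report


if __name__ == "__main__":
    draft = "a short line\n" + "x" * 80 + "\n\tindented"
    for n, width in long_lines(draft):
        print(f"line {n}: {width} columns")
```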

Anyway, when I *do* come up with my personal styles of Gopher
content, it would be really neat to be able to run all of my
source documents through a renderer and have them all come out
with the same styles.  To do that, my source documents need to
store the *semantic* information about the document structure
and elements in a common format.
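As a rough sketch of what "storing the semantic information" buys us, here's a tiny Python parser for a handful of AsciiDoc constructs: document and section titles, indented literal blocks, '>' quotes, and plain paragraphs. It is nowhere near a real AsciiDoc parser (Asciidoctor handles vastly more). The block kind names are my own invention.

```python
import re
from typing import List, Tuple

Block = Tuple[str, str]  # (kind, text), e.g. ("section", "More clowning around")


def parse_adoc(source: str) -> List[Block]:
    """Split a *tiny* AsciiDoc subset into semantic blocks.

    Handles only: a '= ' document title, '== ' section titles,
    indented literal blocks, '>' quotes, and paragraphs.
    """
    blocks: List[Block] = []
    for chunk in re.split(r"\n(?:[ \t]*\n)+", source.strip("\n")):
        lines = chunk.split("\n")
        first = lines[0]
        if first.startswith("== "):
            blocks.append(("section", first[3:].strip()))
        elif first.startswith("= "):
            blocks.append(("title", first[2:].strip()))
        elif first[:1] in (" ", "\t"):
            # Indented text is a literal block: keep it exactly as written.
            blocks.append(("literal", chunk))
        elif first.startswith(">"):
            # Drop the '>' markers and join the lines into one quote.
            text = " ".join(line.lstrip(">").strip() for line in lines)
            blocks.append(("quote", text))
        else:
            # Ordinary paragraph: collapse line breaks and extra spaces.
            blocks.append(("paragraph", " ".join(chunk.split())))
    return blocks


if __name__ == "__main__":
    for kind, text in parse_adoc("= Clown\n\nHonk.\n\n== More\n\nToot."):
        print(kind, "|", text)
```

Once the source is in this form, a renderer only has to decide what a "section" or a "quote" looks like, and it can decide that the same way for every document.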

Ah, so, enter Ratfactor's choice for the One True Plain Text
Document Format: AsciiDoc!

And now, at last, I can truly demonstrate what I mean when I
say plain text 'styles' (and not just 'formatting').



== Example

Imagine a hypothetical language called TSS: Text Style Sheets.

Let's imagine we have a working renderer already built and
we're feeding it two inputs: a 'style sheet' (.tss) and an
AsciiDoc source file (.adoc).  Our imaginary renderer (actually
yours truly and his keyboard) will produce Gopher-formatted text
before your very eyes!


First, here's the content of our article source, 'clowns.adoc':

       ----------------------------------------------------------
       = Clown

       Clowns are closely related to parametric
       polymorphism. For example, note that the type of
       wiggler as specified would be the parametrically
       polymorphic type bing -> [bang] -> Barf.

               function clown(){
                       if(honk==toot) mime_style = false;
                       emit_gags();
                       return 1;
               }

       Of course, there is a more common definition:

       > A clown is a comic performer who employs
       slapstick or similar types of physical comedy,
       often in a mime style. (Wikipedia)

       == More clowning around

       Now we shall consider the clown as a mammal...


And here's a 'simple.tss' (the syntax is hypothetical):

       ----------------------------------------------------------
       body {
               width: 40;
       }
       title {
               text-align: center;
               template: "*** $title ***";
               margin-bottom: 2;
       }
       h2 {
               margin-top: 2;
               text-transform: uppercase;
       }
       blockquote {
               margin-left: 1t;
       }



And here is the glorious output of me, the human Gopher phlog
renderer, applying the styles of 'simple.tss':

       ----------------------------------------------------------
                          *** Clown ***


       Clowns are closely related to
       parametric polymorphism. For example,
       note that the type of wiggler as
       specified would be the parametrically
       polymorphic type bing -> [bang]
       -> Barf.

               function clown(){
                       if(honk==toot) mime_style = false;
                       emit_gags();
                       return 1;
               }

       Of course, there is a more common
       definition:

               A clown is a comic performer
               who employs slapstick or
               similar types of physical
               comedy, often in a mime
               style. (Wikipedia)


       MORE CLOWNING AROUND

       Now we shall consider the clown as a
       mammal...



(Note how the body of the document is now constrained to
40 columns, the title is centered, the heading is changed to
all-caps and has a margin of two lines above it, and so on.)
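For what it's worth, here's a minimal Python sketch of the "renderer" half of that imaginary pipeline. It assumes the AsciiDoc source has already been parsed into (kind, text) pairs named after the TSS selectors. Since TSS doesn't exist, 'simple.tss' is hand-translated into a Python dict. The 'p' and 'pre' selector names and the one-tab reading of "1t" are my assumptions.

```python
import textwrap
from string import Template

# simple.tss, hand-translated into Python data (hypothetical encoding).
SIMPLE_TSS = {
    "body": {"width": 40},
    "title": {"text-align": "center", "template": "*** $title ***",
              "margin-bottom": 2},
    "h2": {"margin-top": 2, "text-transform": "uppercase"},
    "blockquote": {"margin-left": "\t"},  # "1t" read as one tab
}


def render_block(kind, text, rule, width):
    """Turn one semantic block into a list of output lines."""
    if kind == "pre":
        return text.split("\n")  # code/literal blocks are never re-wrapped
    if "template" in rule:
        text = Template(rule["template"]).substitute(title=text)
    if rule.get("text-transform") == "uppercase":
        text = text.upper()
    indent = rule.get("margin-left", "")
    available = width - len(indent.expandtabs(8))  # columns left after indent
    lines = textwrap.wrap(text, width=available)
    if rule.get("text-align") == "center":
        lines = [line.center(available).rstrip() for line in lines]
    return [indent + line for line in lines]


def render(blocks, style=SIMPLE_TSS):
    """Render (kind, text) blocks, where kind is a TSS selector name."""
    width = style["body"]["width"]
    out, prev_bottom = [], 0
    for kind, text in blocks:
        rule = style.get(kind, {})
        if out:
            # Like CSS margin collapsing: the larger of the previous block's
            # bottom margin and this block's top margin wins, with at least
            # one blank line between blocks.
            gap = max(1, prev_bottom, rule.get("margin-top", 0))
            out.extend([""] * gap)
        out.extend(render_block(kind, text, rule, width))
        prev_bottom = rule.get("margin-bottom", 0)
    return "\n".join(out)


if __name__ == "__main__":
    print(render([
        ("title", "Clown"),
        ("p", "Clowns are closely related to parametric polymorphism."),
        ("pre", "\tfunction clown(){\n\t\treturn 1;\n\t}"),
        ("blockquote", "A clown is a comic performer who employs slapstick."),
        ("h2", "More clowning around"),
        ("p", "Now we shall consider the clown as a mammal..."),
    ]))
```

The one design choice worth pointing out is the margin handling. Margins collapse the way vertical margins do in CSS, so a 2-line bottom margin followed by a 2-line top margin gives 2 blank lines, not 4. That reproduces the blank-line spacing of my hand-rendered output above. The line breaks won't match exactly, because my human renderer wrapped a column early.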

I was planning to have multiple examples, but I think this
should be enough to get the idea across.  Anyway, that stuff
takes work, whew!

I'm not married to the name 'TSS', to the syntax (a la CSS),
or any of the rest of it (except AsciiDoc).  As a matter
of fact, I have long been a big critic of CSS as it is used on
the Web, but this usage is actually *much* more closely aligned
with CSS's strengths, so it may not be a bad choice.

At first, this seemed like a fairly simple thing, but just from
typing the above example, I can see how the complexity of
the 'style' specification language could quickly spiral out of
control!  It would be smart to take a look at existing template
languages and things like troff's macros for ideas.

Clearly, an opinionated AsciiDoc-to-Gopher converter without user
styles would be an order of magnitude easier to create. If,
hypothetically speaking, I were to start on such a project,
that's where I'd begin.



== Conclusion

Ratfactor is a crazy sewer rat.



                           *   *   *

Community notes:

1. I finally tried out solderpunk's VF-1 gopher
  client. Fantastic! I'm not sure if it will oust lynx or not,
  but it's full of really great ideas and I love the keyboard
  navigation. :-)

2. Thanks to tomasino for adding me to the Phlog Roll on
  gopher.black! [2]  I didn't realize you were speaking of a
  published list when you mentioned it on IRC the other day.
  That's really cool and I appreciate the vote of confidence. :-)


[1] gopher://sdf.org/0/users/ratfactor/phlog/2018-08-13-Gopher-Logging-in-Eleven-Lines-of-Shell
[2] gopher://gopher.black/1/moku-pona/