# Docbook
by Seth Kenlon

Computers do math really well, and that's what they were used for when
they were first invented. But it didn't take long for users to repurpose
their futuristic calculators into fancy, dynamic typewriters. Now
human-readable text drives computing, so choosing the right format for
the text you write is an important decision.

Docbook is an XML schema. XML, the Extensible Markup Language, is a
markup language a lot like HTML. It's truly ubiquitous, but you may know
it through RSS or Atom, the OpenDocument formats of LibreOffice and
Apache OpenOffice, Inkscape and the SVG file format, and much more. In
fact, it's safe to say that if you own a computer or mobile phone,
there's XML on it.

This is what Docbook looks like in its raw form:

    <chapter>
      <title>My title goes here</title>

      <para>
        Paragraph text goes here.
      </para>

      <section>
        <title>A section title</title>

        <para>
          More paragraph text. Some in <emphasis>italics</emphasis>.
        </para>
      </section>
    </chapter>


Docbook itself is easy to learn and easy to write, and it's also one of
the most flexible formats available. What other formats like Markdown
and reStructuredText lack, Docbook provides. And what Docbook doesn't
provide is made possible through generic XML.

But why bother learning Docbook in a time when simpler alternatives
exist? Why bother with a markup language at all when you can instead
impose a little structure on your otherwise plain text and end up with
highly portable, computer- and human-readable data?

Settle in. All will be revealed.


## Fail faster

A distinct difference between working in simpler formats and working in
Docbook is that when you get something wrong in Docbook, you're told
about it. Many other formats, like Markdown and HTML, fail silently. And
usually that feels good, because the end result is that your document is
rendered. You press the Enter key, your document gets processed by
whatever parser or processor it requires for conversion, and you're
done. What a great feeling.

The reality of failing silently, though, is that it has still failed.
You might have gotten output, and most of it might look just fine, but
what about the error that didn't get caught? Maybe the error causes
something to render incorrectly, but it's buried on page 42 of a
200-page document. When will you notice? Or maybe the error rendered
correctly in the web version of your document but incorrectly in the
print version.

Docbook, like all XML, is famously strict. If, for instance, you place a
`<para>` after you've closed your `<chapter>`, your document build
fails, and it generally fails verbosely. Since Docbook is XML, you can
even run your source through `xmllint` to find errors early.
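`xmllint` is the right tool for this job (`xmllint --noout myDocbook.xml` prints nothing when the file is well-formed). To show the same strictness in a form you can experiment with, here is a minimal sketch using Python's standard `xml.etree.ElementTree` parser. It's an illustration, not a replacement for xmllint: it only checks that the XML is well-formed, not that it's valid Docbook, and `check_well_formed` is a hypothetical helper written for this example.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def check_well_formed(source: str) -> Optional[str]:
    """Return None if source is well-formed XML, otherwise the parser's error."""
    try:
        ET.fromstring(source)
    except ET.ParseError as err:
        return str(err)
    return None


good = "<chapter><title>My title</title><para>Text.</para></chapter>"
# The <para> after </chapter> creates a second top-level element, which XML forbids.
bad = "<chapter><title>My title</title></chapter><para>Stray text.</para>"

print(check_well_formed(good))  # prints None
print(check_well_formed(bad))   # prints a "junk after document element" error
```

The parser refuses the second document outright rather than guessing what you meant, which is exactly the fail-fast behavior described above.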

Experiencing errors is never easy. It's not fun to watch your work
fizzle out in a pool of illegal tags and syntax errors instead of
building into a beautifully rendered EPUB, web page, or PDF. To get
around that disappointment, many processors accept an option to
temporarily ignore errors, such as xmlto's `--skip-validation`, and
there's a significant difference between a fatal ERROR and a mere
WARNING. Ultimately, though, failure is important. It identifies
imperfections in your source and protects you from unpleasant surprises
in your end product.


## Easier than it looks

Docbook sometimes has a reputation for being hard to learn, but I have
found that more often it's not Docbook itself that's difficult, but the
unique tool chains people build around it.

Compared to HTML, Docbook's tags are self-describing. Do you want to
write an article or a book? Start with either the `<article>` or
`<book>` tag, respectively. Start a new chapter in a book or a new
section in an article with `<chapter>` or `<section>`, respectively.
Start a paragraph with `<para>`, an ordered list with `<orderedlist>`,
enter a list item with `<listitem>`, and so on.
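To see how those tags nest, here's a short sketch that builds a small Docbook 5 article with Python's standard `xml.etree.ElementTree` module. The content and the `db` helper are made up for illustration. Note that each `<listitem>` wraps a `<para>`: in Docbook, a list item contains block elements, not bare text.

```python
import xml.etree.ElementTree as ET

DB_NS = "http://docbook.org/ns/docbook"
ET.register_namespace("", DB_NS)  # write DocBook as the default namespace


def db(name: str) -> str:
    """Return an element name qualified with the DocBook 5 namespace."""
    return f"{{{DB_NS}}}{name}"


article = ET.Element(db("article"), {"version": "5.0"})
ET.SubElement(article, db("title")).text = "Getting started"

section = ET.SubElement(article, db("section"))
ET.SubElement(section, db("title")).text = "Three steps"
ET.SubElement(section, db("para")).text = "Each tag names the kind of content it holds."

steps = ET.SubElement(section, db("orderedlist"))
for text in ("Open a text editor", "Write some Docbook", "Render it"):
    item = ET.SubElement(steps, db("listitem"))
    # A listitem holds block elements such as para, never bare text.
    ET.SubElement(item, db("para")).text = text

xml_text = ET.tostring(article, encoding="unicode")
print(xml_text)
```

The serialized output reads almost like an outline of the document, which is the point: the tag names say what each piece of content is.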

Compared to Markdown and AsciiDoc, Docbook appears complex, but if you
consider all the rules that aren't intuitive in structured text, then
Docbook's rules don't seem so bad.

Learning syntax from the original Markdown "spec" was often a process of
trial and error, followed by a series of desperate Internet searches,
which meant wading through all the different Markdown flavors and
parsers for the best applicable candidate for a correct answer.
CommonMark, a project dedicated to defining a more rigorous
specification, has helped, but users are often lulled into a false sense
of security by how easy it is to learn the basics, only to find that
achieving advanced results introduces a surprise learning curve.
Luckily, Markdown accepts HTML as a fallback markup option, and there
are several tools and Markdown variants out there to make up for what
the original spec lacks. Even so, if you're writing complex documents
for several different output targets, it may not be as easy as it looks
in all the *learn Markdown in just 15 minutes*-style blurbs.

The flow of logic to learn something new in Docbook tends to be
consistently simple:

1.  Go to the Docbook site

2.  Find an appropriate tag in the master list

3.  Refer to the tag's documentation to find out how to correctly use it

That's all there is to it. It's about the same as learning HTML; learn
the basics in the first few minutes, and keep a reference handy to learn
more as needed.

Depending on how much you know about XML, there can be a few surprises,
but the Docbook website clearly defines valid parent and child
relationships for each and every tag, and each entry for each tag
provides big blocks of examples.


## Semantics

Finally, Docbook is important because it provides data about your data.
Docbook tags aren't meant to dictate a style over your content, but to
classify the information you are trying to convey. As with HTML and CSS,
styling comes later, and it's completely malleable. Docbook tags provide
semantic meaning to your words.

Semantics might not seem that important to you now, but here are two
great examples of times that metadata became truly important in the real
world:

1.  Before mobile phones existed, nobody on the Internet would have
   thought that a telephone number would ever need markup of its own,
   like today's `tel:` links. If anything, surely an `<em>` or
   `<strong>` tag would do. And then mobile phones happened, and people
   all over the world were browsing the Internet on the same device
   that they used to make phone calls, and it was a downright
   inconvenience not to be able to look up a company's phone number and
   then click on it to make the call.

2.  A major phone company in New Zealand had been called Telecom
   for years. When it rebranded as Spark, the word *telecommunication*
   appeared as *sparkmunication* throughout its online documentation.
   This glitch was live on the website for several days before the
   obvious find-and-replace error was noticed and corrected. Better
   regex would have helped, but the error wouldn't have happened at all
   with Docbook entities or the `<trademark>` tag.

Classifying the information you write is important now, and it only
becomes more important as technology develops.
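Here's a rough sketch of why the markup matters for the Telecom example, using Python's standard library. The sample text is invented, and this isn't a claim about how Spark's documentation was actually built. A blind, case-insensitive find-and-replace over the raw text produces *sparkmunication*, while a replacement limited to `<trademark>` elements leaves ordinary words alone. The `rebrand` function is a hypothetical helper for this example.

```python
import re
import xml.etree.ElementTree as ET

DB_NS = "http://docbook.org/ns/docbook"
ET.register_namespace("", DB_NS)  # serialize DocBook as the default namespace

# Invented sample text for illustration.
SOURCE = f"""<article xmlns="{DB_NS}" version="5.0">
  <title>About us</title>
  <para><trademark>Telecom</trademark> sells telecommunication services.</para>
</article>"""


def rebrand(xml_text: str, old: str, new: str) -> str:
    """Rename a brand only where it is marked up as a <trademark>."""
    root = ET.fromstring(xml_text)
    for mark in root.iter(f"{{{DB_NS}}}trademark"):
        if mark.text == old:
            mark.text = new
    return ET.tostring(root, encoding="unicode")


# A blind, case-insensitive find-and-replace over the raw text.
naive = re.sub("telecom", "spark", SOURCE, flags=re.IGNORECASE)
# A replacement guided by the semantic markup.
semantic = rebrand(SOURCE, "Telecom", "Spark")

print(naive)     # the paragraph now mentions "sparkmunication"
print(semantic)  # only the trademark changed
```

The regex has no way to know which occurrences of "telecom" are the company name; the markup does.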


## Create your first Docbook document the easy way

Here's a quick and easy way to get started with Docbook. This method
emphasizes learning Docbook tags and syntax rather than building a
complex and flexible tool chain.

1.  First, open a text editor. Use whatever text editor you are most
   comfortable with, as long as it can save plain text files. All the
   good ones do: [Gedit](https://wiki.gnome.org/Apps/Gedit),
   [Geany](https://www.geany.org/Download/Releases),
   [Kate](https://kate-editor.org/get-it/),
   [Nano](https://www.nano-editor.org/),
   [Jove](https://opensource.com/article/17/1/jove-lightweight-alternative-vim),
   [Emacs](http://gnu.org/software/emacs), [Atom](https://atom.io/),
   and many others.

2.  Open a web browser to
   [tdg.docbook.org/tdg/5.2](http://tdg.docbook.org/tdg/5.2)
   for reference.

3.  Open another tab in your web browser to
   [tdg.docbook.org/tdg/5.2/article.html](http://tdg.docbook.org/tdg/5.2/article.html)
   and scroll to the bottom of the page. Copy the text in the example
   box and paste it into your text editor.

4.  Use the example text as a template, and write something. Some of the
   example's header is more verbose than you probably need, so in my
   example I've trimmed off some of the excess.

             <article xmlns="http://docbook.org/ns/docbook" version="5.0">
               <info>
                 <title>My first docbook document</title>
                 <author>
                   <personname>
                     <firstname>Seth</firstname>
                     <surname>Kenlon</surname>
                   </personname>
                 </author>
                 <publisher><publishername>opensource.com</publishername></publisher>
                 <pubdate>2017</pubdate>
               </info>

               <section xml:id="intro">
                 <title>Introduction</title>
                 <para>Introductory text goes here.</para>
               </section>

               <section xml:id="body">
                 <title>Section with a title</title>
                 <para>Main body text goes here.</para>
               </section>

               <section xml:id="conclusion">
                 <title>Conclusion</title>
                 <para>Exciting and inspiring conclusion goes here.</para>
               </section>
             </article>


   If you are ever in doubt over whether a tag is required, refer to
   the tag's documentation. The synopsis section tells you what is
   required and what is optional. For example, the documentation for
   `<section>` specifies that a section must begin with a title-related
   element, such as `<title>` (or an `<info>` element containing one),
   before any other content.

5.  Once you've finished writing, it's time to render your document.
   There are several tools that can process Docbook, but the easiest
   for beginners is [Pandoc](http://pandoc.org/). It's one of those
   "Swiss army knife" applications that converts almost any kind of
   text into almost any other kind of text. What makes it especially
   nice for Docbook is that it has attractive stylesheets by default,
   while most other processors render very generic output under the
   assumption that you intend to apply your own XSL stylesheet.

   There are all kinds of potential targets, but the commands are all
   basically the same:

             $ pandoc --from docbook --to epub3 --output myDocbook.epub myDocbook.xml

             $ pandoc --from docbook --to markdown --output myDocbook.md myDocbook.xml

             $ pandoc --from docbook --to html --output myDocbook.html myDocbook.xml

             $ pandoc --from docbook --to latex --output myDocbook.pdf myDocbook.xml
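Because these commands differ only in the target format and the file extension, they're easy to script. This is a minimal Python sketch, assuming pandoc is on your `PATH` and, for the PDF target, that a LaTeX installation is available, since pandoc uses LaTeX to produce PDF. The `pandoc_command` and `convert_all` names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path

# Pandoc output formats used above, mapped to the file extension for each.
TARGETS = {"epub3": ".epub", "markdown": ".md", "html": ".html", "latex": ".pdf"}


def pandoc_command(source: str, target: str) -> list[str]:
    """Build the pandoc argument list for one output format."""
    output = Path(source).with_suffix(TARGETS[target])
    return ["pandoc", "--from", "docbook", "--to", target,
            "--output", str(output), source]


def convert_all(source: str) -> None:
    """Render source to every format in TARGETS, stopping on the first error."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on your PATH")
    for target in TARGETS:
        subprocess.run(pandoc_command(source, target), check=True)
```

Call `convert_all("myDocbook.xml")` to produce all four files. Passing `check=True` makes the script stop at the first conversion that fails instead of quietly carrying on.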


And that's all there is to it. The more you write in Docbook, the more
tags and attributes you learn, and eventually you'll probably find it
hard to go back to a less explicit format.

![ PDF render ](render1.png)


## Advanced Docbook, with style

Pandoc makes Docbook as easy as HTML, but XML is flexible, so if you
need to, you can customize how you build your Docbook documents.

The default Docbook render from most processors aside from Pandoc looks
a little something like this:

![ Default PDF render ](renderdefault.png){width="6in"}

It's professional, but painfully so. Still, it's an important foundation
upon which additional styles can be applied.


### HTML and EPUB output

If your target involves HTML, you can continue to use Pandoc,
instructing it to use your custom CSS.

       $ pandoc --from docbook --to html --standalone \
       --css=myStyle.css \
       --output myDocbook.html myDocbook.xml

       $ pandoc --from docbook --to epub3 \
       --css=myStyle.css --epub-cover-image=cover.jpg \
       --epub-embed-font=fonts/foo.ttf --epub-embed-font=fonts/bar.ttf \
       --output myDocbook.epub myDocbook.xml


The end result is as dynamic, lightweight, modern, and attractive as you
make it.


### PDF and print output

Rendering to PDF, whether for digital distribution or for printing,
relies on either LaTeX or XSL. I don't know or use LaTeX, so I choose
XSL, but if you're a LaTeX user, you can [use Pandoc with custom
templates](http://pandoc.org/MANUAL.html#templates). Otherwise, here's a
brief introduction to XSL and the
[xsltproc](http://xmlsoft.org/XSLT/xsltproc2.html) command.

XSL is the eXtensible Stylesheet Language, and it is the CSS of the XML
world. If you install the Docbook XSL stylesheets from your Linux
distribution or from the Docbook website, you get all the default
Docbook stylesheets. Tools like xmlto use them automatically, and
they're what you point xsltproc to when you build a document.

If you cannot, or choose not to, install the stylesheets system-wide,
you can keep a copy anywhere, such as in your home directory, and point
to it manually in your xsltproc command.

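Building a PDF with xsltproc is a two-step process. First, you generate
a `.fo` file, which combines your XML with the Docbook XSL stylesheets
and translates the result into XSL-FO (Formatting Objects) markup. Then
you process the `.fo` file with Apache FOP, a Java application that
converts Formatting Objects to PDF. xsltproc needs the path to the
`fo/docbook.xsl` stylesheet as its first file argument; the path below
is only an example, so use the location on your system:

       $ xsltproc --output tmp.fo \
       /usr/share/xml/docbook/xsl-stylesheets-1.78.1/fo/docbook.xsl \
       myDocbook.xml

       $ fop tmp.fo myDocbook.pdf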


An easy modification to make when just getting started with styling
Docbook is your choice of font. Fonts are easy to change but make a
noticeable difference in your end product.

1.  The first step in adding to the default style is telling FOP where
   to find your fonts. Create a directory called `fonts` in your
   working directory. Then create a file called `fonts.xml` and enter
   this text:

               <fop version="2.0">
                 <renderers>
                   <renderer mime="application/pdf">
                     <fonts>
                       <directory recursive="true">./fonts</directory>
                       <auto-detect/>
                     </fonts>
                   </renderer>
                 </renderers>
               </fop>


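   This registers all TTF fonts found in the `fonts` directory and its
   subdirectories, and `<auto-detect/>` also tells FOP to register the
   fonts installed on your system. Put whatever fonts you want to use
   in your PDF in that directory.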

2.  The next step is to set your new style option so that your XSL
   processor knows about it. There are two ways to change an XSL
   parameter. You can set it dynamically as part of your xsltproc
   command, or you can make the change in an additional stylesheet.

   I use both methods, depending on the gravity of the change. For
   simple styles that I find myself changing often, I pass parameters
   as part of my command. That way, I can change them quickly and
   easily and independently of my custom stylesheets. I can even set
   them to change based on a Makefile. To set fonts:

               $ xsltproc --stringparam body.font.family "League Gothic" \
               --output tmp.fo \
               /usr/share/xml/docbook/xsl-stylesheets-1.78.1/fo/docbook.xsl \
               myDocbook.xml


   A list of valid parameters can be found at
   [docbook.sourceforge.net/release/xsl/1.77.1/doc/param.html](http://docbook.sourceforge.net/release/xsl/1.77.1/doc/param.html).

   To output to PDF, tell FOP to register your fonts with your
   `fonts.xml` file:

               $ fop -c fonts.xml tmp.fo myDocbook.pdf
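If you'd rather script the build than type both commands each time (the same idea as driving it from a Makefile), here is a minimal Python sketch. It assumes xsltproc and FOP are installed, and the stylesheet path is an assumption copied from the `mystyle.xsl` example later in this article, so adjust it for your system. The function names are hypothetical.

```python
import subprocess

# Assumed path to the Docbook FO stylesheet; change it to match your system.
FO_STYLESHEET = "/usr/share/xml/docbook/xsl-stylesheets-1.78.1/fo/docbook.xsl"


def xsltproc_command(source: str, params: dict[str, str],
                     stylesheet: str = FO_STYLESHEET,
                     fo_file: str = "tmp.fo") -> list[str]:
    """Build an xsltproc call that passes each params entry via --stringparam."""
    cmd = ["xsltproc"]
    for name, value in params.items():
        cmd += ["--stringparam", name, value]
    return cmd + ["--output", fo_file, stylesheet, source]


def fop_command(fo_file: str, pdf_file: str, config: str = "fonts.xml") -> list[str]:
    """Build the FOP call that renders the .fo file using your font config."""
    return ["fop", "-c", config, fo_file, pdf_file]


def build_pdf(source: str, pdf_file: str, params: dict[str, str]) -> None:
    """Run both steps; check=True stops the build if either tool fails."""
    subprocess.run(xsltproc_command(source, params), check=True)
    subprocess.run(fop_command("tmp.fo", pdf_file), check=True)
```

For example, `build_pdf("myDocbook.xml", "myDocbook.pdf", {"body.font.family": "League Gothic"})` runs the same two commands shown above. Changing a style parameter then means editing one dictionary entry rather than retyping a long command.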


### XSL stylesheet

For styles less likely to change depending on printer requirements, page
size, or mood, I place rules in a custom XSL template. XSL templates can
get very complex, so making minor adjustments and learning over time is
a good approach.

A common visual cue in printed books is the admonition, such as a note,
tip, or warning, which gets a background color to let the reader know
that the information is separate from the current narrative but still
relevant to it. Admonitions are distinct elements in Docbook, so they're
relatively simple to style.

The process is similar to styling fonts.

First, create a new file called `mystyle.xsl` in your working directory.
Edit it so that it contains this heading:

           <?xml version='1.0'?>
           <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
             <xsl:import href="/usr/share/xml/docbook/xsl-stylesheets-1.78.1/fo/docbook.xsl"/>


The `xsl:import` line must point to the stylesheet on your system,
whether you have installed it or you are using it from a nonstandard
location in your home directory.

In this same file, enter some style rules:

             <xsl:template match="note">
               <xsl:variable name="id">
                 <xsl:call-template name="object.id"/>
               </xsl:variable>
               <fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format"
                   id="{$id}"
                   space-before.minimum="0.8em"
                   space-before.optimum="1em"
                   space-before.maximum="1.2em"
                   start-indent="0.25in"
                   end-indent="0.25in"
                   padding-top="6pt"
                   padding-bottom="2pt"
                   padding-left="4pt"
                   padding-right="4pt"
                   background-color="#ffffbd">
                 <xsl:if test="$admon.textlabel != 0 or title">
                   <fo:block keep-with-next="always"
                       xsl:use-attribute-sets="admonition.title.properties"
                       font-family="League Script Thin"
                       color="#348fdf"
                       font-weight="bold">
                     <xsl:apply-templates select="." mode="object.title.markup"/>
                   </fo:block>
                 </xsl:if>

                 <fo:block xsl:use-attribute-sets="admonition.properties"
                     font-family="League Gothic">
                   <xsl:apply-templates/>
                 </fo:block>
               </fo:block>
             </xsl:template>
           </xsl:stylesheet>


This creates a template in your stylesheet for all `note` elements.
Whenever the XSL processor finds a `<note>` tag, it drops in the XSL-FO
blocks that describe how the element is to be printed (whether the
paper is digital or physical).

Apply the styles with xsltproc, and then render the PDF with FOP:

           $ xsltproc --stringparam body.font.family "League Gothic" \
           --output tmp.fo mystyle.xsl myDocbook.xml

           $ fop -c fonts.xml tmp.fo myDocbook.pdf


And the output:

![ Styled PDF render ](note.png)

The syntax is nowhere near as terse or simple as CSS syntax. However,
simple styles all follow the same format:

1.  Create an `<xsl:template>` block for the tag that you want
   to affect.

2.  Look up the available XSL attributes at
   [docbook.sourceforge.net/release/xsl/current/doc/fo/index.html](http://docbook.sourceforge.net/release/xsl/current/doc/fo/index.html).

3.  Set the attributes you want to apply in an `<fo:block>`.

Like CSS, it takes time and practice to get to know all of your options,
but once you get the hang of it, it's simple. More complex XML gets you
more complex rules with dependencies, variables, conditionals, and more.
For an exhaustive overview, see the definitive
[sagehill.net/docbookxsl](http://www.sagehill.net/docbookxsl/) web site.

## Using Docbook

Docbook was invented for tech writers, and many of its tags reflect
that. However, I use Docbook for everything, whether it's tech writing,
fiction, or [RPG
design](http://www.dmsguild.com/product/219635/Adventure-Template-For-Docbook-XML).
It's a powerful, industry-strength system.

This doesn't mean that there's no place in the world for Markdown or
org-mode or other text formats. If I'm writing a `README` file or a
short note to myself, Docbook is overkill, because the source document
is also meant to be the final delivery format of the document. In other
words, where I would historically have used plain text, I use Markdown
because Markdown's structure is a vast improvement over unstructured
text.

I also use Markdown as an intermediate format. I usually write
[opensource.com](http://opensource.com) articles in Docbook and then
output to Markdown so that a site editor can easily review and convert
my work. Going directly from Docbook to HTML is great if you're running
your own site and can govern what tags, classes, and IDs get used, but
Markdown serves as an excellent intermediate step when you temporarily
want to ignore your source metadata and just deliver the written words.

For everything else, Docbook is a great solution. Give it a try, and
you'll never look at word processors, text, or XML the same way again.