* * * * *
Semantic HTML
There's quite the buzz in the weblogging community over Mark Pilgrim's
(Pushing the envelope) [1] use of the <CITE> tag (among other more esoteric
tags in HTML (HyperText Markup Language)). It's a nice idea, but all the
standard (HTML 4.0: § 9.2.1—Phrase elements) [2] says about <CITE> is:
> **CITE:**
> Contains a citation or a reference to other sources.
>
“HTML 4.0 § 9.2.1 Phrase elements [3]”
It offers only a few scant and quite trivial examples, so I'm not sure of the
exact usage of the <CITE> tag. Take the following:
> In _Snowcrash_, Neal Stephenson explored the implications of neuro-
> linguistic hacking …
>
Now, am I supposed to mark that up like:
> In <CITE>Snowcrash</CITE>, Neal Stephenson explored the implications of
> neuro-linguistic hacking ...
>
Because I'm citing the book _Snowcrash_? So, along those lines, if I had
instead written it as:
> Neal Stephenson, in his book _Snowcrash_, explored the implications of
> neuro-linguistic hacking …
>
Would I then mark it up as:
> <CITE>Neal Stephenson</CITE>, in his book Snowcrash, explored the
> implications of neuro-linguistic hacking ...
>
since now I'm emphasizing Neal Stephenson over the book? But the book was
written by Neal Stephenson so should it instead be:
> In <CITE>Snowcrash</CITE>, <CITE>Neal Stephenson</CITE> explored the
> implications of neuro-linguistic hacking ...
>
Okay, so it's a contrived example, but generating semantically correct markup
isn't trivial and expecting the general public to get it correct is asking a
bit too much. As one person pointed out [4], given a hypothetical tag like
<EDITOR>, is it:
> <EDITOR>Joe Blow</EDITOR>
>
or
> <EDITOR>vi</EDITOR>
>
(except when it's <EDITOR>Frontpage</EDITOR> but I won't go there)?
There are other semi-obscure tags for semantic mark-up and, fortunately, most
of them are less ambiguous in their usage: <CODE> is for marking up computer
source code, and <SAMP> is for program output. Unfortunately, the HTML spec
lists both <CODE> and <SAMP> as inline tags, not block tags, which really
restricts their use. I'm not sure what the W3C (World Wide Web
Consortium) [5] was thinking when they made <CODE> and <SAMP> inline. Using
<CODE> to mark up code fragments will turn something like:
-----[ C ]-----
for (i = 0 ; types[i].sl != NULL ; i++)
{
  if (strstr(filename,types[i].sl) != NULL)
    return(types[i].sl);
}
return("text/plain");
-----[ END OF LINE ]-----
into:
> for (i = 0 ; types[i].sl != NULL ; i++) { if (strstr(filename,types[i].sl)
> != NULL) return(types[i].sl); } return("text/plain");
>
Nice, huh?
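As a rough illustration of why this happens: a browser treats the text inside an inline element like ordinary running text, so every run of spaces, tabs and newlines is folded into a single space. The sketch below approximates that folding in C. The function name collapse_whitespace is made up for this example, and real browsers follow more detailed rules than this.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Collapse each run of whitespace in src into a single space, which is
 * roughly what happens to text inside an inline element such as <CODE>.
 * Leading and trailing whitespace is dropped entirely.  dst must have
 * room for at least strlen(src) + 1 bytes.  Returns the result length.
 * (collapse_whitespace is a hypothetical helper, not a library call.)
 */
size_t collapse_whitespace(char *dst, const char *src)
{
  size_t len      = 0;
  int    in_space = 0;

  for ( ; *src != '\0' ; src++)
  {
    if (isspace((unsigned char)*src))
      in_space = 1;                  /* remember the run, emit nothing yet */
    else
    {
      if (in_space && (len > 0))
        dst[len++] = ' ';            /* one space stands in for the run */
      dst[len++] = *src;
      in_space   = 0;
    }
  }

  dst[len] = '\0';
  return(len);
}
```

Feeding it a multi-line fragment gives back a single line, which is the same effect the indentation and line breaks suffer inside <CODE>.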
Dougal Campbell [6] suggests using:
-----[ CSS ]-----
CODE
{
white-space: pre;
}
-----[ END OF LINE ]-----
Which sounds good, but doesn't work. The CSS spec (§ 16.6 Whitespace—
Cascading Style Sheets Level 2) [7] states that white-space is only valid for
a display type of “block”, which <CODE> isn't (remember, it's “inline”). To
work, you really need:
-----[ CSS ]-----
CODE
{
display: block;
white-space: pre;
}
-----[ END OF LINE ]-----
Which works fine in Mozilla [8], but fails in IE (Microsoft Internet
Explorer) 5.x (most likely a bug) and in Lynx [9], which doesn't even
look at the CSS (Cascading Style Sheet) file (and it looks like I have one
regular reader who uses Lynx). As much as I would love to use <CODE> and
<SAMP> for semantically better mark-up, I'm afraid I'm still stuck with using
<PRE>; otherwise I'll end up with:
-----[ HTML ]-----
<CODE>for (i = 0 ; types[i].sl != NULL ; i++)</CODE><BR>
<CODE>{</CODE><BR>
<CODE> if (strstr(filename,types[i].sl) != NULL)</CODE><BR>
<CODE> return(types[i].sl);</CODE><BR>
<CODE>}</CODE><BR>
<CODE>return("text/plain");</CODE><BR>
-----[ END OF LINE ]-----
Which is silly. (Okay, it's easy enough to write some code to automatically
convert the source code, but semantically, does it even make sense?)
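For what it's worth, the automatic conversion mentioned above might look something like this minimal sketch in C. It wraps each source line in <CODE>…</CODE><BR> and escapes the characters HTML treats specially. The names append and code_to_html are invented for this example. Turning every space into &nbsp; is an assumption added here so that indentation survives inside an inline element; the hand-written mark-up above does not do that.

```c
#include <stddef.h>
#include <string.h>

/*
 * Append text to dst, which holds dstsize bytes and currently *plen
 * characters.  Returns 0 on success, -1 if the text would not fit.
 * (Hypothetical helper for this sketch.)
 */
static int append(char *dst, size_t dstsize, size_t *plen, const char *text)
{
  size_t n = strlen(text);

  if (n >= dstsize - *plen)          /* need room for n bytes plus the NUL */
    return(-1);
  memcpy(dst + *plen, text, n + 1);
  *plen += n;
  return(0);
}

/*
 * Convert source code in src to HTML in dst, one <CODE>...</CODE><BR>
 * per input line.  Escapes <, > and &, and (as an assumption of this
 * sketch) turns every space into &nbsp; to keep indentation.
 * Returns 0 on success, -1 if dst is too small.
 */
int code_to_html(char *dst, size_t dstsize, const char *src)
{
  size_t      len     = 0;
  int         in_line = 0;
  char        ch[2]   = { '\0', '\0' };
  const char *text;

  if (dstsize == 0)
    return(-1);
  dst[0] = '\0';

  for ( ; *src != '\0' ; src++)
  {
    if (!in_line)                    /* open a new line of output */
    {
      if (append(dst, dstsize, &len, "<CODE>") < 0)
        return(-1);
      in_line = 1;
    }

    switch (*src)
    {
      case '\n': text = "</CODE><BR>\n"; in_line = 0; break;
      case '<':  text = "&lt;";   break;
      case '>':  text = "&gt;";   break;
      case '&':  text = "&amp;";  break;
      case ' ':  text = "&nbsp;"; break;
      default:   ch[0] = *src; text = ch; break;
    }

    if (append(dst, dstsize, &len, text) < 0)
      return(-1);
  }

  /* close a final line that had no trailing newline */
  if (in_line && (append(dst, dstsize, &len, "</CODE><BR>\n") < 0))
    return(-1);
  return(0);
}
```

Calling code_to_html(out, sizeof(out), source) fills out with the marked-up lines, or returns -1 if the buffer is too small. Whether the result is any more meaningful than <PRE> is, of course, the open question.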
The upshot of all this rambling about semantically correct HTML? Um … not
much really. I won't be changing the mark-up I use too much since I do lose
the visual appearance in most browsers (although I may try giving the <CITE>
tag a bit of a go).
[1] http://diveintomark.org/archives/2002/12/27.html#pushing_the_envelope
[2] http://www.w3.org/TR/1998/REC-html40-19980424/struct/text.html#h-9.2.1
[3] http://www.w3.org/TR/1998/REC-html40-19980424/struct/text.html#h-9.2.1
[4] http://www.kuro5hin.org/comments/2002/12/29/202939/15/5#5
[5] http://www.w3c.org/
[6] http://dougal.gunters.org/myphpblog//archive.php?blogid=1&tem=emc3&y=2002&m=12&d=27
[7] http://www.w3.org/TR/REC-CSS2/text.html#white-space-prop
[8] http://www.mozilla.org/
[9] http://lynx.browser.org/