* * * * *
A blogger's HTML
An article (Dubliners) [1] about the Dublin Core (Dublin Core Metadata
Initiative) [2] at CONTENU.nu (contenu.nu: a content consultancy) [3] got me
thinking about the problem of indexing weblogs [4]. The major problem is that
there is no semantic markup to include meta-information in the body of a
webpage. Sure, you can include meta-information in the <HEAD> section, using
both <META> and <LINK> tags, and that's fine when the page in question is
about a single topic.
But a weblog has several, mostly unrelated entries on a single page, with the
rare weblog having several nearly article-length entries on the main page
(and by extension, the archive pages). Google [5] indexes these pages as if
each were on a single topic and, as a result, you get fodder for the Disturbing
Search Requests [6].
There are heuristics that can be used to index a weblog page, but it would be
nice to have some defined way to mark individual entries, with the ability to
include meta-information for each entry. I had intended for my software [7]
here to build up the <META> tags (since I do include keywords/classification
for each entry I write) and while that may be viable for up to a week's worth
of entries on a page, it starts getting silly for a month, and for a whole
year? It's just not practical.
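To make the scaling problem concrete, here is a rough Python sketch of collapsing per-entry keywords into one page-level <META> tag. It is not the software mentioned above: the function name `build_keywords_meta` and the sample keyword lists are assumptions made up for illustration.

```python
from html import escape


def build_keywords_meta(entries):
    """Merge per-entry keyword lists into one <meta> tag for the whole page.

    `entries` is a list of keyword lists, one list per weblog entry
    (a hypothetical input shape chosen for this sketch).
    """
    seen = set()
    merged = []
    for keywords in entries:
        for keyword in keywords:
            key = keyword.lower()
            if key not in seen:      # drop duplicates, keep first-seen order
                seen.add(key)
                merged.append(keyword)
    content = escape(", ".join(merged), quote=True)
    return f'<meta name="keywords" content="{content}">'


# Two entries' worth of sample keywords (illustrative only).
page = [
    ["programming", "statistics", "web browsers", "web log files"],
    ["daily life", "web pages", "home pages"],
]
print(build_keywords_meta(page))
```

Every entry on the page adds to the same flat list, so a month or a year of entries produces one very long tag that no longer says which keyword belongs to which entry.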
But from the Dublin Core article, I ended up at the W3C (World Wide Web
Consortium) [8] site and came across XHTML (eXtensible HyperText Markup
Language) 1.1 (XHTML 1.1 - Module-Based XHTML) [9], which is still being
worked on, but (and this is the exciting part here) this version of XHTML
can be extended (unlike XHTML 1.0, even though the name says it's
extensible)! It's completely modular, so new variants of XHTML (for example,
one extended with MathML (W3C Math Home) [10]) can be constructed from bits
and pieces of existing XHTML modules.
So in the future, it may be possible to extend XHTML to include meta-
information in the middle of a page, instead of just in the <HEAD> section
(sorry, <head> section—XHTML uses lower case for tags). So instead of having
to parse code like:
-----[ HTML ]-----
<h3><a class="local" id="2002/07/16.1" href="/2002/07/16.1">The Ins
and Outs of Calculating Browser Usage</a></h3>
<!-- programming, statistics, web browsers, web log files -->
<p>
I spent the past few ...
<h2><a class="local" id="2002/07/14" href="/2002/07/14">Sunday, July
14, 2002</a></h2>
<h3><a class="local" id="2002/07/14.1" href="/2002/07/14.1">Probability</a></h3>
<p>
..
-----[ END OF LINE ]-----
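Parsing markup like the above means leaning on layout conventions. Here is a minimal Python sketch of one such heuristic, using the standard library's `html.parser`. It assumes that every entry starts with an <h3> holding an <a class="local"> link, and that a comment between that heading and the first <p> is the keyword list. The class name `EntryScanner` is hypothetical, and the rule is a guess at one workable heuristic, not a description of how any real indexer works.

```python
from html.parser import HTMLParser


class EntryScanner(HTMLParser):
    """Heuristic scan: <h3><a class="local"> starts an entry; a comment that
    follows the heading, before the first <p>, holds its keywords."""

    def __init__(self):
        super().__init__()
        self.entries = []                 # dicts with href, title, keywords
        self._in_h3 = False
        self._in_title = False
        self._awaiting_keywords = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h3":
            self._in_h3 = True
        elif tag == "a" and self._in_h3 and attrs.get("class") == "local":
            self.entries.append(
                {"href": attrs.get("href"), "title": "", "keywords": []})
            self._in_title = True
            self._awaiting_keywords = False
        elif tag == "p":
            # The entry body has begun; later comments are not keywords.
            self._awaiting_keywords = False

    def handle_endtag(self, tag):
        if tag == "a" and self._in_title:
            self._in_title = False
            self._awaiting_keywords = True
            # Collapse the line break inside a wrapped title.
            title = self.entries[-1]["title"]
            self.entries[-1]["title"] = " ".join(title.split())
        elif tag == "h3":
            self._in_h3 = False

    def handle_data(self, data):
        if self._in_title:
            self.entries[-1]["title"] += data

    def handle_comment(self, data):
        if self._awaiting_keywords:
            self.entries[-1]["keywords"] = [
                k.strip() for k in data.split(",") if k.strip()]
            self._awaiting_keywords = False


sample = """
<h3><a class="local" id="2002/07/16.1" href="/2002/07/16.1">The Ins
and Outs of Calculating Browser Usage</a></h3>
<!-- programming, statistics, web browsers, web log files -->
<p>
I spent the past few ...
<h2><a class="local" id="2002/07/14" href="/2002/07/14">Sunday, July
14, 2002</a></h2>
<h3><a class="local" id="2002/07/14.1" href="/2002/07/14.1">Probability</a></h3>
<p>
..
"""

scanner = EntryScanner()
scanner.feed(sample)
scanner.close()
for entry in scanner.entries:
    print(entry["href"], entry["title"], entry["keywords"])
```

The scanner only works as long as the page keeps this exact layout. Change the heading level or move the comment, and entries or keywords silently go missing.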
An indexer can, instead, have an easier time with:
-----[ HTML ]-----
<entry>
<head>
<meta name="keywords" content="programming, statistics,
web browsers, web log files" />
<link rel="permalink" href="/2002/07/16.1" />
<link rel="next" href="/2002/07/17.1" />
<link rel="previous" href="/2002/07/14.1" />
</head>
<body>
<p>
I spent the past few ...
</p>
</body>
</entry>
<entry>
<head>
<meta name="keywords" content="daily life, web pages, home pages,
six degrees of separation, Tom Hoylrod" />
<link rel="permalink" href="/2002/07/14.1" />
<link rel="next" href="/2002/07/16.1" />
<link rel="previous" href="/2002/07/13.1" />
</head>
<body>
..
</body>
</entry>
-----[ END OF LINE ]-----
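With per-entry markup like this, pulling out the meta-information no longer needs guesswork. Below is a small Python sketch using the standard library's `xml.etree.ElementTree`. It assumes the entries are written as well-formed XML (empty elements closed with `/>`) and sit inside a single wrapper element. The `<entries>` wrapper and the function name `read_entries` are inventions for this example, and `<entry>` itself is only the hypothetical extension discussed here, not part of any XHTML module.

```python
import xml.etree.ElementTree as ET

# One hypothetical <entry>, wrapped in a made-up <entries> root element
# so the fragment parses as a single XML document.
sample = """
<entries>
  <entry>
    <head>
      <meta name="keywords"
            content="programming, statistics, web browsers, web log files" />
      <link rel="permalink" href="/2002/07/16.1" />
      <link rel="next" href="/2002/07/17.1" />
      <link rel="previous" href="/2002/07/14.1" />
    </head>
    <body>
      <p>I spent the past few ...</p>
    </body>
  </entry>
</entries>
"""


def read_entries(xml_text):
    """Return one dict per <entry>: its keywords and its rel -> href links."""
    root = ET.fromstring(xml_text.strip())
    result = []
    for entry in root.iter("entry"):
        keywords = []
        for meta in entry.findall("head/meta"):
            if meta.get("name") == "keywords":
                content = meta.get("content", "")
                keywords = [k.strip() for k in content.split(",") if k.strip()]
        links = {link.get("rel"): link.get("href")
                 for link in entry.findall("head/link")}
        result.append({"keywords": keywords, "links": links})
    return result


for entry in read_entries(sample):
    print(entry["links"].get("permalink"), entry["keywords"])
```

Each entry carries its own keywords and permalink, so an indexer can treat every entry as a separate document instead of lumping the whole page together.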
[1] http://www.contenu.nu/article.htm?id=1224
[2] http://dublincore.org/
[3] http://www.contenu.nu/
[4] gopher://gopher.conman.org/0Phlog:2002/05/01.3
[5] http://www.google.com/
[6] http://searchrequests.weblogs.com/
[7] https://boston.conman.org/about/
[8] http://www.w3.org/
[9] http://www.w3.org/TR/xhtml11/
[10] http://www.w3.org/Math/