* * * * *
Hypertext editing and the Semantic Web
There's an interesting discussion [1] about Jason Kottke's new design for his
weblog [2] and it brings up a topic I was thinking about earlier today.
Blogging software in general has made the publishing of new web pages (or
entries) easier, reducing a several-step process to the click of a button.
But what hasn't gotten any easier is the actual creating, or editing, of HTML
(HyperText Markup Language) content. I've talked about this before [3], how I
sometimes have problems with the writing process with hypertext because the
act of creating the hyperlink isn't seamless, yet if I skip creating
hyperlinks as I write, waiting until I'm done writing, I may forget what it
was I wanted to link to exactly.
Some markup, say, <EM> or <STRONG>, can be handled invisibly like it's been
done for years in more traditional editors. So for example, I could be typing
along, when bam! I want to emphasize something: I can hit ALT-E and start
typing, hitting ALT-E again when done. But hypertext and any possible metadata
associated with said hypertext is harder to streamline like that.
For instance, when I quote a passage:
> Oh, and I'd just like to point out that I'm not bashing any current weblog
> software for not being flexible enough or being wrong or whatever. As Anil
> has said, it's harder than just saying that a particular tool should do
> this or that. In fact, I love MT (Moveable Type) (not to mention the army
> of plug-in developers who put out these fantastic plug-in for free) more
> than ever for the amazing amount of flexibility and control that is
> possible (with a bit of work).
>
“Jason Kottke [4]”
It's actually quite a bit of work for me. First I cut-n-paste the quote
from the webpage into the editor I use, then go through to clean it up
(changing double quotes to two single backticks or two regular single quotes,
which my software will then pick up and change to “ and ” respectively, and
adding any appropriate HTML), and then add the <BLOCKQUOTE> with appropriate
attributes:
> <BLOCKQUOTE CITE="http://www.kottke.org/03/11/kottke-redesign#8304"
> TITLE="the redesign continues ... ">
>
And adding the attribution line
> <P CLASS="cite"> <CITE> <A CLASS="external"
> HREF="http://www.kottke.org/03/11/kottke-redesign#8304"> Jason Kottke </A>
> </CITE> </P>
>
I used to place this outside the <BLOCKQUOTE> but recently I moved this
inside the <BLOCKQUOTE>—I'm not sure which I like better. How would you
automate this? Partly by integrating the editor with the browser and
passing along more information in the cut buffer (like URL (Uniform Resource
Locator) and title of the page where the text is selected), but the main
issue is one of layout, like I mentioned above. Context sensitive templates
for pasting perhaps? And how do you handle links? Same way? A key-sequence
for pasting a blockquote and a separate one for a link? All I do know is that
the HTML WYSIWYG (What You See Is What You Get) editors I've seen have never
handled links cleanly. Want a link? Highlight the text, select link and then
have to type in the URL and forget about having other attributes like TITLE
or CLASS; or perhaps not, but there are other buttons to select to set those
and by the time you're done, it would have been easier to type the actual
code than to have the editor so helpfully do it for you.
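To make the "context sensitive template for pasting" idea a bit more concrete, here is a minimal sketch in Python of what a paste-as-blockquote command might do, assuming the browser hands over the selected text along with the page's URL and title. The function names (mark_quotes, format_quote) and the idea of an author field in the cut buffer are my own assumptions for illustration, not anything an existing editor provides.

```python
from html import escape


def mark_quotes(text):
    """Replace straight double quotes with `` (opening) and '' (closing),
    alternating, so later software can turn them into curly quotes."""
    out = []
    opening = True
    for ch in text:
        if ch == '"':
            out.append("``" if opening else "''")
            opening = not opening
        else:
            out.append(ch)
    return "".join(out)


def format_quote(text, url, title, author):
    """Build a BLOCKQUOTE with the attribution line placed inside it."""
    body = escape(mark_quotes(text), quote=False)  # escapes &, < and > only
    href = escape(url, quote=True)
    return "\n".join([
        f'<BLOCKQUOTE CITE="{href}" TITLE="{escape(title, quote=True)}">',
        f"<P>{body}</P>",
        f'<P CLASS="cite"><CITE><A CLASS="external" HREF="{href}">'
        f"{escape(author)}</A></CITE></P>",
        "</BLOCKQUOTE>",
    ])


if __name__ == "__main__":
    print(format_quote(
        'He said "hi" & left',
        "http://www.kottke.org/03/11/kottke-redesign#8304",
        "the redesign continues ...",
        "Jason Kottke",
    ))
```

This only covers the mechanical part; deciding whether the attribution belongs inside or outside the <BLOCKQUOTE> is still a layout choice someone has to make.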
The discussion at Kottke's site is about applying different layouts to
different types of posts—the posts about movies are formatted one way, book
reviews another and just regular posts yet another way and how to trigger the
appropriate template for the type of post. Granted, the software used,
Moveable Type [5], is geared more for people who don't care to learn or type
by hand HTML so having a different layout for different posts is a bit more
difficult to achieve than say, mod_blog where one pretty much has to know
HTML to format posts. But there's a tradeoff to be made: since I use HTML raw
(so to speak) I can go in and fudge the formatting as I see fit. My PhotoFriday
[6] posts (yes, I've seriously slacked off on those) used a different format
than my regular posts and it was easy enough to handle—a new division, some
definitions in the CSS (Cascading Style Sheet) file and there you go.
But the cost is that this isn't automatic. I don't have a menu item or a
keyboard sequence to designate “this is a PhotoFriday post” in much the same
way I don't have a menu item or keyboard sequence that says “these are a
series of photos to display sequentially” or “here is a section of text I'm
quoting from this web page.” Mind you, I wouldn't mind such an editor, and if
done to my liking it would certainly make editing of posts much easier than
it is now (and right now, I'm looking at all this text I've written so far,
pretty much sans HTML and somewhat dreading having to go back and format it,
but since I did skip the HTML formatting I had an easier time getting this
out without forgetting what I wanted to mention, although hopefully I'll
remember all the links I wanted to add).
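The missing "this is a PhotoFriday post" menu item could be as small as a lookup from post type to wrapper template. The following is a rough sketch only; the type names, the CLASS values and the render_post function are hypothetical and are not how mod_blog or Moveable Type actually work.

```python
# Hypothetical post types mapped to wrapper templates. The matching CSS
# definitions for each CLASS are assumed to live in the stylesheet.
TEMPLATES = {
    "photofriday": '<DIV CLASS="photofriday">\n{body}\n</DIV>',
    "review": '<DIV CLASS="review">\n{body}\n</DIV>',
}

# Regular posts get no extra wrapper.
DEFAULT_TEMPLATE = "{body}"


def render_post(kind, body):
    """Wrap the already-written HTML body in the template for its post type."""
    template = TEMPLATES.get(kind, DEFAULT_TEMPLATE)
    return template.format(body=body)


if __name__ == "__main__":
    print(render_post("photofriday", '<IMG SRC="photo.jpg" ALT="a photo">'))
```

The lookup is the easy part. The hard part is still having the editor know which type of post I'm writing without my telling it.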
Now, having finally formatted what I have, I will also say that this lack of
good hypertext (or HTML) editors will also have an effect on the Semantic
Web. There's been quite a bit of stir lately over the Semantic Web (stirred
by Clay Shirky's essay, The Semantic Web, Syllogism, and Worldview [7]) but
except for a few (Mark Pilgrim) [8] diehard (Shelley Powers) [9] people
(Dorothea Salo) [10] who add semantic information to their webpages, it won't
really take off until we get good HTML editors that will automagically
include the required semantic information for us, and I don't see that
happening any time soon.
For example, if you are using a web browser that supports the <ACRONYM> tag,
you may notice that the TLA (Three Letter Acronym)s and ETLA (Extended Three
Letter Acronym)s are lightly underlined (at least, that's the default for IE
(Microsoft Internet Exploder) and Mozilla it appears) and that if you mouse
over them, the acronym is expanded in a small text window, giving you the
meaning. I add that, by hand, to every acronym I use and yes, it does get to
be a pain. I could automate that, but the problem there is that computers are
rather bad at figuring out context. With only 17,576 TLAs available, there is
definitely going to be some overlap. Take for instance, IRA.
While the IRA (Irish Republican Army) may take actions against US (United
States) interests that would affect Alice's (a member of the IRA) IRA, can an
automated process work out which expansion of IRA should be used for each
instance? Just ask yourself that question next time you ask YER (Yemeni Rial
(ISO currency code)) computer two check you're spelling.
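The 17,576 figure is just 26 × 26 × 26, and the overlap problem shows up the moment a lookup table has two entries for one acronym. Here is a small Python sketch of that; the EXPANSIONS table and the expand function are made up for illustration, and a real table would be far larger.

```python
from string import ascii_uppercase

# Every TLA is three letters drawn from a 26-letter alphabet.
tla_count = len(ascii_uppercase) ** 3  # 26 * 26 * 26

# Hypothetical expansion table using acronyms from this entry.
EXPANSIONS = {
    "CSS": ["Cascading Style Sheet"],
    "URL": ["Uniform Resource Locator"],
    "IRA": ["Irish Republican Army", "Individual Retirement Account"],
}


def expand(acronym):
    """Return the expansion only when the table alone can decide,
    otherwise None (unknown acronym, or more than one candidate)."""
    choices = EXPANSIONS.get(acronym, [])
    return choices[0] if len(choices) == 1 else None


if __name__ == "__main__":
    print(tla_count)
    print(expand("CSS"))
    print(expand("IRA"))
```

A table can tell you that IRA is ambiguous, but not which meaning a given sentence wants; that still takes context.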
And while I'll probably never use the letters “I,” “R,” and “A” I would like
to note that WAP, as a technical acronym, has two close meanings. There is
WAP (Wireless Application Protocol), which is a proprietary and expensive
replacement for HTTP (HyperText Transfer Protocol) for cellphones, and WAP,
which is how I get my laptop onto the network here in the Facility in the
Middle of Nowhere, and while I tend to mention WAP quite often, I don't think
I'll ever use WAP as I think it's quite silly (and I pity the person who has
to read that paragraph in a browser that doesn't support the <ACRONYM> tag).
I suppose acronym expansion could work as spell checking does now: come
across a potential TLA and, if it isn't expanded, offer up a choice of
possible expansions, which may help to prevent IRA GERSHWIN from becoming an
Individual Retirement Account GERSHWIN (fahrfenugen).
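A spell-checker-style pass might look something like the following Python sketch. It only flags candidates and offers choices; it never expands anything on its own. The KNOWN table, the CANDIDATE pattern (a run of two to five capitals not already followed by a parenthesis) and the suggest_expansions function are all assumptions of mine, not an existing tool.

```python
import re

# Hypothetical table of known acronyms and their possible expansions.
KNOWN = {
    "IRA": ["Irish Republican Army", "Individual Retirement Account"],
    "TLA": ["Three Letter Acronym"],
    "CSS": ["Cascading Style Sheet"],
}

# A whole word of 2 to 5 capitals that is not already followed by "(",
# on the assumption that a following parenthesis is an expansion.
CANDIDATE = re.compile(r"\b[A-Z]{2,5}\b(?!\s*\()")


def suggest_expansions(text):
    """Return (acronym, offset, choices) for each unexpanded known acronym.
    The writer picks one of the choices or dismisses the suggestion."""
    suggestions = []
    for match in CANDIDATE.finditer(text):
        choices = KNOWN.get(match.group())
        if choices:
            suggestions.append((match.group(), match.start(), choices))
    return suggestions


if __name__ == "__main__":
    sample = "The IRA (Irish Republican Army) may affect Alice's IRA."
    for acronym, offset, choices in suggest_expansions(sample):
        print(acronym, offset, choices)
```

Given IRA GERSHWIN, this would still offer both expansions for IRA, but since it only asks, the writer can dismiss the suggestion and Ira keeps his name.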
And now I'm off to format what I've written since the last portion I've
formatted. I would kill for a decent HTML editor that does The Right Thing™.
[1] http://www.kottke.org/03/11/kottke-
[2] http://www.kottke.org/
[3] gopher://gopher.conman.org/0Phlog:2002/07/11.2
[4] http://www.kottke.org/03/11/kottke-redesign#8304
[5] http://www.moveabletype.org/
[6] http://www.photofriday.com/
[7] http://www.shirky.com/writings/semantic_syllogism.html
[8] http://www.diveintomark.org/
[9] http://weblog.burningbird.net/
[10] http://www.yarinareth.net/caveatlector/
Email author at
[email protected]