* * * * *

                     “Oh! You meant DUTTON, not DAYTON!”

I'm currently working on a website where one of the requirements is to obtain
the latitude and longitude of the user. This was something I was dreading not
from a programming perspective (since just asking for the latitude and
longitude is dead simple) but from a user interface perspective (since from
the user end, it's not quite so dead simple). It'd be nice if I could just
ask the user for the city they're in, and get the latitude and longitude from
that.

But where could I get that type of information?

From the US Census Bureau [1].

D'oh!

Granted, that still leaves me asking the rest of the world to locate their
own latitude and longitude, but since this site is initially geared towards
us Murkins I'm not that concerned about it yet.

Now it's a simple matter of getting the city and state from the user, then
looking up the latitude and longitude of the city. Easy.

Until a user misspells a city. The easy thing (for me) would be to print an
error like, “City Cininatee, OH not found—try again” and have the user try
spelling [DELETED-Sinsinati-DELETED] [DELETED-Cininatee-DELETED] [DELETED-
Cinsinati-DELETED] Cincinnati [2] (there we go!) correctly (or give up and
say they're in Bratenahl [3] as it's easier to spell). The harder thing to do
is figure out what they're trying to spell and use that.

Only it's not that much harder. I've used both Soundex [4] and Metaphone [5]
in another project to correct misspellings, and it seems easy enough to apply
them here. Look up the latitude and longitude with the city and state
supplied. If that fails, run the city through Soundex and look up the
correct spelling based on the resulting code. If that doesn't return a
result (or returns too many results), fall back to Metaphone.
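
A minimal sketch of that cascade in Python, under a few assumptions: the
names `soundex`, `find_city` and `KNOWN` are made up for illustration, the
Soundex shown is the common textbook variant, and Metaphone is not
implemented here (any function that maps a name to a code could be appended
to the `keys` list as the fallback). It returns the corrected spelling, which
would then be used for the latitude/longitude lookup. Real datafiles would
store the codes ahead of time instead of recomputing them per query.

```python
from typing import Callable, Dict, List, Optional

# Letter -> digit table of the common American Soundex variant.
_DIGITS = {letter: digit
           for digit, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                                  "4": "L", "5": "MN", "6": "R"}.items()
           for letter in letters}


def soundex(name: str) -> str:
    """Return the four-character Soundex code of a name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = _DIGITS.get(letters[0], "")
    for c in letters[1:]:
        digit = _DIGITS.get(c, "")
        if digit and digit != prev:
            code += digit
        if c not in "HW":  # H and W do not separate letters with equal codes
            prev = digit   # vowels reset prev to "", consonants set it
    return (code + "000")[:4]


def find_city(city: str, state: str,
              known: Dict[str, List[str]],
              keys: List[Callable[[str], str]]) -> Optional[str]:
    """Return the known spelling of `city` within `state`, or None.

    Tries an exact match first, then each phonetic encoder in `keys` in
    order. An encoder's answer is accepted only if it matches exactly one
    city; zero or several matches fall through to the next encoder.
    """
    names = known.get(state, [])
    wanted = city.upper()
    if wanted in names:
        return wanted
    for key in keys:
        code = key(wanted)
        matches = [name for name in names if key(name) == code]
        if len(matches) == 1:
            return matches[0]
    return None


# Hypothetical two-city table, keyed by state.
KNOWN = {"OH": ["CINCINNATI", "BRATENAHL"]}
print(find_city("Cinsinati", "OH", KNOWN, [soundex]))
```

With this variant, "Cinsinati" and "Cincinnati" both encode as C525, so the
lookup recovers the right city. "Cininatee" encodes as C553, so Soundex alone
misses it. Soundex implementations differ in small details, so the codes from
this sketch may not match another implementation's output exactly.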

Sounds good in theory.

Not so great in practice.

In setting up the appropriate datafiles, I went through the list of city
latitudes and longitudes I picked up from the US Census Bureau and marked
where Soundex or Metaphone clashed on multiple city names (each state is
treated separately, so I'm only concerned with clashes within a given
state). There, I hit a problem:

> conflict(soundex/AL): D500 = [DOTHAN] [DAYTON]
> conflict(soundex/AL): D500 = [DUTTON] []
> conflict(metaphone/AL): TTN = [DUTTON] [DAYTON]

Dothan [6], Dayton [7] and Dutton [8] (all in Alabama [9]) have a Soundex
code of D500. Falling back to Metaphone, Dutton and Dayton have a Metaphone
code of TTN. So what to do here if a user types in “Daytun”?
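
The clash check itself amounts to grouping city names by state and code and
keeping the groups with more than one member. A rough Python sketch, where
`find_clashes` is a made-up name and the codes are copied from the log above
rather than recomputed (the encoder is passed in as a function, so a real
Soundex or Metaphone routine could be dropped in instead):

```python
from collections import defaultdict


def find_clashes(cities, key):
    """Group (state, city) pairs by (state, key(city)).

    Returns only the groups where more than one city shares a code.
    """
    groups = defaultdict(list)
    for state, city in cities:
        groups[(state, key(city))].append(city)
    return {k: names for k, names in groups.items() if len(names) > 1}


# Codes taken from the log output above, not computed here.
SOUNDEX_FROM_LOG = {"DOTHAN": "D500", "DAYTON": "D500", "DUTTON": "D500"}
METAPHONE_FROM_LOG = {"DUTTON": "TTN", "DAYTON": "TTN"}

soundex_input = [("AL", "DOTHAN"), ("AL", "DAYTON"), ("AL", "DUTTON")]
metaphone_input = [("AL", "DUTTON"), ("AL", "DAYTON")]

for label, cities, codes in (("soundex", soundex_input, SOUNDEX_FROM_LOG),
                             ("metaphone", metaphone_input, METAPHONE_FROM_LOG)):
    for (state, code), names in find_clashes(cities, codes.get).items():
        print(f"conflict({label}/{state}): {code} = {names}")
```

Because each group is a list, a code shared by three cities is no harder to
record than one shared by two.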

I think the correct thing to do at this point would be to list the
possibilities and have the user select the proper one. But this will
necessitate a change in how I store the data.
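
One way that could look, as a sketch only: each code maps to a list of
cities instead of a single city, and an ambiguous lookup turns into a short
numbered menu. The names `menu` and `pick`, the wording of the prompt and the
shape of `index` are all assumptions made for this illustration.

```python
def menu(candidates, state):
    """Build a numbered plain-text list of candidate cities."""
    lines = [f"{i}. {city}, {state}" for i, city in enumerate(candidates, 1)]
    return "Did you mean:\n" + "\n".join(lines)


def pick(candidates, answer):
    """Return the city for a 1-based menu answer, or None if it is invalid."""
    try:
        number = int(answer)
    except ValueError:
        return None
    if 1 <= number <= len(candidates):
        return candidates[number - 1]
    return None


# Assumed storage: (state, code) -> every city sharing that code.
index = {("AL", "TTN"): ["DAYTON", "DUTTON"]}

choices = index[("AL", "TTN")]
print(menu(choices, "AL"))
print(pick(choices, "2"))
```

An answer outside the menu, or one that isn't a number, comes back as `None`
so the caller can ask again.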

It's not easy to create an easy-to-use interface. In fact, it's downright
hard.

[1] http://www.census.gov/geo/www/tiger/latlng.txt
[2] http://www.cincinnati.com/
[3] http://www.fact-index.com/b/br/bratenahl__ohio.html
[4] http://www.archives.gov/research_room/genealogy/census/soundex.html
[5] http://www.nist.gov/dads/HTML/metaphone.html
[6] http://www.dothan.org/
[7] http://www.pe.net/~rksnow/alcountydayton.htm
[8] http://dutton.alabama.com/
[9] http://www.alabama.gov/
