An offline city database
------------------------

Somehow March is already drawing to a close, and I'm scrambling to
handle all the things I have obligated myself to handle this month,
one of which is my submission for the inaugural OFFLFIRSOCH. To this
end I've just pushed to a public repo[1] the rough first
implementation of my very imaginatively named tool, `city`. Hey, after
ROOPHLOCH and OFFLFIRSOCH plus VF-1 and AV-98, I am entitled to one
exceptionally mundane project name, okay? I had hoped to have this a
little more polished by now, but the essential functionality is
already there. It'll likely be "done" in the space of a week or so.

city is designed to provide an offline solution to two problems
I routinely solve via the absurd use of a web browser: answering
"what time is it in city X right now?", and "how far is city X
from city Y?". I am often interested in this second question in
the context of utility radio DXing. This past winter I finally
did some decoding of Digital Selective Calling (DSC) messages from
coastal stations on 2187.5 kHz and regularly found myself wondering
just how far away some small coastal city I'd never heard of was.

Cities don't move around much, and while they do change their
timezone occasionally, it's a pretty rare event, so the information
required to answer these questions doesn't really go stale.
This makes these tasks prime candidates for an offline
solution. I ended up using the "Gazetteer" dataset[2] from
GeoNames[3], a CC-BY licensed dataset which provides basic
information, including timezone and latitude and longitude, for a
huge number of cities around the world. You can download simple
tab delimited files for all cities with populations over 500, over
1,000, over 5,000 or over 15,000. Obviously the files get larger
as you include smaller cities. I went with the 15,000 cut-off for
this project to keep things small, as this was kind of an
experimental effort which might be superseded in future (more on
that later).

The basic user experience is like this: you just list one or more
cities as command line arguments and hit enter:

----------
$ city Adelaide
Adelaide:
Local time: Fri 29 Mar 2024 22:27:30 ACDT
Location: -34.93°, 138.60°
Elevation: 59m
Population: 1387290
----------

(future editions will add some polish to this output, like formatting
-34.93° as 34.93° S and 1387290 as 1,387,290)
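
For what it's worth, that kind of polish needs very little code.
Here is a minimal sketch in plain Lua. The function names are made
up for illustration and are not taken from city.lua, and the
thousands formatter assumes a non-negative whole number:

```lua
-- Hypothetical helpers; these names are not taken from city.lua.

-- Format signed decimal degrees with hemisphere letters instead of signs.
local function format_location(lat, lon)
  local ns = lat < 0 and "S" or "N"
  local ew = lon < 0 and "W" or "E"
  return string.format("%.2f° %s, %.2f° %s",
    math.abs(lat), ns, math.abs(lon), ew)
end

-- Insert a comma between each group of three digits, working from the
-- right. Assumes n is a non-negative whole number.
local function format_thousands(n)
  local s = string.format("%d", n)
  -- Reverse, add a comma after every three digits, reverse back.
  local out = s:reverse():gsub("(%d%d%d)", "%1,"):reverse()
  -- A digit count divisible by three leaves a stray leading comma.
  return (out:gsub("^,", ""))
end

print(format_location(-34.93, 138.60))
print(format_thousands(1387290))
```

The reverse-and-substitute trick avoids any arithmetic on the number
itself, so it behaves the same way on every Lua version from 5.1 on.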

If your city has a space in its name, use quotes to enclose it as
appropriate for your shell of choice:

----------
$ city 'San Francisco'
San Francisco:
Local time: Fri 29 Mar 2024 05:00:20 PDT
Location: 37.77°, -122.42°
Elevation: 28m
Population: 864816
----------

(yes, at time of writing you really do need to provide 'San
Francisco' with uppercase S and F, it will claim not to find 'san
francisco', this will naturally be fixed soon)

If you list more than a single city, you'll get this display for
each of them, plus a table of distances at the end:

----------
$ city Helsinki Tampere Turku
Helsinki:
Local time: Fri 29 Mar 2024 14:02:43 EET
Location: 60.17°, 24.94°
Elevation: 26m
Population: 658864
Tampere:
Local time: Fri 29 Mar 2024 14:02:43 EET
Location: 61.50°, 23.79°
Elevation: 114m
Population: 244315
Turku:
Local time: Fri 29 Mar 2024 14:02:43 EET
Location: 60.45°, 22.27°
Elevation: 22m
Population: 195301
Distance between Helsinki and Tampere is 160.40 km
Distance between Helsinki and Turku is 150.17 km
Distance between Tampere and Turku is 142.40 km
----------

(I think I'd like to get the distance table sorted from nearest to
furthest or vice versa. The distances for the "Finnish Triangle"
here *are* sorted, but that's a happy coincidence arising from the
order I listed the cities in, which was alphabetical but is happily
also by decreasing population, ihana!)
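
As an illustration of how such a distance table could be computed
and sorted, here is a minimal sketch. It assumes the haversine
great-circle formula and a mean Earth radius of 6371 km, which may
not be what city.lua actually uses, and the function names are
hypothetical. The coordinates are the rounded ones printed above, so
the results will differ slightly from the output shown:

```lua
-- Mean Earth radius in km; an assumed constant, city.lua may use another.
local EARTH_RADIUS_KM = 6371.0

-- Great-circle distance in km between two points given in decimal degrees,
-- using the haversine formula.
local function haversine_km(lat1, lon1, lat2, lon2)
  local rad = math.rad
  local dlat = rad(lat2 - lat1)
  local dlon = rad(lon2 - lon1)
  local a = math.sin(dlat / 2) ^ 2
          + math.cos(rad(lat1)) * math.cos(rad(lat2)) * math.sin(dlon / 2) ^ 2
  return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
end

-- Build every unordered pair of cities and sort them nearest first.
local function sorted_distances(cities)
  local pairs_list = {}
  for i = 1, #cities - 1 do
    for j = i + 1, #cities do
      local a, b = cities[i], cities[j]
      pairs_list[#pairs_list + 1] = {
        from = a.name, to = b.name,
        km = haversine_km(a.lat, a.lon, b.lat, b.lon),
      }
    end
  end
  table.sort(pairs_list, function(x, y) return x.km < y.km end)
  return pairs_list
end

local finland = {
  {name = "Helsinki", lat = 60.17, lon = 24.94},
  {name = "Tampere",  lat = 61.50, lon = 23.79},
  {name = "Turku",    lat = 60.45, lon = 22.27},
}
for _, p in ipairs(sorted_distances(finland)) do
  print(string.format("Distance between %s and %s is %.2f km",
    p.from, p.to, p.km))
end
```

Sorting the pairs after computing them means the order of the
command line arguments no longer matters. Flipping the comparison
to `x.km > y.km` gives furthest first instead.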

That's pretty much the whole shebang. I guess I'll add some
method of getting output in miles and feet, and also need a way
of specifying a country so you can disambiguate Paris, France from
Paris, Texas, but that's really it. Does what it says on the tin,
as they say.

In terms of implementation, I wanted to experiment with writing
this tool in a manner which complies well with my current
conceptualisation of permacomputing, which has changed quite a
lot since I first encountered the term. I no longer think of
permacomputing at all in environmental/ecological terms, at least
not primarily. I think of it instead as computing in a way that
emphasises natural immunity against "bit rot" (not an unavoidable
natural phenomenon of computing but an artificial and self-inflicted
one) and resisting the externally-driven obsolescence of tools
and skills. Of course, doing this also has enviro/eco benefits,
but it has other benefits too which are more psycho-social-cultural
in nature and honestly I think those are at least as important if
not more. Not that the other issues aren't important, but they
are honestly much better addressed by simply computing less than by
computing differently. Anyway, much more detail on this line of
thinking in a future post, I hope. The main point here is that I
wanted maximum portability, minimum dependencies and ultralight
or ideally zero coupling to any toolchains for either development
or installation.

To these ends, city is a single Lua script, Lua being a very
portable and widely ported language whose standard implementation is
written in a very mature and unchanging language (C89), and which
is one of the very few languages where the average span of time
between successive major new releases has reliably trended upward
throughout its history. The single file is called `city.lua` in the
repo to facilitate easy syntax highlighting etc. but the notion is
that you "install" it by placing a copy of or link to this file named
just `city` in /usr/local/bin or wherever the heck you wanna put it
which is in your $PATH or whatever the local equivalent concept is
in your preferred computing environment. This single file contains
both the city data and the logic to search it. The city data is
stored in a single large table variable, and it's not populated by
parsing a copy of the original tab delimited data structure, rather
it is written in the source code as a huge literal. The first 11
non-empty, non-comment lines of `city.lua` look like this:

----------
all_cities = {
{name="Shanghai", ascii=nil, lat=31.22222, lon=121.45806, country="CN", pop=22315474, elev=12, tz="Asia/Shanghai"},
{name="Beijing", ascii=nil, lat=39.9075, lon=116.39723, country="CN", pop=18960744, elev=49, tz="Asia/Shanghai"},
{name="Shenzhen", ascii=nil, lat=22.54554, lon=114.0683, country="CN", pop=17494398, elev=4, tz="Asia/Shanghai"},
{name="Guangzhou", ascii=nil, lat=23.11667, lon=113.25, country="CN", pop=16096724, elev=15, tz="Asia/Shanghai"},
{name="Kinshasa", ascii=nil, lat=-4.32758, lon=15.31357, country="CD", pop=16000000, elev=281, tz="Africa/Kinshasa"},
{name="Lagos", ascii=nil, lat=6.45407, lon=3.39467, country="NG", pop=15388000, elev=11, tz="Africa/Lagos"},
{name="Istanbul", ascii=nil, lat=41.01384, lon=28.94966, country="TR", pop=14804116, elev=39, tz="Europe/Istanbul"},
{name="Chengdu", ascii=nil, lat=30.66667, lon=104.06667, country="CN", pop=13568357, elev=499, tz="Asia/Shanghai"},
{name="Mumbai", ascii=nil, lat=19.07283, lon=72.88261, country="IN", pop=12691836, elev=8, tz="Asia/Kolkata"},
{name="São Paulo", ascii="Sao Paulo", lat=-23.5475, lon=-46.63611, country="BR", pop=12400232, elev=769, tz="America/Sao_Paulo"},
----------

(I'm sharing the first 11 lines and not the first 10 because having
São Paulo in there lets me point out that `city` supports searching
by "plain" and accented variants of names, which is nice)
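
To give a feel for how lines like these could be generated from the
tab delimited download, here is a rough sketch of a converter. It is
not the script actually used for city.lua. The column positions
follow the GeoNames readme as I understand it (name in column 2,
asciiname in 3, latitude in 5, longitude in 6, country code in 9,
population in 15, elevation in 16, dem in 17, timezone in 18). The
function names, the rule for when `ascii` is nil and the fallback
from elevation to dem are all assumptions:

```lua
-- Split one tab-delimited line into fields, keeping empty fields in place.
local function split_tabs(line)
  local fields = {}
  -- Appending a tab lets one pattern capture every field, even empty ones.
  for field in (line .. "\t"):gmatch("([^\t]*)\t") do
    fields[#fields + 1] = field
  end
  return fields
end

-- Turn one GeoNames record into one line of Lua source.
local function to_lua_literal(line)
  local f = split_tabs(line)
  local name, ascii = f[2], f[3]
  -- Assumption: store ascii only when it differs from the native name.
  local ascii_src = (ascii ~= name) and string.format("%q", ascii) or "nil"
  -- Assumption: fall back to the dem column when elevation is empty.
  local elev = (f[16] ~= "") and f[16] or f[17]
  return string.format(
    '{name=%q, ascii=%s, lat=%s, lon=%s, country=%q, pop=%s, elev=%s, tz=%q},',
    name, ascii_src, f[5], f[6], f[9], f[15], elev, f[18])
end

-- A hand-built 19 column record using the São Paulo values shown above.
-- Columns the converter ignores are left blank or set to a placeholder.
local sample = table.concat({
  "0", "São Paulo", "Sao Paulo", "", "-23.5475", "-46.63611", "", "",
  "BR", "", "", "", "", "", "12400232", "", "769", "America/Sao_Paulo", "",
}, "\t")
print(to_lua_literal(sample))
```

Numbers are copied through as text rather than converted, so the
generated source keeps exactly the precision of the original file.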

You will notice immediately that this is woefully inefficient in
terms of storage space. The strings "name", "ascii", "lat", "lon",
etc. literally occur nearly twenty-eight thousand times in city.lua,
in exactly the same order on every line, and the timezones are
written out in full each time, even though "Asia/Shanghai" occupies
enough bytes for a 13 digit number, which is many times more than
necessary to enumerate all the world's timezones. From an orthodox
software engineering perspective, this solution is cringe-inducing,
but you know what, doing it this way the entire file is just over
3 megabytes, which has been trivially small on a PC for decades, so
who cares? And, yep, your queries are looked up in this big table
by stepping through it from top to bottom in order. The cities
are stored in decreasing order of population on the assumption that
queries for big cities will happen more often than for small cities
and so this is fastest. This is also cringe compared to using a
real database, but on my 13 year old laptop this approach does not
provide a subjectively slower user experience than using Google,
so who cares? Doing the "right thing" and storing this data in an
external SQLite file would require depending upon additional third
party libraries, would reduce the range of systems the tool could
be easily ported to, and would complicate installation. It's a
bad trade-off in this context.
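
The top-to-bottom lookup described above really is only a few lines.
Here is a minimal sketch over a tiny stand-in table of the same
shape. The function name is hypothetical, and the case-insensitive
comparison is an assumption about how the fix might look rather than
what city.lua currently does:

```lua
-- A tiny stand-in for the real table, in the same shape and the same
-- decreasing-population order.
local sample_cities = {
  {name="Shanghai", ascii=nil, country="CN", pop=22315474},
  {name="São Paulo", ascii="Sao Paulo", country="BR", pop=12400232},
  {name="San Francisco", ascii=nil, country="US", pop=864816},
}

-- Hypothetical lookup: step through the table in order and return the
-- first match, which is the most populous city of that name.
-- string.lower only folds ASCII letters in the default C locale, so
-- accented letters must be typed in the same case as they are stored.
local function find_city(cities, query)
  local wanted = query:lower()
  for _, c in ipairs(cities) do
    if c.name:lower() == wanted
       or (c.ascii and c.ascii:lower() == wanted) then
      return c
    end
  end
  return nil
end

local hit = find_city(sample_cities, "sao paulo")
print(hit and hit.name or "not found")
```

Because the table is ordered by population, returning on the first
match doubles as a crude tie-break between cities sharing a name.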

Of course, this data could be used for a lot of different purposes,
and I quite enjoy the idea of having it in a local database
and having a suite of tools which all make use of it, using an
environment variable to convey the filename for the SQLite file to
all of those tools. But this highly integrated standalone tool
idea has merit too, and is much more OFFLFIRSOCH friendly, so I
used it first. The costs of the inefficiency are small enough
for a one-off tool that they are worth paying for the benefits,
but the more tools you bake this dataset into, the more space you
are wasting. Of course, even ten tools would total maybe 35 meg,
which remains trivial, so maybe this also doesn't matter, and one
might argue that having the same data stored in multiple files
in multiple formats in multiple languages actually provides an
additional layer of resilience...

Anyway, that's `city`. It scratches a personal itch and I'm certain
I will continue using it. I hope it's useful for some other folk,
too. I will add some of the "polish" features mentioned above in
the coming days or weeks, but I'm not really interested in adding
additional functionality. If you send me a *really* good idea in
the next week or so I might consider it, but otherwise once I have
the polish added I will call it 1.0.0 and it will go into a long-term
maintenance mode. I will make a new release once per year, probably
each March as part of OFFLFIRSOCH since that's a convenient reminder,
in which I update the city data to the latest version from GeoNames.
That need is mostly driven by the population data, which is the only
thing which will change appreciably over time. Perhaps it was a bad
idea to even include that, as without it things would be essentially
static. But, well, it was there and it was easy to add, and it's
nice to have quick and easy access to that too (elevation ended
up in there for the same reason). Having population data which is
one year out of date is not a huge problem for many casual purposes.

[1] https://git.sr.ht/~solderpunk/city
[2] https://download.geonames.org/export/dump/
[3] http://www.geonames.org/