In the last few days I've been making a concerted effort, despite not
being any sort of professional musician or professional
musician-in-training, to begin ear training so that I can recognize at
least intervals. I am unfortunately not one of the lucky few who have
perfect pitch [1], but I would like to make transcribing music, or
noodling around on some instrument to figure out how to play it, a more
pleasant experience, and it seems that interval recognition would help.

So far, I've tried out GNU Solfege [2] and a custom Anki deck that I
made, with some synthesized interval noises on the front and the name
of the interval on the back. Compared to _anything_ else I've used Anki
to learn, intervals are a profoundly frustrating experience. What I've
learned so far is that I can recognize an interval with an error of
something like 50% of its size in semitones (e.g. a 4-semitone interval
I might guess as anywhere from 2--5), which is not fantastic. The
exceptions are a major third down, because that's every doorbell, and,
more strangely, a minor second up, because it triggers within me the
memory of a particular interval from the opening ambient music in
_Super Metroid_, after you land on Zebes.
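
For the curious, the deck's audio is the sort of thing a few lines of
Python can produce: two sine tones in sequence, the second offset from
the first by the interval's semitone count in equal temperament (each
semitone is a frequency ratio of 2^(1/12)). The sketch below is an
illustration rather than my exact script; the 440 Hz base note, the
durations, and the filenames are arbitrary choices.

```python
# Generate a WAV file containing two sine tones a given number of
# semitones apart, suitable for the front of an Anki card.
import math
import struct
import wave

RATE = 44100  # samples per second

def tone(freq, seconds):
    """One sine tone as a list of 16-bit sample values."""
    n = int(RATE * seconds)
    return [int(0.5 * 32767 * math.sin(2 * math.pi * freq * t / RATE))
            for t in range(n)]

def interval_wav(semitones, path, base=440.0, seconds=0.8):
    """Write `base` followed by the note `semitones` above it
    (negative for a descending interval), in equal temperament."""
    second = base * 2 ** (semitones / 12.0)
    samples = tone(base, seconds) + tone(second, seconds)
    with wave.open(path, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(RATE)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))

# e.g. an ascending perfect fourth and a descending major third:
interval_wav(5, "p4_up.wav")
interval_wav(-4, "M3_down.wav")
```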

This makes me think: it's odd that I can whistle a tune with the right
pitches, but that I can't sit down at an instrument and put my fingers
on those same pitches. There's some connection in my brain that can
conjure a melody and route it directly to my mouth muscles, but it's
disconnected from any circuits that could propriocept the position of
those mouth muscles and tell me what note is on my lips, or from any
circuits that could route the melody to my hands.

The most plausible half-explanation I can think of is that it has
something to do with how large a volume of our brains is dedicated to
language: routing a note somewhere must be a 'hard' task, and so the
to-lips route somehow hijacks the same mechanisms that normally,
unconsciously, route a complex sound cluster I think of into the
motions of my mouth. I could do that by the time I was two, and I
learned how to do it without conscious practice or the intervention of
a teacher. Learning how to write took both, and writing is a much later
invention than speech. Maybe, then, I could learn how to translate
something I hear in my head directly to my hands, but I'm not
optimistic. When I learned how to write, my brain was much more
plastic, and virtually all of society had aligned toward the purpose of
getting me to learn it. In comparison, what chance has a dude with
stuff to do and a sclerotic brain?

[1]: Although with some Anki-based practice I can often recognize notes
    within 2 semitones-ish. That's not very good, but is that normal?

[2]: Note to self: on my default installation, Solfege wants to use
    `timidity` to generate its synthesized piano notes, and it had some
    strange popping artefacts that sounded horrible. The fix was to add
    `--output-24bit` to the command line for timidity in the Solfege
    options.
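
    For reference, the resulting invocation is shaped something like
    the following (the other flags here are just ordinary timidity
    usage for dumping a MIDI file to WAV, not necessarily the exact
    flags Solfege passes):

        timidity -idqq -Ow --output-24bit -o out.wav notes.mid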