ON-DRIVE PROCESSING

I wasted most of the time I was going to spend writing this on
fixing GophHub again, because GitHub inevitably removed newlines
from their JSON output, which I was relying on to parse it in Bash.
So this is the short version:

Large data storage/processing systems are based around clusters of
small computers. These can execute tasks in parallel to reduce
latency.

When you're just doing something simple like searching for a string
or moving/copying data, why not have some way to program the drive
so it can do that itself? GPUs already have OpenGL for telling them
to go and do repetitive mathematical transformations on their own,
so why not HDDs and SSDs?
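
A toy sketch of the difference, in Python. Nothing here talks to
real hardware: SimulatedDrive, its search() command, and the
8-bytes-per-offset reply size are all made up for illustration. It
only counts how many bytes would cross the bus when the host does
the search versus when the drive does it.

```python
class SimulatedDrive:
    """Toy model of a drive: raw bytes plus a count of bytes sent to the host."""

    BLOCK_SIZE = 512

    def __init__(self, data):
        self._data = data
        self.bus_bytes = 0  # bytes transferred over the (imaginary) bus so far

    def block_count(self):
        # Round up so a partial final block is still counted.
        return (len(self._data) + self.BLOCK_SIZE - 1) // self.BLOCK_SIZE

    def read_block(self, index):
        """Ordinary read: the whole block crosses the bus."""
        start = index * self.BLOCK_SIZE
        block = self._data[start:start + self.BLOCK_SIZE]
        self.bus_bytes += len(block)
        return block

    def search(self, needle):
        """Hypothetical on-drive command: only the match offsets cross the bus."""
        offsets = []
        pos = self._data.find(needle)
        while pos != -1:
            offsets.append(pos)
            pos = self._data.find(needle, pos + 1)
        self.bus_bytes += 8 * len(offsets)  # assumption: 8 bytes per offset
        return offsets


def host_search(drive, needle):
    """Conventional approach: pull every block to the host, then search there."""
    data = b"".join(drive.read_block(i) for i in range(drive.block_count()))
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(needle, pos + 1)
    return offsets
```

Both paths find the same offsets, but after host_search() the
drive's bus_bytes equals the full size of the data, while after
search() it is only a few bytes per match.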

Drive controllers aren't like modern GPUs in processing capacity,
but just as early GPUs had simple logical operations that could be
applied to the image buffer, drive controllers could have simple
transformation routines too.

Obviously different formatting, fragmentation, and partitioning
would get in the way, but provided compression and encryption
aren't used, there should be some way to take advantage of this
approach at a stage before the usual loops of data
access/manipulation on the host. If
there was a standard way of running custom code on the drive
controller, like CUDA on GPUs, you could even send the drive a
minimal routine for reading/writing in the partition format.

It seems like a missed opportunity to me - instead of clusters of
computer boxes full of drives, the drives become the computing
clusters themselves. On a small scale maybe it wouldn't make much
difference compared to adding small computer boards, but scaled up
to these massive datacentres you hear about...

Anyway, this came to me while thinking about how I'd implement my
big website idea. Getting the most out of hardware is important on
my budget of hardly anything. Unfortunately, hacking drives to add
this functionality to their firmware would tie you to particular
models for replacement, so really you need it to be part of the
package from the manufacturer. But if you had a database and partition
format in one (I'm crazy enough to be considering this), you could
potentially send database commands to the drives directly.
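
As a rough sketch of what "database commands to the drive" might
look like, here is a toy model in Python. The DatabaseDrive class,
its PUT/GET commands, and the fixed 20-byte record layout are all
invented for this example; no real drive or partition format works
this way. The point is only that the storage layout *is* the
database, so the drive can answer a lookup without the host reading
any blocks.

```python
import struct

# Hypothetical on-disk record: 4-byte little-endian key, 16-byte value.
RECORD = struct.Struct("<I16s")


class DatabaseDrive:
    """Toy drive whose raw storage is a flat run of fixed-size records."""

    def __init__(self):
        self._store = bytearray()  # stands in for the drive's media

    def execute(self, command, *args):
        """Accept a database-style command instead of a block read/write."""
        if command == "PUT":
            key, value = args
            # struct pads short values with zero bytes (and truncates long ones).
            self._store += RECORD.pack(key, value)
            return None
        if command == "GET":
            (key,) = args
            # The scan happens "on the drive"; only the result goes to the host.
            for offset in range(0, len(self._store), RECORD.size):
                found_key, value = RECORD.unpack_from(self._store, offset)
                if found_key == key:
                    return value.rstrip(b"\0")
            return None
        raise ValueError("unknown command: " + command)
```

Something like drive.execute("PUT", 7, b"hello") followed by
drive.execute("GET", 7) would hand back b"hello", with the host
never seeing the record layout at all.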

Perhaps there are some cheap microcontroller-type chips that can
interface with drives while sharing the bus with another computer
somehow? The main computer gives the little drive computer a
command, then the drive computer stalls any further accidental
drive commands from the main computer on the drive bus until it has
finished the job. Presumably fancy things like DMA will make this
less tidy in practice though.
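
The stalling behaviour can be modelled in a few lines of Python.
This is only a sketch of the hand-off logic: DriveSideComputer and
its method names are hypothetical, there is no real bus involved,
and DMA is not modelled at all.

```python
from collections import deque


class DriveSideComputer:
    """Toy model of a small computer sitting between the host and the drive."""

    def __init__(self):
        self.busy = False
        self.deferred = deque()  # host commands held back while a job runs
        self.log = []            # record of what reached the drive, in order

    def start_job(self, name):
        """The host hands over a job; the drive computer takes the bus."""
        self.busy = True
        self.log.append("job started: " + name)

    def host_command(self, command):
        """An ordinary drive command from the host."""
        if self.busy:
            # Stall: hold the command until the job is done.
            self.deferred.append(command)
            return "stalled"
        self.log.append("drive ran: " + command)
        return "done"

    def finish_job(self):
        """Release the bus and replay stalled commands in arrival order."""
        self.busy = False
        self.log.append("job finished")
        while self.deferred:
            self.log.append("drive ran: " + self.deferred.popleft())
```

Queueing the stalled commands and replaying them afterwards is just
one choice; rejecting them with a "busy" error would be another.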

- The Free Thinker