I did an experiment with language last year. I even stuck it up
on Amazon for fun.
I wondered, "What if the entire English language were taken out
of context?"
Well, I couldn't find the entire English language, but I *did*
find a commonly used thesaurus containing 106,000 words and
their synonyms (the Moby Thesaurus. It's the one a lot of
linguists use, and it ends up in most software because the
original author released it publicly. And, well, it's huge).
So I ran a simple process: linking synonyms to their synonyms and
seeing whether any natural "silos" formed, meaning groups of words
that were related in -some way-, even if not obviously, through
their relationships with other words. It almost killed my poor
laptop to process each query. At its peak, the database ALMOST
blew itself up, expanding to 1.9 GB in Microsoft Access 2000
(primitive, yeah) during one of the connecting queries. The
maximum file size Access 2000 can handle is 2 GB, so it was
right at its own limit.
Ended up with 5 levels. 106,000 words compressed into 5
segments.
They're not mutually exclusive, but they are naturally formed
categories, produced simply by connecting the words repeatedly
and seeing what sticks.
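Here is a rough sketch of the "synonyms to synonyms" idea, written in Python instead of Access queries. It is a hypothetical reconstruction for illustration, not the process described above. It makes three assumptions. First, each thesaurus line is a root word followed by comma-separated synonyms. Second, "is a synonym of" is treated as a two-way link. Third, the names `load_thesaurus`, `expand` and `join_size` are made up. `expand` follows synonym links a set number of times from one word. `join_size` counts the rows that a single synonym-of-synonym self-join would produce, which hints at how one connecting query can balloon toward a file-size limit.

```python
from collections import defaultdict


def load_thesaurus(lines):
    """Build a word -> set-of-synonyms map.

    ASSUMPTION: each line is a root word followed by its synonyms,
    all comma-separated (assumed to match the Moby Thesaurus text layout).
    """
    graph = defaultdict(set)
    for line in lines:
        words = [w.strip().lower() for w in line.split(",") if w.strip()]
        if not words:
            continue
        root, synonyms = words[0], words[1:]
        for syn in synonyms:
            # Treat "is a synonym of" as a two-way link.
            graph[root].add(syn)
            graph[syn].add(root)
    return dict(graph)


def expand(graph, word, rounds):
    """Words reachable from `word` by following synonym links up to `rounds` times."""
    reached = {word}
    frontier = {word}
    for _ in range(rounds):
        next_frontier = set()
        for w in frontier:
            next_frontier |= graph.get(w, set())
        frontier = next_frontier - reached  # keep only newly found words
        reached |= frontier
    return reached


def join_size(graph):
    """Rows from one self-join: (word, synonym, synonym's synonym) triples."""
    return sum(len(graph.get(s, ())) for w in graph for s in graph[w])


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the real thesaurus.
    toy = load_thesaurus(["happy,glad,content", "glad,cheerful", "sad,unhappy"])
    for n in range(1, 4):
        print(n, sorted(expand(toy, "content", n)))
    print("self-join rows:", join_size(toy))
```

With more rounds, `expand` pulls in words that share no direct synonym with the starting word. Each word gets its own expanded set, so the groups overlap instead of splitting the vocabulary into separate piles.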
What to do with the discovery? Nothing. It means something to
me. But how can you explain something 'new' without sounding
like a crackpot? Hard to do. So instead of trying to write up a
paper (because I don't write well in patterned formats like
academic papers), I just sat on it for a year.
Then I wrote a few pages trying to explain it, stuck them up on
Amazon just to get it "out there" in some form, and left it.
Moved on to other projects. It's hard to "have something" and be
unable to explain it properly.