(C) Alec Muffett's DropSafe blog.
Author Name: Alec Muffett
This story was originally published on alecmuffett.com. [1]
License: CC-BY-SA 3.0.[2]
“Seeds of the Internet” and “Knowledge as Infrastructure” – Magnet URLs, AI, & the increasing, terrifying irrelevance of political borders
2024-04-10 02:21:31+00:00
One thing which keeps me increasingly hopeful about AI technologies is my hazy but reasonable understanding of data compression. All those who dabble in cryptography at some point realise that it’s (literally, in the case of monoalphabetic substitution) a matter of trading one symbolic representation — or: encoding — of information for another.
Sometimes those encodings are lossless in which case you can get back what you started with, and sometimes they are lossy in which case you cannot… but what you do get back is probably still fit for your purposes.
Sometimes those encodings are bigger, in which case you might be aiming for redundancy or robustness or error-correction or padding; or they might be smaller in which case you are implementing compression (like lossless ZIP or PNG, or lossy JPEG) – or (even smaller) they might be of a fixed/ish size, in which case you’re implementing digestification or hashing, or even fuzzy matching.
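To make those categories concrete, here is a minimal Python sketch using only the standard library. Everything in it is illustrative rather than taken from the post: the sample message, the `KEY` permutation, and the vowel-dropping "lossy" encoder are hypothetical stand-ins chosen to keep the example short.

```python
import hashlib
import string
import zlib

# Hypothetical key: any permutation of a-z would do.
KEY = "qwertyuiopasdfghjklzxcvbnm"

def substitute(text, key):
    """Monoalphabetic substitution: trade each lowercase letter for its partner in key."""
    return text.translate(str.maketrans(string.ascii_lowercase, key))

def unsubstitute(text, key):
    """Invert the substitution by mapping the key back onto a-z."""
    return text.translate(str.maketrans(key, string.ascii_lowercase))

def drop_vowels(text):
    """A toy lossy encoding: the vowels cannot be recovered afterwards."""
    return "".join(ch for ch in text if ch not in "aeiou")

message = "seeds of the internet " * 20

# Lossless, same size: one symbolic representation traded for another.
cipher = substitute(message, KEY)
restored = unsubstitute(cipher, KEY)

# Lossless, smaller: compression (zlib implements DEFLATE, as used by ZIP and PNG).
packed = zlib.compress(message.encode("utf-8"))
unpacked = zlib.decompress(packed).decode("utf-8")

# Lossy, smaller: still readable enough for some purposes, but not reversible.
squeezed = drop_vowels(message)

# Fixed size: a SHA-256 digest is 32 bytes however long the input is.
digest = hashlib.sha256(message.encode("utf-8")).digest()

print(len(message), len(cipher), len(packed), len(squeezed), len(digest))
```

The substitution and the compression both round-trip exactly, the vowel-dropper does not, and the digest stays at 32 bytes whatever the input.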
So I see the very largest of Large Language Models as being lossy compression of (to a bad first approximation) all the world’s text. A 22 billion parameter model can be downloaded using a semi-decentralised BitTorrent Magnet link, which leaves very little opportunity for state censorship. So, well, if you were a Government which wanted to maintain oversight of what people could learn, know, and do, wouldn’t you be panicking right now?
We, the tech-savvy generation that grew up optimistic about the internet, the web, USENET and subsequent forms of social media, used to opine that the ability to access knowledge via the Internet would bring some sort of utter transformation and equitable levelling of human experience. That hope was ruined as the generational wheel turned: 30 to 40 years later, the Internet infrastructure is being forcibly populated with middlemen and gatekeepers who seek to ensure that our behaviour is proper, approved, and attributed to our government identity.
So what’s happening now? Not in response (no, this is not a conspiracy), but simply because of the abundance of technology, intangible data is spontaneously peeling away from the shackles of physical infrastructure that were previously necessary to give it form. These lossy compressed “seeds” of the internet are static, of course, being severed from the dynamic ecosystem of the web, but they are powerful; we can measure this not least by the voices who scream for this all to be stopped.
But the attached fits comfortably on a £40 USB thumb-drive, and the tools necessary to query it will probably eventually run, perhaps a bit slowly for the moment, on a standard mobile phone.
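As a rough sanity check on the thumb-drive claim, the arithmetic can be sketched as follows. The post does not say how the model's weights are stored, so the bytes-per-parameter figures below are assumptions covering common numeric precisions, and the estimate ignores file metadata and any tokenizer or configuration files.

```python
def model_size_gb(parameters, bytes_per_parameter):
    """Approximate weight storage in gigabytes (1 GB = 10**9 bytes)."""
    return parameters * bytes_per_parameter / 1e9

PARAMETERS = 22_000_000_000  # the 22 billion parameters mentioned above

# Assumed storage precisions; the actual format of the download is not stated.
PRECISIONS = [
    ("32-bit float", 4),
    ("16-bit float", 2),
    ("8-bit quantised", 1),
    ("4-bit quantised", 0.5),
]

for label, width in PRECISIONS:
    print(f"{label}: about {model_size_gb(PARAMETERS, width):.0f} GB")
```

Under these assumptions the weights come to somewhere between about 11 GB and 88 GB, which is within the range of capacities sold as consumer USB drives.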
The knowledge is gradually becoming its own infrastructure; all that remains needed by a user is the means to navigate their corpora.
[END]
[1] URL:
https://alecmuffett.com/article/109597
[2] URL:
https://creativecommons.org/licenses/by-sa/3.0/
DropSafe Blog via Magical.Fish Gopher News Feeds:
gopher://magical.fish/1/feeds/news/alecmuffett/