Inspired by acoustic mirror on the 'stodon sharing an article about
predatory AI in this post-AlphaGo world, where deep learning AIs were
constructed that lose to weak human players but beat world-leading AI
players by finding and preying on their idiosyncratic weaknesses, I
have been dreaming up some of the coming disasters for OpenAI
customers.
I haven't pulled apart GPT-2 yet, and need to do much more brushing up
besides, but my waters suggest that answer products derived from
general-purpose deep learning language models can be seen like this:
Given the universe of training data and the learning model, we are in
the business of getting optimal generic answers out of our training
universe, based on unique but similar prompts from different users.
Similar prompts get pulled toward the same high-scoring completions,
so many users are going to end up on the same hill peaks.
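As a minimal sketch of what those shared peaks look like, here is a
toy stand-in for a frozen answer product: GPT-2 with greedy decoding
via the Hugging Face transformers pipeline. The choice of model,
decoding, and the ask_service() wrapper are my assumptions for
illustration, not a description of any real service:

    from collections import Counter
    from transformers import pipeline

    gen = pipeline("text-generation", model="gpt2")

    def ask_service(prompt: str) -> str:
        # Greedy decoding (do_sample=False) mimics a risk-averse,
        # deterministic service: the same prompt yields the same
        # answer every time, for every subscriber on this version.
        out = gen(prompt, max_new_tokens=40, do_sample=False)
        return out[0]["generated_text"][len(prompt):]

    paraphrases = [
        "Q: A supplier says my payment is overdue. What should I do? A:",
        "Q: My supplier claims my payment is overdue. What should I do? A:",
        "Q: A supplier's invoice to me is overdue. What should I do? A:",
    ]

    # With a frozen model and deterministic decoding, near-duplicate
    # prompts tend to collapse onto far fewer distinct answers.
    answers = Counter(ask_service(p) for p in paraphrases)
    print(len(answers), "distinct answers for", len(paraphrases), "prompts")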
Furthermore, the AI products people are buying are not being
dynamically retrained. Everyone on the same version of the same
service is using the same training set and training regime. Because
the service providers are fearful of bad results, I think customers
will be concentrated on a single, better-tested model.
This is an ideal scenario for attackers. The prey (users) are in
general very similar already, and the attacker can genuinely prompt
the bot with typical inputs from their victim demographic and explore
the space of answers, as in the sketch below. This probing is
indistinguishable from typical use of these AI products by genuine
users.
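Sketching that exploration, and reusing the ask_service() stand-in
from above: the attacker enumerates prompts typical of the victim
demographic and builds a lookup table of what the frozen service will
tell them. The demographic, templates, and fillers here are invented
for illustration:

    from itertools import product

    # Prompt shapes typical of the (invented) victim demographic:
    # small-business owners asking for next-step advice.
    template = "Q: {who} {event}. What should I do next? A:"
    whos = ["A new supplier", "A long-time customer", "My bank"]
    events = [
        "offers a steep discount for immediate payment",
        "asks me to switch payments to a new account",
        "requests copies of my signed contracts",
    ]

    # Because the model is shared and static, this table predicts the
    # advice every subscriber on the same release will receive, not
    # just what the attacker happens to see.
    advice_map = {
        (who, event): ask_service(template.format(who=who, event=event))
        for who, event in product(whos, events)
    }

    for (who, event), advice in advice_map.items():
        print(who, "/", event, "->", advice[:60])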
Even further, the attacker can simulate the victim's continued use of
the purchased AI-derived answer service, prompting it for appealing
business writing on the next moves to take when faced with an unlucky
scenario (which will actually be the attack).
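Continuing the sketch, the attack search then amounts to ranking
candidate scenarios by how exploitable the victim's pre-knowable
advice is. The exploitability() scorer below is entirely hypothetical;
a real attacker would hand-score the answers or fit something fancier:

    def exploitability(advice: str) -> float:
        # Hypothetical score of how well the advised next move plays
        # into the attacker's hands; here a crude keyword heuristic.
        wins = ["pay immediately", "new account", "send the documents"]
        return sum(w in advice.lower() for w in wins)

    candidate_attacks = [
        "A supplier emails new bank details and asks for urgent payment",
        "An auditor asks for contracts to be uploaded to a shared drive",
        "A partner offers a discount that expires within the hour",
    ]

    # The frozen model makes the victim's future advice knowable in
    # advance, so the attacker can pick the scenario whose advised
    # response suits them best before launching anything.
    ranked = sorted(
        candidate_attacks,
        key=lambda s: exploitability(
            ask_service("Q: " + s + ". What should I do next? A:")
        ),
        reverse=True,
    )
    print("most promising attack scenario:", ranked[0])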
So as attackers we can explore in advance the purchased answer
resources the defenders will rely on, and create optimised predatory
policies targeting popular subscription releases of the trained
answerbot AIs. Exactly like the predatory AI models in the deep
learning Go scene.