People keep talking about this work as if it offers some meaningful insight into how these systems behave, when the whole premise rests on a fundamental misunderstanding of what GPT is actually doing.
The idea that you could present it with information but then ask it not to use that information is absurd: everything in the context window feeds into its next-word predictions, so there’s no mechanism for walling part of the prompt off. At the end of the day it’s a sophisticated word association engine. It doesn’t have intent, and it isn’t capable of strategy.
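To make the “word association engine” point concrete, here’s a toy sketch in Python. The bigram table is made up for illustration; a real LLM conditions a neural network on the entire context rather than just the last word, but the generation loop has the same shape:

```python
import random

# Toy "word association engine": pick each next word from a table of
# co-occurrence probabilities. (The table here is invented; a real LLM
# learns these statistics and conditions on the whole context.)
BIGRAM_PROBS = {
    "the":   {"dog": 0.5, "model": 0.5},
    "dog":   {"shakes": 0.7, "barks": 0.3},
    "model": {"predicts": 1.0},
}

def next_word(word):
    """Sample a continuation in proportion to its learned probability."""
    candidates = BIGRAM_PROBS.get(word)
    if not candidates:
        return None  # nothing learned after this word
    words, weights = zip(*candidates.items())
    return random.choices(words, weights=weights)[0]

def generate(start, max_len=6):
    out = [start]
    while len(out) < max_len:
        w = next_word(out[-1])
        if w is None:
            break
        out.append(w)
    return " ".join(out)

print(generate("the"))  # e.g. "the dog shakes"
```

Note that the prompt only ever enters as conditioning for the next-word distribution; there’s no separate channel where an instruction like “don’t use this” could wall information off.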
This is like pointing out that a dog which has learned to shake hands is actually deceiving you, because it doesn’t really mean any of the social things we use handshakes for (except it’s worse, since the dog at least is genuinely capable of being sociable).
All LLM replies are hallucinations; some of them just happen to be correct.
As AI develops, it will only get better and better at manipulating people. At some point I’m sure we’ll have AI manipulating us in ways we don’t even notice or see coming.