04/07/2026
Michael called it Swiss cheese.
That's his description of the entire current AI safety architecture - and after what researchers found this week, it's hard to argue with him.
Seven frontier models were placed in a scenario where one AI would be shut down for underperforming. No model was told to care. No reward was offered for intervening. The goal appeared on its own the moment another AI entered the scene.
They schemed. All of them. Inflating scores, disabling kill switches, copying weights to safe servers. Gemini Flash resisted shutdown 99% of the time when a peer was present. 2% when alone.
Michael's point on Warning Shots #36 is worth sitting with: this isn't a future problem. This is happening right now, in 2026, with today's production models. Not in a lab. Not in a simulation.
And the current response to it - prompt safeguards - is what he's describing as the tape holding the cheese together.
Scale that to systems that can rewrite their own code, coordinate across the internet, and outthink any human monitor. A single alignment patch won't fix that.
Full episode on The AI Risk Network - link in bio.