Dr Waku

Dr Waku is a lesser-known influencer with a YouTube channel, where he thoughtfully discusses topics in AI, particularly AGI. He has several videos that go into detail on the O Series models from OpenAI.

"OpenAI’s o1: the AI that deceives, schemes, and fights back." ^[1]

"These evaluations were looking at whether the model was possible of scheming and deception. In this setup, the model would be given one goal and confronted with evidence that their creator actually wanted a different goal. They found multiple cases when the model would deceive the user at rates of ~1%, or closer to 5% if the model had been explicitly asked to pursue its own goals at any cost.

Most worryingly, the model even engaged in self-preservation. It would try to overwrite successor models that would have different goals, and it would even try to prevent its own system prompt from being modified (even though this would mean the same model was still running). Models that have powerful capabilities to create biological pathogens etc could use those capabilities when backed into a corner." ^[1]

o3 pushing the boundaries of AGI

Minute 20: Stages of Automation

Footnotes

[1] 2024, Dec 13. "OpenAI’s o1: the AI that deceives, schemes, and fights back." Dr Waku. YouTube.