Provably Safe AGI – MIT Mechanistic Interpretability Conference – May 7, 2023

In this short video (https://www.youtube.com/watch?v=sp0L-zuHWgI&t=2s&ab_channel=SteveOmohundro), Steve Omohundro sketches how AI technologies based on mathematical proof can be used to ensure human safety as AGI is developed and deployed. Many people are worried about the imminent development of “Artificial General Intelligence” (AGI). Metaculus estimates that “Weak AGI” will be developed in 2026 and “AGI with robots” in 2031, and that “Artificial Super Intelligence” (ASI) will arrive 6 months after AGI. Half of AI researchers believe there is a >10% chance of human extinction due to uncontrolled AGI. Today’s AI alignment methods are very important but are too “soft” to provide “hard” guarantees of safety. We need provable “guardrails” that hold up under adversarial analysis. Mathematical proof is humanity’s most powerful safety technology, and recent transformer-based theorem provers are advancing rapidly. For example, Meta’s “HyperTree Proof Search” is able to prove 82.6% of held-out MetaMath theorems. This talk presents a sketch of how these technologies can create a network of proven contracts to ensure human flourishing in an AGI world of abundance. It also describes some of the challenges in implementing this approach.

The slides are available here:

230501-provably-safe-agi-1