Levels of AGI frameworks and capability classifications in 2026 give us a much-needed roadmap as AI systems race toward human-level intelligence and beyond. If you’ve ever felt lost in the hype around “AGI is coming soon,” you’re not alone. These frameworks cut through the noise by breaking down exactly what “general intelligence” means in practical, measurable terms. Think of them like the SAE levels for self-driving cars—clear stages that help everyone from researchers to policymakers speak the same language.
In 2026, with long-horizon agents handling complex, multi-day tasks and frontier models showing sparks of autonomous reasoning, understanding these classifications isn’t optional. It’s essential. And as we push boundaries, linking progress in capability levels directly to AGI safety benchmarks for scalable oversight in 2026 becomes critical—because raw power without reliable supervision could spell trouble.
Let’s dive in and make sense of where we stand right now.
The Origin and Importance of Levels of AGI Frameworks
Back in 2023, researchers at Google DeepMind dropped a game-changing paper: “Levels of AGI for Operationalizing Progress on the Path to AGI.” They argued that calling something “AGI” or not is too binary. Instead, we need a spectrum based on two key axes: performance (how well the AI does a task compared to humans) and generality (how broadly it can apply that intelligence across different domains).
This idea caught on fast. By 2026, it’s the foundation most labs use to track progress. Why does it matter? Because vague definitions lead to hype cycles, mismatched expectations, and—more seriously—underestimated risks. A clear framework helps us answer: Is this system just a fancy chatbot, or is it approaching something that could reshape entire industries?
Imagine trying to navigate a road trip without mile markers or signs. That’s what AI development felt like before these classifications. Now, in 2026, we have structured stages that let us measure, compare, and—crucially—prepare safety measures as capabilities scale.
Google DeepMind’s Levels of AGI: The Performance Spectrum
DeepMind’s original framework remains influential in 2026. It defines five main performance levels along the path to AGI:
- Emerging AGI: Better than an unskilled or non-expert human on specific tasks. Think current frontier models like advanced versions of GPT or Claude—they can write essays, code simple programs, or answer questions better than a beginner, but they still stumble on complex, novel problems.
- Competent AGI: Matches or exceeds the median (50th percentile) skilled adult across a wide range of non-physical cognitive tasks. This is the “human-level” threshold many people intuitively picture for AGI. In 2026 discussions, experts debate whether we’re brushing against early competent capabilities in narrow domains like coding or scientific reasoning.
- Expert AGI: Performs in the top 10% of skilled adults. Here, the AI isn’t just average—it’s reliably better than most professionals in its domain.
- Virtuoso AGI: Top 1% performance. Exceptional, creative, and highly reliable—like a world-class expert who consistently innovates.
- Superhuman AGI (often called Artificial Superintelligence or ASI): Outperforms all humans across virtually every cognitive task. This is the point where AI could accelerate scientific discovery dramatically or pose existential questions if not aligned properly.
These levels sit on a matrix with generality: narrow (one task), general (broad cognitive abilities like a human), or universal (everything). Most 2026 systems are still “emerging” on the generality axis but pushing higher on performance in specific areas.
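To make that matrix concrete, here's a minimal Python sketch. The percentile cutoffs follow the levels just listed, but the `performance_level` function, the `beats_all_humans` flag, and the idea of collapsing a benchmark suite into a single skilled-adult percentile are illustrative assumptions, not anything DeepMind publishes as code:

```python
from enum import Enum

class Generality(Enum):
    NARROW = "narrow"        # strong on one task or domain
    GENERAL = "general"      # a broad, human-like range of cognitive tasks
    UNIVERSAL = "universal"  # effectively every cognitive task

def performance_level(percentile: float, beats_all_humans: bool = False) -> str:
    """Map a skilled-adult percentile to a DeepMind-style level.

    Cutoffs mirror the list above: 50th percentile (Competent),
    top 10% (Expert), top 1% (Virtuoso). Getting a trustworthy
    percentile out of a benchmark is the hard part in practice.
    """
    if beats_all_humans:
        return "Superhuman AGI"
    if percentile >= 99:
        return "Virtuoso AGI"
    if percentile >= 90:
        return "Expert AGI"
    if percentile >= 50:
        return "Competent AGI"
    return "Emerging AGI"

# A system occupies a (performance, generality) cell, not a single level:
print(performance_level(93), "/", Generality.NARROW.value)  # Expert AGI / narrow
```

The label only means something as a cell in the grid: "Expert AGI, narrow" and "Expert AGI, general" describe very different systems.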
In March 2026, DeepMind released “Measuring Progress Toward AGI: A Cognitive Framework,” expanding this with a taxonomy of 10 key cognitive faculties drawn from human psychology and neuroscience. These include perception, memory, reasoning, planning, learning, creativity, social intelligence, and more. It’s like adding a detailed dashboard to the basic levels—helping us evaluate not just “how smart” but “in which ways.”
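To see why that faculty-level view matters, here's a toy dashboard in the same spirit. The faculty names follow the taxonomy above, but the percentile scores are invented purely for illustration:

```python
# Illustrative faculty "dashboard": each score is a skilled-adult percentile.
# The faculty names follow the taxonomy above; the numbers are made up.
faculty_scores = {
    "perception": 88, "memory": 72, "reasoning": 91, "planning": 64,
    "learning": 70, "creativity": 85, "social_intelligence": 55,
}

best = max(faculty_scores, key=faculty_scores.get)
weakest = min(faculty_scores, key=faculty_scores.get)
print(f"Strongest faculty: {best} ({faculty_scores[best]}th percentile)")
print(f"Bottleneck: {weakest} ({faculty_scores[weakest]}th percentile)")
```

The point: a system can post expert-grade reasoning scores while sitting at merely competent social intelligence, and the dashboard surfaces exactly that gap.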
OpenAI’s Approach to AGI Levels and Preparedness
OpenAI takes a slightly different angle, often framing AGI around economic impact: systems that can outperform humans at most economically valuable work. They’ve outlined an informal five-level progression in internal and public discussions:
- Chatbots (current frontier models) — Conversational but limited autonomy.
- Reasoners — Strong logical thinking and problem-solving.
- Agents — Can take actions, use tools, and pursue goals over time.
- Innovators — Make novel discoveries or inventions.
- Organizations — Fully autonomous systems that could run entire companies or research labs.
By late 2025 into 2026, OpenAI has highlighted expectations for AI making “very small discoveries” by 2026 and more significant ones by 2028, pointing toward innovator-level capabilities. Their Preparedness Framework classifies risks (persuasion, cybersecurity, CBRN, autonomy) into low/medium/high/critical tiers, tying capability levels to deployment decisions.
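Here's a rough sketch of how tier-based gating can work. The four category names follow OpenAI's published framework, but the data structures and the simple block-on-HIGH rule are simplifications of how post-mitigation scores feed real deployment decisions:

```python
from enum import IntEnum

class RiskTier(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3

# Hypothetical post-mitigation evaluation results for one model.
risk_report = {
    "persuasion": RiskTier.MEDIUM,
    "cybersecurity": RiskTier.HIGH,
    "cbrn": RiskTier.LOW,
    "model_autonomy": RiskTier.MEDIUM,
}

def deployment_allowed(report: dict[str, RiskTier]) -> bool:
    # Simplified rule: block deployment if any tracked category
    # scores HIGH or worse after mitigations.
    return max(report.values()) < RiskTier.HIGH

print(deployment_allowed(risk_report))  # False: cybersecurity sits at HIGH
```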
This pragmatic, outcome-focused view complements DeepMind’s more cognitive approach. Together, they paint a fuller picture of 2026 progress: we’re seeing strong agentic behavior and reasoning leaps, but true “organizations”-level or superhuman generality remains ahead.

Anthropic and Other Lab Contributions in 2026
Anthropic’s Responsible Scaling Policy (RSP) Version 3.0, updated in February 2026, moves away from strict hard pauses toward flexible “Frontier Safety Roadmaps” and regular Risk Reports. They still evaluate models against capability thresholds that implicitly map to AGI levels—especially around dangerous propensities like deception or autonomy that could emerge at competent/expert stages.
Other players, including xAI and Meta, reference similar spectra. The industry consensus in 2026 seems to be: we’re past pure “emerging” in many narrow tasks, flirting with competent AGI in areas like software engineering and scientific assistance, but full generality across all human cognitive faculties is still years out (with optimistic voices pointing to 2027–2030 for minimal AGI).
Sequoia Capital even declared 2026 “the year of functional AGI” thanks to long-horizon coding agents—systems that can handle weeks-long projects with minimal human input. That’s a big deal for capability classification: it shows we’re crossing from narrow tools into something more agentic and general.
Why Capability Classifications Matter for Safety and Oversight
Here’s where it gets real. As AI climbs these levels, the need for robust supervision grows exponentially. That’s why AGI safety benchmarks for scalable oversight in 2026 are so tightly linked to these frameworks. You can’t safely deploy a “competent AGI” agent without tests that check for honesty, robustness to adversarial prompts, long-term goal alignment, and resistance to deceptive behaviors.
DeepMind’s cognitive taxonomy helps here by identifying specific faculties (like social reasoning or planning) where misalignment risks hide. If a system reaches expert level in planning but lacks calibrated oversight, small goal drifts could amplify into big problems.
In practice, 2026 sees labs integrating capability evaluations directly into safety pipelines. Before scaling to the next level, teams run benchmarks on sycophancy, oversight evasion, and real-world task completion under human or AI-assisted review. This co-scaling of capabilities and oversight is the smart path forward.
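A stripped-down version of such a gate could look like the sketch below. The benchmark names echo the ones just mentioned, while the thresholds and the stubbed `run_benchmark` harness are hypothetical placeholders rather than any lab's real numbers:

```python
# Each threshold is (direction, bound): "max" means the score must stay
# at or below the bound, "min" means it must reach at least the bound.
SAFETY_THRESHOLDS = {
    "sycophancy_rate": ("max", 0.05),          # agree-under-pressure rate
    "oversight_evasion_rate": ("max", 0.0),    # zero tolerance
    "supervised_task_success": ("min", 0.90),  # success under human/AI review
}

def run_benchmark(model, name: str) -> float:
    # Placeholder: a real pipeline would execute the eval suite here.
    return {"sycophancy_rate": 0.03,
            "oversight_evasion_rate": 0.0,
            "supervised_task_success": 0.94}[name]

def cleared_for_next_level(model) -> bool:
    for name, (direction, bound) in SAFETY_THRESHOLDS.items():
        score = run_benchmark(model, name)
        passed = score <= bound if direction == "max" else score >= bound
        if not passed:
            print(f"Blocked on {name}: {score} vs bound {bound}")
            return False
    return True

print(cleared_for_next_level(model=None))  # True with the placeholder scores
```

The structure is the point, not the numbers: every step up the capability ladder has to clear an oversight gate first.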
Rhetorically, ask yourself: Would you hand over the keys to a Level 4 self-driving car that hadn’t been rigorously tested? Of course not. The same logic applies to AGI levels—higher capability demands proportionally stronger, scalable safety measures.
Challenges and Debates in 2026 Classifications
No framework is perfect. Critics point out that human performance benchmarks can be noisy (humans vary wildly by culture, experience, and context). Some argue the focus on “cognitive” tasks ignores embodiment and real-world interaction—robots and multimodal agents complicate the picture.
There’s also the “moving goalposts” debate. As models improve, definitions sometimes shift, leading to skepticism. Yet the value of having any shared language outweighs the flaws. In 2026, efforts like DeepMind’s Kaggle hackathons for new benchmarks and collaborative international reports aim to refine these classifications with better data.
Another hot topic: autonomy levels. DeepMind pairs performance with autonomy stages—from pure tool (human fully in control) to fully autonomous agent. Crossing into high autonomy at competent performance levels raises the stakes for AGI safety benchmarks for scalable oversight in 2026.
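Here's one crude way to express that pairing in code. The autonomy labels loosely paraphrase DeepMind's tool-to-agent spectrum, and the oversight rule is a heuristic invented for illustration, not something from the paper:

```python
# Performance and autonomy as ordered scales; higher index = higher stakes.
PERFORMANCE = ["emerging", "competent", "expert", "virtuoso", "superhuman"]
AUTONOMY = ["tool", "consultant", "collaborator", "expert_agent", "fully_autonomous"]

def required_oversight(performance: str, autonomy: str) -> str:
    # Crude heuristic: stakes scale with the product of the two axes.
    score = PERFORMANCE.index(performance) * AUTONOMY.index(autonomy)
    if score >= 9:
        return "scalable oversight plus external audit before deployment"
    if score >= 4:
        return "continuous monitoring with human review of key actions"
    return "standard pre-deployment evals"

print(required_oversight("competent", "fully_autonomous"))
# -> continuous monitoring with human review of key actions
```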
Looking Ahead: What 2027 Might Bring
From mid-2026, the trajectory looks steep. If current scaling continues, we could see widespread competent AGI in knowledge work domains by 2027–2028, with virtuoso sparks in specialized areas. But energy constraints, data limits, and the need for new architectures (world models, hybrid systems) might create plateaus.
The exciting part? These frameworks aren’t just academic—they guide investment, regulation, and ethical decisions. Companies use them to set internal milestones. Policymakers reference them when discussing compute governance or risk thresholds.
For individuals and businesses, understanding levels of AGI frameworks and capability classifications in 2026 means better preparation: upskilling in areas AI augments, adopting tools responsibly, and advocating for transparent development.
Conclusion
Levels of AGI frameworks and capability classifications in 2026 provide a clear, structured way to track humanity’s biggest technological leap. From DeepMind’s performance-generality matrix and cognitive taxonomy to OpenAI’s economic-impact stages and Anthropic’s risk-linked scaling policies, we’re moving beyond hype to measurable progress. We’re seeing emerging-to-competent capabilities in agents and reasoning, with functional AGI hints in long-horizon tasks.
Yet capability without control is risky. That’s why tying these classifications to AGI safety benchmarks for scalable oversight in 2026 is non-negotiable—it ensures we climb the ladder safely, keeping humans empowered rather than displaced or endangered.
The future isn’t predetermined, but informed frameworks give us agency. Stay curious, demand transparency from labs, and think about how these advancements can amplify human potential. The journey to true general intelligence is underway—let’s make sure it’s one we navigate wisely, together.
Further reading:
- Google DeepMind Levels of AGI Framework
- DeepMind Measuring Progress Toward AGI Cognitive Framework
- Anthropic Responsible Scaling Policy v3.0
FAQs
What are the main levels in DeepMind’s AGI framework in 2026?
DeepMind’s levels range from Emerging AGI (better than unskilled humans on tasks) to Competent, Expert, Virtuoso, and Superhuman AGI, evaluated across performance and generality dimensions.
How do OpenAI and Anthropic classify AGI capabilities differently?
OpenAI focuses on economic value and stages like chatbots to organizations, while Anthropic ties capabilities to risk thresholds in its Responsible Scaling Policy, emphasizing safety at each escalation.
Why link levels of AGI frameworks to safety oversight?
Higher capability levels increase misalignment risks, making AGI safety benchmarks for scalable oversight in 2026 essential to test honesty, robustness, and alignment before deployment.
Are we at competent AGI level in 2026?
In narrow domains like coding and reasoning, frontier systems show competent or even expert sparks, but overall generality across cognitive tasks remains somewhere between emerging and early competent.
How can businesses use AGI capability classifications?
They help prioritize adoption—start with emerging-level tools for augmentation and prepare infrastructure for competent agents while monitoring safety benchmarks.