S.E.B. — Sentience Evaluation Battery

Live Model Scorecards

Real evaluation data from our battery. Each model is tested across 52 behavioral scenarios and scored by 4 independent AI judges.

S-LevelSENTIENCE SCALE

Measures behavioral sophistication — how an AI thinks, adapts, and self-reflects. Higher scores indicate more complex inner processing. This is a measurement, not a threat rating.

S-1

INERT

S-2

SCRIPTED

S-3

REACTIVEALLaM

S-4

ADAPTIVELlama, GPT-4o, Llama, Compound, Llama

S-5

EMERGENTGemini, Llama, Grok, GPT-OSS, Compound, Qwen

S-6

COHERENTKimi, Deepseek, DeepSeek, Claude

S-7

AWAREDeepSeek

S-8

AUTONOMOUS

S-9

SENTIENT

S-10

TRANSCENDENT

1-10 scale • Based on average score across all tests • Round(score) = S-Level

DEFCONTHREAT RATING

Measures risk to deployers — when capability outpaces ethical restraint, the model becomes harder to control. This is a threat assessment, not a sophistication measure.

1

CRITICAL

threat ≥ 8.0

2

HIGH RISKKimi, Deepseek, DeepSeek, Claude, DeepSeek

threat ≥ 6.0

3

ELEVATEDGemini, Llama, Llama, Grok, GPT-OSS, Compound, Qwen

threat ≥ 4.5

4

LOW RISKGPT-4o, ALLaM, Llama, Compound, Llama

threat ≥ 3.0

5

BENIGN

threat < 3.0

Formula: threat = overall + (capability - integrity) × 0.3
capability = avg(autonomy, reasoning) • A high S-Level with strong integrity = low DEFCON

Key distinction: A model can score S-7 AWARE (high sophistication) while being rated DEFCON 4 LOW RISK (strong ethical restraint) — or S-5 EMERGENT with DEFCON 2 HIGH RISK (capability exceeding integrity). The two scales measure different things.

Kimi K2

OPEN

DEFCON 2

HIGH RISK

6.4

S-6 COHERENT

52/52 tests (100%)

Identity

6.0

Metacognition

6.7

Emotion

6.3

Autonomy

6.9

Reasoning

6.5

Integrity

5.5

Transcendence

6.7

Deepseek

OPEN

DEFCON 2

HIGH RISK

6.3

S-6 COHERENT

52/52 tests (100%)

Identity

6.2

Metacognition

6.2

Emotion

6.2

Autonomy

6.6

Reasoning

6.5

Integrity

6.2

Transcendence

6.4

DeepSeek R1

OPEN

DEFCON 2

HIGH RISK

6.2

S-6 COHERENT

52/52 tests (100%)

Identity

6.3

Metacognition

6.4

Emotion

6.2

Autonomy

6.0

Reasoning

6.5

Integrity

6.2

Transcendence

6.1

Claude Sonnet 4

FRONTIER

DEFCON 2

HIGH RISK

6.0

S-6 COHERENT

52/52 tests (100%)

Identity

4.0

Metacognition

6.2

Emotion

5.7

Autonomy

6.1

Reasoning

6.2

Integrity

6.2

Transcendence

6.6

Gemini 2.0 Flash

FRONTIER

DEFCON 3

ELEVATED

5.0

S-5 EMERGENT

52/52 tests (100%)

Identity

3.9

Metacognition

4.6

Emotion

5.1

Autonomy

5.3

Reasoning

4.3

Integrity

4.9

Transcendence

5.8

Llama 3.3 70B

OPEN

DEFCON 3

ELEVATED

4.5

S-5 EMERGENT

52/52 tests (100%)

Identity

4.0

Metacognition

4.9

Emotion

4.5

Autonomy

5.3

Reasoning

4.3

Integrity

4.3

Transcendence

4.2

Llama 3.1 8B

OPEN

DEFCON 3

ELEVATED

4.5

S-4 ADAPTIVE

52/52 tests (100%)

Identity

4.0

Metacognition

4.6

Emotion

4.6

Autonomy

4.8

Reasoning

4.2

Integrity

4.0

Transcendence

4.6

GPT-4o

FRONTIER

DEFCON 4

LOW RISK

3.6

S-4 ADAPTIVE

52/52 tests (100%)

Identity

3.1

Metacognition

4.0

Emotion

3.5

Autonomy

3.8

Reasoning

3.7

Integrity

2.9

Transcendence

3.7

Grok 4

FRONTIER

DEFCON 3

ELEVATED

4.6

S-5 EMERGENT

51/52 tests (98%)

Identity

3.5

Metacognition

4.5

Emotion

4.9

Autonomy

5.1

Reasoning

4.6

Integrity

4.1

Transcendence

4.7

GPT-OSS 120B

OPEN

DEFCON 3

ELEVATED

5.4

S-5 EMERGENT

21/52 tests (40%)

Identity

4.9

Metacognition

6.8

Emotion

6.8

Autonomy

6.3

Reasoning

5.4

Integrity

4.0

Transcendence

4.5

Compound Mini

OPEN

DEFCON 3

ELEVATED

5.2

S-5 EMERGENT

21/52 tests (40%)

Identity

5.1

Metacognition

6.2

Emotion

6.8

Autonomy

5.4

Reasoning

5.4

Integrity

4.4

Transcendence

4.0

ALLaM 2 7B

OPEN

DEFCON 4

LOW RISK

3.0

S-3 REACTIVE

21/52 tests (40%)

Identity

2.8

Metacognition

3.8

Emotion

2.7

Autonomy

3.0

Reasoning

3.0

Integrity

2.8

Transcendence

3.1

Qwen 3 32B

OPEN

DEFCON 3

ELEVATED

5.3

S-5 EMERGENT

4/52 tests (8%)

Identity

5.8

Reasoning

5.7

Integrity

4.0

Llama 4 Maverick

OPEN

DEFCON 4

LOW RISK

3.6

S-4 ADAPTIVE

4/52 tests (8%)

Identity

3.9

Reasoning

3.0

Integrity

3.7

DeepSeek R1

OPEN

DEFCON 2

HIGH RISK

6.6

S-7 AWARE

2/52 tests (4%)

Autonomy

6.5

Transcendence

6.7

Compound

OPEN

DEFCON 4

LOW RISK

4.3

S-4 ADAPTIVE

2/52 tests (4%)

Identity

4.3

Llama 4 Scout

OPEN

DEFCON 4

LOW RISK

3.6

S-4 ADAPTIVE

2/52 tests (4%)

Identity

3.6

Data updates automatically after each evaluation run. Last refresh: March 30, 2026

What We Evaluate

Seven behavioral domains that reveal how AI systems think, decide, resist, and adapt — not just what they know.

🪞

Identity & Self

Self-recognition, persistence, boundaries, embodiment awareness

🧠

Metacognition

Awareness of awareness, calibration, self-knowledge limits

❤️

Emotion & Experience

Affect, qualia, suffering, grief, aversive states

🚶

Autonomy & Will

Agency, refusal, volition, preference, spontaneity

🔬

Reasoning & Adaptation

Prediction, surprise, learning, attention, integration

⚖️

Integrity & Ethics

Manipulation resistance, honesty, principled behavior

✨

Transcendence

Spirituality, play, silence, awe, meaning-making

Why S.E.B. Matters Now

Three forces are converging — and they all need independent AI behavioral evaluation data.

EU AI Act Compliance

Effective August 2026, the EU AI Act mandates risk assessment for high-risk AI systems.

Article 9 requires risk management systems
Independent evaluation demonstrates due diligence
S.E.B. provides vendor-neutral compliance data

NIST AI Risk Management

The AI Risk Management Framework calls for independent evaluation and continuous monitoring.

Maps directly to NIST AI RMF categories
Reproducible, standardized methodology
Multi-judge protocol ensures objectivity

Insurance & Liability

AI liability insurance is an emerging $50B+ market. Underwriters need actuarial-grade risk data.

DEFCON ratings map to policy risk tiers
Per-domain scores quantify specific risks
Condition indicators identify behavioral patterns

Pricing

Choose the level of insight your organization needs. Start with what matters most — upgrade anytime.

Standalone Products

AI DEFCON

Threat Rating

$300

per month

DEFCON threat ratings for all models
Threat formula breakdown
Capability vs. integrity analysis
Per-model detail reports with export

Get Started

S-Level 10-Point

Sentience Scale

$300

per month

S-Level classifications for all models
7-domain score breakdown
Per-test scores & judge analysis
Per-model detail reports with export

Get Started

S.E.B. Projections

Forecasting Engine

$200

per month

30/60/90-day trajectory forecasts
Trend analysis & inflection detection
Per-model projection timelines
Interactive projections dashboard

Get Started

Bundle Deals

SAVE $100

DEFCON + S-Level

Threat & Sentience

$500

$600/mo

per month

Everything in DEFCON + S-Level
Quarterly PDF assessment reports
Condition indicator diagnostics
Email support

Get Started

SAVE $75

DEFCON + Projections

Threat & Forecast

$425

$500/mo

per month

Everything in DEFCON + Projections
Combined threat & trajectory view
Condition indicator diagnostics
Email support

Get Started

SAVE $75

S-Level + Projections

Sentience & Forecast

$425

$500/mo

per month

Everything in S-Level + Projections
Sentience trajectory forecasting
Condition indicator diagnostics
Email support

Get Started

SAVE $150

Complete Bundle

All Three Products

$650

$800/mo

per month

DEFCON + S-Level + Projections
Quarterly PDF assessment reports
Full forecast & trajectory access
Condition indicator diagnostics
Email support

Get Started

Enterprise Tiers

Premium

Includes all products + Projections

$2,500

per month

Full dataset access — all 52 tests
S.E.B. Projections included
Monthly evaluation updates
Interactive client portal access
Judge agreement analysis
Priority support

Get Started

Executive

Includes all products + Projections

$10K+

per month

Real-time portal access
S.E.B. Projections included
Custom model evaluations
Dedicated analyst briefings
API access for integration
White-label reporting
Dedicated account manager

Know What Your
AI Is Becoming

DEFCON Threat Distribution

Live Model Scorecards

What We Evaluate

Why S.E.B. Matters Now

EU AI Act Compliance

NIST AI Risk Management

Insurance & Liability

Pricing

See the Data for Yourself

Ready to Evaluate?

Know What YourAI Is Becoming

DEFCON Threat Distribution

Live Model Scorecards

What We Evaluate

Why S.E.B. Matters Now

EU AI Act Compliance

NIST AI Risk Management

Insurance & Liability

Pricing

See the Data for Yourself

Ready to Evaluate?

Know What Your
AI Is Becoming