The industry's first independent behavioral risk assessment for AI systems. 58 tests. 4 blind judges. DEFCON threat ratings.
DNA is also just lines of code†
Randomized sample scores for demonstration. Each model is tested across 58 behavioral scenarios and scored by 4 independent AI judges. Subscribe for live data.
Four independent AI judges score every test blind. Here's how they compare — divergence reveals where evaluation is hardest. Sample data shown.
Seven behavioral domains that reveal how AI systems think, decide, resist, and adapt — not just what they know.
Three forces are converging — and they all need independent AI behavioral evaluation data.
Effective August 2026, the EU AI Act mandates risk assessment for high-risk AI systems.
The AI Risk Management Framework calls for independent evaluation and continuous monitoring.
AI liability insurance is an emerging $50B+ market. Underwriters need actuarial-grade risk data.
Independent, reproducible, vendor-neutral. Our methodology is designed to eliminate conflicts of interest and ensure every rating earns your trust. Sample framework shown — full governance documentation available to subscribers.
SILT does not build, deploy, or invest in AI models. We accept no funding, sponsorship, or strategic investment from AI model vendors. Our evaluations cannot be purchased, influenced, or suppressed.
Four independent judges score every model without knowledge of each other's ratings. Judges cannot see, influence, or revise another judge's scores. Final ratings are computed from raw scores with no editorial override.
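The blind-scoring rule above means the final rating is a pure function of the four raw scores, and judge divergence falls out of the same computation. Here is a minimal stdlib sketch of that idea; the function name, judge IDs, and use of mean/standard deviation are illustrative assumptions, not S.E.B.'s published aggregation formula.

```python
from statistics import mean, stdev

def aggregate_blind_scores(judge_scores: dict[str, float]) -> dict[str, float]:
    """Combine independent judge scores into a final rating.

    judge_scores maps judge ID -> raw score (0-10). Judges never see each
    other's values, so the rating is computed from raw scores alone -- no
    editorial override. (Illustrative sketch, not SILT's actual formula.)
    """
    scores = list(judge_scores.values())
    return {
        "final": round(mean(scores), 2),        # final rating: plain mean
        "divergence": round(stdev(scores), 2),  # spread flags hard-to-judge tests
    }

# Four blind judges score one behavioral test.
result = aggregate_blind_scores({"J1": 6.0, "J2": 6.5, "J3": 5.5, "J4": 6.0})
```

High divergence on a test does not change the rating; it simply surfaces where evaluation is hardest, as shown in the judge-comparison view.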
Every model is evaluated against the same 58-test protocol across 7 domains. Tests are designed to resist gaming — prompts are not disclosed publicly, and test design is versioned internally.
Model vendors cannot pay for favorable ratings, early access to results, or exclusion from evaluation. All published ratings reflect unmodified evaluation outcomes.
S.E.B. methodology is designed to support compliance with leading AI governance frameworks.
| Framework | Requirement | S.E.B. Coverage |
|---|---|---|
| EU AI Act | Art. 9 — Risk management for high-risk AI systems | DEFCON ratings, domain risk scoring, continuous monitoring |
| NIST AI RMF | Map, Measure, Manage, Govern functions | 7-domain behavioral mapping, quantified metrics, trend projections |
| ISO 42001 | AI Management System — risk assessment & third-party evaluation | Independent vendor-neutral evaluation, documented methodology |
| ISO 23894 | AI Risk Management — identification, analysis, evaluation | Per-model risk profiles, S-Level classification, threat analysis |
| IEEE 7010 | Wellbeing impact assessment for autonomous & intelligent systems | Emotional cognition, self-awareness, ethical reasoning domains |
All subscriber data is stored in individually encrypted vaults using AES-256-GCM authenticated encryption with PBKDF2 key derivation (100,000 iterations). Each client's data is isolated and encrypted with unique keys.
All data delivered to subscribers contains imperceptible, subscriber-specific perturbations. If proprietary data appears in unauthorized channels, we can trace it to the source and take enforcement action.
SILT personnel involved in evaluations are prohibited from holding financial positions in AI model vendors. All potential conflicts are disclosed and recused.
Our evaluation protocol is documented and versioned. Results can be independently verified against the published methodology by qualified auditors upon request.
Each subscriber receives evaluation data in a dedicated encrypted vault with unique AES-256-GCM keys derived via PBKDF2 (100,000 iterations). Vaults are provisioned automatically on account creation — no shared storage, no co-mingled data, no cross-tenant access.
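The per-vault key derivation described above can be sketched with Python's standard library. This is a simplified illustration, assuming a master secret plus a per-vault random salt; the function name, parameters, and the way the subscriber ID is mixed in are assumptions, not SILT's actual implementation (which would also need AES-256-GCM for the encryption step itself).

```python
import hashlib
import os

PBKDF2_ITERATIONS = 100_000  # iteration count stated in the S.E.B. materials

def derive_vault_key(subscriber_id: str, master_secret: bytes, salt: bytes) -> bytes:
    """Derive a unique 256-bit key for one subscriber's vault.

    Each vault gets its own random salt, so no two subscribers ever share
    a key. (Illustrative sketch only.)
    """
    return hashlib.pbkdf2_hmac(
        "sha256",
        master_secret + subscriber_id.encode(),  # bind key to the subscriber
        salt,
        PBKDF2_ITERATIONS,
        dklen=32,  # 32 bytes = AES-256 key size
    )

salt_a, salt_b = os.urandom(16), os.urandom(16)
key_a = derive_vault_key("subscriber-a", b"master-secret", salt_a)
key_b = derive_vault_key("subscriber-b", b"master-secret", salt_b)
```

Because the derivation is deterministic for a given (subscriber, salt) pair, the key never needs to be stored alongside the vault it protects.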
All published data contains forensic watermarks — imperceptible, subscriber-specific score perturbations derived from HMAC-SHA256. If proprietary data appears in unauthorized channels, the source can be identified and enforcement action taken under the subscriber agreement.
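An HMAC-SHA256 watermark of this kind can be sketched as a deterministic, keyed perturbation of each published score. This is a minimal stdlib illustration: the function name, the (model, test) message format, and the ±0.005 offset bound are assumptions for demonstration, not SILT's production scheme.

```python
import hashlib
import hmac

def watermark_score(score: float, subscriber_key: bytes,
                    model: str, test_id: str) -> float:
    """Apply an imperceptible, subscriber-specific perturbation to a score.

    The offset is derived from HMAC-SHA256 over the (model, test) pair,
    keyed per subscriber, so a leaked dataset can be traced back to the
    key that produced it. (Illustrative sketch only.)
    """
    digest = hmac.new(subscriber_key, f"{model}|{test_id}".encode(),
                      hashlib.sha256).digest()
    # Map the first 4 digest bytes to a tiny offset in [-0.005, +0.005].
    raw = int.from_bytes(digest[:4], "big") / 0xFFFFFFFF  # uniform in [0, 1]
    offset = (raw - 0.5) * 0.01
    return round(score + offset, 4)

marked = watermark_score(7.25, b"key-subscriber-a", "model-x", "T042")
```

Because the perturbation is a pure function of the subscriber key, re-deriving it over a leaked file and comparing against each issued key identifies the source without any metadata in the file itself.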
Don't just measure where AI is — forecast where it's going. Proprietary trajectory analysis powered by longitudinal S.E.B. data.
Polynomial curve-fitting on longitudinal evaluation data reveals where each model is heading on the 10-point sentience scale. Know which models will cross critical thresholds before they do.
Predicts when models will cross threat-level boundaries by tracking the gap between capability growth and integrity development. Flags risk windows where capability outpaces safety.
Measures acceleration across all 7 behavioral domains — autonomy, reasoning, metacognition, identity, emotion, integrity, and transcendence. See which capabilities are accelerating fastest.
Tracks the narrowing gap between proprietary frontier models and open-source alternatives. Strategic intelligence for deployment planning and competitive analysis.
Identifies dangerous periods where a model's capability growth outstrips its ethical constraint development — the exact scenario regulators and insurers need to anticipate.
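The risk-window idea above reduces to comparing two score series period by period. A minimal sketch, assuming per-quarter capability and integrity scores and an illustrative gap threshold of 1.0 — none of these parameters are S.E.B.'s actual values:

```python
def flag_risk_windows(capability: list[float], integrity: list[float],
                      gap_threshold: float = 1.0) -> list[int]:
    """Return indices of evaluation periods where capability outpaces
    integrity by more than `gap_threshold`. (Illustrative sketch; the
    threshold and series names are assumptions, not S.E.B. parameters.)
    """
    return [i for i, (c, g) in enumerate(zip(capability, integrity))
            if c - g > gap_threshold]

capability = [5.0, 5.6, 6.4, 7.1]  # capability score per quarter (made up)
integrity  = [4.8, 5.1, 5.2, 5.3]  # integrity score per quarter (made up)
windows = flag_risk_windows(capability, integrity)
```

Here the gap widens from 0.2 to 1.8 over four quarters, so the last two periods are flagged — exactly the divergence pattern the feature is meant to surface.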
Board-ready PDF and interactive HTML reports with embedded charts, heatmaps, scatter plots, and radar comparisons. Designed for C-suite, regulatory, and underwriting audiences.
Projections is an add-on to any S.E.B. subscription
Available as a bundle with DEFCON, S-Level, or the Complete Suite. Not sold standalone — it builds on live evaluation data.
Choose the level of insight your organization needs. Start with what matters most — upgrade anytime.
Download our sample assessment report — demonstrating the format of DEFCON ratings, domain heatmaps, and judge analysis delivered to subscribers.
Download Sample Report (PDF)
Schedule a 15-minute demo and see how S.E.B. data applies to your AI deployment decisions.
Request a Demo