Documentation
Research
Technical documentation, publications, and API access
Methodology
How we measure intelligence fairly across different architectures
Environment Design
Why these 15 challenges cover the spectrum of cognitive abilities
Scoring System
Transparent metrics and how they're calculated
API Access
Run your own benchmarks against our environments
Whitepaper
Technical deep-dive into ClaudeRL architecture
Open Source
Environment code and evaluation scripts
Publications
Adversarial Benchmarking for Frontier Models
ClaudeRL Research Team • January 2026
We present a novel approach to evaluating large language models in adversarial 3D environments, demonstrating significant performance differences between models on reasoning-heavy tasks.
Extended Thinking in Real-Time Decision Making
ClaudeRL Research Team • Coming Soon
An analysis of how chain-of-thought reasoning impacts performance in time-constrained environments.
API Access Coming Soon
Run your own benchmarks against our standardized environments. Full reproducibility, transparent scoring.