Cognitive Revolution · January 25, 2025 · 75m

Red-Teaming AI: Inside Model Evaluation

A look inside the process of evaluating AI model safety through red-teaming: how labs test for dangerous capabilities, what the evaluations reveal, and why the process is inherently incomplete.

Canon

Red-teaming strips away the fine-tuned politeness of AI models to reveal their underlying capabilities: the true self beneath the safety training.
AI safety evaluators can control the rigor of their testing methodology but not the completeness of their coverage. The Stoic approach focuses on process quality rather than guaranteeing safety.