Cognitive Revolution · January 25, 2025 · 75m
Red-Teaming AI: Inside Model Evaluation
Inside the process of evaluating AI model safety through red-teaming. How labs test for dangerous capabilities, what the evaluations reveal, and why the process is inherently incomplete.
Canon
• Red-teaming strips away the fine-tuned politeness of AI models to reveal their underlying capabilities: the true self beneath the safety training.
• AI safety evaluators can control the rigor of their testing methodology but not the completeness of their coverage. The Stoic approach focuses on process quality rather than guaranteeing safety.