Cognitive Revolution · April 20, 2025 · 65m

Synthetic Data: Training AI on AI Output

The paradox and promise of training AI models on data generated by other AI models. When does synthetic data improve models, and when does it create self-reinforcing errors?

Canon

Training on synthetic data creates an artificial environment for model development. When the synthetic data is accurate and diverse, model behavior improves. When it contains mistakes, the model learns those mistakes, and if its own outputs feed later rounds of training, it amplifies them.
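Models trained on their own outputs gradually lose diversity and converge on a narrower range of responses. This narrowing, often called model collapse, arises because each generation is fit to a finite sample of the previous one. Rare outputs are under-sampled, drop out of the training data, and tend not to return, so the tails of the original distribution erode while the most common patterns dominate.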
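A toy simulation makes the diversity-loss mechanism concrete. This is a minimal sketch, assuming a categorical distribution stands in for the model and that "training" means refitting category frequencies to a finite sample drawn from the previous generation. The function `simulate_collapse` and its parameters are hypothetical illustrations, not anything described in the episode, and a real language model is far more complex than this.

```python
import random
from collections import Counter


def simulate_collapse(probs, n_samples, generations, seed=0):
    """Hypothetical toy model: repeatedly refit a categorical 'model' to its own samples.

    Returns the number of categories with nonzero probability after each generation
    (index 0 is the starting distribution).
    """
    rng = random.Random(seed)
    categories = list(range(len(probs)))
    current = list(probs)
    support_sizes = [sum(p > 0 for p in current)]
    for _ in range(generations):
        # Draw a finite "synthetic dataset" from the current model.
        data = rng.choices(categories, weights=current, k=n_samples)
        counts = Counter(data)
        # Maximum-likelihood refit: the next model is just the empirical frequencies.
        # A category that was never sampled gets probability 0 and can never come back.
        current = [counts[c] / n_samples for c in categories]
        support_sizes.append(sum(p > 0 for p in current))
    return support_sizes


# Zipf-like starting distribution: a few common responses and a long tail of rare ones.
K = 100
weights = [1 / (rank + 1) for rank in range(K)]
total = sum(weights)
true_probs = [w / total for w in weights]

sizes = simulate_collapse(true_probs, n_samples=200, generations=30)
print("categories at start:", sizes[0])
print("categories after 30 generations:", sizes[-1])
```

Because a category whose estimated probability reaches zero is never sampled again, the number of surviving categories can only shrink from one generation to the next. Raising `n_samples` slows the loss of rare categories but, in this toy setup, does not stop it.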