Cognitive Revolution · April 20, 2025 · 65m
Synthetic Data: Training AI on AI Output
The paradox and promise of training AI models on data generated by other AI models. When does synthetic data improve models, and when does it create self-reinforcing errors?
Canon
• Training on synthetic data creates an artificial environment for model development. When the synthetic environment is high quality, behavior improves; when it is low quality, models learn and amplify errors.
• Models trained on their own outputs gradually lose diversity, converging on a narrower range of responses. In effect, the model adapts to its own output distribution and settles into the patterns it already produces most often.
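The loss of diversity described in the last point can be illustrated with a toy simulation. This is a minimal sketch, not anything presented in the episode: it assumes the "model" is just a Gaussian that is refit, generation after generation, to a small sample drawn from its previous version. The function name `simulate_self_training` and its parameters are hypothetical, chosen only for this illustration.

```python
import random
import statistics


def simulate_self_training(generations=100, sample_size=10, seed=0):
    """Repeatedly refit a Gaussian to samples drawn from the previous fit.

    Returns the fitted standard deviation at every generation, starting
    with the original distribution at index 0.
    """
    rng = random.Random(seed)
    mean, std = 0.0, 1.0  # generation 0: stands in for the "real" data
    spread_history = [std]
    for _ in range(generations):
        # The current model generates a small synthetic dataset.
        samples = [rng.gauss(mean, std) for _ in range(sample_size)]
        # The next model is fit only to that synthetic dataset
        # (maximum-likelihood estimates of mean and spread).
        mean = statistics.fmean(samples)
        std = statistics.pstdev(samples)
        spread_history.append(std)
    return spread_history


if __name__ == "__main__":
    history = simulate_self_training()
    for generation in (0, 10, 50, 100):
        print(f"generation {generation:3d}: std = {history[generation]:.6f}")
```

Under these assumptions the fitted spread tends to shrink toward zero, because each generation sees only a finite sample of the previous one and the estimation error compounds. A real language model is far more complex than a single Gaussian, so this should be read as an intuition for the narrowing effect rather than a prediction of how any particular model behaves.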