Elon Musk agrees with other AI experts that there is too little real-world data left to train AI models.
“We have now essentially exhausted the growing body of human knowledge…. in AI training,” Musk said during a livestreamed conversation with Stagwell Chairman Mark Penn on X late Wednesday. “It basically happened last year.”
Musk, who owns AI company xAI, echoed themes that former OpenAI chief scientist Ilya Sutskever touched on during a talk at NeurIPS, the machine learning conference, in December. Sutskever, who said the AI industry has reached what he called “peak data,” predicted that a lack of training data would force a shift away from the way models are trained today.
Indeed, Musk suggested that synthetic data, meaning data generated by AI models themselves, is the way forward. “With synthetic data… [AI] will sort of grade itself and go through this process of self-learning with synthetic data,” he said.
Other companies, including tech giants like Microsoft, Meta, OpenAI and Anthropic, are already using synthetic data to train flagship AI models. Gartner estimates that 60% of the data used for AI and analytics projects in 2024 was synthetically generated.
Microsoft's Phi-4, which was open-sourced early Wednesday, was trained on synthetic data alongside real-world data. So were Google's Gemma models. Anthropic used some synthetic data to develop one of its most performant systems, Claude 3.5 Sonnet. And Meta fine-tuned its most recent Llama series of models using AI-generated data.
Training on synthetic data has other advantages, such as cost savings. AI startup Writer claims its Palmyra X004 model, built almost entirely using synthetic sources, cost just $700,000 to develop, compared with an estimated $4.6 million for a comparably sized OpenAI model.
But there are disadvantages as well. Some research suggests that synthetic data can lead to model collapse, where a model becomes less “creative” and more biased in its outputs, eventually seriously compromising its performance.