Upcoming Episode: Why LLM Progress is Getting Harder
Date: 30 Sep 2025
Time: 11:00 AM PDT
Join us for a conversation with Drew Breunig about the data products that fueled AI’s rapid advancement and why they’ve reached their limits.
We’ll be jumping off from his blog post, “Why LLM Advancements Have Slowed: The Low-Hanging Fruit Has Been Eaten.”
The Three Data Products That Changed Everything
- MNIST: The handwritten digits dataset that launched neural networks into commercial viability
- ImageNet: The multi-million-image dataset built with crowdsourced labor that triggered the deep learning revolution
- Common Crawl: The internet-scale dataset that became the foundation for large language models
Why These Data Products Had Such Impact
We’ll discuss what made these datasets so transformative, from MNIST’s careful curation and distribution via CD-ROM, to ImageNet’s massive scale enabled by Amazon Mechanical Turk, to Common Crawl’s unprecedented scope of 250 billion web pages.
What Happens When the Well Runs Dry
Drew argues we’ve consumed decades of accumulated internet content and graphics innovation in just a few years. Now that these foundational data products have been exhausted, progress will be “slow, incremental, and hard-fought.”
We’ll explore what this means for the future of AI development and where the next breakthroughs might come from.