This is Hamamoto from TIMEWELL.
Source: https://gisford.medium.com/five-takeaways-from-the-april-2023-ai-economy-2381255d7a96
Large language models (LLMs) play a central role in the development of AI applications, and demand is expected to grow further. But LLM development comes with significant challenges—proprietary data access and GPU availability among them. Proprietary data is a critical input for LLM development, and its importance will only increase. Companies are also consistently underestimating how much GPU capacity they actually need.
Evaluating LLM accuracy is another challenge, though current approaches remain largely qualitative. More data-driven evaluation methods are needed.
Within the LLM ecosystem, there are real tensions: open-source vs. closed models, self-hosting vs. cloud hosting by major players. That said, organizations like Hugging Face—part of the Lux portfolio—are actively supporting the open-source ecosystem, and Anthropic is particularly interesting given its potential to collaborate with cloud and ecosystem partners outside the OpenAI/Microsoft orbit.
The LLM ecosystem requires overcoming both these challenges and these tensions. Companies need to find the right combination of open/closed and self-hosted/cloud-hosted approaches based on their actual production use cases.
Key Insight 1: GPU Access Is More Constrained Than Companies Realize
Running large-scale language models today requires expensive compute contracts—typically A100 GPUs provided by major cloud providers (AWS, GCP, Azure) or access to supercomputer clusters.
Most companies significantly underestimate the number of GPUs needed to run AI applications at scale. As AI becomes more common across all applications and multiple use cases appear within each workflow, GPU demand will only increase.
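To make the underestimation concrete, here is a back-of-envelope sizing sketch. The throughput, traffic, and utilization figures are illustrative assumptions, not measured benchmarks:

```python
import math

# Rough GPU sizing for serving a token-generation workload.
# Every number passed in below is an assumption for illustration only.

def gpus_needed(requests_per_sec: float,
                tokens_per_request: float,
                tokens_per_sec_per_gpu: float,
                peak_factor: float = 2.0,
                utilization: float = 0.6) -> int:
    """Estimate how many GPUs a steady generation workload requires."""
    steady_tokens = requests_per_sec * tokens_per_request
    peak_tokens = steady_tokens * peak_factor              # headroom for traffic spikes
    effective_rate = tokens_per_sec_per_gpu * utilization  # real-world efficiency loss
    return math.ceil(peak_tokens / effective_rate)

# e.g. 50 req/s, ~500 generated tokens each, ~1,500 tok/s per GPU (assumed)
print(gpus_needed(50, 500, 1500))  # → 56
```

Even with generous assumptions, a few dozen requests per second of long-form generation lands in double-digit GPU counts, which is exactly the kind of result teams tend not to budget for.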
Today, large GPU access is concentrated among a small number of providers. Some players—like the OpenAI API—offer compute at reduced cost in exchange for consumer product pricing. And while Microsoft and Google are dominant, Meta and Oracle shouldn't be underestimated.
Looking for AI training and consulting?
Learn about WARP training programs and consulting services in our materials.
Key Insight 2: Clean Training Data Is Running Out
There is currently a shortage of "good, clean" data on the web for training new large language models. Unstructured data is being stretched to its limits. Data is a critical blocker in LLM development.
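What counts as "good, clean" text is usually enforced with heuristic filters. A minimal sketch, with illustrative thresholds loosely in the spirit of published web-corpus cleaning rules (the exact cutoffs here are assumptions):

```python
# Heuristic text-quality filter for web-scraped training data.
# Thresholds are illustrative, not taken from any specific pipeline.

def is_clean(line: str, min_words: int = 5) -> bool:
    words = line.split()
    if len(words) < min_words:                            # drop fragments and nav text
        return False
    if not line.rstrip().endswith((".", "!", "?", '"')):  # keep complete sentences
        return False
    if sum(c.isalpha() for c in line) / len(line) < 0.7:  # drop markup/code debris
        return False
    return True

docs = [
    "Click here to subscribe",
    "Large language models need vast amounts of well-formed text to train on.",
    "{{nav|footer}} © 2023",
]
clean = [d for d in docs if is_clean(d)]
print(clean)  # only the full sentence survives
```

Rules this crude already discard most of a raw web crawl, which is one intuition for why the supply of clean unstructured text is finite.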
Data incentives will increasingly push organizations to contribute their own data. Proprietary datasets for specialized domains will become important moats for AI applications—especially when combined with targeted data products and workflow flywheels for specific use cases.
RunwayML, part of the Lux portfolio and focused on generative AI for the creative economy and video production, is a good example. Note that over time, more advanced LLMs (like GPT-5 or GPT-6) may eventually outperform even models trained on specialized vertical datasets.
Key Insight 3: Infrastructure Is Commoditizing
Most major cloud providers are already offering, or planning to offer, the full AI infrastructure stack for free: inference, model deployment, and experimentation (e.g., AWS SageMaker, AzureML, Google Cloud Vertex AI).
Many infrastructure models are low-margin, making it difficult for new entrants to win price-sensitive customers away from major players. New infrastructure players need to demonstrate ROI beyond cost savings—such as self-hosted or distributed servers, vertical-specific infrastructure, or dramatically better user/developer experience. Together.xyz, which combines cryptographic principles with AI, is an interesting example.
Key Insight 4: Evaluating LLM Accuracy Remains Hard
Determining whether a large language model responded accurately or produced a satisfactory answer is still difficult. Most available options today are qualitative—essentially "it looked right." Measuring whether a model hallucinated or gave a completely wrong answer is still challenging (the TruthfulQA paper offers interesting examples here).
Developing more quantitative, data-driven ways to evaluate LLMs for accuracy and Q&A performance is critically important.
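One step in that direction is scoring model outputs against reference answers with simple, reproducible metrics such as exact match and token-level F1. The QA pairs and model outputs below are hypothetical placeholders:

```python
# Minimal quantitative QA evaluation: exact match and token-level F1.
# The examples are hypothetical; real evaluation uses curated benchmark sets.

def normalize(text: str) -> list[str]:
    return text.lower().strip().rstrip(".").split()

def exact_match(pred: str, gold: str) -> bool:
    return normalize(pred) == normalize(gold)

def token_f1(pred: str, gold: str) -> float:
    p, g = normalize(pred), normalize(gold)
    common = sum(min(p.count(t), g.count(t)) for t in set(p))
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)

examples = [  # (question, reference answer, model output)
    ("What is the capital of France?", "Paris", "Paris."),
    ("Who wrote Hamlet?", "William Shakespeare", "Shakespeare wrote it"),
]
em = sum(exact_match(model, gold) for _, gold, model in examples) / len(examples)
f1 = sum(token_f1(model, gold) for _, gold, model in examples) / len(examples)
print(f"EM={em:.2f}  F1={f1:.2f}")  # → EM=0.50  F1=0.70
```

String-overlap metrics like these are blunt instruments for free-form generation, but they are deterministic and comparable across model versions, which "it looked right" is not.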
Key Insight 5: Open vs. Closed Tension Will Persist
There are real tensions between open-source and closed models (e.g., OpenAI vs. Meta's LLaMA), and between self-hosted vs. cloud-hosted AI infrastructure (e.g., MosaicML vs. AzureML).
Will companies be able to maintain sovereignty over their AI applications—whether running on major providers' hosted platforms or on their own private cloud?
Over time, will the LLM ecosystem come to resemble the semiconductor industry, the database industry, or something entirely different? My intuition is that companies will end up with a combination of open/closed models and self/cloud-hosted infrastructure based on their production use cases.
Hugging Face, part of the Lux portfolio, supports the open-source ecosystem and recently hosted WoodstockAI—the largest open-source meetup ever, with over 5,000 attendees. Anthropic is particularly interesting given its positioning outside the OpenAI/Microsoft ecosystem.
TIMEWELL AI Consulting
TIMEWELL supports business transformation in the AI agent era.
Our Services
- ZEROCK: High-security AI agent running on domestic servers
- TIMEWELL Base: AI-native event management platform
- WARP: AI talent development program
