How hyperscalers are spending $320 billion—and which data platforms will capture the value
The Great Capital Deployment
In 2025, the world’s largest technology companies are engaged in the most aggressive infrastructure build-out in history. The numbers are staggering:
- Meta: Up to $72 billion on AI infrastructure in 2025, with plans for $600 billion through 2028
- Microsoft: $80+ billion in fiscal 2025
- Google: $75+ billion
- Amazon: $75+ billion
Total hyperscaler spending: $320+ billion in 2025 alone.
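As a quick sanity check on the headline number, the sketch below sums only the figures listed above. Treating each quoted figure as a point estimate is an assumption made for illustration: three of them are stated as floors ("+") and Meta's is stated as an upper bound ("up to").

```python
# Capex figures quoted in the list above, in billions of USD.
# Microsoft, Google and Amazon are floors ("+"); Meta's is an upper bound ("up to").
capex_2025 = {
    "Meta": 72,
    "Microsoft": 80,
    "Google": 75,
    "Amazon": 75,
}

listed_total = sum(capex_2025.values())
headline_total = 320  # the article's "$320+ billion" figure

# Spending that must come from actuals landing above the quoted floors.
gap = headline_total - listed_total

print(f"Sum of listed figures: ${listed_total}B")
print(f"Gap to headline:       ${gap}B")
```

The listed figures sum to $302 billion, so the $320+ billion headline implies roughly $18 billion of spending above the quoted floors.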
This isn’t gradual investment; it’s a full-scale arms race. No hyperscaler can afford to fall behind, because AI infrastructure exhibits winner-takes-most dynamics. The company with the most compute can train better models, attract better talent, and compound its advantages.
But here’s the critical question: Where does this money actually go, and how do investors capture value?
The Hyperscaler Stack: Where Money Flows
Hyperscaler capital expenditure breaks down roughly as follows:
1. Compute Infrastructure (50-60%)
- AI accelerators (Nvidia GPUs, custom ASICs)
- Servers and supporting hardware
- Networking equipment
- Storage systems
2. Physical Infrastructure (20-30%)
- Data center construction
- Power and cooling systems
- Land acquisition
- Connectivity infrastructure
3. Software and Tools (10-15%)
- Development platforms
- Orchestration tools
- Monitoring and management systems
- Security infrastructure
4. Data Infrastructure (5-10%)
- Data warehouses and lakes
- ETL and pipeline tools
- Analytics platforms
- Database systems
That last category—data infrastructure—is where some of the most interesting investment opportunities exist. Here’s why.
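To put the percentage ranges above in dollar terms, here is a minimal sketch that applies them to the $320 billion headline figure. The ranges are the article's rough estimates, and applying them to the 2025 total is an assumption made for illustration.

```python
TOTAL_CAPEX_B = 320  # headline 2025 figure from the article, billions of USD

# (low, high) percent of capex per category, as quoted in the breakdown above.
SHARES_PCT = {
    "Compute infrastructure": (50, 60),
    "Physical infrastructure": (20, 30),
    "Software and tools": (10, 15),
    "Data infrastructure": (5, 10),
}

def dollar_range(total_b, low_pct, high_pct):
    """Convert a percentage range into a dollar range, in billions."""
    return total_b * low_pct / 100, total_b * high_pct / 100

ranges = {name: dollar_range(TOTAL_CAPEX_B, lo, hi) for name, (lo, hi) in SHARES_PCT.items()}

for name, (low_b, high_b) in ranges.items():
    print(f"{name:<24} ${low_b:.0f}B to ${high_b:.0f}B")

# The quoted ranges are rough: they do not have to sum to exactly 100%.
low_sum = sum(lo for lo, _ in SHARES_PCT.values())
high_sum = sum(hi for _, hi in SHARES_PCT.values())
print(f"Quoted shares sum to {low_sum}% at the low end and {high_sum}% at the high end")
```

On these numbers the data infrastructure slice is roughly $16 billion to $32 billion a year. The shares sum to 85% at the low end and 115% at the high end, so they are best read as approximate bands rather than a strict partition.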
The Data Infrastructure Layer: Why It Matters
AI models are only as good as the data they train on. Every AI application follows the same pattern:
- Collect massive amounts of data
- Store it somewhere accessible
- Transform it into a usable format
- Analyze it to extract insights
- Feed it to models for training/inference
Companies that control these data workflows capture enormous value because:
- Consumption economics: Revenue grows as customers process more data
- High switching costs: Migrating data infrastructure is expensive and risky
- Network effects: More data makes platforms more valuable
- Expansion opportunities: Start with storage, expand to analytics, then to AI
The Data Platform Landscape
Let’s examine the key players in data infrastructure and their AI positioning.
Snowflake: The Data Warehouse That Ate Everything
What they do: Snowflake started as a cloud data warehouse but has evolved into a full data platform. They handle storage, compute, and analytics across multiple clouds.
AI positioning:
- Snowflake Cortex: Built-in AI functions for analysis without moving data
- Document AI: Extract insights from unstructured data
- ML features: Train and deploy models directly in Snowflake
Why it matters: Every AI application needs clean, accessible data. Snowflake is where enterprises store and prepare that data. As AI usage grows, data consumption grows, and Snowflake’s revenue compounds.
Investment thesis: Snowflake has best-in-class consumption economics—the more AI applications customers run, the more compute and storage they consume. They’re benefiting from AI without needing to compete directly with OpenAI or Anthropic.
Real-world validation: Companies using Snowflake for AI workloads report 3-5x increases in data processing. That is a step change rather than steady incremental growth, and it can compound as AI applications multiply.
Risks:
- Margin compression as cloud providers build competing services
- High valuation means expectations are sky-high
- Consumption model creates revenue volatility
Portfolio position: Core holding in the data infrastructure category. Size appropriately but recognize the premium valuation.
MongoDB: The Operational Database for Modern Apps
What they do: MongoDB is a NoSQL database designed for developers building modern applications. Unlike traditional databases with rigid schemas, MongoDB handles unstructured and semi-structured data natively.
AI positioning:
- Vector search: Native support for AI embeddings and similarity search
- Atlas for AI: Integrated tools for building AI applications
- Retrieval-Augmented Generation (RAG): Critical for AI applications that need to access proprietary data
Why it matters: AI applications need operational databases that can handle real-time queries, unstructured data, and vector embeddings. MongoDB is positioned exactly there.
Investment thesis: As developers build AI-powered applications, they need databases that understand vector embeddings and semantic search. MongoDB is adding these capabilities natively, making it the natural choice for AI application development.
Consider: Every enterprise building a chatbot over their internal data needs vector search. Every recommendation engine needs similarity matching. These are MongoDB’s sweet spot.
Risks:
- Competition from PostgreSQL with vector extensions (open source)
- Cloud providers bundling vector databases into existing services
- Execution risk as they expand from database to full data platform
Portfolio position: Strong supplementary holding. Smaller position size than Snowflake but meaningful exposure to the operational database layer.
Palantir: The AI Integration Platform
What they do: Palantir is harder to categorize—they’re part consulting, part software platform, part systems integrator. They help large organizations integrate data, build AI applications, and deploy them at scale.
AI positioning:
- AIP (Artificial Intelligence Platform): Connects LLMs to enterprise data
- Ontology: A unified data layer that makes sense of complex enterprise data
- Bootcamps: Hands-on implementation to get AI running in days, not months
Why it matters: Most enterprises have data scattered across dozens of systems. Palantir specializes in integrating that chaos and making it accessible to AI models. They’re not competing on better models—they’re enabling organizations to actually use AI with their proprietary data.
Investment thesis: Palantir has found product-market fit with AIP. Their “bootcamp” model—getting AI applications running in days—is resonating with enterprises desperate to deploy AI but lacking the expertise. This is a high-margin, sticky business with strong net retention.
Recent data point: Palantir’s U.S. commercial revenue accelerated to 54% growth in Q3 2024, driven almost entirely by AI platform adoption.
Risks:
- Controversial company with government/defense ties
- Expensive (40x+ revenue multiple)
- Can hyperscalers build “AI integration” internally?
- Stock trades on hype as much as fundamentals
Portfolio position: Speculative growth holding. Only for investors comfortable with volatility and controversy. Size small—this can be a 10x or a 50% drawdown.
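The "size small" advice follows from simple arithmetic on asymmetric outcomes. The sketch below is illustrative only: the position weights are hypothetical, and it assumes the rest of the portfolio stays flat.

```python
def portfolio_impact(weight, position_return):
    """Change in total portfolio value from one position, everything else flat.

    weight: fraction of the portfolio in the position (0.05 = 5%)
    position_return: return on the position (9.0 = a 10x, -0.5 = a 50% drawdown)
    """
    return weight * position_return

for weight in (0.02, 0.05, 0.10):  # hypothetical position sizes
    upside = portfolio_impact(weight, 9.0)     # a 10x is a +900% return
    downside = portfolio_impact(weight, -0.5)  # a 50% drawdown
    print(f"{weight:.0%} position: 10x adds {upside:+.1%}, 50% drawdown costs {downside:+.1%}")
```

A 5% position adds about 45% to the portfolio in the 10x case and costs about 2.5% in the drawdown case, which is why a small weight can still carry meaningful upside.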
The Second Tier: Specialized Players
Beyond the big three, several companies are carving out niches:
Databricks (private): Direct Snowflake competitor with a focus on data science workflows. Strong in machine learning operations. If they IPO, watch closely.
Confluent: Real-time data streaming based on Apache Kafka. Critical for AI applications that need live data.
Elastic: Search and observability platform. AI applications generate massive amounts of logs and metrics—Elastic handles monitoring.
CrowdStrike: Not pure data infrastructure, but AI-powered security is essential as attack surfaces expand. Every AI application is a new vulnerability.
The Hyperscaler Question: Who Wins the Cloud Platform War?
For investors, the big question is which hyperscaler to own. Each has different AI strategies:
Microsoft / Azure
Strengths:
- OpenAI partnership gives them the leading LLM
- Enterprise distribution through Microsoft 365
- GitHub Copilot proving developers will pay for AI
Weaknesses:
- OpenAI could leave or compete directly
- Azure margins are lower than AWS
Investment thesis: Microsoft is the safest hyperscaler AI play. It has distribution and partnerships, and it is already monetizing AI through Microsoft 365 Copilot (~$30/user/month) across a very large installed base.
Amazon / AWS
Strengths:
- Largest cloud provider with deepest enterprise relationships
- Building proprietary models (Titan, Nova) and offering them alongside third-party models through Bedrock
- Owns the infrastructure layer (custom chips, data centers)
Weaknesses:
- Later to AI than Microsoft
- No breakthrough consumer AI product yet
Investment thesis: AWS’s dominance in cloud infrastructure means they’ll capture AI spending regardless of whose model wins. More of a “picks and shovels” approach.
Google / GCP
Strengths:
- Invented the Transformer architecture (the “T” in GPT)
- Strong technical foundation in AI research
- Gemini models are competitive with GPT-4
Weaknesses:
- Distant third in cloud market share
- History of product launches without monetization
- Search business under threat from AI
Investment thesis: Google has world-class AI but struggles to monetize it. Riskiest of the three hyperscalers, but potential for dramatic revaluation if they figure out AI products.
Meta
Unique position: Not selling cloud services, but open-sourcing Llama models and building massive AI infrastructure for internal use (recommendation engines, content moderation, etc.).
Investment thesis: Meta benefits from AI without competing in the platform wars. They’re one of the largest CapEx spenders (up to $72B in 2025) and a major driver of hardware demand. The stock is cheaper than Microsoft or Google on a P/E basis.
Portfolio allocation guidance: If forced to pick one hyperscaler, Microsoft has the clearest AI monetization. But consider that Meta is indirectly exposed—they’re buying the infrastructure without the cloud platform risks.
The Power Law in Cloud Platforms
An important pattern emerges: cloud platforms exhibit extreme winner-takes-most dynamics.
In earlier technology markets:
- Operating systems: Windows had 90%+ market share
- Search: Google has 90%+ market share
- Social networks: Facebook/Instagram dominate
- E-commerce: Amazon has 40%+ share and growing
In AI infrastructure, the same pattern is emerging:
- GPUs: Nvidia has 90%+ share of AI accelerators
- Foundries: TSMC has 90%+ share of advanced nodes
- Cloud: AWS+Azure+GCP have 65%+ combined share
This means:
- Leaders compound advantages: More customers → more data → better models → more customers
- Second and third place still win: But at much lower margins and growth rates
- Fourth place and below struggle: Lack of scale makes competing economically unviable
Investment implication: Favor the clear leaders (Snowflake in warehousing, MongoDB in operational databases, Palantir in enterprise AI). Second-tier players need much larger discounts to compensate for lower probability of winning.
How to Evaluate AI Software Companies
Not every company claiming “AI” deserves a premium valuation. Here’s a framework for evaluation:
1. Is AI core to the product or a feature?
Core: Palantir (AI integration is the product), CrowdStrike (AI-powered threat detection is the moat)
Feature: Most SaaS companies adding “AI chatbots” or “AI insights”
Investment implication: Pay premiums for AI-core companies. Be skeptical of AI features.
2. Does the company have proprietary data?
AI models are commoditizing rapidly. Proprietary data is the real moat.
Strong data moats: Bloomberg (financial data), Epic (healthcare records), Adobe (creative content)
Weak data moats: Companies using public data or customer data they can’t train on
3. Is there real customer traction?
Track these metrics:
- Net dollar retention: Are existing customers spending more? (Should be >120% for growth-stage AI companies)
- Consumption growth: Is AI driving more platform usage?
- Customer count: Are they signing new logos or just expanding existing?
Red flag: Companies with accelerating revenue but flat/declining customer counts—suggests price increases, not growth.
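The traction metrics above can be made concrete with a short sketch. The formula used for net dollar retention is a common convention (companies define it in slightly different ways), and all input numbers are hypothetical.

```python
def net_dollar_retention(start_rev, expansion, contraction, churn):
    """NDR for a fixed customer cohort over a period.

    start_rev: recurring revenue from the cohort at the start of the period
    expansion: added revenue from those same customers
    contraction: revenue lost to downgrades
    churn: revenue lost from customers who left
    """
    if start_rev <= 0:
        raise ValueError("start_rev must be positive")
    return (start_rev + expansion - contraction - churn) / start_rev

def revenue_per_customer_growth(rev_prev, rev_now, cust_prev, cust_now):
    """Growth in average revenue per customer between two periods."""
    return (rev_now / cust_now) / (rev_prev / cust_prev) - 1

# Hypothetical cohort: $100M a year ago, +$30M expansion, -$4M contraction, -$6M churn.
ndr = net_dollar_retention(100.0, 30.0, 4.0, 6.0)
print(f"NDR: {ndr:.0%}")

# Hypothetical red-flag case: revenue up 25% while the customer count is flat.
per_customer = revenue_per_customer_growth(200.0, 250.0, 1_000, 1_000)
print(f"Revenue per customer growth: {per_customer:.0%}")
```

In the second case all of the growth comes from existing customers paying more. Consumption data is what separates genuine usage growth from price increases, which is why the metrics are worth tracking together.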
4. What’s the competitive moat?
AI is lowering barriers to entry in many markets. The moat needs to be stronger than “we use better models.”
Strong moats:
- Network effects (Palantir’s ontology gets better with more users)
- Data gravity (Snowflake—moving data is expensive)
- Integration depth (CrowdStrike—deeply embedded in security operations)
Weak moats:
- “Better AI” (models commoditize quickly)
- First-mover advantage (easy to copy)
- Brand (AI market is too new for brand loyalty)
Portfolio Construction: Building the Data Stack
Based on this analysis, here’s how to think about cloud/data infrastructure allocation:
Core Holdings (60-70% of cloud allocation)
Snowflake: The data warehouse layer. Every AI application needs data stored somewhere. Consumption economics align with AI growth.
One hyperscaler: Microsoft (safest), Amazon (most diversified), or Meta (highest AI CapEx, cheapest valuation)
Rationale: These are the platforms where AI infrastructure spending flows. They benefit regardless of which AI applications win.
Growth Holdings (20-30%)
MongoDB: Operational database for AI applications. Vector search positioning is strategic.
Palantir: If you can stomach the volatility and valuation. They’re executing on enterprise AI integration.
CrowdStrike: AI security is essential but often overlooked. Every new AI application expands attack surface.
Speculative/Emerging (10-20%)
Databricks: If they IPO, direct Snowflake competitor with strong data science positioning.
Confluent: Real-time data streaming for AI applications needing live data.
Emerging AI platforms: Companies solving specific bottlenecks (vector databases, feature stores, LLM operations)
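The three bands above can be checked mechanically. This is a minimal sketch: the bucket names and the `check_allocation` helper are hypothetical, and weights are assumed to be whole percentages of the cloud/data allocation.

```python
# Target bands from the text, as (low, high) percent of the cloud/data allocation.
BANDS = {
    "core": (60, 70),
    "growth": (20, 30),
    "speculative": (10, 20),
}

def check_allocation(weights):
    """Return a list of problems with a proposed allocation (empty list means OK).

    weights: dict mapping bucket name to a whole-number percent.
    """
    problems = []
    total = sum(weights.values())
    if total != 100:
        problems.append(f"weights sum to {total}%, not 100%")
    for bucket, (low, high) in BANDS.items():
        w = weights.get(bucket, 0)
        if not low <= w <= high:
            problems.append(f"{bucket} at {w}% is outside {low}-{high}%")
    return problems

balanced = check_allocation({"core": 65, "growth": 25, "speculative": 10})
overweight = check_allocation({"core": 70, "growth": 30, "speculative": 10})
print(balanced)
print(overweight)
```

The second allocation sits inside every band but sums to 110%. The upper ends of the three bands cannot all be used at once, so going heavier in one bucket means going lighter in another.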
What to Avoid
Traditional databases: Oracle and IBM Db2 are losing relevance to cloud-native platforms
Legacy data warehouses: Teradata and Netezza are being replaced by Snowflake/Databricks
Companies claiming AI without traction: Check net dollar retention and consumption metrics
Too many small positions: The winner-takes-most dynamic means top 2-3 in each category will capture most value
Validation Checkpoints
Monitor these metrics quarterly:
- Hyperscaler CapEx trends: Is spending holding at $300B+? Any cuts signal demand concerns.
- Snowflake/MongoDB net retention: Should stay >120%. Declining retention suggests consumption slowing.
- Palantir commercial growth: U.S. commercial is the key metric. Government is stable but slower growth.
- AI application launches: Are major enterprises announcing AI deployments? This validates the infrastructure investment.
- Cloud pricing: Stable/rising prices = supply constrained (good). Price cuts = overcapacity (bad).
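Three of these checkpoints come with explicit thresholds, so they can be turned into a simple quarterly check. The sketch below is one possible encoding: the `QuarterlySnapshot` fields and the `warning_flags` helper are hypothetical names, and the Palantir and AI-launch checkpoints are left out because the text gives no numeric threshold for them.

```python
from dataclasses import dataclass

@dataclass
class QuarterlySnapshot:
    hyperscaler_capex_b: float     # annualized combined capex, billions of USD
    net_retention_pct: float       # Snowflake or MongoDB net retention, percent
    cloud_price_change_pct: float  # quarter-over-quarter change in cloud pricing, percent

def warning_flags(s):
    """Return warnings for any checkpoint that breaches its threshold."""
    flags = []
    if s.hyperscaler_capex_b < 300:
        flags.append("capex below $300B: possible demand concerns")
    if s.net_retention_pct <= 120:
        flags.append("net retention at or below 120%: consumption may be slowing")
    if s.cloud_price_change_pct < 0:
        flags.append("cloud price cuts: possible overcapacity")
    return flags

# Hypothetical quarters, for illustration only.
healthy = warning_flags(QuarterlySnapshot(320, 125, 0.0))
stressed = warning_flags(QuarterlySnapshot(280, 118, -3.0))
print(healthy)
print(stressed)
```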
The Second-Order Effects
The cloud platform build-out creates investment opportunities beyond the obvious players:
Energy Infrastructure
Data centers are massive power consumers. AI training clusters can draw 100+ megawatts—equivalent to a small city.
Investment angle:
- Utilities with data center exposure
- Natural gas (primary fuel for new power generation)
- Nuclear (Microsoft signed 20-year deal for Three Mile Island power)
- Grid infrastructure companies
Networking Equipment
Moving data between GPUs, between servers, and between data centers requires massive networking capacity.
Players:
- Arista Networks: Data center networking leader
- Broadcom: High-speed interconnects
- Marvell: Optical networking components
Cooling Systems
Modern AI accelerators generate enormous heat. Traditional air cooling isn’t sufficient.
Trend: Liquid cooling becoming standard for AI clusters. This creates opportunities in specialized cooling equipment manufacturers.
Risks to Monitor
No investment thesis is complete without acknowledging what could go wrong:
1. Model Efficiency Breakthroughs
What if models can be trained with 10x less compute? This would dramatically reduce infrastructure demand.
Likelihood: Medium. Efficiency improvements happen but tend to be offset by demand for more capable models.
Mitigation: Focus on platforms with consumption economics. Even if per-model costs drop, the number of models will multiply.
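The offsetting effect can be expressed as a one-line ratio. This is a deliberately simplified sketch: it assumes total demand is compute per model times the number of models, and the growth factors are hypothetical.

```python
def relative_demand(efficiency_gain, model_growth):
    """Total compute demand relative to today.

    efficiency_gain: factor by which compute per model falls (10 = 10x less compute)
    model_growth: factor by which the number of models grows (5 = 5x as many models)
    """
    return model_growth / efficiency_gain

for growth in (1, 5, 10, 20):  # hypothetical growth in model count
    demand = relative_demand(10, growth)
    print(f"{growth:>2}x models with 10x efficiency: {demand:.1f}x today's demand")
```

Under a 10x efficiency gain, total demand only holds level if the number of models also grows about 10x. The mitigation argument therefore rests on model count, or model size, growing at least as fast as efficiency improves.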
2. Hyperscaler Vertical Integration
What if AWS, Azure, and GCP build their own versions of Snowflake, MongoDB, and Palantir?
Likelihood: High—they’re already doing this.
Mitigation: Best products still win. AWS offers dozens of database options, yet MongoDB thrives. Snowflake competes directly with Google BigQuery and Amazon Redshift but grows faster.
3. Open Source Alternatives
What if open-source tools replace paid platforms?
Likelihood: Medium. Always a risk in infrastructure software.
Mitigation: Enterprise buyers pay for reliability, support, and managed services. Open source handles technology risk, not operational risk.
4. Regulatory Constraints
AI regulation could slow deployment and reduce infrastructure spending.
Likelihood: Low for infrastructure spending, high for specific applications.
Mitigation: Infrastructure companies are picks-and-shovels. Even heavily regulated industries need data platforms.
5. Economic Downturn
Recession could force enterprises to cut spending.
Likelihood: Cyclical risk always exists.
Mitigation: AI infrastructure is strategic, not discretionary. Even during the 2023 tech layoffs, hyperscaler CapEx stayed elevated.
Conclusion: Positioning for Platform Wars
The cloud platform wars will define the next decade of technology investing. Here’s what matters:
- Hyperscalers look set to spend $300B+ annually through 2030. This is the largest infrastructure build-out in history.
- Data infrastructure captures lasting value. While AI models commoditize, data platforms compound through consumption economics and switching costs.
- Winner-takes-most dynamics favor leaders. Snowflake, MongoDB, and Palantir are establishing dominant positions in their respective categories.
- Second-order effects create opportunities. Energy, networking, and cooling are under-appreciated beneficiaries.
- Valuation matters, but less than position. Yes, these companies trade at premium multiples. But in winner-takes-most markets, the winners earn their valuations over time.
The opportunity isn’t in predicting which specific AI application will win—it’s in owning the platforms where all AI applications must be built.
The platforms get built first. And platforms compound for decades.
This concludes the three-part AI Infrastructure series.
- The AI Infrastructure Supercycle – Why this build-out is different and where we are in the cycle
- The Semiconductor Value Chain – ASML’s monopoly, TSMC’s dominance, and the memory bottleneck
- The Cloud Platform Wars – Hyperscaler spending and data infrastructure winners
These articles provide a complete framework for understanding and investing in the AI infrastructure opportunity. The foundation is being built right now. The platforms are consolidating. The applications come last.
Position accordingly.
