Financially, the stack splits into two worlds.
At the infrastructure and model layers (Compute & Hardware, Cloud & Capacity, Foundation Models), P&Ls are dominated by capex and R&D. Chip makers and clouds spend huge amounts upfront on fabs, data centers and networks. Foundation-model teams add massive training runs and research salaries on top. These businesses need scale and time to break even: years of investment before utilization and volume are high enough for profits to compound. When they finally monetize, they do it by turning those fixed costs into high-margin variable revenue – selling chips, GPU hours or tokens.
Model Access & MLOps sits in between: less capex, more product and platform development. Revenue comes from platform fees and enterprise contracts. Breakeven can come sooner than for pure infra or model businesses, but profits still depend on the volume flowing through the platform and on how well these providers can price orchestration and tooling built around someone else’s models.
Higher up, in the Applications and Services & Integrators layers, the economics feel much closer to classic SaaS and consulting:
- Applications face the familiar trade-offs between spending on product & engineering, sales & marketing, and support. The new ingredient is AI COGS – every API call to a model is a small variable cost driven by user behavior. Profitability depends on unit economics: can you price your product so that revenue per user comfortably covers AI COGS, other marginal costs and customer acquisition? Time to break even can be relatively short if you reach positive unit economics and find a repeatable distribution channel.
- Services & Integrators are mostly people businesses. Their main costs are salaries, their main profit lever is utilization. They can reach nominal profitability quickly once a team is billable, but scale is bounded by how many experts they can hire and keep busy.
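The unit-economics question for the Applications layer can be made concrete with a small sketch. Everything here is an assumption for illustration: the `UnitEconomics` class, its field names and the sample figures are hypothetical, and the model is deliberately simple (monthly per-user figures, no churn, no discounting).

```python
from dataclasses import dataclass


@dataclass
class UnitEconomics:
    """Monthly per-user figures. All values are hypothetical inputs."""
    revenue_per_user: float     # e.g. subscription price or ARPU
    ai_cogs_per_user: float     # model API spend attributable to one user
    other_marginal_cost: float  # hosting, support, payment fees, etc.
    cac: float                  # one-time customer acquisition cost

    def contribution_margin(self) -> float:
        """Revenue left per user per month after variable costs."""
        return (self.revenue_per_user
                - self.ai_cogs_per_user
                - self.other_marginal_cost)

    def cac_payback_months(self) -> float:
        """Months of margin needed to recover CAC; infinite if margin <= 0."""
        margin = self.contribution_margin()
        if margin <= 0:
            return float("inf")
        return self.cac / margin


# Illustrative numbers only, not benchmarks.
plan = UnitEconomics(revenue_per_user=30.0, ai_cogs_per_user=6.0,
                     other_marginal_cost=4.0, cac=100.0)
print(plan.contribution_margin())  # 20.0
print(plan.cac_payback_months())   # 5.0
```

If the contribution margin is zero or negative, no amount of volume recovers the acquisition cost, which is why AI COGS per user has to be estimated before pricing is fixed.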
If you read this stack from bottom to top, you can see the cost cascade: hardware and cloud capex → model training and inference costs → API pricing and platform fees → your AI cost per request and per user in the Application layer.
The rest of this article lives exactly at that last step. We’ll treat the lower layers as given and focus on a practical question: how to turn those token prices and usage patterns into a clear, realistic AI cost block in your own financial model.
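As a first, rough illustration of that last step, the sketch below turns per-token prices and a usage pattern into a cost per request and per user. The function names, token counts and prices are hypothetical placeholders rather than any provider's actual rates, and the model assumes separate input and output prices quoted per million tokens.

```python
def cost_per_request(input_tokens: int, output_tokens: int,
                     price_in_per_million: float,
                     price_out_per_million: float) -> float:
    """AI cost of one request, given prices per one million tokens."""
    total = (input_tokens * price_in_per_million
             + output_tokens * price_out_per_million)
    return total / 1_000_000


def monthly_cost_per_user(requests_per_month: int,
                          request_cost: float) -> float:
    """AI cost per user per month, assuming every request costs the same."""
    return requests_per_month * request_cost


# Hypothetical usage pattern and prices, for illustration only.
request_cost = cost_per_request(input_tokens=2_000, output_tokens=500,
                                price_in_per_million=1.0,
                                price_out_per_million=4.0)
user_cost = monthly_cost_per_user(requests_per_month=200,
                                  request_cost=request_cost)

print(f"{request_cost:.4f}")  # 0.0040
print(f"{user_cost:.2f}")     # 0.80
```

Real usage is rarely this uniform: request sizes vary by feature and user segment, so a financial model would typically replace the single average request with a distribution or a few usage tiers.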