How to Calculate AI Costs for Your Product: From GenAI Value Chain to Unit Economics

Most AI startups don’t train models — they buy intelligence as an API. This article explains what that means for your economics: how token pricing turns into COGS, how to model cost per request and per subscriber, and how to design free plans, limits and pricing that make sense. A practical guide to building sustainable GenAI apps with healthy margins.
When founders talk about AI, they usually talk about capabilities. When investors assess AI products, they look at economics. And when those two perspectives don’t meet, companies end up with magical demos and broken margins.

This article explains the economic architecture behind GenAI products — from the industry value chain to the cost of a single AI request — and shows how to use that understanding to design pricing, limits and tiers that actually work.
It builds on my previous piece about the GenAI value chain, but goes deeper on the part that most founders feel every day: AI COGS and unit economics in the application layer.

Below is the full structure — you can navigate by section:

  1. GenAI Value Chain and AI Costs
  2. The Application Layer — Where Most Startups Actually Live
  3. From Tokens to COGS: Modeling AI API Costs in an AI-Enabled App
  4. What These Numbers Change in Your Financial Model
  5. Ready-to-Use Financial Model Templates
  6. FAQ: AI COGS, Token Costs, and GenAI Unit Economics

GenAI Value Chain and AI Costs

Generative AI looks chaotic from the outside: new models, new GPUs, new copilots every month. But if you zoom out, it’s a very structured stack. Each layer has its own P&L logic: where money is invested, where it’s earned, and how long it takes to see real profit. What shows up as a cost line in one layer becomes revenue for the layer below it.

In my article explaining the Generative AI value chain I use a six-layer structure. Here we’ll look at the same stack, but through a financial lens and with one clear goal: to understand what eventually becomes your AI COGS in the Application layer.

GenAI Stack at a Glance

Insight for founders #1: Where you sit in the GenAI value chain
The GenAI stack runs from Compute & Hardware → Cloud & Capacity → Foundation Models → Model Access & MLOps → Applications → Services & Integrators. As a startup building an app or SaaS product, you are in the Application layer and you buy intelligence from the layers below, usually paying for APIs.
Starting from the bottom:

  • Compute & Hardware – chip makers and networking vendors (GPUs, TPUs, high-speed interconnects). They provide the physical capacity that makes training and inference possible.
  • Cloud & Capacity – hyperscalers and specialized GPU clouds renting that hardware as elastic GPU/TPU fleets. They turn capex-heavy hardware into on-demand infrastructure.
  • Foundation Models – companies that train general and domain models for text, code, images, video and more. They sit on top of cloud capacity and sell access to their models.
  • Model Access & MLOps – platforms that host, route, evaluate and monitor models: from a product’s perspective, this is the “switchboard” that connects your app to one or many models.
  • Applications – the products most end users see: chat assistants, copilots inside tools, vertical AI apps. This is where product design, UX and domain logic live.
  • Services & Integrators – consulting and implementation partners who roll out these solutions inside companies, adapt workflows and handle change management.

Running alongside all of this is Data Supply – the datasets, logs and domain data that feed models and apps. It doesn’t form a separate “floor” in the stack; instead, it seeps into several of them, from training to customer-specific fine-tunes.

By the time you stand in the Application layer as a founder, all of the lower layers have already compressed their economics into a few numbers you can actually see: which model you call, what you pay per token, and how quickly you can scale usage up or down.

How P&L Looks Across the GenAI Stack

Insight for founders #2: How other layers become your AI COGS
Infrastructure and foundation-model players carry huge capex and R&D costs and recover them through per-token pricing. By the time it reaches you, all of that becomes just one line in your model: AI API costs. Your job in the application layer is to turn those token costs, plus distribution, into healthy unit economics per user.
Financially, the stack splits into two worlds.

At the infrastructure and model layers (Compute & Hardware, Cloud & Capacity, Foundation Models), P&Ls are dominated by capex and R&D. Chip makers and clouds spend huge amounts upfront on fabs, data centers and networks. Foundation-model teams add massive training runs and research salaries on top. These businesses need scale and time to break even: years of investment before utilization and volume are high enough for profits to compound. When they finally monetize, they do it by turning those fixed costs into high-margin variable revenue – selling chips, GPU hours or tokens.

Model Access & MLOps sits in between: less capex, more product and platform development. Revenue comes from platform fees and enterprise contracts. Break-even can come faster than for pure infra or model players, but profits still depend on volume flowing through the platform and on how well they can price orchestration and tooling around someone else’s models.

Higher up, in the Applications and Services & Integrators layers, the economics feel much closer to classic SaaS and consulting:

  • Applications face familiar trade-offs: product & engineering, sales & marketing, support. The new ingredient is AI COGS – every API call to a model is a small variable cost driven by user behavior. Profitability depends on unit economics: can you price your product so that revenue per user comfortably covers AI COGS, other marginal costs and customer acquisition? Time to break even can be relatively short if you reach positive unit economics and find a repeatable distribution channel.

  • Services & Integrators are mostly people businesses. Their main costs are salaries, their main profit lever is utilization. They can reach nominal profitability quickly once a team is billable, but scale is bounded by how many experts they can hire and keep busy.

If you read this stack from bottom to top, you can see the cost cascade: hardware and cloud capex → model training and inference costs → API pricing and platform fees → your AI cost per request and per user in the Application layer.

The rest of this article lives exactly at that last step. We’ll treat the lower layers as given and focus on a practical question: how to turn those token prices and usage patterns into a clear, realistic AI cost block in your own financial model.

The Application Layer — Where Most Startups Actually Live

Insight for founders #3: Most “AI startups” are application companies
The vast majority of new “AI startups” sit in the Application layer. They don’t train their own foundation models – they assemble workflows and products on top of APIs from model providers and model-access platforms. Your real leverage is not in the model weights, but in how well you solve a specific problem for a specific user.
After looking at the full stack, it’s time to stand where most founders actually operate: the Application layer. This is where abstract “models and tokens” turn into concrete products, and where the API price per million tokens becomes a line in your P&L.

Typical Products and Business Models in the GenAI Apps Layer

If you map the current wave of AI products onto the Applications layer, a pattern appears quickly.

There are horizontal productivity tools with AI assistants: write this email, summarize this meeting, draft this document. There are vertical copilots that live in specific workflows: for lawyers, doctors, teachers, marketers, engineers. There are AI modes inside existing SaaS: the “magic wand” button in a CRM, helpdesk, HR system or analytics tool.

On the revenue side, nothing exotic happens. Application-layer companies borrow familiar models:

  • subscriptions – monthly or annual plans with AI features baked in;
  • usage-based pricing – paying for seats plus volume of AI actions;
  • freemium – AI is limited or branded on the free plan and unlocked on paid tiers;
  • enterprise licenses when the product becomes part of a critical internal workflow.

This is good news for financial modeling: the top of the P&L still looks like standard SaaS or apps. Revenue logic is familiar. The new complexity hides lower down, where the “AI button” quietly changes your variable costs.

How App-Layer AI Startups Add Value Using External Models

Insight for founders #4: Risks and advantages in the Application layer
If your product is just a thin wrapper over a popular API, you’re easy to copy by model providers and big SaaS. Real advantages in the Application layer come from non-public or hard-to-replicate data, owning a critical workflow, and trusted access to a specific audience – not from the model itself.
Once foundation models are available via API, it’s easy to underestimate the Application layer: “it’s just a wrapper over OpenAI or Anthropic.” In reality, the successful products in this layer do three things that models alone don’t do: they own data, they own workflow, and they own trust.

Take Cursor, an AI-native code editor. Cursor doesn’t pretend to have a secret foundation model; it openly routes work to multiple providers and models. The raw capability – understanding and generating code – comes from the underlying models.

The value Cursor adds is in how that capability is applied to a specific developer and a specific codebase: indexing repositories, understanding project structure, inserting suggestions exactly where you’re working, and orchestrating several models depending on the task. Over time, Cursor also accumulates context and interaction data that generic models don’t see.

Now look at Siemens Industrial Copilot. Under the hood it uses foundation models delivered via a cloud platform, but the product itself lives deep inside industrial workflows. It connects to engineering tools, automation projects and plant configurations. It sees PLC code, diagrams, machine logs – data that is not public on the internet and not trivial to access from the outside. Siemens brings domain constraints, safety requirements and decades of process knowledge to shape what the copilot is even allowed to suggest.

These two examples show the same pattern in different worlds:
  • the foundation model provides a general reasoning and generation engine,
  • the application provides specific data, specific context, and specific decisions.

Cost Structure of an AI-Enabled App or SaaS Product

On paper, the P&L of an AI-enabled application looks a lot like the P&L of a classic SaaS product: revenue at the top, cost of goods sold, gross margin, operating expenses. But one new line inside COGS changes how sensitive that structure becomes to user behavior.

In a non-AI SaaS, COGS usually includes:
  • cloud hosting and storage,
  • third-party services (e.g. email delivery, analytics, payment processing),
  • sometimes support or onboarding costs if you classify them there.

In an AI-enabled product, you add AI API usage:
  • each time a user runs an AI action, you send a request,
  • each request consumes input and output tokens,
  • tokens are billed at a price per million by the model or platform provider.

Everything else in the P&L stays familiar: you still have to pay engineers, market the product, support customers, and keep an eye on CAC and LTV. The goal of explicitly modeling AI COGS is not to complicate your life; it’s to give you a steering wheel.

Once you can see how many dollars of AI cost you incur per active user and per plan, you can design sensible free tiers, choose model classes, and set pricing with a clear view of gross margin.

From Tokens to COGS: Modeling AI API Costs in an AI-Enabled App

Insight for founders #5: Most of you are buying models, not building them
Training your own foundation model is a capex-heavy, R&D-heavy game. Most application-layer startups will never do it – and don’t need to. The realistic scenario is: you buy intelligence as an API and your main financial question is not “how much does training cost?”, but “what does it cost me when users actually click the AI button?”
In my financial model templates for startups I use a compact block inside COGS – the AI API Costs table shown in the screenshot below. It turns a few assumptions into two key numbers:
  • cost per request,
  • average cost per paying subscriber using AI.
AI API Costs block from the financial model template: % of users using AI, request caps, tokens per request, token pricing, and the derived cost per request and cost per AI-active subscriber.

Step 1 – Who Uses AI: Free vs Paying Users

First, we distinguish who can even generate AI costs:

  • % of Free MAU Using AI Features – among all monthly active free users, how many actually try AI?
  • % of Paying Subscribers Using AI Features – among all paying users, how many use AI in a typical month?

Reality is always uneven: maybe 10–20% of free MAU touch AI, while 60–80% of paying users do. These percentages translate total users into active AI users in each segment.
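
A minimal sketch of this step, with purely hypothetical user counts and adoption rates (placeholders, not benchmarks – replace them with your own data):

# Hypothetical inputs -- placeholders, not benchmarks.
free_mau = 50_000            # monthly active free users
paying_subscribers = 2_000   # paying users in the same month

pct_free_using_ai = 0.15     # 15% of free MAU touch AI features
pct_paid_using_ai = 0.70     # 70% of paying users use AI in a typical month

# Users who can actually generate AI costs in each segment.
ai_active_free_users = free_mau * pct_free_using_ai              # 7,500
ai_active_paying_users = paying_subscribers * pct_paid_using_ai  # 1,400
print(ai_active_free_users, ai_active_paying_users)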

Step 2 – How Often They Use It: Caps and Average Requests

Next, we describe how intensely those AI users behave:

  • Monthly Request Cap per Free User – a hard ceiling for the free plan. This is your decision about how much AI usage you’re willing to sponsor as marketing.
  • Avg Monthly Requests per Paying AI User – how many AI actions an average paying user who does use AI runs per month.

Together with adoption, this tells you how many AI calls you’re budgeting for:
  • Free: “up to cap” per AI-active free user,
  • Paid: “avg requests” per AI-active paying subscriber.
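
Continuing the same sketch (AI-active user counts restated inline so it runs on its own), the budgeted request volume per segment is just another pair of multiplications:

# From the previous sketch (hypothetical numbers).
ai_active_free_users = 7_500
ai_active_paying_users = 1_400

monthly_request_cap_per_free_user = 15    # your budget decision for the free plan
avg_requests_per_paying_ai_user = 100     # observed or assumed average

# Free users are budgeted at the cap (worst case); paid users at the average.
budgeted_free_requests = ai_active_free_users * monthly_request_cap_per_free_user   # 112,500
budgeted_paid_requests = ai_active_paying_users * avg_requests_per_paying_ai_user   # 140,000
print(budgeted_free_requests, budgeted_paid_requests)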

Step 3 – How “Heavy” Each Request Is: Tokens per Request

Then we move from “how many requests” to “how big each request is”.
Two rows define the weight of an average AI action in your product:

  • Avg Input Tokens per Request – prompt, instructions, and context,
  • Avg Output Tokens per Request – average reply length from the model.

Let's say a typical AI request ≈ 200 input + 300 output = 500 tokens.
If you start sending more history, more context, or generating longer answers, this number grows.

Step 4 – Supplier Pricing: Price per 1M Input and Output Tokens

Now we bring in the part you don’t control: what your provider charges.
Two more lines in the table come directly from the pricing page:

  • Price per 1M Input Tokens
  • Price per 1M Output Tokens

All the capex, infra and R&D from the lower layers of the stack are hidden in these two numbers. For your model, they’re just multipliers that turn tokens into dollars.

Step 5 – What It Means for Unit Economics

With these assumptions, the last rows of the block calculate:

  • Cost per Request,
  • Avg Cost per Paying Subscriber Using AI.

Using the example from the screenshot:
  • 200 input tokens at $2 per 1M → $0.0004 per request,
  • 300 output tokens at $10 per 1M → $0.0030 per request.
So, one average AI action costs you about $0.0034 – a bit more than a third of a cent.
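
The same arithmetic as a tiny helper function, using the token counts and prices from the example above:

def cost_per_request(input_tokens: int, output_tokens: int,
                     price_per_1m_input: float, price_per_1m_output: float) -> float:
    """Dollar cost of one average AI request."""
    return (input_tokens * price_per_1m_input
            + output_tokens * price_per_1m_output) / 1_000_000

# 200 input tokens at $2/1M + 300 output tokens at $10/1M
print(cost_per_request(200, 300, 2.0, 10.0))   # 0.0034 -> about a third of a cent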

Then we multiply by behavior for paying users:

  • Avg Monthly Requests per Paying AI User = 100,
  • Cost per Request = $0.0034.
A typical paying subscriber who actively uses AI costs you about 34 cents per month in AI costs.

Now connect this to pricing.

Suppose:
  • your subscription price is $20/month,
  • your non-AI COGS (hosting, payment fees, other APIs) is $2/user,
  • you target 80% gross margin, which means you can “afford” up to $4 of total COGS per user.

In this base scenario:
  • non-AI COGS: $2.00
  • AI COGS: $0.34
  • Total COGS per active AI user: $2.34

Your gross margin is still very healthy: roughly 88% on that plan. AI isn’t a problem; it’s well inside your margin envelope.

But now change the assumptions:
  • prompts get longer and you send more context → 600 input + 1,000 output tokens per request,
  • you make the assistant more proactive and users send 200 requests per month instead of 100.

If we plug these into the same formula, AI cost per paying AI user grows to roughly $2.24 per month, and total COGS per user becomes $4.24.
At a $20 price point, your gross margin drops towards 79% before you’ve changed anything in salaries or marketing – just because product decisions silently pushed up tokens and requests.
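
Here is the whole chain as a short sketch comparing the base scenario with the heavier one; prices, token counts and request volumes are the illustrative figures from above, not recommendations:

def monthly_ai_cogs(requests: int, input_tokens: int, output_tokens: int,
                    price_in: float = 2.0, price_out: float = 10.0) -> float:
    """AI cost per AI-active paying user per month, in dollars."""
    per_request = (input_tokens * price_in + output_tokens * price_out) / 1_000_000
    return requests * per_request

price = 20.0        # subscription price per month
non_ai_cogs = 2.0   # hosting, payment fees, other APIs

scenarios = [
    ("base",  100, 200, 300),     # 100 requests, 200 in / 300 out tokens
    ("heavy", 200, 600, 1_000),   # more context, longer answers, chattier assistant
]
for name, requests, tok_in, tok_out in scenarios:
    ai = monthly_ai_cogs(requests, tok_in, tok_out)
    gross_margin = (price - non_ai_cogs - ai) / price
    print(f"{name}: AI COGS ${ai:.2f}/user, gross margin {gross_margin:.0%}")
# base:  AI COGS $0.34/user, gross margin 88%
# heavy: AI COGS $2.24/user, gross margin 79%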

AI feels cheap at the request level, but it adds up in your unit economics. A few extra tokens, a more generous free plan, or a more “chatty” assistant can move your margin by several percentage points.

What These Numbers Change in Your Financial Model

Once you know your cost per request and average AI COGS per paying user, you’re no longer guessing. You can see, in money, what “more AI” means for your margin. The next question is practical: how do you turn this into a free plan, tiers and limits that make sense for both users and your P&L?

Designing Free Plans and AI Limits with Real Numbers

A healthy free plan with AI usually does three things at once:

  • lets users feel the product “for real”,
  • makes your AI bill predictable,
  • nudges serious usage into paid tiers.

You see this pattern in tools like Canva, where free users get a small monthly allowance of AI-powered features, while Pro/Teams users get a much higher limit that resets each month. GitHub Copilot’s free tier works similarly: limited monthly chat requests and completions, enough to taste the product, but not enough to use it as your coding assistant.

If you know your own cost per request, you can design the same way instead of copying numbers blindly.

  • decide what you’re willing to spend on AI per free user per month,
  • translate that into a request cap using your cost-per-request figure.

If one request costs you $0.003–0.004, a cap of 10–20 requests per month keeps AI spend per free user to a few cents, even if adoption among free users is high. That’s exactly the logic behind the “Monthly request cap per free user” row in your model: it’s not an arbitrary friction point, it’s a budget.
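
A minimal sketch of that budgeting logic, assuming a hypothetical $0.05 monthly AI budget per free user:

ai_budget_per_free_user = 0.05   # $/month you are willing to treat as acquisition spend
cost_per_request = 0.0034        # from the calculation above

# The free-plan cap is just the budget divided by the cost of one request.
monthly_request_cap_per_free_user = int(ai_budget_per_free_user / cost_per_request)
print(monthly_request_cap_per_free_user)   # 14 requests per free user per month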

With a cost-per-request number in the model, you can tune that balance: tighten caps if you see free-only AI usage exploding, or relax them if conversion is weak and AI COGS are still tiny.

Pricing and Tiering When AI Is the Main Variable Cost

Insight for founders #6: Tiers should reflect AI cost
If AI is your main variable cost, plan tiers from the margin backwards. For each plan, estimate typical AI usage, compute AI COGS per active user, and only then decide what “unlimited”, “Pro” or “Business” really mean.
On paid plans, the question is no longer “how much AI can we give for free?”, but “how do we package AI so that heavy usage sits in the right place and doesn’t kill our margin?” In practice, most products end up with some mix of three simple ideas.

AI is limited on cheaper plans and fully unlocked higher up:
Basic tiers get a small allowance or only some AI features. Full, frequent use lives in Pro / Business plans where ARPU is higher.

AI-rich features are clearly tied to more expensive plans or add-ons:
Instead of “AI everywhere for everyone”, you group the heaviest or most valuable AI use cases into premium tiers or an AI add-on. The add-on price should comfortably cover expected AI COGS per active user plus margin.

Very expensive operations are explicitly metered:
Things like bulk processing, long-context analysis, image/video generation or agents can come with separate “credits” or soft limits, even on high tiers.

The mechanics behind all of this are straightforward:

  1. For each plan, estimate how an average AI-active user behaves (requests, tokens).
  2. Calculate AI COGS per AI-active user on that plan.
  3. Check: does the plan price still leave enough room for other COGS and your target gross margin?
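
A rough sketch of that three-step check; the plan names, prices and usage assumptions below are invented for illustration:

# Hypothetical plans: price, non-AI COGS, and expected AI usage per AI-active user.
plans = {
    "Basic":    dict(price=10.0, non_ai_cogs=1.5, requests=30,  tok_in=200, tok_out=300),
    "Pro":      dict(price=20.0, non_ai_cogs=2.0, requests=100, tok_in=200, tok_out=300),
    "Business": dict(price=49.0, non_ai_cogs=3.0, requests=300, tok_in=400, tok_out=600),
}
PRICE_IN, PRICE_OUT = 2.0, 10.0   # $ per 1M input / output tokens
TARGET_GROSS_MARGIN = 0.80

for name, p in plans.items():
    per_request = (p["tok_in"] * PRICE_IN + p["tok_out"] * PRICE_OUT) / 1_000_000
    ai_cogs = p["requests"] * per_request
    margin = (p["price"] - p["non_ai_cogs"] - ai_cogs) / p["price"]
    status = "OK" if margin >= TARGET_GROSS_MARGIN else "revisit price or limits"
    print(f"{name}: AI COGS ${ai_cogs:.2f}/user, gross margin {margin:.0%} -> {status}")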

Investor Story for a GenAI App

Investors need to see that you understand how AI usage and pricing interact in your product. You don’t have to show them the whole calculation block. It’s enough to be able to say things like:

“On our main plan, a typical AI-active user costs us about X per month in tokens, which is Y% of the plan price. We design tiers so that AI stays under Z% of revenue across scenarios.”

“Our free plan includes up to N AI actions per month per user. Even at 100% adoption, that’s a maximum of $… in AI COGS at current prices, which we treat as acquisition spend.”

“If we move to a more expensive model or see a 2x increase in usage, gross margin compresses by 5–7 p.p.; here’s how we’d respond (pricing, limits, model mix).”

That combination – a clear story on value in the application layer plus a sober view of AI as a variable cost line you can steer – is what makes a GenAI product look less like a toy and more like a business.

Ready-to-Use Financial Model Templates

If you want to use this logic in practice, you don’t need to start from scratch.
My financial model templates for startups already include:

  • a clean AI API Cost block aligned with the structure explained above,
  • dynamic COGS, margin and pricing calculations,
  • and ready-made assumptions you can adapt for your product.

They’re designed for founders who want clear numbers without spending weeks assembling a model.

FAQ: AI COGS, Token Costs, and GenAI Unit Economics

How do I calculate AI COGS for my product?
You only need a few assumptions:
  1. how many users actually use AI,
  2. how many requests they make,
  3. how many tokens each request consumes,
  4. your provider’s input/output token prices.
Multiply tokens × price and you get cost per request and cost per AI-active user. These are the core of your AI COGS.