When does on-demand usage make sense for a SaaS?

When each unit is large and visible — a coding-agent session, an AI rendering, a transcription job — and the customer can see the unit cost before deciding to consume it. It fails when units are invisible (per-token, per-autocomplete) or when the customer cannot moderate consumption.

Why did Cursor's pricing change cause backlash?

Pro users assumed 'unlimited within the plan' until the cutover to on-demand overage hit. The pricing page disclosed it; the product UI did not surface the quota or the per-unit cost in real time. The model was reasonable; the communication failed.

Is on-demand the same as overage?

Mathematically yes. Linguistically, 'on-demand' implies the customer's choice while 'overage' implies a billing penalty. The model that wins is overage clearly communicated; the framing matters less than the dashboard.

Should I default to on-demand or flat monthly?

Default to flat monthly with soft-cap-and-overage. It covers 90% of customers without surprise. Add a pure on-demand 'pay as you go' tier only when a meaningful number of heavy customers request it.

On-demand usage vs flat monthly: the Cursor model and when it fits

Q: What is 'on-demand usage' billing?

An overage model where the plan includes a quota, and consumption beyond the quota is billed per unit at a published rate. The 'on-demand' framing emphasizes the customer chose to consume beyond the included quota; the math matches standard overage.

The model swap was loud. A February 2026 r/Cursor / HN post (HN 46966879, 11 points, 6 comments) captured the moment: "Cursor switches pay-per-token when your plan limit ends. Calls 'On-Demand usage'." Users had paid for "Pro" and assumed unlimited. The new behavior was: hit the plan ceiling and the system flips to per-token billing. The pricing page had said so for weeks. The product had not communicated it. The internet noticed.

The Cursor case is the cleanest recent example of the trade-off makers face when AI cost is real. This article is the model and the decision framework.

What "on-demand usage" actually means

On-demand usage is overage with a different name. The plan includes a quota; beyond the quota, the customer pays per-unit at a published rate. The "on-demand" framing emphasizes that the customer chose to keep going past the quota, rather than the plan failing them. The math is identical to overage; the framing is more friendly.

It works when the customer is in control of unit consumption — they explicitly fired the action, they can see how much it cost, and the unit is large enough that they can decide whether to do it or not.

Where it fits

The clean cases:

Large discrete actions. One coding-agent run. One AI rendering. One transcription job. One bulk export. The customer pressed a button and got something worth paying for.
Power-user features. Premium features behind a paywall, where overage is the upgrade pathway rather than the trap.
Workloads with visible cost per unit. A "this will cost N tokens, continue?" confirmation that the customer can decline.

The cleanest model in this category right now is Replit's bundled approach: r/replit thread (2026-04, 7 points) walked through their integration where the platform fires usage events and a third-party billing service handles the meter. The customer sees the unit cost, presses go, and pays for what they used. No surprises.

Where it does not fit

The cases on-demand usage breaks:

Background work the customer did not initiate. Autocomplete suggestions firing on every keystroke. AI tooltips on every hover. Each one is "free" in user-attention but each one consumes tokens and a "you used 12,000 tokens this hour" surprise is a churn event.
Invisible units. "Tokens" mean nothing to most customers. If you cannot explain the unit in one sentence to a non-technical buyer, on-demand usage will produce angry support tickets.
Workloads where the customer cannot moderate. If the customer is using a feature for a job that has to be done (e.g., processing every uploaded document), they cannot self-throttle. They will use what they need and then complain about the bill.

The Cursor backlash, dissected

Cursor's specific failure was not the model. It was the communication. Users who paid for Pro had a mental model of "unlimited within the plan." The on-demand cutover broke that model without an in-product event that said "you have used your plan quota; the next session is N tokens at $X." The pricing page disclosed it; the product did not.

The fix for the model is the dashboard piece that everyone underrates: show the bar, show the per-unit cost, show the running total. When the customer can see what they are doing, the on-demand model feels fair. When the meter is invisible, the model feels like a trap.

Choosing between flat monthly and on-demand

The decision framework, in three questions:

Is the unit visible to the customer? If yes, on-demand can work. If no, stay flat monthly with a hard cap.
Is the unit large enough to be worth paying for individually? A coding-agent session, yes. A single autocomplete, no.
Can the customer moderate consumption? If they can choose to fire it or not, on-demand fits. If the consumption is dictated by their workflow, flat-monthly-with-soft-cap fits better.

The hybrid that often wins

Most successful maker products end up at the hybrid: flat-monthly base plan with a soft cap (the included quota), on-demand overage above the cap at a published per-unit rate, and a visible dashboard at all times. The customer who never exceeds the quota sees a normal flat-monthly experience. The customer who does is opt-in to overage with a clear unit and clear price.

This is the model UsageBox, Metronome, Flexprice, and the others are converging on. It is also what the better-communicated AI products (Anthropic's API, OpenAI's API, Replit) actually ship. Cursor went there too, just with worse messaging.

What I would actually do

Default to flat monthly with soft-cap-and-overage. It is the model most customers understand and most products fit into.
Add on-demand-only ("pay as you go") plans for the heavy customers who specifically request them. Power users will tell you when they want this model.
Never ship on-demand without the dashboard. The dashboard is the model; the meter alone is a trap.
Communicate the cutover loudly. Email when the customer hits 50% and 80% of quota. Email when overage begins. Email on the first overage charge. Three friction touches beat one surprised invoice.

The honest framing: on-demand usage is a model that works when the customer can see the unit. If the unit is invisible, the model is a billing-mistake generator. Decide on visibility first, then the model follows.