The model swap was loud. A February 2026 r/Cursor / HN post (HN 46966879, 11 points, 6 comments) captured the moment: "Cursor switches pay-per-token when your plan limit ends. Calls 'On-Demand usage'." Users had paid for "Pro" and assumed it meant unlimited usage. The new behavior: hit the plan ceiling and the system flips to per-token billing. The pricing page had said so for weeks. The product itself had not. The internet noticed.

The Cursor case is the cleanest recent example of the trade-off makers face when AI cost is real. This article is the model and the decision framework.

What "on-demand usage" actually means

On-demand usage is overage with a different name. The plan includes a quota; beyond the quota, the customer pays per unit at a published rate. The "on-demand" framing emphasizes that the customer chose to keep going past the quota, rather than the plan failing them. The math is identical to overage; only the framing is friendlier.
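The arithmetic is small enough to sketch. The plan numbers below ($20 base, 500 included units, $0.04 per extra unit) and the function name are made up for illustration; they are not any vendor's actual pricing.

```python
from decimal import Decimal


def monthly_charge(base_price: Decimal, included_units: int,
                   used_units: int, unit_rate: Decimal) -> Decimal:
    """Flat base price plus a per-unit charge for units past the quota."""
    overage_units = max(0, used_units - included_units)
    return base_price + overage_units * unit_rate


# Hypothetical plan: $20 base, 500 included units, $0.04 per extra unit.
base, quota, rate = Decimal("20.00"), 500, Decimal("0.04")

under_quota = monthly_charge(base, quota, 450, rate)  # no overage applies
over_quota = monthly_charge(base, quota, 650, rate)   # 150 units of overage
print(under_quota, over_quota)
```

Whether the invoice line says "overage" or "on-demand usage", the same function produces it.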

It works when the customer is in control of unit consumption: they explicitly fired the action, they can see how much it cost, and the unit is large enough that they can decide whether to do it or not.

Where it fits

The clean cases:

The cleanest model in this category right now is Replit's bundled approach. An r/replit thread (2026-04, 7 points) walked through their integration, in which the platform fires usage events and a third-party billing service handles the meter. The customer sees the unit cost, presses go, and pays for what they used. No surprises.

Where it does not fit

The cases on-demand usage breaks:

The Cursor backlash, dissected

Cursor's specific failure was not the model. It was the communication. Users who paid for Pro had a mental model of "unlimited within the plan." The on-demand cutover broke that model without an in-product event that said "you have used your plan quota; the next session is N tokens at $X." The pricing page disclosed it; the product did not.

The fix for the model is the dashboard piece that everyone underrates: show the bar, show the per-unit cost, show the running total. When the customer can see what they are doing, the on-demand model feels fair. When the meter is invisible, the model feels like a trap.
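As a rough sketch of what "show the bar, show the per-unit cost, show the running total" could mean in data terms, here is a minimal text rendering. The `MeterSnapshot` type, its field names, and the `render` helper are hypothetical, and the numbers are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class MeterSnapshot:
    """Hypothetical per-customer meter state for the current billing period."""
    included_units: int
    used_units: int
    unit_rate_cents: int  # price per overage unit, in cents

    @property
    def quota_fraction(self) -> float:
        # Share of the included quota consumed, capped at 1.0 for the bar.
        if self.included_units <= 0:
            return 1.0
        return min(1.0, self.used_units / self.included_units)

    @property
    def overage_units(self) -> int:
        return max(0, self.used_units - self.included_units)

    @property
    def overage_total_cents(self) -> int:
        return self.overage_units * self.unit_rate_cents


def render(s: MeterSnapshot, width: int = 20) -> str:
    """One line with the three things the customer needs: bar, rate, total."""
    filled = round(s.quota_fraction * width)
    bar = "#" * filled + "-" * (width - filled)
    return (f"[{bar}] {s.used_units}/{s.included_units} units | "
            f"${s.unit_rate_cents / 100:.2f} per extra unit | "
            f"overage so far: ${s.overage_total_cents / 100:.2f}")


print(render(MeterSnapshot(included_units=500, used_units=250, unit_rate_cents=4)))
print(render(MeterSnapshot(included_units=500, used_units=650, unit_rate_cents=4)))
```

The point of the sketch is that all three numbers come from the same meter record, so there is little excuse for showing the customer only one of them.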

Choosing between flat monthly and on-demand

The decision framework, in three questions:

  1. Is the unit visible to the customer? If yes, on-demand can work. If no, stay flat monthly with a hard cap.
  2. Is the unit large enough to be worth paying for individually? A coding-agent session, yes. A single autocomplete, no.
  3. Can the customer moderate consumption? If they can choose to fire it or not, on-demand fits. If the consumption is dictated by their workflow, flat-monthly-with-soft-cap fits better.
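The three questions can be read as a short decision function. This is one interpretation, not a rule from any billing vendor: the function name and return labels are invented here, and the fallback for a "no" on question 2 (stay flat monthly) is an assumption, since the framework only says such units are not worth pricing individually.

```python
def recommend_model(unit_visible: bool,
                    unit_worth_pricing: bool,
                    customer_controls_usage: bool) -> str:
    """Map the three framework questions to a pricing model label."""
    # Question 1: an invisible unit rules out on-demand entirely.
    if not unit_visible:
        return "flat monthly with hard cap"
    # Question 2: assumption, tiny units (a single autocomplete) stay flat.
    if not unit_worth_pricing:
        return "flat monthly with hard cap"
    # Question 3: workflow-dictated consumption fits a soft cap better.
    if not customer_controls_usage:
        return "flat monthly with soft cap"
    return "on-demand"


# A coding-agent session: visible, sizeable, fired on purpose.
print(recommend_model(True, True, True))
# A single autocomplete: visible perhaps, but too small to price per unit.
print(recommend_model(True, False, True))
```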

The hybrid that often wins

Most successful maker products end up at the hybrid: a flat-monthly base plan with a soft cap (the included quota), on-demand overage above the cap at a published per-unit rate, and a visible dashboard at all times. The customer who never exceeds the quota sees a normal flat-monthly experience. The customer who does exceed it opts in to overage with a clear unit and a clear price.

This is the model UsageBox, Metronome, Flexprice, and the others are converging on. It is also what the better-communicated AI products (Anthropic's API, OpenAI's API, Replit) actually ship. Cursor went there too, just with worse messaging.

What I would actually do

  1. Default to flat monthly with soft-cap-and-overage. It is the model most customers understand and most products fit into.
  2. Add on-demand-only ("pay as you go") plans for the heavy customers who specifically request them. Power users will tell you when they want this model.
  3. Never ship on-demand without the dashboard. The dashboard is the model; the meter alone is a trap.
  4. Communicate the cutover loudly. Email when the customer hits 50% and 80% of quota. Email when overage begins. Email on the first overage charge. A few friction touches beat one surprise invoice.
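The quota emails in step 4 come down to detecting when usage crosses a threshold between two meter readings, so each email fires once rather than on every event. A minimal sketch, assuming usage arrives as a running total; the function name and event labels are hypothetical.

```python
def notifications_due(previous_used: int, current_used: int,
                      included_units: int,
                      thresholds_pct: tuple = (50, 80)) -> list:
    """Return the notification events crossed between two meter readings."""
    events = []
    for pct in thresholds_pct:
        # Integer math: fire only on the update that crosses pct% of quota.
        if previous_used * 100 < pct * included_units <= current_used * 100:
            events.append(f"quota_{pct}")
    # Overage begins on the first unit past the included quota.
    if previous_used <= included_units < current_used:
        events.append("overage_started")
    return events


# Hypothetical 500-unit quota.
print(notifications_due(200, 260, 500))  # crosses the 50% mark
print(notifications_due(490, 510, 500))  # crosses into overage
print(notifications_due(510, 520, 500))  # already in overage, nothing new
```

The email for the first overage charge is not covered here; it would hang off the invoicing step rather than the meter.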

The honest framing: on-demand usage is a model that works when the customer can see the unit. If the unit is invisible, the model is a billing-mistake generator. Decide on visibility first, then the model follows.