What is a soft cap with overage?

A pricing model where the plan includes a generous quota, and consumption beyond the quota is billed at a published per-unit rate. The customer is never blocked from using the product; they pay for what they use above the quota.

How much quota should I include in the base plan?

Look at the 90th percentile of usage among existing customers. Set the included quota at or just above that number. The goal is that 90% of customers never see overage; the top 10% pay for what they consume.

What overage rate should I charge?

Your underlying cost per unit plus the same margin multiple as your base plan. If a prompt costs $0.02 and your base plan is 5x cost, the overage rate is $0.10. Round to clean numbers ($0.05, $0.10) for communication.

Do I need to enforce caps server-side in my no-code app?

No. Fire the metered event from the no-code app's outbound webhook. The metering service counts events and computes overage. The no-code app shows the usage bar; the customer self-throttles.

Are no-code platforms shipping usage-based billing as a feature now?

Some are (Lovable's recent feature in one Reddit thread, Replit's billing integration, others coming). The plumbing is being commoditized. The pricing-model design — quota size, overage rate, what counts as one unit — still requires looking at your real usage distribution.

AI prompt cost control: the soft cap that saves your margin

The first AI feature ships with flat-monthly pricing because that is the model your customers already understand. It works for three months. Then the support ticket lands: "I noticed you charged me $9.99 last month and the API logs say one customer used $87 of Claude. What is happening?"

What is happening is the standard distribution of AI usage. A 2025-07 HN thread ("How do you handle charging users for AI usage?") captured the typical shape: a handful of customers run thirty to fifty prompts per day, the median runs three, and the bottom half barely uses the feature at all. Flat-monthly assumes the median; the heavies are what you actually pay for.

This article is the soft-cap pattern that lets you keep flat-monthly as the headline price while protecting the margin from the long tail.

Why hard caps are the wrong default

The instinct is to set a hard cap: "Plan includes 100 prompts per month, after that you cannot run more." It feels clean. It is, in practice, terrible product behavior:

The customer who tried your product on day 28 and loved it cannot use it on day 29.
The customer who is mid-task hits the wall and rage-quits before completing the action that would have made them upgrade.
Your support gets emails titled "your AI is broken" because nobody reads quota emails until they get cut off.

Hard caps are right in two cases: when each unit is expensive in absolute terms (transcription minutes, image generation) and unit overage feels like a surprise; or when you are running a free tier and the cap is the upgrade trigger by design. For paid plans on cheap-per-call AI usage, soft caps win.

The soft cap pattern, three pieces

Piece 1: a generous included quota. Generous enough that 90% of your customers never see the bar fill. Calibrate it by looking at the 90th percentile of usage in the last month. If your top 10% averages 80 prompts and the median is 8, set the quota at 100. Most customers will never reach it.

Piece 2: a published overage rate. After the quota, every additional unit is billed at a stated price (per prompt, per million tokens, whatever your meter is). The rate should cover your AI cost plus the margin you would expect on a base plan. If a prompt costs you $0.02, charge $0.05 in overage. Round numbers communicate better than precise costs.

Piece 3: the dashboard bar. The customer sees a usage bar that fills as they consume the quota. When it hits 80% they see a one-line nudge: "approaching included quota — overage from prompt 101 at $0.05 each." This is the entire interface; do not over-design it. Customers self-throttle when they can see the number.

Wiring the cap from a no-code app

Most no-code platforms have outbound webhooks. Every metered event (AI prompt fired) hits a webhook that posts to your metering service. The metering service knows the customer's quota, current consumption, and overage rate. It returns the customer's current state, which the no-code app uses to render the usage bar.

You do not need to enforce the cap server-side in the no-code platform itself. The platform fires the event, the meter counts it, and the customer's invoice picks up the overage. The dashboard bar is the only customer-facing wire.

For makers who do not want to host the meter, services like UsageBox wrap ingestion, aggregation, overage calculation, and the customer-facing dashboard as one piece. There are alternatives — see Frost AI ("stop losing money on AI agents", 2025-06) and AgentPulse (open-source observability for AI agents with cost tracking, 2026-02) which lean more toward observability than billing but cover the cost-attribution side.

What about the "Lovable did it in one prompt" story?

Newer no-code platforms are starting to bundle usage-based billing as a feature. Lovable's r/lovable post (2026-04, 19 points, 20 comments) claims "add usage-based billing to your Lovable app in one prompt." The thread that follows is honest about the failure modes: the prompt generates the meter ingestion endpoint, but the maker still has to design the quota, the overage rate, and the dashboard. The plumbing is "one prompt." The product decision is not.

That is the pattern across all of these tools. The plumbing is getting cheaper, fast. The pricing-model design (quota size, overage rate, what counts as one unit) is still the part that requires you to look at your real usage distribution and make a call.

The three things that go wrong

Setting the quota too tight. If 30% of your customers see the overage line on their first invoice, the quota is wrong. Raise it. The point is to monetize the heavies, not surprise the normals.
Hiding the overage rate. If the customer cannot tell from your pricing page that overage will be charged, the first invoice with overage is a churn event. Publish the number. Be boring about it.
Skipping the dashboard bar. Customers who cannot see their usage do not self-throttle. The bar pays for itself in support hours saved.

The honest framing: AI cost control is not a backend problem. It is a pricing-model problem with three pieces of plumbing. Get the model right and the plumbing is small. Skip the model and no amount of plumbing will save you.