Rate limits & budgets
How budgets reset, how rate limits are bucketed, and what happens when you exceed either.
Rate limits
Every key has a requests-per-minute (RPM) cap. The bucket is a rolling 60-second window. Going over the cap returns:
HTTP 429 Too Many Requests
Free keys cap at 30 RPM; pro keys can go up to 300 RPM. Set the cap lower than the plan max to tighten the blast radius if a key leaks.
Rate limits are enforced server-side by the upstream LiteLLM proxy. Local retries with exponential backoff are encouraged — TensorLoop does not queue rejected requests.
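
Since rejected requests are not queued, the retry loop has to live in the client. The sketch below shows one common way to do it: exponential backoff with full jitter. The names `RateLimited`, `call_with_backoff`, and `send` are hypothetical and not part of any TensorLoop SDK; `send` stands in for whatever function makes your request and raises `RateLimited` when it sees an HTTP 429.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical: raised by send() when the API answers HTTP 429."""


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call send() until it succeeds, backing off exponentially on RateLimited.

    Re-raises RateLimited once max_attempts calls have all been rejected.
    """
    for attempt in range(max_attempts):
        try:
            return send()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise
            # The ceiling doubles on every attempt and is capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random amount between 0 and the ceiling so
            # that many clients do not retry in lockstep.
            sleep(random.uniform(0, ceiling))
```

The defaults here (five attempts, a 1-second base, a 60-second ceiling) are illustrative rather than recommended values. Backoff only helps with rate-limit rejections: an exhausted budget also returns a 429, and retrying will not clear it.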
Budgets
Every key has a USD budget that resets every 30 days. The budget tracks spend across all calls made with that key, regardless of model.
When a key's spend reaches its budget:
- The next call returns 429 Budget exceeded.
- The key shows up in the dashboard as Budget exhausted (red badge).
- You can either wait for the 30-day budget window to reset, or mint a new key with a higher budget.
The 30-day window starts when the key is first minted, not on the first of the month.
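
To estimate when a key's budget will next reset, you can count 30-day windows forward from the mint time. This is a minimal sketch that assumes the windows are fixed, back-to-back 30-day intervals anchored at the mint timestamp; the function name `next_budget_reset` is hypothetical, and the dashboard remains the authoritative source.

```python
from datetime import datetime, timedelta, timezone

# Assumption: budget windows are fixed 30-day intervals starting at mint time.
BUDGET_WINDOW = timedelta(days=30)


def next_budget_reset(minted_at, now):
    """Return the first 30-day window boundary strictly after `now`."""
    if now < minted_at:
        raise ValueError("now must not be earlier than minted_at")
    # Number of complete windows that have already elapsed since minting.
    elapsed_windows = (now - minted_at) // BUDGET_WINDOW
    return minted_at + (elapsed_windows + 1) * BUDGET_WINDOW


minted = datetime(2024, 1, 15, 9, 0, tzinfo=timezone.utc)
today = datetime(2024, 3, 1, 0, 0, tzinfo=timezone.utc)
print(next_budget_reset(minted, today))  # 2024-03-15 09:00:00+00:00
```

In this example the key was minted on 15 January, so its resets fall on 14 February and 15 March rather than on the first of each month.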
Watching spend
The dashboard surfaces spend three ways:
- Per key — the Budget column in the API keys table shows spend / cap and a fill bar.
- Per day — the Analytics page shows a 30-day timeseries.
- Per call — the Activity page lists the most recent calls with token counts and cost.
Spend is denominated in USD and rounded to four decimals.
Plan ceilings vs key settings
A free user can mint a key with a $2 budget and 10 RPM — well below the plan max. The plan number is just an upper bound:
| Setting | Free max | Pro max |
|---|---|---|
| Budget per key | $5 | $100 |
| RPM per key | 30 | 300 |
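
If you mint keys from a script, it can be useful to check the requested settings against the plan ceilings before sending the request. The sketch below hard-codes the values from the table above; `PLAN_CEILINGS` and `check_key_settings` are hypothetical names, and the server is still what enforces the limits.

```python
# Ceilings copied from the table above; update them if the plans change.
PLAN_CEILINGS = {
    "free": {"budget_usd": 5.0, "rpm": 30},
    "pro": {"budget_usd": 100.0, "rpm": 300},
}


def check_key_settings(plan, budget_usd, rpm):
    """Return a list of problems; an empty list means the settings fit the plan."""
    ceilings = PLAN_CEILINGS[plan]
    problems = []
    if budget_usd > ceilings["budget_usd"]:
        problems.append(
            f"budget ${budget_usd} exceeds the {plan} max of ${ceilings['budget_usd']}"
        )
    if rpm > ceilings["rpm"]:
        problems.append(f"{rpm} RPM exceeds the {plan} max of {ceilings['rpm']}")
    return problems


print(check_key_settings("free", budget_usd=2, rpm=10))  # []
```

The $2, 10 RPM key from the example above passes on the free plan, while a $10 budget would be reported as exceeding the free ceiling.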