
# Rate Limits

> Understand and work with Grantex API rate limits.

Grantex enforces rate limits in one-minute fixed windows, in two layers. First, Fastify applies exactly one pre-auth policy per route: the route's own configured policy if it has one, otherwise the default of 5,000 requests/minute per source IP. Second, requests authenticated with a standard developer API key also consume a Redis-backed per-developer plan budget. A standard-auth request must pass both its active Fastify policy and its plan policy, so the more restrictive applicable budget wins.

<Info>
  Plan-aware authenticated throughput is implemented in current repository source. A managed deployment must be verified separately; source completion is not evidence that a hosted rollout has occurred.
</Info>

## Default Limits

| Layer or endpoint                                          | Limit         | Key and behavior                                                                                           |
| ---------------------------------------------------------- | ------------- | ---------------------------------------------------------------------------------------------------------- |
| **Default Fastify IP policy** (routes without an override) | 5,000 req/min | Per source IP; public and authenticated traffic share this bucket on routes using the default              |
| **Standard API-key - Free plan**                           | 100 req/min   | Per developer, across routes and source IPs                                                                |
| **Standard API-key - Pro plan**                            | 500 req/min   | Per developer, across routes and source IPs                                                                |
| **Standard API-key - Enterprise plan**                     | 2,000 req/min | Per developer, across routes and source IPs                                                                |
| `POST /v1/authorize`                                       | 10 req/min    | Additional Redis-backed per-developer authorization bucket; the Fastify default and plan budget also apply |
| `POST /v1/token`                                           | 20 req/min    | Stricter — exchanges codes for tokens                                                                      |
| `POST /v1/token/refresh`                                   | 20 req/min    | Stricter — refreshes grant tokens                                                                          |
| `GET /.well-known/jwks.json`                               | **Exempt**    | Public key distribution is never throttled                                                                 |

A route-specific Fastify `config.rateLimit` replaces the 5,000 requests/minute Fastify default for that route; those two Fastify policies are not stacked. The Redis-backed standard developer plan budget remains an additional policy when standard API-key authentication applies.

Commerce, the SCIM Bearer data-plane routes under `/scim/v2/*`, admin, and other custom-auth routes remain outside the standard developer plan bucket. Their active Fastify policy still applies. The standard API-key-authenticated `/v1/scim/tokens` management routes do consume the plan bucket.
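
The layering described above can be illustrated with a minimal in-memory sketch. It is not the Grantex implementation: the `FixedWindowCounter` class, the `check_request` function, and the key formats are hypothetical, and plain dictionaries stand in for the Fastify and Redis state. Only the limits and the one-minute window come from the table.

```python
import time
from collections import defaultdict

WINDOW_SECONDS = 60

# Limits from the table above; everything else in this sketch is illustrative.
IP_LIMIT = 5000
PLAN_LIMITS = {"free": 100, "pro": 500, "enterprise": 2000}


class FixedWindowCounter:
    """In-memory fixed-window counter (hypothetical stand-in for Fastify/Redis state)."""

    def __init__(self, window_seconds=WINDOW_SECONDS):
        self.window_seconds = window_seconds
        self.counts = defaultdict(int)

    def hit(self, key, limit, now=None):
        """Count one request; return (allowed, remaining, seconds_until_reset)."""
        now = time.time() if now is None else now
        window = int(now // self.window_seconds)
        reset_in = int((window + 1) * self.window_seconds - now)
        bucket = (key, window)
        if self.counts[bucket] >= limit:
            return False, 0, reset_in
        self.counts[bucket] += 1
        return True, limit - self.counts[bucket], reset_in


ip_counter = FixedWindowCounter()
plan_counter = FixedWindowCounter()


def check_request(source_ip, developer_id, plan, now=None):
    """Return (status, rejecting_policy); the IP policy runs before the plan budget."""
    allowed, _, _ = ip_counter.hit(f"ip:{source_ip}", IP_LIMIT, now)
    if not allowed:
        return 429, "ip"
    allowed, _, _ = plan_counter.hit(f"dev:{developer_id}", PLAN_LIMITS[plan], now)
    if not allowed:
        return 429, "plan"
    return 200, None
```

In this sketch a Free-plan developer is rejected by the plan budget long before the 5,000 requests/minute IP policy is reached, which is what "the more restrictive applicable budget wins" means in practice.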

## Response Headers

Protected responses include rate-limit headers so your application can track the policy represented by that response. Successful standard developer API-key responses report the developer's plan budget; a request rejected earlier by the active Fastify default or route policy reports that policy instead. Treat the active Fastify policy and Redis plan policy as separate applicable controls, and always honor a `429` plus `Retry-After`.

| Header                  | Description                                               |
| ----------------------- | --------------------------------------------------------- |
| `X-RateLimit-Limit`     | Maximum requests allowed in the current window            |
| `X-RateLimit-Remaining` | Requests remaining in the current window                  |
| `X-RateLimit-Reset`     | Seconds until the current limiter's window resets         |
| `Retry-After`           | Seconds to wait before retrying (only on `429` responses) |
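
If you call the API without an SDK, the headers in the table can be read with a few lines of code. The following is a minimal sketch; `RateLimitInfo` and `parse_rate_limit` are hypothetical names chosen for illustration, and the sketch assumes every value is an integer number of seconds or requests, as the table describes.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitInfo:
    limit: int                  # X-RateLimit-Limit
    remaining: int              # X-RateLimit-Remaining
    reset: int                  # X-RateLimit-Reset (seconds until the window resets)
    retry_after: Optional[int]  # Retry-After (only present on 429 responses)


def parse_rate_limit(headers: Mapping[str, str]) -> Optional[RateLimitInfo]:
    """Parse rate-limit headers; return None if they are missing or malformed."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        limit = int(lowered["x-ratelimit-limit"])
        remaining = int(lowered["x-ratelimit-remaining"])
        reset = int(lowered["x-ratelimit-reset"])
    except (KeyError, ValueError):
        return None
    raw_retry = lowered.get("retry-after")
    retry_after = int(raw_retry) if raw_retry is not None and raw_retry.strip().isdigit() else None
    return RateLimitInfo(limit, remaining, reset, retry_after)
```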

## 429 Error Response

When you exceed a rate limit, the API returns a `429 Too Many Requests` status with the following body:

```json theme={null}
{
  "message": "Plan rate limit exceeded, retry in 42 seconds",
  "code": "RATE_LIMIT_EXCEEDED",
  "requestId": "a1b2c3d4-e5f6-..."
}
```

The `Retry-After` header tells you the minimum number of seconds to wait. Generic IP/route-limit responses are produced by `@fastify/rate-limit` and can use a different error code/message; clients should branch on HTTP status `429` and honor the header rather than matching message text.
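
A minimal sketch of that advice for clients that handle raw HTTP responses: branch on the status code, take the wait time from `Retry-After`, and treat the body's `code` as optional logging detail. The function name `throttle_info` and the one-second fallback are assumptions made for this example, not part of the API.

```python
import json
from typing import Mapping, Optional, Tuple

DEFAULT_WAIT_SECONDS = 1.0  # assumption: fallback when Retry-After is missing or malformed


def throttle_info(
    status: int, headers: Mapping[str, str], body_text: str
) -> Optional[Tuple[float, Optional[str]]]:
    """Return (wait_seconds, error_code) for a 429, or None for any other status."""
    if status != 429:
        return None

    wait = DEFAULT_WAIT_SECONDS
    for name, value in headers.items():
        if name.lower() == "retry-after":
            try:
                wait = max(float(value), 0.0)
            except ValueError:
                pass  # keep the fallback if the header is not a number

    # The body shape differs between limiters, so the code is best-effort only.
    code = None
    try:
        parsed = json.loads(body_text)
        if isinstance(parsed, dict) and isinstance(parsed.get("code"), str):
            code = parsed["code"]
    except ValueError:
        pass
    return wait, code
```

The decision to wait depends only on the status and the header; the body is never required to parse.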

## Authenticated Limiter Availability

The standard developer API-key plan limiter fails closed when its Redis transaction cannot be completed:

```json theme={null}
{
  "message": "Authenticated rate limiting is temporarily unavailable",
  "code": "RATE_LIMIT_UNAVAILABLE",
  "requestId": "a1b2c3d4-e5f6-..."
}
```

This response uses `503 Service Unavailable`, not `429`. Retry it as a transient service failure; do not treat it as proof that the request reached the protected handler.
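
One way to keep the two cases apart in client code is a single helper that decides how long to wait. The sketch below is illustrative only: `retry_delay` is a hypothetical helper, and the exponential schedule for `503` is an assumption, since this page does not state whether that response carries `Retry-After`.

```python
import random
from typing import Callable, Optional


def retry_delay(
    status: int,
    attempt: int,
    retry_after: Optional[float] = None,
    rng: Callable[[], float] = random.random,
) -> Optional[float]:
    """Return seconds to sleep before the next attempt, or None if not retryable."""
    if status == 429:
        # Rate limited: Retry-After is the minimum wait when it is available.
        base = retry_after if retry_after else 2 ** attempt
    elif status == 503:
        # Limiter unavailable: treated as a transient failure with plain backoff.
        base = 2 ** attempt
    else:
        return None
    return base + rng() * 0.5  # up to 0.5 s of jitter
```

Passing `rng` explicitly keeps the helper deterministic in tests; production callers can rely on the default.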

## Reading Rate Limits from SDKs

All three SDKs automatically parse rate limit headers from every response. You can read them via `client.lastRateLimit` (TypeScript), `client.last_rate_limit` (Python), or `client.LastRateLimit()` (Go).

### After a Successful Call

<CodeGroup>
  ```typescript TypeScript theme={null}
  import { Grantex } from '@grantex/sdk';

  const client = new Grantex({ apiKey: 'gx_...' });
  const agents = await client.agents.list();

  const rl = client.lastRateLimit;
  if (rl) {
    console.log(`${rl.remaining}/${rl.limit} requests left, resets at ${rl.reset}`);
  }
  ```

  ```python Python theme={null}
  from grantex import Grantex

  client = Grantex(api_key="gx_...")
  agents = client.agents.list()

  rl = client.last_rate_limit
  if rl:
      print(f"{rl.remaining}/{rl.limit} requests left, resets at {rl.reset}")
  ```

  ```go Go theme={null}
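  ctx := context.Background()
  client := grantex.NewClient("gx_...")
  // Handle the error instead of discarding it; the result is unused here.
  if _, err := client.Agents.List(ctx); err != nil {
      log.Fatal(err)
  }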

  if rl := client.LastRateLimit(); rl != nil {
      fmt.Printf("%d/%d requests left, resets at %d\n", rl.Remaining, rl.Limit, rl.Reset)
  }
  ```
</CodeGroup>
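
The values read this way can also drive simple client-side pacing. The sketch below spreads the remaining budget evenly over the rest of the window. `pacing_delay` is a hypothetical helper, and it assumes the reset value is the number of seconds until the window resets, as described for the `X-RateLimit-Reset` header.

```python
def pacing_delay(remaining: int, reset_seconds: float) -> float:
    """Seconds to pause before the next request so the budget lasts the window."""
    if reset_seconds <= 0:
        return 0.0  # the window has already rolled over
    if remaining <= 0:
        return float(reset_seconds)  # budget exhausted: wait for the reset
    return reset_seconds / remaining
```

With 10 requests left and 30 seconds until reset, this yields a 3-second pause between calls. A `429` and its `Retry-After` value still take precedence over any client-side estimate.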

### Handling 429 Errors

When a `429` is returned, the SDK error carries the parsed rate limit info, including the retry delay (`retryAfter` in TypeScript, `retry_after` in Python, `RetryAfter` in Go):

<CodeGroup>
  ```typescript TypeScript theme={null}
  import { GrantexApiError } from '@grantex/sdk';

  try {
    await client.tokens.verify(token);
  } catch (err) {
    if (err instanceof GrantexApiError && err.rateLimit?.retryAfter) {
      console.log(`Rate limited — retry in ${err.rateLimit.retryAfter}s`);
    }
  }
  ```

  ```python Python theme={null}
  from grantex import GrantexApiError

  try:
      client.tokens.verify(token)
  except GrantexApiError as err:
      if err.rate_limit and err.rate_limit.retry_after:
          print(f"Rate limited — retry in {err.rate_limit.retry_after}s")
  ```

  ```go Go theme={null}
  // err is the error returned by a previous SDK call.
  var apiErr *grantex.APIError
  if errors.As(err, &apiErr) && apiErr.RateLimit != nil && apiErr.RateLimit.RetryAfter > 0 {
      fmt.Printf("Rate limited — retry in %ds\n", apiErr.RateLimit.RetryAfter)
  }
  ```
</CodeGroup>

## Retry Strategy

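Use exponential backoff with jitter to avoid thundering-herd problems when multiple clients hit the limit simultaneously. The examples below retry only `429` responses, use the `Retry-After` value as the base delay when the error carries one, and otherwise fall back to `2^attempt` seconds.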

<CodeGroup>
  ```typescript TypeScript theme={null}
  import { GrantexApiError } from '@grantex/sdk';

  async function withRetry<T>(fn: () => Promise<T>, maxRetries = 3): Promise<T> {
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return await fn();
      } catch (err: unknown) {
        if (!(err instanceof GrantexApiError) || err.statusCode !== 429 || attempt === maxRetries) throw err;

        const base = (err.rateLimit?.retryAfter ?? 2 ** attempt) * 1000;
        const jitter = Math.random() * 500;
        await new Promise((r) => setTimeout(r, base + jitter));
      }
    }
    throw new Error('Unreachable');
  }
  ```

  ```python Python theme={null}
  import time
  import random
  from grantex import GrantexApiError

  def with_retry(fn, max_retries=3):
      for attempt in range(max_retries + 1):
          try:
              return fn()
          except GrantexApiError as err:
              if err.status_code != 429 or attempt == max_retries:
                  raise

              # Prefer the server's Retry-After; otherwise back off exponentially.
              retry_after = err.rate_limit.retry_after if err.rate_limit else None
              base = float(retry_after or 2 ** attempt)
              jitter = random.uniform(0, 0.5)
              time.sleep(base + jitter)
  ```

  ```go Go theme={null}
  package main

  import (
  	"errors"
  	"math"
  	"math/rand"
  	"time"

  	grantex "github.com/mishrasanjeev/grantex-go"
  )

  func withRetry[T any](fn func() (T, error), maxRetries int) (T, error) {
  	var zero T
  	for attempt := 0; attempt <= maxRetries; attempt++ {
  		result, err := fn()
  		if err == nil {
  			return result, nil
  		}

  		var apiErr *grantex.APIError
  		if !errors.As(err, &apiErr) || apiErr.StatusCode != 429 || attempt == maxRetries {
  			return zero, err
  		}

  		base := math.Pow(2, float64(attempt))
  		if apiErr.RateLimit != nil && apiErr.RateLimit.RetryAfter > 0 {
  			base = float64(apiErr.RateLimit.RetryAfter)
  		}
  		jitter := rand.Float64() * 0.5
  		time.Sleep(time.Duration((base+jitter)*1000) * time.Millisecond)
  	}
  	return zero, errors.New("unreachable")
  }
  ```
</CodeGroup>

## Best Practices

<Note>
  The JWKS endpoint (`/.well-known/jwks.json`) is exempt from rate limits. Local
  JWKS-based verification avoids the `POST /v1/tokens/verify` rate limit when a
  revocation lookup is not required, but current standalone helpers still make a
  JWKS network request per call.
</Note>

* **Cache tokens** — Grant tokens are valid JWTs. Store and reuse them until they expire instead of requesting new ones per operation.
* **Use local verification deliberately** — `verifyGrantToken()` validates with
  JWKS and avoids the online verification rate limit, but the standalone helper
  fetches JWKS on each call.
* **Use webhooks instead of polling** — Subscribe to [webhook events](/guides/webhooks) like `grant.created` and `grant.revoked` rather than polling grant or audit endpoints.
* **Honor `Retry-After`** — When you receive a `429`, always use the `Retry-After` header value as your minimum wait time.
* **Spread requests** — If your system makes burst requests (e.g., batch token exchanges), add short delays between calls.
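
The "cache tokens" practice can be sketched as a small in-process cache. This is an illustration under stated assumptions, not SDK functionality: `TokenCache`, `jwt_expiry`, and the `fetch` callback are hypothetical, and the sketch assumes grant tokens carry the standard JWT `exp` claim. The payload is decoded without signature verification, which is acceptable only for deciding when to refresh, never for trusting the token.

```python
import base64
import json
import time
from typing import Callable, Dict, Optional, Tuple


def jwt_expiry(token: str) -> Optional[int]:
    """Read the `exp` claim from a JWT payload WITHOUT verifying the signature."""
    try:
        payload_b64 = token.split(".")[1]
        padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore base64 padding
        payload = json.loads(base64.urlsafe_b64decode(padded))
        exp = payload.get("exp")
        return int(exp) if exp is not None else None
    except (IndexError, ValueError, AttributeError, TypeError):
        return None


class TokenCache:
    """Reuse a token until shortly before it expires (hypothetical helper)."""

    def __init__(self, fetch: Callable[[str], str], skew_seconds: int = 30):
        self._fetch = fetch        # caller-supplied function that obtains a new token
        self._skew = skew_seconds  # refresh this many seconds before expiry
        self._tokens: Dict[str, Tuple[str, int]] = {}

    def get(self, key: str, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        cached = self._tokens.get(key)
        if cached and cached[1] - self._skew > now:
            return cached[0]
        token = self._fetch(key)
        exp = jwt_expiry(token)
        if exp is not None:  # tokens without a readable expiry are not cached
            self._tokens[key] = (token, exp)
        return token
```

A caller would pass a `fetch` function that performs the real token exchange, so repeated `get()` calls for the same key reach the token endpoints only when the cached token is close to expiry.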

## Self-Hosted Deployments

If you're running the Grantex auth service yourself, rate limits are configurable. The default Fastify IP policy is set in `apps/auth-service/src/server.ts`; a route-specific Fastify policy lives beside its route and replaces that default on the route. Standard developer API-key budgets and their window are defined in `apps/auth-service/src/plugins/dynamicRateLimit.ts` and remain additional. All instances must share Redis for standard developer budgets to remain global across the deployment. Fastify IP counters are process-local unless you configure a shared store or equivalent ingress enforcement.

See the [Self-Hosting guide](/guides/self-hosting) for deployment instructions.