Effective monitoring ensures your Grantex deployment is healthy, performant, and secure. This guide covers the native Prometheus metrics endpoint, Grafana dashboard templates, alert thresholds, and logging best practices.
Prometheus Metrics Endpoint
The auth service exposes a GET /metrics endpoint in Prometheus exposition format:
curl https://your-auth-service/metrics
This endpoint is unauthenticated (no API key required) and rate-limited to 10 requests/minute per IP, so a single scraper should use a scrape interval of at least 6 seconds.
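For ad-hoc checks outside Prometheus, the response body can be parsed directly. The following is a minimal sketch, not a complete exposition-format parser: it assumes label values contain no escaped quotes and ignores timestamps. The helper names (`parseMetrics`, `sumByLabel`) are hypothetical and are not part of any Grantex SDK.

```typescript
// One parsed sample line from the Prometheus text exposition format.
interface Sample {
  name: string;
  labels: Record<string, string>;
  value: number;
}

// Prometheus writes infinities as +Inf / -Inf, which Number() does not parse.
function parseValue(raw: string): number {
  if (raw === "+Inf") return Infinity;
  if (raw === "-Inf") return -Infinity;
  return Number(raw);
}

// Parse the body of GET /metrics into samples, skipping # HELP / # TYPE lines.
// Simplification (assumption): label values contain no escaped quotes.
function parseMetrics(text: string): Sample[] {
  const samples: Sample[] = [];
  const linePattern = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)/;
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const match = linePattern.exec(line);
    if (match === null) continue;
    const labels: Record<string, string> = {};
    const labelPattern = /([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"/g;
    let label: RegExpExecArray | null;
    while ((label = labelPattern.exec(match[2] ?? "")) !== null) {
      labels[label[1]] = label[2];
    }
    samples.push({ name: match[1], labels, value: parseValue(match[3]) });
  }
  return samples;
}

// Sum one metric's samples, grouped by the value of a single label.
function sumByLabel(
  samples: Sample[],
  metric: string,
  label: string,
): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const s of samples) {
    if (s.name !== metric) continue;
    const key = s.labels[label] ?? "";
    totals[key] = (totals[key] ?? 0) + s.value;
  }
  return totals;
}
```

To use it, pass the text returned by GET /metrics to `parseMetrics`, then call `sumByLabel(samples, "grantex_token_exchange_total", "status")` to get token exchange counts per status.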
Counters
| Metric | Labels | Description |
|---|---|---|
| grantex_token_exchange_total | status | Token exchange attempts |
| grantex_authorize_total | status | Authorization requests |
| grantex_grants_revoked_total | none | Grants revoked (including cascade) |
| grantex_webhook_deliveries_total | status | Webhook delivery outcomes |
| grantex_anomalies_detected_total | type, severity | Anomalies detected |
Histograms
| Metric | Labels | Description |
|---|---|---|
| grantex_authorize_duration_seconds | none | Authorization request duration |
| grantex_token_exchange_duration_seconds | none | Token exchange duration |
| grantex_http_request_duration_seconds | method, route, status_code | HTTP request duration (all routes) |
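Histograms are exported as cumulative `_bucket` series with an `le` (upper bound) label, and percentiles such as p99 are estimated from those buckets rather than measured exactly. The sketch below approximates how Prometheus's `histogram_quantile` interpolates linearly inside the bucket that contains the requested rank. It is an illustration under simplifying assumptions (the lowest bucket starts at 0, bucket counts are already aggregated), and the bucket boundaries used with it are hypothetical rather than the ones Grantex configures.

```typescript
// One cumulative histogram bucket: `count` observations were <= `le` seconds.
interface Bucket {
  le: number; // upper bound in seconds; Infinity for the +Inf bucket
  count: number; // cumulative count
}

// Estimate the q-quantile (0 < q <= 1) by linear interpolation within the
// bucket that contains the target rank. Returns NaN when there is no data.
function estimateQuantile(q: number, buckets: Bucket[]): number {
  if (q <= 0 || q > 1 || buckets.length === 0) return NaN;
  const sorted = [...buckets].sort((a, b) => a.le - b.le);
  const total = sorted[sorted.length - 1].count;
  if (total === 0) return NaN;

  const rank = q * total;
  const idx = sorted.findIndex((b) => b.count >= rank);

  // If the rank falls in the +Inf bucket, the best available answer is the
  // upper bound of the highest finite bucket.
  if (sorted[idx].le === Infinity) {
    return idx > 0 ? sorted[idx - 1].le : NaN;
  }

  const lowerBound = idx === 0 ? 0 : sorted[idx - 1].le;
  const lowerCount = idx === 0 ? 0 : sorted[idx - 1].count;
  const inBucket = sorted[idx].count - lowerCount;
  return lowerBound + (sorted[idx].le - lowerBound) * ((rank - lowerCount) / inBucket);
}
```

Because the estimate can only be as precise as the bucket boundaries, a reported p99 is an approximation. Latency thresholds that fall between two wide buckets are resolved by interpolation, not by observed request times.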
Gauges
| Metric | Description |
|---|---|
| grantex_active_grants | Current active grants count |
| grantex_anomalies_unacknowledged | Unacknowledged anomalies |
Environment Variables
| Variable | Default | Description |
|---|---|---|
| METRICS_ENABLED | true | Set to false to disable metrics collection |
Grafana Dashboards
Pre-built Grafana dashboards are available at deploy/grafana/:
| Dashboard | Description |
|---|---|
| overview-dashboard.json | Token exchange rate, success rate gauge, latency p50/p99, grants revoked, active grants, webhook deliveries, anomalies, HTTP error rate |
| per-agent-dashboard.json | Per-agent drill-down with a $agent_id template variable |
Import Instructions
- In Grafana, go to Dashboards > Import
- Upload the JSON file or paste its contents
- Select your Prometheus data source when prompted (${DS_PROMETHEUS})
- Click Import
Health Check Endpoint
The auth service exposes a GET /health endpoint that returns the service status:
curl https://your-auth-service/health
Use this endpoint for:
- Load balancer health checks: poll /health every 10–30 seconds
- Uptime monitoring: UptimeRobot, Pingdom, Cloud Monitoring
- Kubernetes liveness probes: set livenessProbe.httpGet.path to /health
Alerting Thresholds
Recommended thresholds for production alerting:
| Metric | Warning | Critical | Action |
|---|---|---|---|
| Token exchange failure rate | > 5% | > 15% | Check auth service logs |
| Token refresh failure rate | > 5% | > 15% | Check for refresh token reuse or clock skew |
| Anomalies detected | > 5/hour | > 10/hour | Review anomaly details |
| Webhook delivery success | < 98% | < 95% | Verify endpoint availability |
| 429 rate | > 50/min | > 200/min | Client misconfiguration or abuse |
| Auth request latency (p99) | > 500ms | > 2s | Database or Redis performance issue |
| Health check failures | 1 consecutive | 3 consecutive | Service restart |
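If you evaluate these thresholds in your own tooling rather than in Prometheus, the table translates into simple comparisons. The sketch below encodes three of the rows (failure rate, p99 latency, and consecutive health check failures). The function and class names are hypothetical, and the rate is assumed to be a fraction between 0 and 1.

```typescript
type Severity = "ok" | "warning" | "critical";

// Token exchange / refresh failure rate: warning above 5%, critical above 15%.
function classifyFailureRate(rate: number): Severity {
  if (rate > 0.15) return "critical";
  if (rate > 0.05) return "warning";
  return "ok";
}

// Auth request p99 latency: warning above 500 ms, critical above 2 s.
function classifyP99LatencyMs(p99Ms: number): Severity {
  if (p99Ms > 2000) return "critical";
  if (p99Ms > 500) return "warning";
  return "ok";
}

// Health checks: 1 consecutive failure is a warning, 3 are critical.
// Any successful check resets the streak.
class HealthTracker {
  private consecutiveFailures = 0;

  record(healthy: boolean): Severity {
    this.consecutiveFailures = healthy ? 0 : this.consecutiveFailures + 1;
    if (this.consecutiveFailures >= 3) return "critical";
    if (this.consecutiveFailures >= 1) return "warning";
    return "ok";
  }
}
```

A poller would call `record()` once per /health probe and page only when the result changes, which avoids repeated notifications while the service stays in the same state.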
Prometheus Alerting Rules
These rules are evaluated by Prometheus; Alertmanager then routes the alerts that fire.
groups:
  - name: grantex
    rules:
      # Warning threshold: more than 5% of token exchanges failing.
      - alert: HighTokenExchangeFailureRate
        expr: |
          sum(rate(grantex_token_exchange_total{status!="success"}[5m]))
            / sum(rate(grantex_token_exchange_total[5m])) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Token exchange failure rate > 5%"
      # Critical threshold: p99 above 2s. Buckets are summed by le so the
      # quantile is computed across all instances.
      - alert: HighAuthLatency
        expr: |
          histogram_quantile(0.99, sum by (le) (rate(grantex_authorize_duration_seconds_bucket[5m]))) > 2
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Authorization p99 latency > 2s"
      # Critical threshold: delivery success below 95% (failure rate above 5%).
      - alert: WebhookDeliveryFailure
        expr: |
          sum(rate(grantex_webhook_deliveries_total{status="failed"}[5m]))
            / sum(rate(grantex_webhook_deliveries_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Webhook delivery failure rate > 5%"
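The ratio expressions above divide the increase in non-success events by the increase in all events over the same window. The sketch below reproduces that arithmetic from two scrapes of a counter, which can help when sanity-checking an alert by hand. It is a simplification: counter resets are handled by counting from zero, and an empty window returns 0 where PromQL would yield no result. The `error` status value in the usage example is made up for illustration.

```typescript
// Counter values keyed by the value of the `status` label.
type CountsByStatus = Record<string, number>;

// Increase of a counter between two scrapes. A drop means the process
// restarted and the counter reset, so the current value is the increase.
function counterIncrease(previous: number, current: number): number {
  return current >= previous ? current - previous : current;
}

// Fraction of events in the window whose status is anything but "success".
function failureRatio(previous: CountsByStatus, current: CountsByStatus): number {
  let total = 0;
  let failed = 0;
  for (const status of Object.keys(current)) {
    const delta = counterIncrease(previous[status] ?? 0, current[status]);
    total += delta;
    if (status !== "success") failed += delta;
  }
  // No traffic in the window: report 0 instead of dividing by zero.
  return total === 0 ? 0 : failed / total;
}
```

For example, going from `{ success: 900, error: 100 }` to `{ success: 990, error: 110 }` gives 10 failures out of 100 events, a ratio of 0.1, which is above the 5% warning threshold.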
Logging
Structured Logging
The auth service uses Pino for JSON-structured logging:
{
  "level": "info",
  "msg": "grant.created",
  "timestamp": "2026-03-01T12:00:00.000Z",
  "grantId": "grnt_abc123",
  "agentId": "ag_def456",
  "principalId": "user_789",
  "scopes": ["calendar:read", "email:send"],
  "latencyMs": 45
}
What to Log
| Event | Log Level | Key Fields |
|---|---|---|
| Grant created | info | grantId, agentId, principalId, scopes |
| Grant revoked | info | grantId, revokedBy, cascadeCount |
| Token exchanged | info | grantId, agentId |
| Token refreshed | info | grantId, agentId |
| Token verification failed | warn | reason, tokenId |
| Auth request denied | warn | agentId, principalId, reason |
| Rate limit hit | warn | ip, endpoint, retryAfter |
| Anomaly detected | warn | type, severity, agentId |
| Webhook delivery failed | error | webhookId, url, statusCode, attempt |
| Database connection error | error | error, pool |
Webhook-Based Monitoring
Subscribe to webhook events for real-time alerting without polling:
import { Grantex } from '@grantex/sdk';

const grantex = new Grantex({ apiKey: process.env.GRANTEX_API_KEY! });

// Register an endpoint that receives the events you want to alert on.
await grantex.webhooks.create({
  url: 'https://your-app.com/webhooks/grantex-alerts',
  events: ['grant.revoked', 'token.issued'],
  secret: process.env.WEBHOOK_SECRET!,
});
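On the receiving side, the shared secret is normally used to check that a delivery really came from the sender before it triggers an alert. This guide does not describe how Grantex signs deliveries, so the sketch below rests on an assumption: the signature is a hex-encoded HMAC-SHA256 of the raw request body, keyed with the webhook secret. The function name is hypothetical; confirm the actual scheme and header name in the webhook reference before relying on it.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// ASSUMPTION: signature = hex(HMAC-SHA256(secret, rawBody)).
// The real Grantex signing scheme may differ.
function verifyWebhookSignature(
  rawBody: string,
  signatureHex: string,
  secret: string,
): boolean {
  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so reject those up front.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}
```

Verify against the raw request body, before any JSON parsing, because re-serializing the payload can change its bytes and invalidate the signature. The constant-time comparison avoids leaking how many leading bytes of a guessed signature were correct.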