## Health Check Endpoint
The auth service exposes a health check at `GET /health` that probes both PostgreSQL and Redis.
### Healthy response (200)
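An illustrative healthy payload (the exact field names are not specified in this section, so treat this shape as an assumption):

```json
{
  "status": "ok",
  "db": "up",
  "redis": "up"
}
```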
### Degraded response (503)
When one or more dependencies are unreachable, the endpoint returns 503 with the failing components:
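An illustrative degraded payload with Redis unreachable (the `failing` array is from this section; the other field is an assumption):

```json
{
  "status": "degraded",
  "failing": ["redis"]
}
```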
The `failing` array can contain `"db"` (PostgreSQL unreachable), `"redis"` (Redis unreachable), or both.
The health endpoint is unauthenticated (`skipAuth: true`) and does not count against rate limits.
## Required Configuration
The auth service validates required environment variables at startup and exits immediately if any are missing. This fail-fast behavior prevents partial startup with a broken configuration.

| Variable | Required | Validated at | Failure mode |
|---|---|---|---|
| `DATABASE_URL` | Yes | Startup | Process exits with error |
| `REDIS_URL` | Yes | Startup | Process exits with error |
| `RSA_PRIVATE_KEY` | Yes* | Startup | Process exits unless `AUTO_GENERATE_KEYS=true` |
| `JWT_ISSUER` | No | Startup | Defaults to `https://grantex.dev` |
\*Set `AUTO_GENERATE_KEYS=true` to auto-generate an ephemeral RSA keypair. Never use this in production.
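A minimal development configuration might look like this (values are placeholders; `AUTO_GENERATE_KEYS` is for local development only):

```shell
# .env for local development -- placeholder values, not for production
DATABASE_URL=postgres://auth:secret@localhost:5432/auth
REDIS_URL=redis://localhost:6379
AUTO_GENERATE_KEYS=true   # ephemeral RSA keypair; never use in production
# JWT_ISSUER defaults to https://grantex.dev when unset
```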
### Full environment variable reference
See the Self-Hosting guide for the complete list of environment variables, including optional Stripe, FIDO, email, and policy engine settings.

## Startup sequence
The auth service boots in this order:

1. OpenTelemetry tracing — initialized first to hook module loading (only if `OTEL_EXPORTER_OTLP_ENDPOINT` is set)
2. RSA key initialization — loads or generates the RSA keypair for JWT signing
3. Ed25519 key initialization — optional, for DID/VC support
4. Database connection — connects to PostgreSQL
5. Migrations — runs all `*.sql` migration files idempotently
6. Redis connection — connects to Redis with lazy connect
7. Seed data — creates dev accounts if `SEED_API_KEY` or `SEED_SANDBOX_KEY` are set
8. HTTP server — starts listening on `PORT` (default 3001)
9. Background workers — webhook delivery worker and anomaly detection worker start polling
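The ordering above can be sketched as a sequential boot routine (a simplified illustration; the step and function names are hypothetical stand-ins, not the service's actual module API):

```typescript
// Simplified boot sketch: steps run strictly in order, and a failure in a
// required step aborts startup (fail-fast, no partial startup).
type Step = { name: string; run: () => Promise<void>; optional?: boolean };

async function boot(steps: Step[]): Promise<string[]> {
  const completed: string[] = [];
  for (const step of steps) {
    try {
      await step.run();
      completed.push(step.name);
    } catch (err) {
      if (step.optional) continue;      // e.g. Ed25519 keys are optional
      console.error(`fatal: ${step.name} failed`, err);
      throw err;                        // abort: required dependency missing
    }
  }
  return completed;
}
```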
## Graceful Shutdown
The auth service handles `SIGTERM` signals for graceful shutdown. When running in Kubernetes or Docker, the container runtime sends `SIGTERM` before force-killing the process.
When OpenTelemetry tracing is enabled, the `SIGTERM` handler flushes pending trace spans before the process exits:
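A sketch of such a handler (`flushTraces` is a hypothetical stand-in for the OpenTelemetry SDK's flush/shutdown call; the service's actual handler may differ):

```typescript
import process from "node:process";

// Flush pending trace spans, then exit. The exit function is injectable
// so the sequencing can be exercised without killing the process.
async function shutdown(
  flushTraces: () => Promise<void>,
  exit: (code: number) => void = process.exit
): Promise<void> {
  try {
    await flushTraces();   // export any pending spans before dying
  } finally {
    exit(0);
  }
}

process.on("SIGTERM", () => {
  void shutdown(async () => { /* e.g. await sdk.shutdown() */ });
});
```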
### Kubernetes configuration
Set a `terminationGracePeriodSeconds` that gives the service enough time to finish in-flight requests:
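For example (a sketch; the resource names and the 30-second value are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: auth-service              # illustrative name
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 30   # time for in-flight requests + trace flush
      containers:
        - name: auth
          readinessProbe:
            httpGet:
              path: /health
              port: 3001          # default PORT
```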
## Database Connection
The auth service uses postgres.js for PostgreSQL connections. The connection is lazily initialized on first use and reused for the lifetime of the process.

### Connection string format
Use `sslmode=require` (or `verify-full` for stricter validation) in production.
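A typical connection string using `sslmode=require` (host and credentials are placeholders):

```
postgres://auth_user:password@db.internal:5432/auth?sslmode=require
```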
### Connection pool behavior
postgres.js manages an internal connection pool. Default settings:

| Setting | Default | Description |
|---|---|---|
| Max connections | 10 | Maximum concurrent connections |
| Idle timeout | 0 (no timeout) | Connections stay open until the process exits |
| Connect timeout | 30s | Time to wait for a new connection |
These defaults can be adjusted in `db/client.ts`.
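The table above corresponds roughly to these postgres.js options (a configuration sketch; the service's actual `db/client.ts` may differ):

```typescript
// Pool options mirroring the defaults above. The shape follows postgres.js
// option names; these are illustrative, not copied from db/client.ts.
const poolOptions = {
  max: 10,             // maximum concurrent connections
  idle_timeout: 0,     // 0 = no idle timeout; connections stay open
  connect_timeout: 30, // seconds to wait when opening a new connection
};
```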
### Monitoring connections
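A query along these lines lists connections from the service (the standard `pg_stat_activity` view; the database name is a placeholder):

```sql
-- Count connections to the auth database, grouped by state
SELECT state, count(*)
FROM pg_stat_activity
WHERE datname = 'auth'   -- placeholder database name
GROUP BY state;
```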
Active connections can be checked via PostgreSQL's `pg_stat_activity` view.

## Redis Connection
The auth service uses ioredis with lazy connect. Redis stores:

- Rate limit counters — sliding-window request counts per IP
- Ephemeral token metadata — in-flight authorization request state
### Reconnection behavior
ioredis automatically reconnects when the Redis connection drops. Default behavior:

- Retries with increasing backoff (attempt count × 50 ms, capped at 2 seconds)
- No maximum retry count — reconnects indefinitely
- Queues commands during disconnection and replays them on reconnect
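This delay curve matches the `retryStrategy` documented as ioredis's default; a custom one can be supplied to change it. A sketch of the default-style strategy:

```typescript
// ioredis calls retryStrategy with the reconnection attempt count and
// retries after the returned number of milliseconds.
function retryDelay(times: number): number {
  return Math.min(times * 50, 2000); // grows with each attempt, capped at 2s
}
```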
### Data durability
Redis is not the source of truth. If Redis data is lost:

- Rate limits reset — clients get fresh windows (no security impact beyond a brief burst)
- In-flight auth requests fail — users must restart the consent flow
- No permanent data is lost — PostgreSQL is the durable store
For high-availability deployments, use Redis Sentinel or Redis Cluster. ioredis supports both modes natively.
## Webhook Delivery
The auth service delivers webhooks with automatic retry and exponential backoff. A background worker polls the `webhook_deliveries` table every 30 seconds for pending deliveries.
### Retry policy
| Attempt | Delay | Cumulative wait |
|---|---|---|
| 1st retry | 30 seconds | 30s |
| 2nd retry | 60 seconds | 1.5 min |
| 3rd retry | 120 seconds | 3.5 min |
| 4th retry | 240 seconds | 7.5 min |
| 5th retry | 480 seconds | 15.5 min |
The delay for the nth retry is `30 * 2^(n-1)` seconds. After 5 failed attempts (configurable via `max_attempts` in the deliveries table), the delivery is marked as failed.
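The schedule in the table can be computed as follows (a sketch of the formula, not the worker's actual code):

```typescript
// Exponential backoff: 30s, 60s, 120s, 240s, 480s for retries 1..5
function retryDelaySeconds(retry: number): number {
  return 30 * 2 ** (retry - 1);
}

const schedule = [1, 2, 3, 4, 5].map(retryDelaySeconds);
```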
### Delivery mechanics
- Timeout: Each delivery attempt has a 10-second timeout
- Success: Any 2xx response marks the delivery as `delivered`
- Failure: Non-2xx responses or network errors trigger a retry
- Signature: Every payload includes an `X-Grantex-Signature` header for HMAC verification
- User-Agent: Requests are sent with `Grantex-Webhooks/0.1`
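Receivers can verify the signature with an HMAC check. A sketch assuming HMAC-SHA256 over the raw body with a hex-encoded digest — the source does not specify the algorithm or encoding, so confirm against the webhook documentation:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verify an X-Grantex-Signature header against the raw request body.
// Assumes HMAC-SHA256 with a hex digest; confirm the actual scheme.
function verifyWebhook(secret: string, rawBody: string, signature: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest("hex");
  const a = Buffer.from(expected, "utf8");
  const b = Buffer.from(signature, "utf8");
  return a.length === b.length && timingSafeEqual(a, b); // constant-time compare
}
```

Always compare against the raw body bytes, not a re-serialized JSON object, since re-serialization can change key order or whitespace and break the digest.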
### Monitoring deliveries
Query delivery status for a webhook endpoint in the `webhook_deliveries` table. Each row records the status (`pending`, `delivered`, `failed`), attempt count, and error details.
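A query along these lines surfaces recent deliveries (column names beyond the table name and status values are assumptions):

```sql
-- Recent delivery attempts for one endpoint (hypothetical column names)
SELECT status, attempts, last_error, updated_at
FROM webhook_deliveries
WHERE endpoint_id = $1
ORDER BY updated_at DESC
LIMIT 50;
```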
## Background Workers
Two background workers run after the HTTP server starts:

| Worker | Interval | Purpose |
|---|---|---|
| Webhook delivery | 30 seconds | Processes pending webhook deliveries with exponential backoff |
| Anomaly detection | Configurable | Scans for unusual patterns (rate spikes, off-hours activity, high failure rates) |
Workers run on `setInterval` timers and execute one initial run immediately on startup.
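That pattern can be sketched as follows (a hypothetical helper, not the service's actual code):

```typescript
// Run a worker immediately, then on a fixed interval.
// Errors are logged with the worker's prefix and never crash the process.
function startWorker(name: string, intervalMs: number, run: () => Promise<void>) {
  const tick = async () => {
    try {
      await run();
    } catch (err) {
      console.error(`[${name}]`, err); // log and wait for the next interval
    }
  };
  void tick();                          // initial run on startup
  return setInterval(tick, intervalMs);
}
```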
### Worker health
Workers log errors to stdout but do not crash the process. If a worker iteration fails, it retries on the next interval. Monitor worker health by checking for `[webhook-delivery]` and `[anomaly-detection]` prefixed log messages.
## Logging
All logs are emitted as structured JSON to stdout, compatible with:

- Datadog — auto-parsed by the Datadog agent
- Grafana Loki — ingestible via Promtail
- AWS CloudWatch Logs — auto-parsed in JSON format
- Google Cloud Logging — structured log entries
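An example log line (field names are illustrative; the actual schema may differ):

```json
{"level":"info","time":"2025-01-01T12:00:00.000Z","msg":"request complete","method":"GET","path":"/health","status":200,"duration_ms":4}
```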
### Log levels
| Level | When |
|---|---|
| `info` | Normal request lifecycle (start, complete) |
| `warn` | Deprecated API usage, approaching rate limits |
| `error` | Failed requests, worker errors, database connection issues |
| `fatal` | Startup failures (missing config, migration errors) |
## Database Migrations
Migrations run automatically on every startup. The auth service reads all `*.sql` files from the `migrations/` directory and executes each one using idempotent DDL (`CREATE TABLE IF NOT EXISTS`, etc.).
To upgrade, restart the service. New migration files are applied automatically. There is no separate migration command or rollback mechanism — migrations are designed to be forward-only and non-destructive.
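An idempotent, forward-only migration in this style might look like (a hypothetical example, not an actual migration file):

```sql
-- 0007_add_webhook_deliveries.sql (hypothetical) -- safe to run repeatedly
CREATE TABLE IF NOT EXISTS webhook_deliveries (
  id          bigserial   PRIMARY KEY,
  endpoint_id bigint      NOT NULL,
  status      text        NOT NULL DEFAULT 'pending',
  attempts    int         NOT NULL DEFAULT 0,
  created_at  timestamptz NOT NULL DEFAULT now()
);

-- Additive changes only: no DROPs, so older code keeps working
ALTER TABLE webhook_deliveries
  ADD COLUMN IF NOT EXISTS last_error text;
```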
## Operational Checklist
- Health check is wired to your load balancer
- All required environment variables are set
- Database connection uses SSL (`sslmode=require`)
- Redis is on a private network with authentication
- Log forwarding is configured (Datadog, Loki, CloudWatch, etc.)
- Prometheus metrics are scraped from `/metrics`
- CPU and memory limits are set in your container orchestrator
- Automated database backups are configured
- Webhook endpoints are monitored for delivery failures
- Alerting rules are set for error rate spikes and health check failures