Production architecture
This page covers two audiences. SaaS customers integrating against a hosted Ledgix Vault only need the first section. Enterprise customers self-hosting Vault should read all three sections.
Integration requirements (SaaS customers)
Most customers do not need Ledgix hosting details. You do need a clean picture of what your own application must have before it can integrate safely.
1. A Vault URL per environment. Use the correct Ledgix URL for development, staging, and production. Do not mix API keys across environments.
2. A tenant API key. Create the key in the customer dashboard and store it in your own secret-management flow. Ledgix shows the raw key only once.
3. Outbound HTTPS access. Your application, worker, or server-side route must be able to reach the Vault URL over TLS.
4. One protected tool boundary. Decide whether your first rollout will call the protected service directly after approval or send the token to an optional gateway in front of that service.
Recommended integration shape
Customer responsibilities
- store LEDGIX_VAULT_API_KEY securely in your own environment
- keep LEDGIX_VAULT_URL aligned with the environment you are deploying to
- make sure server-side code, workers, and background jobs use the same customer settings
- decide who owns manual review and notification routing before going live
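The first two responsibilities can be enforced at startup. A minimal sketch, assuming a Python integration; the helper name `load_ledgix_settings` is illustrative and not part of any Ledgix SDK:

```python
import os

def load_ledgix_settings() -> dict:
    """Fail fast if the Ledgix settings listed above are missing.

    Illustrative sketch: only the environment variable names come from
    this page; the helper itself is not part of any Ledgix SDK.
    """
    settings = {
        "vault_url": os.environ.get("LEDGIX_VAULT_URL", ""),
        "api_key": os.environ.get("LEDGIX_VAULT_API_KEY", ""),
    }
    missing = [name for name, value in settings.items() if not value]
    if missing:
        raise RuntimeError(f"missing Ledgix settings: {missing}")
    if not settings["vault_url"].startswith("https://"):
        # Outbound access must be TLS, so reject plain-HTTP URLs early.
        raise RuntimeError("LEDGIX_VAULT_URL must be an https:// URL")
    return settings
```

Calling this once at process start, in every runtime that performs the protected action, catches the mixed-environment and missing-key failures listed later on this page before any traffic flows.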
Optional gateway decision
You can start without a gateway if the protected action is simple and your application owns the full execution path. Add a gateway when:
- more than one service can trigger the same sensitive action
- you want the protected service to require a Ledgix token centrally
- you need a clean boundary between approval and execution
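The second bullet can be made concrete. A deliberately thin sketch of the gateway boundary, assuming the token arrives as a Bearer header; a real gateway would verify the A-JWT signature against the Vault JWKS rather than merely check for its presence:

```python
def gateway_allows(headers: dict) -> bool:
    """Sketch of the gateway boundary: require a Ledgix token centrally.

    Illustrative only. A production gateway must verify the A-JWT
    signature, claims, and jti against the Vault, not just the header.
    """
    auth = headers.get("Authorization", "")
    return auth.startswith("Bearer ") and len(auth) > len("Bearer ")
```

The point of the sketch is the placement, not the check: once this sits in front of the protected service, every caller, not just your first integration, must present a token before the sensitive action runs.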
Go-live checklist
- The application can reach the Vault URL from every runtime that performs the protected action.
- The correct API key is present in that runtime.
- A policy source has already been uploaded for the first tool you are guarding.
- Reviewers know where pending requests appear and how Slack or email notifications are routed.
- The team has tested at least one approved and one blocked or paused request before production traffic.
Common integration failures
- Wrong Vault URL for the environment.
- Expired, deleted, or misplaced API key.
- Policy content uploaded after the team started testing, which makes early results look inconsistent.
- Threshold and notification settings left at defaults without deciding who handles review.
Runtime architecture (self-hosted enterprise)
A Ledgix production deployment is two services, one TLS ingress, and a backing data plane.
Two-database model
Ledgix deliberately splits control-plane state from tenant state.
- Control plane (Supabase Postgres): memberships, tenant metadata, and the clients table. clients.tenant_secret_ref points at an AWS Secrets Manager entry; the row itself never contains tenant DB passwords, ledger transport keys, or Confluence tokens.
- Tenant databases (one Postgres per tenant): policies (with pgvector embeddings) and the tenant's ledger. Credentials come from Secrets Manager at runtime.
This split is what makes tenant isolation real. A breach of the control plane does not expose tenant policy content or ledger entries.
A-JWT signing
Approval tokens are signed with Ed25519 (EdDSA). Vault publishes the public key at GET /.well-known/jwks.json.
Claims embedded in every A-JWT:
| Claim | Meaning |
|---|---|
| iss | alcv-vault by default (override with VAULT_JWT_ISSUER) |
| aud | ledgix-sdk by default (override with VAULT_JWT_AUDIENCE) |
| exp | iat + VAULT_JWT_TTL (default 300 seconds) |
| jti | Unique token ID. Burned on /consume-token; replay returns 409. |
| tool | The tool the request was approved for |
| agent_id, session_id, policy_id | Context for audit and review |
| decision | yes, no, or review |
| tool_args_hash | SHA-256 of canonical JSON of the approved arguments |
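The tool_args_hash claim can be recomputed client-side to confirm that what executes is exactly what was approved. A sketch that assumes sorted-key, compact-separator JSON as the canonical form, since the table above does not pin down the canonicalization:

```python
import hashlib
import json

def tool_args_hash(args: dict) -> str:
    # Sorted keys + compact separators as the assumed canonical JSON
    # form; the claim table only says "canonical JSON", so treat this
    # canonicalization as an assumption, not the Vault specification.
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because the hash is key-order independent, two services that build the same arguments in different orders still agree on the digest, which is what makes it usable as a tamper check at execution time.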
Merkle ledger and anchoring
Every decision is persisted to the tenant's ledger on DB commit. A background anchor loop sequences accepted events into an append-only Merkle tree, signs the checkpoint, and exports it to S3.
- Durability is synchronous on DB commit.
- Sequencing and anchoring are asynchronous; they run on the VAULT_LEDGER_ANCHOR_BACKFILL_INTERVAL_SECONDS loop (default 30s).
- The anchor bucket must have versioning enabled. Object lock is recommended but not required.
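To make the sequencing step concrete, here is a generic Merkle-root computation over ledger entries. It is illustrative only; the real tree's leaf encoding, domain separation, and odd-node handling are implementation details not documented on this page:

```python
import hashlib

def merkle_root(leaves: list[bytes]) -> bytes:
    """Compute a Merkle root over a batch of ledger entries.

    Generic sketch: hash each leaf, then repeatedly hash adjacent
    pairs until one root remains. Any single changed entry changes
    the root, which is what makes the signed checkpoint tamper-evident.
    """
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate last node on odd levels
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]
```

Signing this root and exporting it to versioned S3 is what lets an auditor later prove that no accepted event was silently rewritten or dropped.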
Async clearance queue
POST /request-clearance is queue-backed so slow judge calls cannot block the HTTP path.
- Default: in-memory queue (VAULT_CLEARANCE_ASYNC_WORKERS, VAULT_CLEARANCE_ASYNC_QUEUE_SIZE).
- Production: point VAULT_CLEARANCE_SQS_QUEUE_URL at an SQS FIFO queue. VAULT_CLEARANCE_SQS_FIFO_GROUP_SHARDS controls parallelism across FIFO groups.
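One way to picture the shard setting: each principal hashes to one of the FIFO message groups, messages within a group are processed in order, and the group count therefore bounds consumer parallelism. The hashing scheme below is an assumption for illustration, not the Vault source:

```python
import hashlib

def fifo_group_id(tenant_id: str, shards: int = 32) -> str:
    """Map a tenant to one of the FIFO message groups (hypothetical).

    SQS FIFO processes each message group in order, so the shard
    count (VAULT_CLEARANCE_SQS_FIFO_GROUP_SHARDS, default 32) caps
    how many clearance requests can be drained concurrently.
    """
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % shards
    return f"clearance-shard-{shard}"
```

Raising the shard count increases throughput at the cost of weaker ordering guarantees across requests that used to share a group.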
Self-hosted configuration reference
Vault reads configuration from environment variables. When AWS_SECRET_NAME is set, Vault pulls the named Secrets Manager bundle on startup and merges it over the environment.
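Reading "merged over the environment" as "bundle values win on collision", the startup merge behaves like this sketch (hypothetical helper, not Vault code):

```python
def effective_config(environ: dict, bundle: dict) -> dict:
    # "Merged over the environment" is read here as: the Secrets
    # Manager bundle wins on key collisions. A sketch of the assumed
    # precedence, not the Vault implementation.
    merged = dict(environ)
    merged.update(bundle)
    return merged
```

Under that precedence, a container can ship only non-sensitive defaults in its environment and rely on the AWS_SECRET_NAME bundle to supply or override everything sensitive at startup.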
Vault — transport and signing
| Field | Type | Required | Description |
|---|---|---|---|
| VAULT_HOST | string | No | Bind address. Defaults to 0.0.0.0. |
| VAULT_PORT | int | No | Listen port. Defaults to 8000. |
| VAULT_SIGNER_BACKEND | enum | No | local or aws_kms. Default local. |
| VAULT_PRIVATE_KEY_FILE | path | Conditional | Ed25519 private key path (local backend). |
| VAULT_PRIVATE_KEY_BASE64 | string | Conditional | Base64-encoded Ed25519 private key (local backend). |
| VAULT_KMS_KEY_ID | string | Conditional | KMS key id (aws_kms backend). |
| VAULT_KEY_ID | string | No | JWKS kid header. Default vault-key-001. |
| VAULT_JWT_TTL | seconds | No | A-JWT validity. Default 300. |
| VAULT_JWT_ISSUER | string | No | iss claim. Default alcv-vault. |
| VAULT_JWT_AUDIENCE | string | No | aud claim. Default ledgix-sdk. |
| VAULT_CORS_ALLOWED_ORIGIN | string | No | CORS origin, e.g. https://dashboard.example.com. |
Vault — data plane
| Field | Type | Required | Description |
|---|---|---|---|
| DATABASE_URL | DSN | Yes | Control plane Postgres DSN. |
| VAULT_CONTROL_PLANE_DB_MAX_OPEN_CONNS | int | No | Control plane pool size. Default 50. |
| VAULT_TENANT_DB_MAX_OPEN_CONNS | int | No | Per-tenant pool size. Default 25. |
| VAULT_TENANT_DB_SSLMODE | enum | No | Per-tenant SSL mode. Default require. |
| TENANT_SECRET_PREFIX | string | Yes | Secrets Manager prefix for per-tenant bundles. Example: ledgix/tenants/prod. |
| VAULT_TENANT_SECRET_CACHE_TTL_SECONDS | seconds | No | Tenant secret cache TTL. Default 300. |
| AWS_SECRET_NAME | string | No | Name of the Vault bootstrap secret. Pulled on startup and merged over env. |
| AWS_REGION | string | No | Default AWS region. Default us-east-1. |
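The tenant secret cache TTL trades Secrets Manager call volume against rotation staleness: within the TTL a rotated secret is served stale, and after it the next lookup refetches. A minimal sketch of that behavior with an injectable clock (the class and method names are illustrative):

```python
import time

class TenantSecretCache:
    """TTL cache in the spirit of VAULT_TENANT_SECRET_CACHE_TTL_SECONDS.

    Illustrative sketch: the default 300s TTL bounds both the Secrets
    Manager request rate and the window during which a rotated tenant
    secret can still be served from cache.
    """

    def __init__(self, ttl_seconds: float = 300.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: dict[str, tuple[float, dict]] = {}

    def get(self, tenant_id: str, fetch) -> dict:
        now = self.clock()
        entry = self._entries.get(tenant_id)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # fresh: no Secrets Manager round trip
        secret = fetch(tenant_id)  # expired or missing: refetch
        self._entries[tenant_id] = (now, secret)
        return secret
```

Shortening the TTL makes rotation take effect faster at the cost of more Secrets Manager traffic per tenant.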
Vault — clearance queue and rate limiting
| Field | Type | Required | Description |
|---|---|---|---|
| VAULT_CLEARANCE_ASYNC_WORKERS | int | No | Workers draining the in-memory queue. Default 16. |
| VAULT_CLEARANCE_ASYNC_QUEUE_SIZE | int | No | In-memory queue depth. Default 2048. |
| VAULT_CLEARANCE_SQS_QUEUE_URL | URL | No | When set, switches async clearance to SQS FIFO. |
| VAULT_CLEARANCE_SQS_FIFO_GROUP_SHARDS | int | No | FIFO group shard count. Default 32. |
| VAULT_CLEARANCE_SQS_WAIT_TIME_SECONDS | seconds | No | Long-poll wait. Default 20. |
| VAULT_CLEARANCE_SQS_VISIBILITY_TIMEOUT_SECONDS | seconds | No | Message visibility timeout. Default 60. |
| VAULT_RATE_LIMIT_RPS | int | No | Per-principal rate limit. Default 20. Set 0 to disable. |
| VAULT_RATE_LIMIT_BURST | int | No | Burst allowance. Default 40. |
| VAULT_RATE_LIMIT_TTL_SECONDS | seconds | No | Rate limit window TTL. Default 180. |
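The RPS and burst defaults describe token-bucket semantics: a refill of 20 tokens per second with a bucket capacity of 40. A sketch of those semantics only; the Vault implementation may differ:

```python
class TokenBucket:
    """Per-principal limiter matching the VAULT_RATE_LIMIT_* defaults.

    Sketch of the semantics: 20 requests/second steady state
    (VAULT_RATE_LIMIT_RPS) with a burst allowance of 40
    (VAULT_RATE_LIMIT_BURST). Not the Vault source.
    """

    def __init__(self, rps: float = 20.0, burst: float = 40.0):
        self.rps = rps
        self.burst = burst
        self.tokens = burst  # start full: a cold principal may burst
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rps)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Setting VAULT_RATE_LIMIT_RPS to 0 disables the limiter entirely, per the table above.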
Vault — ledger anchoring
| Field | Type | Required | Description |
|---|---|---|---|
| VAULT_LEDGER_ANCHOR_BUCKET | string | Conditional | S3 bucket for manifest anchoring. |
| VAULT_LEDGER_ANCHOR_BUCKET_TEMPLATE | string | No | Per-tenant bucket template, e.g. ledgix-anchor-{client_id}. |
| VAULT_LEDGER_ANCHOR_PREFIX | string | No | Object key prefix inside the bucket. |
| VAULT_LEDGER_ANCHOR_REGION | string | No | Bucket region. Falls back to AWS_REGION. |
| VAULT_LEDGER_REQUIRE_BUCKET_VERSIONING | bool | No | Default true. Startup fails if the bucket does not have versioning enabled. |
| VAULT_LEDGER_REQUIRE_OBJECT_LOCK | bool | No | Default false. Enforces S3 object lock when true. |
| VAULT_LEDGER_ANCHOR_BACKFILL_INTERVAL_SECONDS | seconds | No | Checkpoint export cadence. Default 30. |
| VAULT_LEDGER_ANCHOR_BACKFILL_BATCH_SIZE | int | No | Max entries per export batch. Default 500. |
| VAULT_TRANSPORT_KEY_BASE64 | string | Conditional | 32-byte base64 key for ledger entry encryption at rest. |
Vault — judge integration
| Field | Type | Required | Description |
|---|---|---|---|
| VAULT_JUDGE_URL | URL | Yes | llm-judge endpoint. |
| VAULT_JUDGE_API_KEY | string | Yes | Service-to-service key sent as X-API-Key. |
| VAULT_ALLOW_STUB_JUDGE | bool | No | Use the deterministic stub judge (development only). |
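The stub judge's value in development is determinism: the same request always yields the same decision, so tests are repeatable without an LLM call. A hypothetical stand-in with that property (the real stub's rules are not documented on this page):

```python
import hashlib

def stub_judge(tool: str, args_hash: str) -> str:
    """Deterministic stand-in in the spirit of VAULT_ALLOW_STUB_JUDGE.

    Hypothetical: only the decision vocabulary (yes / no / review)
    comes from the A-JWT claims table; the mapping below is invented
    purely to illustrate determinism.
    """
    digest = hashlib.sha256(f"{tool}:{args_hash}".encode("utf-8")).digest()
    return ("yes", "no", "review")[digest[0] % 3]
```

Because the decision depends only on the request, an integration test can assert on exact outcomes, something a real model-backed judge cannot guarantee.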
llm-judge — configuration
| Field | Type | Required | Description |
|---|---|---|---|
| EMBEDDING_MODEL | string | No | LiteLLM embedding model. Default bedrock/amazon.titan-embed-text-v2:0. Changing this requires re-embedding existing policy chunks. |
| EVAL_MODEL | string | No | LiteLLM evaluation model. Default bedrock/amazon.nova-pro-v1:0. |
| JUDGE_API_KEY | string | Yes | Service-to-service API key matching VAULT_JUDGE_API_KEY. |
| JUDGE_ENDPOINT_RATE_LIMIT | string | No | slowapi rate limit string. Default 600/minute. |
| JUDGE_UVICORN_WORKERS | int | No | Worker process count. Default 2. |
| DATABASE_URL | DSN | Yes | Postgres DSN (control plane access for tenant routing). |
| TENANT_SECRET_PREFIX | string | Yes | Must match the Vault prefix. |
| AWS_SECRET_NAME | string | No | Judge bootstrap secret name. |
| LOG_FORMAT | enum | No | json (default, Datadog/CloudWatch) or text. |
Deploy via docker-compose
The reference stack lives in deployments/docker-compose.yml. It launches vault and llm_judge on a private Docker network with Caddy as the TLS ingress.
Run docker compose --env-file .env.prod up -d --build. The .env.prod file only needs non-sensitive values, primarily VAULT_AWS_SECRET_NAME, JUDGE_AWS_SECRET_NAME, TENANT_SECRET_PREFIX, and AWS_REGION. Each container fetches its secret bundle from AWS Secrets Manager on startup.
For brownfield enterprise rollouts onto pre-existing infrastructure, follow deployments/EXISTING_INFRA_ENTERPRISE_ROLLOUT.md rather than the Terraform stack in deployments/terraform/ (which is greenfield only).
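For reference, a .env.prod along these lines is enough to bootstrap the compose stack; the variable names come from this page, while every value shown is a placeholder:

```shell
# Non-sensitive bootstrap values only; real secrets stay in Secrets Manager.
VAULT_AWS_SECRET_NAME=ledgix/vault/prod    # placeholder secret name
JUDGE_AWS_SECRET_NAME=ledgix/judge/prod    # placeholder secret name
TENANT_SECRET_PREFIX=ledgix/tenants/prod   # example prefix from this page
AWS_REGION=us-east-1                       # default region from this page
```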