The Sales Agent exposes a health check endpoint at:
GET /health
This endpoint verifies:
A healthy response returns HTTP 200:
{
"status": "healthy",
"database": "connected"
}
An unhealthy response returns HTTP 503 with details about the failure:
{
"status": "unhealthy",
"database": "connection failed",
"error": "connection to server at \"postgres\" (172.18.0.2), port 5432 failed"
}
| Platform | Health Check Configuration |
|---|---|
| Docker | healthcheck directive in docker-compose.yml: curl -f http://localhost:8080/health |
| Fly.io | [[http_service.checks]] in fly.toml with path = "/health" |
| Cloud Run | Startup probe on port 8080, path /health |
| External monitoring | Point Pingdom, UptimeRobot, or similar at https://yourdomain.com/health |
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
interval: 10s
timeout: 5s
retries: 3
start_period: 30s
The Sales Agent uses Logfire for structured, searchable logging. Logfire provides:
Set the LOGFIRE_TOKEN environment variable to enable Logfire:
LOGFIRE_TOKEN=your-logfire-token
When LOGFIRE_TOKEN is set, the application emits structured log events to the Logfire service. Without this token, logs are written to stdout in a structured format.
| Level | Usage |
|---|---|
| DEBUG | Detailed request tracing, SQL queries, adapter calls |
| INFO | Standard operations: request handled, media buy created, token validated |
| WARNING | Non-critical issues: deprecated API usage, approaching rate limits |
| ERROR | Operation failures: database errors, adapter failures, authentication rejections |
| CRITICAL | System-level failures: database unavailable, encryption key missing |
| Platform | Command |
|---|---|
| Docker | docker compose logs salesagent |
| Docker (follow) | docker compose logs -f salesagent |
| Fly.io | fly logs --app adcp-sales |
| Cloud Run | gcloud run services logs read adcp-sales --region=us-central1 |
| Logfire | Logfire web dashboard at https://logfire.pydantic.dev/ |
Every operation performed through the Sales Agent is recorded in the audit_logs database table. This includes operations from all protocols (MCP, A2A, REST API) and the Admin UI. The audit trail is immutable – records are inserted but never updated or deleted.
| Column | Type | Description |
|---|---|---|
log_id |
String | Unique log entry identifier |
tenant_id |
UUID | Tenant where the operation occurred |
operation |
String | Operation name (e.g., create_media_buy, update_product, sync_creatives) |
principal_id |
UUID | The principal who performed the action (null for admin operations) |
principal_name |
String | Display name of the principal |
adapter_id |
String | The ad server adapter involved in the operation |
success |
Boolean | Whether the operation completed successfully |
details |
JSON | Structured data about the operation (parameters, results) |
error |
String | Error message if the operation failed |
ip_address |
String | Source IP address of the request |
created_at |
Timestamp | When the operation was logged |
| Category | Operations |
|---|---|
| Media Buys | create_media_buy, update_media_buy, get_media_buys, get_media_buy_delivery |
| Creatives | sync_creatives, list_creatives |
| Discovery | get_adcp_capabilities, get_products, list_creative_formats, list_authorized_properties |
| Workflows | workflow_approved, workflow_rejected, workflow_created (Admin UI operations) |
| Performance | update_performance_index |
| Admin | create_tenant, update_tenant, create_principal, update_product, update_settings |
| Authentication | token_validated, token_rejected, token_expired, sso_login, sso_login_failed |
Through the Admin UI, navigate to Activity to view the audit log with filters for:
For direct database queries:
-- Recent failures
SELECT operation, principal_name, error, created_at
FROM audit_logs
WHERE tenant_id = 'your-tenant-id'
AND success = false
ORDER BY created_at DESC
LIMIT 50;
-- Operations by a specific principal
SELECT operation, success, details, created_at
FROM audit_logs
WHERE tenant_id = 'your-tenant-id'
AND principal_id = 'principal-uuid'
ORDER BY created_at DESC;
-- Authentication failures in the last 24 hours
SELECT operation, principal_name, ip_address, error, created_at
FROM audit_logs
WHERE operation IN ('token_rejected', 'token_expired', 'sso_login_failed')
AND created_at > NOW() - INTERVAL '24 hours'
ORDER BY created_at DESC;
The Admin UI includes a real-time activity stream powered by Server-Sent Events (SSE). The stream shows business operations as they happen, without requiring page refreshes.
business_activity_service captures events from all protocols (MCP, A2A, REST API)/admin/activity/stream endpointThe activity stream displays:
If using nginx as a reverse proxy, SSE requires specific configuration to prevent buffering:
location /admin/activity/stream {
proxy_pass http://adcp_backend;
proxy_set_header Host $host;
proxy_http_version 1.1;
proxy_set_header Connection "";
proxy_buffering off;
proxy_cache off;
proxy_read_timeout 86400s;
}
The Sales Agent can send notifications to Slack via webhooks. Three webhook URLs can be configured per tenant:
| Webhook | Setting | Purpose |
|---|---|---|
| General | slack_webhook_url |
General operational notifications (media buy created, campaign launched) |
| Audit | slack_audit_webhook_url |
Security and audit events (authentication failures, configuration changes) |
| Human-in-the-Loop | hitl_webhook_url |
Approval requests requiring human action (budget threshold exceeded, policy violation) |
Configure webhooks in the Admin UI under Settings > Notifications, or set them via the API:
curl -X PATCH https://yourdomain.com/api/v1/tenant/settings \
-H "Authorization: Bearer <admin-token>" \
-H "Content-Type: application/json" \
-d '{
"slack_webhook_url": "https://hooks.slack.com/services/T.../B.../xxx",
"slack_audit_webhook_url": "https://hooks.slack.com/services/T.../B.../yyy",
"hitl_webhook_url": "https://hooks.slack.com/services/T.../B.../zzz"
}'
General notifications include:
Audit notifications include:
Human-in-the-loop notifications include:
The Sales Agent runs two background schedulers that automate campaign lifecycle operations. Both schedulers are started during the application lifespan (in src/core/main.py) and stopped on shutdown. They can be disabled with SKIP_CRON=true.
Module: src/services/media_buy_status_scheduler.py
The Media Buy Status Scheduler automatically transitions media buy statuses based on flight dates. It runs on a configurable interval (default: 60 seconds) and works cross-tenant, querying all active tenants on each cycle.
Automatic status transitions:
| From Status | To Status | Condition |
|---|---|---|
pending_activation |
active |
flight_start_date has been reached and all creatives are approved |
active |
completed |
flight_end_date has passed |
| Setting | Value |
|---|---|
| Default interval | 60 seconds |
| Scope | Cross-tenant (all active tenants) |
| Manual pause | Respected – paused media buys are not auto-transitioned |
| On status change | Updates database, triggers Slack notification, creates audit log entry |
Module: src/services/delivery_webhook_scheduler.py
The Delivery Webhook Scheduler sends periodic delivery reports to configured webhook URLs. It integrates with the protocol webhook service for authentication and implements duplicate prevention.
| Setting | Value |
|---|---|
| Default interval | 3600 seconds (1 hour) |
| Payload | Media buy ID, delivery metrics, status |
| Authentication | Integrates with the protocol webhook service (see below) |
| Duplicate prevention | 24-hour window to avoid re-sending the same report |
| Scope | Respects per-tenant webhook configuration |
Module: src/services/protocol_webhook_service.py
The Protocol Webhook Service sends push notifications for campaign events such as status changes and delivery updates. It handles both A2A TaskStatusUpdateEvent payloads and MCP webhook payloads.
Supported authentication schemes:
| Scheme | Header | Description |
|---|---|---|
| Bearer Token | Authorization: Bearer <token> |
Token-based authentication |
| HMAC-SHA256 | X-Signature |
Signed request body with a shared secret key |
| None | – | Unauthenticated (development only) |
Configuration is managed via PushNotificationConfig:
| Field | Description |
|---|---|
url |
Webhook endpoint URL |
authentication_type |
One of bearer, hmac-sha256, or none |
authentication_token |
Bearer token or HMAC secret key |
localhost URLs to the Docker host address when running inside a container.
Additional behavior:
Monitor database connection health through:
GET /health verifies active database connectivityDATABASE_QUERY_TIMEOUT (default: 30 seconds) prevents long-running queries from blocking the poolDATABASE_CONNECT_TIMEOUT (default: 10 seconds) limits time spent establishing new connectionsFor high-traffic deployments, enable PgBouncer connection pooling:
USE_PGBOUNCER=true
When enabled, the application adjusts its connection handling to be compatible with PgBouncer’s transaction pooling mode.
| Metric | Source | Alert Threshold |
|---|---|---|
| Database connectivity | /health endpoint |
Any non-200 response |
| Query latency | Logfire / application logs | p95 > 5 seconds |
| Connection pool utilization | Database server metrics | > 80% of max connections |
| Active connections | pg_stat_activity |
Approaching max_connections |
| Table bloat | pg_stat_user_tables |
n_dead_tup > 10,000 |
| Disk usage | Database server | > 80% of allocated storage |
-- Active connections by state
SELECT state, count(*)
FROM pg_stat_activity
WHERE datname = 'adcp_sales'
GROUP BY state;
-- Slowest recent queries
SELECT query, calls, mean_exec_time, max_exec_time
FROM pg_stat_statements
WHERE dbid = (SELECT oid FROM pg_database WHERE datname = 'adcp_sales')
ORDER BY mean_exec_time DESC
LIMIT 10;
-- Table sizes
SELECT relname, pg_size_pretty(pg_total_relation_size(relid))
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;
Set up external monitoring to poll the /health endpoint and alert on failures:
| Tool | Configuration |
|---|---|
| UptimeRobot | HTTP monitor on https://yourdomain.com/health, check every 1 minute, alert on 2 consecutive failures |
| Pingdom | HTTPS check on /health, 1-minute interval |
| AWS CloudWatch | Synthetic canary hitting /health endpoint |
| GCP Cloud Monitoring | Uptime check on Cloud Run service URL |
Monitor the audit log for elevated error rates:
-- Error rate in the last hour
SELECT
COUNT(*) FILTER (WHERE success = false) AS errors,
COUNT(*) AS total,
ROUND(100.0 * COUNT(*) FILTER (WHERE success = false) / COUNT(*), 2) AS error_pct
FROM audit_logs
WHERE created_at > NOW() - INTERVAL '1 hour';
Set alerts when:
create_media_buy with success = false)Monitor media buy budget utilization:
hitl_webhook_url alert operators to pending approvals| Alert | Condition | Severity | Channel |
|---|---|---|---|
| Health check failure | /health returns non-200 for 2+ checks |
Critical | PagerDuty / SMS |
| High error rate | > 5% failures in 15 minutes | High | Slack (slack_webhook_url) |
| Authentication spike | > 10 auth failures in 5 minutes | High | Slack (slack_audit_webhook_url) |
| Database connection failure | Health check reports database unhealthy | Critical | PagerDuty / SMS |
| Disk usage high | Database disk > 80% | Medium | Email / Slack |
| Pending approvals | Workflow tasks pending > 24 hours | Medium | Slack (hitl_webhook_url) |
| Certificate expiry | TLS certificate expires in < 14 days | Medium |