Audit Log, Live Feed, and Analytics¶
You want three things simultaneously:
- Audit: an append-only, tamper-evident-ish record of what happened (requests, approvals, operations).
- Live feed: a real-time stream for UI/monitoring (pending approvals, durations, outcomes).
- Analytics + semantic search: fast queries and “find similar events” without leaking secret material.
This doc describes a broker-first approach that keeps Opaque a secrets broker, not a secret store.
1. Storage Strategy (Layered)¶
System of record: SQLite (transactional)¶
- Primary store for:
- audit events (append-only)
- device pairings (public keys)
- client identities (exe hashes, uid/gid)
- provider metadata (non-secret config)
- profiles (name -> secret refs)
SQLite gives durability, migrations, constraints, and low operational overhead.
Analytics store: Arrow/Parquet dataset¶
- Periodically (or continuously) export/roll up audit events to Parquet files with a stable Arrow schema.
- This enables:
- DuckDB queries
- DataFusion queries
- Python/R/BI tooling
Parquet is an Arrow-friendly, columnar, compressible format for long-term history.
Semantic index: LanceDB (Arrow-native)¶
- Build an embeddings index over sanitized event text (no secret values, no raw locators).
- Store:
  - `event_id`
  - `ts`
  - `event_text` (sanitized)
  - `embedding` vector
LanceDB is a good fit specifically because it is Arrow-native and optimized for vector search.
2. Redaction Policy (Critical)¶
Audit/feeds become an exfiltration path if they contain sensitive data and are accessible to untrusted agent runtimes.
Rules:
- Never log plaintext secrets.
- Prefer not to log full secret locators (e.g., full Vault paths or 1Password item names) in LLM-visible channels.
- Treat these as sensitive metadata:
- secret ref locators
- repository names (sometimes)
- cluster names/namespaces (sometimes)
- exact URLs and response bodies from authenticated HTTP proxy ops
Recommended split:
- Human audit stream: richer detail (still no values).
- Agent audit stream: heavily minimized (operation name + high-level target category + outcome).
Enforce this by separating:
- transport (separate sockets/endpoints) and/or
- authorization (role gating per client identity).
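One way to keep full locators out of LLM-visible channels while still allowing correlation is to log a stable hashed reference plus a minimized event line. The sketch below is illustrative, not the actual implementation: `secret_ref_id` and `agent_event_line` are hypothetical helpers, and a production build should derive ref ids with a keyed cryptographic hash (e.g. HMAC-SHA-256) rather than std's `DefaultHasher`, which is only stable within one process.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Derive a stable, non-reversible id for a secret ref locator so audit
// events can correlate operations without exposing the locator itself.
// DefaultHasher is only stable within one process; a real implementation
// should use a keyed cryptographic hash (e.g. HMAC-SHA-256).
fn secret_ref_id(locator: &str) -> String {
    let mut h = DefaultHasher::new();
    locator.hash(&mut h);
    format!("ref-{:016x}", h.finish())
}

// Minimized, agent-visible form of an event: operation name,
// high-level target category, and outcome only.
fn agent_event_line(operation: &str, target_category: &str, outcome: &str) -> String {
    format!("{operation} target={target_category} outcome={outcome}")
}

fn main() {
    let id = secret_ref_id("vault://prod/payments/api-key");
    println!("{}", agent_event_line("github.set_actions_secret", "repo", "ok"));
    println!("{id}");
}
```

The human audit stream can keep the mapping from `ref-…` back to the locator behind role gating; the agent stream only ever sees the hash.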
3. Event Model¶
Event taxonomy (suggested)¶
- `request.received`
- `policy.denied`
- `approval.required`
- `approval.presented`
- `approval.granted`
- `approval.denied`
- `operation.started`
- `operation.succeeded`
- `operation.failed`
- `provider.fetch.started` / `provider.fetch.finished` (metadata only)
Correlation IDs¶
Every operation should carry a correlation chain:
- `request_id`: end-to-end idempotency key from the client, or generated by the daemon
- `approval_id`: approval request id (may be multiple if step-up)
- `event_id`: unique per event
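The daemon-side fallback for a missing `request_id` can be sketched as below. The timestamp-plus-counter scheme is purely illustrative (a real daemon would more likely use UUIDs, as the event model suggests), but it shows the desired properties: unique per process and roughly sortable by time.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

// Process-local sequence number so two ids in the same millisecond differ.
static SEQ: AtomicU64 = AtomicU64::new(0);

// Fallback request_id when the client did not supply one: millisecond
// timestamp plus a counter. Illustrative only -- a real daemon would
// likely generate a UUID here instead.
fn generate_request_id() -> String {
    let ms = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("clock before epoch")
        .as_millis();
    let n = SEQ.fetch_add(1, Ordering::Relaxed);
    format!("req-{ms:x}-{n:04x}")
}

fn main() {
    println!("{}", generate_request_id());
    println!("{}", generate_request_id());
}
```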
Minimal event fields (conceptual)¶
```rust
struct AuditEvent {
    event_id: String,          // uuid
    ts_utc_ms: i64,
    level: String,             // info|warn|error
    kind: String,              // request.received, approval.granted, ...
    request_id: Option<String>,
    approval_id: Option<String>,
    client: ClientSummary,     // observed uid/gid + exe hash + optional codesign
    operation: Option<String>, // github.set_actions_secret, k8s.set_secret, ...
    target: Option<TargetSummary>,
    outcome: Option<String>,   // ok|denied|error
    latency_ms: Option<i64>,   // approval latency, op latency, etc.

    // Optional and sensitive: store only when explicitly enabled.
    location: Option<Location>,

    // No secret values.
    // Avoid full locators by default; use stable ids or hashed references.
    secret_names: Vec<String>,   // e.g. ["JWT", "DATABASE_URL"]
    secret_ref_ids: Vec<String>, // e.g. hashed refs or profile keys
}
```
Location (optional, privacy-sensitive)¶
Location can mean multiple things:
- iOS approval device location (requires explicit permission)
- network info (LAN IP, WiFi SSID) is often more sensitive than helpful
Recommendation:
- default: `location = None`
- opt-in: store coarse location only:
  - country/region if available, or
  - geohash with low precision, or
  - just “network = home/office” tags from user config
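The "network = home/office" option reduces to a user-config lookup. A minimal sketch, where the SSID-to-tag map stands in for whatever structure the user config actually provides:

```rust
use std::collections::HashMap;

// Opt-in coarse location: map the current network to a user-configured
// tag ("home"/"office") instead of storing SSIDs or IPs in audit events.
// Unknown networks and missing SSIDs both yield None, i.e. no location.
fn coarse_location(tags: &HashMap<String, String>, current_ssid: Option<&str>) -> Option<String> {
    let ssid = current_ssid?;
    tags.get(ssid).map(|tag| format!("network = {tag}"))
}

fn main() {
    let mut tags = HashMap::new();
    tags.insert("MyHomeAP".to_string(), "home".to_string());
    println!("{:?}", coarse_location(&tags, Some("MyHomeAP")));
    println!("{:?}", coarse_location(&tags, Some("CoffeeShop")));
    println!("{:?}", coarse_location(&tags, None));
}
```

Note that the raw SSID never reaches the audit event; only the tag does.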
4. Live Feed¶
Feed requirements¶
- near-real-time UI updates:
- new requests
- pending approvals
- granted/denied
- operation outcomes
- filtering:
- by repo/project/cluster
- by operation kind
- by client identity
Implementation shape¶
Internally:
- append each event to SQLite
- publish each event to an in-memory pubsub (e.g. `tokio::broadcast`)
Externally (pick one or more):
- local SQLite query via CLI (`opaque audit tail`)
- HTTP `localhost` endpoint using SSE for web/desktop UI (`/audit/stream`)
- (later) Arrow Flight / FlightSQL stream for Arrow-native consumers
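The internal pubsub can be approximated with std channels for illustration. This is a sketch, not the real implementation: the daemon uses `tokio::broadcast`, which similarly drops events for lagging receivers instead of blocking the publisher, so the durable SQLite append stays the source of truth.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

// Fan-out hub approximating broadcast semantics with std channels:
// each subscriber gets its own bounded queue, and a slow subscriber
// loses events rather than blocking the audit writer.
struct AuditFeed {
    subscribers: Vec<SyncSender<String>>,
    capacity: usize,
}

impl AuditFeed {
    fn new(capacity: usize) -> Self {
        AuditFeed { subscribers: Vec::new(), capacity }
    }

    fn subscribe(&mut self) -> Receiver<String> {
        let (tx, rx) = sync_channel(self.capacity);
        self.subscribers.push(tx);
        rx
    }

    // Publish after the event is durably appended to SQLite.
    fn publish(&mut self, event_json: &str) {
        self.subscribers.retain(|tx| match tx.try_send(event_json.to_string()) {
            Ok(()) => true,
            Err(TrySendError::Full(_)) => true,          // slow consumer: drop this event
            Err(TrySendError::Disconnected(_)) => false, // consumer gone: unsubscribe
        });
    }
}

fn main() {
    let mut feed = AuditFeed::new(16);
    let rx = feed.subscribe();
    feed.publish(r#"{"kind":"approval.granted"}"#);
    println!("{}", rx.recv().unwrap());
}
```

Because the feed is lossy by design, consumers that must not miss events (the SSE endpoint, exporters) should resync from SQLite using `since_ms` rather than trusting the channel alone.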
Current SSE runtime:
- disabled by default
- enable with `OPAQUE_AUDIT_SSE_ADDR=127.0.0.1:8787`
- optional tuning: `OPAQUE_AUDIT_SSE_POLL_MS`, `OPAQUE_AUDIT_SSE_BATCH_LIMIT`
- endpoint: `GET /audit/stream?since_ms=<unix_ms>`
Preventing side channels¶
Make sure untrusted agent clients cannot subscribe to the human feed by default.
5. Analytics¶
Built-in metrics (daemon can compute)¶
- approvals:
- count granted/denied
- median approval latency
- step-up frequency (`local_bio` only vs `local_bio+ios_faceid`)
- operations:
- success/error rates by operation
- p95 latency by operation kind
- top targets (repo/project/cluster)
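The latency metrics above reduce to percentile computations over per-operation samples: p50 gives the median approval latency, p95 the tail latency by operation kind. A minimal nearest-rank sketch (the helper name is illustrative):

```rust
// Nearest-rank percentile over latency samples (sorts in place).
// p = 50.0 -> median, p = 95.0 -> p95. Returns None for no samples.
fn percentile_ms(samples: &mut [i64], p: f64) -> Option<i64> {
    if samples.is_empty() {
        return None;
    }
    samples.sort_unstable();
    let rank = ((p / 100.0) * samples.len() as f64).ceil() as usize;
    Some(samples[rank.max(1) - 1])
}

fn main() {
    let mut approval_latency = vec![1_200, 800, 15_000, 950, 700];
    println!("median = {:?}", percentile_ms(&mut approval_latency, 50.0));
    println!("p95    = {:?}", percentile_ms(&mut approval_latency, 95.0));
}
```

In the daemon these samples would be grouped by operation kind (e.g. a map from `kind` to its latency vector) before computing percentiles.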
Columnar analytics with Arrow/Parquet + DuckDB¶
For long-term analysis:
- export audit events to Parquet partitions:
  - partition by date (`dt=YYYY-MM-DD`)
  - optionally partition by `operation_family`
DuckDB can query these locally with high performance, including joins, group-bys, and window functions.
6. Semantic Search¶
What to embed¶
Only embed a sanitized textual summary, e.g.:
"approval denied for github.set_actions_secret repo=org/repo env=prod secret=JWT client=claude-code"
Never embed:
- secret values
- access tokens
- raw HTTP bodies
- full secret ref locators if you consider them sensitive
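Deriving the embedded summary is safest as a whitelist, not a redaction pass over arbitrary text: build the string only from fields already deemed low-sensitivity, so secrets cannot leak in by accident. A sketch that reproduces the example summary above (the parameter list is illustrative; the real function would take an `AuditEvent`):

```rust
// Build the embedded summary from a whitelist of low-sensitivity fields.
// Secret values, tokens, HTTP bodies, and full locators never enter this
// string by construction, because they are never passed in.
fn event_text_sanitized(
    outcome: &str,
    operation: &str,
    repo: &str,
    env: &str,
    secret_name: &str,
    client: &str,
) -> String {
    format!("approval {outcome} for {operation} repo={repo} env={env} secret={secret_name} client={client}")
}

fn main() {
    let text = event_text_sanitized(
        "denied",
        "github.set_actions_secret",
        "org/repo",
        "prod",
        "JWT",
        "claude-code",
    );
    println!("{text}");
}
```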
Indexing pipeline¶
- emit `AuditEvent`
- derive `event_text_sanitized`
- compute the embedding asynchronously (so approvals/operations are not blocked)
- upsert into LanceDB with `event_id` as primary key
Queries¶
- `audit.search_semantic(query, limit)` returns:
  - event_ids + similarity scores + short snippet
- then fetch details from SQLite (role-gated and redacted appropriately)
7. Retention and Backpressure¶
The audit log can grow without bound.
Implemented¶
- Periodic cleanup: the writer thread deletes events older than
retention_daysevery 6 hours (configurable viaaudit_cleanup_interval_secsin config.toml), using batched deletes of 5,000 rows to avoid stalling the writer - Incremental vacuum: after each cleanup pass,
PRAGMA incremental_vacuum(500)reclaims freed pages without a full VACUUM - Auto-vacuum: new databases are created with
PRAGMA auto_vacuum = INCREMENTAL; existing databases log an info message suggesting a one-time VACUUM migration - Disk-backed overflow queue: when the bounded channel (4,096 events) fills, events spill to a separate
audit.overflow.dbfile (up to 100k events) rather than being dropped; the writer thread drains overflow events back into the main DB after each batch - Push-based SSE: the writer thread signals an
AuditNotifyhandle after each successful insert, waking SSE consumers immediately instead of polling on a fixed interval; SSE falls back to polling when no notify handle is available
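The batched-delete behavior can be illustrated with an in-memory stand-in. Assumptions in this sketch: the cutoff is `now_ms - retention_days * 86_400_000`, and the real cleanup issues SQL deletes per batch (likely via the `DELETE ... WHERE rowid IN (SELECT rowid ... LIMIT 5000)` pattern, since plain `DELETE ... LIMIT` requires a non-default SQLite build). The `cleanup_pass` function itself is hypothetical.

```rust
// In-memory stand-in for the cleanup pass: delete events older than the
// retention cutoff in fixed-size batches so one huge delete never stalls
// the writer thread (which also needs to drain the overflow queue).
fn cleanup_pass(event_ts_ms: &mut Vec<i64>, cutoff_ms: i64, batch: usize) -> usize {
    let mut total_deleted = 0;
    loop {
        let before = event_ts_ms.len();
        let mut removed_this_batch = 0;
        event_ts_ms.retain(|&ts| {
            if ts < cutoff_ms && removed_this_batch < batch {
                removed_this_batch += 1;
                false // delete: expired and within this batch's budget
            } else {
                true // keep (or defer to a later batch)
            }
        });
        total_deleted += before - event_ts_ms.len();
        if before - event_ts_ms.len() < batch {
            break; // final (partial) batch: nothing expired is left
        }
        // between batches the real writer yields, e.g. to drain overflow
    }
    total_deleted
}

fn main() {
    let mut events: Vec<i64> = (0..12).map(|i| i * 1_000).collect();
    let deleted = cleanup_pass(&mut events, 7_000, 5);
    println!("deleted={deleted} remaining={}", events.len());
}
```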
Recommended (future)¶
- Older events rolled to Parquet and optionally removed from SQLite
- Embeddings store follows the same retention window