Audit Log, Live Feed, and Analytics¶
You want three things simultaneously:
- Audit: an append-only, tamper-evident-ish record of what happened (requests, approvals, operations).
- Live feed: a real-time stream for UI/monitoring (pending approvals, durations, outcomes).
- Analytics + semantic search: fast queries and “find similar events” without leaking secret material.
This doc describes a broker-first approach that keeps Opaque a secrets broker, not a secret store.
1. Storage Strategy (Layered)¶
System of record: SQLite (transactional)¶
- Primary store for:
- audit events (append-only)
- device pairings (public keys)
- client identities (exe hashes, uid/gid)
- provider metadata (non-secret config)
- profiles (name -> secret refs)
SQLite gives durability, migrations, constraints, and low operational overhead.
Analytics store: Arrow/Parquet dataset¶
- Periodically (or continuously) export/roll up audit events to Parquet files with a stable Arrow schema.
- This enables:
- DuckDB queries
- DataFusion queries
- Python/R/BI tooling
Parquet is an Arrow-friendly, columnar, compressible format for long-term history.
Semantic index: LanceDB (Arrow-native)¶
- Build an embeddings index over sanitized event text (no secret values, no raw locators).
- Store:
  - `event_id`
  - `ts`
  - `event_text` (sanitized)
  - `embedding` vector
LanceDB is a good fit specifically because it is Arrow-native and optimized for vector search.
2. Redaction Policy (Critical)¶
Audit/feeds become an exfiltration path if they contain sensitive data and are accessible to untrusted agent runtimes.
Rules:
- Never log plaintext secrets.
- Prefer not to log full secret locators (e.g., full Vault paths or 1Password item names) in LLM-visible channels.
- Treat these as sensitive metadata:
- secret ref locators
- repository names (sometimes)
- cluster names/namespaces (sometimes)
- exact URLs and response bodies from authenticated HTTP proxy ops
Recommended split:
- Human audit stream: richer detail (still no values).
- Agent audit stream: heavily minimized (operation name + high-level target category + outcome).
Enforce this by separating:
- transport (separate sockets/endpoints) and/or
- authorization (role gating per client identity).
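One way to keep full locators out of LLM-visible channels while still allowing correlation is to log a stable hashed reference plus a minimized event line. The sketch below is illustrative, not the actual implementation: `secret_ref_id` and `agent_event_line` are hypothetical helpers, and a production build should derive ref ids with a keyed cryptographic hash (e.g. HMAC-SHA-256) rather than std's `DefaultHasher`, which is only stable within one process.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Derive a stable, non-reversible id for a secret ref locator so audit
// events can correlate operations without exposing the locator itself.
// DefaultHasher is only stable within one process; a real implementation
// should use a keyed cryptographic hash (e.g. HMAC-SHA-256).
fn secret_ref_id(locator: &str) -> String {
    let mut h = DefaultHasher::new();
    locator.hash(&mut h);
    format!("ref-{:016x}", h.finish())
}

// Minimized, agent-visible form of an event: operation name,
// high-level target category, and outcome only.
fn agent_event_line(operation: &str, target_category: &str, outcome: &str) -> String {
    format!("{operation} target={target_category} outcome={outcome}")
}

fn main() {
    let id = secret_ref_id("vault://prod/payments/api-key");
    println!("{}", agent_event_line("github.set_actions_secret", "repo", "ok"));
    println!("{id}");
}
```

The human audit stream can keep the mapping from `ref-…` back to the locator behind role gating; the agent stream only ever sees the hash.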
3. Event Model¶
Event taxonomy (suggested)¶
- `request.received`
- `policy.denied`
- `approval.required`
- `approval.presented`
- `approval.granted`
- `approval.denied`
- `operation.started`
- `operation.succeeded`
- `operation.failed`
- `provider.fetch.started` / `provider.fetch.finished` (metadata only)
Correlation IDs¶
Every operation should carry a correlation chain:
- `request_id`: end-to-end idempotency key from the client, or generated by the daemon
- `approval_id`: approval request id (may be multiple if step-up)
- `event_id`: unique per event
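The daemon-side fallback for a missing `request_id` can be sketched as below. The timestamp-plus-counter scheme is purely illustrative (a real daemon would more likely use UUIDs, as the event model suggests), but it shows the desired properties: unique per process and roughly sortable by time.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

// Process-local sequence number so two ids in the same millisecond differ.
static SEQ: AtomicU64 = AtomicU64::new(0);

// Fallback request_id when the client did not supply one: millisecond
// timestamp plus a counter. Illustrative only -- a real daemon would
// likely generate a UUID here instead.
fn generate_request_id() -> String {
    let ms = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("clock before epoch")
        .as_millis();
    let n = SEQ.fetch_add(1, Ordering::Relaxed);
    format!("req-{ms:x}-{n:04x}")
}

fn main() {
    println!("{}", generate_request_id());
    println!("{}", generate_request_id());
}
```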
Minimal event fields (conceptual)¶
```rust
struct AuditEvent {
    event_id: String,          // uuid
    ts_utc_ms: i64,
    level: String,             // info|warn|error
    kind: String,              // request.received, approval.granted, ...
    request_id: Option<String>,
    approval_id: Option<String>,
    client: ClientSummary,     // observed uid/gid + exe hash + optional codesign
    operation: Option<String>, // github.set_actions_secret, k8s.set_secret, ...
    target: Option<TargetSummary>,
    outcome: Option<String>,   // ok|denied|error
    latency_ms: Option<i64>,   // approval latency, op latency, etc.

    // Optional and sensitive: store only when explicitly enabled.
    location: Option<Location>,

    // No secret values.
    // Avoid full locators by default; use stable ids or hashed references.
    secret_names: Vec<String>,   // e.g. ["JWT", "DATABASE_URL"]
    secret_ref_ids: Vec<String>, // e.g. hashed refs or profile keys
}
```
Location (optional, privacy-sensitive)¶
Location can mean multiple things:
- iOS approval device location (requires explicit permission)
- network info (LAN IP, WiFi SSID) is often more sensitive than helpful
Recommendation:
- default: `location = None`
- opt-in: store coarse location only:
  - country/region if available, or
  - geohash with low precision, or
  - just “network = home/office” tags from user config
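The "network = home/office" option reduces to a user-config lookup. A minimal sketch, where the SSID-to-tag map stands in for whatever structure the user config actually provides:

```rust
use std::collections::HashMap;

// Opt-in coarse location: map the current network to a user-configured
// tag ("home"/"office") instead of storing SSIDs or IPs in audit events.
// Unknown networks and missing SSIDs both yield None, i.e. no location.
fn coarse_location(tags: &HashMap<String, String>, current_ssid: Option<&str>) -> Option<String> {
    let ssid = current_ssid?;
    tags.get(ssid).map(|tag| format!("network = {tag}"))
}

fn main() {
    let mut tags = HashMap::new();
    tags.insert("MyHomeAP".to_string(), "home".to_string());
    println!("{:?}", coarse_location(&tags, Some("MyHomeAP")));
    println!("{:?}", coarse_location(&tags, Some("CoffeeShop")));
    println!("{:?}", coarse_location(&tags, None));
}
```

Note that the raw SSID never reaches the audit event; only the tag does.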
4. Live Feed¶
Feed requirements¶
- near-real-time UI updates:
- new requests
- pending approvals
- granted/denied
- operation outcomes
- filtering:
- by repo/project/cluster
- by operation kind
- by client identity
Implementation shape¶
Internally:
- append each event to SQLite
- publish each event to an in-memory pubsub (e.g. `tokio::broadcast`)
Externally (pick one or more):
- local SQLite query via CLI (`opaque audit tail`)
- HTTP `localhost` endpoint using SSE for web/desktop UI (`/audit/stream`)
- (later) Arrow Flight / FlightSQL stream for Arrow-native consumers
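The internal pubsub can be approximated with std channels for illustration. This is a sketch, not the real implementation: the daemon uses `tokio::broadcast`, which similarly drops events for lagging receivers instead of blocking the publisher, so the durable SQLite append stays the source of truth.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

// Fan-out hub approximating broadcast semantics with std channels:
// each subscriber gets its own bounded queue, and a slow subscriber
// loses events rather than blocking the audit writer.
struct AuditFeed {
    subscribers: Vec<SyncSender<String>>,
    capacity: usize,
}

impl AuditFeed {
    fn new(capacity: usize) -> Self {
        AuditFeed { subscribers: Vec::new(), capacity }
    }

    fn subscribe(&mut self) -> Receiver<String> {
        let (tx, rx) = sync_channel(self.capacity);
        self.subscribers.push(tx);
        rx
    }

    // Publish after the event is durably appended to SQLite.
    fn publish(&mut self, event_json: &str) {
        self.subscribers.retain(|tx| match tx.try_send(event_json.to_string()) {
            Ok(()) => true,
            Err(TrySendError::Full(_)) => true,          // slow consumer: drop this event
            Err(TrySendError::Disconnected(_)) => false, // consumer gone: unsubscribe
        });
    }
}

fn main() {
    let mut feed = AuditFeed::new(16);
    let rx = feed.subscribe();
    feed.publish(r#"{"kind":"approval.granted"}"#);
    println!("{}", rx.recv().unwrap());
}
```

Because the feed is lossy by design, consumers that must not miss events (the SSE endpoint, exporters) should resync from SQLite using `since_ms` rather than trusting the channel alone.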
Current SSE runtime:
- disabled by default
- enable with `OPAQUE_AUDIT_SSE_ADDR=127.0.0.1:8787`
- optional tuning: `OPAQUE_AUDIT_SSE_POLL_MS`, `OPAQUE_AUDIT_SSE_BATCH_LIMIT`
- endpoint: `GET /audit/stream?since_ms=<unix_ms>`
Preventing side channels¶
Make sure untrusted agent clients cannot subscribe to the human feed by default.
5. Analytics¶
Built-in metrics (daemon can compute)¶
- approvals:
- count granted/denied
- median approval latency
- step-up frequency (`local_bio` only vs `local_bio+ios_faceid`)
- operations:
- success/error rates by operation
- p95 latency by operation kind
- top targets (repo/project/cluster)
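The latency metrics above reduce to percentile computations over per-operation samples: p50 gives the median approval latency, p95 the tail latency by operation kind. A minimal nearest-rank sketch (the helper name is illustrative):

```rust
// Nearest-rank percentile over latency samples (sorts in place).
// p = 50.0 -> median, p = 95.0 -> p95. Returns None for no samples.
fn percentile_ms(samples: &mut [i64], p: f64) -> Option<i64> {
    if samples.is_empty() {
        return None;
    }
    samples.sort_unstable();
    let rank = ((p / 100.0) * samples.len() as f64).ceil() as usize;
    Some(samples[rank.max(1) - 1])
}

fn main() {
    let mut approval_latency = vec![1_200, 800, 15_000, 950, 700];
    println!("median = {:?}", percentile_ms(&mut approval_latency, 50.0));
    println!("p95    = {:?}", percentile_ms(&mut approval_latency, 95.0));
}
```

In the daemon these samples would be grouped by operation kind (e.g. a map from `kind` to its latency vector) before computing percentiles.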
Columnar analytics with Arrow/Parquet + DuckDB¶
For long-term analysis:
- export audit events to Parquet partitions:
  - partition by date (`dt=YYYY-MM-DD`)
  - optionally partition by `operation_family`
DuckDB can query these locally with high performance, including joins, group-bys, and window functions.
6. Semantic Search¶
What to embed¶
Only embed a sanitized textual summary, e.g.:
"approval denied for github.set_actions_secret repo=org/repo env=prod secret=JWT client=claude-code"
Never embed:
- secret values
- access tokens
- raw HTTP bodies
- full secret ref locators if you consider them sensitive
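Deriving the embedded summary is safest as a whitelist, not a redaction pass over arbitrary text: build the string only from fields already deemed low-sensitivity, so secrets cannot leak in by accident. A sketch that reproduces the example summary above (the parameter list is illustrative; the real function would take an `AuditEvent`):

```rust
// Build the embedded summary from a whitelist of low-sensitivity fields.
// Secret values, tokens, HTTP bodies, and full locators never enter this
// string by construction, because they are never passed in.
fn event_text_sanitized(
    outcome: &str,
    operation: &str,
    repo: &str,
    env: &str,
    secret_name: &str,
    client: &str,
) -> String {
    format!("approval {outcome} for {operation} repo={repo} env={env} secret={secret_name} client={client}")
}

fn main() {
    let text = event_text_sanitized(
        "denied",
        "github.set_actions_secret",
        "org/repo",
        "prod",
        "JWT",
        "claude-code",
    );
    println!("{text}");
}
```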
Indexing pipeline¶
- emit `AuditEvent`
- derive `event_text_sanitized`
- compute the embedding asynchronously (so approvals/operations are not blocked)
- upsert into LanceDB with `event_id` as primary key
Queries¶
- `audit.search_semantic(query, limit)` returns:
  - event_ids + similarity scores + short snippet
- then fetch details from SQLite (role-gated and redacted appropriately)
7. Retention and Backpressure¶
The audit log can grow without bound.
Implemented¶
- Periodic cleanup: the writer thread deletes events older than
retention_daysevery 6 hours (configurable viaaudit_cleanup_interval_secsin config.toml), using batched deletes of 5,000 rows to avoid stalling the writer - Incremental vacuum: after each cleanup pass,
PRAGMA incremental_vacuum(500)reclaims freed pages without a full VACUUM - Auto-vacuum: new databases are created with
PRAGMA auto_vacuum = INCREMENTAL; existing databases log an info message suggesting a one-time VACUUM migration - Disk-backed overflow queue: when the bounded channel (4,096 events) fills, events spill to a separate
audit.overflow.dbfile (up to 100k events) rather than being dropped; the writer thread drains overflow events back into the main DB after each batch - Push-based SSE: the writer thread signals an
AuditNotifyhandle after each successful insert, waking SSE consumers immediately instead of polling on a fixed interval; SSE falls back to polling when no notify handle is available
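The batched-delete behavior can be illustrated with an in-memory stand-in. Assumptions in this sketch: the cutoff is `now_ms - retention_days * 86_400_000`, and the real cleanup issues SQL deletes per batch (likely via the `DELETE ... WHERE rowid IN (SELECT rowid ... LIMIT 5000)` pattern, since plain `DELETE ... LIMIT` requires a non-default SQLite build). The `cleanup_pass` function itself is hypothetical.

```rust
// In-memory stand-in for the cleanup pass: delete events older than the
// retention cutoff in fixed-size batches so one huge delete never stalls
// the writer thread (which also needs to drain the overflow queue).
fn cleanup_pass(event_ts_ms: &mut Vec<i64>, cutoff_ms: i64, batch: usize) -> usize {
    let mut total_deleted = 0;
    loop {
        let before = event_ts_ms.len();
        let mut removed_this_batch = 0;
        event_ts_ms.retain(|&ts| {
            if ts < cutoff_ms && removed_this_batch < batch {
                removed_this_batch += 1;
                false // delete: expired and within this batch's budget
            } else {
                true // keep (or defer to a later batch)
            }
        });
        total_deleted += before - event_ts_ms.len();
        if before - event_ts_ms.len() < batch {
            break; // final (partial) batch: nothing expired is left
        }
        // between batches the real writer yields, e.g. to drain overflow
    }
    total_deleted
}

fn main() {
    let mut events: Vec<i64> = (0..12).map(|i| i * 1_000).collect();
    let deleted = cleanup_pass(&mut events, 7_000, 5);
    println!("deleted={deleted} remaining={}", events.len());
}
```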
Recommended (future)¶
- Older events rolled to Parquet and optionally removed from SQLite
- Embeddings store follows the same retention window