Problem
Research runs were ephemeral: when something failed mid-flight, there was no good way to know which step blew up, what the inputs were, or how to replay it. And the moment you have multiple tenants on a single Worker, “just log to console” stops being a serious answer.
Approach
One Durable Object (DO) per organization, keyed by its org_* ID and backed by Cloudflare’s SQLite-in-DO storage. Every research step writes its inputs, outputs, and status to that org’s DO, so any historical run can be replayed or audited end-to-end.
This made the Worker multi-tenant-ready from day one without a separate database. Because each org’s data and writes live in its own DO, a noisy tenant’s load stays confined to its own instance instead of spilling onto other orgs.
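To make the per-org routing concrete, here is a minimal sketch rather than the shipped code. It assumes org IDs look like "org_" plus letters, digits, or underscores, and it declares small stand-in interfaces (NamespaceLike, DurableObjectIdLike) so the snippet is self-contained; in a real Worker the namespace type would come from Cloudflare's type definitions. The function name stubForOrg is hypothetical.

```typescript
// Stand-in types covering only the two namespace methods used below
// (idFromName and get). These names are illustrative, not Cloudflare's.
interface DurableObjectIdLike {
  toString(): string;
}

interface NamespaceLike<Stub> {
  idFromName(name: string): DurableObjectIdLike;
  get(id: DurableObjectIdLike): Stub;
}

// Assumed ID shape: "org_" followed by letters, digits, or underscores.
const ORG_ID_PATTERN = /^org_[A-Za-z0-9_]+$/;

// Resolve the single Durable Object that owns an organization's run log.
// idFromName is deterministic, so the same org ID always maps to the same DO.
function stubForOrg<Stub>(ns: NamespaceLike<Stub>, orgId: string): Stub {
  if (!ORG_ID_PATTERN.test(orgId)) {
    // Reject anything that is not an org ID before it can name a DO.
    throw new Error(`invalid org ID: ${orgId}`);
  }
  return ns.get(ns.idFromName(orgId));
}
```

In a fetch handler this would be called with the bound namespace, for example stubForOrg(env.RESEARCH_LOG, orgId), where RESEARCH_LOG is a hypothetical binding name. Validating the ID first keeps arbitrary request input from creating stray DOs.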
What I shipped
- DO-per-org architecture with SQLite persistence
- Step-level logging schema (inputs, outputs, errors, timing)
- Query layer for searching past runs by entity name or company
- Forensic recovery: surface which step failed and why, without re-running the full pipeline
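The logging schema and the forensic lookup could be sketched roughly as follows. Everything named here (the step_log table, its columns, StepRow, findFailure) is hypothetical and chosen for illustration; the real schema may differ. The lookup is written as a pure function over rows so it can be read and tested without a DO runtime.

```typescript
// Hypothetical schema: table and column names are illustrative only.
const CREATE_STEP_LOG = `
CREATE TABLE IF NOT EXISTS step_log (
  run_id      TEXT    NOT NULL,
  step_index  INTEGER NOT NULL,
  step_name   TEXT    NOT NULL,
  entity      TEXT,
  status      TEXT    NOT NULL CHECK (status IN ('running', 'ok', 'error')),
  input_json  TEXT    NOT NULL,
  output_json TEXT,
  error       TEXT,
  started_ms  INTEGER NOT NULL,
  ended_ms    INTEGER,
  PRIMARY KEY (run_id, step_index)
)`;

// Row shape mirroring the table above.
interface StepRow {
  run_id: string;
  step_index: number;
  step_name: string;
  entity: string | null;
  status: "running" | "ok" | "error";
  input_json: string;
  output_json: string | null;
  error: string | null;
  started_ms: number;
  ended_ms: number | null;
}

interface FailureReport {
  runId: string;
  stepIndex: number;
  stepName: string;
  error: string | null;
  input: unknown;
  durationMs: number | null;
}

// Find the earliest failed step of a run and report why it failed,
// using only the logged rows (the pipeline is not re-run).
function findFailure(rows: StepRow[], runId: string): FailureReport | null {
  const failed = rows
    .filter((r) => r.run_id === runId && r.status === "error")
    .sort((a, b) => a.step_index - b.step_index)[0];
  if (failed === undefined) return null;
  return {
    runId: failed.run_id,
    stepIndex: failed.step_index,
    stepName: failed.step_name,
    error: failed.error,
    input: JSON.parse(failed.input_json),
    durationMs:
      failed.ended_ms === null ? null : failed.ended_ms - failed.started_ms,
  };
}
```

Inside the DO, the rows would presumably come from a query against the org's SQLite storage (for instance via ctx.storage.sql.exec(...).toArray()), and the same filter could be pushed into a WHERE clause. Searching by entity name would be a similar query on the entity column.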