At OffDeal, our bankers work alongside an agentic workforce.
AI investment bankers research companies, qualify leads, track deal deliverables, and synthesize gigabytes of communications data.
The vision isn't to replace all human bankers. Instead, we’re scaling each human banker to run more deals and drive better outcomes for clients than any traditional team could.
But that raises some obvious questions: are our AI bankers actually good? Are they getting better? And when we change something, how do we know we didn't just make things worse?
We had no way to answer any of these questions, so we built a benchmarking framework designed specifically for investment banking workflows.
Over four weeks, we went from concept to a production system running 160+ test cases daily across 7 benchmarks in our cloud, posting results in Slack every morning.
The results surprised us, so we decided to share what we learned from building it.