Journal

Field notes from applied AI.

Essays on infrastructure, evaluation, governance, and the unglamorous work that separates demos from products.

Recent

Latest writing.

Engineering · Nov 24, 2025 · 8 min

Speculative decoding without the regrets

How we reduced LLM serving cost by 41% without changing a single model weight — and the three places it bit us in production.

Evals · Nov 11, 2025 · 6 min

The eval harness we wish we'd built first

An opinionated take on what belongs in your evaluation pipeline before you ship anything to a paying customer.

Product · Oct 28, 2025 · 4 min

Introducing Guardrails 2.0

Policy-as-code is now inline at inference — and configurable per tenant, route, and audit class.

Research · Oct 14, 2025 · 11 min

On the cost of "good enough" retrievers

A measured look at where investing in retrieval quality pays back — and where it doesn't.

Customers · Sep 30, 2025 · 7 min

How Northwind cut inference cost 3.1×

A customer engineering story about moving a multi-tenant LLM workload from a generic provider onto Xelvoraa.

Notes · Sep 18, 2025 · 3 min

The case for boring AI infrastructure

Boring is reliable. Reliable is fast. Fast is what your customers actually wanted.