Shipping one ugly coreflow change
This page follows one concrete production change from design to incident response: promoting a new route policy for a live multimodal session.
At coreflow, model behaviour is product behaviour. Latency is UX. Eval coverage is product quality. The code is the easy bit. The hard part is context, verification, rollout, and rollback under real production pressure.
This is how I make risky AI changes boring to ship.
Live route design
Design the request path for a real-time session that can fail over without losing observability or rollback.
Promotion gate
Tune the new route policy until quality improves without wrecking latency or cost.
Evidence-gated rollout
Promote the route policy behind shadow traffic, canary guards, and live business metrics.
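As a rough illustration of what an evidence gate can look like, here is a minimal sketch that compares a canary arm against the baseline and only promotes when quality improves and the latency, cost, and business guards all hold. Every name and threshold here (ArmStats, Guards, gate, the 10% latency allowance, conversion rate as the business metric) is a hypothetical stand-in, not coreflow's actual tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArmStats:
    """Aggregated metrics for one traffic arm (hypothetical shape)."""
    quality: float           # eval score, higher is better
    p99_latency_ms: float
    cost_per_request: float
    conversion_rate: float   # stand-in for a live business metric


@dataclass(frozen=True)
class Guards:
    """Illustrative thresholds; real values would come from the team's SLOs."""
    max_latency_regression: float = 0.10  # canary p99 may be at most 10% worse
    max_cost_regression: float = 0.05     # canary cost may be at most 5% worse
    max_conversion_drop: float = 0.01     # absolute drop in conversion rate


def gate(baseline: ArmStats, canary: ArmStats, guards: Guards = Guards()):
    """Return ("promote" | "rollback", reasons) for a canary arm."""
    reasons = []
    if canary.quality <= baseline.quality:
        reasons.append("quality did not improve")
    if canary.p99_latency_ms > baseline.p99_latency_ms * (1 + guards.max_latency_regression):
        reasons.append("p99 latency regression")
    if canary.cost_per_request > baseline.cost_per_request * (1 + guards.max_cost_regression):
        reasons.append("cost regression")
    if baseline.conversion_rate - canary.conversion_rate > guards.max_conversion_drop:
        reasons.append("business metric drop")
    return ("promote" if not reasons else "rollback", reasons)


# Example with made-up numbers: better quality, small latency and cost increase.
baseline = ArmStats(0.80, 900.0, 0.0100, 0.120)
good_canary = ArmStats(0.84, 950.0, 0.0102, 0.119)
slow_canary = ArmStats(0.84, 1200.0, 0.0102, 0.119)
good_decision = gate(baseline, good_canary)
slow_decision = gate(baseline, slow_canary)
```

Returning the reasons alongside the decision keeps the rollback auditable: the gate says which guard failed, not just that one did.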
GPU failover / provider incident
P99 latency blows out during peak demand. Stabilise the system without turning a latency event into a product outage.
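One way to keep a latency event from becoming an outage is to shift traffic to a fallback route only after sustained P99 breaches, and to return only after sustained recovery. The sketch below assumes a hypothetical FailoverController fed with per-window latency samples; the budget, window counts, and route names are illustrative assumptions.

```python
def p99(samples):
    """Nearest-rank 99th percentile of a non-empty list of latencies."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    rank = -(-99 * len(ordered) // 100)  # integer ceil(0.99 * n)
    return ordered[rank - 1]


class FailoverController:
    """Hypothetical controller: trips to a fallback route after repeated breaches."""

    def __init__(self, budget_ms, breaches_to_trip=3, recoveries_to_reset=5):
        self.budget_ms = budget_ms
        self.breaches_to_trip = breaches_to_trip
        self.recoveries_to_reset = recoveries_to_reset
        self.route = "primary"
        self._breaches = 0
        self._healthy = 0

    def observe_window(self, latencies_ms):
        """Record one window of latencies and return the route to use next."""
        if p99(latencies_ms) > self.budget_ms:
            self._breaches += 1
            self._healthy = 0
        else:
            self._healthy += 1
            self._breaches = 0

        # Hysteresis: require consecutive windows before switching either way,
        # so a single noisy window does not flap traffic between routes.
        if self.route == "primary" and self._breaches >= self.breaches_to_trip:
            self.route = "fallback"
        elif self.route == "fallback" and self._healthy >= self.recoveries_to_reset:
            self.route = "primary"
        return self.route


# Example with made-up windows of 100 samples each.
slow = [100] * 98 + [2000] * 2   # p99 is 2000 ms
fast = [100] * 100               # p99 is 100 ms
ctrl = FailoverController(budget_ms=800, breaches_to_trip=2, recoveries_to_reset=2)
routes = [ctrl.observe_window(w) for w in (slow, slow, fast, fast)]
```

The asymmetric thresholds in the defaults are a design choice: failing over quickly limits user impact, while returning slowly avoids bouncing back onto a provider that is still degraded.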
Bring me a route change, model promotion, adapter swap, stateful backfill, or GPU failover scenario. During the trial, I will turn it into a boring rollout.