Writing

The win grows with conversation depth

Same prompt. Same K=128 budget. Same hardware (a Pixel-class on-device runtime). Three policies. SlidingWindow forgot subprime mortgages. Plain TemporalKV forgot 2008. The hybrid kept both. Here's the trace, and why.

kv-cache

May 25, 2026 · 7 min read

The ReLU is doing all the work

A linear scorer on the same features scores AUC 0.859. Add one hidden layer of 8 ReLU units — 49 parameters total — and AUC jumps to 0.900. We opened up the trained weights to see what the nonlinearity actually bought us. It was not what we expected.

kv-cache

May 14, 2026 · 7 min read

When StreamingLLM beats us — and why

At K/T = 1/16, the dumbest policy in the comparison — 'always keep the last 64 tokens, no exceptions' — outperforms our learned policy on perplexity by 17 points. We dug in to figure out where it was spending its budget.

kv-cache

May 3, 2026 · 6 min read

It's all about K/T

If you plot a learned eviction policy's win over heuristics against cache size, you get a confusing picture. Against context length, also confusing. Against their ratio, you get a wall.

kv-cache

April 22, 2026 · 5 min read

Sparse eviction is the right llama.cpp primitive

We moved our eviction policy from the PyTorch research stack to llama.cpp on a real device. Decoding got 5x slower. The fix was four lines of code — and a re-read of the cache's data model.

kv-cache

April 11, 2026 · 5 min read

Where AI coding agents go blind on mobile

Three structural blind spots that limit what AI coding agents can do on iOS and Android — and what it takes to fix them.

March 31, 2026 · 6 min read

Why I'm rebuilding how I ship mobile

Mobile has been slower to absorb AI coding agents than web. The interesting engineering is in the scaffolding around the agent, not the models.

March 14, 2026 · 5 min read

MCP at 97 million

Model Context Protocol hit 97M monthly SDK downloads in March and now sits under the Linux Foundation. What that means for what a mobile engineer should build.

March 5, 2026 · 4 min read

Xcode 26.3: Apple, late but serious

Xcode 26.3 ships with agentic coding, Claude and Codex integrations, and MCP support. The MCP part is the real news.

February 26, 2026 · 4 min read

Switching the default to Sonnet 4.6

Sonnet 4.6 landed as the new default in Claude Code. Practical notes on when it replaces Opus for mobile work and when it doesn't.

February 19, 2026 · 3 min read

One million tokens and the legacy codebase problem

Claude Opus 4.6 ships with a 1M context window. Necessary but not sufficient — a real Android codebase still needs retrieval, not just volume.

February 7, 2026 · 4 min read

Cowork and the question of surface

Claude Cowork is a clean read of where desktop agents belong. Mobile agents need a different surface — one that can see the device, not just the files.

January 29, 2026 · 4 min read

Vibe coding won't work on a device

Stack Overflow's latest survey shows 72% of pros refuse to ship AI-generated code without review. On mobile, that review gate isn't sentiment — it's load-bearing.

January 20, 2026 · 4 min read

The year agents stop being a demo

The 2026 AI pragmatism narrative makes sense on the web. Mobile is still a cycle behind — the demos look good because the feedback loops are still carrying the weight.

January 8, 2026 · 3 min read