Writing — Brandon Miller

Writing

The win grows with conversation depth

Same prompt. Same K=128 budget. Same hardware (a Pixel-class on-device runtime). Three policies. SlidingWindow forgot subprime mortgages. Plain TemporalKV forgot 2008. The hybrid kept both. Here's the trace, and why.

kv-cache
7 min read

The ReLU is doing all the work

A linear scorer on the same features reaches AUC 0.859. Add one hidden layer of 8 ReLU units (49 parameters total) and AUC jumps to 0.900. We opened up the trained weights to see what the nonlinearity actually bought us. It was not what we expected.

kv-cache
7 min read

When StreamingLLM beats us — and why

At K/T = 1/16, the dumbest policy in the comparison ('always keep the last 64 tokens, no exceptions') outperforms our learned policy on perplexity by 17 points. We dug in to figure out where the learned policy was spending its budget.

kv-cache
6 min read

It's all about K/T

If you plot a learned eviction policy's win over heuristics against cache size, you get a confusing picture. Against context length, also confusing. Against their ratio, you get a wall.

kv-cache
5 min read

Sparse eviction is the right llama.cpp primitive

We took our eviction policy off the PyTorch research stack and onto llama.cpp on a real device. Decoding got 5x slower. The fix was four lines of code — and a re-read of the cache's data model.

kv-cache
5 min read

Where AI coding agents go blind on mobile

Three structural blind spots that limit what AI coding agents can do on iOS and Android — and what it takes to fix them.

ai
6 min read

Why I'm rebuilding how I ship mobile

Mobile has been slower to absorb AI coding agents than web. The interesting engineering is in the scaffolding around the agent, not the models.

ai
5 min read

MCP at 97 million

Model Context Protocol hit 97M monthly SDK downloads in March and now sits under the Linux Foundation. What that means for what a mobile engineer should build.

ai
4 min read

Xcode 26.3: Apple, late but serious

Xcode 26.3 ships with agentic coding, Claude and Codex integrations, and MCP support. The MCP part is the real news.

ai
4 min read

Switching the default to Sonnet 4.6

Sonnet 4.6 landed as the new default in Claude Code. Practical notes on when it replaces Opus for mobile work and when it doesn't.

ai
3 min read

One million tokens and the legacy codebase problem

Claude Opus 4.6 ships with a 1M context window. Necessary but not sufficient — a real Android codebase still needs retrieval, not just volume.

ai
4 min read

Cowork and the question of surface

Claude Cowork is a clean read of where desktop agents belong. Mobile agents need a different surface — one that can see the device, not just the files.

ai
4 min read

Vibe coding won't work on a device

Stack Overflow's latest survey shows 72% of pros refuse to ship AI-generated code without review. On mobile, that review gate isn't sentiment — it's load-bearing.

ai
4 min read

The year agents stop being a demo

The 2026 AI pragmatism narrative makes sense on the web. Mobile is still a cycle behind — the demos look good because the feedback loops are still carrying the weight.

ai
3 min read