Writing
The win grows with conversation depth
Same prompt. Same K=128 budget. Same hardware (a Pixel-class on-device runtime). Three policies. SlidingWindow forgot subprime mortgages. Plain TemporalKV forgot 2008. The hybrid kept both. Here's the trace, and why.
The ReLU is doing all the work
A linear scorer on the same features reaches an AUC of 0.859. Add one hidden layer of 8 ReLU units (49 parameters total) and AUC jumps to 0.900. We opened up the trained weights to see what the nonlinearity actually bought us. It was not what we expected.
When StreamingLLM beats us — and why
At K/T = 1/16, the dumbest policy in the comparison ('always keep the last 64 tokens, no exceptions') outperforms our learned policy on perplexity by 17 points. We dug in to figure out where the learned policy was spending its budget.
It's all about K/T
If you plot a learned eviction policy's win over heuristics against cache size, you get a confusing picture. Against context length, also confusing. Against their ratio, you get a wall.
Sparse eviction is the right llama.cpp primitive
We moved our eviction policy from the PyTorch research stack to llama.cpp on a real device. Decoding got 5x slower. The fix was four lines of code, plus a re-read of the cache's data model.
Where AI coding agents go blind on mobile
Three structural blind spots that limit what AI coding agents can do on iOS and Android — and what it takes to fix them.
Why I'm rebuilding how I ship mobile
Mobile has been slower to absorb AI coding agents than web. The interesting engineering is in the scaffolding around the agent, not the models.
MCP at 97 million
Model Context Protocol hit 97M monthly SDK downloads in March and now sits under the Linux Foundation. What that means for what a mobile engineer should build.
Xcode 26.3: Apple, late but serious
Xcode 26.3 ships with agentic coding, Claude and Codex integrations, and MCP support. The MCP part is the real news.
Switching the default to Sonnet 4.6
Sonnet 4.6 landed as the new default in Claude Code. Practical notes on when it replaces Opus for mobile work and when it doesn't.
One million tokens and the legacy codebase problem
Claude Opus 4.6 ships with a 1M context window. Necessary but not sufficient — a real Android codebase still needs retrieval, not just volume.
Cowork and the question of surface
Claude Cowork is a clean read of where desktop agents belong. Mobile agents need a different surface — one that can see the device, not just the files.
Vibe coding won't work on a device
Stack Overflow's latest survey shows 72% of pros refuse to ship AI-generated code without review. On mobile, that review gate isn't sentiment — it's load-bearing.
The year agents stop being a demo
The 2026 AI pragmatism narrative makes sense on the web. Mobile is still a cycle behind — the demos look good because the feedback loops are still carrying the weight.