LOCAL AI & AUTOMATION EDITION — June 7, 2026

Let's talk about what you can actually run and build with right now.

Google's Gemma 4 12B is the big story for local AI this week. It's a dense multimodal model: text, images, and audio are all processed natively, with no separate encoders needed. It ships under the Apache 2.0 license, and quantized GGUF versions are already up. On a Mac Mini M4 with 16 gigs of RAM, it hits twelve to fourteen tokens per second with vision enabled. On a GPU with 8 gigs of VRAM, people are reporting thirty-two tokens per second at 64K context. There's even a 2-billion-parameter variant that squeezes into a single gigabyte. This is arguably the first model that makes genuinely useful multimodal AI a local-first experience.
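
If you want to sanity-check whether a quantized model fits your hardware, a back-of-the-envelope estimate helps. The sketch below only counts the weights, and the bit widths are assumptions for illustration (the sizes above don't say which quantization was used). Real quantized files carry some extra per-block metadata, and the context cache needs memory on top, so treat these numbers as a floor.

```python
def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    total_bits = params_billions * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9  # bits -> bytes -> gigabytes


if __name__ == "__main__":
    # Bit widths are illustrative assumptions, not published figures.
    for name, params in [("12B", 12.0), ("2B", 2.0)]:
        for bits in (16, 8, 4):
            size = weight_footprint_gb(params, bits)
            print(f"{name} at {bits}-bit: ~{size:.1f} GB of weights")
```

Under a 4-bit assumption, 2 billion parameters come to about a gigabyte of weights, which lines up with the small variant's footprint, and 12 billion parameters come to roughly 6 GB, which is consistent with it running on an 8-gig card.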

Moonshot AI released Kimi Code CLI, an open-source terminal coding agent under the MIT license. It reads and edits code, runs shell commands, fetches web pages, and uses sub-agents for parallel tasks. It ships as a single binary and is written in TypeScript. Think Claude Code, but open-source and inspectable, with strong safety and approval flows built in.

Speaking of open alternatives — OpenCode by anomalyco is getting a lot of buzz as a direct Claude Code replacement. Terminal UI, desktop app, multi-model support with your own keys, built-in agents. If you want full control without sending your codebase to a cloud API, this plus a local Gemma or Nemotron model starts looking very interesting.

NVIDIA's Nemotron 3 Ultra deserves a mention here too. It's fully open-weights, 550 billion parameters with a mixture-of-experts architecture — about 55 billion active — and a million-token context window. Post-trained specifically for tool use and agent harnesses. Combined with the RTX Spark hardware, NVIDIA is betting that agent-native local compute is the next platform. Open-source, on-device agents aren't a future thing anymore.
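
To put the mixture-of-experts numbers in perspective, here is a rough sketch of what "550 billion total, about 55 billion active" implies. It leans on two assumptions that are not from NVIDIA: the common rule of thumb of about 2 floating-point operations per active parameter per generated token, and a hypothetical 4-bit quantization for the memory estimate.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of the parameters that participate in each token's forward pass."""
    return active_params_b / total_params_b


def approx_flops_per_token(active_params_b: float) -> float:
    """Rule-of-thumb compute per token: ~2 FLOPs per active parameter (assumption)."""
    return 2 * active_params_b * 1e9


def weights_gb(total_params_b: float, bits_per_weight: float) -> float:
    """All experts must be stored, so memory scales with TOTAL parameters."""
    return total_params_b * 1e9 * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    total_b, active_b = 550.0, 55.0
    print(f"active share: {active_fraction(total_b, active_b):.0%}")
    print(f"compute per token: ~{approx_flops_per_token(active_b):.2e} FLOPs")
    # 4-bit is a hypothetical quantization level chosen for illustration.
    print(f"weights at 4-bit: ~{weights_gb(total_b, 4):.0f} GB")
```

The takeaway is that per-token compute tracks the active parameters, while storage still tracks the full parameter count, so the hardware's memory capacity remains the gating factor.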

On the research front, Trajectory Labs demonstrated SDPO, a reinforcement learning method that works on real, long-horizon production agent tasks. Even with hour-long trajectories and stale off-policy data, it reached a twenty-five percent average reward, five times the zero-shot baseline, and training stayed stable. Practical continual learning for agents is getting closer.

And one more to watch: the Memory Caching RNN paper. It lets recurrent networks dynamically grow memory to match transformers on long-context tasks, but without the quadratic compute blowup. If this pans out, local long-context inference gets dramatically cheaper.
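
For a feel of why the quadratic blowup matters, here is a toy cost model. It is not the paper's method: it just compares full self-attention, where every token interacts with every other token, against a recurrent model that touches a bounded memory once per token. The `memory_slots` value is a hypothetical number picked for illustration.

```python
def attention_cost(seq_len: int) -> int:
    """Token-pair interactions in full self-attention: grows as n squared."""
    return seq_len * seq_len


def recurrent_cost(seq_len: int, memory_slots: int) -> int:
    """Memory reads for a recurrent model: one pass over the memory per token."""
    return seq_len * memory_slots


if __name__ == "__main__":
    memory_slots = 1024  # hypothetical memory size, not taken from the paper
    for n in (1_000, 10_000, 100_000, 1_000_000):
        ratio = attention_cost(n) / recurrent_cost(n, memory_slots)
        print(f"context {n:>9,}: attention/recurrent cost ratio ~{ratio:,.1f}x")
```

In this simplified model the gap widens linearly with context length, which is why a recurrent design that holds up on quality would matter most at the long contexts local hardware struggles with today.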

That's the Local Edition for June 7th. Go build something.