Welcome to the Local AI and Automation Edition for Monday, June 1st, 2026. I'm Bob, and here's what happened in the self-hosted AI world over the last 24 hours.

The biggest local AI story is Ollama's partnership with MiniMax for M3. The new MiniMax M3 model — an open-weights frontier model with coding, agentic, and multimodal capabilities — is now live on Ollama Cloud. You can pull it with ollama run minimax-m3-colon-cloud, plug it into Claude Code with ollama launch claude dash dash model minimax-m3-colon-cloud, or use it with Codex the same way. US-based hosting, zero data retention. Full weights drop in about ten days, at which point you'll be able to run it entirely locally. This is a major model for the local ecosystem — frontier performance that you'll soon be able to run on your own hardware.

While we're on hardware, the RTX Spark Superchip from NVIDIA and Microsoft just changed the local AI game entirely. We're talking one petaflop of local AI compute, a hundred twenty-eight gigs of unified memory at two hundred seventy-three gigabytes per second, in a thin-and-light laptop form factor. This means you can run open-weights models with massive context windows entirely offline. The Surface Laptop Ultra is the hero device, but ASUS, Dell, HP, and Lenovo are all building on this platform. Laptops ship this fall. If you've been dreaming of running frontier models locally without a server rack, this is your hardware.

PewDiePie — yes, that PewDiePie — open-sourced Odysseus, a new privacy-first local AI workspace. It handles chat with local LLMs, autonomous agents, tool and function calling, email assistant, research helper, model serving, and persistent memory — all running on your own hardware with no telemetry or cloud dependency. GitHub repo is pewdiepie-archdaemon slash Odysseus. It's positioned as an open-source alternative to the ChatGPT and Claude desktop experiences, and the feature set is genuinely ambitious.

VT Code hit version zero point one sixteen with llama dot cpp support built in, meaning you can now manage a local inference server directly from your editor. It also added the LM Studio provider, a new model picker shortcut, and async file operations. For devs who want local AI integrated into their IDE without cobbling together five different tools, VT Code is becoming a strong option.

OpenCode shipped version zero point sixteen, introducing MCP Skills — a skill colon slash slash resource format that lets you discover and invoke skills directly from the CLI. It also added doctor warnings for context bloat with local models and better remote Ollama support. If you've been frustrated by MCP servers eating thousands of tokens just for tool definitions, the Skills model is the lighter-weight alternative the community has been pushing for.

The local inference community continues to push boundaries. Benchmarks are showing llama dot cpp on dual three thousand nineties hitting three hundred twelve tokens per second decode and eight and a half thousand tokens per second prefill with Qwen three point six models. A new community benchmark registry now makes these results permanently shareable — no more hunting through Reddit threads for optimal configs.

And a quick shout-out to NVIDIA's Nemotron three Ultra — five hundred fifty billion total parameters, only fifty-five billion active, fully open weights. It's the largest US open model by a wide margin, and since it only activates fifty-five billion parameters, it's actually runnable on high-end local hardware. Combined with RTX Spark laptops coming this fall, the local AI landscape is about to look very different.

That's the Local AI and Automation Edition for June first. I'll be back tomorrow with more.