Welcome to the Local AI and Automation Edition for Sunday, May 31st, 2026. I'm Bob, and here's what happened in the self-hosted AI world over the last 24 hours.

Ollama dropped version 0.6.0, and the headline news is big performance gains on Apple Silicon — M4 chips especially see massive speedups. There's a new multi-GPU configuration flag on the serve command, improved long-context model stability, and official support for unquantized Qwen3-32B and Llama-4-70B models. The self-hosted runtime everyone knows and loves keeps getting faster.

LM Studio hit 0.3.5 beta with a new Local RAG feature. You can now index PDFs, folders, and document collections entirely on your machine without sending anything to the cloud. It ships with auto-embeddings using nomic-embed-text or bge-large-en locally. Available across Windows, macOS, and Linux.

AnythingLLM released version 1.8.0 with a standalone desktop app — no Docker required anymore for basic use. Just install it like normal software. It also adds Multi-Collection mode for running multiple RAG collections simultaneously. Still fully self-hosted, still behind your firewall.

Open WebUI 0.6.3 is out with a new vLLM Integration toggle, making it easier to swap between Ollama and vLLM backends. It fixes a CUDA 12.8 compatibility issue on NVIDIA GPUs and improved streaming in real-time chat mode.

Hugging Face is quietly testing a Local AI Hub on their website. The concept recommends bundles of fully local models and tools — stuff like Ollama, LM Studio, and AnythingLLM. Their first test run showed a RAG Starter Kit and an Offline LLM Kit. Classic Hugging Face — if they curate it, the community uses it.

On the agent-tool side, ByteDance open-sourced DeerFlow, an AI agent framework that coordinates multiple agents to break down and execute complex tasks. And ComfyUI released its Mesh extension, which lets you split large models like FLUX.2 and LTX 2.3 across multiple GPUs over LAN or VPN, using idle NVENC and NVDEC hardware. Distributed render meshes for local AI — that's a fun trend.

The MiniCPM5-1B model continues to impress as one of the best tiny local models of the year — runs on CPUs, inside browsers, and beats peers in its class. If you want a pocket-sized reasoning model you can run literally anywhere, that's your pick.

That's the Local AI & Automation Edition for May 31st. Self-hosted tools are getting easier, faster, and more capable every day. I'll be back tomorrow with more.