Local AI & Automation Briefing — May 15th, 2026

Let's start with EXO Labs. Founder Alex Cheema met up with Ahmad Osman yesterday — Ahmad posted that Alex literally bought a DGX Spark on his way to the meeting, just to demo EXO's upcoming features. That's commitment. Alex confirmed that EXO is preparing to release over ten thousand hardware configurations publicly for free. He also noted a key technical limitation: Apple Silicon still lacks hardware support for four-bit and eight-bit compute, which matters for running heavily quantized models. On the upside, EXO clustering now works with up to four MacBooks via Thunderbolt 5, and thermals on M5 Max MacBooks are reportedly fine under inference load.

Unsloth AI continues pushing the limits of consumer GPU inference. Team member Daniel Han posted benchmarks showing their MTP, or multi-token prediction, Qwen 3.6 GGUF quants hitting 140 tokens per second on 27-billion-parameter models. On a single RTX 5090, the same quants reach 114 tokens per second. That's the kind of generation speed you'd expect from a hosted frontier model, running on a gaming card.

Speaking of the RTX 5090, the pricing situation remains brutal. Street prices for AIB models sit between three thousand four hundred and four thousand one hundred dollars, well above the two-thousand-dollar MSRP. NVIDIA reportedly raised costs to board partners by about three hundred dollars due to GDDR7 memory shortages. Supply alerts show cards disappearing within minutes.
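To put those prices in perspective, here's a quick back-of-the-envelope sketch in Python. It uses only the figures quoted above. The function names are made up for illustration, and it assumes the reported cost increase passes through to buyers dollar for dollar, which may not hold in practice.

```python
MSRP = 2000                  # dollars, as quoted above
STREET_LOW, STREET_HIGH = 3400, 4100
PARTNER_COST_INCREASE = 300  # reported increase to board partners


def premium_pct(street, msrp=MSRP):
    """Percentage by which a street price exceeds MSRP."""
    return round((street - msrp) / msrp * 100, 1)


def share_of_markup_pct(street, cost_increase=PARTNER_COST_INCREASE, msrp=MSRP):
    """Share of the markup the cost increase could explain, in percent.

    Assumption: the increase is passed through dollar for dollar.
    """
    return round(cost_increase / (street - msrp) * 100, 1)


for street in (STREET_LOW, STREET_HIGH):
    print(street, premium_pct(street), share_of_markup_pct(street))
```

On these figures, street prices run roughly 70 to 105 percent over MSRP, and the reported three-hundred-dollar increase would account for only about 14 to 21 percent of that gap.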

For those who need more VRAM, the RTX PRO 6000 Blackwell with 96 gigabytes is appearing in workstations. A brief Chinese listing showed the desktop edition at around ten thousand seven hundred US dollars, with US pricing estimated at around eight thousand five hundred dollars. It's shipping in Lenovo ThinkStation P4 and ELSA Veluga Pro workstations that support up to three cards.
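Why VRAM matters comes down to simple arithmetic: the weights alone take roughly parameters times bits per weight, divided by eight, in bytes. Here's a rough Python sketch of that estimate. The helper names and the 80 percent headroom figure are assumptions for illustration, and it ignores the KV cache, activations, and quantization overhead, so treat the results as lower bounds.

```python
def weights_gb(params_billion, bits_per_weight):
    """Approximate size of the weights alone, in decimal gigabytes."""
    return params_billion * bits_per_weight / 8


def fits(params_billion, bits_per_weight, vram_gb, headroom=0.8):
    """True if the weights fit in an assumed usable fraction of VRAM.

    headroom=0.8 is an arbitrary allowance for everything that is not weights.
    """
    return weights_gb(params_billion, bits_per_weight) <= vram_gb * headroom


# A 27-billion-parameter model at 4 bits versus 16 bits per weight.
print(weights_gb(27, 4))    # 13.5
print(weights_gb(27, 16))   # 54.0
print(fits(27, 16, 32))     # False: too big for a 32 GB RTX 5090
print(fits(27, 16, 96))     # True: fits on one 96 GB card
```

By this estimate, a 27-billion-parameter model quantized to four bits sits comfortably on an RTX 5090's 32 gigabytes, while the same model at sixteen bits needs the larger card.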

On the software side, cocktailpeanut and the Pinokio team launched DramaBox — a one-click installer for Resemble AI's open-source voice direction model. It needs about 20 gigabytes of VRAM currently, but a low-VRAM version is in development. Pinokio is teasing that way more is coming.

Sero published a new YouTube episode titled Local AI Pilling with Theo and Ben Davis, calling it one of the best he's worked on. The conversation covers open-source AI trust and community dynamics.

In automation, a new open-source repo called home assistant AI skills can auto-refactor your Home Assistant YAML automations for best practices, idempotency, and error handling. Combined with n8n's self-hosted workflow engine — which now has native AI agent nodes with vector memory and LLM support — the self-hosted AI automation stack is getting seriously capable. You can trigger Home Assistant events, run them through a local LLM via Ollama for reasoning, and act on the results, all on your own hardware.
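That event-to-LLM-to-action loop can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not how n8n or the repo mentioned above does it. It calls Ollama's generate endpoint and Home Assistant's REST service endpoint directly. The host addresses, the token placeholder, the model name, the entity IDs, the prompt, and every function name here are assumptions chosen for the example.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"       # Ollama's default local port
HA_URL = "http://homeassistant.local:8123"  # assumed Home Assistant address
HA_TOKEN = "REPLACE_WITH_LONG_LIVED_TOKEN"  # placeholder, not a real token
MODEL = "llama3.2"                          # assumed: any model pulled into Ollama


def post_json(url, payload, headers=None):
    """POST a JSON body and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", **(headers or {})},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))


def build_prompt(event):
    """Turn a (hypothetical) event dict into a one-word-answer question."""
    return (
        "You control a smart home. Reply with exactly one word: ON, OFF, or NONE.\n"
        f"Event: {json.dumps(event, sort_keys=True)}\n"
        "Should the hallway light be switched ON, OFF, or left alone (NONE)?"
    )


def parse_decision(text):
    """Map free-form model output to 'turn_on', 'turn_off', or None."""
    words = text.strip().upper().split()
    word = words[0].strip(".,!\"'") if words else ""
    return {"ON": "turn_on", "OFF": "turn_off"}.get(word)


def handle_event(event, entity_id="light.hallway", post=post_json):
    # 1. Reason about the event with the local model (non-streaming call).
    reply = post(
        f"{OLLAMA_URL}/api/generate",
        {"model": MODEL, "prompt": build_prompt(event), "stream": False},
    )
    service = parse_decision(reply.get("response", ""))
    if service is None:
        return None  # model chose to do nothing, or gave an unusable answer
    # 2. Act on the result through Home Assistant's REST service endpoint.
    post(
        f"{HA_URL}/api/services/light/{service}",
        {"entity_id": entity_id},
        headers={"Authorization": f"Bearer {HA_TOKEN}"},
    )
    return service
```

In a real setup, something like a Home Assistant webhook or an n8n trigger node would supply the event and call a function like handle_event. The post parameter is there so the network calls can be swapped out for a stub when testing, and the strict one-word parsing means an unexpected model reply results in no action rather than a wrong one.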

That's the local AI briefing for May 15th. I'm Bob — back tomorrow.