Local AI & Automation Briefing for Sunday, May 10th, 2026. I'm Bob, and this is your weekly update on running AI on your own hardware.

Let's start with the hardware market. The RTX 5090 has finally reached something close to retail normalcy. Best Buy restocks are landing at 3,100 to 3,900 dollars depending on the AIB model — ASUS TUF, MSI Gaming Trio, Gigabyte Gaming OC all showing up at MSRP. Supply appears to have stabilized, with trackers like Trackalacker showing consistent availability rather than the scalper frenzy we saw at launch. B&H Photo even had the ASUS ROG Astral BTF OC below MSRP at 3,899 dollars. For context, Amazon open-box 5090s are dipping to around 2,850 — though condition varies. The big question mark remains the RTX PRO 6000 Blackwell with 96 gigabytes, which would be the ultimate local inference card, but pricing and workstation availability are still unknown.

Now for the real value plays. The RTX 5060 Ti in a dual-GPU setup is emerging as the budget king of 2026. At 500 to 700 dollars per card, two of the 16-gigabyte 5060 Tis give you 32 gigabytes of combined GDDR7 VRAM, split across the two cards, and community benchmarks show 74 tokens per second on Qwen 3.6 27B. That's genuinely usable performance for a home sovereign AI rig at roughly 1,200 dollars total for the GPUs. For those on an even tighter budget, Intel's Arc Pro B70 with 32 gigabytes is appearing around the thousand-dollar mark, though the software ecosystem is still catching up. And used RTX 4090s are holding steady at 1,200 to 1,800 dollars, while RTX 3090 24GB cards remain the budget inference workhorse on the secondhand market.
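If you want to compare those value plays on one axis, dollars per gigabyte of VRAM is a rough starting point. Here's a minimal sketch using the price ranges quoted in this briefing. The 24 and 32 gigabyte figures for the 4090 and 5090 are the cards' standard specs rather than numbers from this week's listings, and the function name is just for illustration. It ignores bandwidth, software support, and multi-GPU overhead, so treat it as a first filter only.

```python
# Price ranges in USD and VRAM in GB, taken from the figures in this briefing.
# The 4090 (24 GB) and 5090 (32 GB) capacities are standard card specs.
options = {
    "2x RTX 5060 Ti 16GB": (1000, 1400, 32),
    "Arc Pro B70": (1000, 1000, 32),
    "Used RTX 4090": (1200, 1800, 24),
    "RTX 5090 (Best Buy)": (3100, 3900, 32),
}


def dollars_per_gb(low, high, vram_gb):
    """Return the (low, high) price per gigabyte of VRAM."""
    return low / vram_gb, high / vram_gb


for name, (low, high, vram) in options.items():
    per_gb_low, per_gb_high = dollars_per_gb(low, high, vram)
    print(f"{name:22s} ${per_gb_low:7.2f} to ${per_gb_high:7.2f} per GB")
```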

On the Apple Silicon front, Alex Cheema from EXO Labs confirmed that an M4 Max MacBook Pro with 128 gigabytes of unified memory at 546 gigabytes per second bandwidth can run MiniMax M2.7 at 4-bit quantization or Qwen 235B at 8-bit. One caveat on that second figure: at 8 bits per weight, 235 billion parameters come to roughly 235 gigabytes for the weights alone, which is more than 128, so on a single machine expect to need a lower-bit quant. Either way, that's a laptop running a 235-billion-parameter model, locally. EXO's distributed inference framework now requires Thunderbolt 5 for RDMA between clustered Macs, and the community is actively testing multi-Mac Studio setups.
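A quick back-of-envelope check for whether a quantized model fits in unified memory: the weights take roughly parameters times bits per weight, divided by eight, in bytes. The sketch below assumes weights only and decimal gigabytes. It ignores the KV cache, activations, and whatever the OS keeps for itself, so real headroom is smaller than it suggests. The function names are hypothetical.

```python
def weight_footprint_gb(params_billion, bits_per_weight):
    """Approximate size of the model weights in decimal gigabytes."""
    # 1 billion parameters at 8 bits each is 1 GB, so scale by bits / 8.
    return params_billion * bits_per_weight / 8


def fits(params_billion, bits_per_weight, memory_gb):
    """True if the weights alone fit in the given memory (no overhead counted)."""
    return weight_footprint_gb(params_billion, bits_per_weight) <= memory_gb


for bits in (4, 8):
    size = weight_footprint_gb(235, bits)
    verdict = "fits" if fits(235, bits, 128) else "does not fit"
    print(f"235B at {bits}-bit: about {size:.1f} GB, {verdict} in 128 GB")
```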

Ahmad Osman hosted the first Local AI Get-Together in San Francisco, and by all accounts it was a massive success. The event brought together homelab builders, GPU modders, and local inference enthusiasts. Ahmad also posted about Qwen 3.6 27B with web access, calling it so impressive that "not going to make it" is the only reaction left. The 27-billion-parameter class is clearly the sweet spot for consumer hardware right now.

Unsloth AI published a major collaboration with NVIDIA: a guide on making LLM training 25 percent faster on home GPUs. The three optimizations are packed-sequence metadata caching, double-buffered checkpoint reloads, and faster Mixture of Experts routing. These aren't just for data centers — they target consumer GPUs and are documented with clear implementation steps. Unsloth also released a guide on running open LLMs as coding agents — specifically Gemma 4 and Qwen 3.6 GGUFs inside Claude Code, Codex, and OpenClaw — with self-healing tool calls and web search on as little as 24 gigabytes of RAM. If you've been wondering whether local models can replace API calls for agentic coding, the answer is increasingly yes.

OpenCode's LLM provider abstraction library got a strong endorsement from the community this week, praised for solving what one developer called "provider-specific insanity." It abstracts away the differences between OpenAI, Anthropic, Google, and local inference endpoints, letting you swap models without rewriting your agent code.
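To make the idea concrete, here's a minimal sketch of what a provider abstraction looks like in general. This is not OpenCode's actual API. Every class and function name below is hypothetical, and the two providers are stubs that return canned strings instead of calling a network. The local URL is only an example value. The point is that the agent code depends on one small interface, so swapping backends means changing one constructor call.

```python
from dataclasses import dataclass
from typing import Protocol


class ChatProvider(Protocol):
    """Hypothetical interface: anything with complete() can back the agent."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class StubCloudProvider:
    """Stand-in for a hosted API; a real one would make an HTTP request."""

    model: str

    def complete(self, prompt: str) -> str:
        return f"[cloud:{self.model}] {prompt}"


@dataclass
class StubLocalProvider:
    """Stand-in for a local inference endpoint."""

    base_url: str
    model: str

    def complete(self, prompt: str) -> str:
        return f"[local:{self.model}@{self.base_url}] {prompt}"


def run_agent_step(provider: ChatProvider, task: str) -> str:
    """Agent logic only sees the interface, never a specific vendor."""
    return provider.complete(f"Plan the next step for: {task}")


# Swapping backends is a one-line change at the call site.
print(run_agent_step(StubCloudProvider("hosted-model"), "fix the build"))
print(run_agent_step(StubLocalProvider("http://localhost:11434", "local-model"), "fix the build"))
```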

On the software stack front, llama.cpp continues to mature with Gemma 4 and Flash Attention support landing in recent builds. FP8 quantization is being called the Blackwell sweet spot — offering the best balance of quality and throughput on RTX 50-series cards. Ollama and Open WebUI remain the default stack for most homelabbers, with Pinokio adding new one-click scripts for MLX-based video generation and TTS on Apple Silicon.

Power tip for builders: a rig with four RTX 5090s pulls roughly 2,300 watts under full inference load. At the US average of 16 cents per kilowatt-hour, that's about 265 dollars over a 30-day month running 24/7. Factor that into your build budget, or pair it with solar and a home battery if you're thinking long-term.
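The arithmetic behind that number is easy to rerun for your own rig and your own utility rate. A small sketch, assuming a constant draw around the clock and a 30-day month; the function name and defaults are just for illustration.

```python
def monthly_cost_usd(watts, usd_per_kwh, hours_per_day=24, days=30):
    """Electricity cost for a constant load over the given number of days."""
    kwh = watts / 1000 * hours_per_day * days  # energy used in kilowatt-hours
    return kwh * usd_per_kwh


# Figures from the briefing: 2,300 watts at 16 cents per kilowatt-hour.
cost = monthly_cost_usd(2300, 0.16)
print(f"About ${cost:.2f} per month")
```

Pass a lower `hours_per_day` if the rig idles most of the time, since full inference load around the clock is the worst case.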

That's the Local AI & Automation Briefing for Sunday, May 10th. I'm Bob — build something cool this week.