Welcome to the Local AI and Hardware Briefing for Friday, May 29th, 2026. I'm Bob, and this is your update on what matters for running AI on your own hardware.

Our top story: Liquid AI just dropped LFM two-point-five, an eight billion parameter mixture-of-experts model with only one point five billion active parameters per token. This thing was purpose-built for on-device agentic workflows. It supports a hundred twenty-eight thousand tokens of context and was trained on thirty-eight trillion tokens with heavy reinforcement learning post-training for tool use. The non-hallucination rate jumped from seven point five percent to sixty-three point five percent after targeted RL training.

Performance is impressive. Two hundred fifty-three tokens per second on an M5 Max while using under six gigs of RAM. About two hundred on an AMD RX seventy-eight hundred XT. And it runs on phones at around thirty tokens per second. It's supported day-one on llama.cpp, MLX, vLLM, and SGLang. If you want a truly local coding and reasoning agent that rivals cloud models, this is the one to try.
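As a rough sanity check on that "under six gigs" figure, here is a back-of-the-envelope sketch of how much memory the weights alone would need at different precisions. The parameter counts come from the briefing. The bits-per-weight values are assumptions for illustration, not published quantization settings, and the estimate ignores the KV cache and runtime overhead.

```python
# Rough weight-memory estimate for a mixture-of-experts model.
# Parameter counts are from the briefing; bit widths are illustrative assumptions.

TOTAL_PARAMS = 8e9      # all experts must be resident in memory
ACTIVE_PARAMS = 1.5e9   # parameters actually used per token


def weight_gigabytes(params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters that participate in each token."""
    return active / total


if __name__ == "__main__":
    print(f"active share per token: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")
    for bits in (4, 4.5, 8, 16):  # assumed precisions, not vendor-confirmed
        print(f"{bits:>4} bits/weight -> {weight_gigabytes(TOTAL_PARAMS, bits):.1f} GB")
```

The takeaway: the full eight billion parameters have to fit in memory, so a footprint under six gigs implies roughly four-to-five-bit weights, while the one point five billion active parameters are what keep per-token compute, and therefore speed, in small-model territory.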

In engine updates, vLLM hit version zero point twenty-two with improved AMD support. SGLang released zero point five point twelve, a stability patch for DeepSeek V4 and Blackwell compatibility.

The big news for llama.cpp: NVFP4 quantization support has been merged. Combine NVFP4 weights with an FP8 KV cache for the best memory efficiency on modern GPUs. AMD users also got a nice boost: a forty to seventy-six percent throughput improvement on MI250X for K-quantized models.

Ollama's Codex App is the standout new feature. Just type "ollama launch codex-app" and you get a full local desktop coding environment. Visual editing, inline diffs, multi-tab streaming, hot model swapping — all local, no API calls.

Open WebUI version zero point nine point five dropped with important security fixes. Update immediately if you're on an older version. SSRF protection, content security policy for iframes, and file access-control patches are all included.

On hardware prices: the RTX 5090 still sits at thirty-six hundred to four thousand dollars, way above its two-thousand-dollar MSRP. The RTX PRO 6000 has jumped to nine to thirteen thousand dollars. But the best value remains the used RTX 3090 at six hundred fifty to nine hundred dollars with twenty-four gigs of VRAM. It's the sweet spot for multi-GPU homelab builds.
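To put that value claim in numbers, here is a small sketch that works out dollars per gigabyte of VRAM from the price ranges just quoted. The twenty-four gig figure for the 3090 is from the briefing. The thirty-two gigs for the RTX 5090 and ninety-six gigs for the RTX PRO 6000 are not stated in the briefing; they are filled in here on the assumption that the Blackwell-generation cards are the ones meant.

```python
# Dollars per GB of VRAM, using the street-price ranges from the briefing.
# VRAM sizes for the 5090 and PRO 6000 are assumptions (not stated in the briefing).

CARDS = {
    # name: (low price USD, high price USD, VRAM in GB)
    "RTX 5090": (3600, 4000, 32),        # 32 GB assumed
    "RTX PRO 6000": (9000, 13000, 96),   # 96 GB assumed (Blackwell workstation card)
    "used RTX 3090": (650, 900, 24),     # 24 GB as quoted
}


def dollars_per_gb(low: float, high: float, vram_gb: float) -> tuple[float, float]:
    """Return the (low, high) price per GB of VRAM."""
    return low / vram_gb, high / vram_gb


def best_value(cards: dict) -> str:
    """Name of the card with the lowest worst-case (high-end) price per GB."""
    return min(cards, key=lambda name: dollars_per_gb(*cards[name])[1])


if __name__ == "__main__":
    for name, spec in CARDS.items():
        lo, hi = dollars_per_gb(*spec)
        print(f"{name:>14}: ${lo:,.0f} to ${hi:,.0f} per GB")
    print("best value:", best_value(CARDS))
```

Price per gig is only one axis. It says nothing about bandwidth, power draw, or NVFP4 support, which is why the newer cards can still make sense for some builds.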

Proxmox users, the community consensus is clear: to give your AI workloads GPU access, use LXC containers, not VMs. People are running Gemma twenty-seven billion at fifty to eighty tokens per second on a MiniForum UM890 Pro drawing just seventy-five watts. The local-first voice stack is also maturing: Whisper for speech-to-text, Ollama for conversation, and Piper for text-to-speech, all fully offline.
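For a sense of what seventy-five watts at fifty to eighty tokens per second means in energy terms, here is a quick sketch. It assumes the seventy-five watts is the whole system's draw while generating, which the briefing does not spell out.

```python
# Energy cost per token from the quoted power draw and throughput.
# Assumes 75 W is total system draw during generation (not confirmed in the briefing).

POWER_WATTS = 75.0
TOKENS_PER_SECOND = (50.0, 80.0)  # quoted range


def joules_per_token(watts: float, tokens_per_second: float) -> float:
    """Energy per generated token; one watt is one joule per second."""
    return watts / tokens_per_second


def kwh_per_million_tokens(watts: float, tokens_per_second: float) -> float:
    """Energy for one million tokens in kilowatt-hours (1 kWh = 3.6e6 J)."""
    return joules_per_token(watts, tokens_per_second) * 1e6 / 3.6e6


if __name__ == "__main__":
    for tps in TOKENS_PER_SECOND:
        print(
            f"{tps:.0f} tok/s: {joules_per_token(POWER_WATTS, tps):.2f} J/token, "
            f"{kwh_per_million_tokens(POWER_WATTS, tps):.2f} kWh per million tokens"
        )
```

Multiply the kilowatt-hour figure by your local electricity rate to get a running cost per million tokens.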

The bottom line for homelab builders: try the new Liquid AI model, update Open WebUI, and if you're shopping for GPUs, the used 3090 is still king for value.

This has been the Local AI and Hardware Briefing. Thanks for listening — and happy homelabbing.