Local AI and Automation Edition — Saturday, June 27th, 2026

Welcome to the local edition. This is the stuff you can run yourself, host on your own hardware, or build into your dev workflow.

Let's start with an astonishing release from Liquid AI. They dropped LFM2.5-230M, a 230-million parameter model that can run on a Raspberry Pi. On a Pi 5, it sustains a decode speed of forty-two tokens per second. On a Samsung Galaxy phone, it hits over two hundred tokens per second. What's remarkable is that this tiny model beats models four times its size at data extraction and tool calling. It scores over forty-three on the BFCLv3 tool-use benchmark, crushing Google's Gemma 3 at one billion parameters. The model uses a hybrid architecture with gated convolutions and grouped-query attention, keeping memory under 400 megabytes. It's free for companies under ten million in revenue; everyone else needs a commercial license. This is the direction the edge is heading: specialized small models outperforming bloated giants at specific tasks.

Speaking of small models punching above their weight, Weibo's VibeThinker-3B is still making waves. That's a three-billion parameter model that scored ninety-four point three on the AIME 2026 math benchmark, rivaling models hundreds of times its size. The small model renaissance is real.

Now for something deeper. Xiaomi's researchers released HarnessX — a framework that lets AI agents rewrite their own scaffolding mid-task. In plain English, when an AI agent hits a wall, HarnessX doesn't wait for a human to fix the prompts or tools. It automatically diagnoses the failure, restructures its own harness, and tries again. Across fifteen model-benchmark combinations, they saw an average fourteen and a half percent performance gain. For Qwen3.5-9B on embodied planning tasks, gains hit forty-four percent. HarnessX also enables co-evolution — the execution traces from harness improvements feed back into model training, creating a virtuous cycle. This is a big deal for anyone running self-hosted agents.
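To make the diagnose, restructure, retry loop concrete, here is a minimal toy sketch in Python. It is not HarnessX's actual code or API. Every name in it (Harness, diagnose, revise, run_with_self_repair, toy_task) is hypothetical, and the "diagnosis" is just string matching on an error message, standing in for whatever the real framework does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Harness:
    """Hypothetical scaffolding: a prompt plus the tools the agent may call."""
    prompt: str
    tools: frozenset = frozenset()


def diagnose(error: str) -> str:
    """Toy diagnosis: map an error message to a failure category."""
    if error.startswith("missing tool:"):
        return "missing_tool"
    return "unknown"


def revise(harness: Harness, error: str) -> Harness:
    """Return a new harness that tries to address the diagnosed failure."""
    if diagnose(error) == "missing_tool":
        tool = error.split(":", 1)[1].strip()
        return Harness(harness.prompt, harness.tools | {tool})
    # Unknown failure: fall back to a more explicit prompt.
    return Harness(harness.prompt + " Think step by step.", harness.tools)


def run_with_self_repair(task, harness: Harness, max_attempts: int = 3):
    """Run the task; on failure, revise the harness and try again.

    Returns (result, final_harness, trace). The trace records every attempt,
    which is the kind of data that could later feed back into training.
    """
    trace = []
    for attempt in range(1, max_attempts + 1):
        ok, detail = task(harness)
        trace.append((attempt, ok, detail))
        if ok:
            return detail, harness, trace
        harness = revise(harness, detail)
    raise RuntimeError(f"task still failing after {max_attempts} attempts")


def toy_task(harness: Harness):
    """Stand-in for an agent run: it only succeeds if a calculator is available."""
    if "calculator" not in harness.tools:
        return False, "missing tool: calculator"
    return True, "42"
```

Calling run_with_self_repair(toy_task, Harness("Solve the task.")) fails once, adds the missing tool to the harness, and succeeds on the second attempt, with no human editing the scaffolding in between.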

On the memory front, researchers at the National University of Singapore released MRAgent, a new framework for agentic memory. Instead of the standard retrieve-then-reason approach, MRAgent actively reconstructs memory by following cues through a graph, pruning irrelevant branches as it goes. The results are stark. On the LongMemEval benchmark, MRAgent consumed just a hundred and eighteen thousand tokens per query. LangMem burned through three point two six million for the same task. That's roughly a twenty-seven-fold reduction. Runtime was cut in half too. For anyone running local agents on constrained hardware, this kind of memory efficiency changes what's possible.
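The general idea of following cues through a graph and pruning as you go can be sketched in a few lines of Python. This is an illustration of the technique only, not MRAgent's algorithm. The MEMORY graph, the word-overlap relevance scorer, and the threshold are all made-up assumptions.

```python
from collections import deque

# Hypothetical memory graph: node -> (stored text, linked nodes).
MEMORY = {
    "trip": ("User planned a trip to Lisbon in May.", ["flight", "hotel", "cat"]),
    "flight": ("Flight booked with TAP, departs 9am.", ["seat"]),
    "hotel": ("Hotel in Alfama, two nights.", []),
    "cat": ("User's cat is named Miso.", ["vet"]),
    "vet": ("Vet appointment on Tuesday.", []),
    "seat": ("Seat 14C, aisle.", []),
}


def relevance(query: str, text: str) -> float:
    """Toy scorer: fraction of query words that appear in the stored text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().replace(".", "").replace(",", "").split())
    return len(query_words & text_words) / len(query_words)


def reconstruct(query: str, start: str, threshold: float = 0.3, max_nodes: int = 10):
    """Breadth-first walk from a starting cue, dropping irrelevant branches.

    A pruned node is never expanded, so nothing behind it is read at all.
    That is where the savings come from in this sketch.
    """
    recalled, seen = [], {start}
    queue = deque([start])
    while queue and len(recalled) < max_nodes:
        node = queue.popleft()
        text, neighbours = MEMORY[node]
        if node != start and relevance(query, text) < threshold:
            continue  # prune: skip this node and everything reachable only through it
        recalled.append(node)
        for neighbour in neighbours:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return recalled
```

With the query "flight seat" and the starting cue "trip", the walk keeps the flight branch and drops the hotel and cat branches, so the vet memory is never touched.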

Mistral also launched OCR 4 this week, an enterprise document extraction model that supports a hundred and seventy languages across ten language groups. It accepts PDF, Word, PowerPoint, and OpenDocument formats and, crucially, it can be deployed as a single container on your own infrastructure. No cloud dependency. Perfect for regulated industries that can't route sensitive documents through US-based APIs.

Alibaba published research on training agents in simulators that can generate edge cases on demand, showing that models never explicitly trained as agents can still improve agent performance across seven benchmarks. And Shopify shared how they built a model-agnostic AI stack — using proxy models, distillation strategies, and circuit breakers to stay stable no matter which foundation models survive the shakeout.
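If circuit breakers are new to you, here is a minimal Python sketch of the pattern applied to model backends. It is not Shopify's implementation. The CircuitBreaker class, the complete function, and the backend names in the usage note are hypothetical. Each backend is assumed to be a plain callable that takes a prompt and either returns text or raises an exception.

```python
import time


class CircuitBreaker:
    """Stop calling a backend after repeated failures, then retry after a cooldown."""

    def __init__(self, max_failures=2, cooldown_s=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed (healthy)

    def available(self) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: once the cooldown has passed, allow one trial call.
        return self.clock() - self.opened_at >= self.cooldown_s

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()  # open (or re-open) the circuit


def complete(prompt, backends, breakers):
    """Try backends in priority order, skipping any whose circuit is open.

    backends is a list of (name, callable) pairs; breakers maps name -> CircuitBreaker.
    """
    for name, call in backends:
        breaker = breakers[name]
        if not breaker.available():
            continue
        try:
            output = call(prompt)
        except Exception:
            breaker.record_failure()
            continue
        breaker.record_success()
        return name, output
    raise RuntimeError("no backend available")
```

Put a primary and a fallback model in the backends list, in that order. Once the primary fails twice in a row, it gets skipped entirely until the cooldown expires, so requests go straight to the fallback instead of waiting on a dead endpoint every time.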

Finally, Google redesigned their search box for the first time in twenty-five years. The classic white rectangle is being replaced with something AI-native. Details are emerging, but the message is clear — search is being rebuilt from the ground up for the AI era.

That's your local edition. All of this stuff is buildable, hostable, and relevant to anyone running their own AI infrastructure. See you tomorrow.