Mainstream Hacker News • 18 hours ago

Running local models is good now

Jun 15 2026

I’ve been working with local models since they came out, and they’re finally surprisingly good. I have a 2022 M2 Mac with 64 GB RAM and 1 TB storage, and I’ve used Mistral 7B, Gemma 3, OpenAI OSS-20B, and Qwen 3 MOE, as well as a number of other Qwen variants like Qwen 2.5 Coder, across a lot of different system setups: raw llama.cpp with Open WebUI, llama-cpp-python, Ollama, llamafiles, and LM Studio.

Where are local models now?

Early on, models were slow, hard to use, and just not that accurate for most programming tasks. For me, the idea that local models were severely lagging behind was largely true until the release of GPT-OSS. I have no concrete scientific evidence for this. My own personal vibe metric for “is a model good enough” is “do I have to double-check it against an API model”, and GPT-OSS was the first one where I started doing that a lot less often. As a result, I’ve mostly been using local models as a fast, personalized Google for development questions that don’t require recency.

But with the most recent releases from Google in the Gemma 4 family, I’ve finally been able to do agentic coding locally and have loops work at roughly 75% of the accuracy/speed of frontier models, which is incredible. I’ve so far been using the LM Studio implementation of gemma-4-26b-a4b as my default local model.

I’ve used the local setup so far to refactor a Python script that started as a notebook into a repo of 5-6 modules, and to lint that module to use correct type hints for generics (most frontier models now do this automatically, but not always). I’ve also used it to proofread some blog posts, write unit tests, and bootstrap a repo that stands up a two-tower model for recommendations, just to see what the agent would do with a blank slate.
Here’s what it generated, which was pretty basic but still beyond the scope of anything I would have thought possible last year:

Note that the environment is restricted because I run all my agentic workflows in a Docker container with limited access to execution.

I’m also building an app that surfaces topics from arXiv papers. Out of curiosity, I had Pi go through my past LM Studio session logs and figure out what I was using LM Studio for:

Unsurprisingly, since I’ve been working on Rijksearch,

None of these are groundbreaking tasks (again, a lot of personalized Google/docs lookups), but working on them does give my GPUs and RAM a workout, and the K-V cache grows to 64 GB of RAM.
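
For a rough sense of why the K-V cache grows during long agentic sessions, here is a back-of-the-envelope estimate in Python. It is only a sketch: it assumes a plain transformer that stores one key and one value vector per layer, per K-V head, per token, with no cache quantization or sliding-window attention. The function names and every model dimension in the example are hypothetical and are not the actual gemma-4-26b-a4b configuration.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Estimate K-V cache size in bytes for a plain transformer.

    Assumes one key and one value vector per layer, per K-V head, per token.
    bytes_per_value=2 corresponds to 16-bit cache entries (an assumption).
    """
    keys_and_values = 2  # one K tensor plus one V tensor per layer
    return (keys_and_values * n_layers * n_kv_heads * head_dim
            * context_tokens * bytes_per_value)


def to_gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / 2**30


if __name__ == "__main__":
    # Hypothetical dimensions, chosen only to make the arithmetic easy to follow.
    size = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128,
                          context_tokens=32_768)
    print(f"{to_gib(size):.1f} GiB")  # prints "4.0 GiB"
```

Under these assumptions the cache scales linearly with context length, so doubling the context in the example doubles the estimate to 8 GiB. Real runtimes may quantize the cache or bound it in other ways, so treat the number as an upper-level intuition rather than a measurement.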
