Mainstream Hacker News 16 hours ago

HackerRank open sourced its ATS. My resume scored 90/100. Oh wait 74. No – 88

Dan Kinsky, Jun 28, 2026

This open-source ATS, https://github.com/interviewstreet/hiring-agent, has popped up on LinkedIn and Reddit with hundreds, sometimes thousands, of likes. A coworker mentioned it to me in passing a few days ago. I decided to test it out.

I had some debug prints scattered around from troubleshooting the setup, so I cleaned those up and ran it again. 74/100. The only thing I changed was deleting print statements.

I disabled DEVELOPMENT_MODE and put it in a loop to run a hundred times. The scores ranged from 66 to 99. If your company’s cutoff sits at 85, I fail 65% of the time. Same exact resume, different luck.

Here’s a quick rundown of how the tool works:

- Your PDF gets parsed into text.
- An LLM is called six times to extract structured information: your basics, work history, education, skills, projects, awards.
- It pulls your GitHub profile, scans your top repos, and appends them as extra context.
- Then everything gets fed into the LLM at once to be graded.

The scoring is out of 100, with up to 20 bonus points on top:

- 35 points for open source contributions
- 30 for personal projects
- 25 for work experience
- 10 for technical skills
- Up to 20 bonus points for startup experience, a portfolio site, a technical blog, etc.

The default model is gemma3:4b, running at temperature 0.1, a low setting that supposedly nudges the model toward deterministic outputs.

Here’s what I found when I looked at those individual categories.

Look at technical skills: I scored 8/10 in 98 out of 100 runs, because technical skills are a checklist. You either know React or you don’t. There’s nothing for an LLM to judge; a five-year-old could match that checklist.

Now look at projects: there’s HUGE variation. LLMs struggle to make a judgment call like that consistently. Sometimes my projects “lack architectural complexity”, sometimes they “demonstrate real-world deployment”. Which one the LLM spits out is a roll of the dice.
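The repeated-run experiment and the rubric described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: `run_many`, `fail_rate`, `summarize`, and the `fake_scorer` stand-in are hypothetical names, the random scorer only exists so the harness runs without an LLM, and treating "fail" as "strictly below the cutoff" is an assumption.

```python
import random
import statistics
from typing import Callable, Dict, List

# Rubric weights as listed in the article; the bonus is counted on top.
RUBRIC: Dict[str, int] = {
    "open_source": 35,
    "personal_projects": 30,
    "work_experience": 25,
    "technical_skills": 10,
}
MAX_BONUS = 20


def run_many(score_once: Callable[[], int], runs: int = 100) -> List[int]:
    """Call the scorer `runs` times on the same input and collect the totals."""
    return [score_once() for _ in range(runs)]


def fail_rate(scores: List[int], cutoff: int) -> float:
    """Fraction of runs that land strictly below the cutoff (assumed rule)."""
    return sum(s < cutoff for s in scores) / len(scores)


def summarize(scores: List[int], cutoff: int) -> Dict[str, float]:
    """Min, max, median, and fail rate for one batch of repeated runs."""
    return {
        "min": min(scores),
        "max": max(scores),
        "median": statistics.median(scores),
        "fail_rate": fail_rate(scores, cutoff),
    }


if __name__ == "__main__":
    # Hypothetical stand-in for one end-to-end run of the tool: it draws a
    # random integer in the range the article reports, nothing more.
    rng = random.Random(0)

    def fake_scorer() -> int:
        return rng.randint(66, 99)

    print(summarize(run_many(fake_scorer), cutoff=85))
```

Swapping `fake_scorer` for a function that actually invokes the scoring pipeline on a fixed resume would reproduce the shape of the experiment; the numbers printed by the stand-in are random and say nothing about the real tool.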
Temperature 0.1 is already low, but even going down to temperature 0 doesn’t fix this. Someone opened a GitHub issue back in October showing scores of 27, 34, 32, 34, 34, 30 across six consecutive runs at temperature 0. This non-determinism isn’t a bug you can just fine-tune away; it’s a fundamental design flaw.

I was worried part of this might be the model. After all, gemma3:4b was a local model running on my machine. Gemini produced a tighter distribution, with scores clustered between 48 and 64.
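To put rough numbers on the spread, the figures quoted above can be summarized with the standard library. This is only a sketch: `describe` is a hypothetical helper, the six scores are the ones cited from the GitHub issue, and the per-model bounds are the approximate ranges reported in the text rather than full run data.

```python
import statistics
from typing import Dict, List, Tuple

# Six consecutive scores quoted from the GitHub issue mentioned in the article.
issue_scores: List[int] = [27, 34, 32, 34, 34, 30]


def describe(scores: List[int]) -> Dict[str, float]:
    """Basic spread statistics for repeated scores of the same resume."""
    return {
        "range": max(scores) - min(scores),
        "distinct": len(set(scores)),
        "mean": round(statistics.mean(scores), 2),
        "pstdev": round(statistics.pstdev(scores), 2),
    }


# Approximate (low, high) bounds as reported in the article, not raw run data.
reported_bounds: Dict[str, Tuple[int, int]] = {
    "gemma3:4b": (66, 99),
    "Gemini": (48, 64),
}
widths = {model: high - low for model, (low, high) in reported_bounds.items()}

if __name__ == "__main__":
    print(describe(issue_scores))
    print(widths)
```

A fully deterministic scorer would give a range of 0 and a single distinct value; the issue's six runs produce four distinct scores, and the Gemini range is narrower than the gemma3:4b range without collapsing to a single score.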

Original story by Hacker News
