HackerRank CTO & author of this repo here

There's no better feeling than building something open source and watching it take off. Nine months ago, I built a simple hiring agent to solve one very real problem.

Things it is not: It's not an ATS. We don't use it to screen our open roles. Our customers don't use it either.

Here's what it is: Every year at HackerRank, we get 50,000 to 60,000 intern applications. No human can read that many resumes well. So I built something to rank them, helping me decide which resumes to read first.

[This was before we built AI Interviewer (Chakra) to automate the first round of interviews, so candidates are no longer rejected based on their resumes alone.]

Two things worth clarifying since I've seen them come up in this thread:

The default model is gemma3:4b because it's what runs locally on most laptops - no cloud API needed. Actual resumes are evaluated using a top Gemini model. The repo ships with a demo config, not the production one.

The cutoff score was set very low — the system was designed to rank resumes, not reject them. Only resumes at the very bottom of the distribution were filtered out. The vast majority passed through to human review, where the real decisions were made.
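To make the rank-versus-reject distinction concrete, here is a minimal sketch of ranking with a low cutoff. It is not the repo's actual code: the `ScoredResume` type, the `rank_resumes` function, the 0-100 scale, the cutoff of 20, and the sample scores are all hypothetical, chosen only for illustration.

```python
from typing import NamedTuple


class ScoredResume(NamedTuple):
    candidate: str
    score: float  # assumed 0-100 scale (hypothetical)


def rank_resumes(resumes, cutoff=20.0):
    """Split resumes into (to_review, filtered_out).

    to_review is sorted best-first so a human reads the strongest first;
    only resumes scoring below the (low) cutoff are set aside.
    """
    to_review = sorted(
        (r for r in resumes if r.score >= cutoff),
        key=lambda r: r.score,
        reverse=True,
    )
    filtered_out = [r for r in resumes if r.score < cutoff]
    return to_review, filtered_out


# Made-up sample data, not real scores.
sample = [
    ScoredResume("a", 72.0),
    ScoredResume("b", 12.0),
    ScoredResume("c", 55.5),
    ScoredResume("d", 91.0),
]
to_review, filtered_out = rank_resumes(sample)
print([r.candidate for r in to_review])
print([r.candidate for r in filtered_out])
```

With a cutoff this low, the sort order does most of the work and the filter only touches the tail of the distribution.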

Over the last week, it's taken on a life of its own. People are cloning it, running their own resumes through it, opening issues, sending PRs.

I contributed to open source a lot in college. Somewhere along the way, I drifted away from it. This week reminded me how good that feeling is. This thread has also given me more ideas than I expected. The critiques here are sharp and I'm already thinking about how to act on them. Improvements are coming.

You know you're not writing for LinkedIn? So platitudes about drifting away, or watching your project "succeed" by being really popular, are not relevant to the main concerns pushed by this piece. Particularly when you brush off the non-deterministic score calculation.

I'm a bit disappointed to see "The critiques here are sharp", a Claude tell, in a response which (to me) is trying to subtly argue that HackerRank is not overly reliant on LLMs.

I'm not sure if your intent was to come across as having written this yourself, but it did not have the effect of improving my perception that this approach is flawed.

I was also disappointed that you didn't address the variability in scores. I'm inferring that you believe the larger model takes care of the main observation in the post, but I don't really see you directly addressing the points.

Maybe it's just me.

There is variability in scores, and that's expected given we are ultimately using an LLM to score. At least when I used it 7 months ago, the only way I could avoid it was by keeping the cutoff score low (as low as 10 or 20).

Reading this thread, I'm hoping to minimize the variability even further (even though I know it can't be fully removed).
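One common way to reduce (not remove) this kind of run-to-run variability is to score each resume several times and average the results. The sketch below illustrates that idea only; it is an assumption, not something the repo is stated to do. `noisy_score` is a hypothetical stand-in for an LLM call, and the noise level and number of runs are made up.

```python
import random
import statistics


def noisy_score(resume_text, rng):
    """Hypothetical stand-in for an LLM scoring call: a fixed base plus noise."""
    base = min(100.0, float(len(resume_text)))  # toy deterministic base score
    return max(0.0, min(100.0, base + rng.gauss(0, 8)))


def averaged_score(resume_text, rng, runs=5):
    """Score the same resume `runs` times and return the mean."""
    return statistics.mean(noisy_score(resume_text, rng) for _ in range(runs))


rng = random.Random(0)  # seeded so the comparison is repeatable
resume = "x" * 60  # placeholder text; the base score is 60

single = [noisy_score(resume, rng) for _ in range(200)]
averaged = [averaged_score(resume, rng, runs=5) for _ in range(200)]

# The spread of averaged scores should be noticeably smaller.
print(statistics.stdev(single), statistics.stdev(averaged))
```

The trade-off is cost: averaging over five runs means five scoring calls per resume, and the spread shrinks only roughly with the square root of the number of runs.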

Do you read all ~50,000 then? Just with the ranked ones first?

Or are you using it to screen? I'm confused.

There are some with very low scores that were ignored (like < 20).

The rest, the ones with good scores (more than 40K of them, at least), were reviewed manually.

>>It's not an ATS.

>>No human can read that many resumes well. So I built something to rank them, helping me decide which resumes to read first

Translation: it's an ATS.

>>the system was designed to rank resumes, not reject them

>>Only resumes at the very bottom of the distribution were filtered out

Translation: it was designed to reject CVs.

Saw this comment at the top with 0 replies and thought “How is that possible??” and then saw the “0 minutes ago” timestamp. Only on HN can you stumble into the comments section just moments after a CTO, founder, author, etc. left unfiltered remarks about the exact topic of the post. Never change HN.

Depends how "unfiltered" you consider LLM output to be.

Thank you for your fantastic work!