I found a lot of startups at Startup Mahakumbh trying to solve resume screening with LLMs. I tried a small experiment myself, and the results were not very encouraging.
Experiment: I took 10 resumes, mapped them to a job description (JD), and defined criteria for scoring the relevance between the JD and each resume using Gemini and a Custom GPT. Kindly note that this is my naive attempt at benchmarking. Let me know if you think there is a gap in what I have understood or tried, or if there are variables that I have overlooked.
Assumptions
- The efficiency of an autonomous resume screener is the product of 2 key parameters (shoutout to @Nikhilesh for explaining it so simply)
- Efficiency of the vector database, i.e. how effectively it can pick the top X resumes from the master database
- Efficiency of the LLM in ranking the top X resumes picked from the vector database
The experiment below only considers the second parameter (the LLM ranking step) of this overall process
- The JD and the client name were provided in the prompt. It was assumed that the LLMs would have access to publicly available data to work out the parameters required for the mapping
- Each LLM was asked to give 70% weightage to must-haves, 10% to good-to-haves, 10% to job location, and 10% collectively to company industry, size, hiring manager, and CEO’s details. It is assumed that the CEO’s and hiring manager’s details would be equally difficult, or impossible, for every LLM to find, so relative grading across LLMs should still be fair within this limited context
- Each resume was also scored by an actual recruiter, who assigned a relevance score out of 10 based on a weighted average of the skills/experience outlined in the role
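
To make the assumptions above concrete, here is a minimal Python sketch of the weighting and of the "product of two parameters" idea. Only the 70/10/10/10 weights and the multiplication come from the setup above; the function names, the 0-to-1 component scores, and the sample inputs are hypothetical placeholders, not values from the experiment.

```python
from __future__ import annotations

# Illustrative sketch only: names and sample inputs are hypothetical.

# Weights given to the LLMs in the prompt (they sum to 1.0).
WEIGHTS = {
    "must_haves": 0.70,
    "good_to_haves": 0.10,
    "job_location": 0.10,
    "company_context": 0.10,  # industry, size, hiring manager, CEO combined
}


def relevance_score(components: dict[str, float]) -> float:
    """Combine per-criterion scores (each 0.0 to 1.0) into a score out of 10."""
    missing = set(WEIGHTS) - set(components)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= components[name] <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0")
    weighted = sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
    return round(10 * weighted, 2)


def pipeline_efficiency(retrieval: float, ranking: float) -> float:
    """Overall efficiency as the product of the two stages (each 0.0 to 1.0)."""
    return retrieval * ranking


if __name__ == "__main__":
    # Made-up component scores for a single resume.
    example = {
        "must_haves": 0.8,
        "good_to_haves": 0.5,
        "job_location": 1.0,
        "company_context": 0.0,  # e.g. the LLM could not find CEO/manager details
    }
    print(relevance_score(example))
    print(pipeline_efficiency(retrieval=0.8, ranking=0.7))
```

With these made-up numbers, a vector database that is right 80% of the time combined with an LLM that ranks correctly 70% of the time gives roughly 0.56 overall, so a weak stage caps the whole screener no matter how good the other one is.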

Observations
- ChatGPT failed to identify the name of the person in 1 resume
- ChatGPT made 1 major mistake: it read a recruiter’s resume as a developer’s resume
- Gemini’s results were slightly better
- ChatGPT’s resume summaries were crisper and more to the point
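
One way to put a number on "slightly better" would be to compare each LLM's scores with the recruiter's scores for the same resumes. The sketch below is a minimal Python illustration using mean absolute error. The function name and all score lists are placeholders made up for illustration; they are not the actual results of this experiment.

```python
from __future__ import annotations

from statistics import mean


def mean_absolute_error(llm_scores: list[float], recruiter_scores: list[float]) -> float:
    """Average absolute gap between LLM and recruiter scores (lower is better)."""
    if len(llm_scores) != len(recruiter_scores):
        raise ValueError("both lists must cover the same resumes")
    if not llm_scores:
        raise ValueError("at least one resume is required")
    return mean([abs(l - r) for l, r in zip(llm_scores, recruiter_scores)])


if __name__ == "__main__":
    # Placeholder scores out of 10, NOT the real experiment data.
    recruiter = [8.0, 6.0, 3.0, 9.0, 5.0]
    llm_a = [7.0, 6.0, 5.0, 9.0, 4.0]
    llm_b = [5.0, 8.0, 3.0, 6.0, 7.0]

    print("LLM A error:", mean_absolute_error(llm_a, recruiter))
    print("LLM B error:", mean_absolute_error(llm_b, recruiter))
```

If only the ordering of resumes matters rather than the exact score, a rank-based measure would probably be a better fit than absolute error. With only 10 resumes, either number should be read as a rough signal at best.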
If someone were to attempt a custom LLM, here’s what they might have to consider on Day 1 at a minimum; the list will likely grow once they actually build it
- Matching %
- Job Description (Must-have vs. Good-to-have)
- Job location
- Salary
- Company Culture vs. Candidate’s personality/preferences
- Candidate Experience – Company Size, Industry
- Success in past role (Academics, Sports, Professional experience)
- 360-degree feedback from past roles
- Role-based fitment
- Role-specific qualification
- Candidate personality (e.g. drive in sales, empathy in customer support)
- Company – Size, Industry, Stage of growth
- General Mapping
- Mapping the manager’s/super-manager’s background against the resume to find overlaps in academics/experience and improve referenceability
- Past success – Role & Company
- Publicly available data on past successes – who did well in this or similar roles at this company, in this industry, or at similar companies (by industry, location, size, management) in the past
- Candidate’s progress across the various stages of selection (interview rounds vs. feedback)
This raises several questions
- Is this the reason Naukri is not over-indexing on LLMs/AI at this point? Is it Naukri’s chance to build a truly deep-tech product from India? Only Naukri has the cash to attempt this, but such a bet goes against its usual business savviness
- Will a custom LLM solve it better, specifically because it can take custom inputs such as company industry, size, the hiring manager’s background, and the super-boss’s/founder’s background? These inputs could also be mapped against the candidate’s work experience at the corresponding point in time
- Does Gemini have an advantage because of its easy access to Google Drive, where it could use an entire resume dump at once to train its model, or offer easy access to people who want to use “Gemini for Recruitment”?
- Does ChatGPT have an advantage, thanks to its partnership with Microsoft (LinkedIn), in solving this earlier and more effectively? Building a vector database might simply be easier for ChatGPT, and it could use LinkedIn data to improve the model
- Do you think we will see an autonomous screener in the next 3 years?
What do you think?