In today’s competitive job market, making your resume stand out is crucial. JobFitAI is an innovative solution designed to help both job seekers and recruiters by analyzing resumes and offering actionable feedback. Traditional keyword-based filtering methods can overlook crucial nuances in a candidate’s profile. To overcome these challenges, AI-powered systems can be leveraged to analyze resumes, extract key skills, and match them effectively with job descriptions.
This article was published as a part of the Data Science Blogathon.
DeepSeek-R1 is an advanced open-source AI model designed for natural language processing (NLP) tasks. It is a transformer-based large language model (LLM) trained to understand and generate human-like text. DeepSeek-R1 can perform tasks such as text summarization, question answering, language translation, and more. Because it is open-source, developers can integrate it into various applications, fine-tune it for specific needs, and run it on their hardware without relying on proprietary systems. It is particularly useful for research, automation, and AI-driven applications.
Gradio is a user-friendly Python library that helps developers create interactive web interfaces for machine learning models and other applications. With just a few lines of code, Gradio allows users to build shareable applications with input components (such as text boxes, sliders, and image uploads) and output displays (such as text, images, or audio). It is widely used for AI model demonstrations, quick prototyping, and user-friendly interfaces for non-technical users. Gradio also supports easy model deployment, allowing developers to share their applications via public links without requiring complex web development skills.
This guide presents JobFitAI, an end-to-end solution that extracts text from a resume, generates a detailed analysis, and provides feedback on how well the resume matches a given job description, using DeepSeek-R1 (served via DeepInfra), OpenAI Whisper, and Gradio.
The JobFitAI project is built around a modular architecture, where each component plays a specific role in processing resumes. Below is an overview:
JobFitAI/
│── src/
│ ├── __pycache__/ (compiled Python files)
│ ├── analyzer.py
│ ├── audio_transcriber.py
│ ├── feedback_generator.py
│ ├── pdf_extractor.py
│ ├── resume_pipeline.py
│── .env (environment variables)
│── .gitignore
│── app.py (Gradio interface)
│── LICENSE
│── README.md
│── requirements.txt (dependencies)
Before diving into the code, you need to set up your development environment.
First, create a virtual environment in your project folder to manage your dependencies. Open your terminal and run:
python3 -m venv jobfitai
source jobfitai/bin/activate # On macOS/Linux
python -m venv jobfitai
jobfitai\Scripts\activate # On Windows - cmd
Next, create a file named requirements.txt and add the following libraries:
requests
openai-whisper
PyPDF2
python-dotenv
openai
torch
torchvision
torchaudio
gradio
Install the dependencies by running:
pip install -r requirements.txt
The project requires an API token to interact with the DeepInfra API. Create a .env file in your project’s root directory and add your API token:
DEEPINFRA_TOKEN="your_deepinfra_api_token_here"
Make sure to replace your_deepinfra_api_token_here with the actual token provided by DeepInfra.
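As a quick sanity check, you can confirm the token is visible to Python. The snippet below simulates the loaded variable with the placeholder value for illustration; in the real project, `load_dotenv()` from python-dotenv populates it from the `.env` file:

```python
import os

# Simulate what load_dotenv() does in the real project: make the token
# available as an environment variable (placeholder value for illustration).
os.environ.setdefault("DEEPINFRA_TOKEN", "your_deepinfra_api_token_here")

token = os.getenv("DEEPINFRA_TOKEN")
if not token or token == "your_deepinfra_api_token_here":
    print("DEEPINFRA_TOKEN is missing or still set to the placeholder.")
else:
    print("DEEPINFRA_TOKEN loaded.")
```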
You can generate a DeepInfra API key from your DeepInfra account dashboard.
The project is structured into several Python modules. In the following sections, we’ll understand the purpose of each file and its context in the project.
Resumes may not always be in text format. In cases where you receive an audio resume, the AudioTranscriber class comes into play. This file uses OpenAI’s Whisper model to transcribe audio files into text. The transcription is then used by the analyzer to extract resume details.
import whisper


class AudioTranscriber:
    """Transcribe audio files using OpenAI Whisper."""

    def __init__(self, model_size: str = "base"):
        """
        Initializes the Whisper model for transcription.

        Args:
            model_size (str): The size of the Whisper model to load. Defaults to "base".
        """
        self.model_size = model_size
        self.model = whisper.load_model(self.model_size)

    def transcribe(self, audio_path: str) -> str:
        """
        Transcribes the given audio file and returns the text.

        Args:
            audio_path (str): The path to the audio file to be transcribed.

        Returns:
            str: The transcribed text.

        Raises:
            Exception: If transcription fails.
        """
        try:
            result = self.model.transcribe(audio_path)
            return result["text"]
        except Exception as e:
            print(f"Error transcribing audio: {e}")
            return ""
Most resumes are available in PDF format. The PDFExtractor class is responsible for extracting text from PDF files using the PyPDF2 library. This module loops through all pages of a PDF document, extracts the text, and compiles it into a single string for further analysis.
import PyPDF2


class PDFExtractor:
    """Extract text from PDF files using PyPDF2."""

    def __init__(self):
        """Initialize the PDFExtractor."""
        pass

    def extract_text(self, pdf_path: str) -> str:
        """
        Extract text content from a given PDF file.

        Args:
            pdf_path (str): Path to the PDF file.

        Returns:
            str: Extracted text from the PDF.

        Raises:
            FileNotFoundError: If the file does not exist.
            Exception: For other unexpected errors.
        """
        text = ""
        try:
            with open(pdf_path, "rb") as file:
                reader = PyPDF2.PdfReader(file)
                for page in reader.pages:
                    page_text = page.extract_text()
                    if page_text:
                        text += page_text + "\n"
        except FileNotFoundError:
            print(f"Error: The file '{pdf_path}' was not found.")
        except Exception as e:
            print(f"An error occurred while extracting text: {e}")
        return text
The ResumePipeline module acts as the orchestrator for processing resumes. It integrates both the PDF extractor and the audio transcriber. Based on the file type provided by the user, it directs the resume to the correct processor and returns the extracted text. This modular design allows for easy expansion if additional resume formats need to be supported in the future.
from src.pdf_extractor import PDFExtractor
from src.audio_transcriber import AudioTranscriber


class ResumePipeline:
    """
    Process resume files (PDF or audio) and return extracted text.
    """

    def __init__(self):
        """Initialize the ResumePipeline with PDFExtractor and AudioTranscriber."""
        self.pdf_extractor = PDFExtractor()
        self.audio_transcriber = AudioTranscriber()

    def process_resume(self, file_path: str, file_type: str) -> str:
        """
        Process a resume file and extract text based on its type.

        Args:
            file_path (str): Path to the resume file.
            file_type (str): Type of the file ('pdf' or 'audio').

        Returns:
            str: Extracted text from the resume.

        Raises:
            ValueError: If the file type is unsupported.
            FileNotFoundError: If the specified file does not exist.
            Exception: For other unexpected errors.
        """
        try:
            file_type_lower = file_type.lower()
            if file_type_lower == "pdf":
                return self.pdf_extractor.extract_text(file_path)
            elif file_type_lower in ["audio", "wav", "mp3"]:
                return self.audio_transcriber.transcribe(file_path)
            else:
                raise ValueError("Unsupported file type. Use 'pdf' or 'audio'.")
        except FileNotFoundError:
            print(f"Error: The file '{file_path}' was not found.")
            return ""
        except ValueError as ve:
            print(f"Error: {ve}")
            return ""
        except Exception as e:
            print(f"An unexpected error occurred: {e}")
            return ""
This module is the backbone of the resume analyzer. It initializes the connection to DeepInfra’s API using the DeepSeek-R1 model. The main function in this file is analyze_text, which takes resume text as input and returns an analysis summarizing key details from the resume. This file ensures that our resume text is processed by an AI model tailored for resume analysis.
import os

from openai import OpenAI
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()


class DeepInfraAnalyzer:
    """
    Calls DeepSeek-R1 model on DeepInfra using an OpenAI-compatible interface.

    This class processes resume text and extracts structured information using AI.
    """

    def __init__(
        self,
        api_key: str = os.getenv("DEEPINFRA_TOKEN"),
        model_name: str = "deepseek-ai/DeepSeek-R1",
    ):
        """
        Initializes the DeepInfraAnalyzer with API key and model name.

        :param api_key: API key for authentication
        :param model_name: The name of the model to use
        """
        try:
            self.openai_client = OpenAI(
                api_key=api_key,
                base_url="https://api.deepinfra.com/v1/openai",
            )
            self.model_name = model_name
        except Exception as e:
            raise RuntimeError(f"Failed to initialize OpenAI client: {e}")

    def analyze_text(self, text: str) -> str:
        """
        Processes the given resume text and extracts key information in JSON format.

        The response will contain structured details about key skills, experience, education, etc.

        :param text: The resume text to analyze
        :return: JSON string with structured resume analysis
        """
        prompt = (
            "You are an AI job resume matcher assistant. "
            "DO NOT show your chain of thought. "
            "Respond ONLY in English. "
            "Extract the key skills, experiences, education, achievements, etc. from the following resume text. "
            "Then produce the final output as a well-structured JSON with a top-level key called \"analysis\". "
            "Inside \"analysis\", you can have subkeys like \"key_skills\", \"experiences\", \"education\", etc. "
            "Return ONLY the final JSON, with no extra commentary.\n\n"
            f"Resume Text:\n{text}\n\n"
            "Required Format (example):\n"
            "```\n"
            "{\n"
            "  \"analysis\": {\n"
            "    \"key_skills\": [...],\n"
            "    \"experiences\": [...],\n"
            "    \"education\": [...],\n"
            "    \"achievements\": [...],\n"
            "    ...\n"
            "  }\n"
            "}\n"
            "```\n"
        )
        try:
            response = self.openai_client.chat.completions.create(
                model=self.model_name,
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content
        except Exception as e:
            raise RuntimeError(f"Error processing resume text: {e}")
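Because LLMs often wrap JSON replies in a Markdown code fence (the prompt above even shows one in its required format), downstream code should unwrap the reply before calling `json.loads`. A small hypothetical helper, not part of the project files, might look like this:

```python
import json
import re

def parse_model_json(reply: str) -> dict:
    """Extract a JSON object from a model reply, tolerating ``` or ```json fences."""
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", reply, re.DOTALL)
    payload = match.group(1) if match else reply.strip()
    return json.loads(payload)

reply = '```\n{"analysis": {"key_skills": ["Python"]}}\n```'
print(parse_model_json(reply)["analysis"]["key_skills"])  # ['Python']
```

If the model returns bare JSON with no fence, the regex simply fails to match and the raw reply is parsed directly.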
After extracting details from the resume, the next step is to compare the resume against a specific job description. The FeedbackGenerator module takes the analysis from the resume and provides a match score along with recommendations for improvement. This module is crucial for job seekers aiming to refine their resumes to better align with job descriptions, increasing their chances of passing through applicant tracking systems (ATS).
from src.analyzer import DeepInfraAnalyzer


class FeedbackGenerator:
    """
    Generates feedback for resume improvement based on a job description
    using the DeepInfraAnalyzer.
    """

    def __init__(self, analyzer: DeepInfraAnalyzer):
        """
        Initializes the FeedbackGenerator with an instance of DeepInfraAnalyzer.

        Args:
            analyzer (DeepInfraAnalyzer): An instance of the DeepInfraAnalyzer class.
        """
        self.analyzer = analyzer

    def generate_feedback(self, resume_text: str, job_description: str) -> str:
        """
        Generates feedback on how well a resume aligns with a job description.

        Args:
            resume_text (str): The extracted text from the resume.
            job_description (str): The job posting or job description.

        Returns:
            str: A JSON-formatted response containing:
                - "match_score" (int): A score from 0-100 indicating job match quality.
                - "job_alignment" (dict): Categorization of strong and weak matches.
                - "missing_skills" (list): Skills missing from the resume.
                - "recommendations" (list): Actionable suggestions for improvement.

        Raises:
            Exception: If an unexpected error occurs during analysis.
        """
        try:
            prompt = (
                "You are an AI job resume matcher assistant. "
                "DO NOT show your chain of thought. "
                "Respond ONLY in English. "
                "Compare the following resume text with the job description. "
                "Calculate a match score (0-100) for how well the resume matches. "
                "Identify keywords from the job description that are missing in the resume. "
                "Provide bullet-point recommendations to improve the resume for better alignment.\n\n"
                f"Resume Text:\n{resume_text}\n\n"
                f"Job Description:\n{job_description}\n\n"
                "Return JSON ONLY in this format:\n"
                "{\n"
                "  \"job_match\": {\n"
                "    \"match_score\": <integer>,\n"
                "    \"job_alignment\": {\n"
                "      \"strong_match\": [...],\n"
                "      \"weak_match\": [...]\n"
                "    },\n"
                "    \"missing_skills\": [...],\n"
                "    \"recommendations\": [\n"
                "      \"<Actionable Suggestion 1>\",\n"
                "      \"<Actionable Suggestion 2>\",\n"
                "      ...\n"
                "    ]\n"
                "  }\n"
                "}"
            )
            return self.analyzer.analyze_text(prompt)
        except Exception as e:
            print(f"Error in generating feedback: {e}")
            return "{}"  # Returning an empty JSON string in case of failure
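Since the match score drives downstream decisions, it is worth validating the parsed feedback before trusting it. The sketch below is a hypothetical helper (not part of the project files), assuming the JSON shape requested in the prompt above:

```python
def validate_feedback(feedback: dict) -> bool:
    """Check that parsed feedback JSON has the shape the prompt requests."""
    job_match = feedback.get("job_match")
    if not isinstance(job_match, dict):
        return False
    score = job_match.get("match_score")
    return (
        isinstance(score, int)
        and 0 <= score <= 100
        and isinstance(job_match.get("missing_skills"), list)
        and isinstance(job_match.get("recommendations"), list)
    )

# Example payload matching the format requested in the prompt.
sample = {
    "job_match": {
        "match_score": 78,
        "job_alignment": {"strong_match": [], "weak_match": []},
        "missing_skills": ["Docker"],
        "recommendations": ["Add containerization experience."],
    }
}
print(validate_feedback(sample))  # True
print(validate_feedback({}))      # False
```

A check like this lets the app fall back gracefully (for example, retrying the request) when the model returns malformed or incomplete JSON.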
The app.py file is the main entry point of the JobFitAI project. It integrates all the modules described above and builds an interactive web interface using Gradio. Users can upload a resume/CV file (PDF or audio) and input a job description. The application then processes the resume, runs the analysis, generates feedback, and returns a structured JSON response with both the analysis and recommendations.
import os

from dotenv import load_dotenv

load_dotenv()

import gradio as gr

from src.resume_pipeline import ResumePipeline
from src.analyzer import DeepInfraAnalyzer
from src.feedback_generator import FeedbackGenerator

# Pipeline for PDF/audio
resume_pipeline = ResumePipeline()

# Initialize the DeepInfra analyzer
analyzer = DeepInfraAnalyzer()

# Feedback generator
feedback_generator = FeedbackGenerator(analyzer)


def analyze_resume(resume_path, job_desc):
    """
    Gradio callback function to analyze a resume against a job description.

    Args:
        resume_path (str): Path to the uploaded resume file (PDF or audio).
        job_desc (str): The job description text for comparison.
    """
    try:
        if not resume_path or not job_desc:
            return {"error": "Please upload a resume and enter a job description."}

        # Determine file type from extension
        lower_name = resume_path.lower()
        file_type = "pdf" if lower_name.endswith(".pdf") else "audio"

        # Extract text from the resume
        resume_text = resume_pipeline.process_resume(resume_path, file_type)

        # Analyze extracted text
        analysis_result = analyzer.analyze_text(resume_text)

        # Generate feedback and recommendations
        feedback = feedback_generator.generate_feedback(resume_text, job_desc)

        # Return structured response
        return {
            "analysis": analysis_result,
            "recommendations": feedback,
        }
    except ValueError as e:
        return {"error": f"Unsupported file type or processing error: {str(e)}"}
    except Exception as e:
        return {"error": f"An unexpected error occurred: {str(e)}"}


# Define Gradio interface
demo = gr.Interface(
    fn=analyze_resume,
    inputs=[
        gr.File(label="Resume (PDF/Audio)", type="filepath"),
        gr.Textbox(lines=5, label="Job Description"),
    ],
    outputs="json",
    title="JobFitAI: AI Resume Analyzer",
    description="""
    Upload your resume/CV (PDF or audio) and paste the job description to get a match score,
    missing keywords, and actionable recommendations.""",
)

if __name__ == "__main__":
    demo.launch(server_name="0.0.0.0", server_port=8000)
After setting up your environment and reviewing all code components, you’re ready to run the application:

python app.py

Once it starts, open http://localhost:8000 in your browser to use the Gradio interface (the port is configured in demo.launch).
You can find all the code files in the GitHub repo.
The JobFitAI resume analyzer can be applied in a range of real-world scenarios, from HR teams streamlining candidate screening to job seekers tailoring their resumes to specific postings. There is also room for troubleshooting and extensions, such as supporting additional resume formats or swapping in other models.
The JobFitAI resume analyzer is a robust, multi-functional tool that leverages state-of-the-art AI models to bridge the gap between resumes and job descriptions. By integrating DeepSeek-R1 via DeepInfra, along with transcription and PDF extraction capabilities, you now have a complete solution to automatically analyze resumes and generate feedback for improved job alignment.
This guide provided a comprehensive walk-through, from setting up the environment to understanding each module’s role and finally running the interactive Gradio interface. Whether you’re a developer looking to expand your portfolio, an HR professional wanting to streamline candidate screening, or a job seeker aiming to enhance your resume, the JobFitAI project offers practical insights and an excellent starting point for further exploration.
Embrace the power of AI, experiment with new features, and continue refining the project to suit your needs. The future of job applications is here, and it’s smarter than ever!
Q: What resume formats does JobFitAI support?
A: The current version supports resumes in PDF and audio formats. Future updates may include support for additional formats such as DOCX or plain text.

Q: Is the DeepSeek-R1 model free to use through DeepInfra?
A: No, accessing the DeepSeek-R1 model through the DeepInfra API requires a paid plan. For detailed pricing information, please visit DeepInfra’s official page.

Q: Can I customize the feedback that JobFitAI generates?
A: Yes! You can adjust the prompt or integrate additional models to tailor the feedback to your specific requirements.

Q: Why is audio transcription slow for my files?
A: Audio transcription may sometimes be delayed, especially for larger files. Verify that your environment meets the necessary computational requirements, and consider optimizing the transcription process or using cloud-based resources if needed.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.