Explore the FreeResume resume parser's capabilities by selecting an example resume or uploading your own. The parser extracts key information, helping you understand how well your resume is formatted for Applicant Tracking Systems (ATS).

Candidate Profile
Full Name
Email Address
Phone Number
Location
Website/Portfolio
Professional Summary
Educational Background
Institution
Degree
GPA
Dates Attended
Key Achievements
Professional Experience
Organization
Position
Tenure
Responsibilities & Achievements
Technical & Professional Skills
Skills Overview

Resume Parsing Algorithm: Technical Overview

This section provides a comprehensive exploration of the resume parsing algorithm developed by our organization. It outlines the four-step process designed to extract structured data from single-column English-language resumes.

Step 1: Extract Text Items from PDF

The PDF format, standardized under ISO 32000, encodes content in a complex structure. To process a resume, our parser decodes the PDF using Mozilla's open-source pdf.js library to extract text items, including their content and metadata such as x, y coordinates, bold formatting, and line breaks.
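As a rough illustration of what one extracted text item might look like, the sketch below normalizes a raw item into a flat record. The `RawPdfItem` fields mirror what pdf.js returns from `page.getTextContent()` (`str`, `transform`, `width`, `height`, `fontName`, `hasEOL`); the `TextItem` shape, the `toTextItem` helper, and the rule that a bold font carries "bold" in its resolved name are assumptions made for this example, not the parser's actual code.

```typescript
// Raw item shape, mirroring the fields pdf.js exposes on text content items.
interface RawPdfItem {
  str: string;
  // PDF transform matrix [a, b, c, d, e, f]; e and f hold the x, y position.
  transform: [number, number, number, number, number, number];
  width: number;
  height: number;
  fontName: string; // assumed to be already resolved to a readable font name
  hasEOL: boolean;
}

// Hypothetical normalized record used for illustration.
interface TextItem {
  text: string;
  x: number; // measured from the bottom-left corner of the page
  y: number;
  width: number;
  height: number;
  bold: boolean;
  hasEOL: boolean;
}

function toTextItem(raw: RawPdfItem): TextItem {
  return {
    text: raw.str,
    x: raw.transform[4],
    y: raw.transform[5],
    width: raw.width,
    height: raw.height,
    // Assumption: a bold font advertises itself in its name.
    bold: raw.fontName.toLowerCase().indexOf("bold") !== -1,
    hasEOL: raw.hasEOL,
  };
}

const sampleRaw: RawPdfItem = {
  str: "JANE DOE",
  transform: [12, 0, 0, 12, 72, 720],
  width: 60,
  height: 12,
  fontName: "Helvetica-Bold",
  hasEOL: true,
};
const sampleItem = toTextItem(sampleRaw);
// sampleItem.x is 72, sampleItem.y is 720, sampleItem.bold is true
```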

The table below lists the text items extracted from the provided resume PDF. Each item includes metadata such as position (relative to the bottom-left corner of the page, which is the origin (0, 0)), boldness, and newline indicators.

#	Text Content	Metadata

Step 2: Group Text Items into Lines

Extracted text items require further processing to address two challenges:

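Challenge 1: Fragmentation

Text items, such as phone numbers (e.g., "(123) 456-7890"), may be split into multiple fragments. To resolve this, adjacent items are merged if the horizontal distance between them is less than the average character width. The distance is calculated as:

$\mathit{Distance} = \mathit{RightTextItemX}_1 - \mathit{LeftTextItemX}_2$

Here $\mathit{RightTextItemX}_1$ is the left edge of the right-hand item and $\mathit{LeftTextItemX}_2$ is the right edge of the left-hand item. The average character width excludes bolded text and newlines to ensure accuracy.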
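The merge rule can be sketched as follows. This is a minimal illustration, not the parser's implementation: the `TextItem` shape and both function names are hypothetical, items are assumed to be sorted left to right on a single line, and "excludes newlines" is interpreted here as skipping items whose text is empty or whitespace.

```typescript
interface TextItem {
  text: string;
  x: number; // left edge
  width: number; // rendered width
  bold: boolean;
}

// Average width of one character, ignoring bold and empty items.
function averageCharWidth(items: TextItem[]): number {
  let totalWidth = 0;
  let totalChars = 0;
  for (const it of items) {
    if (it.bold || it.text.trim() === "") continue;
    totalWidth += it.width;
    totalChars += it.text.length;
  }
  return totalChars === 0 ? 0 : totalWidth / totalChars;
}

// Merge neighbours whose gap is less than one average character width.
function mergeAdjacent(items: TextItem[], charWidth: number): TextItem[] {
  const merged: TextItem[] = [];
  for (const it of items) {
    const prev: TextItem | undefined = merged[merged.length - 1];
    if (prev !== undefined) {
      // Distance = left edge of the right item - right edge of the left item.
      const distance = it.x - (prev.x + prev.width);
      if (distance < charWidth) {
        prev.text += it.text;
        prev.width = it.x + it.width - prev.x;
        continue;
      }
    }
    merged.push({ ...it }); // copy so the input items are not mutated
  }
  return merged;
}

const fragments: TextItem[] = [
  { text: "(123) 456", x: 100, width: 45, bold: false },
  { text: "-7890", x: 146, width: 25, bold: false },
  { text: "jane@example.com", x: 220, width: 80, bold: false },
];
const charWidth = averageCharWidth(fragments); // 150 / 30 = 5
const mergedItems = mergeAdjacent(fragments, charWidth);
// The two phone fragments (gap of 1) merge; the email (gap of 49) stays separate.
```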

Challenge 2: Lack of Context

Raw text items lack the contextual associations humans infer from visual cues. Our parser groups items into lines, mimicking human reading patterns, to establish these relationships.
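One simple way to form lines is to close the current line whenever an item carries a newline indicator. The sketch below assumes that approach; the `TextItem` shape, `groupIntoLines`, and `renderLine` are hypothetical names, and dropping empty items is a choice made for this example.

```typescript
interface TextItem {
  text: string;
  hasEOL: boolean; // newline indicator from the extraction step
}
type Line = TextItem[];

function groupIntoLines(items: TextItem[]): Line[] {
  const lines: Line[] = [];
  let current: Line = [];
  for (const it of items) {
    if (it.text.trim() !== "") current.push(it); // skip empty items
    if (it.hasEOL) {
      if (current.length > 0) lines.push(current);
      current = [];
    }
  }
  if (current.length > 0) lines.push(current); // flush a trailing line
  return lines;
}

// Render a line with a vertical divider between its text items.
const renderLine = (line: Line): string => line.map((it) => it.text).join(" | ");

const extracted: TextItem[] = [
  { text: "Jane Doe", hasEOL: true },
  { text: "jane@example.com", hasEOL: false },
  { text: "(123) 456-7890", hasEOL: true },
  { text: "", hasEOL: true },
  { text: "EDUCATION", hasEOL: false },
];
const groupedLines = groupIntoLines(extracted);
const rendered = groupedLines.map(renderLine);
// rendered: ["Jane Doe", "jane@example.com | (123) 456-7890", "EDUCATION"]
```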

The resulting lines are displayed below. Multiple text items within a line are separated by a vertical divider.

Lines	Line Content

Step 3: Group Lines into Sections

Building on line grouping, this step organizes lines into sections to enhance contextual understanding. Most sections begin with a title, a common convention in resumes.

Section titles are identified using a primary heuristic requiring:
1. A single text item in the line
2. Bold formatting
3. All uppercase letters
A fallback heuristic uses keyword matching against common resume section titles if the primary criteria are not met.
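The two heuristics above can be sketched as a single predicate plus a grouping pass. Several details are assumptions made for this example rather than documented behavior: the keyword list, the restriction of the fallback to lines of at most three words, and the placement of lines that precede the first title into an untitled "profile" section.

```typescript
interface TextItem {
  text: string;
  bold: boolean;
}
type Line = TextItem[];

// Hypothetical keyword list; the parser's actual list is not given in the text.
const TITLE_KEYWORDS = ["experience", "education", "project", "skill", "summary", "objective"];

const isAllUpper = (s: string): boolean => /[a-zA-Z]/.test(s) && s === s.toUpperCase();

function isSectionTitle(line: Line): boolean {
  const first: TextItem | undefined = line[0];
  // Primary heuristic: a single text item that is bold and all uppercase.
  if (line.length === 1 && first !== undefined && first.bold && isAllUpper(first.text)) {
    return true;
  }
  // Fallback: keyword match, limited to short lines (an assumption).
  const text = line.map((it) => it.text).join(" ").trim().toLowerCase();
  if (text === "" || text.split(/\s+/).length > 3) return false;
  return TITLE_KEYWORDS.some((kw) => text.indexOf(kw) !== -1);
}

interface ResumeSection {
  title: string;
  lines: Line[];
}

function groupIntoSections(lines: Line[]): ResumeSection[] {
  // Assumption: lines before the first title belong to a "profile" section.
  let current: ResumeSection = { title: "profile", lines: [] };
  const sections: ResumeSection[] = [current];
  for (const line of lines) {
    if (isSectionTitle(line)) {
      current = { title: line.map((it) => it.text).join(" "), lines: [] };
      sections.push(current);
    } else {
      current.lines.push(line);
    }
  }
  return sections;
}

const resumeLines: Line[] = [
  [{ text: "Jane Doe", bold: true }],
  [{ text: "jane@example.com", bold: false }],
  [{ text: "EDUCATION", bold: true }],
  [{ text: "XYZ University", bold: false }, { text: "2019 - 2023", bold: false }],
  [{ text: "Work Experience", bold: false }],
  [{ text: "Acme Corp", bold: false }],
];
const sections = groupIntoSections(resumeLines);
// Section titles: "profile", "EDUCATION", "Work Experience"
```

In this sample, "EDUCATION" is caught by the primary heuristic, while "Work Experience" is neither bold nor uppercase and is only caught by the keyword fallback.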

The table below shows identified sections, with titles in bold and associated lines highlighted in matching colors.

Lines	Line Content

Step 4: Extract Resume Data from Sections

The final step extracts structured resume data using a feature-scoring system. Each resume attribute is evaluated against custom feature sets, which assign positive or negative scores based on matching criteria. The text item with the highest score is selected as the attribute value.
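The selection rule (score every candidate, then take the highest) can be illustrated with a small generic scorer. The `Feature` shape, the function names, and the three-feature email set with its scores are all hypothetical values chosen for this example.

```typescript
interface Feature {
  matches: (text: string) => boolean;
  score: number; // added when the feature matches; may be negative
}

function scoreText(text: string, features: Feature[]): number {
  let total = 0;
  for (const f of features) {
    if (f.matches(text)) total += f.score;
  }
  return total;
}

interface ScoredText {
  text: string;
  score: number;
}

// Score every candidate and sort from highest to lowest.
function rankCandidates(candidates: string[], features: Feature[]): ScoredText[] {
  return candidates
    .map((text) => ({ text, score: scoreText(text, features) }))
    .sort((a, b) => b.score - a.score);
}

// Illustrative feature set for the email attribute (scores are made up).
const emailFeatures: Feature[] = [
  { matches: (t) => /\S+@\S+\.\S+/.test(t), score: 4 }, // looks like an email
  { matches: (t) => /^[a-zA-Z\s.]+$/.test(t), score: -1 }, // looks like a name
  { matches: (t) => /\d{3}/.test(t), score: -4 }, // looks like a phone number
];

const ranked = rankCandidates(
  ["Jane Doe", "jane@example.com", "(123) 456-7890"],
  emailFeatures
);
// ranked: jane@example.com (4), Jane Doe (-1), (123) 456-7890 (-4)
```

The first entry of the ranked list is the selected attribute value; the remaining entries correspond to the scores of the other candidate texts.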

Feature Scoring System

The table below illustrates three attributes extracted from the profile section of the provided resume, showing the highest-scoring text and scores for other candidates.

Resume Attribute	Text (Highest Feature Score)	Feature Scores of Other Texts
Name
Email
Phone

Feature Sets

Feature sets are crafted based on two principles:
1. Relative comparison to other attributes in the same section
2. Manual design reflecting attribute characteristics

The table below details the feature sets for the name attribute: features that indicate a name carry positive scores, while features that indicate a different attribute (email, phone, address, or URL) carry negative scores.

Name Feature Sets
Feature Function	Feature Matching Score
Contains only letters, spaces, or periods	+3
Is bolded	+2
Contains all uppercase letters	+2
Contains @	-4 (email match)
Contains number	-4 (phone match)
Contains ,	-4 (address match)
Contains /	-4 (URL match)
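The name feature table translates directly into data. The sketch below encodes each row with its score and sums the scores of the features a candidate matches; the `NameCandidate` and `NameFeature` shapes, the `scoreName` helper, and the sample candidates are assumptions made for illustration.

```typescript
interface NameCandidate {
  text: string;
  bold: boolean;
}

interface NameFeature {
  label: string;
  matches: (c: NameCandidate) => boolean;
  score: number;
}

// One entry per row of the name feature table.
const NAME_FEATURES: NameFeature[] = [
  { label: "only letters, spaces, or periods", matches: (c) => /^[a-zA-Z\s.]+$/.test(c.text), score: 3 },
  { label: "is bolded", matches: (c) => c.bold, score: 2 },
  { label: "all uppercase letters", matches: (c) => /[A-Z]/.test(c.text) && c.text === c.text.toUpperCase(), score: 2 },
  { label: "contains @", matches: (c) => c.text.indexOf("@") !== -1, score: -4 },
  { label: "contains number", matches: (c) => /\d/.test(c.text), score: -4 },
  { label: "contains ,", matches: (c) => c.text.indexOf(",") !== -1, score: -4 },
  { label: "contains /", matches: (c) => c.text.indexOf("/") !== -1, score: -4 },
];

// Sum the scores of every feature the candidate matches.
function scoreName(c: NameCandidate): number {
  return NAME_FEATURES.reduce((sum, f) => sum + (f.matches(c) ? f.score : 0), 0);
}

const nameCandidates: NameCandidate[] = [
  { text: "JANE DOE", bold: true },
  { text: "jane@example.com", bold: false },
  { text: "(123) 456-7890", bold: false },
  { text: "Austin, TX", bold: false },
  { text: "linkedin.com/in/jane", bold: false },
];
const nameScores = nameCandidates.map(scoreName);
// nameScores: [7, -4, -4, -4, -4]
```

The bold, uppercase, letters-only candidate collects all three positive scores (3 + 2 + 2), while each of the others trips exactly one negative feature.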

Core Feature Functions

Each attribute relies on a core feature function for identification, as shown below.

Resume Attribute	Core Feature Function	Regex
Name	Contains only letters, spaces, or periods	/^[a-zA-Z\s\.]+$/
Email	Matches email format xxx@xxx.xxx, where xxx can be any non-space character	/\S+@\S+\.\S+/
Phone	Matches phone format (xxx)-xxx-xxxx, with optional parentheses and dashes	/\(?\d{3}\)?[\s-]?\d{3}[\s-]?\d{4}/
Location	Matches city and state format City, ST	/[A-Z][a-zA-Z\s]+, [A-Z]{2}/
URL	Matches URL format xxx.xxx/xxx	/\S+\.[a-z]+\/\S+/
School	Contains keywords like College, University, School
Degree	Contains keywords like Associate, Bachelor, Master
GPA	Matches GPA format x.xx	/[0-4]\.\d{1,2}/
Date	Contains year, month, season, or 'Present' keywords	Year: /(?:19|20)\d{2}/
Job Title	Contains keywords like Analyst, Engineer, Intern
Company	Is bolded or excludes job title/date patterns
Project	Is bolded or excludes date patterns
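To see how the regex-based core feature functions behave, the sketch below runs each pattern from the table against a few sample strings. The phone pattern is written with escaped, optional parentheses, as its description states. The `CORE_PATTERNS` array, the `matchingAttributes` helper, and the sample strings are hypothetical; the keyword-based and boldness-based rows are left out because they are not regexes.

```typescript
const CORE_PATTERNS: { attribute: string; regex: RegExp }[] = [
  { attribute: "name", regex: /^[a-zA-Z\s\.]+$/ },
  { attribute: "email", regex: /\S+@\S+\.\S+/ },
  { attribute: "phone", regex: /\(?\d{3}\)?[\s-]?\d{3}[\s-]?\d{4}/ },
  { attribute: "location", regex: /[A-Z][a-zA-Z\s]+, [A-Z]{2}/ },
  { attribute: "url", regex: /\S+\.[a-z]+\/\S+/ },
  { attribute: "gpa", regex: /[0-4]\.\d{1,2}/ },
  { attribute: "year", regex: /(?:19|20)\d{2}/ },
];

// Return the attributes whose core pattern matches the text.
function matchingAttributes(text: string): string[] {
  return CORE_PATTERNS.filter((p) => p.regex.test(text)).map((p) => p.attribute);
}

const samples = [
  "Jane Doe",
  "jane@example.com",
  "(123) 456-7890",
  "Austin, TX",
  "linkedin.com/in/jane",
  "GPA: 3.85",
  "Sep 2019 - Present",
];
const matches = samples.map((s) => matchingAttributes(s).join(","));
// matches: ["name", "email", "phone", "location", "url", "gpa", "year"]
```

These patterns are deliberately loose (the GPA pattern, for example, would also match inside "13.5"), which is why a core function is combined with the other scored features rather than used alone.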

Handling Subsections

For sections like education or work experience, subsections are detected using a heuristic based on vertical line gaps (1.4x the typical gap) or bolded text. Each subsection is processed independently to extract attributes.
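A minimal sketch of the subsection heuristic follows. It makes several assumptions that the text does not spell out: the "typical gap" is taken to be the most common rounded gap between consecutive lines, a new subsection starts where the gap above a line exceeds 1.4 times that value, and bold lines are used as subsection starts only when no such gap exists. The `LineInfo` shape and all function names are hypothetical.

```typescript
interface LineInfo {
  text: string;
  y: number; // distance from the bottom of the page, so y decreases down the page
  bold: boolean;
}

// Vertical gaps between consecutive lines, rounded to whole units.
function lineGaps(lines: LineInfo[]): number[] {
  const gaps: number[] = [];
  let prevY: number | undefined;
  for (const line of lines) {
    if (prevY !== undefined) gaps.push(Math.round(prevY - line.y));
    prevY = line.y;
  }
  return gaps;
}

// Assumption: the typical gap is the most frequent one.
function typicalGap(gaps: number[]): number {
  let best = 0;
  let bestCount = 0;
  for (const g of gaps) {
    const count = gaps.filter((other) => other === g).length;
    if (count > bestCount) {
      best = g;
      bestCount = count;
    }
  }
  return best;
}

function splitIntoSubsections(lines: LineInfo[]): LineInfo[][] {
  const gaps = lineGaps(lines);
  const threshold = 1.4 * typicalGap(gaps);
  const useGaps = gaps.some((g) => g > threshold);
  const subsections: LineInfo[][] = [];
  let current: LineInfo[] = [];
  lines.forEach((line, i) => {
    const gapAbove = i > 0 ? gaps[i - 1] : undefined;
    const startsNew = useGaps
      ? gapAbove !== undefined && gapAbove > threshold
      : line.bold; // fallback when no large gap exists
    if (startsNew && current.length > 0) {
      subsections.push(current);
      current = [];
    }
    current.push(line);
  });
  if (current.length > 0) subsections.push(current);
  return subsections;
}

const workLines: LineInfo[] = [
  { text: "Acme Corp", y: 700, bold: true },
  { text: "Engineer, 2020 - 2022", y: 686, bold: false },
  { text: "Built internal tools", y: 672, bold: false },
  { text: "Beta Inc", y: 648, bold: true },
  { text: "Analyst, 2018 - 2020", y: 634, bold: false },
];
const subsectionSizes = splitIntoSubsections(workLines).map((s) => s.length);
// Gaps are 14, 14, 24, 14; the threshold is about 19.6, so the split falls before "Beta Inc".
// subsectionSizes: [3, 2]
```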

Authored by Farouk Jjingo, January 2025
