
Resume Parser Playground

Explore the FreeResume resume parser's capabilities by selecting an example resume or uploading your own. The parser extracts key information, helping you understand how well your resume is formatted for Applicant Tracking Systems (ATS).

Example Resumes

Resume Example 1

Borrowed from University of La Verne Career Center - Link

Resume Example 2

Created with FreeResume resume builder - Link

Parsing Results

Candidate Profile
Full Name
Email Address
Phone Number
Location
Website/Portfolio
Professional Summary
Educational Background
Institution
Degree
GPA
Dates Attended
Key Achievements
Professional Experience
Organization
Position
Tenure
Responsibilities & Achievements
Technical & Professional Skills
Skills Overview

Resume Parsing Algorithm: Technical Overview

This section explains the resume parsing algorithm developed by our organization: a four-step process that extracts structured data from single-column, English-language resumes.

Step 1: Extract Text Items from PDF

The PDF format, standardized under ISO 32000, encodes content in a complex structure. To process a resume, our parser decodes the PDF using Mozilla's open-source pdf.js library to extract text items, including their content and metadata such as x, y coordinates, bold formatting, and line breaks.

The table below lists the text items extracted from the provided resume PDF. For each item it shows the x and y position (measured from the origin 0,0 at the bottom-left corner of the page), whether the item is bold, and whether it ends a line.

# | Text Content | Metadata
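As a rough illustration of what Step 1 produces, the sketch below converts one pdf.js-style text item (fields `str`, `transform`, `width`, `hasEOL`) into a flat record. The `TextItem` class and `to_text_item` helper are hypothetical. Bold detection here is an assumption: it checks the font name for "bold". It also assumes the caller has already resolved the real font name (for example "Arial-BoldMT"), because pdf.js's own `fontName` field may be an internal identifier.

```python
from dataclasses import dataclass


@dataclass
class TextItem:
    text: str
    x1: float      # left edge
    x2: float      # right edge (x1 + width)
    y: float       # baseline, measured up from the bottom-left origin (0, 0)
    bold: bool
    has_eol: bool  # True if this item ends a line


def to_text_item(raw: dict, font_name: str) -> TextItem:
    """Convert one pdf.js-style text item into a TextItem (hypothetical helper).

    Assumption: font_name is the resolved font name, and "bold in the name"
    is used as a heuristic for bold formatting.
    """
    # The last two entries of the transform matrix are the x/y translation.
    x, y = raw["transform"][4], raw["transform"][5]
    return TextItem(
        text=raw["str"],
        x1=x,
        x2=x + raw["width"],
        y=y,
        bold="bold" in font_name.lower(),
        has_eol=raw.get("hasEOL", False),
    )


raw = {"str": "JOHN DOE", "transform": [12, 0, 0, 12, 72, 720], "width": 50.0, "hasEOL": True}
item = to_text_item(raw, "Arial-BoldMT")
print(item)
```

Keeping both left and right edges (`x1`, `x2`) on each item makes the horizontal-distance check in Step 2 a simple subtraction.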

Step 2: Group Text Items into Lines

Extracted text items require further processing to address two challenges:

Challenge 1: Fragmentation. A single piece of text, such as a phone number (e.g., "(123) 456-7890"), may be split into multiple text items. To resolve this, adjacent items are merged if the horizontal distance between them is less than the average character width, where:
Distance = RightTextItem.x₁ - LeftTextItem.x₂
Here x₁ is the left edge of the right-hand item and x₂ is the right edge of the left-hand item. The average character width is computed without bolded text and newline items, which would otherwise skew it.
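The merge rule can be sketched as follows. This is a minimal illustration, not the parser's actual code. It assumes items arrive left to right within a line, and it treats "newline items" as items whose text is blank. The names `average_char_width` and `merge_fragments` are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass
class TextItem:
    text: str
    x1: float  # left edge
    x2: float  # right edge
    bold: bool = False


def average_char_width(items: list[TextItem]) -> float:
    """Width per character, skipping bold items and blank (newline) items."""
    total_width, total_chars = 0.0, 0
    for item in items:
        if item.bold or not item.text.strip():
            continue
        total_width += item.x2 - item.x1
        total_chars += len(item.text)
    return total_width / total_chars if total_chars else 0.0


def merge_fragments(line: list[TextItem], avg_width: float) -> list[TextItem]:
    """Merge left-to-right neighbours whose horizontal gap is below avg_width."""
    merged: list[TextItem] = []
    for item in line:
        if merged:
            left = merged[-1]
            distance = item.x1 - left.x2  # RightTextItem.x1 - LeftTextItem.x2
            if distance < avg_width:
                # Build a new item instead of mutating the input.
                merged[-1] = replace(left, text=left.text + item.text, x2=item.x2)
                continue
        merged.append(item)
    return merged


items = [
    TextItem("(123) ", 100, 124),
    TextItem("456-", 124, 140),
    TextItem("7890", 140.5, 156.5),
    TextItem("Email", 300, 320),
]
avg = average_char_width(items)  # 76 / 19 = 4.0
merged = merge_fragments(items, avg)
print([m.text for m in merged])  # ['(123) 456-7890', 'Email']
```

The three phone-number fragments sit less than one character width apart, so they collapse into one item. "Email" is far to the right and stays separate.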

Challenge 2: Lack of Context. Raw text items lack the contextual associations humans infer from visual cues. Our parser groups items into lines, mimicking human reading patterns, to establish these relationships.

The resulting lines are displayed below. Multiple text items within a line are separated by a vertical divider.

Lines | Line Content
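The source does not say exactly how line breaks are detected. A minimal sketch, assuming the parser splits the reading-order item stream at each item flagged as ending a line (the `hasEOL` flag pdf.js reports), could look like this. Grouping by shared y coordinate would be a reasonable alternative. The `group_into_lines` helper is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TextItem:
    text: str
    has_eol: bool = False  # True if the PDF marks this item as ending a line


def group_into_lines(items: list[TextItem]) -> list[list[TextItem]]:
    """Split the reading-order item stream into lines at end-of-line markers."""
    lines: list[list[TextItem]] = []
    current: list[TextItem] = []
    for item in items:
        if item.text.strip():          # drop empty/whitespace-only items
            current.append(item)
        if item.has_eol and current:   # close the line at an end-of-line marker
            lines.append(current)
            current = []
    if current:                        # flush a trailing line without a marker
        lines.append(current)
    return lines


items = [
    TextItem("JOHN DOE", has_eol=True),
    TextItem("john@example.com"),
    TextItem("(123) 456-7890", has_eol=True),
    TextItem("EDUCATION"),
    TextItem("", has_eol=True),
]
lines = group_into_lines(items)
for line in lines:
    # Items on the same line are shown with a vertical divider, as in the table.
    print(" | ".join(item.text for item in line))
```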

Step 3: Group Lines into Sections

Building on line grouping, this step organizes lines into sections to enhance contextual understanding. Most sections begin with a title, a common convention in resumes.

Section titles are identified using a primary heuristic requiring:
1. A single text item in the line
2. Bold formatting
3. All uppercase letters
A fallback heuristic uses keyword matching against common resume section titles if the primary criteria are not met.
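The two heuristics, plus the assignment of lines to sections, might be sketched as below. The keyword list and the three-word limit on fallback titles are illustrative assumptions, not the parser's actual values. The sketch also assumes that lines appearing before the first title form the profile section.

```python
from dataclasses import dataclass


@dataclass
class TextItem:
    text: str
    bold: bool = False


# Hypothetical, partial keyword list for the fallback heuristic.
SECTION_KEYWORDS = ["education", "experience", "project", "skill",
                    "summary", "objective", "award", "certification"]
MAX_FALLBACK_WORDS = 3  # assumption: section titles are short


def is_section_title(line: list[TextItem]) -> bool:
    text = " ".join(item.text for item in line).strip()
    # Primary heuristic: a single text item that is bold and all uppercase.
    if len(line) == 1 and line[0].bold and text.isupper():
        return True
    # Fallback heuristic: a short line containing a common section keyword.
    if len(text.split()) <= MAX_FALLBACK_WORDS:
        lowered = text.lower()
        return any(keyword in lowered for keyword in SECTION_KEYWORDS)
    return False


def group_into_sections(lines: list[list[TextItem]]) -> dict[str, list[list[TextItem]]]:
    """Attach each line to the most recent title; earlier lines form the profile."""
    sections: dict[str, list[list[TextItem]]] = {"profile": []}
    current = "profile"
    for line in lines:
        if is_section_title(line):
            current = " ".join(item.text for item in line).strip()
            sections.setdefault(current, [])
        else:
            sections[current].append(line)
    return sections


lines = [
    [TextItem("John Doe", bold=True)],
    [TextItem("john@example.com")],
    [TextItem("EDUCATION", bold=True)],
    [TextItem("State University")],
    [TextItem("Work Experience")],
    [TextItem("Software Engineer")],
]
sections = group_into_sections(lines)
print(list(sections))  # ['profile', 'EDUCATION', 'Work Experience']
```

"EDUCATION" passes the primary test. "Work Experience" is neither bold nor uppercase and is caught only by the keyword fallback. "John Doe" is bold but not uppercase, so it stays in the profile.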

The table below shows identified sections, with titles in bold and associated lines highlighted in matching colors.

Lines | Line Content

Step 4: Extract Resume Data from Sections

The final step extracts structured resume data using a feature-scoring system. Each resume attribute is evaluated against custom feature sets, which assign positive or negative scores based on matching criteria. The text item with the highest score is selected as the attribute value.
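In outline, feature scoring sums the scores of every feature that matches a candidate text and keeps the top-scoring candidate. The sketch below shows that general mechanism. The two email features are made-up placeholders, and resolving ties in favor of the earliest candidate is an assumption.

```python
from typing import Callable

# A feature is a predicate on the text plus the score added when it matches.
Feature = tuple[Callable[[str], bool], int]


def feature_score(text: str, features: list[Feature]) -> int:
    return sum(points for matches, points in features if matches(text))


def pick_highest(candidates: list[str], features: list[Feature]) -> tuple[str, dict[str, int]]:
    """Return the best candidate (first one wins ties) and every candidate's score."""
    scores = {text: feature_score(text, features) for text in candidates}
    best = max(candidates, key=lambda text: scores[text])
    return best, scores


# Hypothetical feature set for an email attribute.
email_features: list[Feature] = [
    (lambda t: "@" in t, 4),   # looks like an email
    (lambda t: " " in t, -2),  # emails contain no spaces
]
best, scores = pick_highest(
    ["John Doe", "john@example.com", "(123) 456-7890"], email_features
)
print(best, scores)
```

Returning all scores alongside the winner mirrors the "Feature Scores of Other Texts" column in the table below.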

Feature Scoring System

The table below illustrates three attributes extracted from the profile section of the provided resume, showing the highest-scoring text and scores for other candidates.

Resume Attribute | Text (Highest Feature Score) | Feature Scores of Other Texts
Name
Email
Phone

Feature Sets

Feature sets are crafted based on two principles:
1. Relative comparison to other attributes in the same section
2. Manual design reflecting attribute characteristics
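The table below details the feature sets for the name attribute. Positive scores reward characteristics typical of a name, and negative scores penalize characteristics that point to a different attribute, such as an email address, phone number, address, or URL.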

Name Feature Sets
Feature Function | Feature Matching Score
Contains only letters, spaces, or periods | +3
Is bolded | +2
Contains all uppercase letters | +2
Contains @ | -4 (email match)
Contains a number | -4 (phone match)
Contains , | -4 (address match)
Contains / | -4 (URL match)
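Applied to a sample profile section, the name feature table above could be encoded as follows. The scores come from the table. Reading "Contains all uppercase letters" as Python's `str.isupper()` is an interpretation, and the sample profile is invented for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class TextItem:
    text: str
    bold: bool = False


# Name feature set from the table: (predicate, score).
NAME_FEATURES = [
    (lambda item: re.fullmatch(r"[a-zA-Z\s\.]+", item.text) is not None, 3),  # letters, spaces, periods only
    (lambda item: item.bold, 2),                                              # is bolded
    (lambda item: item.text.isupper(), 2),                                    # all uppercase
    (lambda item: "@" in item.text, -4),                                      # email match
    (lambda item: any(c.isdigit() for c in item.text), -4),                   # phone match
    (lambda item: "," in item.text, -4),                                      # address match
    (lambda item: "/" in item.text, -4),                                      # URL match
]


def name_score(item: TextItem) -> int:
    return sum(points for matches, points in NAME_FEATURES if matches(item))


profile = [
    TextItem("JOHN DOE", bold=True),
    TextItem("john@example.com"),
    TextItem("(123) 456-7890"),
    TextItem("Austin, TX"),
    TextItem("github.com/jdoe"),
]
for item in profile:
    print(f"{name_score(item):>3}  {item.text}")
best = max(profile, key=name_score)
```

"JOHN DOE" collects +3, +2, and +2 for a total of 7. Every other line trips one of the -4 penalties, which is how the relative-comparison principle separates the name from its neighbors.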

Core Feature Functions

Each attribute relies on a core feature function for identification, as shown below.

Resume Attribute: Core Feature Function (Regex)
Name: Contains only letters, spaces, or periods. Regex: /^[a-zA-Z\s\.]+$/
Email: Matches the email format xxx@xxx.xxx, where xxx can be any non-space characters. Regex: /\S+@\S+\.\S+/
Phone: Matches the phone format (xxx)-xxx-xxxx, with optional parentheses and optional dash or space separators. Regex: /\(?\d{3}\)?[\s-]?\d{3}[\s-]?\d{4}/
Location: Matches the city and state format "City, ST". Regex: /[A-Z][a-zA-Z\s]+, [A-Z]{2}/
URL: Matches the URL format xxx.xxx/xxx. Regex: /\S+\.[a-z]+\/\S+/
School: Contains keywords such as College, University, or School.
Degree: Contains keywords such as Associate, Bachelor, or Master.
GPA: Matches the GPA format x.xx. Regex: /[0-4]\.\d{1,2}/
Date: Contains a year, month, season, or the keyword "Present". Year regex: /(?:19|20)\d{2}/
Job Title: Contains keywords such as Analyst, Engineer, or Intern.
Company: Is bolded, or does not match job title or date patterns.
Project: Is bolded, or does not match date patterns.
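The regexes above translate directly to Python's `re` module. The only change is that the `/.../` delimiters are dropped, so `/` needs no escaping. The keyword lists contain only the examples named above and are therefore partial. Requiring whole-word keyword matches is an assumption. `core_matches` is a hypothetical helper that reports every attribute whose core feature function fires.

```python
import re

# Core regexes, translated from the JavaScript literals above.
CORE_PATTERNS = {
    "name": re.compile(r"^[a-zA-Z\s\.]+$"),
    "email": re.compile(r"\S+@\S+\.\S+"),
    "phone": re.compile(r"\(?\d{3}\)?[\s-]?\d{3}[\s-]?\d{4}"),
    "location": re.compile(r"[A-Z][a-zA-Z\s]+, [A-Z]{2}"),
    "url": re.compile(r"\S+\.[a-z]+/\S+"),
    "gpa": re.compile(r"[0-4]\.\d{1,2}"),
    "year": re.compile(r"(?:19|20)\d{2}"),
}

# Partial keyword lists: only the examples named in the table.
CORE_KEYWORDS = {
    "school": ["College", "University", "School"],
    "degree": ["Associate", "Bachelor", "Master"],
    "job_title": ["Analyst", "Engineer", "Intern"],
}


def core_matches(text: str) -> list[str]:
    """Return every attribute whose core feature function matches the text."""
    hits = [attr for attr, pattern in CORE_PATTERNS.items() if pattern.search(text)]
    for attr, keywords in CORE_KEYWORDS.items():
        # Whole-word match, so "Intern" does not fire on "Internal".
        if any(re.search(rf"\b{re.escape(k)}\b", text) for k in keywords):
            hits.append(attr)
    return hits


for sample in ["John Doe", "john@example.com", "(123) 456-7890", "Austin, TX",
               "github.com/jdoe", "GPA: 3.85", "May 2021 - Present", "State University"]:
    print(f"{sample!r}: {core_matches(sample)}")
```

Some texts match more than one core function. For example, "State University" satisfies both the name regex and the school keywords. The feature-scoring step exists to resolve exactly this kind of overlap.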

Handling Subsections

For sections such as education or work experience, the parser also detects subsections. A new subsection starts where the vertical gap between lines exceeds 1.4 times the typical line gap, or at bolded text. Each subsection is then processed independently to extract attributes.
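A minimal sketch of subsection splitting follows, under three stated assumptions. First, the "typical" gap is taken as the median gap. Second, y decreases down the page because the origin is at the bottom-left. Third, bolded lines are used as subsection starts only when no unusually large gap is found. The `Line` class and `split_subsections` helper are hypothetical.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Line:
    text: str
    y: float            # baseline; y decreases down the page (origin bottom-left)
    bold: bool = False


def split_subsections(lines: list[Line], gap_ratio: float = 1.4) -> list[list[Line]]:
    if len(lines) < 2:
        return [lines] if lines else []
    gaps = [prev.y - cur.y for prev, cur in zip(lines, lines[1:])]
    threshold = gap_ratio * median(gaps)  # assumption: median = "typical" gap
    starts = [False] + [gap > threshold for gap in gaps]
    if not any(starts):
        # Fallback: no unusually large gap, so start a subsection at each bolded line.
        starts = [False] + [line.bold for line in lines[1:]]
    subsections: list[list[Line]] = []
    for line, start in zip(lines, starts):
        if start or not subsections:
            subsections.append([line])
        else:
            subsections[-1].append(line)
    return subsections


education = [
    Line("State University", 700), Line("B.S. Computer Science", 686),
    Line("GPA: 3.85", 672), Line("City College", 650),
    Line("A.S. Mathematics", 636), Line("2016 - 2018", 622),
]
for sub in split_subsections(education):
    print([line.text for line in sub])
```

The 22-unit gap before "City College" exceeds 1.4 × 14, so the education section splits into two schools.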

Authored by Farouk Jjingo, January 2025