Lean Hiring Using Machine Learning – Webinar PPT

Lean Hiring Aided by Machine Learning
December 18, 2014
Presented by Vinayak Joglekar, Co-Founder and CTO, Synerzip
Discussion Topics
1. The Problem
2. Lean Hiring
3. Resume Ranking - Choice of Algorithm
4. Data Acquisition and Cleaning Challenges
5. Initial Results
6. Improving Accuracy
7. The Road Ahead
Confidential
Needle in the Haystack Situation
1. Waste in the process of hiring = resumes reviewed but not selected + candidates interviewed but not selected.
2. Every candidate “hypes” the resume to a certain extent. Much time is wasted reading pages of hyperbole to discover the grain of truth.
3. Hiring managers have other priorities. Their precious time is wasted in interviewing unsuitable candidates whose resumes look good.
More Choice Isn’t Always Better
[Figure: across Week 1, Week 2 and Week 3 the hiring manager keeps asking for more ("Can I see some more resumes?", "Send some more", "Some more please", "Is this all you have?"). Feedback arrives only after 4 weeks.]
Myth: More choice = better selection
Reality: More choice = waste + delay + confusion
Make Them Jump through Hoops
1. A common response to the problem is a strict filtration process consisting of a series of tests and interviews.
2. Good candidates are often not actively looking for a change. They get turned off by the long evaluation process.
3. The evaluation process is often flawed, with more stress on specific skills than on abilities. Tests are susceptible to gaming.
Kanban
• Extensively used in the automobile industry.
• Principle: Any process consisting of a workflow can’t run faster than its bottleneck.
• All sub-processes that run faster than the bottleneck produce waste.
• Kanban ensures all sub-processes march to the drum-beat set by the bottleneck.
• Kanban is pull-based. A sub-process can’t pull more work than a pre-set WIP (work in progress) limit.
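The pull-based rule can be illustrated with a minimal Python sketch. The class name `KanbanStage`, the resume identifiers and the limit of 3 are all hypothetical and chosen only for illustration; the slides do not say what limit was used.

```python
from collections import deque


class KanbanStage:
    """One stage of a pull-based workflow with a fixed WIP limit (illustrative)."""

    def __init__(self, name, wip_limit):
        self.name = name
        self.wip_limit = wip_limit
        self.in_progress = []

    def can_pull(self):
        # A stage may pull only while it is below its WIP limit.
        return len(self.in_progress) < self.wip_limit

    def pull_from(self, backlog):
        """Pull items from the backlog until the WIP limit is reached."""
        pulled = []
        while backlog and self.can_pull():
            item = backlog.popleft()
            self.in_progress.append(item)
            pulled.append(item)
        return pulled

    def finish(self, item):
        # Completing an item frees a slot, which allows the next pull.
        self.in_progress.remove(item)


# Hypothetical data: seven ranked resumes and an interview stage limited to 3.
backlog = deque(f"resume-{i}" for i in range(1, 8))
interviews = KanbanStage("interview", wip_limit=3)

first_batch = interviews.pull_from(backlog)   # fills the stage up to the limit
interviews.finish("resume-1")                 # one slot frees up
second_batch = interviews.pull_from(backlog)  # exactly one more item is pulled
```

Nothing enters the stage until a slot is free, so upstream work cannot pile up faster than the bottleneck consumes it.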
Solution: Let’s Limit the Choice!
Challenge: Ranking Resumes
It took half a day for an experienced recruiter to rank a few resumes. He kept asking what was more important: pay, soft skills or experience? The answer is that it depends on the job. Each job needs a different weightage assigned to each of these attributes. Thus ranking reduces to assigning appropriate weightages to the attributes.
[Figure: weightages per job opening (Job 1 to Job 4) for the attributes Expected Pay, Experience, Education, Job Switches, Soft Skills, Location and Lead Time.]
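As a rough sketch of ranking by per-job weightages, the Python below scores each resume as a weighted sum of its attributes and sorts by that score. All attribute values and weightages here are made up for illustration, and the attributes are assumed to be already quantified on a 0 to 1 scale.

```python
ATTRIBUTES = ["expected_pay", "experience", "education", "job_switches",
              "soft_skills", "location", "lead_time"]


def score(candidate, weights):
    """Weighted sum of the quantified attributes for one candidate."""
    return sum(weights[a] * candidate[a] for a in ATTRIBUTES)


def rank(candidates, weights):
    """Return candidates ordered from highest to lowest score."""
    return sorted(candidates, key=lambda c: score(c, weights), reverse=True)


# Hypothetical candidates with attributes scaled to the range 0..1.
candidates = [
    {"name": "A", "expected_pay": 0.9, "experience": 0.8, "education": 0.5,
     "job_switches": 0.2, "soft_skills": 0.4, "location": 1.0, "lead_time": 0.3},
    {"name": "B", "expected_pay": 0.4, "experience": 0.3, "education": 0.7,
     "job_switches": 0.1, "soft_skills": 0.9, "location": 1.0, "lead_time": 0.1},
]

# Hypothetical weightages: one job values experience, the other is pay-sensitive.
senior_job = {"expected_pay": -0.5, "experience": 2.0, "education": 0.5,
              "job_switches": -1.0, "soft_skills": 0.5, "location": 0.5,
              "lead_time": -0.5}
junior_job = {"expected_pay": -2.0, "experience": 0.5, "education": 0.5,
              "job_switches": -1.0, "soft_skills": 1.5, "location": 0.5,
              "lead_time": -0.5}

senior_order = [c["name"] for c in rank(candidates, senior_job)]
junior_order = [c["name"] for c in rank(candidates, junior_job)]
```

The same two resumes come out in opposite order for the two jobs, which is the point of the slide: the ranking depends entirely on the weightages chosen for each job.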
Why Logistic Regression?
• Training a machine learning algorithm for resume ranking would require a significant number of resumes that are pre-ranked by a human expert.
• It is very difficult for a human expert to rank resumes.
• On the other hand, we have a lot of resumes that are classified as suitable or unsuitable, which can be used to train the logistic regression algorithm.
The graph here shows how the probability of a resume being suitable depends on the attributes (X) and the weights assigned to them (θ):
$h_\theta(X) = \dfrac{1}{1 + e^{-Z}}$, where $Z = \theta_1 X_1 + \theta_2 X_2 + \dots + \theta_n X_n$,
$X_1, X_2, \dots, X_n$ are the various attributes and $\theta_1, \theta_2, \dots, \theta_n$ are the weights assigned to them.
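The hypothesis above can be written as a few lines of Python. This is a minimal sketch: the function names and the example weights are assumptions made for illustration, not values from the presentation.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^-z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def hypothesis(theta, x):
    """Probability that a resume is suitable, given weights theta and attributes x."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)


# Hypothetical weights and attribute values.
p_neutral = hypothesis([0.0, 0.0], [1.0, 2.0])   # Z = 0
p_likely = hypothesis([1.0, -1.0], [5.0, 2.0])   # Z = 3
```

When Z is 0 the probability is exactly 0.5, and it moves toward 1 as Z grows and toward 0 as Z becomes more negative.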
What is Decision Boundary?
• x1 = experience and x2 = pay
• 1 = suitable and 0 = unsuitable
• Observation 1: More pay reduces the probability of selection and more experience increases it.
• The decision boundary is an imaginary line that separates suitable and unsuitable examples.
• In this case the line is $x_1 - x_2 - 3 = 0$.
• Points below the line are likely to be suitable.
• Points above the line are likely to be unsuitable.
• Points along the line have equal probability of being suitable or unsuitable, hence $h_\theta(X) = 0.5$ and $\theta^T X = 0$ along this line.
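A small Python sketch makes the boundary concrete. The line x1 - x2 - 3 = 0 corresponds to θ = (-3, 1, -1) if an intercept term is included alongside experience and pay; the sample points and their units are hypothetical.

```python
import math

# Weights implied by the line x1 - x2 - 3 = 0: intercept, experience, pay.
THETA = (-3.0, 1.0, -1.0)


def z_value(experience, pay):
    """theta^T X with an intercept term; zero exactly on the decision boundary."""
    return THETA[0] + THETA[1] * experience + THETA[2] * pay


def probability_suitable(experience, pay):
    return 1.0 / (1.0 + math.exp(-z_value(experience, pay)))


def classify(experience, pay):
    """1 = suitable, 0 = unsuitable, using 0.5 as the threshold."""
    return 1 if probability_suitable(experience, pay) >= 0.5 else 0


on_line = probability_suitable(5, 2)   # 5 - 2 - 3 = 0, on the boundary
below_line = classify(8, 2)            # more experience, same pay
above_line = classify(4, 6)            # less experience, more pay
```

With pay on the vertical axis, a point "below the line" has x1 - x2 - 3 > 0, so its probability is above 0.5 and it is classified as suitable.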
Training, Test & Validation Sets
[Diagram: the available data is split into a training set (60%), a validation set (20%) and a test set (20%). The training set is the input to the supervised learning algorithm, the validation set is used to check for over/under fitting, and the test set is used to test the output.]
Testing measures the accuracy with which true positives and true negatives are predicted by the algorithm. In tests like cancer detection, false negatives can prove to be fatal.
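The 60/20/20 split and the counting of true and false predictions can be sketched as follows. This is an illustrative Python outline with made-up labels; it assumes the records have already been shuffled, which the slides do not state.

```python
def split_60_20_20(records):
    """Split records into training (60%), validation (20%) and test (20%) sets."""
    n = len(records)
    train_end = n * 6 // 10
    val_end = n * 8 // 10
    return records[:train_end], records[train_end:val_end], records[val_end:]


def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for 0/1 labels."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["tp"] += 1
        elif a == 0 and p == 0:
            counts["tn"] += 1
        elif a == 0 and p == 1:
            counts["fp"] += 1
        else:
            counts["fn"] += 1
    return counts


def accuracy(counts):
    """Share of examples predicted correctly."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total


train, validation, test = split_60_20_20(list(range(10)))

# Hypothetical labels: 1 = suitable, 0 = unsuitable.
counts = confusion_counts(actual=[1, 0, 1, 1, 0], predicted=[1, 0, 0, 1, 1])
```

Keeping false negatives as a separate count matters because, as in the cancer example, accuracy alone hides which kind of mistake the model makes.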
Implementation Challenges
[Figure: scatter plot of suitable (1) and unsuitable (0) resumes plotted against relevant experience, with the decision boundary marked.]
• It is very difficult to manually rank or grade resumes, so we can’t use standard ranking algorithms.
• Small training sets, even smaller test sets.
• There are more than 13 attributes (experience, education, pay, location, availability, stability, current job, etc.) based on which a candidate is selected or rejected. Many of these attributes are subjective and need to be quantified.
• There is no clear decision boundary.
• We addressed these challenges by using data from our ATS about 20 job openings, for which more than 3000 resumes were considered and more than 400 candidates were found suitable to be called for interview.
• We used a sixth degree polynomial, which lends itself well to rendering a decision boundary with an irregular shape.
• We quantified the subjective attributes like education, stability etc.
• We used every 4th record to test and the others to train the algorithm.
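The two techniques named above, a sixth degree polynomial and an every-4th-record test split, might look like this in Python. This is a sketch under assumptions: the function names are hypothetical, the polynomial is built from two attributes only, and "every 4th record" is taken to mean records 4, 8, 12 and so on.

```python
def polynomial_features(x1, x2, degree=6):
    """All terms x1^a * x2^b with a + b <= degree, including the constant term."""
    features = []
    for total in range(degree + 1):
        for b in range(total + 1):
            features.append((x1 ** (total - b)) * (x2 ** b))
    return features


def every_fourth_split(records):
    """Hold out records 4, 8, 12, ... for testing; train on the rest."""
    test = records[3::4]
    train = [r for i, r in enumerate(records) if (i + 1) % 4 != 0]
    return train, test


# Hypothetical attribute values for one resume.
features = polynomial_features(2.0, 3.0)

# Hypothetical record identifiers 1..12.
train, test = every_fourth_split(list(range(1, 13)))
```

Two attributes expand to 28 features at degree 6. Powers this high grow quickly, so the attributes would normally be scaled to a small range before expansion.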
Results and Analysis
• When we used the weightages delivered by the algorithm to predict, we could correctly predict 89% of the examples in the training set that was used to train the algorithm.
• The same algorithm could correctly predict 65% of the test examples.
• We improved the accuracy by 9% when we used the sixth degree polynomial.
• We used the weightages to assign ranks, and the ranking was well accepted and appreciated by hiring managers within Synerzip.
• We started practicing Kanban and lean hiring, as ranking enabled us to put a WIP limit on the number of resumes entering the hiring process.
• The hiring efficiency improved and we were able to fill more positions without adding any new recruiters.
Example Weightages - Analysis
It turns out to be a fairly distributed set of values for weightages for the various attributes. Each job opening uses an independent assessment of resumes.
This position assigns positive weightage to total experience but negative weightage to relevant experience. The requirement was for a broader skillset beyond just C++.
This job opening gives extremely negative weightage to “current compensation”. This means that candidates earning well are not suitable, while it’s just the opposite case for most other job openings.
Comparing Results
Metric                            Before                             After
Hiring Cycle Time                 6 to 8 weeks                       3 to 4 weeks
Hit Rate                          Less than 10%                      Almost 50%
Hiring Manager’s Time per hire    15 to 25 hours per position        Less than 10 hours per position
Recruiter’s Time per hire         Close to 100 hours per position    Less than 40 hours per position
Hiring Mistakes                   Low confidence                     High confidence
Improving Accuracy
• We tried using regularization to avoid over-fitting. However, it did not yield any improvement in accuracy. As the accuracy is 89% even when predicting the training set itself, it can be intuitively concluded that over-fitting doesn’t need to be addressed.
• We need to get more training data to improve accuracy. Also, the training data should pertain to a period over which the job requirements are constant. It’s very hard to find job openings where more than a hundred candidates are screened. As we plan to implement lean hiring and Kanban, the chances of having large training sets are very low.
• We tried to see if principal component analysis could be used to reduce the number of attributes to 2 or 3 so that the data could be plotted. We could not get the “retained variance” anywhere close to 99%. (In fact, it was close to 50%.)
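"Retained variance" in principal component analysis is the share of total variance captured by the top k components. The Python sketch below assumes the eigenvalues of the covariance matrix have already been computed elsewhere; the eigenvalues shown are invented for illustration and are not the presenters' data.

```python
def retained_variance(eigenvalues, k):
    """Fraction of total variance kept by the k largest principal components."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


def components_needed(eigenvalues, target):
    """Smallest k whose retained variance reaches the target fraction."""
    for k in range(1, len(eigenvalues) + 1):
        if retained_variance(eigenvalues, k) >= target:
            return k
    return len(eigenvalues)


# Hypothetical eigenvalues of a covariance matrix with four attributes.
eigenvalues = [4.0, 3.0, 2.0, 1.0]

kept_by_two = retained_variance(eigenvalues, 2)
needed_for_99 = components_needed(eigenvalues, 0.99)
```

When the variance is spread fairly evenly across attributes, as in this made-up example, two or three components cannot reach a 99% target, which matches the kind of result described above.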
Future Roadmap
• Create a search engine app with on-the-fly ranking that limits the search results to the number that fits within a smartphone screen with no need to scroll.
• Try using the sixth degree polynomial with more attributes. Currently we are using it only on expected compensation and relevant experience. This will most likely improve the accuracy.
• Use NLP for information extraction and more precise attribute values.
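Limiting ranked results to what fits on one screen amounts to a top-k selection. A minimal Python sketch using the standard library's `heapq.nlargest` follows; the candidate names, scores and the screen limit of 2 are hypothetical.

```python
import heapq


def top_results(candidates, score_fn, limit):
    """Return the `limit` highest-scoring candidates, best first."""
    return heapq.nlargest(limit, candidates, key=score_fn)


# Hypothetical (name, predicted probability of being suitable) pairs.
candidates = [("a", 0.2), ("b", 0.9), ("c", 0.5), ("d", 0.7)]

# Hypothetical screen capacity of two results.
visible = top_results(candidates, score_fn=lambda c: c[1], limit=2)
```

Selecting only the top few avoids sorting the full result set, and the fixed limit acts as a WIP limit on what the hiring manager sees.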
www.synerzip.com
Hemant Elhence
hemant@synerzip.com
469.374.0500
Synerzip in a Nutshell
• Software product development partner for small/mid-sized technology companies
– Exclusive focus on small/mid-sized technology companies, typically venture-backed companies in growth phase
– By definition, all Synerzip work is the IP of its respective clients
– Deep experience in full SDLC – design, dev, QA/testing, deployment
• Dedicated team of high caliber software professionals for each client
– Seamlessly extends client’s local team, offering full transparency
– Stable teams with very low turn-over
– NOT just “staff augmentation”, but provide full mgmt support
• Actually reduces risk of development/delivery
– Experienced team - uses appropriate level of engineering discipline
– Practices Agile development – responsive, yet disciplined
• Reduces cost – dual-shore team, 50% cost advantage
• Offers long term flexibility – allows (facilitates) taking offshore team captive – aka “BOT” option
Our Clients
Next Webinar
Agile Leadership: Want to change your results? Change how you lead.
Complimentary Webinar: Wednesday, January 21, 2015 @ noon CST
Presented by: Niel Nickolaisen, Chief Technology Officer at OC Tanner. He also co-authored “Stand Back and Deliver: Accelerating Business Agility”, which gives you the agile leadership tools you’ll need to achieve breakthrough levels of performance.
Thanks!
Call Us for a Free Consultation!
Hemant Elhence
hemant@synerzip.com
469.374.0500
linkedin.com/company/synerzip
@Synerzip_Agile
facebook.com/Synerzip