Future Generation Computer Systems 111 (2020) 654–667
Human capital evaluation in knowledge-based organizations based on
big data analytics
Sergiu Stefan Nicolaescu a, Adrian Florea b,∗, Claudiu Vasile Kifor a, Ugo Fiore c, Nicolae Cocan d, Ilie Receu e, Paolo Zanetti c

a Department of Industrial Engineering and Management, Lucian Blaga University of Sibiu, Romania
b Department of Computer Science and Electrical Engineering, Lucian Blaga University of Sibiu, Romania
c Department of Management Studies and Quantitative Methods, Parthenope University of Naples, Italy
d UiPath S.R.L., Cluj Napoca, Romania
e SOBIS Solutions S.R.L., Sibiu, Romania
Article info
Article history:
Received 6 March 2019
Received in revised form 18 June 2019
Accepted 27 September 2019
Available online 3 October 2019
Keywords:
Score
Human resource (HR)
Human capital (HC)
Analytics
Big data
Abstract
Starting from a Human Capital Analysis Model, this work introduces an original methodology for evaluating the performance of employees. The proposed architecture, particularly well suited to the special needs of knowledge-based organizations, is articulated into a framework able to manage cases where data is missing and an adaptive scoring algorithm that takes into account seniority, performance, and performance evolution trends, allowing employee evaluation over longer periods. We developed a flexible software tool that gathers data from organizations automatically, through adapted connectors, and generates abundant results on the measurement and distribution of employees' performance. The main challenges of human resource departments (quantification of human resource performance, analysis of the distribution of performance, and early identification of employees at risk of leaving the workforce) are handled through the proposed IT platform. Insights are presented at different granularity levels, from the organization view down to department, group, and team.
© 2019 Elsevier B.V. All rights reserved.
1. Introduction
People at all levels of management, coordinating small or large teams, are aware of the great effort involved in gathering information on employees' performance, surveying their opinions on job satisfaction, and so forth.
The question arises: How much of the information gathered is really
used, and is the existing knowledge or insight level the maximum
that can be obtained?
The most important thing for managers to focus on is the
value of each employee as an individual. HR (human resources)
professionals identify the following insights as high value: evaluation of employee performance, training, and development [1];
prediction of turnover; and planning of succession [2]. Low performance of employees, recruitment and replacement of talent,
and loss of freshly trained employees or the most valuable and
senior employees generate high costs for a company, decreasing
its operational efficiency. Generally, organizations know essential
information about their employees, such as their salary progress,
∗ Corresponding author.
E-mail address: adrian.florea@ulbsibiu.ro (A. Florea).
https://doi.org/10.1016/j.future.2019.09.048
0167-739X/© 2019 Elsevier B.V. All rights reserved.
completed trainings, project experience, main expertise, performance level, and goal achievement, all of which are captured in
one form or another.
These data fit no general model well, are not correlated, and change frequently, especially in knowledge-based organizations; manufacturing-based organizations, by contrast, can measure work norms easily and clearly (e.g., 8 person-hours are needed for 20 units of product). Therefore, it is hard to find a response to key questions such as: How can we evaluate the performance of employees in
a knowledge-based organization? or Why did the employee leave the
organization? These responses are critical for human resource departments in helping them forecast performance drops, increases,
departures, trends, or problems that might be encountered.
This work proposes a model that quantifies and appropriately
exploits employees' information by applying data analytics that generate insights and by providing a new method for evaluating
human capital inside a knowledge-based organization. The approach relies heavily on the use of big data, increasing its accuracy
through the quantity of relevant data collected.
The proposed algorithm evaluates human capital (HC) by
quantifying the performance of a company’s employees, accounting for both their seniority and professional development, and
identifies internal factors that may cause them to leave the workplace; all of this is integrated with already existing information
and communications technology (ICT) for industrial companies.
The algorithm was validated with real data from a multinational
organization; the data was anonymized and the results were
reviewed and evaluated by the management.
The rest of the paper is organized into six sections. Section 2
briefly reviews state-of-the-art papers related to this study. Section 3 describes the proposed approach for measuring human
capital using data analytics, while Section 4 presents an analysis
of the distribution of competences in different organizational
units. Section 5 goes deeper in the technical area and describes
the methodology for computing the Human Resource score. Section 6 analyzes the experimental results, providing some interpretations or possible guidelines for HR managers. Finally,
Section 7 highlights the paper’s conclusions and suggests future
research directions.
2. Related work
HR analytics is a method used to improve individual and
organizational performance by improving the quality of decision-making [3]. The approach is relatively new, and its use has seen
a noticeable rise in popularity recently [4]. However, researchers
have found that the actual application of analytics by companies
remains at the initial stages [5], and a wide range of research is
needed on this topic. The following achievements can be observed
in successful analytics approaches: gathering and interpreting
complex data, drawing connections between data and larger business strategies, and using multiple models to generate reliable
predictions [2].
This section briefly presents a few existing studies, connected
with our work and focused on measuring the human capital
within organizations. Iwamoto et al. [6] have provided a quantitative approach to human capital management targeting especially
financial performance indices of employees. In contrast with our
work, their objective was not to evaluate human capital but to
construct a tool to evaluate (and explain how to evaluate) human
capital management. Another difference lies in the manner the
score is computed. Ours is based on real data about company
employees, unlike their statistical approach, which used different
data reports provided by a private Japanese company specializing
in economic and business analysis and management.
Abdullah et al. [7] have proposed an analytical hierarchical
approach to ranking indicators of human capital in Malaysia
using mathematics and psychology in a comparative evaluation
model. It ranks four main indicators (creating results by using
knowledge, employees’ skill index, sharing and reporting knowledge, and the succession rate of training programs) respecting
five main criteria of human capital (talent, strategic integration, cultural relevance, knowledge management, and leadership).
Creating results by using knowledge proved to be the most important indicator of HC management in Malaysia, whereas the
employee skill index had the lowest importance. These results
correspond to the trend of employers looking not only for skilled
people but also for people who are able to adapt, eager to learn,
and have soft skills. Somewhat similarly, our approach collects
and—most importantly—quantifies and stores in a database the
employee records with reference to four important classes of key
performance indicators (KPIs): technical skills, soft skills and motivation, achievements, and involvement. However, unlike this
study, Abdullah et al. [7] did their analysis after decision-makers
were asked to set up a comparison relationship between pairs of
indicators, emphasizing how important or less important some
indicators were relative to each other.
In [8], the author generated worker skills and job skills networks for measuring human capital, proposing that the results
could be used to better correlate employees’ wages with their
skills and to determine the degree to which workers' skills
matched employers’ job tasks. The networks’ vertices are skills,
and two skills are connected by an edge if a worker has both or
if both are required for the same job. Test data were collected
from an online freelancing website. The analysis aimed to distinguish diverse skills from more specialized ones, and individual skills from skill combinations. A connection between [8] and our work is the
fact that skills are usually characterized using years of training as
a measure; as we show in the score computation methodology,
we have included in the scoring algorithm KPIs on certifications
and trainings achieved, as well as on employee seniority.
In Chen and Chen [9], the authors applied a Chi-square automatic interaction detector (CHAID) data-mining algorithm, based
on decision trees and association rules, to employees’ characteristics and work performance, including their opportunity to
leave the workforce, in order to generate useful rules for ‘‘head
hunting’’. They predicted employees’ performance and retention
based on profile features (age, gender, education, experience,
recruitment source, etc.) that can be obtained at the selection
stage. The solution was tested using empirical data on engineers
with different job functions at a semiconductor factory located
in Taiwan. Two similarities exist between the work of Chen and
Chen [9] and our approach. First, the solutions are applied to
a high-tech industrial company, and second, the analyses have
the same target, namely, determining employee performance and
retention risk; however, they used a different methodology and
calculation tools.
An in-depth study on human capital theory, measures, and
metrics has been presented by Charlwood et al. [10]. The authors
examined current HR analytics practices, covering a large number of different HC metrics. Some of these metrics are considered
in our analysis as KPIs (i.e., employee engagement, technical skills,
etc.).
In Li and Zhang [11], the authors used a neural network to
compute the index value of individual human capital in high-tech
enterprises. The evaluation index system included a two-level
hierarchy of indicators. The first layer refers to the existing value
of HC, the potential value of HC, the ongoing cost of HC, and the
opportunity cost of HC. The second layer expands each first-layer indicator into sub-indicators, 16 in total, such as education, age,
gender, professional knowledge, and skills, which are in some
ways similar to our KPIs.
3. Proposed approach for measuring HC assets
The ability to extract insights from data and use them in
decision-making has become increasingly important in recent
years, and its main applicability is in the field of human resource
development. This section presents a method that exploits
this ability and quantifies employee performance.
3.1. The business intelligence process
The model developed to measure human capital in an organization is a decision-making tool dedicated to human resources
departments to visualize and enhance employee performance and
to increase employee retention rates. Based on raw data and KPIs
specified by the HR department and saved in a centralized non-relational database for each employee, the algorithm calculates a
score for each employee. The performance of each structure in the
organization is determined based on the average performance of
its component members.
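The roll-up of individual scores into structure-level averages described above can be sketched as follows. This is a minimal illustration only; the employee identifiers, scores, and the `rollup` helper are hypothetical and not part of the authors' tool.

```python
# Illustrative sketch: given per-employee scores, roll them up into team,
# department, and organization averages, mirroring the rule that each
# structure's performance is the mean of its members' scores.
# All names and values below are hypothetical.
from collections import defaultdict
from statistics import mean

# (department, team, employee) -> score
scores = {
    ("R&D", "team-a", "emp-01"): 1.8,
    ("R&D", "team-a", "emp-02"): 2.4,
    ("R&D", "team-b", "emp-03"): 0.9,
    ("QA",  "team-c", "emp-04"): 1.2,
}

def rollup(scores):
    """Aggregate employee scores upward through the hierarchy."""
    teams, departments = defaultdict(list), defaultdict(list)
    for (dept, team, _emp), s in scores.items():
        teams[(dept, team)].append(s)
        departments[dept].append(s)
    team_avg = {k: mean(v) for k, v in teams.items()}
    dept_avg = {k: mean(v) for k, v in departments.items()}
    org_avg = mean(scores.values())
    return team_avg, dept_avg, org_avg
```

The same averaging rule applies at every granularity level, so one pass over the score table yields the organization view down to the team view.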
It is worth noting that, in the literature, KRIs (Key Results
Indicators) are often used instead of KPIs [12]. The difference
is that KRIs measure the results from business actions, while
KPIs measure the actions and events that lead to a result, so
the former are critical in measuring progress and the latter are
crucial in creating and evaluating strategies. In the present work,
KPIs have been used because the knowledge-based nature of
the organization emphasizes the importance of flexibility in the
business processes.
The present tool is extremely useful for HR managers because,
by processing and interpreting the data extracted, they can obtain
a fairly accurate understanding of employees’ performance indicators, attrition risk, and patterns of leaving the workforce. Using
this tool, HR managers can identify the causes of attrition and
find solutions to increase retention rates. This type of analysis is
called ‘‘business intelligence’’ and it is becoming a key factor in
planning strategies for increasing economic efficiency. With this
pattern recognition ability, a management team can confidently
make decisions that in the past would have been considered risky
and dependent on a manager’s skills on the subject.
In Fig. 1, a flow process is presented that is used to create
business intelligence inside massive organizations. The raw data
assets of an organization are correlated with employee KPIs –
selected jointly by HR and management – and used as input
for the algorithms. On the first layer, an algorithm is used to
calculate a score for each person, using the data to quantify
aggregate scores for teams, groups, departments, and the whole
organization. On the second layer, machine-learning algorithms
can be used for prediction when sufficient historical data have
been collected, providing output that represents the intelligence
that is offered as feedback to the organization’s management.
3.2. HCDA Model
The Human Capital Data Analytics (HCDA) model collects data from current and previous employees with the purpose of enhancing retention. It examines internal factors (from inside the organization) and external opportunities that provoked attrition in employees and led them to leave the company. Regarding the internal factors, HCDA starts with the collection and processing of data created during the exit interview process. It identifies the strong and weak points of the organization and of specific jobs, from the employees' point of view. Also, the employees' history inside the organization, captured through KPIs, is stored in a database in order to compute the Human Resource score for each employee.
Descriptive analytics algorithms are used to gather insights from
the correlation of data received from all employees that left the
organization. Thus, patterns that apply to a large number of employees will be identified. Patterns can have a positive connotation, on which the company should focus, like maintenance and promotion; on the other hand, negative aspects can be identified, like little investment in the technical training and development of employees, repetitive or uninteresting tasks, etc. External opportunities are continuously monitored with an automated crawler (a self-developed software module) looking at well-known job websites and social media. In the ''predictive analytics'' stage, machine learning algorithms like partition trees, gradient boosting, and deep learning networks are run on the employees' historical data stored in the database. The algorithms first receive as input the employees who have already left the workforce, in order to learn the pattern, and are afterwards used to classify employees into those who are at risk of leaving the organization and those who are not.
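The train-on-leavers, classify-current workflow can be illustrated with a deliberately simple stand-in. The paper names partition trees, gradient boosting, and deep learning networks; the nearest-neighbour rule below is not the authors' method, only a few-line sketch of the same idea, and every feature vector in it is hypothetical.

```python
# Minimal stand-in for the "predictive analytics" stage: learn from
# employees who already left, then flag current employees whose feature
# vectors resemble theirs. A nearest-neighbour rule replaces the paper's
# tree/boosting/deep-learning models purely for illustration.
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def attrition_risk(candidate, leavers, stayers):
    """Classify a current employee as at-risk if the closest historical
    record belongs to someone who left the organization."""
    d_left = min(euclidean(candidate, v) for v in leavers)
    d_stay = min(euclidean(candidate, v) for v in stayers)
    return d_left < d_stay

# Toy historical data: (seniority, yearly evaluation, score)
leavers = [(2.0, 0.5, 1.0), (3.0, 0.6, 1.2)]
stayers = [(8.0, 2.5, 4.0), (6.0, 1.8, 3.1)]
```

For example, a junior employee with a low evaluation, e.g. `(2.5, 0.55, 1.1)`, lands closest to the leaver records and would be flagged for HR attention.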
The HCDA model suggests some trends in employees' career paths: whether they exceed expectations or just meet job requirements, whether their skill level is below the standards in the job description, or how well they fit within the organizational unit they belong to, which in turn may be an indicator of their willingness to leave the organization. The results, analyzed individually
within a department, can determine the degree of compatibility
of an employee with the position, team, and department in which
he or she operates. The model can spot high fluctuations in performance from one year to another. It can also identify whether
someone’s performance or position stalls for a long period. Also,
by having information about employees who have left an organization and exhibited certain patterns (i.e., they have similar
scores as current employees of the same age), organizations can
prevent current employees from leaving by analyzing the reasons
for their dissatisfaction.
The proposed model receives as input conventional and non-conventional data on an employee and will quantify his or her
value inside an organization. The evaluation is made based on
the employee’s work output and personal performance in the
present and past. Employee performance is analyzed under four
categories: technical skills, soft skills and motivation, employee
achievements (inside and outside the organization), and employee dedication (defined as the extra mile the employee is
willing to go to achieve business goals). Data used to extract the
employee’s ranking or performance level in these characteristics
are divided into two levels: conventional and non-conventional.
Data processing begins with conventional employee data (e.g., an
assessment received from a direct superior, the level of achieved
goals) followed by non-conventional data (data that do not normally directly expose the performance level, but which, when
correlated with other data, can provide some useful insights).
HR management provides a very large amount of data that can be processed and contains valuable information; these data exhibit the characteristic big-data ''V'' properties: variety (the data comes in various forms), volume (a high volume is generated by a multinational company, including structured and mostly unstructured data), and velocity (the data is generated at high speed, as the number of employees' tasks changes continuously, and all of it contains useful information).
The model represented in Fig. 3 (an abstract view of HCDA)
is implemented from the inner block containing conventional
data analytics, and it continues with the outer blocks until the
predictive analytics are implemented. The deployment of the
HCDA model, applied on an organization level, is done in four
phases:
1. Identify the employee characteristics that are most relevant for the organization, at least one for each group:
technical skills, soft skills and motivation, achievements,
and going the extra mile.
2. Gather all conventional and non-conventional data that
already exist in the organization. The tool and instruments
that already contain data are analyzed, and it is decided
how the data can be imported. If there is a lack of data
in some areas, the organization’s strategic management
needs to add process activities that will produce data (for
example during an evaluation, the motivation of employees
should be assessed).
3. Apply descriptive analytics to gain insights into the state of
previous and current human capital.
4. Apply predictive analytics to measure what might happen
in the future and to observe the organization’s evolution or
to identify issues in case the organization makes strategic
changes.
Through deployment of the model, an organization can measure the potential of employees based on various metrics, and
it can compare and balance the team's work among different permutations of human resources, identifying the arrangement most appropriate for a given activity. Another advantage is the possibility of quantifying the value gained by the organization from each
human resource, obtaining a total value at a certain moment of
Fig. 1. Business Intelligence Process.
Fig. 2. The Human Capital Data Analytics (HCDA) model.
time. Management can use this data to adopt a strategy based on
the best development direction.
Currently, most organizations measure only the number of employees, which can yield misleading information. According to [13], statistics reveal that only a small percentage of companies (8%) report having usable data for HR predictive analytics, and the IBM Institute1 shows that over 40 percent of organizations are limited to basic HR reporting capabilities.
1 ftp://ftp.software.ibm.com/software/in/IAF2015/Unlock_the_people_
equation_Infographic_PDF.pdf.
Similarly, Bright & Company2 statistics revealed that, on average, 45% of all companies say they engage in basic human resource reporting, while 55% say they are using advanced metrics, integrated dashboards, and customized reports. An increase in the number of employees does not necessarily represent an increase in the value of human resources; a strong performer who brings increased value to an organization cannot be replaced by ten new employees lacking experience, especially at knowledge-based organizations, where innovation is essential. By
2 http://www.brightcompany.nl/cache/2456_2456/2456.pdf.
Fig. 3. The KPI classes used in our retention model.
correlating the calculated score for the team with employee compatibility data, algorithms can be trained to provide suggestions
on reorganizing and balancing teams.
Fig. 4 provides the organizational perspective of the HCDA
model:
• Provide a way of measuring human resources by processing
the data generated by them and the assessments received
from their line manager. For each employee, a score is
calculated that represents the employee’s value within the
organization.
• Starting from the employee’s value, the value of the team,
the group, the department, or even the entire organization
can be calculated.
The insights revealed by the analytics, such as the distribution of employees’ performance and attrition risk, are presented
on different granularity levels, with the data cascading from
organization view down to department, group, and team.
4. Analysis of the distribution of competences
In this section, an analysis of how some key competences
are distributed over the employees in four departments of a
knowledge-based organization is performed. The five competences selected were those with the fewest unevaluated employees, in order to have an evaluation base as large as possible. The
four departments were chosen on the basis of the number of
employees and the diversity of their roles within the organization,
so that they could be representative of the entire organization.
The distribution of employees who have obtained top-notch
evaluations over the five selected competencies, together with
the distribution of those who have received disappointing evaluations, provides some information about the relative importance of
these competencies in the different departments. In fact, recruitment is not done randomly and it can be assumed that the needs
of a unit are reflected in the average profile of its employees. It is
intuitive that certain characteristics are essential in a department
and are therefore sought after more actively than other aspects
that, despite being desirable in general, are less aligned with the
goals and everyday practice of the department.
For example, both Problem solving skills and Passion and
Commitment seem to be desired in all four departments, and
downright essential in the profile of employees in Department 3,
where no one is seen receiving a low evaluation (Fig. 5). Department 1 exhibits the highest percentages of employees with a low
evaluation for all the competences considered, while Department
3 is generally associated with a low incidence of unfavorable
evaluations.
Fig. 6 reports the percentage of employees who have received the highest level of evaluation in the five competencies
considered. In all departments, the percentages of employees
with pleasing evaluations as far as Passion and Commitment is
concerned is higher than it is for the other competencies. The
percentage of very passionate employees in Department 1 is
remarkably high. On the other hand, effectiveness in communications does not seem to be a critical factor across the four
departments, since the percentages of employees with a high
evaluation are all low (Fig. 6).
The profile of a department can be quantified by the vector
of the median evaluations received by all its employees across
each competence. The deviation with respect to the median could
have been taken as a measure of how relevant the evaluation of
an employee is for the specific competence taken into account.
Since we adopted a multiplicative model, the ratio vij /mkj of
the evaluation over the median evaluation in the department
has been considered instead, where vij = vij (t) denotes the
Fig. 4. Organizational perspective of the HCDA model.
evaluation received by employee i (belonging to department k)
for competence j and mkj = mkj (t) is the median evaluation for
competence j in department k. The dependence on time will be
omitted for the sake of conciseness whenever it can easily be
inferred. The evaluations for the single competences are then
combined together to obtain a single value. To this end, profiles
will be populated for each department, illustrating the weight
that the management desires to attribute to the single competences. In this way, the management can set specific goals. The
KPIs extracted from each employee’s collected data are stored in
a database and take different values, depending on the company.
After normalization, each KPI is assigned to one of four quadrants
of the HCDA model (technical skills, soft skills and motivation,
achievements, and extra mile). The weights used during the
yearly evaluation computation are company dependent, as well;
for each quadrant, they are chosen at company management
level. The profile for department k will be a vector with components pkj representing the perceived importance of competence
j in department k. Profiles are then normalized to ensure that
∑_j pkj = 1 for every department k. The overall evaluation ηi = ηi(t) of employee i at time t will thus be given by

ηi = ∑_j pkj · (vij / mkj)    (1)

If no profile information is available, uniformly distributed weights can be used.

Fig. 5. Percentage of employees who have received low-grade evaluations in the five selected competencies.

Fig. 6. Percentage of employees who have received top-notch evaluations in the five selected competencies.

5. Problem statement

The formalized organizational structure focuses on roles and positions rather than on the people occupying them [14]. Formalization of an organizational structure is commonly initiated in an attempt to rationalize the decision-making process. In our case, starting from the employee layer and extending the process to all departments of the organization, we use a hierarchical approach covering technical and soft skills, performance scores, attrition risk, and the probability of leaving the company. The main objective of our work is to create a tool for computing employees' performance indicators, so as to determine their attrition risk and to identify patterns of leaving the workforce. Using this tool, HR managers can identify the causes of attrition and find solutions to increase retention rates.

We define the following five functions (f0–f4):

Similitude_score = f0(Spe, Sae)    (2)

where
• Spe — an array with the scores Si(t) of previous employees, i ∈ 1..n
• Sae — an array with the scores of actual (current) employees
• n — the number of evaluated employees
• t — the year of evaluation
• f0 — a function measuring the similitude between the two arrays (e.g., Euclidean or cosine distance); it exhibits the probability that current employees whose performance matches that of employees who left the company are themselves susceptible to leaving

SCORE = Si(t) = f1(ηi, Si(t − 1), ci, α(si), β)    (3)

where
• f1 — a function measuring the score of each employee at year t
• Si(t) — the score at time t
• ηi(t) — the employee's yearly evaluation
• si — the employee's seniority
• ci — an indicator specifying whether the employee has changed role in the last year
• α(si) — an adjusting factor accounting for seniority
• β — an adjusting factor accounting for the variation in evaluation

Si(t) = Si(t − 1),                        if ci = 1
Si(t) = Si(t − 1) + α(si) · β · ηi(t),    otherwise    (4)

ηi(t) = f2(Technical_Si(t), Soft_S_Motivationi(t), Achievementi(t), Involvementi(t))    (5)

where
• f2 — a function measuring the employee's yearly evaluation by the superior leader, weighting four important classes of KPIs: technical skills, soft skills and motivation, achievements, and involvement
• Technical_Si(t), Soft_S_Motivationi(t), Achievementi(t), Involvementi(t) — one-dimensional vectors of different sizes

Technical_Si(t) = f3(HW_SW_Si(t), Professional_Experiencei(t), Project_complexityi(t), Self_studyi(t))    (6)

where
• f3 — a function weighting the employee's yearly hardware/software skills (bugs fixed, etc.), number of projects successfully finished, number of added functions, etc.

Achievementi(t) = f4(certifications, papers, job_rotation)    (7)

All functions and parameters involved in this formal representation follow the model described briefly in Fig. 2 of this work and in more detail in Fig. 2 of [15]. From this formal representation it can be observed that computing the final score, which is then used in pattern recognition algorithms to determine the employees at risk of leaving, requires applying the five functions (f0–f4) in a hierarchical, sequentially dependent manner (the result of f4 feeds f3, and so on). Although companies rarely have more than hundreds of thousands of employees, every employee has many attributes or features (at least 30), and the sequential process of score generation therefore involves a large volume of data. Furthermore, by anticipating which employees may leave, the scoring algorithm creates value (a specific feature of big data) by producing very important information for HR departments.
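As an illustration of Eq. (2), f0 can be instantiated with the Euclidean and cosine distances the text mentions. This concrete form is an assumption for the sake of the example, since the paper does not fix a formula for f0.

```python
# One plausible concrete form of f0 (an assumption, not the authors'
# definition): compare the score array of previous employees (S_pe) with
# that of actual employees (S_ae) via Euclidean and cosine distance.
import math

def f0_similitude(s_pe, s_ae):
    """Return (euclidean_distance, cosine_distance) between the two score
    arrays; small values mean the current population resembles the one
    that left, hinting at elevated attrition risk."""
    euclid = math.sqrt(sum((a - b) ** 2 for a, b in zip(s_pe, s_ae)))
    dot = sum(a * b for a, b in zip(s_pe, s_ae))
    norm = (math.sqrt(sum(a * a for a in s_pe))
            * math.sqrt(sum(b * b for b in s_ae)))
    cosine_dist = 1.0 - dot / norm
    return euclid, cosine_dist
```

Identical score arrays yield (0, 0); orthogonal arrays yield the maximum cosine distance of 1.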
5.1. Score computation methodology
The algorithm developed to calculate the employee’s value
within the organization is described in this section, detailing
Table 1
Intervals for the evaluation.

Interval          Description
[0.200, 0.447)    Does not meet expectations
[0.447, 1.000)    Meets expectations
[1.000, 2.236)    Sometimes exceeds expectations
[2.236, 5.000]    Exceeds expectations
the steps implemented to compute the score for each employee,
based on current evaluation data obtained from the HR department and the previous year’s score. The score is determined using
Eq. (4) above.
If the employee’s position or level changed during the evaluation year, an improvement in performance will be harder to
obtain, since in his or her previous status the employee had
greater experience and was more familiar with tasks than in the
new position or level. Thus, our algorithm will not punish the
employee in such circumstances even if he or she gets a lower
yearly evaluation value than he or she did in the year before the
level or position change. Rather, the overall score will be kept the
same as the year before. A binary variable ci , such that ci = 1 if a
position change occurred and ci = 0 otherwise will keep track of
changes in professional category.
The overall score for an employee should reflect his or her
achievements during the evaluation year, but also recognize his
or her efforts toward improving the evaluation with respect
to the previous year. To this end, an adjustment factor β = β(ηi(t), ηi(t − 1)) will be introduced to account for and reward the tendency to improve. The adjustment factor is read from a lookup table indexed by performance levels.
As explained above, an employee’s yearly evaluation (ηi ) is the
weighted average of KPIs embedded inside the HCDA model and
can result in a value ranging between 1/5 and 5. This range is partitioned into the four intervals listed in Table 1, chosen such that the ratio between the initial values of consecutive intervals is constant and equal to √5 ≈ 2.236.
The adjustment factor β will be determined according to Table 2. For example, if the previous evaluation ηi (t − 1) belonged to
the interval corresponding to ‘‘sometimes exceeds expectations’’
and the current evaluation ηi (t) points to the ‘‘meets expectations’’ class, β will take a value corresponding to −50%.
Finally, an additional factor associated with seniority will be introduced, so that young employees are motivated to grow and, at the same time, experienced employees are encouraged to break new boundaries. The seniority weight α(si) was chosen to make the reward proportional to the seniority si (measured in years) of employee i (Fig. 7). The relation between seniority and α is given by a sigmoid function, reaching a plateau around a seniority level of 10 years.
α(si) is computed as follows:

α(si) = (3/2) / (1 + 2^(4−si))    (8)
The constant values in (8) are carefully chosen, such that the
following conditions are met simultaneously:
• For the minimum seniority of one year, α (1)=1/6 (the minimum weight).
• The 100% contribution is reached at a seniority level of
around 4–5 years, the exact point at which an employee is
considered senior in his or her position.
• The maximum α value of 150% is approached for seniority levels of around 10 years and above.
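A direct transcription of Eq. (8), with quick checks of the stated boundary conditions:

```python
def alpha(s):
    """Seniority weight of Eq. (8): a sigmoid in the seniority s (years)."""
    return 1.5 / (1 + 2 ** (4 - s))

# Minimum weight at one year of seniority: alpha(1) = 1/6
assert abs(alpha(1) - 1 / 6) < 1e-9
# The 100% contribution is reached around 4-5 years (exactly at s = 5 here)
assert abs(alpha(5) - 1.0) < 1e-9
# The weight approaches its 150% plateau for seniority around 10 and above
print(round(alpha(10), 3))  # → 1.477
```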
Fig. 7. Seniority-weight evolution over time.
5.1.1. Handling the lack of data
Since the developed application is still a prototype, we had access to a reduced dataset, covering the 2015 and 2016 evaluation years. This introduced imbalances between younger and older employees, which had to be handled in one way or another. At this point, we decided to compute the score for missing years using statistical data:
• For each missing data-point, an evaluation score of 2.5 (average value) is assumed.
• This evaluation score is weighted with the seniority adjustment factor.
Table 3 illustrates the score estimation for the 2010–2016 period,
applying the methodology presented above.
This method uses only seniority to fill in missing data-points.
A better approach would be to combine seniority and the gradient
of level change over time to obtain a better approximation of the
missing evaluation values.
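The imputation step above can be sketched as follows. The 2.5 average evaluation and the α weighting are as stated in the text; the β adjustment, which also enters the cumulative score of Table 3, is omitted here, and the year range is an illustrative assumption:

```python
AVERAGE_EVAL = 2.5  # assumed evaluation for years with no data

def alpha(s):
    """Seniority weight of Eq. (8)."""
    return 1.5 / (1 + 2 ** (4 - s))

def fill_missing(known, first_year, last_year):
    """Return seniority-weighted yearly contributions, substituting the
    average evaluation for years absent from `known` ({year: eta})."""
    contributions = {}
    for seniority, year in enumerate(range(first_year, last_year + 1), start=1):
        eta = known.get(year, AVERAGE_EVAL)
        contributions[year] = alpha(seniority) * eta
    return contributions

# Only the 2015 and 2016 evaluations are available, as in the paper's dataset
print(fill_missing({2015: 2.8, 2016: 3.2}, 2011, 2016))
```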
5.1.2. Software analytics platform
The Software Analytics Platform was designed to handle the
main challenges of human resource departments: the quantification of human resource performance, the distribution of performance, and the early identification of employees intending to
leave the workforce. To achieve these goals, a flexible architecture is needed, able to automatically gather raw data from across the organization and process it with data-processing algorithms.
The main architectural decisions are shown in Fig. 8 and listed below:
• Use of a NoSQL database, specifically ArangoDB, to handle Big Data challenges. Conventional database-management tools are inadequate for the huge data sets produced. In our work we use NoSQL database systems, which allow simpler scalability and improved performance in maintaining big unstructured data [17]. NoSQL helps deal with the volume, variety, and velocity requirements of Big Data. In order to handle a large amount of data stored in many forms and types, all data is collected and kept in this centralized database. Considering that the data generated by employees reaches enormous dimensions and, most importantly, is varied, relational databases are no longer a suitable solution.
• The algorithm module is written in Python, using the scikit-learn library.3 The library offers classification, clustering, regression, and dimensionality-reduction algorithms.
3 https://scikit-learn.org/stable/.
Table 2
Search table for choosing the performance variation factor β .
Fig. 8. Software Analytics Platform — Static Architecture.
Source: Adapted from [16].
Table 3
Score estimation for a longer period with partially known data.

Year            2010  2011  2012  2013  2014  2015   2016
Available data  –     –     –     –     –     yes    yes
Seniority       0     1     2     3     4     5      6
α               –     15%   56%   79%   96%   109%   120%
η               0     2.5   2.5   2.5   2.5   2.8    3.2
∆η              –     –     –     –     –     +0.30  +0.40
β               –     –     –     –     –     50%    50%
Score           0     0.38  1.78  3.75  6.15  7.68   9.6
• The visualization of data is implemented using Google Charts.
• Java SE is used for interactions with the various databases that need to be queried; connectors are implemented to import/export the various data to and from the NoSQL DB.
• For the presentation and user interface, and for the input of application parameters and settings, AngularJS was chosen.
Considering that data inside an organization is found in unstructured forms, the software platform provides functionality to gather data and store it within the centralized ArangoDB database. The functionality is implemented through custom data connectors. The connectors need to be adapted for each dataset and can be triggered manually or cyclically, at fixed intervals.
• XLS/XLSX connectors: A large amount of data inside the
company is available as spreadsheets (.xls or .xlsx files). To
make use of this information, Python scripts were implemented that collect data from the files (performing a first processing pass where necessary) and save it into the platform database, where it can be accessed more easily and quickly. Python already offers many libraries that facilitate working with spreadsheets.
• PDF connectors: Another important part of the data inside
the company is found in PDF documents. It is proposed to
use the iText library to build these connectors, retrieve the data contained in a PDF document, and save it in the ArangoDB database. The documents are scanned based on predefined identifiers, since the form of the templates used in the organization is known; if several templates were used over the years to store data, these are tested and adapted as needed.
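A connector of this kind essentially normalizes tabular rows into documents for the central database. The sketch below uses the standard-library csv module as a simplified, self-contained stand-in for the XLS/XLSX and PDF readers (spreadsheet parsing would use, e.g., openpyxl, and the ArangoDB insertion step is omitted); the field names and the `_source` tag are illustrative assumptions:

```python
import csv
import io

def rows_to_documents(table_text, source):
    """Turn one exported sheet into database-ready JSON-style documents.
    Real connectors read .xls/.xlsx or PDF templates and then insert the
    documents via a database client; CSV keeps this sketch self-contained."""
    reader = csv.DictReader(io.StringIO(table_text))
    docs = []
    for row in reader:
        # Normalize header names so documents from different files align
        doc = {key.strip().lower(): value for key, value in row.items()}
        doc["_source"] = source  # provenance tag, useful when many files feed one DB
        docs.append(doc)
    return docs

sheet = "Employee,Eval2016\nAna,2.8\nDan,3.2\n"
print(rows_to_documents(sheet, "hr_evaluations.xlsx"))
```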
Fig. 9. Data Analytics platform — Edit Views example.
• Databases: specific connectors were developed for each database found inside the organization, such as SQL, Lotus, and MySQL.
• Specific task-management tools: API tools and Python scripts were used to gather data and store it in the Arango database.
The architecture of the Software Analytics Platform is structured into two layers, the Core layer and the Application layer. The Core layer implements the entire backend, which contains the functionality common to the whole platform, such as the connectors (their main implementation; the personalization for each data template is done in the application) and the storage and management of databases. The interfaces with the tools already used inside the organization are also implemented in the Core layer, so that data can be loaded for processing automatically.
The proposed HCDA model extracts insights from data generated by employees, quantifying the performance of an organization's employees and identifying internal/external factors that may cause them to leave the workplace; all of this needs to be integrated with the databases, tools, and instruments already existing inside the organization.
The present software platform is extremely useful for HR managers because, by processing and interpreting the extracted data, they can obtain an accurate understanding of employees' performance indicators, attrition risk, and patterns of leaving the workforce.
A screenshot from the developed software platform, used to create a query of employee scores, is presented in Fig. 9.
6. Experimental results
This section contains experimental results obtained after applying the scoring algorithm to data on the employees of a real company. The charts are generated with Google Charts by the software platform developed by the authors. A distortion factor has been applied to the exposed results, so that the extraction of insights is demonstrated while the content remains anonymized. The following two features have been used: histograms and (polynomial) trendlines.
The chart in Fig. 10 shows the distribution of all employees' evaluations corresponding to the 2016 data. The trendline of this histogram is compared with a normal distribution with a mean value of 2.0, which is considered the reference. A few insights can be revealed:
• The trendline appears to be left-skewed, highlighting a total organization HC value above the average (a value of 2.0 suggests an employee is meeting his or her requirements, and most employees score above this value).
• A significant number of employees have a below-average value; these can be identified as new employees or employees unfit for their current position.
• The number of employees with very poor or very good results decreases as we approach the extremes.
Fig. 11 shows overall scores since the inception of the company. The trendline is now severely right-skewed, which can
indicate a young, growing organization; this conclusion is backed
by the data in Fig. 10, which shows a large number of inexperienced employees who scored low. Helpfully, a large number of experienced employees is also available and can provide knowledge and guidance to the new hires.
Our application is flexible and allows statistics on the distribution of all employees' performances over different time slots and different departments. The range is an important insight; it can be compared from one year to another and from one department, group, or team to another. The chart in Fig. 11 shows a histogram of the total scores of all employees (current or former) for a company with more than 1200 employees in the studied field, divided into several IT departments. The shape described by the associated trendline can be explained by the fact that most employees leaving the company (those who make the biggest contribution to the graph in Fig. 11) take this step in the first 4–5 years after entering the company. Thus, correlating with the data in Fig. 10, we can say that this category of employees explains the large number of total scores in the interval [10, 16].
The next figure comparatively illustrates employees’ performance from different departments. The insights are plotted on a
department level to reveal the distribution in each component.
The analysis can be extended to a group or team level to see
the peaks and the valleys of performance (according to the HCDA
model presented in Fig. 4).
Analyzing the charts from Fig. 12, looking at both axes, we
observed that the best performance is obtained in Department
1 (i.e., it has the most employees with high scores and is the
only department that reaches scores above 40). This result is
confirmed if we compare performance rates (i.e., the total score
Fig. 10. Organization — employees’ performance for one year (2016).
Fig. 11. Organization — employee performance (histogram) since inception (14 years, including estimations of missing data points).
Fig. 12. Employees’ performance scores: A comparison between departments.
divided by the number of employees in a certain department)
according to Table 4.
The chart in Fig. 13 shows the progress of human capital over
the years through the total computed score of each department.
The blue bar represents the default score calculated for employees for whom data is missing (keeping, as we previously stated,
an average evaluation for a missing year); the red and yellow bars
represent the real data obtained for the years 2015 and 2016,
respectively.
Another metric that can be obtained from this data is the rate
of performance, presented in Table 4, which is computed as the
ratio between total performance and the number of employees in
a given organizational structure.
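The rate-of-performance metric is a simple ratio, sketched below; the department names and figures are illustrative, not those of Table 4:

```python
def performance_rate(total_scores, headcounts):
    """Rate of performance: total score divided by headcount, per department."""
    return {dept: total_scores[dept] / headcounts[dept] for dept in total_scores}

# Illustrative numbers only
print(performance_rate({"Dept 1": 420.0, "Dept 2": 300.0},
                       {"Dept 1": 40, "Dept 2": 35}))
```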
Fig. 14 represents one of the most important results of our analysis. It shows, in different colors on the same chart, the distribution of employees that have left the organization and of employees that are still inside the organization, plotting score versus seniority on a scatter chart. It can be observed that several groups are formed (marked in red). Employees near these groups present a large risk of leaving the company. Such information will help HR managers to consider with great care those employees that are susceptible to leaving
Fig. 13. The organization score — progress over years.
Fig. 14. Score of all individuals which were or are currently employed by the company (blue — current employees, red — employees which left the company).
Table 4
Rate of performance (total score divided by the number of employees) by department.
the company and to analyze in more depth their possible reasons
for dissatisfaction.
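The grouping described for Fig. 14 can be approximated with a simple proximity test: compute the centroid of each cluster of former employees and flag current employees whose (seniority, score) point falls near one. This is an illustrative sketch, not the authors' algorithm (which relies on scikit-learn pattern-recognition methods); the `radius` threshold and the sample points are assumptions:

```python
import math

def centroid(points):
    """Mean point of a cluster of (seniority, score) pairs."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def attrition_risk(current, leaver_groups, radius=1.0):
    """Flag current employees whose (seniority, score) point lies close to
    the centroid of any group of former employees, mirroring the red
    clusters in Fig. 14. `radius` is an illustrative threshold."""
    centroids = [centroid(group) for group in leaver_groups]
    at_risk = []
    for name, point in current.items():
        if any(math.dist(point, c) <= radius for c in centroids):
            at_risk.append(name)
    return at_risk

# One hypothetical leaver cluster around 4-5 years of seniority
leavers = [[(4, 12.0), (5, 14.0), (4, 13.0)]]
staff = {"Ana": (4, 12.5), "Dan": (12, 30.0)}
print(attrition_risk(staff, leavers))  # → ['Ana']
```

In practice the axes would first be rescaled so that seniority and score contribute comparably to the distance.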
7. Conclusions and further work
The theoretical contributions of this research are the creation of a model for applying analytics to Human Resources in a knowledge-based organization, as well as the creation of a scoring algorithm that can be applied to already existing data – filtered, structured, and modeled according to HCDA requirements – to calculate an employee's value within the organization. The algorithmic approach considers both the seniority and the professional evolution trends of employees, handling both increases and decreases in performance. Moreover, it can compensate for changes in level or position. The algorithm takes into account, in equal measure, the seniority of an employee and his or her performance, in order to foster young employees' motivation to grow and, at the same time, to encourage experienced employees to break new boundaries.
The employee scoring insights revealed by the algorithm bring valuable information to the HR department and to the management of the organization when all data is aggregated and the knowledge is presented at a team, group, department, or even organization level. For example, it may emerge that the team/organization is growing overall, but soft skills and motivation are declining while technical skills remain strong. The trend of organization performance will be used as support for strategic decisions, and the trend of teams will be used by line management in day-to-day work decisions.
Since the application is a prototype and some data was unavailable, ways had to be found to handle missing data points. Fortunately, from now on the database can be updated periodically with new data, which will increase the accuracy and performance of our model and its outcomes.
By applying the algorithm to data collected from employees that left the workforce and, at the same time, to data on current employees, we identified those persons that present a higher risk of leaving the company. This information is a key indicator for human resource managers to take corrective measures that increase the employee retention rate. Using the individual score of each employee, our application determines and comparatively illustrates the performance of company departments, showing possible fluctuations, such as large increases or decreases in performance from one year to another. It also identifies whether an
employee stays too long at the same performance level or in the
same position, making a job rotation advisable.
As further work, we intend to test variable weights for seniority and variance in performance. We will also evaluate how prediction algorithms – such as partition trees, gradient boosting, and deep learning networks – perform on the dataset of employees' scores. The aim will be to identify common patterns in the evolution of scores and, in particular, to accurately anticipate which employees are an attrition risk and more likely to leave the organization.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared
to influence the work reported in this paper.
Acknowledgment
This work was partially supported by a UEFISCDI Bridge Grant,
PNCDI III financing contract no. 44 BG/2016.
References
[1] W.Y.M. Momin, K. Mishra, HR analytics as a strategic workforce planning,
Int. J. Appl. Res. 1 (4) (2015) 258–260.
[2] K. Wang, J. Taylor, Can you provide the current trends in HR on people
analytics? 2017, Retrieved from Cornell University, ILR School website:
http://digitalcommons.ilr.cornell.edu/student/151.
[3] S.N. Mishra, D.R. Lama, Y. Pal, Human resource predictive analytics (HRPA)
for HR management in organizations, Int. J. Sci. Technol. Res. 5 (5) (2016)
33–35.
[4] K.G. King, Data analytics in human resources: A case study and critical review, Hum. Resour. Dev. Rev. 15 (4) (2016) 487–495.
[5] J. Lismont, J. Vanthienen, B. Baesens, W. Lemahieu, Defining analytics
maturity indicators: A survey approach, Int. J. Inf. Manage. 37 (3) (2017)
114–124.
[6] H. Iwamoto, M. Takahashi, A quantitative approach to human capital
management, Proc.-Soc. Behav. Sci. 172 (2015) 112–119.
[7] L. Abdullah, S. Jaafar, I. Taib, Ranking of human capital indicators using
analytic hierarchy process, Proc.-Soc. Behav. Sci. 107 (2013) 22–28.
[8] K.A. Anderson, Skill networks and measures of complex human capital,
Proc. Natl. Acad. Sci. 114 (48) (2017).
[9] C.F. Chen, L.F. Chen, Data mining to improve personnel selection and
enhance human capital: A case study in high-technology industry, Expert
Syst. Appl. 34 (1) (2008) 280–290.
[10] A. Charlwood, M. Stuart, C. Trusson, Human capital metrics and analytics: Assessing the evidence of the value and impact of people data, Technical Report, University of Leeds & Loughborough University, 2017, https://www.cipd.co.uk/Images/human-capital-metrics-and-analytics-assessing-the-evidence_tcm18-22291.pdf, accessed 11 May 2018.
[11] X.F. Li, P. Zhang, A research on value of individual human capital of
high-tech enterprises based on the BP neural network algorithm, in:
Proceedings of the 19th International Conference on Industrial Engineering
and Engineering Management, Springer, Berlin, Germany, 2013.
[12] D. Parmenter, Key Performance Indicators: Developing, Implementing, and Using Winning KPIs, John Wiley & Sons, 2015.
[13] B. Walsh, E. Volini, Rewriting the Rules for the Digital Age, Deloitte Global
Human Capital Trends, Deloitte University Press, 2017.
[14] A.L. Webster, Formalization of an organizational structure, bizfluent.com, 2019, https://bizfluent.com/info-8235460-formalization-organizational-structure.html, accessed 25 January 2019.
[15] A. Florea, C.V. Kifor, S.S. Nicolaescu, N. Cocan, I. Receu, Intellectual capital
evaluation and exploitation model based on big data technologies, in: Economic and Social Development (Book of Proceedings), 24th International
Scientific Conference on Economic and Social, Vol. 1, No. 1, 2017, pp.
21–30.
[16] S.S. Nicolaescu, H.C. Palade, C.V. Kifor, A. Florea, Collaborative platform for transferring knowledge from university to industry - A bridge grant case study, in: Proceedings of the 4th IETEC Conference, Hanoi, Vietnam, 2017, pp. 475–488.
[17] R. Zicari, The Forrester Wave™: Big Data NoSQL, Q1 2019, Report: Redis Labs recognized as a Big Data NoSQL database leader, March 19, 2019, Available online: https://lp.redislabs.com/rs/915-NFD-128/images/BM-2019_Q1_Big%20Data%20NoSQL_Forrester.pdf.
Sergiu Stefan Nicolaescu is a Ph.D. student in Engineering and Management at Lucian Blaga University of Sibiu and a Group Leader in an R&D automotive company, in the ADAS (Advanced Driver Assistance Systems) department. He received his MSE in Embedded Systems (2011) and his MSE in Industrial Business Management (2013) at Lucian Blaga University of Sibiu. He has in-depth knowledge of project management within the automotive industry; software development; experience with quality standards and processes used in the automotive industry; knowledge management and organizational leadership; and technical recruiting. He has published in prestigious journals (ISI Web of Science) and top international conferences over 7 scientific papers with applicability to the automotive industry and 2 others on collaborative platforms for transferring knowledge from university to industry.
Adrian Florea obtained his MSE (1998) and his Ph.D.
in Computer Science from the ’Politehnica’ University
of Bucharest, Romania (2005). At present he is Professor in Computer Science and Engineering at the
‘Lucian Blaga’ University of Sibiu, Romania. Adrian is
an active researcher in the fields of High Performance
Processor Design and Simulation, Dynamic Branch and
Value Prediction. He has worked for around 20 years
in interdisciplinary national and international research
projects dealing with issues such as optimization problems in different engineering domains (microprocessors
systems, suspension design and reliability, energy efficiency in buildings), embedded systems with applicability in Smart City (smart parking, urban mobility,
smart traffic), creating digital tools for supporting communities of practice,
mobile computing. He has published over 7 (didactic and scientific) books
and 78 scientific papers in some prestigious journals (ISI Web of Science)
and international top conferences in Romania, USA, UK, Italy, Germany, China,
Slovenia, Korea, Latvia, Spain, India, Poland etc. He received ‘Tudor Tanasescu’
Romanian Academy Award 2005, for the book entitled ‘Microarchitectures
simulation and optimization’ (in Romanian) and ‘Ad Augusta Per Angusta’ Award
for young researcher, received from ‘Lucian Blaga’ University of Sibiu in June
2007, for special results obtained in scientific research. Since 2012 he has been a HiPEAC affiliate member and, since 2013, an ACM Professional member. His web page can be found at http://webspace.ulbsibiu.ro/adrian.florea/html/.
Claudiu Vasile Kifor obtained his MSE (1995) and
his Ph.D. in Industrial Engineering from the Lucian
Blaga University of Sibiu, Romania (2005). At present
he is Professor and Ph.D. supervisor in Engineering
and Management at Lucian Blaga University of Sibiu,
Romania. Claudiu is an active researcher in the fields
of Quality Assurance, Management, and Problem Based
Learning in Engineering & Science. He has published
over 13 books and over 122 scientific papers in some
prestigious journals (ISI Web of Science) and top international conferences. Since 2007 he has supervised 10 national and international engineers who received their Ph.D. titles. He has managed more than 19 national and international research grants. He serves as Associate Editor for two journals and is a member of the editorial board and steering committee of 21 other international conferences.
Ugo Fiore is an assistant professor with Parthenope
University. His research interests include nonlinear
analysis, deep learning, optimization, energy-aware
systems, covert communications, and security. He has
authored or co-authored more than 40 papers on
international peer-reviewed journals. He is serving as
Associate Editor with two journals and is a member
of the editorial board in three other journals. He has
delivered a keynote speech at an international conference, participated in the organizing committees of numerous conferences, and served as a member of
Ph.D. examination panels in foreign universities.
Nicolae Cocan received his B.Sc. in Electronics and
Communications Engineering (2013) at ‘‘Politehnica’’
University of Timisoara. He is Embedded Development
Engineer at a private company in Sibiu, where his role is to develop the next generation of photoelectronic sensors. He has competences in PCB design, programming for the Internet of Things, and Big Data. He was the technical lead of one national research project and has published 3 papers in ISI journals or top international conferences.
Ilie Receu received his B.Sc. in Computer Science Engineering (2009) and is a Master's student in Embedded Systems, both at Lucian Blaga University of Sibiu. He is a software developer at a private company in Sibiu, in the department of development and management solutions. He has in-depth knowledge of software development (cloud-based document management solutions based on non-relational databases), cryptography, digital signatures, and the integration of various technologies. He was the technical lead of three national projects and has published 1 paper in ISI journals.
Paolo Zanetti is an Aggregate Professor at the Department of Management Studies and Quantitative Methods, Parthenope University. He has undertaken senior
administrative responsibilities, has been a member of the Board of Directors of Parthenope University, and currently sits in the Academic Senate. His research
interests are mainly related to the field of Applied
Mathematics and High Performance Scientific Computing, focusing on methods and mathematical tools
and techniques for solving scientific problems of practical interest, in particular large-scale Computational
Finance problems and deep learning applied to Finance.