Future Generation Computer Systems 111 (2020) 654–667

Human capital evaluation in knowledge-based organizations based on big data analytics

Sergiu Stefan Nicolaescu a, Adrian Florea b,∗, Claudiu Vasile Kifor a, Ugo Fiore c, Nicolae Cocan d, Ilie Receu e, Paolo Zanetti c

a Department of Industrial Engineering and Management, Lucian Blaga University of Sibiu, Romania
b Department of Computer Science and Electrical Engineering, Lucian Blaga University of Sibiu, Romania
c Department of Management Studies and Quantitative Methods, Parthenope University of Naples, Italy
d UiPath S.R.L. Cluj Napoca, Romania
e SOBIS Solutions S.R.L. Sibiu, Romania

Article history: Received 6 March 2019; Received in revised form 18 June 2019; Accepted 27 September 2019; Available online 3 October 2019

Keywords: Score; Human resource (HR); Human capital (HC); Analytics; Big data

Abstract. Starting from a Human Capital Analysis Model, this work introduces an original methodology for evaluating the performance of employees. The proposed architecture, particularly well suited to the special needs of knowledge-based organizations, is articulated into a framework able to manage cases where data is missing and an adaptive scoring algorithm that takes into account seniority, performance, and performance evolution trends, allowing employee evaluation over longer periods. We developed a flexible software tool that gathers data from organizations in an automatic way – through adapted connectors – and generates abundant results on the measurement and distribution of employees' performances. The main challenges of human resource departments – quantification of human resource performance, analysis of the distribution of performance, and early identification of employees willing to leave the workforce – are handled through the proposed IT platform.
Insights are presented on different granularity levels, from organization view down to department, group, and team.

© 2019 Elsevier B.V. All rights reserved.

∗ Corresponding author. E-mail address: adrian.florea@ulbsibiu.ro (A. Florea).
https://doi.org/10.1016/j.future.2019.09.048

1. Introduction

People at all levels of management, coordinating small or large teams on different organization levels, are aware of the great effort involved in gathering information on employees' performances, surveying their opinions on job satisfaction, and so forth. The question arises: how much of the information gathered is really used, and is the existing knowledge or insight level the maximum that can be obtained? The most important thing for managers to focus on is the value of each employee as an individual. HR (human resources) professionals identify the following insights as high value: evaluation of employee performance, training, and development [1]; prediction of turnover; and planning of succession [2]. Low performance of employees, recruitment and replacement of talent, and loss of freshly trained employees or of the most valuable and senior employees generate high costs for a company, decreasing its operational efficiency. Generally, organizations know essential information about their employees, such as their salary progress, completed trainings, project experience, main expertise, performance level, and goal achievement, all of which are captured in one form or another. These data fit no general model well, are not correlated, and change frequently, especially in knowledge-based organizations; manufacturing-based organizations, in contrast, can measure work norms easily and clearly (e.g., 8 person-hours are needed for 20 units of product). Therefore, it is hard to find a response to key questions such as: How can we evaluate the performance of employees in a knowledge-based organization?
or Why did the employee leave the organization? These responses are critical for human resource departments in helping them forecast performance drops, increases, departures, trends, or problems that might be encountered. This work proposes a model that quantifies and appropriately exploits employees' information by applying data analytics that generate insights and by providing a new method for evaluating human capital inside a knowledge-based organization. The approach relies heavily on the use of big data, increasing its accuracy through the quantity of relevant data collected. The proposed algorithm evaluates human capital (HC) by quantifying the performance of a company's employees, accounting for both their seniority and professional development, and identifies internal factors that may cause them to leave the workplace; all of this is integrated with already existing information and communications technology (ICT) for industrial companies. The algorithm was validated with real data from a multinational organization; the data was anonymized and the results were reviewed and evaluated by the management. The rest of the paper is organized into six sections. Section 2 briefly reviews state-of-the-art papers related to this study. Section 3 describes the proposed approach for measuring human capital using data analytics, while Section 4 presents an analysis of the distribution of competences in different organizational units. Section 5 goes deeper into the technical area and describes the methodology for computing the Human Resource score. Section 6 analyzes the experimental results, providing some interpretations and possible guidelines for HR managers. Finally, Section 7 highlights the paper's conclusions and suggests future research directions. 2.
Related work

HR analytics is a method used to improve individual and organizational performance by improving the quality of decision-making [3]. The approach is relatively new, and its use has seen a noticeable rise in popularity recently [4]. However, researchers have found that the actual application of analytics by companies remains at the initial stages [5], and a wide range of research is needed on this topic. The following achievements can be observed in successful analytics approaches: gathering and interpreting complex data, drawing connections between data and larger business strategies, and using multiple models to generate reliable predictions [2]. This section briefly presents a few existing studies connected with our work and focused on measuring human capital within organizations. Iwamoto et al. [6] have provided a quantitative approach to human capital management targeting especially financial performance indices of employees. In contrast with our work, their objective was not to evaluate human capital but to construct a tool to evaluate (and explain how to evaluate) human capital management. Another difference lies in the manner in which the score is computed. Ours is based on real data about company employees, unlike their statistical approach, which used different data reports provided by a private Japanese company specializing in economic and business analysis and management. Abdullah et al. [7] have proposed an analytical hierarchical approach to ranking indicators of human capital in Malaysia using mathematics and psychology in a comparative evaluation model. It ranks four main indicators (creating results by using knowledge, employees' skill index, sharing and reporting knowledge, and the succession rate of training programs) against five main criteria of human capital (talent, strategic integration, cultural relevance, knowledge management, and leadership).
Creating results by using knowledge proved to be the most important indicator of HC management in Malaysia, whereas the employee skill index had the lowest importance. These results correspond to the trend of employers looking not only for skilled people but also for people who are able to adapt, are eager to learn, and have soft skills. Somewhat similarly, our approach collects and, most importantly, quantifies and stores in a database the employee records with reference to four important classes of key performance indicators (KPIs): technical skills, soft skills and motivation, achievements, and involvement. However, unlike this study, Abdullah et al. [7] performed their analysis after decision-makers were asked to set up a comparison relationship between pairs of indicators, indicating how important each indicator was relative to the others. In [8], the author generated worker-skill and job-skill networks for measuring human capital, proposing that the results could be used to better correlate employees' wages with their skills and to determine the degree to which workers' skills matched employers' job tasks. The networks' vertices are skills, and two skills are connected by an edge if a worker has both or if both are required for the same job. Test data were collected from an online freelancing website. The analysis aimed to distinguish diverse skills from more specialized skills, individual skills, and skill combinations. A connection between [8] and our work is the fact that skills are usually characterized using years of training as a measure; as we show in the score computation methodology, we have included in the scoring algorithm KPIs on certifications and trainings achieved, as well as on employee seniority.
In Chen and Chen [9], the authors applied a Chi-square automatic interaction detector (CHAID) data-mining algorithm, based on decision trees and association rules, to employees' characteristics and work performance, including their opportunity to leave the workforce, in order to generate useful rules for "head hunting". They predicted employees' performance and retention based on profile features (age, gender, education, experience, recruitment source, etc.) that can be obtained at the selection stage. The solution was tested using empirical data on engineers with different job functions at a semiconductor factory located in Taiwan. Two similarities exist between the work of Chen and Chen [9] and our approach. First, both solutions are applied to a high-tech industrial company; second, the analyses have the same target, namely, determining employee performance and retention risk. However, they used a different methodology and calculation tools. An in-depth study on human capital theory, measures, and metrics has been presented by Charlwood et al. [10]. The authors examined current HR analytics practices, surveying a large number of different HC metrics. Some of these metrics are considered in our analysis as KPIs (i.e., employee engagement, technical skills, etc.). In Li and Zhang [11], the authors used a neural network to compute the index value of individual human capital in high-tech enterprises. The evaluation index system included a two-level hierarchy of indicators. The first layer refers to the existing value of HC, the potential value of HC, the ongoing cost of HC, and the opportunity cost of HC. The second layer expands each first-layer indicator into further indicators, 16 in total, such as education, age, gender, professional knowledge, and skills, which are in some ways similar to our KPIs. 3.
Proposed approach for measuring HC assets

The ability to extract insights from data and use them in decision-making has become increasingly important in recent years, and its main applicability is in the field of human resource development. This section presents a method that exploits this ability and quantifies employee performance.

3.1. The business intelligence process

The model developed to measure human capital in an organization is a decision-making tool dedicated to human resources departments, used to visualize and enhance employee performance and to increase employee retention rates. Based on raw data and KPIs specified by the HR department and saved in a centralized non-relational database for each employee, the algorithm calculates a score for each employee. The performance of each structure in the organization is determined based on the average performance of its component members. It is worth noting that, in the literature, KRIs (Key Results Indicators) are often used instead of KPIs [12]. The difference is that KRIs measure the results from business actions, while KPIs measure the actions and events that lead to a result, so the former are critical in measuring progress and the latter are crucial in creating and evaluating strategies. In the present work, KPIs have been used because the knowledge-based nature of the organization emphasizes the importance of flexibility in the business processes. The present tool is extremely useful for HR managers because, by processing and interpreting the data extracted, they can obtain a fairly accurate understanding of employees' performance indicators, attrition risk, and patterns of leaving the workforce. Using this tool, HR managers can identify the causes of attrition and find solutions to increase retention rates.
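The aggregation just described, individual scores averaged up into team, group, department, and organization performance, can be sketched in a few lines of Python. The organization tree, unit names, and score values below are invented for illustration; the paper's platform computes these averages from its centralized database.

```python
from statistics import mean

# Illustrative organization tree: a team is a list of its members'
# scores; groups and departments contain sub-units.  All names and
# values here are invented for the sketch.
org = {
    "Department A": {
        "Group A1": {"Team A1a": [3.2, 4.1, 2.8], "Team A1b": [3.9, 4.4]},
        "Group A2": {"Team A2a": [2.5, 3.0, 3.6]},
    },
    "Department B": {
        "Group B1": {"Team B1a": [4.2, 3.7]},
    },
}

def employee_scores(unit):
    """Collect the scores of every employee below a unit."""
    if isinstance(unit, list):      # leaf: a team
        return list(unit)
    return [s for sub in unit.values() for s in employee_scores(sub)]

def unit_score(unit):
    """Performance of a structure = average of its members' scores."""
    return mean(employee_scores(unit))

dept_b = unit_score(org["Department B"])   # average over Team B1a
overall = unit_score(org)                  # organization-wide average
```

One design point is deliberately left open here: the paper's average "of component members" could also be read as the mean of sub-unit scores rather than of all employees below a unit; the two differ when teams have unequal sizes.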
This type of analysis is called "business intelligence", and it is becoming a key factor in planning strategies for increasing economic efficiency. With this pattern-recognition ability, a management team can confidently make decisions that in the past would have been considered risky and dependent on a manager's skill on the subject. Fig. 1 presents the flow process used to create business intelligence inside large organizations. The raw data assets of an organization are correlated with employee KPIs – selected jointly by HR and management – and used as input for the algorithms. On the first layer, an algorithm calculates a score for each person, using the data to quantify aggregate scores for teams, groups, departments, and the whole organization. On the second layer, machine-learning algorithms can be used for prediction once sufficient historical data have been collected, providing output that represents the intelligence offered as feedback to the organization's management.

3.2. HCDA Model

The Human Capital Data Analytics (HCDA) model collects data from current and previous employees with the purpose of enhancing retention. It examines internal factors (from inside the organization) and external opportunities that provoked attrition in employees and led them to leave the company. Regarding the internal factors, HCDA starts with the collection and processing of the data created during the exit interview process. It identifies the weak and strong points of the organization and of specific jobs, from the employees' point of view. In addition, each employee's history inside the organization, captured through KPIs, is stored in a database in order to compute the Human Resource score. Descriptive analytics algorithms are used to gather insights from the correlation of data received from all employees who left the organization. Thus, patterns that are applicable to a large number of employees can be identified.
Patterns can have a positive connotation on which the company should focus, such as maintenance and promotion; on the other hand, negative aspects can be identified, such as little investment in technical training and development of employees, repetitive or uninteresting tasks, etc. External opportunities are continuously monitored with an automated crawler (a self-developed software module) that watches well-known job websites and social media. In the "predictive analytics" stage, machine-learning algorithms such as partition trees, gradient boosting, and deep learning networks are run on the employees' historical data stored in the database. The algorithms first receive as input the employees who have already left the workforce, in order to learn the pattern, and are afterwards used to classify employees into those who are at risk of leaving the organization and those who are not. The HCDA model suggests trends in employees' career paths: whether they exceed expectations or just meet job requirements, whether their skill level is below the standards in the job description, or how well they fit within the organizational unit they belong to, which in turn may be an indicator of their willingness to leave the organization. The results, analyzed individually within a department, can determine the degree of compatibility of an employee with the position, team, and department in which he or she operates. The model can spot high fluctuations in performance from one year to another. It can also identify whether someone's performance or position stalls for a long period. Also, by having information about employees who have left an organization and exhibited certain patterns (i.e., they have scores similar to those of current employees of the same age), organizations can prevent current employees from leaving by analyzing the reasons for their dissatisfaction.
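A minimal sketch of this classification idea, reduced to its simplest form: flag current employees whose score history closely resembles that of someone who has already left. The paper uses partition trees, gradient boosting, and deep learning for this step; the nearest-pattern comparison below, with invented data and an assumed similarity threshold, only illustrates the input/output shape of the task, not the actual models.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length score vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def at_risk(current, departed_histories, threshold=0.98):
    """Flag current employees whose score trajectory closely matches
    at least one departed employee's trajectory.  The threshold is an
    illustrative assumption, not a value from the paper."""
    flagged = []
    for name, history in current.items():
        if any(cosine_similarity(history, d) >= threshold
               for d in departed_histories):
            flagged.append(name)
    return flagged

# Invented data: departed employees showed a declining trend.
departed = [[3.8, 3.2, 2.6], [4.0, 3.1, 2.4]]
current = {"emp1": [3.9, 3.2, 2.5], "emp2": [2.8, 3.3, 3.9]}
risky = at_risk(current, departed)   # emp1's decline matches the pattern
```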
The researched model receives as input conventional and non-conventional data on an employee and quantifies his or her value inside an organization. The evaluation is based on the employee's work output and personal performance in the present and past. Employee performance is analyzed under four categories: technical skills, soft skills and motivation, employee achievements (inside and outside the organization), and employee dedication (defined as the extra mile the employee is willing to go to achieve business goals). Data used to extract the employee's ranking or performance level on these characteristics are divided into two levels: conventional and non-conventional. Data processing begins with conventional employee data (e.g., an assessment received from a direct superior, the level of achieved goals), followed by non-conventional data (data that do not normally expose the performance level directly but which, when correlated with other data, can provide useful insights). HR management provides a very large amount of data that can be processed and contains valuable information; these data have the special V characteristics: variety (the data come in various forms), volume (a high volume generated by a multinational company, including structured and mostly non-structured data), and velocity (the data are generated at high speed, as the employees' tasks change continuously and all contain useful information). The model represented in Fig. 3 (an abstract view of HCDA) is implemented from the inner block containing conventional data analytics, continuing with the outer blocks until the predictive analytics are implemented. The deployment of the HCDA model at the organization level is done in four phases: 1. Identify the employee characteristics that are most relevant for the organization, at least one for each group: technical skills, soft skills and motivation, achievements, and going the extra mile. 2.
Gather all conventional and non-conventional data that already exist in the organization. The tools and instruments that already contain data are analyzed, and it is decided how the data can be imported. If there is a lack of data in some areas, the organization's strategic management needs to add process activities that will produce data (for example, during an evaluation, the motivation of employees should be assessed). 3. Apply descriptive analytics to gain insights into the state of previous and current human capital. 4. Apply predictive analytics to measure what might happen in the future and to observe the organization's evolution, or to identify issues in case the organization makes strategic changes.

Through deployment of the model, an organization can measure the potential of employees based on various metrics, and it can compare and balance the team's work among different permutations of human resources, selecting the one most appropriate for a given activity. Another advantage is the possibility of quantifying the value gained by the organization from each human resource, obtaining a total value at a certain moment in time.

Fig. 1. Business Intelligence Process.
Fig. 2. The Human Capital Data Analytics (HCDA) model.

Management can use these data to adopt a strategy based on the best development direction. Currently, most organizations measure only the number of employees, which can lead to misleading information. According to [13], statistics reveal that only a small percentage of companies (8%) report having usable data for HR predictive analytics, and the IBM institute1 shows that over 40 percent of organizations are limited to basic HR reporting capabilities.

1 ftp://ftp.software.ibm.com/software/in/IAF2015/Unlock_the_people_equation_Infographic_PDF.pdf.
Similarly, Bright & Company2 statistics revealed that, on average, 45% of all companies say they engage in basic human resource reporting, while 55% say they are using advanced metrics, integrated dashboards, and customized reports. An increase in the number of employees does not necessarily represent an increase in the value of human resources; a person whose strong performance brings increased value to an organization cannot be replaced by ten new employees lacking experience, especially at knowledge-based organizations, where innovation is essential.

2 http://www.brightcompany.nl/cache/2456_2456/2456.pdf.

Fig. 3. The KPI classes used in our retention model.

By correlating the calculated score for the team with employee compatibility data, algorithms can be trained to provide suggestions on reorganizing and balancing teams. Fig. 4 provides the organizational perspective of the HCDA model:

• Provide a way of measuring human resources by processing the data generated by them and the assessments received from their line manager. For each employee, a score is calculated that represents the employee's value within the organization.
• Starting from the employee's value, the value of the team, the group, the department, or even the entire organization can be calculated.

The insights revealed by the analytics, such as the distribution of employees' performance and attrition risk, are presented on different granularity levels, with the data cascading from organization view down to department, group, and team.

4. Analysis of the distribution of competences

In this section, an analysis of how some key competences are distributed over the employees in four departments of a knowledge-based organization is performed. The five competences selected were those with the fewest unevaluated employees, in order to have an evaluation base as large as possible.
The four departments were chosen on the basis of the number of employees and the diversity of their roles within the organization, so that they could be representative of the entire organization. The distribution of employees who have obtained top-notch evaluations over the five selected competencies, together with the distribution of those who have received disappointing evaluations, provides some information about the relative importance of these competencies in the different departments. In fact, recruitment is not done randomly, and it can be assumed that the needs of a unit are reflected in the average profile of its employees. It is intuitive that certain characteristics are essential in a department and are therefore sought after more actively than other aspects that, despite being desirable in general, are less aligned with the goals and everyday practice of the department. For example, both Problem solving skills and Passion and Commitment seem to be desired in all four departments, and downright essential in the profile of employees in Department 3, where no one received a low evaluation (Fig. 5). Department 1 exhibits the highest percentages of employees with a low evaluation for all the competences considered, while Department 3 is generally associated with a low incidence of unfavorable evaluations. Fig. 6 reports the percentage of employees who have received the highest level of evaluation in the five competencies considered. In all departments, the percentage of employees with pleasing evaluations as far as Passion and Commitment is concerned is higher than for the other competencies. The percentage of very passionate employees in Department 1 is remarkably high. On the other hand, effectiveness in communication does not seem to be a critical factor across the four departments, since the percentages of employees with a high evaluation are all low (Fig. 6).
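The percentages plotted in Figs. 5 and 6 are simple shares of employees at the extreme evaluation levels. A minimal sketch with invented data, assuming an integer 1–5 evaluation scale (the actual departments, scale, and values come from the paper's dataset):

```python
# Hypothetical evaluations for one competence, per employee and per
# department.  Data and scale are illustrative only.
evaluations = {
    "Department 1": [1, 2, 5, 3, 1, 4, 5, 2],
    "Department 3": [4, 5, 3, 5, 4, 4],
}

def share(dept_evals, grades):
    """Percentage of employees whose evaluation falls in `grades`."""
    hits = sum(1 for v in dept_evals if v in grades)
    return 100.0 * hits / len(dept_evals)

# Low-grade share (Fig. 5-style) and top-notch share (Fig. 6-style)
low = {d: share(v, {1}) for d, v in evaluations.items()}
top = {d: share(v, {5}) for d, v in evaluations.items()}
```

With these invented numbers, the hypothetical Department 3 shows no low-grade evaluations, mirroring the pattern the paper reports for its real Department 3.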
The profile of a department can be quantified by the vector of the median evaluations received by all its employees across each competence. The deviation with respect to the median could have been taken as a measure of how relevant the evaluation of an employee is for the specific competence considered. Since we adopted a multiplicative model, the ratio v_ij / m_kj of the evaluation over the median evaluation in the department has been considered instead, where v_ij = v_ij(t) denotes the evaluation received by employee i (belonging to department k) for competence j, and m_kj = m_kj(t) is the median evaluation for competence j in department k. The dependence on time will be omitted for the sake of conciseness whenever it can easily be inferred.

Fig. 4. Organizational perspective of the HCDA model.

The evaluations for the single competences are then combined to obtain a single value. To this end, profiles will be populated for each department, illustrating the weight that management desires to attribute to the single competences. In this way, management can set specific goals. The KPIs extracted from each employee's collected data are stored in a database and take different values, depending on the company. After normalization, each KPI is assigned to one of the four quadrants of the HCDA model (technical skills, soft skills and motivation, achievements, and extra mile). The weights used during the yearly evaluation computation are company-dependent as well; for each quadrant, they are chosen at company management level. The profile for department k will be a vector with components p_kj representing the perceived importance of competence j in department k. Profiles are then normalized to ensure that, for all k, Σ_j p_kj = 1. The overall evaluation η_i = η_i(t) of employee i
at time t will thus be given by

η_i(t) = Σ_j p_kj · v_ij / m_kj    (1)

If no profile information is available, uniformly distributed weights can be used.

Fig. 5. Percentage of employees who have received low-grade evaluations in the five selected competencies.
Fig. 6. Percentage of employees who have received top-notch evaluations in the five selected competencies.

5. Problem statement

The formalized organizational structure focuses on roles and positions rather than on the people in the positions [14]. Formalization of an organizational structure is commonly initiated in an attempt to rationalize the decision-making process. In our case, starting from the employee layer and extending the process to all departments of the organization, we use a hierarchical approach covering technical and soft skills, performance scores, attrition risk, and the probability of leaving the company. The main objective of our work is to create a tool for computing the employees' performance indicators, so as to determine their attrition risk and to identify patterns of leaving the workforce. Using this tool, HR managers can identify the causes of attrition and find solutions to increase retention rates. We define the following five functions (f0–f4):

Similitude_score = f0(S_pe, S_ae)    (2)

where
• f0 — function measuring the similitude between two arrays (e.g., Euclidean distance or cosine distance); it expresses the probability that current employees who have the same performance as those who left the company are susceptible to leaving the company as well
• S_pe — array with the scores S_i(t) of previous employees, i ∈ 1..n
• S_ae — array with the scores of actual (current) employees
• n — the number of evaluated employees
• t — the year of evaluation

SCORE = S_i(t) = f1(η_i, S_i(t − 1), c_i, α(s_i), β)    (3)

where
• f1 — function measuring the score of each employee in year t
• S_i(t) — score at time t
• η_i(t) — employee's yearly evaluation
• s_i — employee seniority
• c_i — indicator specifying whether the employee has changed role in the last year
• α(s_i) — adjusting factor accounting for seniority
• β — adjusting factor accounting for the variation in evaluation

S_i(t) = S_i(t − 1), if c_i = 1; S_i(t) = S_i(t − 1) + α(s_i) · β · η_i(t), otherwise    (4)

η_i(t) = f2(Technical_S_i(t), Soft_S_Motivation_i(t), Achievement_i(t), Involvement_i(t))    (5)

• f2 — function measuring the employee's yearly evaluation from the superior leader, which weights four important classes of KPIs: technical skills, soft skills and motivation, achievements, and involvement
• Technical_S_i(t), Soft_S_Motivation_i(t), Achievement_i(t), Involvement_i(t) — one-dimensional vectors of different sizes

Technical_S_i(t) = f3(HW_SW_S_i(t), Professional_Experience_i(t), Project_complexity_i(t), Self_study_i(t))    (6)

• f3 — function weighting the employee's yearly hardware/software skills (bugs fixed, etc.), number of projects successfully finished, number of added functions, etc.

Achievement_i(t) = f4(certifications, papers, job_rotation)    (7)

All functions and parameters involved in this formal representation follow the model described briefly in Fig. 2 of this work and in more detail in Fig. 2 of [15]. From this formal representation it can be observed that computing the final score, which is then used in pattern-recognition algorithms for determining the employees who may leave, requires applying five different functions (f0–f4) in a hierarchical, sequentially dependent approach (meaning that the result of f4 is needed to apply f3, and so on). Although companies do not have more than hundreds of thousands of employees, every employee has many attributes/features (at least 30), and the sequential process of score generation therefore involves a large volume of data. Furthermore, by anticipating the employees who may leave, the scoring algorithm creates value (a specific feature of Big Data) by producing very important information for HR departments.

5.1. Score computation methodology

The algorithm developed to calculate the employee's value within the organization is described in this section, detailing the steps implemented to compute the score for each employee, based on current evaluation data obtained from the HR department and the previous year's score.

Table 1. Intervals for the evaluation.
Interval — Description
[0.200, 0.447) — Does not meet expectations
[0.447, 1.000) — Meets expectations
[1.000, 2.236) — Sometimes exceeds expectations
[2.236, 5.000] — Exceeds expectations

The score is determined using Eq. (4) above. If the employee's position or level changed during the evaluation year, an improvement in performance will be harder to obtain, since in his or her previous status the employee had greater experience and was more familiar with the tasks than in the new position or level. Thus, our algorithm will not punish the employee in such circumstances even if he or she obtains a lower yearly evaluation value than in the year before the level or position change; rather, the overall score will be kept the same as the year before. A binary variable c_i, such that c_i = 1 if a position change occurred and c_i = 0 otherwise, will keep track of changes in professional category. The overall score for an employee should reflect his or her achievements during the evaluation year, but also recognize his or her efforts toward improving the evaluation with respect to the previous year. To this end, an adjusting factor β = β(η_i(t), η_i(t − 1)) will be introduced to account for and reward improvement.
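The recursive update of Eq. (4) can be sketched directly. The seniority factor α and the improvement factor β are passed in as plain numbers here; their actual forms (a seniority sigmoid and a lookup table over evaluation classes) are specified in the remainder of this section, and the numeric values in the example are illustrative only.

```python
def update_score(prev_score, yearly_eval, changed_role, alpha, beta):
    """Score update following Eq. (4): the score is frozen for the
    year in which a role change occurs (c_i = 1); otherwise the
    previous score grows by alpha * beta * eta_i(t)."""
    if changed_role:                 # c_i = 1
        return prev_score
    return prev_score + alpha * beta * yearly_eval

# Example with illustrative factor values (alpha = 1.0, beta = 1.2):
s_new = update_score(prev_score=10.0, yearly_eval=3.5,
                     changed_role=False, alpha=1.0, beta=1.2)
s_frozen = update_score(prev_score=10.0, yearly_eval=3.5,
                        changed_role=True, alpha=1.0, beta=1.2)
```

Note how the role-change branch implements the no-punishment rule: a lower evaluation in the year of a promotion leaves the accumulated score untouched rather than dragging it down.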
The adjustment factor is obtained from a lookup table indexed by performance levels. As explained above, an employee's yearly evaluation (ηi) is the weighted average of the KPIs embedded in the HCDA model and can take values between 1/5 and 5. This range is partitioned into four intervals, such that the ratio between the initial values of consecutive intervals is constant, and therefore equal to √5 ≈ 2.236 (see Table 1). The adjustment factor β is then determined according to Table 2. For example, if the previous evaluation ηi(t − 1) belonged to the interval corresponding to ''sometimes exceeds expectations'' and the current evaluation ηi(t) falls into the ''meets expectations'' class, β takes a value corresponding to −50%. Finally, an additional factor associated with seniority is introduced, so that young employees are motivated to grow and, at the same time, experienced employees are encouraged to break new boundaries. The seniority weight α(si) was chosen to make the reward proportional to the seniority si (measured in years) of employee i (Fig. 7). The relation between seniority and α is given by a sigmoid function, hitting a plateau around a seniority level of 10 years. α(si) is computed as follows:

α(si) = (3/2) / (1 + 2^(4−si)) (8)

The constant values in (8) are carefully chosen so that the following conditions are met simultaneously:

• For the minimum seniority of one year, α(1) = 1/6 (the minimum weight).
• The 100% contribution is reached at a seniority level of around 4–5 years, the point at which an employee is considered senior in his or her position.
• The maximum α value of 150% is reached for a seniority level of around 10 years and above.

Fig. 7. Seniority-weight evolution over time.

5.1.1.
Handling the lack of data

Since the developed application is still a prototype, we had access to a reduced dataset, covering only the 2015 and 2016 evaluation years. This introduced imbalances between younger and older employees, which had to be handled in one way or another. At this point, we decided to compute the score for missing years using statistical data:

• For each missing data-point, an evaluation score of 2.5 (the average value) is assumed.
• This evaluation score is weighted with the seniority adjustment factor.

Table 3 illustrates the score estimation for the 2010–2016 period, applying the methodology presented above. This method uses only seniority to fill in missing data-points. A better approach would be to combine seniority with the gradient of level change over time to obtain a better approximation of the missing evaluation values.

5.1.2. Software analytics platform

The Software Analytics Platform was designed to handle the main challenges of human resource departments: the quantification of human resource performance, the analysis of the distribution of performance, and the early identification of employees intending to leave the workforce. To achieve these goals, a flexible architecture is needed that can automatically gather raw data within the organization and process it using data-processing algorithms. The main architectural decisions are shown in Fig. 8 and listed below:

• Use of a NoSQL database, specifically ArangoDB, to handle Big Data challenges. Conventional database-management tools are inadequate for the huge sets of data produced. In our work we use a NoSQL database system, which allows simpler scalability and improved performance in maintaining large volumes of unstructured data [17]. NoSQL helps deal with the volume, variety, and velocity requirements of big data. To handle the large amount of data stored in various forms and types, all data is collected and stored in this centralized database.
Considering that the data generated by employees reach enormous dimensions and, most importantly, are varied, relational databases are no longer a suitable solution.

• The algorithm module is written in Python, using the scikit-learn library.3 The library offers a framework for classification, clustering, regression, and dimensionality-reduction algorithms.

3 https://scikit-learn.org/stable/.

Table 2
Lookup table for choosing the performance variation factor β.

Fig. 8. Software Analytics Platform — static architecture. Source: Adapted from [16].

Table 3
Score estimation for a longer period with partially known data.

Year   Available data   Seniority   α      η     Δη      β     Score
2010   –                0           –      0     –       –     0
2011   –                1           15%    2.5   –       –     0.38
2012   –                2           56%    2.5   –       –     1.78
2013   –                3           79%    2.5   –       –     3.75
2014   –                4           96%    2.5   –       –     6.15
2015   yes              5           109%   2.8   +0.30   50%   7.68
2016   yes              6           120%   3.2   +0.40   50%   9.6

• The visualization of data is implemented using Google Charts.
• Java SE is used for interactions with the various databases that need to be interrogated; connectors are implemented to import/export the various data into the NoSQL DB.
• For the presentation and user interface, and for the input of application parameters and settings, AngularJS was chosen.

Considering that data inside an organization is found in unstructured forms, the software platform provides functionality to gather data and store it within the centralized ArangoDB database. This functionality is implemented through custom data connectors. The connectors need to be adapted for each dataset and can be triggered manually or cyclically at set intervals.

• XLS/XLSX connectors: A large amount of data inside the company is available as spreadsheets (.xls or .xlsx files).
To make use of this information, Python scripts were implemented which collect data from the files (performing a first processing step if necessary) and save them into the platform database, so they can be accessed more easily and quickly. Python already possesses many libraries that facilitate working with spreadsheets.

• PDF connectors: Another important part of the data inside the company is found in PDF documents. We propose using the iText library to build these connectors, retrieve the data contained in a PDF document, and save it in the ArangoDB database. The documents are scanned based on predefined identifiers, since the form of the templates used in the organization is known; if several templates have been used over the years to store data, these are tested and adapted as needed.
• Databases: specific connectors were developed for each database found inside the organization, such as SQL, Lotus, and MySQL.
• Specific task-management tools: API tools and Python scripts were used to gather data and store it in the Arango database.

Fig. 9. Data Analytics Platform — Edit Views example.

The following insights, discussed further in Section 6, refer to the distribution of employees' 2016 evaluations (Fig. 10):

• The trendline appears to be left-skewed, highlighting a total organization HC value above the average (a value of 2.0 suggests the employee is meeting his or her requirements, and most employees score above this value).
• A significant number of employees have a below-average value; these can be identified as new employees or employees unfit for their current position.
• The number of employees with very poor or very good results decreases as we approach the extremes.

The architecture of the Software Analytics Platform is structured into two layers, the Core layer and the Application layer.
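As an illustration of the data-connector idea described above, the sketch below parses a spreadsheet-like export and upserts the rows into a document store. To stay dependency-free it uses a CSV stream as a stand-in for an .xls/.xlsx file and a plain dict as a stand-in for an ArangoDB collection; the column names are hypothetical.

```python
import csv
import io

def spreadsheet_connector(stream, collection):
    """Sketch of a data connector: read tabular employee data from a
    spreadsheet-like export (header row + data rows) and merge it into a
    document store keyed by employee id. Schema-less merging means any new
    column simply becomes a new field on the employee document."""
    for row in csv.DictReader(stream):
        key = row.pop("employee_id")
        doc = collection.setdefault(key, {})
        doc.update(row)
    return collection

# Usage with an in-memory export:
export = io.StringIO("employee_id,year,evaluation\nE1,2016,3.2\nE2,2016,2.4\n")
db = spreadsheet_connector(export, {})
```

A real connector would read the workbook with a spreadsheet library and write through the ArangoDB driver instead, but the adapt-per-dataset pattern (one parsing function per template, merging into the central collection) stays the same.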
The whole backend is implemented in the Core layer, which contains the functionality common to the entire platform, such as the connection layer (the main implementation of the connectors; personalization is done in the application for each data template in particular) and the storage and management of databases. The interfaces with the tools already used inside the organization are also implemented in the Core layer, so that data can be loaded for processing automatically. The proposed HCDA model extracts insights from data generated by employees, quantifying the performance of an organization's employees and identifying internal and external factors that may cause them to leave the workplace; all of this needs to be integrated with the databases, tools, and instruments already existing inside the organization. The present software platform is extremely useful for HR managers because, by processing and interpreting the extracted data, they can obtain an accurate understanding of employees' performance indicators, attrition risk, and patterns of leaving the workforce. A screenshot from the developed software platform, used to create a query of employees' scores, is presented in Fig. 9.

6. Experimental results

This section contains experimental results obtained after applying the scoring algorithm to data on employees of a real company. The charts are generated using the software platform developed by the authors, with Google Charts. A distortion factor has been applied to the presented results, in such a way that the extraction of insights is demonstrated but the content is anonymized. The following two chart features have been used: histograms and (polynomial) trendlines. The chart in Fig. 10 shows the distribution of all employees' evaluations corresponding to 2016 data. The trendline corresponding to this histogram is compared with a normal distribution with a mean value of 2.0, which is considered the reference. A few insights can be revealed (see the bulleted observations above). Fig.
11 shows overall scores since the inception of the company. The trendline is now severely right-skewed, which can indicate a young, growing organization; this conclusion is backed by the data in Fig. 10, which shows a large number of inexperienced employees who scored low. Helpfully, a large number of experienced employees is available as well, who can provide knowledge and guidance to the new hires. Our application is flexible and allows statistics on the distribution of all employees' performances over different time slots and different departments. The range is an important insight; it can be compared from one year to another and from one department, group, or team to another. The chart in Fig. 11 shows a histogram of the total scores of all employees (current or former) for a company with more than 1200 employees in the studied field, divided into several IT departments. The shape described by the associated trendline can be explained by the fact that most employees leaving the company, such as those who make the biggest contribution to the graph in Fig. 11, take this step in the first 4–5 years after entering the company. Thus, correlating with the data in Fig. 10, we can say that this category of employees explains the existence of a large number of total scores in the interval [10, 16]. The next figure comparatively illustrates employees' performance in different departments. The insights are plotted at department level to reveal the distribution in each component. The analysis can be extended to group or team level to see the peaks and valleys of performance (according to the HCDA model presented in Fig. 4). Analyzing the charts in Fig. 12, looking at both axes, we observe that the best performance is obtained in Department 1 (i.e., it has the most employees with high scores and is the only department that reaches scores above 40). This result is confirmed if we compare performance rates (i.e., the total score
Fig. 10. Organization — employees' performance for one year (2016).

Fig. 11. Organization — employee performance (histogram) since inception (14 years, including estimations of missing data points).

Fig. 12. Employees' performance scores: a comparison between departments.

divided by the number of employees in a certain department) according to Table 4. The chart in Fig. 13 shows the progress of human capital over the years through the total computed score of each department. The blue bar represents the default score calculated for employees for whom data is missing (keeping, as previously stated, an average evaluation for a missing year); the red and yellow bars represent the real data obtained for the years 2015 and 2016, respectively. Another metric that can be obtained from these data is the rate of performance, presented in Table 4, which is computed as the ratio between the total performance and the number of employees in a given organizational structure.

Fig. 13. The organization score — progress over the years.

Fig. 14. Scores of all individuals who were or are currently employed by the company (blue — current employees; red — employees who left the company).

Fig. 14 represents one of the most important results of our analysis. It shows, on the same chart and with different colors, the distribution of employees who have left the organization and of employees who are still inside the organization, with score versus seniority plotted on a scatter chart. It can be observed that several groups are formed (marked in red). The employees near these groups present a large risk of leaving the company. Such information will help HR managers to consider with great care those employees that are susceptible to leaving
the company and to analyze in more depth their possible reasons for dissatisfaction.

Table 4
Rate of performance per department.

7. Conclusions and further work

The theoretical contributions of this research are the creation of a model for applying analytics to human resources in a knowledge-based organization, as well as the creation of a scoring algorithm that can be applied to already existing data – filtered, structured, and modeled according to HCDA requirements – to calculate an employee's value within the organization. The algorithmic approach considers both the seniority and the professional evolution trends of employees, handling both increases and decreases in performance. Moreover, it can compensate for changes in level or position. The algorithm takes into account both the seniority of an employee and his or her performance, in order to foster young employees' motivation to grow and, at the same time, to encourage experienced employees to break new boundaries. The employee scoring insights revealed by the algorithm bring valuable information to the HR department and to the management of the organization when all data are aggregated and the knowledge is presented at team, group, department, or even organization level. As an example, it can be learned that the team or organization is growing overall, but soft skills and motivation are declining while technical skills remain strong. The trend of organization performance will be used as support for strategic decisions, and the trend of teams will be used by line management in day-to-day work decisions. Being a prototype application, with some data unavailable, ways have been found to handle missing data-points. Fortunately, from now on, the database can be updated periodically with new data, which will increase the accuracy and performance of our model and its outcomes.
Applying the algorithm to data collected from employees who left the workforce and, at the same time, to data on current employees, we identified those persons who present a higher risk of leaving the company. This information is a key indicator enabling human resource managers to take corrective measures that increase the employees' retention rate. Using the individual score of each employee, our application determines and comparatively illustrates the performance of company departments, showing possible fluctuations, such as high increases or decreases in performance from one year to another. It also identifies whether an employee stays too long at the same performance level or in the same position, making a job rotation advisable. As further work, we intend to test variable weights for seniority and variance in performance. We will also evaluate how prediction algorithms – such as partition trees, gradient boosting, and deep learning networks – perform on the dataset of employees' scores. The aim will be to identify common patterns in the evolution of scores and, in particular, to accurately anticipate which employees are an attrition risk and are more likely to leave the organization.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work was partially supported by a UEFISCDI Bridge Grant, PNCDI III financing contract no. 44 BG/2016.

References

[1] W.Y.M. Momin, K. Mishra, HR analytics as a strategic workforce planning, Int. J. Appl. Res. 1 (4) (2015) 258–260.
[2] K. Wang, J. Taylor, Can you provide the current trends in HR on people analytics? 2017. Retrieved from Cornell University, ILR School website: http://digitalcommons.ilr.cornell.edu/student/151.
[3] S.N.
Mishra, D.R. Lama, Y. Pal, Human resource predictive analytics (HRPA) for HR management in organizations, Int. J. Sci. Technol. Res. 5 (5) (2016) 33–35.
[4] K.G. King, Data analytics in human resources: A case study and critical review, Human Resour. Dev. Rev. 15 (4) (2016) 487–495.
[5] J. Lismont, J. Vanthienen, B. Baesens, W. Lemahieu, Defining analytics maturity indicators: A survey approach, Int. J. Inf. Manage. 37 (3) (2017) 114–124.
[6] H. Iwamoto, M. Takahashi, A quantitative approach to human capital management, Proc.-Soc. Behav. Sci. 172 (2015) 112–119.
[7] L. Abdullah, S. Jaafar, I. Taib, Ranking of human capital indicators using analytic hierarchy process, Proc.-Soc. Behav. Sci. 107 (2013) 22–28.
[8] K.A. Anderson, Skill networks and measures of complex human capital, Proc. Natl. Acad. Sci. 114 (48) (2017).
[9] C.F. Chen, L.F. Chen, Data mining to improve personnel selection and enhance human capital: A case study in high-technology industry, Expert Syst. Appl. 34 (1) (2008) 280–290.
[10] A. Charlwood, M. Stuart, C. Trusson, Human capital metrics and analytics: Assessing the evidence of the value and impact of people data, Technical Report, University of Leeds & Loughborough University, 2017. https://www.cipd.co.uk/Images/human-capital-metrics-and-analytics-assessing-the-evidence_tcm18-22291.pdf, retrieved 11 May 2018.
[11] X.F. Li, P. Zhang, A research on value of individual human capital of high-tech enterprises based on the BP neural network algorithm, in: Proceedings of the 19th International Conference on Industrial Engineering and Engineering Management, Springer, Berlin, Germany, 2013.
[12] D. Parmenter, Key Performance Indicators: Developing, Implementing, and Using Winning KPIs, John Wiley & Sons, 2015.
[13] B. Walsh, E. Volini, Rewriting the Rules for the Digital Age, Deloitte Global Human Capital Trends, Deloitte University Press, 2017.
[14] A.L.
Webster, Formalization of an organizational structure, bizfluent.com, 2019. https://bizfluent.com/info-8235460-formalization-organizational-structure.html, retrieved 25 January 2019.
[15] A. Florea, C.V. Kifor, S.S. Nicolaescu, N. Cocan, I. Receu, Intellectual capital evaluation and exploitation model based on big data technologies, in: Economic and Social Development (Book of Proceedings), 24th International Scientific Conference on Economic and Social Development, Vol. 1, No. 1, 2017, pp. 21–30.
[16] S.S. Nicolaescu, H.C. Palade, C.V. Kifor, A. Florea, Collaborative platform for transferring knowledge from university to industry - A bridge grant case study, in: Proceedings of the 4th IETEC Conference, Hanoi, Vietnam, 2017, pp. 475–488.
[17] R. Zicari, The Forrester Wave: Big Data NoSQL, Q1 2019. Report: Redis Labs recognized as a Big Data NoSQL database leader, March 19, 2019. Available online: https://lp.redislabs.com/rs/915-NFD-128/images/BM-2019_Q1_Big%20Data%20NoSQL_Forrester.pdf.

Sergiu Stefan Nicolaescu is a Ph.D. student in Engineering and Management at Lucian Blaga University of Sibiu and a Group Leader in the ADAS (Advanced Driver Assistance Systems) department of an R&D automotive company. He received his MSE in Embedded Systems (2011) and his MSE in Industrial Business Management (2013) at Lucian Blaga University of Sibiu. He has in-depth knowledge of project management within the automotive industry; software development; experience with quality standards and processes used in the automotive industry; knowledge management and organizational leadership; and technical recruiting. He has published over 7 scientific papers with applicability to the automotive industry in prestigious journals (ISI Web of Science) and top international conferences, and another 2 on a collaborative platform for transferring knowledge from university to industry. Adrian Florea obtained his MSE (1998) and his Ph.D. in Computer Science from the 'Politehnica' University of Bucharest, Romania (2005).
At present he is Professor in Computer Science and Engineering at the 'Lucian Blaga' University of Sibiu, Romania. Adrian is an active researcher in the fields of high-performance processor design and simulation and dynamic branch and value prediction. He has worked for around 20 years in interdisciplinary national and international research projects dealing with issues such as optimization problems in different engineering domains (microprocessor systems, suspension design and reliability, energy efficiency in buildings), embedded systems with applicability in Smart City contexts (smart parking, urban mobility, smart traffic), the creation of digital tools for supporting communities of practice, and mobile computing. He has published over 7 (didactic and scientific) books and 78 scientific papers in prestigious journals (ISI Web of Science) and top international conferences in Romania, the USA, the UK, Italy, Germany, China, Slovenia, Korea, Latvia, Spain, India, Poland, etc. He received the 'Tudor Tanasescu' Romanian Academy Award in 2005 for the book 'Microarchitectures simulation and optimization' (in Romanian) and the 'Ad Augusta Per Angusta' Award for young researchers from the 'Lucian Blaga' University of Sibiu in June 2007, for special results obtained in scientific research. Since 2012 he has been a HiPEAC affiliate member and since 2013 an ACM Professional member. His web page can be found at http://webspace.ulbsibiu.ro/adrian.florea/html/. Claudiu Vasile Kifor obtained his MSE (1995) and his Ph.D. in Industrial Engineering from the Lucian Blaga University of Sibiu, Romania (2005). At present he is Professor and Ph.D. supervisor in Engineering and Management at Lucian Blaga University of Sibiu, Romania. Claudiu is an active researcher in the fields of quality assurance, management, and problem-based learning in engineering and science. He has published over 13 books and over 122 scientific papers in prestigious journals (ISI Web of Science) and top international conferences.
Since 2007 he has supervised 10 national and international engineers who received their Ph.D. titles. He has managed more than 19 national and international research grants. He serves as Associate Editor for two journals and is a member of the editorial board and steering committee of 21 other international conferences. Ugo Fiore is an assistant professor at Parthenope University. His research interests include nonlinear analysis, deep learning, optimization, energy-aware systems, covert communications, and security. He has authored or co-authored more than 40 papers in international peer-reviewed journals. He serves as Associate Editor for two journals and is a member of the editorial board of three other journals. He has delivered a keynote speech at an international conference, participated in the organizing committees of numerous conferences, and served as a member of Ph.D. examination panels at foreign universities. Nicolae Cocan received his B.Sc. in Electronics and Communications Engineering (2013) at the ''Politehnica'' University of Timisoara. He is an Embedded Development Engineer at a private company in Sibiu, where his role is to develop the next generation of photoelectronic sensors. He has competence in PCB design, programming for the Internet of Things, and Big Data. He was technically responsible for one national research project and has published 3 papers in ISI journals or top international conferences. Ilie Receu received his B.Sc. in Computer Science Engineering (2009) and is a Master's student in Embedded Systems, both at Lucian Blaga University of Sibiu. He is a software developer at a private company in Sibiu, in the department of development and management solutions.
He has in-depth knowledge of software development (cloud-based document management solutions based on non-relational databases), cryptography, digital signatures, and the integration of various technologies. He was technically responsible for three national projects and has published 1 paper in ISI journals. Paolo Zanetti is an Aggregate Professor at the Department of Management Studies and Quantitative Methods, Parthenope University. He has undertaken senior administrative responsibilities and has been a member of the Board of Directors of Parthenope University, where he currently sits in the Academic Senate. His research interests are mainly related to the field of applied mathematics and high-performance scientific computing, focusing on methods, mathematical tools, and techniques for solving scientific problems of practical interest, in particular large-scale computational finance problems and deep learning applied to finance.