Available online at www.ijpe-online.com

vol. 18, no. 5, May 2022, pp. 369-379
DOI: 10.23940/ijpe.22.05.p7.369379

Enhancing Information Extraction Process in Job Recommendation using Semantic Technology

Assia Brek* and Zizette Boufaida

Lire Laboratory, University of Abdelhamid MEHRI Constantine II, Constantine, 25016, Algeria

* Corresponding author. E-mail address: assia.brek@univ-constantine2.dz

Abstract

Recently, the internet has become the primary recruitment market, which has increased the number of job offers and resumes published online. Recommendation systems have been proposed to help users filter this massive amount of information and select the best candidate or the most relevant offer. Correctly processing document content not only reduces matching complexity but also improves recommender performance. This paper presents a semantic-based information extraction process that intelligently and automatically extracts domain entities. The extracted entities are interlinked to build domain context using a domain ontology covering the most significant and common parts of job offers/resumes. Moreover, the extracted information is structured in RDF triples, delivering a semantic and unified presentation of document data. The ontology is dynamically enriched with both domain instances and relations to keep up with the constant change of the relevant data. We evaluate our system through various experiments on data from real-world recruitment documents. Our test results show that the approach achieves a precision of more than 90% in extracting domain-specific information.

Keywords: information extraction; semantic technology; ontology; enrichment; semantic similarity

(Submitted on February 25, 2022; Revised on March 30, 2022; Accepted on April 18, 2022)

© 2022 Totem Publisher, Inc. All rights reserved.

1. Introduction

Resumes and job descriptions carry information unique to each job position and candidate, providing a valuable base for selecting the best candidate or choosing the right job. Thus, accurately processing job description/resume content is of crucial importance. Processing documents aims to extract domain information intelligently and structure it formally, making it easy to access and utilize. Current information extraction systems like OpenCalais (www.opencalais.com) and Alchemy API (www.alchemyapi.com) are limited to extracting named entities with NLP techniques that fall short on complex sentences and, most importantly, cannot extract contextual entities. The information in offers/resumes is presented in complex sentences composed of related domain entities that require specific domain rules to extract. For example, in a job offer, a requirement may combine skills with years or levels of expertise, such as "5+ years experience in Spring framework (Spring Boot, Spring MVC, Spring Batch, JPA)". Likewise, in a resume, an education entity is presented with the diploma title, education degree, graduation year, and university name. Moreover, some entities appear under different labels, such as B.Sc, BS, and Bachelors, which poses another challenge in extracting domain entities. Extracting relations is also a critical phase, where contextual entities can be inferred from the semantic links between entities. For example, consider a job that requires "3 years of experience in Android" and a resume that delivers the "Android" skill and "4 years of work experience as a developer of mobile applications".
Linking the "Android" with job experience "developer of mobile applications" can deliver the required entity "expert in Android skill". As there are no defined templates for resumes or jobs contents, unifying the presentation of the extracted information is essential. For example, instead of delivering "B.S. in Computer Science or equivalent" as a requirement, we can present the "B.S." with the concept "degree" and "Computer Science" as "diploma", the same thing with the education information in a resume, which facilitates matching the entities of the same concept. * Corresponding author. E-mail address: assia.brek@univ-constantine2.dz 370 Assia Brek and Zizette Boufaida Semantic technology and resources such as ontologies and knowledge graphs are proposed as a solution to extract and structure domain entities by presenting semantic links between them to infer contextual entities [1,2]. Therefore, we proposed a semantic-based information extraction approach to intelligently extract domain entities, semantically interlink those entities, and provide the extracted information in a unified format. First, the offers/resumes content is divided into blocks based on keywords dictionary. Next, we developed extraction rules (JAPE) based on domain entity dictionaries and sentence syntax to extract entities from each block. A domain ontology (JobOnto) is proposed to link the extracted entities with semantic relations. Moreover, our ontology is enriched in sync with the extraction process to keep up with data changes. As a final phase, the retrieved information was given as RDF triples, granting a semantic and unified structure that is easy to access and exploit. The proposed approach processes the domain documents intelligently and automatically without any human intervention, improving the user experience and gaining time for both employers and candidates. Moreover, it has been evaluated in real domain data and gets a more than 90% precision value. This paper is organized as follows: Section 2 outlines the related work in information extraction and e-recruitment, Section 3 delivers a detailed presentation of the proposed semantic-based process, Section 4 presents the evaluation, and finally Section 5 concludes the paper. 2. Related work Recommender systems aim to suggest products that interest users by studying their behaviors (their product research) and their profiles [3]. As a recommender system, the job recommender system can retrieve a list of job positions that satisfy a job seeker’s desires or a list of talented candidates that meet the requirements of a recruiter by using recommendation approaches. The main- stream approaches to recommender systems are classified into four categories: Knowledge-based (KB), ContentBased Filtering (CBF), Collaborative Filtering(CF), and Hybrid approaches. Based on several comparison studies in the erecruitment field [4-6], the content-based approach is more suitable for this context, according to the necessity of examining the content of domain documents to grant the best recommendation. Those documents contain unstructured information that is difficult to automatically access or process correctly. Several information extraction methods have been proposed to address this issue. Information extraction is an automatic technique for identifying entities from unstructured and/or semi-structured documents, intending to structure these documents and make them machine-readable. 
Based on previous research, information extraction methods can be divided into three categories.

Rule and pattern-based methods exploit predefined rules or patterns to extract entities from documents. The rules/patterns are usually defined based on document structures or domain dictionaries. In [7,8], the authors defined entity extraction rules using sentence syntax, writing style, punctuation indices, and word lexica to extract entities from resumes. Another approach, proposed in [9], is the SAJ system, which uses JAPE rules based on a domain-specific dictionary to extract entities from job offers. These methods are limited by the text structure and the domain dictionary: they are hard to adopt since it is impossible to know how many documents follow the same template or whether the dictionary covers all domain knowledge.

Machine learning-based methods extract information from unstructured texts by applying models such as Hidden Markov Models (HMM) [10] and Conditional Random Fields (CRF) [11]. PROSPECT [12] was presented as a system to rank candidates' resumes for a job. The system employs a conditional random field based on three feature types (lexicon, visual, and named entity) to segment and label the resume contents. In [13], the authors proposed a cascaded information extraction framework to acquire detailed information from resumes. In the first step, an HMM is applied to divide the resume into successive blocks; next, an SVM model is used to gather detailed information in each block, such as name, address, and education. Machine learning-based techniques require large training data sets, the absence of which may cause information loss or the extraction of noisy entities.

Semantic methods have recently been applied to guide the information extraction process [14]. They mainly exploit semantic references such as knowledge graphs and domain ontologies to extract entities or relations from unstructured text. Domain ontologies have been used in different ways to extract and identify entities. In [15-17], a domain ontology is used as a semantic dictionary to extract information from resumes: the document content is first segmented into sentences, then the ontology concepts are used to detect domain entities. In the same way, the authors in [18,19] proposed a semantic annotation approach using a domain ontology to extract information from resumes and job offers. Using an ontology to extract entities has the advantage of exploiting semantic knowledge, which gives accurate results, but it remains constrained by the ontology knowledge-incompleteness problem. To address this problem, the authors in [20] utilized WordNet and the YAGO3 knowledge graph (KG) for two purposes: first, to capture terms from resumes and offers after segmenting them into sentences, and second, to extract semantic and taxonomic relations (synonymy and hypernymy) to create a network that reflects the resume or offer. The usage of YAGO3 and WordNet proved insufficient, as certain concepts are absent; besides, the extracted relations are not always meaningful.

Beyond extraction, presenting the extracted information in a structured format that is easy to access and use influences the matching process and enhances the recommendation results.
Several works ignore this crucial phase, treating the extracted information as a bag of entities in which the matching process must identify the required entity. In contrast, others use different formats, such as XML, RDF, or JSON, to formally structure the entities extracted from resumes [9,18,19]. In [8], the authors used a vectorial representation to drive classification with support vector machines (SVM). Furthermore, in [20], the authors exploit WordNet to present the extracted entities in a semantic network using synonymy and hypernymy relations. The works mentioned above present the extracted entities in different ways but ignore the semantic relations between entities and the importance of presenting resumes and offers in a unified format.

Our information extraction approach takes the limits of the previous methods into consideration. We propose a semantic-based approach to extract entities and relations from resumes and offers. Our approach employs JAPE rules based on domain-specific dictionaries and sentence syntax to extract entities. We then developed a domain ontology covering the most significant and common parts of job offers/resumes (skills, diplomas, interests, assets, and work experiences); this ontology is used to link the extracted entities semantically and yield contextual entities. The extracted entities are presented in RDF triples, delivering a unified format for resumes and offers. Furthermore, during the IE process, the ontology is enriched with domain instances and relations to keep up with the constant change of the relevant data.

3. Proposed Approach

E-recruitment documents are primarily published as unstructured text, which is difficult for a machine to process or exploit. The information extraction process intends to extract entities such as skills, diplomas, experience, job title, and others essential for job recommendation. We propose a process that applies semantic technology to extract information and formally structure it, presenting its semantics. Figure 1 presents the different phases of our information extraction process.

Figure 1. Information Extraction Process

3.1. Overview

The proposed process consists of three main phases. Preprocessing extracts text from files of different types and structures it in blocks based on a keywords dictionary. Information extraction is the main phase; it extracts entities and relations from each block of data based on the defined JAPE rules and JobOnto. Information presentation delivers the extracted information semantically in RDF triples.

3.2. Preprocessing Phase

This phase handles text extraction and segmentation. The input is a given set of job offers or resumes in different file formats, such as doc, docx, and pdf. These files are processed by Apache Tika (https://tika.apache.org) to extract the raw text, while table layouts, font types, and font colors are removed. The next stage arranges the extracted text in blocks based on the contents of each section, such as "skills", "education", "job experience", and "requirement". Each section usually starts with a title that describes its contents. Therefore, we start by identifying those titles and adding the "keyVal" label based on the keywords dictionary. This label indicates the beginning of a block, which ends where the next section label is identified. Some vital information does not begin with a label, such as the personal information presented at the start of a resume; we define it as the first block, which starts from the first line of the text and ends at the first label. Figure 2 summarizes this procedure.

Figure 2. Flowchart of the preprocessing phase
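To make the segmentation concrete, the following is a minimal sketch of the block-building logic described above. It is illustrative only: the class name, keyword entries, and block labels are our own stand-ins under the assumption of a simple line-oriented scan, not the paper's actual implementation.

import java.util.*;

// Illustrative sketch of the block-segmentation step (Section 3.2).
// Lines appearing before the first matched section title form the
// implicit first block (personal information).
public class BlockSegmenter {

    // Tiny stand-in for the keywords dictionary; the real dictionary is larger.
    private static final Map<String, String> SECTION_KEYWORDS = Map.of(
            "skills", "skills",
            "education", "education",
            "experience", "job experience",
            "requirements", "requirement");

    public static LinkedHashMap<String, List<String>> segment(List<String> lines) {
        LinkedHashMap<String, List<String>> blocks = new LinkedHashMap<>();
        String current = "personal information";       // implicit first block
        blocks.put(current, new ArrayList<>());
        for (String line : lines) {
            String title = matchTitle(line.trim().toLowerCase());
            if (title != null) {                        // a new section starts here
                current = title;
                blocks.computeIfAbsent(current, k -> new ArrayList<>());
            } else {
                blocks.get(current).add(line);
            }
        }
        return blocks;
    }

    // Returns the block label when the line starts with a known section title.
    private static String matchTitle(String line) {
        for (Map.Entry<String, String> e : SECTION_KEYWORDS.entrySet()) {
            if (line.startsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return null;
    }
}

A LinkedHashMap preserves the order in which the blocks appear in the document, which matters for locating the implicit first block.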
3.3. Information Extraction

3.3.1. Entities Extraction

As shown in Figure 3, this step extracts the entities from the blocks and structures them in an XML file. Resumes and job offers are published in different templates, which means some information sections may be labeled differently: "projects" can be expressed as "achievements", or "requirements" as "must have". To address this issue, we use an XML skeleton to manage each resume/offer content, emphasizing the functional parts. Unlike existing works, our extraction strategy extracts only the entities necessary for the matching module and ignores others such as name, university name, phone number, e-mail, graduation year, or company name. Therefore, we developed extraction rules using the JAPE grammar, which builds pattern/action rules from features such as POS tags, dictionaries of entities presented in gazetteer lists, and punctuation. Based on the tag names of the XML skeleton, a block is selected; a block-specific JAPE rule is then applied to its content to extract entities and write the XML element values (tag values and attribute values).

Figure 3. Flowchart of the entity extraction step

3.3.2. Semantic Labeling

Another challenge in extracting entities is that a single word may express several concepts and vice versa. For example, a bachelor degree can be stated in several ways: baccalaureate, bachelor, B.S., B.A., BA/BS, etc. We therefore employ a gazetteer list (Figure 4), which groups terms semantically similar to "bachelor" and assigns them the major type property "degree" and the minor type property "bachelor". An annotation of type "lookup" is raised for each education degree in the text, with the major type property set to "degree"; the minor type property is then extracted. For example, given the sentence "B.S. in computer science" from a resume/offer, applying the JAPE rule selects the "B.S." token as a lookup annotation with major type "degree"; we then extract the minor type of this annotation, which is "bachelor". Figure 5 illustrates the JAPE rule for the bachelor degree example.

Figure 4. Gazetteer list
Figure 5. JAPE rule (example of bachelor degree)

The result of this phase is an XML file containing the extracted entities, structured according to the XML skeleton. Figure 6 presents an example of a resume XML file.

Figure 6. XML file (resume example)
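The gazetteer lookup behind the semantic labeling step can be sketched as follows. In the actual process this mapping lives in GATE gazetteer lists consumed by JAPE rules; the class name, record type, and entries below are illustrative assumptions, not the published gazetteer.

import java.util.Map;

// Minimal sketch of the semantic labeling lookup (Section 3.3.2):
// surface forms of a degree are mapped to (majorType, minorType).
public class DegreeGazetteer {

    record Label(String majorType, String minorType) {}

    private static final Map<String, Label> ENTRIES = Map.of(
            "b.s.",          new Label("degree", "bachelor"),
            "b.a.",          new Label("degree", "bachelor"),
            "ba/bs",         new Label("degree", "bachelor"),
            "baccalaureate", new Label("degree", "bachelor"),
            "m.s.",          new Label("degree", "master"));

    // Returns the semantic label for a token, or null when the token is not
    // a known degree mention (no "lookup" annotation is raised).
    public static Label lookup(String token) {
        return ENTRIES.get(token.toLowerCase());
    }

    public static void main(String[] args) {
        Label l = lookup("B.S.");   // e.g., from "B.S. in computer science"
        System.out.println(l.majorType() + " / " + l.minorType()); // degree / bachelor
    }
}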
3.3.3. Relations Extraction

In this step, we interlink the extracted entities based on the developed ontology (JobOnto) and the generated XML file. These links present both hierarchical and associative relations between entities, delivering information that may not be explicitly stated in the documents. For example, a link between a certification and a language proves the required language level: linking "TCF C2" and "French" yields the required information "fluent in French". This step semantically structures the data and implicitly enriches JobOnto, which in turn supports assessing the semantic similarity between resume and job offer entities. Figure 7 presents the flowchart of this step.

Figure 7. Flowchart of the relations extraction step

Unlike previous works, we developed one ontology that represents both resumes and offers. As shown in Figure 8, the elements of JobOnto (job ontology) are inspired by the most essential and common parts of resumes and job offers, such as personal information, diplomas, skills, languages, interests, and job experiences. Moreover, the relationships (both hierarchical and associative) present the semantic links between concepts, which later provide a map for matching resumes and offers.

Figure 8. JobOnto ontology

The concept "person" carries the attributes displayed in the personal information of a resume, including personal information required in job offers such as age, sex, driving license, or address (when an applicant must reside near the work location). The "job" concept represents the work experience in a resume or the position offered in a job description; an applicant who occupies a job post or works on a project gains experience in certain skills, which is expressed with the relation "has-experience". Having a certification confirms the candidate's proficiency in a skill or language, which is modeled by the concept "certification" associated with the concepts "skill" and "language" through the relation "proves". Besides skills, degrees, certifications, and job experience, considering "assets" and "interests" is crucial in resumes and offers. For example, a job offer may require the asset "team spirit"; this asset may not be explicitly expressed in a resume but can be deduced from an interest in football or from work experience in a team. The concept "document" represents the annotated document, defined with an "id", a "title", and a "type" that indicates whether the document is a resume or a job offer.

Extracting relations from the ontology is based on comparing the extracted entities with the ontology instances, using entity types such as skill, diploma, and job to identify the relationships. If an extracted entity is not among the ontology instances, JobOnto is enriched with the entity as a new instance. Relations between entities can also be deduced from the XML document structure, where the parent/child relation forms an associative relation between two entities. For example, in the XML file of the resume in Figure 6, the skill "routing" is a child tag of the job "network engineer"; thus, based on the JobOnto architecture, we can add a relation titled "has-experience" between "routing" and "network engineer".

3.3.4. Enrichment

Despite the benefits of employing an ontology to interlink entities and construct context, enriching that ontology is a significant challenge since the data constantly changes. Enriching JobOnto is an implicit phase of the information extraction procedure. To extract a relation between entities, we first compare the entities with the ontology instances, using the entity type to determine the ontology element that represents the entity and to select the relation. If the entity is not among the ontology instances, we add it as a new instance and, based on a similarity measure, establish a "related-to" relation between the new instance and others of the same type in the ontology.

The similarity between two instances combines two measures: a string similarity using the Levenshtein distance and a semantic similarity using the LCH measure [21] over the WordNet dictionary; see Equation (1):

Sim(N_i, O_i) = Lev(N_i, O_i) + Lch(N_i, O_i)    (1)

where N_i refers to the new instance and O_i to the ontology instance; the similarity score is the sum of the two measure values. The Levenshtein distance captures similarity between the two strings (N_i, O_i), handling entities that are close in writing, such as java, java fx, and java EE, or angular 8 and angular js. The distance is scaled by the length of the longer string, as in Equation (2):

Lev(N_i, O_i) = (longerLength - Levenshtein(N_i, O_i)) / longerLength    (2)

The resulting score ranges from 0 to 1, where 1 means the strings are identical and 0 means no similarity. To handle entities that are written differently but related by definition, like "network director" and "telecommunication manager", the instances are submitted to the LCH measure. The Leacock and Chodorow (LCH) method counts the number of edges between two words in WordNet's "is-a" hierarchy; this value is scaled by the maximum depth of the hierarchy, and the similarity is obtained by taking the negative log of the scaled value. The LCH method is invoked via the WS4J API. The formulation is given in Equation (3):

Lch(N_i, O_i) = max(-log(ShortestLen(N_i, O_i) / (2 * TaxonomyDepth)))    (3)

The LCH score is 0 for no similarity and greater than 0 when there is a similarity.
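The combined measure of Equations (1)-(3) can be sketched as follows. The Levenshtein part is implemented directly from Equation (2); the LCH part is left as a stub, since the paper computes it over WordNet through the WS4J API.

// Sketch of the enrichment similarity of Equations (1)-(3).
public class InstanceSimilarity {

    // Eq. (1): sum of the scaled Levenshtein and LCH similarities.
    public static double similarity(String newInst, String ontoInst) {
        return scaledLevenshtein(newInst, ontoInst) + lch(newInst, ontoInst);
    }

    // Eq. (2): edit distance scaled by the longer string, giving a score in [0, 1].
    static double scaledLevenshtein(String a, String b) {
        int longer = Math.max(a.length(), b.length());
        if (longer == 0) return 1.0;
        return (longer - levenshtein(a, b)) / (double) longer;
    }

    // Classic dynamic-programming Levenshtein distance.
    static int levenshtein(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        return d[a.length()][b.length()];
    }

    // Eq. (3): Leacock-Chodorow similarity, delegated to WS4J over WordNet in
    // the paper. Stubbed here to keep the sketch self-contained.
    static double lch(String a, String b) {
        return 0.0; // placeholder: plug in WS4J's LeacockChodorow calculator here
    }
}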
3.4. Information Presentation

The aim of structuring and delivering the extracted information in a unified format is to facilitate its access and use. Using this presentation when matching resumes/offers saves time and gives more accurate results, since only entities of the same type are compared instead of all entities. In addition, the extracted relations form a semantic map used to match the documents and recommend the best job or candidate. The presentation structures the extracted knowledge based on the relations between entities and their types and attributes, as well as the relations between the entities themselves. Therefore, we deliver the extracted knowledge as RDF triples. An RDF triple consists of three elements: subject, predicate, and object. Table 1 shows the possible cases of presentation. Both resume and job offer data are rendered as RDF triples according to this description, which grants a unified representation of the extracted information.

Table 1. RDF triples description

Subject     | Predicate       | Object          | Description
Instance    | rdf:type        | Concept         | Relation between the extracted entity and its type, using the JobOnto URI. See Figure 9.
Instance    | Attribute name  | Attribute value | Relation between the extracted entity and its attributes; the attribute name defines the relation between the entity and the attribute value. See Figure 10.
Instance 1  | Object property | Instance 2      | Relation between two extracted entities, defined by the object property between them. See Figure 11.

Figure 9. Example 1
Figure 10. Example 2
Figure 11. Example 3
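The three triple shapes of Table 1 can be produced with any RDF library; below is a sketch using Apache Jena. The paper does not name its RDF toolkit, and the namespace, instance names, and the "title" attribute are hypothetical; only "has-experience" comes from JobOnto as described.

import org.apache.jena.rdf.model.*;
import org.apache.jena.vocabulary.RDF;

// Sketch of emitting the three triple shapes of Table 1 with Apache Jena.
public class TripleWriter {

    static final String NS = "http://example.org/jobonto#"; // hypothetical JobOnto URI

    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel();

        Resource skillClass = model.createResource(NS + "Skill");
        Resource android = model.createResource(NS + "android");
        Resource job = model.createResource(NS + "mobile_developer");

        // Case 1: instance -> rdf:type -> concept
        android.addProperty(RDF.type, skillClass);
        // Case 2: instance -> attribute name -> attribute value
        job.addProperty(model.createProperty(NS, "title"), "mobile developer");
        // Case 3: instance 1 -> object property -> instance 2
        job.addProperty(model.createProperty(NS, "has-experience"), android);

        model.write(System.out, "TURTLE"); // serialize the unified representation
    }
}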
4. Evaluation

To assess the performance of our process, we evaluated it on real-world data. As there is no standard dataset available for job descriptions or resumes, we gathered a dataset of 1000 job descriptions downloaded from https://www.indeed.com/ and https://www.kaggle.com/. The job descriptions from Indeed contain more complex sentences and more detailed information, unlike the data from Kaggle, which contain brief information describing only the required skills. We also collected 100 resumes from https://www.linkedin.com/, covering candidates of different levels (senior, beginner) and various specialties. The collected datasets belong to the field of information technology and computer science. The collected resumes and job posts are unstructured documents in different formats such as .pdf and .doc. The experiments were conducted on a laptop running Windows 10 with an Intel Core i5-4300U vPro processor and 16 GB of RAM.

Evaluation metrics: To evaluate the information extraction results, we used precision, which reports how many of the extracted entities are correct, and recall, which reports how many of the relevant entities the system actually extracts. Thus, the two metrics can be seen as measures of correctness and completeness, respectively, and the F-measure is used as their weighted harmonic mean. We denote the relevant entities in a resume/offer as E = {e1, e2, ..., en} and the retrieved entities as Ê = {ê1, ê2, ..., ên}; see Equations (4)-(6):

recall = |E ∩ Ê| / |E|    (4)

precision = |E ∩ Ê| / |Ê|    (5)

F-measure = (2 * precision * recall) / (precision + recall)    (6)

Table 2. Results of information extraction evaluation

                       |       Job offers          |        Resumes
                       | Recall | Precision | F-1   | Recall | Precision | F-1
Skills                 | 0.880  | 0.911     | 0.895 | 0.948  | 0.961     | 0.954
Education              | 0.851  | 0.832     | 0.841 | 0.902  | 0.970     | 0.934
Work experience        | 0.792  | 0.810     | 0.800 | 0.843  | 0.830     | 0.836
Personal information   | 0.723  | 0.734     | 0.728 | 0.972  | 0.987     | 0.979
Assets/Interests       | 0.761  | 0.774     | 0.767 | 0.820  | 0.846     | 0.832

Table 2 shows the results of extracting the required/acquired entities from the documents. Exploiting entity dictionaries, punctuation, and sentence syntax in the extraction rules (JAPE rules) improves the extraction module and gives encouraging results. The table shows that the extraction results from resumes are better than those from job offers, due to how the information is written: resume content is presented in short, simple sentences, whereas job description sentences are more complicated, making it harder to extract all correct entities. The results reflect the efficiency of extracting detailed knowledge in each block (skills, education, work experience, personal information, assets, and interests).

Based on those results, we conducted a comparative study between our approach and other systems: one that exploits semantic methods, E-RecSys [19]; two based on machine learning, PROSPECT [12] and CHM [17]; and one based on extraction rules, SAJ [9]. We took the results directly from their papers and compared the precision on commonly treated knowledge parts (education, skills). Figure 12 compares the three systems that extract information from resumes; the graph shows that our process performs well against the other systems. In Figure 13, our process is compared with the SAJ system, which extracts entities from job offers. Although SAJ also uses JAPE rules, it achieved only 38% precision in extracting education entities, compared to 83% for our process. Unlike SAJ, which exploits only a domain entity dictionary in its JAPE rules, we utilize both syntactic and lexical rules to extract detailed information.

Figure 12. Comparative analysis among E-RecSys, PROSPECT, CHM, and our process
Figure 13. Comparative analysis between our process and SAJ

Besides measuring the precision of entity extraction, we evaluated how extracting relations and presenting the information improves knowledge access and information retrieval. We manually selected some requirements from job offers to form SPARQL queries searching for candidates with those requirements. We generated SPARQL queries for different requirements (education, personal information, and work experience) and applied them over 50 resumes. We then compared the resumes selected by the SPARQL queries with the manual selection results. Table 3 shows the obtained results and clearly demonstrates that our extraction and presentation process delivers a valuable, structured knowledge base that is easy to access and exploit.

Table 3. Results of resume retrieval

Required information   | Manual selection | Automatic selection
Education              | 30               | 30
Personal information   | 15               | 15
Work experience        | 12               | 12
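For illustration, the following sketch shows the kind of SPARQL selection used in this retrieval experiment, again assuming Apache Jena. The namespace and the skill instance "android" are hypothetical; "has-experience" is the JobOnto relation described in Section 3.3.3, but the actual queries are not published in the paper.

import org.apache.jena.query.*;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

// Illustrative SPARQL selection of candidates over the extracted RDF triples.
public class ResumeRetrieval {

    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel();
        model.read("resumes.rdf"); // hypothetical file of extracted triples

        // Select candidates with experience in a required skill.
        String qs = """
                PREFIX jo: <http://example.org/jobonto#>
                SELECT ?person WHERE {
                  ?person jo:has-experience jo:android .
                }""";

        try (QueryExecution qe =
                 QueryExecutionFactory.create(QueryFactory.create(qs), model)) {
            ResultSet results = qe.execSelect();
            while (results.hasNext()) {
                System.out.println(results.next().get("person"));
            }
        }
    }
}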
5. Conclusion

This research proposed a semantic-based approach to extract information from resumes/offers and present it in a unified format. First, the document content is structured in blocks based on the label of each section; next, lexical, syntactic, and semantic rules (JAPE rules) are applied to extract domain entities from each block. Furthermore, based on the domain ontology (JobOnto), we extract semantic relations to link the extracted entities and build context. Finally, the extracted information is presented in RDF triples to grant a unified format for both resume and job offer contents. JobOnto is dynamically enriched during the process. The evaluation was performed on a dataset of 1000 jobs and 100 resumes; the initial assessment compared verified data against the extracted entities. Currently, we are working on a matching process that exploits the RDF file of the extracted information and the proposed ontology. We are also working on generalizing our process and exploring datasets from different fields.

References

1. Bizer, C., Heese, R., Mochol, M., Oldakowski, R., Tolksdorf, R., and Eckstein, R. The Impact of Semantic Web Technologies on Job Recruitment Processes. In Wirtschaftsinformatik, Physica, Heidelberg, pp. 1367-1381, 2005.
2. Mochol, M., Wache, H., and Nixon, L. Improving the Accuracy of Job Search with Semantic Techniques. In International Conference on Business Information Systems, Springer, Berlin, Heidelberg, pp. 301-313, 2007.
3. Ricci, F., Rokach, L., and Shapira, B. Recommender Systems: Introduction and Challenges. In Recommender Systems Handbook, Springer, Boston, MA, pp. 1-34, 2015.
4. Al-Otaibi, S. T., and Ykhlef, M. A Survey of Job Recommender Systems. International Journal of Physical Sciences, vol. 7, no. 29, pp. 5127-5142, 2012.
5. Tran, M. L., Nguyen, A. T., Nguyen, Q. D., and Huynh, T. A Comparison Study for Job Recommendation. In 2017 International Conference on Information and Communications (ICIC), IEEE, pp. 199-204, 2017.
6. Dhameliya, J., and Desai, N. Job Recommender Systems: A Survey. In 2019 Innovations in Power and Advanced Computing Technologies (i-PACT), IEEE, vol. 1, pp. 1-5, 2019.
7. Chen, J., Zhang, C., and Niu, Z. A Two-Step Resume Information Extraction Algorithm. Mathematical Problems in Engineering, 2018.
8. Kessler, R., Torres-Moreno, J. M., and El-Bèze, M. E-Gen: Automatic Job Offer Processing System for Human Resources. In Mexican International Conference on Artificial Intelligence, Springer, Berlin, Heidelberg, pp. 985-995, 2007.
9. Ahmed Awan, M. N., Khan, S., Latif, K., and Khattak, A. M. A New Approach to Information Extraction in User-Centric E-Recruitment Systems. Applied Sciences, vol. 9, no. 14, 2019.
10. Zhang, N. R. Hidden Markov Models for Information Extraction. Technical Report, Stanford Natural Language Processing Group, 2001.
11. Lafferty, J., McCallum, A., and Pereira, F. C. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, 2001.
12. Singh, A., Rose, C., Visweswariah, K., Chenthamarakshan, V., and Kambhatla, N. PROSPECT: A System for Screening Candidates for Recruitment. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management, pp. 659-668, 2010.
13. Yu, K., Guan, G., and Zhou, M. Resume Information Extraction with Cascaded Hybrid Model. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pp. 499-506, 2005.
14. Martinez-Rodriguez, J. L., Hogan, A., and Lopez-Arevalo, I. Information Extraction Meets the Semantic Web: A Survey. Semantic Web, vol. 11, no. 2, pp. 255-335, 2020.
15. Senthil Kumaran, V., and Sankar, A. Towards an Automated System for Intelligent Screening of Candidates for Recruitment Using Ontology Mapping (EXPERT). International Journal of Metadata, Semantics and Ontologies, vol. 8, no. 1, pp. 56-64, 2013.
16. Çelik, D., and Elçi, A. An Ontology-Based Information Extraction Approach for Résumés. In Joint International Conference on Pervasive Computing and the Networked World, Springer, Berlin, Heidelberg, pp. 165-179, 2012.
17. Guo, S., Alamudun, F., and Hammond, T. RésuMatcher: A Personalized Résumé-Job Matching System. Expert Systems with Applications, vol. 60, pp. 169-182, 2016.
18. Yahiaoui, L., Boufaïda, Z., and Prié, Y. Semantic Annotation of Documents Applied to E-Recruitment. In SWAP, 2006.
19. Ben Abdessalem Karaa, W., and Mhimdi, N. Using Ontology for Resume Annotation. International Journal of Metadata, Semantics and Ontologies, vol. 6, no. 3-4, pp. 166-174, 2011.
20. Maree, M., Kmail, A. B., and Belkhatir, M. Analysis and Shortcomings of E-Recruitment Systems: Towards a Semantics-Based Approach Addressing Knowledge Incompleteness and Limited Domain Coverage. Journal of Information Science, vol. 45, no. 6, pp. 713-735, 2019.
21. Leacock, C., and Chodorow, M. Combining Local Context and WordNet Similarity for Word Sense Identification. WordNet: An Electronic Lexical Database, vol. 49, no. 2, pp. 265-283, 1998.