Web Data Scraping and Text Analysis, 2023 Fall Semester
HR Analytics: Analyzing Organizational Culture through JobKorea Reviews
TEAM 2: 고경주, 고유진, 김성현

CONTENTS
01 Project Background (Problem Motivation, Literature Review, Research Objective)
02 Data Collection (Data Scraping, Data Preprocessing, Exploratory Data Analysis)
03 Analysis 1: Deriving O.C. Scores (Sentiment Analysis, Making the O.C. Dictionary, Deriving O.C. Scores)
04 Analysis 2: Comparing O.C. (Comparison by Company, Comparison by Industry, PCA & Clustering)
05 Conclusion & Insights (Insight 1: Industry, Insight 2: Company)
*O.C.: Organizational Culture

01 Project Background

Problem Motivation
What is HR Analytics?
A data-driven approach to managing people at work. HR analytics, also known as people analytics, workforce analytics, or talent analytics, revolves around analyzing people problems with data to answer critical questions about an organization.
What is Organizational Culture?
The set of values, beliefs, attitudes, systems, and rules that outline and influence employee behavior within an organization.
"Organizational culture is defined as 'the way we work here.'" (Deal & Kennedy, 1982)
How to Measure Organizational Culture?
Not data-driven: self-report survey techniques ➡️ memory bias, response distortion
Data-driven: text-analysis techniques ➡️ can measure raw, unfiltered culture
BUT!
Absence of Korean text-analysis solutions

Literature Review
Competing Values Model and 4 Types of Organizational Culture
Dimensions of the Competing Values Model
(1) Flexible / Predictable
Flexible: emphasizes decentralization and diversity
Predictable: emphasizes centralization and integration
(2) Internal / External
Internal: emphasizes coordination and integration for organizational maintenance
External: emphasizes adaptation to the organization's environment, competition, and interrelationships
Crossing the two dimensions gives 4 types of organizational culture: Collaborate, Control, Create, Compete

Research Objective
Deriving organizational culture scores by analyzing company review text data

4 Types of Organizational Culture
Collaborate (flexible, internal): Focuses on unity, cooperation, shared values, and participation in decision-making processes. Emphasizes personal development, consideration for people, and a familial atmosphere among organizational members.
Control (predictable, internal): Emphasizes formal order and control through hierarchical structures, adherence to rules and regulations, and tradition. Organizational members value stability and the efficient management of internal relations through formal rules.
Create (flexible, external): Values acquiring resources to support organizational adaptation and growth. Recognizes the creativity and entrepreneurial spirit of members as central values. Autonomy and discretion in job execution are key to an innovative culture.
Compete (predictable, external): Emphasizes results, goal achievement, and competitiveness in the market. Organizational members value productivity and winning against competitors.

02 Data Collection

Data Scraping
Source: the No.1 company review website in Korea
1) Scraping company data
Web scraping of 50 companies in total: 5 industry sectors × 10 companies each
Industries: 1) IT/Web/Telecommunications 2) Service 3) Banking/Finance 4) Distribution/Trade/Transportation 5) Manufacturing/Chemical
2) Scraping review data
Text data: 1) Overall comment 2) Advantage 3) Disadvantage 4) Wishes
Advantage & Disadvantage: used to understand organizational culture
Wishes: used to compare current vs. desired organizational culture

Data Preprocessing
Text data tokenization: each column (Overall comment, Advantage, Disadvantage, Wishes) is tokenized
Tool: Okt (Open Korean Text) morphological analyzer from the KoNLPy library
Procedure: Text refining ➡️ Remove stopwords ➡️ Generate tokens ➡️ Tag part-of-speech

EDA (Exploratory Data Analysis)
1) Company list by industry
2) Comparison of evaluation metrics by industry
Average ratings of Welfare/salary, Work-life balance, Culture, Promotion, and Management by industry:
Overall, the Banking/Finance and Manufacturing/Chemical industries have high ratings.
The Service industry has a low 'Welfare/salary' rating.
The Banking/Finance industry has a high 'Work-life balance' rating.
Average rates of Recommendation, CEO approval, and Growth potential by industry:
These three indicators show similar trends.
The Service industry has a low 'Recommendation' rate.
The Manufacturing/Chemical industry has a high 'Growth potential' rate.
3) Word clouds of reviews: Overall comment, Advantage, Disadvantage, Wishes

03 Analysis 1: Deriving O.C. Scores

Sentiment Analysis
1) Lexicon-based sentiment analysis
Step 1. Use the Korean NRC dictionary
Step 2. Count emotions based on the dictionary
Step 3. Derive emotion scores for each company
2) Problem with NRC-dictionary-based sentiment analysis: it cannot support in-depth analysis of organizational culture
1. NRC is a general dictionary, not HR-specific.
2. Emotion words (trust, joy, anger, fear, ...) are abstract, not specific. So what?
➡️ For deeper HR analysis, we applied a modified sentiment analysis method.

Make O.C. Dictionary
Korean OCAI dictionary, built from the English OCAI bag-of-words
Step 1. Translate the OCAI bag-of-words
Step 2. Add synonyms
Step 3. Mutually review the word list
Extend ambiguous stems with extra syllables, e.g. [뛰어] → 뛰어난, 뛰어날 (the bare stem 뛰어 also matches forms of "run/jump", not only "outstanding")
Check for words that do not distinguish between culture types, e.g. 이루 ("achieve"), 영향 ("influence")
Step 4. Add words reflecting Korean workplace culture, e.g. 꼰대 (authoritarian senior), 정치질 (office politics), 갑질 (abuse of power), 눈치 (reading the room), 똥꼬, 탑다운 (top-down), 구식 (old-fashioned), 빡세다 (grueling), 탑티어 (top-tier), ...

Deriving O.C. Scores
Modified sentiment analysis: using the OCAI dictionary instead of the NRC dictionary
Step 1. Use the OCAI dictionary (Compete, Collaborate, Control, Create)
Step 2. Count O.C. words based on the dictionary (illustrated for a sample of 5 companies)
Step 3. Calculate the ratio of each type from its word count, e.g. ratio of 'Compete' = 'Compete' word count / total word count
Step 4. Assign the representative organizational culture: the type with the highest ratio

04 Analysis 2: Comparing O.C.

Comparison by Company
Organizational culture of the 50 companies: Control, Collaborate, Compete
Many companies belong to the 'Control' culture and none to the 'Create' culture, which reflects a characteristic trait of Korean companies.
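The dictionary-based scoring in Deriving O.C. Scores (Steps 1 to 4) amounts to counting dictionary hits and normalizing them. Below is a minimal Python sketch that rests on several assumptions:
1) `OCAI_DICT` is a tiny hypothetical sample, not the team's Korean OCAI dictionary.
2) Tokens are assumed to come from the preprocessing step, for example KoNLPy's `Okt().morphs(text, stem=True)`. The exact settings used are not given.
3) A company's reviews are pooled into one token list.
4) "Total word count" is read as the total number of matched O.C. words, so the four ratios sum to 1.

The helper names are hypothetical.

```python
from collections import Counter

# HYPOTHETICAL mini-dictionary for illustration only; the real Korean OCAI
# dictionary was built by translation, synonym expansion, review, and slang.
OCAI_DICT: dict[str, set[str]] = {
    "Collaborate": {"가족", "협력", "소통", "배려"},
    "Control": {"꼰대", "탑다운", "규정", "보수적"},
    "Create": {"혁신", "자율", "도전", "창의"},
    "Compete": {"성과", "경쟁", "실적", "빡세다"},
}


def count_oc_words(tokens: list[str]) -> Counter:
    """Step 2: count tokens that appear in each culture type's word list."""
    counts = Counter({oc_type: 0 for oc_type in OCAI_DICT})
    for token in tokens:
        for oc_type, words in OCAI_DICT.items():
            if token in words:
                counts[oc_type] += 1
    return counts


def oc_ratios(counts: dict[str, int]) -> dict[str, float]:
    """Step 3: ratio of a type = its word count / total matched O.C. words (assumed)."""
    total = sum(counts.values())
    if total == 0:  # no dictionary hits: avoid division by zero
        return {oc_type: 0.0 for oc_type in counts}
    return {oc_type: n / total for oc_type, n in counts.items()}


def representative_culture(ratios: dict[str, float]) -> str:
    """Step 4: the type with the highest ratio (first in dictionary order on ties)."""
    return max(ratios, key=ratios.get)


if __name__ == "__main__":
    # Hypothetical tokenized reviews of one company, pooled into one token list.
    reviews = [["꼰대", "문화", "탑다운", "결재"], ["규정", "많고", "성과", "압박"]]
    tokens = [token for review in reviews for token in review]
    ratios = oc_ratios(count_oc_words(tokens))
    print(ratios)  # {'Collaborate': 0.0, 'Control': 0.75, 'Create': 0.0, 'Compete': 0.25}
    print(representative_culture(ratios))  # Control
```

Ratios rather than raw counts keep companies with many and with few reviews on the same scale, which is what lets the 50 companies be compared directly.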
Industry Ratio within Each Organizational Culture
'Control' culture (industries shown: IT/Web/Telecom, Service, Distribution/Trade/Transportation, Banking/Finance, Manufacturing/Chemical): high proportion of the Banking/Finance industry
'Collaborate' culture (industries shown: IT/Web/Telecom, Service, Distribution/Trade/Transportation): high proportion of the Service industry
'Compete' culture (industries shown: IT/Web/Telecom, Distribution/Trade/Transportation, Manufacturing/Chemical): high proportion of the IT/Web/Telecom and Manufacturing/Chemical industries

PCA & Clustering
Hypothesis: organizational culture will show distinct tendencies in each industry.
1. Cluster the companies based on their scores for the 4 organizational culture types.
To compare the clusters with the industry classification, match cluster numbers to industry numbers. Because companies with extreme values tend to separate by industry (as in the previous graph), the matching is based on these extreme values.
2. Run PCA to visualize the clusters.
3. Check whether the clusters resemble the industry groups.
PCA results (same color = same industry) and similarity results:
In extreme cases, organizational culture is clearly divided by industry.
If clusters tended to coincide with industries, the variance should be small. Contrary to expectations, the variance was not that small.
Accuracy score: 0.38

05 Conclusion

Insight 1: Industry
Can be used to understand organizational culture by industry.
We built a Korean OCAI dictionary that reflects words used exclusively in Korea ➡️ the industry graphs reflect typical cultural perceptions of Korean industries ('Control': Banking/Finance; 'Collaborate': Service; 'Compete': IT/Web/Telecom and Manufacturing/Chemical).

Insight 2: Company
Can be used to understand organizational culture by company.
1) Current organizational culture: derived from the 'Overall Comment' column (red line)
2) Desired organizational culture: derived from the 'Wishes' column (blue line)
➡️ Comparing the two identifies the direction in which each company's organizational culture should move.
Radar charts of current vs. desired organizational culture for Companies 'A', 'B', and 'C' (axes: compete, collaborate, control, create)
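The red-versus-blue comparison can be expressed as a per-type difference between two ratio profiles. This is a minimal sketch that assumes both profiles are four-type ratios like those in Deriving O.C. Scores: the current profile comes from 'Overall Comment' and the desired profile from 'Wishes'. `culture_gap` and `target_direction` are hypothetical helper names. The numbers are invented for illustration and are not the results for Companies 'A', 'B', or 'C'.

```python
OC_TYPES = ("Collaborate", "Control", "Create", "Compete")


def culture_gap(current: dict[str, float], desired: dict[str, float]) -> dict[str, float]:
    """Desired minus current ratio per type; positive means employees want more of it."""
    return {t: desired.get(t, 0.0) - current.get(t, 0.0) for t in OC_TYPES}


def target_direction(current: dict[str, float], desired: dict[str, float]) -> tuple[str, str]:
    """Return (type to strengthen, type to weaken): the largest and smallest gaps."""
    gap = culture_gap(current, desired)
    return max(gap, key=gap.get), min(gap, key=gap.get)


if __name__ == "__main__":
    # HYPOTHETICAL ratio profiles for one company.
    current = {"Collaborate": 0.20, "Control": 0.50, "Create": 0.05, "Compete": 0.25}  # 'Overall Comment'
    desired = {"Collaborate": 0.35, "Control": 0.20, "Create": 0.25, "Compete": 0.20}  # 'Wishes'
    print(target_direction(current, desired))  # ('Create', 'Control')
```

Comparing ratios rather than raw word counts matters here because the 'Overall Comment' and 'Wishes' columns contain different amounts of text.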