
Web Data Scraping & Text Analysis: JobKorea Review of Org Culture

Web Data Scraping and Text Analysis
2023 Fall Semester
Organizational Culture HR Analytics by Analyzing JobKorea Review
TEAM 2 고경주 고유진 김성현
CONTENTS
01 PROJECT BACKGROUND
Problem Motivation
Literature Review
Research Objective
02 DATA COLLECTION
Data Scraping
Data Preprocessing
Exploratory Data Analysis
03 ANALYSIS 1 : DERIVING O.C. SCORES
Sentiment Analysis
Make O.C. Dictionary
Deriving O.C. Scores
04 ANALYSIS 2 : COMPARING O.C.
Comparison by Company
Comparison by Industry
PCA & Clustering
05 CONCLUSION & INSIGHTS
Insight 1 : Industry
Insight 2 : Company
*O.C. : ORGANIZATIONAL CULTURE
01 Project Background
Problem Motivation
Literature Review
Research Objective
Problem Motivation
What is HR Analytics?
: Data-driven approach to managing people at work
HR analytics, also known as people analytics, workforce analytics, or talent analytics, revolves around analyzing people problems using data to answer critical questions about your organization.
What is Organizational Culture?
: Set of values, beliefs, attitudes, systems, and rules that outline and influence employee behavior within an organization
"Organizational culture is defined as 'the way we work here'" - Deal & Kennedy (1982)
How to Measure Organizational Culture?
Not Data-Driven Approach
: Self-report survey technique
➡️ Memory bias, response distortion
Data-Driven Approach
: Text-analysis technique
➡️ Can measure raw, unfiltered culture
BUT! Absence of Korean text-analysis solutions
Literature Review
Competing Values Model and 4 Types of Organizational Culture
Dimension of Competing Values Model
(1) Flexible / Predictable
Flexible: Emphasizing decentralization, diversity
Predictable: Emphasizing centralization, integration
(2) Internal / External
Internal: Emphasizing coordination and integration for organizational maintenance
External: Emphasizing adaptation to the organization's environment, competition, interrelationships
4 types of Organizational Culture
Collaborate, Control, Create, Compete
Research Objective
Deriving Organizational Culture Scores by Analyzing Company Review Text Data
4 types of Organizational Culture
Collaborate
Focus on unity, cooperation, shared values, and participation in decision-making processes
Emphasizes personal development, human consideration, and a familial atmosphere among organizational members
Control
Emphasizes formal orders and control through hierarchical structures, adherence to rules and regulations, traditions
Organizational members value stability and efficiently managing internal relations through formal rules
Create
Values acquiring resources to support organizational adaptation and growth
Recognizes the creativity and entrepreneurial spirit of its members as central values
Autonomy and discretionary power in job execution are key to an innovative culture
Compete
Emphasizes competition, goal achievement, and productivity, with an external, results-oriented focus
Organizational members value performance, market position, and outperforming competitors
02 Data Collection
Data Scraping
Data Preprocessing
Exploratory Data Analysis
Data Scraping
JobKorea: No.1 company review website in Korea
1) Scraping Company Data
Web scraping for a total of 50 companies: 5 industry sectors × 10 companies each
1) IT/Web/Telecommunications 2) Service Industry
3) Banking/Finance 4) Distribution/Trade/Transportation
5) Manufacturing/Chemical
2) Scraping Review Data
Text Data : 1) Overall comment 2) Advantage 3) Disadvantage 4) Wishes
Advantage & Disadvantage : Used for understanding organizational culture.
Wishes : Used for comparing current vs desired organizational culture
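The four review fields above can be pulled out of a review page once its HTML has been downloaded. The sketch below is only an illustration: the tag and class names (`review`, `overall`, `pros`, `cons`, `wish`) and the sample markup are hypothetical, since the slides do not show JobKorea's actual page structure or the scraping tool the team used.

```python
from html.parser import HTMLParser

# Hypothetical CSS class -> column name mapping (not JobKorea's real markup).
FIELDS = {"overall": "overall_comment", "pros": "advantage",
          "cons": "disadvantage", "wish": "wishes"}


class ReviewParser(HTMLParser):
    """Collect the four text fields from each <li class="review"> block."""

    def __init__(self):
        super().__init__()
        self.reviews = []      # one dict per review
        self._field = None     # column currently being read, if any

    def handle_starttag(self, tag, attrs):
        css = dict(attrs).get("class", "")
        if tag == "li" and css == "review":
            self.reviews.append({name: "" for name in FIELDS.values()})
        elif tag == "p" and css in FIELDS and self.reviews:
            self._field = FIELDS[css]

    def handle_endtag(self, tag):
        if tag == "p":
            self._field = None

    def handle_data(self, data):
        if self._field:
            self.reviews[-1][self._field] += data.strip()


# Made-up sample markup for illustration only.
sample_html = """
<ul>
  <li class="review">
    <p class="overall">Stable company</p>
    <p class="pros">Good welfare</p>
    <p class="cons">Rigid hierarchy</p>
    <p class="wish">More autonomy</p>
  </li>
</ul>
"""

parser = ReviewParser()
parser.feed(sample_html)
print(parser.reviews)
```

Each parsed review becomes one row with the columns Overall comment, Advantage, Disadvantage, and Wishes, which is the shape the later tokenization step expects.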
Data Preprocessing
Text Data Tokenization
Tokenize each column: Overall comment, Advantage, Disadvantage, Wishes
Tool: OKT Morphological analyzer of KoNLPy library
Procedure: Text refining ➡️ Remove stopwords ➡️ Generate tokens ➡️ Tag part-of-speech
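A minimal sketch of the tokenization procedure, under some assumptions: the refining rule (keep Hangul, Latin letters, and digits), the stopword set, and the kept part-of-speech tags are illustrative choices, and stopwords are filtered after tagging for simplicity. `okt_tagger` wraps KoNLPy's `Okt.pos`, which needs the `konlpy` package and a Java runtime; any function returning `(token, tag)` pairs can be passed instead.

```python
import re


def refine(text):
    """Keep Hangul, Latin letters, digits and spaces; collapse whitespace."""
    text = re.sub(r"[^0-9A-Za-z가-힣\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text, pos_tagger, stopwords,
             keep_tags=("Noun", "Verb", "Adjective")):
    """Return tokens whose tag is in keep_tags and that are not stopwords."""
    tagged = pos_tagger(refine(text))      # list of (token, tag) pairs
    return [tok for tok, tag in tagged
            if tag in keep_tags and tok not in stopwords]


def okt_tagger():
    """Build a tagger backed by KoNLPy's Okt (requires konlpy and Java)."""
    from konlpy.tag import Okt
    okt = Okt()
    return lambda text: okt.pos(text, norm=True, stem=True)
```

Usage would look like `tokenize(review_text, okt_tagger(), stopwords={"회사"})`, applied to each of the four text columns.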
EDA (Exploratory Data Analysis)
1) Company List by Industry
2) Comparison of evaluation metrics by industry
Average ratings of Welfare/salary, Work-life balance, Culture, Promotion, Management by industry
Overall, the Banking/Finance and Manufacturing/Chemical industries have high ratings
Service industry has a low 'Welfare/salary' rating
Banking/Finance industry has a high 'Work-life balance' rating
Average rates of Recommendation, CEO approval, Growth potential by industry
These three indicators show similar trends
Service industry has a low 'Recommendation' rate
Manufacturing/Chemical industry has a high 'Growth potential' rate
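The per-industry averages behind these comparisons amount to a group-and-mean over the company table. A small sketch, assuming each company is a row with an `industry` field and one numeric field per metric (the field names and the numbers are made up for illustration and are not the project's data):

```python
from collections import defaultdict
from statistics import mean


def mean_by_industry(rows, metric):
    """Average one rating column per industry."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["industry"]].append(row[metric])
    return {industry: mean(values) for industry, values in groups.items()}


# Made-up ratings for illustration only; not values from the project.
rows = [
    {"industry": "Banking/Finance", "work_life_balance": 4.0},
    {"industry": "Banking/Finance", "work_life_balance": 3.0},
    {"industry": "Service Industry", "work_life_balance": 2.5},
]
print(mean_by_industry(rows, "work_life_balance"))
```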
3) WordCloud of Reviews
[Figure: word clouds for Overall Comment, Advantage, Disadvantage, Wishes]
03 Analysis 1 : Deriving O.C. Scores
Sentiment Analysis
Make O.C. Dictionary
Deriving O.C. Scores
Sentiment Analysis
1) Lexicon Based Sentiment Analysis
Step 1. Utilize Korean NRC Dictionary
Step 2. Count Emotions based on Dictionary
Step 3. Deriving Emotional Scores for each company
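The three steps can be sketched as a lookup-and-count over a company's tokens. The lexicon entries below are toy examples rather than actual Korean NRC entries, and normalizing the counts to shares is an assumption; the slides do not state how the emotional scores were scaled.

```python
from collections import Counter


def emotion_scores(tokens, lexicon):
    """Count emotions for tokens found in the lexicon and return shares.

    lexicon maps a word to the list of emotions it is associated with.
    """
    counts = Counter()
    for tok in tokens:
        for emotion in lexicon.get(tok, ()):
            counts[emotion] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {emotion: n / total for emotion, n in counts.items()}


# Toy lexicon and tokens for illustration only.
toy_lexicon = {"좋다": ["joy", "trust"], "불안": ["fear"]}
company_tokens = ["좋다", "불안", "좋다", "회사"]
print(emotion_scores(company_tokens, toy_lexicon))
```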
2) Problem of NRC Dictionary-Based Sentiment Analysis
: Unable to conduct in-depth analysis of organizational culture
1. NRC is a general-purpose dictionary, not HR-specific
2. Emotion words are abstract, not specific (trust, joy, anger, fear, ...)
SO WHAT?
➡️ For deeper HR analysis, we applied a modified sentiment analysis method
Make O.C. Dictionary
OCAI bag-of-words (English ver.) ➡️ Make Korean ver. ➡️ OCAI Dictionary (Korean ver.)
Step 1. Translation Based on OCAI Bag-of-Words
Step 2. Addition of Synonyms
Step 3. Mutual Review of the Word List
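Lengthen ambiguous words by adding syllables
ex. [뛰어] → 뛰어난, 뛰어날 ("outstanding", rather than the ambiguous stem 뛰어)
Review words that do not discriminate between culture types
ex. 이루 ("achieve"), 영향 ("influence")
Step 4. Addition of Words Reflecting Korean Culture
ex. 꼰대 (condescending old-timer), 정치질 (office politics), 갑질 (abuse of power), 눈치 (having to read the room), 똥꼬 (slang associated with brown-nosing), 탑다운 (top-down), 구식 (old-fashioned), 빡세다 (grueling), 탑티어 (top-tier)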
...
Deriving O.C. Scores
Modified Sentiment Analysis : Using OCAI Dictionary, Instead of NRC Dictionary
Step 1. Utilize OCAI Dictionary (word lists for Compete, Collaborate, Control, Create)
Step 2. Count O.C. Words based on Dictionary (sample shown: 5 companies)
Step 3. Calculate Ratio based on number of words in each type
Ex. Ratio of ‘Compete’ = ‘Compete’ word count / Total word counts
Step 4. Assign Representative Organizational Culture
: Organizational Culture type with the highest ratio
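Steps 2 to 4 can be sketched as follows. This is an illustration under assumptions: the dictionary entries are toy examples, and "Total word counts" is taken to mean the total number of matched O.C. words across the four types (the slides could also mean all words in the reviews, which would change the denominator but not the ranking).

```python
from collections import Counter

CULTURES = ("collaborate", "control", "create", "compete")


def culture_ratios(tokens, oc_dictionary):
    """Share of matched O.C. words per culture type.

    oc_dictionary maps a culture type to its set of words.
    """
    counts = Counter({culture: 0 for culture in CULTURES})
    for tok in tokens:
        for culture in CULTURES:
            if tok in oc_dictionary.get(culture, ()):
                counts[culture] += 1
    total = sum(counts.values())
    if total == 0:
        return {culture: 0.0 for culture in CULTURES}
    return {culture: counts[culture] / total for culture in CULTURES}


def representative_culture(ratios):
    """Culture type with the highest ratio."""
    return max(ratios, key=ratios.get)


# Toy dictionary and tokens for illustration only.
toy_dictionary = {
    "collaborate": {"가족"},
    "control": {"규칙", "보고"},
    "create": {"혁신"},
    "compete": {"성과"},
}
tokens = ["규칙", "보고", "가족", "성과", "점심"]
ratios = culture_ratios(tokens, toy_dictionary)
print(ratios, representative_culture(ratios))
```

Ties are resolved in favor of the type listed first in `CULTURES`, which is an arbitrary choice in this sketch.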
04 Analysis 2 : Comparing O.C.
Comparison by Company
Comparison by Industry
PCA & Clustering
Comparison by Company
Organizational Culture of 50 companies (chart groups: Control, Collaborate, Compete)
Many companies belong to the 'Control' culture
None in the 'Create' culture
This reflects a characteristic trait of Korean companies
Comparison by Industry
Industry ratio within each Organizational Culture
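‘Control’ Organizational Culture
Industries in chart: IT/Web/Telecommunications, Service, Distribution/Trade/Transportation, Banking/Finance, Manufacturing/Chemical
High proportion of Banking/Finance industry
‘Collaborate’ Organizational Culture
Industries in chart: IT/Web/Telecommunications, Service, Distribution/Trade/Transportation
High proportion of Service industry
‘Compete’ Organizational Culture
Industries in chart: IT/Web/Telecommunications, Distribution/Trade/Transportation, Manufacturing/Chemical
High proportion of IT/Web/Telecom & Manufacturing/Chemical industry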
PCA & Clustering
Hypothesis: Organizational culture will show unique tendencies in each industry
1. Conduct Clustering based on 4 organizational cultures
Align cluster numbers with industry numbers so that the cluster and industry classifications can be compared
Extreme values tend to separate by industry, as seen in the previous graph, so cluster and industry numbers are matched based on the extreme values
2. Conduct PCA for visualization
3. Check whether the cluster and the industrial group are similar
PCA Results (* Same Color = Same Industry)
In extreme cases, organizational culture is significantly divided by industry
Similarity Results
If clusters tend to be the same by industry, the variance should be small.
Contrary to expectations, the variance was not that small
Accuracy Score: 0.38
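Comparing clusters with industries requires deciding which cluster number corresponds to which industry before an accuracy can be computed. The slides match the numbers using extreme values; the sketch below swaps in a different, simpler technique for illustration: it tries every one-to-one mapping from cluster numbers to industry numbers and keeps the best agreement. It assumes there are no more clusters than industries, and the function name is hypothetical.

```python
from itertools import permutations


def aligned_accuracy(industry_labels, cluster_labels):
    """Best share of companies whose cluster maps to their own industry.

    Tries every one-to-one mapping of cluster ids to industry ids, so it is
    only practical for a handful of groups (5 industries -> 120 mappings).
    """
    industries = sorted(set(industry_labels))
    clusters = sorted(set(cluster_labels))
    best = 0.0
    for perm in permutations(industries, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[c] == i
                   for c, i in zip(cluster_labels, industry_labels))
        best = max(best, hits / len(industry_labels))
    return best


# Toy labels for illustration only (6 companies, 3 industries).
industry = [0, 0, 1, 1, 2, 2]
cluster = [2, 2, 0, 1, 1, 1]
print(aligned_accuracy(industry, cluster))
```

Because this picks the most favorable mapping, it gives an upper bound on agreement for a given clustering, so a low score under this measure would point the same way as the slide's conclusion that clusters do not line up cleanly with industries.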
05 Conclusion
Insight 1 : Industry
Insight 2 : Company
Insight 1 : Industry
Can be used to understand Organizational Culture by Industry.
Built a Korean OCAI dictionary that reflects words used only in Korea
➡️ Graph reflects the typical cultural perceptions of Korean industries
[Charts: industry ratio within the ‘Control’, ‘Collaborate’, and ‘Compete’ organizational cultures]
Insight 2 : Company
Can be used to understand Organizational Culture by Company.
1) Current Organizational Culture: from the ‘Overall Comment’ column (red line)
2) Desired Organizational Culture: from the ‘Wishes’ column (blue line)
➡️ Comparing the two identifies the direction in which a company's organizational culture should move
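One way to make this comparison concrete is to subtract the current ratio from the desired ratio for each culture type. The sketch below assumes both profiles are dictionaries of ratios per culture type, one derived from the ‘Overall Comment’ column and one from the ‘Wishes’ column; the numbers are made up for illustration and the function name is hypothetical.

```python
def culture_gap(current, desired):
    """Desired minus current ratio per culture type, largest increase first."""
    cultures = set(current) | set(desired)
    gaps = {c: desired.get(c, 0.0) - current.get(c, 0.0) for c in cultures}
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


# Made-up profiles for illustration only; not results from the project.
current = {"collaborate": 0.25, "control": 0.5, "create": 0.0, "compete": 0.25}
desired = {"collaborate": 0.25, "control": 0.25, "create": 0.375, "compete": 0.125}
print(culture_gap(current, desired))
```

A positive gap marks a culture type that employees want more of than they currently describe, and a negative gap marks one they want less of.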
[Radar charts: ‘A’ Company, ‘B’ Company, and ‘C’ Company Organizational Culture; axes: compete, control, create, collaborate]
Q&A