Intro-to-TDM

INTRODUCTION TO DATA AND TEXT MINING
ANDREW PEASE, 8 MARCH 2013
Copyright © 2012, SAS Institute Inc. All rights reserved.
[Figure: word cloud of analytics topics, including statistics, data mining, text analytics, predictive modeling, forecasting, econometrics, operations research, decision trees, neural networks, gradient boosting, random forests, cluster analysis, survival analysis, time series analysis, sentiment analysis, content categorization, ontology management and model management]
DATA MINING IS:
• Discovering patterns, trends and relationships represented in data
• Developing models to understand and describe characteristics and activity based on these patterns
• Using insights to help evaluate future options and make fact-based decisions
• Deploying scores and results for timely, appropriate action
[Figure: timeline running from past (observed events) to future (predicted events)]
INDUSTRY SPECIFIC DATA MINING APPLICATIONS
Application | What is Predicted? | Driven Business Decision
Credit Scoring (Banking) | Measure creditworthiness of new and existing customers | How to assess and control risk within existing (or new) consumer portfolios?
Market Basket Analysis (Retail) | Which products are likely to be purchased together? | How to increase sales with cross-sell/up-sell, loyalty programs, promotions?
Asset Maintenance (Utilities, Mfg., Oil & Gas) | Identify real drivers of asset or equipment failure | How to minimize operational disruptions and maintenance costs?
Health & Condition Mgmt. (Health Insurance) | Identify patients at risk of a chronic illness & offer treatment program | How can we reduce healthcare costs and satisfy patients?
Fraud Mgmt. (Govt., Insurance, Banks) | Detect unknown fraud cases and future risks | How to decrease fraud losses and lower false positives?
Drug Discovery (Life Science) | Find compounds that have desirable effects & detect drug behavior during trials | How to bring drugs quickly and effectively to the marketplace?
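The market basket question above ("which products are likely to be purchased together?") can be illustrated with a minimal Python sketch that counts how often each product pair co-occurs. The transactions and the helper name `pair_support` are hypothetical, chosen only for illustration; this is not the method used by any particular SAS product.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions; each set is one shopping basket.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
    {"milk", "bread"},
]

def pair_support(baskets):
    """Return the fraction of baskets that contain each product pair."""
    counts = Counter()
    for basket in baskets:
        # Sort so that (a, b) and (b, a) are counted as the same pair.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}

support = pair_support(baskets)
# The pair with the highest support is the strongest cross-sell candidate.
top_pair = max(support, key=support.get)
print(top_pair, support[top_pair])
```

Pairs with high support are candidates for cross-sell promotions; a fuller analysis would also weigh measures such as confidence and lift.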
DATA MINING METHODOLOGY
SEMMA (Sample, Explore, Modify, Model, Assess)
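SEMMA stands for Sample, Explore, Modify, Model, Assess. As a rough sketch of how the five stages fit together, the Python example below walks through them on synthetic data. The data set (monthly spend versus churn), the one-rule threshold model and all names are assumptions made for illustration, not part of the original slides.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical data: (monthly_spend, churned) records, generated for illustration.
population = [(random.gauss(50, 10), 0) for _ in range(500)] + \
             [(random.gauss(30, 10), 1) for _ in range(500)]

# Sample: draw a random subset and split it into training and validation parts.
sample = random.sample(population, 400)
train, valid = sample[:300], sample[300:]

# Explore: summarize the input variable per class.
def class_mean(rows, label):
    return statistics.mean(x for x, y in rows if y == label)

mean_stay, mean_churn = class_mean(train, 0), class_mean(train, 1)

# Modify: standardize the input using training statistics only.
mu = statistics.mean(x for x, _ in train)
sigma = statistics.stdev(x for x, _ in train)

def standardize(x):
    return (x - mu) / sigma

# Model: a one-rule classifier with a threshold midway between the class means.
threshold = standardize((mean_stay + mean_churn) / 2)

def predict(x):
    # Lower spend is associated with churn in this synthetic data.
    return 1 if standardize(x) < threshold else 0

# Assess: measure accuracy on the held-out validation records.
accuracy = sum(predict(x) == y for x, y in valid) / len(valid)
print(f"validation accuracy: {accuracy:.2f}")
```

In practice each stage is far richer (stratified sampling, variable selection, competing model types), but the order of operations is the same.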
SAS TEXT ANALYTICS: UNCOVERING THE TECHNOLOGY
• Content Categorization
• Sentiment Analysis
• Text Mining
• Ontology Management
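To make two of these components concrete, the following minimal Python sketch shows rule-based content categorization and lexicon-based sentiment scoring. The keyword rules, the sentiment lexicon and the function names are hypothetical and deliberately tiny; production text analytics tools use much larger, linguistically informed models.

```python
import re

# Hypothetical keyword rules and sentiment lexicon, for illustration only.
CATEGORY_KEYWORDS = {
    "billing": {"invoice", "charge", "refund"},
    "service": {"staff", "waiting", "appointment"},
}
POSITIVE = {"good", "helpful", "quick"}
NEGATIVE = {"bad", "slow", "wrong"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def categorize(text):
    """Return every category whose keyword set overlaps the document."""
    tokens = set(tokenize(text))
    return sorted(c for c, kws in CATEGORY_KEYWORDS.items() if tokens & kws)

def sentiment(text):
    """Return positive-word count minus negative-word count."""
    tokens = tokenize(text)
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

doc = "The staff were helpful, but the invoice had a wrong charge."
print(categorize(doc), sentiment(doc))
```

A document can fall into several categories at once, and a mixed message such as the one above can net out to a neutral score, which is one reason simple word counting is only a starting point.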
LILLEBAELT HOSPITAL (Denmark)
HEALTHCARE
BUSINESS ISSUE
• Reduce errors in patient records
• Reduce manual effort of patient record audits
RESULTS
• “If data is wrong, the basis for decision making is also faulty. Therefore, the Clinically Correct Time-True Registration system makes sense even beyond our department and hospital.”
- Sten Larsen, Chief Surgeon
• Creation of a database to improve clinical work in research and diagnosis
1823 HONG KONG EFFICIENCY UNIT
PUBLIC
BUSINESS ISSUE
• 1823 operates round the clock, including on Sundays and public holidays.
• Answers 2.65 million calls and 98,000 e-mails, including inquiries, suggestions and complaints
• Developed a Complaint Intelligence System that uncovers the trends, patterns and relationships inherent in the complaints
RESULTS
• “By decoding the 'messages' through statistical and root-cause analyses of complaints data, the government can better understand the voice of the people, and help government departments improve service delivery, make informed decisions and develop smart strategies. This in turn helps boost public satisfaction with the government, and build a quality city.”
- Efficiency Unit’s Assistant Director, W. F. Yuk
DATA/TEXT MINING RESEARCH CONSIDERATIONS
• Data Mining for patent research/control
• Copyright research/control
• Metadata-driven approach avoids ‘permanent’ data duplication
• Analyst needs ‘creative freedom’ in combining, transforming data
• User interfaces – programming vs point-and-click
• Cost to implement highly variable
• Future Indications
  • In-Memory
  • Big Data
  • Cloud Com
www.SAS.com