data-mining_uii

Data Mining Seminar
Business Problems and Industrial Applications
Data Mining Lab, Industrial Engineering
Universitas Islam Indonesia
10 May 2008
Budi Santosa








3/14/2016
Introduction
Data
Association rules
Classification
Clustering
Data mining applications
Commercial tools
Conclusion



What is data mining?
Why do we need to ‘mine’ data?
What kinds of data can we ‘mine’?

Data mining combines statistical data-analysis methods with algorithms for processing large data sets. It is the process of discovering important information or patterns in large databases, and it is part of the Knowledge Discovery in Databases (KDD) process.

Exploration and analysis of large quantities of data

by automatic or semi-automatic tools

to discover meaningful patterns and rules. These patterns allow a company to

better understand its customers
improve its marketing, sales, and customer support operations
Explosive growth in data collection
Storage of data in data warehouses
Increasing availability of data from the Web and intranets
We need more effective ways to use this data for decision support than traditional query languages alone



Data warehouses
Transactional databases
Advanced database systems
  Spatial and temporal
  Time series
  Multimedia, text
  WWW
  …
Structure – 3D anatomy
Function – 1D signal
Metadata – annotation
[Figure: GeneFilter Comparison Report, a sample of raw microarray output listing ORF name, gene name, chromosome, filter position (F, G, R), and raw and normalized intensities for two GeneFilters (GF1, GF2)]



Most data mining algorithms are suitable only for numeric data.
All data should therefore be represented as numbers so that the algorithms can be applied.
Whether the data are sales figures, crime rates, text, or images, we must find a suitable way to transform them into numbers.
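A minimal sketch of two such transformations, assuming simple whitespace tokenization: one-hot encoding for a categorical attribute, and word counts (bag of words) for text. The function names and toy inputs are illustrative, not taken from the slides.

```python
from collections import Counter

def one_hot(values):
    """Map each categorical value to a 0/1 indicator vector."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    vectors = []
    for v in values:
        vec = [0] * len(categories)
        vec[index[v]] = 1
        vectors.append(vec)
    return categories, vectors

def bag_of_words(documents):
    """Represent each document as word counts over a shared, sorted vocabulary."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors

marital = ["Single", "Married", "Divorced", "Married"]
cats, enc = one_hot(marital)
print(cats)     # ['Divorced', 'Married', 'Single']
print(enc[0])   # [0, 0, 1]

vocab, vecs = bag_of_words(["the game in London", "the game in Italy"])
print(vocab)    # ['game', 'in', 'italy', 'london', 'the']
print(vecs[0])  # [1, 1, 0, 1, 1]
```

One-hot encoding avoids imposing an artificial order on categories, which a plain 0, 1, 2 coding would do.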
Non-trivial extraction of implicit, previously unknown, and potentially useful information from databases.
The knowledge discovery process consists of the following phases: [figure not reproduced]

Prediction: how will a particular attribute in the data behave in the future? (predictive)
  Time series
  Pattern sequence
  Independent-dependent relation
Classification: assigning data to categories based on existing labelled samples (discrete labels)
  Feature selection
Clustering: grouping objects into clusters without labelled samples as examples (descriptive)
Association: object association

Goal
Provide rules that relate the presence of one set of items to the presence of another set of items
Example:

Market-basket model
Find combinations of products that are bought together
Place SHOES near SOCKS so that a customer who buys one is likely to buy the other
Transaction: a customer buys several items (an itemset) in a supermarket
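A rule such as SHOES -> SOCKS is commonly scored by its support (the fraction of transactions containing both items) and its confidence (the fraction of SHOES transactions that also contain SOCKS). A minimal sketch with hypothetical baskets; the function names are illustrative.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) = support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# Hypothetical baskets, for illustration only.
baskets = [
    {"shoes", "socks"},
    {"shoes", "socks", "shirt"},
    {"shoes", "belt"},
    {"socks", "shirt"},
]

print(support({"shoes", "socks"}, baskets))       # 0.5
print(confidence({"shoes"}, {"socks"}, baskets))  # about 0.67
```

Algorithms such as Apriori search for all itemsets and rules whose support and confidence exceed user-chosen thresholds; this sketch only scores one given rule.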
married?
  yes: salary?
    <20k: poor risk
    >=20k and <50k: fair risk
    >=50k: good risk
  no: account balance?
    <5k: poor risk
    >=5k: age?
      <25: fair risk
      >=25: good risk
RID  Married  Salary    Acct balance  Age   Loanworthy
1    No       >=50k     <5k           >=25  Yes
2    Yes      >=50k     >=5k          >=25  Yes
3    Yes      20k..50k  <5k           <25   No
4    No       <20k      >=5k          <25   No
5    No       <20k      <5k           >=25  No
6    Yes      20k..50k  >=5k          >=25  Yes
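Expected information:
$$I(S_1,\dots,S_n) = -\sum_{i=1}^{n} p_i \log_2 p_i, \qquad p_i = \frac{S_i}{S}$$
where S_i is the number of samples in class i and S is the total number of samples. For the six loans (3 “yes”, 3 “no”): I(3,3) = 1.

Entropy of attribute A, where v is the number of distinct values of A and S_{j1}, …, S_{jn} are the class counts among the samples with the j-th value of A:
$$E(A) = \sum_{j=1}^{v} \frac{S_{j1}+\dots+S_{jn}}{S}\, I(S_{j1},\dots,S_{jn})$$

Information gain:
$$\mathrm{Gain}(A) = I(S_1,\dots,S_n) - E(A)$$

E(Married) = 0.92, Gain(Married) = 0.08
E(Salary) = 0.33, Gain(Salary) = 0.67
E(Acct balance) = 0.92, Gain(Acct balance) = 0.08
E(Age) = 0.54, Gain(Age) = 0.46

Salary has the largest gain, so it is chosen as the splitting attribute at the root:
Salary
  <20k: class is “no” {4, 5}
  20k..50k: age
    <25: class is “no” {3}
    >=25: class is “yes” {6}
  >=50k: class is “yes” {1, 2}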
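The gain calculation above can be reproduced with a short script. This is a minimal sketch of the ID3 attribute-selection step on the six-row loan table, not a full tree learner; the function names are illustrative.

```python
from collections import Counter
from math import log2

# Loan table: (Married, Salary, Acct balance, Age, Loanworthy)
ROWS = [
    ("No",  ">=50k",    "<5k",  ">=25", "Yes"),
    ("Yes", ">=50k",    ">=5k", ">=25", "Yes"),
    ("Yes", "20k..50k", "<5k",  "<25",  "No"),
    ("No",  "<20k",     ">=5k", "<25",  "No"),
    ("No",  "<20k",     "<5k",  ">=25", "No"),
    ("Yes", "20k..50k", ">=5k", ">=25", "Yes"),
]
ATTRIBUTES = ["Married", "Salary", "Acct balance", "Age"]

def info(labels):
    """Expected information I(S1,...,Sn) = -sum p_i log2 p_i."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def entropy(rows, col):
    """E(A): information of each partition by attribute value, weighted by partition size."""
    partitions = {}
    for row in rows:
        partitions.setdefault(row[col], []).append(row[-1])
    return sum(len(p) / len(rows) * info(p) for p in partitions.values())

def gain(rows, col):
    """Gain(A) = I - E(A)."""
    return info([row[-1] for row in rows]) - entropy(rows, col)

for i, name in enumerate(ATTRIBUTES):
    print(f"{name}: E = {entropy(ROWS, i):.2f}, Gain = {gain(ROWS, i):.2f}")

best = max(range(len(ATTRIBUTES)), key=lambda i: gain(ROWS, i))
print("root attribute:", ATTRIBUTES[best])  # Salary
```

ID3 would then repeat the same selection inside each branch that still mixes classes, here the 20k..50k branch, where Age separates RID 3 from RID 6.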
[Figure: classification workflow. A training set of 10 examples with attributes Ex#, Country, Marital Status, and Income and the class label Hooligan (Yes/No) is used to learn a classifier. The learned model is then applied to a test set whose Hooligan label is unknown (?).]
[Figure: the same workflow for text. Training examples are short texts labelled Hooligan Yes/No, such as "An English football fan …", "During a game in Italy …", "England has been beating France …", "Italian football fans were cheering …", "An average USA salesman earns 75K", "The game in London was horrific", "Manchester City is likely to win the championship", and "Rome is taking the lead in the football league". A classifier is learned from them, and the model is applied to unlabelled test texts such as "A Danish football fan …" and "Turkey is playing vs. France. The Turkish fans …".]

Clustering is the process of grouping similar objects into the same cluster.
Objects can come from a database of customers, products, genes, students, etc.
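K-means is one of the clustering methods named in these slides. A minimal sketch of it (Lloyd's algorithm), assuming Euclidean distance, the first k points as initial centroids so the run is deterministic, and made-up two-attribute customer data:

```python
import math

def kmeans(points, k, iterations=100):
    """Plain k-means: alternate assignment and centroid-update steps until stable.

    Initial centroids are simply the first k points (a simplification).
    """
    centroids = [list(p) for p in points[:k]]
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Hypothetical customers: (spending score, visits per month)
customers = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)]
labels, centres = kmeans(customers, k=2)
print(labels)  # [0, 0, 1, 1, 0, 1]
```

The result depends on the initial centroids; in practice k-means is usually restarted from several random initializations.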

Some concepts

One very important element is the choice of similarity measure.
If the data are numeric, distance-based similarity functions are often used: the Euclidean metric (Euclidean distance), the Minkowski metric, and the Manhattan metric.
Other measures: correlation, cosine, covariance.
Methods: hierarchical clustering, K-means, fuzzy clustering, SOM, Support Vector Clustering.
Euclidean distance between objects r_j and r_k with n attributes:
$$\mathrm{dist}(r_j, r_k) = \sqrt{|r_{j1}-r_{k1}|^2 + \dots + |r_{jn}-r_{kn}|^2}$$
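A sketch of the measures above, assuming plain Python tuples as attribute vectors: the Minkowski metric of order p (the Manhattan metric for p = 1, the Euclidean distance for p = 2) and cosine similarity. The function names are illustrative.

```python
import math

def minkowski(r_j, r_k, p):
    """Minkowski distance of order p between two numeric vectors."""
    if len(r_j) != len(r_k):
        raise ValueError("vectors must have the same number of attributes")
    return sum(abs(a - b) ** p for a, b in zip(r_j, r_k)) ** (1 / p)

def manhattan(r_j, r_k):
    return minkowski(r_j, r_k, 1)  # p = 1

def euclidean(r_j, r_k):
    return minkowski(r_j, r_k, 2)  # p = 2, the dist formula above

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

x, y = (0.0, 0.0), (3.0, 4.0)
print(manhattan(x, y))  # 7.0
print(euclidean(x, y))  # 5.0
```

Distances grow as objects become less alike, whereas cosine similarity grows as they become more alike; attributes on very different scales are usually normalized first so that no single attribute dominates the distance.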






Weather
Business
Microbiology
Market analysis
Manufacturing and production
Fraud detection and detection of unusual patterns (outliers)
Telecommunication
Financial transactions

Text mining (newsgroups, email, documents) and Web mining
DNA and bio-data analysis
Disease outcomes
Effectiveness of treatments
Identifying new drugs
Weather
[Figure: WSR-88D radar coverage near Chandler, with range rings at 54 km and 180 km, the north direction, and the azimuth angle]
The WSR-88D records a digital database containing 3 variables: velocity (V), reflectivity (Z), and spectrum width (W).




The current Mesocyclone Detection Algorithm (MDA) was created at the National Severe Storms Laboratory (NSSL), Oklahoma, to work with native variables derived from the WSR-88D.
To detect circulations associated with vortices that spin up into tornadoes, the algorithm exploits the velocity data.
The data are measured for circulation depth, height above the ground, strength of the circulation, shear (change in wind speed or direction with distance), etc.
By relaxing previous threshold values, the MDA can detect weaker circulations that may eventually spin up into mesocyclones (thereby increasing the probability of detection).
1. base (m) [0-12000]
2. depth (m) [0-13000]
3. strength rank [0-25]
4. low-level diameter (m) [0-15000]
5. maximum diameter (m) [0-15000]
6. height of maximum diameter (m) [0-12000]
7. low-level rotational velocity (m/s) [0-65]
8. maximum rotational velocity (m/s) [0-65]
9. height of maximum rotational velocity (m) [0-12000]
10. low-level shear (m/s/km) [0-175]
11. maximum shear (m/s/km) [0-175]
12. height of maximum shear (m) [0-12000]
13. low-level gate-to-gate velocity difference (m/s) [0-130]
14. maximum gate-to-gate velocity difference (m/s) [0-130]
15. height of maximum gate-to-gate velocity difference (m) [0-12000]
16. core base (m) [0-12000]
17. core depth (m) [0-9000]
18. age (min) [0-200]
19. strength index (MSI) weighted by average density of integrated layer [0-13000]
20. strength index (MSIr) "rank" [0-25]
21. relative depth (%) [0-100]
22. low-level convergence (m/s) [0-70]
23. mid-level convergence (m/s) [0-70]



Can I use contact lenses?
Possible output: none, soft, hard.
Decision based on:
- age
- spectacle prescription
- astigmatism
- tear production rate
Age             Prescription  Astigmatism  Tear production rate  Lenses
young           myope         no           reduced               none
young           myope         no           normal                soft
young           hypermetrope  yes          reduced               none
pre-presbyopic  myope         no           reduced               none
presbyopic      myope         yes          normal                hard









A set of “if-then” rules
A decision tree
A neural network
SVM, LSVM, LS-SVM
LDA
KNN
Minimax Probability Machine
Analytic Center Machine
Relevance Vector Machine
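KNN is the simplest entry in this list to sketch: "learning" just stores the training set, and a test example receives the majority label of its k nearest training examples. A minimal sketch assuming Euclidean distance and made-up applicant data; the function name is illustrative.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Label a query point by majority vote among its k nearest training points.

    train: list of (feature_vector, label) pairs; distance is Euclidean.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical applicants: (income in thousands, age) -> credit class.
training_set = [
    ((20, 22), "bad"), ((25, 30), "bad"), ((30, 24), "bad"),
    ((80, 45), "good"), ((95, 50), "good"), ((70, 38), "good"),
]
print(knn_predict(training_set, (85, 40)))  # good
print(knn_predict(training_set, (22, 27)))  # bad
```

An odd k avoids ties between two classes; with unscaled features, the attribute with the largest range (here income) dominates the distance.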









If age = young and astigmatic = no and tear production rate = normal then recommendation = soft
If age = pre-presbyopic and astigmatic = no and tear production rate = normal then recommendation = soft
If age = presbyopic and spectacle prescription = myope and astigmatic = no then recommendation = none
If spectacle prescription = hypermetrope and astigmatic = no and tear production rate = normal then recommendation = soft
If spectacle prescription = myope and astigmatic = yes and tear production rate = normal then recommendation = hard
If age = young and astigmatic = yes and tear production rate = normal then recommendation = hard
If age = pre-presbyopic and spectacle prescription = hypermetrope and astigmatic = yes then recommendation = none
If age = presbyopic and spectacle prescription = hypermetrope and astigmatic = yes then recommendation = none
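These rules can be applied as an ordered decision list: the first rule whose conditions all hold gives the recommendation. A minimal sketch; the attribute keys and the fallback to "none" when no rule fires are assumptions, since the rules as listed do not cover every case (for example, a reduced tear production rate).

```python
# Each rule: (conditions, recommendation). Rules are tried in order.
RULES = [
    ({"age": "young", "astigmatic": "no", "tear_rate": "normal"}, "soft"),
    ({"age": "pre-presbyopic", "astigmatic": "no", "tear_rate": "normal"}, "soft"),
    ({"age": "presbyopic", "prescription": "myope", "astigmatic": "no"}, "none"),
    ({"prescription": "hypermetrope", "astigmatic": "no", "tear_rate": "normal"}, "soft"),
    ({"prescription": "myope", "astigmatic": "yes", "tear_rate": "normal"}, "hard"),
    ({"age": "young", "astigmatic": "yes", "tear_rate": "normal"}, "hard"),
    ({"age": "pre-presbyopic", "prescription": "hypermetrope", "astigmatic": "yes"}, "none"),
    ({"age": "presbyopic", "prescription": "hypermetrope", "astigmatic": "yes"}, "none"),
]

def recommend(patient, rules=RULES, default="none"):
    """Return the recommendation of the first rule whose conditions all hold."""
    for conditions, lens in rules:
        if all(patient.get(attr) == value for attr, value in conditions.items()):
            return lens
    return default  # assumption: no matching rule means no lenses

print(recommend({"age": "young", "prescription": "myope",
                 "astigmatic": "no", "tear_rate": "normal"}))   # soft
print(recommend({"age": "young", "prescription": "myope",
                 "astigmatic": "no", "tear_rate": "reduced"}))  # none (default)
```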



Regression is similar to classification:
  First, construct a model
  Second, use the model to predict an unknown value
Methods
  Linear and multiple regression
  Non-linear regression, neural networks, SVR
Regression differs from classification:
  Classification predicts categorical class labels
  Regression models continuous-valued functions
2004/09/09
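The two steps (construct a model, then use it to predict) can be sketched with simple linear regression fitted by ordinary least squares. The data are made up so that the fitted line is exactly y = 1 + 2x; the function name is illustrative.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Step 1: construct the model from known (x, y) pairs (hypothetical data).
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a, b = fit_simple_linear_regression(xs, ys)
# Step 2: use the model to predict the unknown value at x = 6.
print(a + b * 6)  # 13.0
```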

Example: credit card users can be clustered according to

Card usage:
• frequent/seldom usage
• domestic/foreign transactions
• high/low amounts of money
• transactions of a specific type
•…
For each cluster, a fraud detection system can be developed, or other suitable products can be offered.

Attribute 1: (qualitative)
Status of existing checking account
A11 : ... < 0 DM
A12 : 0 <= ... < 200 DM
A13 : ... >= 200 DM / salary assignments for at least 1 year
A14 : no checking account
Attribute 2: (numerical)
Duration in month
Attribute 3: (qualitative)
Credit history
A30 : no credits taken/all credits paid back duly
A31 : all credits at this bank paid back duly
A32 : existing credits paid back duly till now
A33 : delay in paying off in the past
A34 : critical account/other credits existing (not at this bank)

Attribute 4: (qualitative)
Purpose
A40 : car (new)
A41 : car (used)
A42 : furniture/equipment
A43 : radio/television
A44 : domestic appliances
A45 : repairs
A46 : education
A47 : (vacation - does not exist?)
A48 : retraining
A49 : business
A410 : others
Attribute 15: (qualitative)
Housing
A151 : rent
A152 : own
A153 : for free
Attribute 16: (numerical)
Number of existing credits at this bank
Attribute 17: (qualitative)
Job
A171 : unemployed/ unskilled - non-resident
A172 : unskilled - resident
A173 : skilled employee / official
A174 : management/ self-employed/ highly qualified employee/ officer
[Figure: data layout, one record per applicant with columns checking account, duration, credit history, purpose, amount, …, and the class label good or bad]
Cross-selling is another important data mining application
What is the best additional offer, or best next offer (BNO), for each customer?
For example, a bank wants to sell automobile insurance when a customer takes out a car loan
The bank might decide to acquire a full-service insurance agency
A major manufacturer of diesel engines must also service engines
under warranty
Warranty claims come in from all around the world
Data mining is used to determine rules for routing claims
some are automatically approved
others require further research
Result: The manufacturer saves millions of dollars
Data mining also enables insurance companies and the federal government to save millions of dollars by not paying fraudulent medical insurance claims
A cellular phone company wanted to introduce a new service
They wanted to know which customers were the most likely
prospects
Data mining identified “sphere of influence” as a key indicator of
likely prospects
Sphere of influence is the number of different telephone numbers
that someone calls
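Computing this indicator is a distinct count per caller. A sketch, assuming call records are available as (caller, callee) pairs; the phone numbers and the function name are fictitious.

```python
from collections import defaultdict

def sphere_of_influence(call_records):
    """Number of distinct numbers each caller has called.

    call_records: iterable of (caller, callee) pairs.
    """
    callees = defaultdict(set)
    for caller, callee in call_records:
        callees[caller].add(callee)  # a set ignores repeated calls to the same number
    return {caller: len(numbers) for caller, numbers in callees.items()}

# Hypothetical call records.
calls = [
    ("555-0101", "555-0202"), ("555-0101", "555-0303"),
    ("555-0101", "555-0202"), ("555-0404", "555-0101"),
]
print(sphere_of_influence(calls))  # {'555-0101': 2, '555-0404': 1}
```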
Clustering is an undirected data mining technique that
finds groups of similar items
Based on previous purchase patterns, customers are
placed into groups
Customers in each group are assumed to have an
affinity for the same types of products
New product recommendations can be generated
automatically based on new purchases made by the
group
This is sometimes called collaborative filtering
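A minimal sketch of group-based recommendation, assuming each customer has already been placed in a group (for example by clustering past purchases). The customers, products, and the recommend_for helper are hypothetical; production collaborative-filtering systems use more refined scoring.

```python
from collections import Counter

def recommend_for(customer, purchases, cluster_of, top_n=2):
    """Recommend the products bought most often in the customer's group
    that the customer has not bought yet.

    purchases: dict customer -> set of products
    cluster_of: dict customer -> group id
    """
    group = [c for c in purchases if cluster_of[c] == cluster_of[customer]]
    counts = Counter(p for c in group for p in purchases[c])
    owned = purchases[customer]
    ranked = [p for p, _ in counts.most_common() if p not in owned]
    return ranked[:top_n]

purchases = {
    "c1": {"tent", "stove", "lamp"},
    "c2": {"tent", "stove"},
    "c3": {"tent", "lamp", "map"},
    "c4": {"novel", "bookmark"},
}
cluster_of = {"c1": 0, "c2": 0, "c3": 0, "c4": 1}
print(recommend_for("c2", purchases, cluster_of))  # ['lamp', 'map']
```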
Microbiology
[Figure: biology application domain, with the stages experiment design and hypothesis, experiment, microarray, image analysis, data warehouse, data mining, data analysis, and validation, and the label knowledge discovery in databases (KDD)]


Enterprise Resource Planning (ERP) systems generate large volumes of data.
Examples of data sources in manufacturing include:
Schedules
Production capacity, efficiency, failures, etc.
Manufacturing parameters
Process quality
Process plans
The learning stage focuses on discovering knowledge from
manufacturing processes:
Step 1: Similar parts and processes are grouped into clusters.
Step 2: Relevant processes are associated with each cluster.
The exploitation stage takes advantage of the clusters to
improve the efficiency of generation of process plans for new
parts:
Step 3: A new part to be manufactured is matched with a suitable
cluster.
Step 4: The new part is assigned the relevant process plan.
The specialization stage adapts the relevant process for the
new part:
Step 5: The relevant process is adapted to the new part.
Step 6: The new process plan data is incorporated into the database.
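Steps 3 and 4 can be sketched as a nearest-centroid lookup. Everything here is hypothetical: the part features, cluster names, centroids, and process plans. Features are left unscaled for brevity, and steps 5 and 6 (adapting the plan and storing it) are not shown.

```python
import math

# Hypothetical learning-stage output: cluster centroids over part features
# (length mm, diameter mm, number of holes) and the process plan attached to each.
CLUSTERS = {
    "shafts": {"centroid": (200.0, 20.0, 0.0), "plan": ["saw", "turn", "grind"]},
    "plates": {"centroid": (150.0, 5.0, 6.0), "plan": ["shear", "drill", "deburr"]},
}

def assign_process_plan(part_features):
    """Step 3: match the new part to the nearest cluster; step 4: copy its plan."""
    best = min(CLUSTERS,
               key=lambda name: math.dist(part_features, CLUSTERS[name]["centroid"]))
    return best, list(CLUSTERS[best]["plan"])

cluster, plan = assign_process_plan((190.0, 18.0, 0.0))
print(cluster, plan)  # shafts ['saw', 'turn', 'grind']
```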
Data Mining to Select Suppliers
Input feature set of a performance measure for suppliers

Feature  Content                              Feature  Content
F1       Quality of material (0, 1, 2, 3)     F10      Warranty (0/1)
F2       Track record (0, 1, 2, 3)            F11      Warehousing (0, 1, 2)
F3       Technical ability (0, 1, 2)          F12      Reliability (%)
F4       Tools and equipment (0, 1, 2, 3)     F13      Efficiency (%)
F5       Safety practices (0, 1, 2, 3)        F14      Dependability (0, 1, 2)
F6       Deliveries/shipments (0, 1, 2, 3)    F15      Frequency of rejects (times/year)
F7       Conformance to standards (0, 1, 2)   F16      Failure rate (%)
F8       Applicability of product (0, 1, 2)   F17      Offered price (0, 1, 2, 3)
F9       Product development (0, 1)           F18      Responsiveness to bidding (0, 1, 2)


Planning starts with demand forecasting
Demand forecasting indicates:
  What materials are needed? How much of each material is needed?
  Labor allocation
Which variables are needed?
  price, promotion value, competitor promotions, customer age, past demand
Hybrid of time-series forecasting and causal relationships


Given a set of sequences, find the complete set of frequent subsequences

SID  Sequence
10   <a(abc)(ac)d(cf)>
20   <(ad)c(bc)(ae)>
30   <(ef)(ab)(df)cb>
40   <eg(af)cbc>

Given support threshold min_sup = 2, <(ab)c> is a sequential pattern
Applications of sequential patterns:
  Customer shopping sequences: first buy a computer, then a CD-ROM, and then a digital camera, within 3 months
  Weblog click streams
  Telephone calling patterns
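The support claim can be checked by testing each sequence for the pattern. A sketch, assuming single-character items and the bracket notation used in the table: a pattern element matches a sequence element if it is a subset of it, and elements must match in order. The parser and helper names are illustrative.

```python
def parse_sequence(text):
    """Parse '<a(abc)(ac)d(cf)>' into a list of itemsets: [{'a'}, {'a','b','c'}, ...]."""
    elements, i, body = [], 0, text.strip("<>")
    while i < len(body):
        if body[i] == "(":
            j = body.index(")", i)
            elements.append(set(body[i + 1:j]))
            i = j + 1
        else:
            elements.append({body[i]})
            i += 1
    return elements

def contains(sequence, pattern):
    """True if each pattern element is a subset of a later sequence element, in order.
    Greedy earliest matching is sufficient for this containment test."""
    pos = 0
    for element in pattern:
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True

DB = {10: "<a(abc)(ac)d(cf)>", 20: "<(ad)c(bc)(ae)>",
      30: "<(ef)(ab)(df)cb>", 40: "<eg(af)cbc>"}

pattern = parse_sequence("<(ab)c>")
supporting = [sid for sid, s in DB.items() if contains(parse_sequence(s), pattern)]
print(supporting, len(supporting) >= 2)  # [10, 30] True
```

Mining all frequent subsequences (for example with GSP or PrefixSpan) builds on exactly this containment test.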





Direct mailing: who should be offered a particular product?
Remote sensing: determining water pollution from spectral images
Load forecasting: predicting demand for electric power
Intelligent ATMs: how much cash will be there tomorrow?
City planning: identifying groups of houses according to their house type, value, and geographical location
A few years ago, UPS had a labor dispute (a strike)
FedEx saw its volume increase
After the strike, FedEx's volume fell
FedEx identified the customers who had switched to it and then switched again to another carrier
These customers were using UPS again
FedEx made special offers to these customers to win them back
Can you find co-location patterns from the following sample dataset? [figure not reproduced]
Answer: [figure not reproduced]
Improves profit by limiting a campaign to the most likely responders
Reduces costs by excluding individuals least likely to respond
Uses RFM: recency, frequency, monetary value
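A minimal RFM sketch, assuming purchases are available as (customer, date, amount) rows. The data are hypothetical, and the simple rank-sum score is an illustrative substitute for the quintile scoring commonly used in practice.

```python
from datetime import date

def rfm_table(transactions, today):
    """(recency_days, frequency, monetary) per customer from (customer, date, amount) rows."""
    table = {}
    for customer, day, amount in transactions:
        last, count, total = table.get(customer, (day, 0, 0.0))
        table[customer] = (max(last, day), count + 1, total + amount)
    return {c: ((today - last).days, n, total) for c, (last, n, total) in table.items()}

def rank_scores(values, reverse):
    """Score each key 1..n by rank; sorting puts the worst value first (score 1)."""
    order = sorted(values, key=values.get, reverse=reverse)
    return {key: position + 1 for position, key in enumerate(order)}

transactions = [
    ("A", date(2024, 3, 1), 120.0), ("A", date(2024, 3, 20), 80.0),
    ("B", date(2023, 12, 5), 40.0),
    ("C", date(2024, 3, 25), 300.0), ("C", date(2024, 2, 10), 50.0),
    ("C", date(2024, 1, 15), 70.0),
]
table = rfm_table(transactions, today=date(2024, 4, 1))
r = rank_scores({c: v[0] for c, v in table.items()}, reverse=True)   # fewer days = better
f = rank_scores({c: v[1] for c, v in table.items()}, reverse=False)  # more purchases = better
m = rank_scores({c: v[2] for c, v in table.items()}, reverse=False)  # more spending = better
score = {c: r[c] + f[c] + m[c] for c in table}
targets = sorted(score, key=score.get, reverse=True)[:2]  # campaign limited to top 2
print(table["B"])  # (118, 1, 40.0)
print(targets)     # ['C', 'A']
```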
Predicts response rates to help staff call centers, plan inventory, etc.
Identifies the most important channel for each customer
Discovers patterns in customer data
A model takes a number of inputs, which
often come from databases, and it produces
one or more outputs
Sometimes, the purpose is to build the best
model
The best model yields the most accurate
output
Such a model may be viewed as a black box
Sometimes, the purpose is to better
understand what is happening
This model is more like a gray box
                 Actual Yes   Actual No
Predicted Yes       800           50
Predicted No         50          100

There are 1000 records in the model set
When the model predicts Yes, it is right 800/850 = 94% of the time
When the model predicts No, it is right 100/150 = 67% of the time
The model is correct 800 times in predicting Yes
The model is correct 100 times in predicting No
The model is wrong 100 times in total
The overall prediction accuracy is 900/1000 = 90%
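The figures above follow directly from the four cells of the matrix. A sketch; the function name is illustrative. In common terminology, "right when predicting Yes" is the precision for Yes and "right when predicting No" is the negative predictive value.

```python
def confusion_summary(tp, fp, fn, tn):
    """Summaries of a 2x2 confusion matrix (positive class = 'Yes').

    tp: predicted Yes, actually Yes    fp: predicted Yes, actually No
    fn: predicted No, actually Yes     tn: predicted No, actually No
    """
    total = tp + fp + fn + tn
    return {
        "right_when_predicting_yes": tp / (tp + fp),
        "right_when_predicting_no": tn / (tn + fn),
        "errors": fp + fn,
        "accuracy": (tp + tn) / total,
    }

s = confusion_summary(tp=800, fp=50, fn=50, tn=100)
print(round(s["right_when_predicting_yes"], 2))  # 0.94
print(round(s["right_when_predicting_no"], 2))   # 0.67
print(s["errors"], s["accuracy"])                # 100 0.9
```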





MSE
SSE
MAPE
MAD
R²
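These measures for numeric predictions (for example demand forecasts) can be computed as below. A sketch with made-up values; MAD is interpreted here as the mean absolute deviation of the errors, R² as 1 - SSE/SST, and MAPE requires nonzero actual values.

```python
def error_measures(actual, predicted):
    """Accuracy measures for numeric predictions or forecasts."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    sse = sum(e ** 2 for e in errors)                    # sum of squared errors
    mean_actual = sum(actual) / n
    sst = sum((a - mean_actual) ** 2 for a in actual)    # total sum of squares
    return {
        "SSE": sse,
        "MSE": sse / n,
        "MAD": sum(abs(e) for e in errors) / n,
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,  # percent
        "R2": 1 - sse / sst,
    }

m = error_measures(actual=[100, 120, 80, 100], predicted=[110, 115, 85, 90])
print(m["SSE"], m["MSE"], m["MAD"])  # 250 62.5 7.5
```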
Data mining is a tool to achieve goals
The goal is better service to customers
Only people know what to predict
Only people can make sense of rules
Only people can make sense of
visualizations
Only people know what is reasonable, legal,
tasteful
Human decision makers are critical to the
data mining process
Analyze available data (from the past)
Discover patterns, facts, and associations
Apply this knowledge to future actions
Does past data contain the important
business drivers?
e.g., demographic data
Is the business environment from the past
relevant to the future?
in the e-commerce era, what we know about the
past may not be relevant to tomorrow
users of the Web have changed since the late 1990s
Are the data mining models created from
past data relevant to the future?
have critical assumptions changed?
Form a learning relationship with your
customers
Notice their needs
On-line Transaction Processing Systems
Remember their preferences
Decision Support Data Warehouse
Learn how to serve them better
Data Mining
Act to make customers more profitable
Several years ago, Lands’ End could not recognize regular Christmas shoppers
some people generally don’t shop from catalogs
but spend hundreds of dollars every Christmas
if you only store 6 months of history, you will miss them
Victoria’s Secret builds customer loyalty with a no-hassle
returns policy
some “loyal customers” return several expensive outfits
each month
they are really “loyal renters”
Channels are the way a company interfaces
with its customers
Examples
Direct mail
Email
Banner ads
Telemarketing
Customer service centers
Messages on receipts
Key data about customers come from
channels
Channels are the source of data
Channels are the interface to customers
Channels enable a company to get a
particular message to a particular customer
Channel management is a challenge in
organizations
CRM is about serving customers through all
channels
The FBI handles numerous, complex cases
such as the Unabomber case
Leads come in from all over the country
The FBI and other law enforcement agencies
sift through thousands of reports from field
agents looking for some connection
Data mining plays a key role in FBI forensics








An application of data mining for marketing in telecommunication
Application of data mining to customer profile analysis in the power
electricity
Conditional Market Segmentation by Neural Networks
cluster analysis in Industrial market
marketing segmentation using support vector
Using data mining for manufacturing process selection
Data mining application in credit card business
…..
More often, a customer is an account
Retail banking
checking account, mortgage, auto loan, …
Telecommunications
long distance, local, ISP, mobile, …
Insurance
auto policy, homeowners, life insurance, …
Utilities
The account-level view of a customer also
misses the boat since each customer can
have multiple accounts
Childhood
birth, school, graduation, …
Young Adulthood
choose career, move away from parents, …
Family Life
marriage, buy house, children, divorce, …
Retirement
sell home, travel, hobbies, …
Much marketing effort is directed at each
stage of life
It is difficult to identify the appropriate events
graduation, retirement may be easy
marriage, parenthood are not so easy
many events are “one-time”
Companies miss or lose track of valuable
information
a man moves
a woman gets married, changes her last name,
and merges her accounts with spouse
It is hard to track your customers so closely,
but, to the extent that you can, many
marketing opportunities arise
Customers begin as prospects
Prospects indicate interest
fill out credit card applications
apply for insurance
visit your website
They become new customers
After repeated purchases or usage, they
become established customers
Eventually, they become former customers
either voluntarily or involuntarily
[Figure: Business Processes Organize Around the Customer Lifecycle. Processes: acquisition, activation, relationship management, winback. Customer states: prospect, new customer, established customer (high value, high potential, low value), former customer (voluntary churn, forced churn)]
Prospects receive marketing messages
When they respond, they become new customers
They make initial purchases
They become established customers and are
targeted by cross-sell and up-sell campaigns
Some customers are forced to leave (cancel)
Some leave (cancel) voluntarily
Others simply stop using the product (e.g., credit
card)
Winback/collection campaigns
The purpose of data warehousing is to keep
this data around for decision-support
purposes
Charles Schwab wants to handle all of their
customers’ investment dollars
Schwab observed that customers started
with small investments
By reviewing the history of many customers,
Schwab discovered that customers who
transferred large amounts into their Schwab
accounts did so soon after joining
After a few months, the marketing cost
could not be justified
Schwab’s marketing strategy changed as a
result
Prospect acquisition
Prospect product propensity
Best next offer
Forced churn
Voluntary churn
Bottom line: We use data mining to predict
certain events during the customer lifecycle
Prediction uses data from the past to make
predictions about future events (“likelihoods”
and “probabilities”)
Profiling characterizes past events and
assumes that the future is similar to the past
(“similarities”)
Description and visualization find patterns in
past data and assume that the future is similar
to the past
We use the noun churn as a synonym for
attrition
We use the verb churn as a synonym for leave
Why study attrition?
it is a well-defined problem
it has a clear business value
we know our customers and which ones are valuable
we can rely on internal data
the problem is well-suited to predictive modeling
Focus on keeping high-value customers
Focus on keeping high-potential customers
Allow low-potential customers to leave,
especially if they are costing money
Don’t intervene in every case
Topic should be called “managing customer
attrition”

Weka (Waikato Environment for Knowledge Analysis) is a Java-based data mining tool developed at the University of Waikato.

RapidMiner, http://www.rapidminer.com





Oracle Data Miner
http://www.oracle.com/technology/products/bi/odm/odminer.html
Data To Knowledge
http://alg.ncsa.uiuc.edu/do/tools/d2k
SAS
http://www.sas.com/
Clementine
http://spss.com/clemetine/
Intelligent Miner
http://www-306.ibm.com/software/data/iminer/

Data mining is a “decision support” process in which we search for patterns of information in data.
This technique can be used on many types of data.



