Machine Learning with
WEKA
- a reminder (?)


WEKA: A Machine
Learning Toolkit
The Explorer
•
based on notes by
•
Eibe Frank
•
•
Department of Computer Science,
University of Waikato, New Zealand
•



Classification and
Regression
Clustering
Association Rules
Attribute Selection
Data Visualization
The Experimenter
The Knowledge
Flow GUI
Conclusions
WEKA: the bird
Copyright: Martin Kramer (mkramer@wxs.nl)
4/13/2015
University of Waikato
2
WEKA: the software




Machine learning/data mining software written in
Java (distributed under the GNU Public License)
Used for research, education, and applications
Complements “Data Mining” by Witten & Frank
Main features:
Comprehensive set of data pre-processing tools,
learning algorithms and evaluation methods
 Graphical user interfaces (incl. data visualization)
 Environment for comparing learning algorithms

4/13/2015
University of Waikato
3
WEKA: versions

There are several versions of WEKA:
WEKA 3.0: “book version” compatible with
description in data mining book 1st edition
 WEKA 3.2: “GUI version” adds graphical user
interfaces (earlier version is command-line only)
 WEKA 3.4 ++ on SoC linux and ISS windows
 This talk is based on snapshots of WEKA 3.3
 … with some extra up-to-date snapshots
 Only changes are “layout” and some extras

4/13/2015
University of Waikato
4
WEKA only deals with “flat” files
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
4/13/2015
University of Waikato
5
WEKA only deals with “flat” files
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
4/13/2015
University of Waikato
6
4/13/2015
University of Waikato
7
4/13/2015
University of Waikato
8
4/13/2015
University of Waikato
9
Explorer: pre-processing the data




Data can be imported from a file in various
formats: ARFF, CSV, C4.5, binary
Data can also be read from a URL or from an SQL
database (using JDBC)
Pre-processing tools in WEKA are called “filters”
BUT it may be easier to reformat to ARFF yourself
(write a program in python / java … or just use
WordPad to type in the text – but make sure
format is right!), this helps with data understanding
4/13/2015
University of Waikato
10
4/13/2015
University of Waikato
11
4/13/2015
University of Waikato
12
4/13/2015
University of Waikato
13
4/13/2015
University of Waikato
14
4/13/2015
University of Waikato
15
4/13/2015
University of Waikato
16
4/13/2015
University of Waikato
17
Explorer: building “classifiers”


Classifiers in WEKA are models for predicting
nominal or numeric quantities
Implemented learning schemes include:


Decision trees and lists, instance-based classifiers,
support vector machines, multi-layer perceptrons,
logistic regression, Bayes’ nets, …
You explore by trying different classifiers, see
which works best for you…
4/13/2015
University of Waikato
18
4/13/2015
University of Waikato
19
4/13/2015
University of Waikato
20
4/13/2015
University of Waikato
21
4/13/2015
University of Waikato
22
4/13/2015
University of Waikato
23
4/13/2015
University of Waikato
24
4/13/2015
University of Waikato
25
4/13/2015
University of Waikato
26
4/13/2015
University of Waikato
27
4/13/2015
University of Waikato
28
4/13/2015
University of Waikato
29
4/13/2015
University of Waikato
30
4/13/2015
University of Waikato
31
4/13/2015
University of Waikato
32
4/13/2015
University of Waikato
33
4/13/2015
University of Waikato
34
4/13/2015
University of Waikato
35
4/13/2015
University of Waikato
36
4/13/2015
University of Waikato
37
4/13/2015
University of Waikato
38
4/13/2015
University of Waikato
39
4/13/2015
University of Waikato
40
4/13/2015
University of Waikato
41
4/13/2015
University of Waikato
42
4/13/2015
University of Waikato
43
4/13/2015
University of Waikato
44
4/13/2015
University of Waikato
45
4/13/2015
University of Waikato
46
4/13/2015
University of Waikato
47
4/13/2015
University of Waikato
48
4/13/2015
QuickTime™ and a TI FF (LZW) decompressor are needed t o see this picture.
University of Waikato
49
4/13/2015
QuickTime™ and a TI FF (LZW) decompressor are needed t o see this picture.
University of Waikato
50
4/13/2015
QuickTime™ and a TI FF (LZW) decompressor are needed t o see this picture.
University of Waikato
51
4/13/2015
University of Waikato
52
WEKA has more…







Clustering data into groups
Finding associations between attributes
Visualisation - online analytical processing
Experimenter to run and compare different MLs
Knowledge Flow GUI
3rd-party add-ons: sourceforge.net
http://www.cs.waikato.ac.nz/ml/weka
4/13/2015
University of Waikato
53
WEKA from ISS PC
2009
@relation ukus
@attribute center numeric
@attribute centre numeric
@attribute centerpercent numeric
@attribute color numeric
@attribute colour numeric
@attribute colorpercent numeric
@attribute english {UK,US}
@data
1,32,3, 0,20,0, UK
0,25,0, 0,12,0, UK
9,27,25, 0,84,0, UK
0,19,0, 0,24,0, UK
0,16,0, 0,14,0, UK
0,16,0, 0,12,0, UK
0,21,0, 0,38,0, UK
0,25,0, 0,34,0, UK
2,26,7, 2,3,40, UK
2,32,5, 1,59,2, UK
31,0,100, 55,0,100, US
61,0,100, 26,0,100, US
24,0,100, 11,0,100, US
12,1,92, 21,4,84, US
8,0,100, 4,2,67, US
10,0,100, 8,0,100, US
19,0,100, 22,0,100, US
14,0,100, 7,0,100, US
14,0,100, 6,0,100, US
8,5,62, 24,0,100, US
@relation test
@attribute center numeric
@attribute centre numeric
@attribute centerpercent numeric
@attribute color numeric
@attribute colour numeric
@attribute colorpercent numeric
@attribute english {UK,US}
@data
10,5,33, 0,20,0, UK
WEKA has more…







Clustering data into groups
Finding associations between attributes
Visualisation - online analytical processing
Experimenter to run and compare different MLs
Knowledge Flow GUI
3rd-party add-ons: sourceforge.net
http://www.cs.waikato.ac.nz/ml/weka
4/13/2015
University of Waikato
76