Machine Learning with WEKA - a reminder (?) WEKA: A Machine Learning Toolkit The Explorer • based on notes by • Eibe Frank • • Department of Computer Science, University of Waikato, New Zealand • Classification and Regression Clustering Association Rules Attribute Selection Data Visualization The Experimenter The Knowledge Flow GUI Conclusions WEKA: the bird Copyright: Martin Kramer (mkramer@wxs.nl) 4/13/2015 University of Waikato 2 WEKA: the software Machine learning/data mining software written in Java (distributed under the GNU Public License) Used for research, education, and applications Complements “Data Mining” by Witten & Frank Main features: Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods Graphical user interfaces (incl. data visualization) Environment for comparing learning algorithms 4/13/2015 University of Waikato 3 WEKA: versions There are several versions of WEKA: WEKA 3.0: “book version” compatible with description in data mining book 1st edition WEKA 3.2: “GUI version” adds graphical user interfaces (earlier version is command-line only) WEKA 3.4 ++ on SoC linux and ISS windows This talk is based on snapshots of WEKA 3.3 … with some extra up-to-date snapshots Only changes are “layout” and some extras 4/13/2015 University of Waikato 4 WEKA only deals with “flat” files @relation heart-disease-simplified @attribute age numeric @attribute sex { female, male} @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina} @attribute cholesterol numeric @attribute exercise_induced_angina { no, yes} @attribute class { present, not_present} @data 63,male,typ_angina,233,no,not_present 67,male,asympt,286,yes,present 67,male,asympt,229,yes,present 38,female,non_anginal,?,no,not_present ... 4/13/2015 University of Waikato 5 WEKA only deals with “flat” files @relation heart-disease-simplified @attribute age numeric @attribute sex { female, male} @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina} @attribute cholesterol numeric @attribute exercise_induced_angina { no, yes} @attribute class { present, not_present} @data 63,male,typ_angina,233,no,not_present 67,male,asympt,286,yes,present 67,male,asympt,229,yes,present 38,female,non_anginal,?,no,not_present ... 4/13/2015 University of Waikato 6 4/13/2015 University of Waikato 7 4/13/2015 University of Waikato 8 4/13/2015 University of Waikato 9 Explorer: pre-processing the data Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary Data can also be read from a URL or from an SQL database (using JDBC) Pre-processing tools in WEKA are called “filters” BUT it may be easier to reformat to ARFF yourself (write a program in python / java … or just use WordPad to type in the text – but make sure format is right!), this helps with data understanding 4/13/2015 University of Waikato 10 4/13/2015 University of Waikato 11 4/13/2015 University of Waikato 12 4/13/2015 University of Waikato 13 4/13/2015 University of Waikato 14 4/13/2015 University of Waikato 15 4/13/2015 University of Waikato 16 4/13/2015 University of Waikato 17 Explorer: building “classifiers” Classifiers in WEKA are models for predicting nominal or numeric quantities Implemented learning schemes include: Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, … You explore by trying different classifiers, see which works best for you… 4/13/2015 University of Waikato 18 4/13/2015 University of Waikato 19 4/13/2015 University of Waikato 20 4/13/2015 University of Waikato 21 4/13/2015 University of Waikato 22 4/13/2015 University of Waikato 23 4/13/2015 University of Waikato 24 4/13/2015 University of Waikato 25 4/13/2015 University of Waikato 26 4/13/2015 University of Waikato 27 4/13/2015 University of Waikato 28 4/13/2015 University of Waikato 29 4/13/2015 University of Waikato 30 4/13/2015 University of Waikato 31 4/13/2015 University of Waikato 32 4/13/2015 University of Waikato 33 4/13/2015 University of Waikato 34 4/13/2015 University of Waikato 35 4/13/2015 University of Waikato 36 4/13/2015 University of Waikato 37 4/13/2015 University of Waikato 38 4/13/2015 University of Waikato 39 4/13/2015 University of Waikato 40 4/13/2015 University of Waikato 41 4/13/2015 University of Waikato 42 4/13/2015 University of Waikato 43 4/13/2015 University of Waikato 44 4/13/2015 University of Waikato 45 4/13/2015 University of Waikato 46 4/13/2015 University of Waikato 47 4/13/2015 University of Waikato 48 4/13/2015 QuickTime™ and a TI FF (LZW) decompressor are needed t o see this picture. University of Waikato 49 4/13/2015 QuickTime™ and a TI FF (LZW) decompressor are needed t o see this picture. University of Waikato 50 4/13/2015 QuickTime™ and a TI FF (LZW) decompressor are needed t o see this picture. University of Waikato 51 4/13/2015 University of Waikato 52 WEKA has more… Clustering data into groups Finding associations between attributes Visualisation - online analytical processing Experimenter to run and compare different MLs Knowledge Flow GUI 3rd-party add-ons: sourceforge.net http://www.cs.waikato.ac.nz/ml/weka 4/13/2015 University of Waikato 53 WEKA from ISS PC 2009 @relation ukus @attribute center numeric @attribute centre numeric @attribute centerpercent numeric @attribute color numeric @attribute colour numeric @attribute colorpercent numeric @attribute english {UK,US} @data 1,32,3, 0,20,0, UK 0,25,0, 0,12,0, UK 9,27,25, 0,84,0, UK 0,19,0, 0,24,0, UK 0,16,0, 0,14,0, UK 0,16,0, 0,12,0, UK 0,21,0, 0,38,0, UK 0,25,0, 0,34,0, UK 2,26,7, 2,3,40, UK 2,32,5, 1,59,2, UK 31,0,100, 55,0,100, US 61,0,100, 26,0,100, US 24,0,100, 11,0,100, US 12,1,92, 21,4,84, US 8,0,100, 4,2,67, US 10,0,100, 8,0,100, US 19,0,100, 22,0,100, US 14,0,100, 7,0,100, US 14,0,100, 6,0,100, US 8,5,62, 24,0,100, US @relation test @attribute center numeric @attribute centre numeric @attribute centerpercent numeric @attribute color numeric @attribute colour numeric @attribute colorpercent numeric @attribute english {UK,US} @data 10,5,33, 0,20,0, UK WEKA has more… Clustering data into groups Finding associations between attributes Visualisation - online analytical processing Experimenter to run and compare different MLs Knowledge Flow GUI 3rd-party add-ons: sourceforge.net http://www.cs.waikato.ac.nz/ml/weka 4/13/2015 University of Waikato 76