Introduction to Weka

A Short Introduction to Weka
Natural Language Processing
Thursday, September 25th
What is weka?
● Java-based Machine Learning Tool
● Implements numerous classifiers
● 3 modes of operation
  – GUI
  – Command Line
  – Java API (not discussed here)
● Google: ‘weka java’
weka Homepage
● http://www.cs.waikato.ac.nz/ml/weka/
● To run:
  – java -Xmx1024M -jar ~cs4705/bin/weka.jar &
.arff file format
● http://www.cs.waikato.ac.nz/~ml/weka/arff.html

% 1. Title: Iris Plants Database
%
@RELATION iris
@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE sepalwidth NUMERIC
@ATTRIBUTE petallength NUMERIC
@ATTRIBUTE petalwidth NUMERIC
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
.arff file format
@attribute attrName {numeric, string, <nominal>, date}
● numeric: a number
● nominal: a (finite) set of strings, e.g. {Iris-setosa,Iris-versicolor,Iris-virginica}
● string: <arbitrary strings>
● date: (default ISO-8601) yyyy-MM-dd'T'HH:mm:ss
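A feature extractor ultimately has to emit a file in this format. The sketch below is one possible way to do that in Python, not part of Weka itself: the helper names `arff_attribute` and `to_arff` are hypothetical, and it assumes that values contain no spaces, commas, or quotes (such values would need quoting, which is not handled here).

```python
def arff_attribute(name, kind):
    """Return one @ATTRIBUTE line.

    `kind` is either a type name such as "numeric" or "string",
    or a list of allowed values for a nominal attribute.
    """
    if isinstance(kind, (list, tuple)):
        return "@ATTRIBUTE %s {%s}" % (name, ",".join(kind))
    return "@ATTRIBUTE %s %s" % (name, kind.upper())


def to_arff(relation, attributes, rows):
    """Build the text of an ARFF file from (name, kind) pairs and data rows."""
    lines = ["@RELATION " + relation, ""]
    lines += [arff_attribute(name, kind) for name, kind in attributes]
    lines += ["", "@DATA"]
    for row in rows:
        # One comma-separated line per instance, in attribute order.
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# The iris header and the single data row shown on the slide.
iris_attributes = [
    ("sepallength", "numeric"),
    ("sepalwidth", "numeric"),
    ("petallength", "numeric"),
    ("petalwidth", "numeric"),
    ("class", ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]),
]
iris_rows = [(5.1, 3.5, 1.4, 0.2, "Iris-setosa")]

arff_text = to_arff("iris", iris_attributes, iris_rows)
print(arff_text)
```

Writing `arff_text` to a file ending in `.arff` should give something the Explorer's 'Open file...' dialog can load, provided the assumptions above hold for your data.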
Example Arff Files
● ~cs4705/bin/weka-3-4-11/data/
● iris.arff
● soybean.arff
● weather.arff
To Classify with weka GUI
1. Run weka GUI
2. Click 'Explorer'
3. 'Open file...'
4. Select 'Classify' tab
5. 'Choose' a classifier
6. Confirm options
7. Click 'Start'
8. Wait...
9. Right-click on Result list entry
   a. 'Save result buffer'
   b. 'Save model'
Classify
● Some classifiers to start with.
  – NaiveBayes
  – JRip
  – J48
  – SMO
● Find References by selecting a classifier
● Use Cross-Validation!
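To make the idea behind cross-validation concrete, here is a minimal Python sketch of how N instances can be split into k folds, each used once for testing while the rest are used for training. The function name `k_fold_indices` is hypothetical, and this is only an illustration of the concept: Weka's own fold assignment may differ (for example, by shuffling or stratifying the data first).

```python
def k_fold_indices(n_instances, k):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    indices = list(range(n_instances))
    for fold in range(k):
        # Every k-th instance, starting at `fold`, is held out for testing.
        test = indices[fold::k]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


folds = list(k_fold_indices(10, 5))
for train, test in folds:
    print("train:", train, "test:", test)
```

Every instance ends up in exactly one test fold, so the reported accuracy is based on predictions for data the model did not see during training.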
Analyzing Results
● Important tools for Homework 2
  – Accuracy
    ● “Correctly classified instances”
  – F-measure
  – Confusion matrix
  – Save model
  – Visualization
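Accuracy and F-measure can both be derived from the confusion matrix, which helps when sanity-checking the numbers in a result buffer. The sketch below assumes rows are actual classes and columns are predicted classes; the function names and the 2x2 matrix are made up for illustration and are not Weka output.

```python
def accuracy(matrix):
    """Fraction of instances on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def f_measure(matrix, cls):
    """Harmonic mean of precision and recall for class index `cls`."""
    true_positives = matrix[cls][cls]
    if true_positives == 0:
        return 0.0
    predicted_as_cls = sum(row[cls] for row in matrix)  # column sum
    actually_cls = sum(matrix[cls])                     # row sum
    precision = true_positives / predicted_as_cls
    recall = true_positives / actually_cls
    return 2 * precision * recall / (precision + recall)


# Hypothetical two-class confusion matrix (rows: actual, columns: predicted).
example_matrix = [
    [8, 2],
    [1, 9],
]
example_accuracy = accuracy(example_matrix)
example_f = f_measure(example_matrix, 0)
print(example_accuracy, example_f)
```

For this made-up matrix, 17 of 20 instances lie on the diagonal, so accuracy is 0.85; the F-measure for the first class works out to 16/19.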
Running weka from the Command Line
● Running an N-fold cross validation experiment
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -x N -i
● Using a predefined test set
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -T testingdata.arff
● Saving the model
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -d output.model
● Classifying a test set
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -l input.model -T testingdata.arff
● Getting help
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -?
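When comparing several classifiers, it can be convenient to generate these command lines from a script. The following Python sketch only assembles and prints the cross-validation command; it does not run Weka. The helper `cross_validation_command` is hypothetical, and the package names for JRip, J48, and SMO are those of standard Weka distributions as far as I know, so check them against your installed version.

```python
import os
import shlex

# Jar location from the slides; expanduser leaves the path unchanged
# if the cs4705 account does not exist on this machine.
WEKA_JAR = os.path.expanduser("~cs4705/bin/weka.jar")

# Assumed fully qualified class names (only NaiveBayes appears in the slides).
CLASSIFIERS = {
    "NaiveBayes": "weka.classifiers.bayes.NaiveBayes",
    "JRip": "weka.classifiers.rules.JRip",
    "J48": "weka.classifiers.trees.J48",
    "SMO": "weka.classifiers.functions.SMO",
}


def cross_validation_command(classifier, train_arff, folds=10):
    """Return the argument list for an N-fold cross-validation run."""
    return [
        "java", "-cp", WEKA_JAR,
        CLASSIFIERS[classifier],
        "-t", train_arff,
        "-x", str(folds),
        "-i",
    ]


for name in CLASSIFIERS:
    command = cross_validation_command(name, "trainingdata.arff")
    print(" ".join(shlex.quote(arg) for arg in command))
```

To actually launch an experiment, the returned list could be passed to `subprocess.run(command, capture_output=True, text=True)` on a machine where Java and the jar are available.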
Homework 2 Weka Workflow
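[Figure: Homework 2 workflow, in three stages. Preprocessing (you): T1 … TN go through Your Feature Extractor to produce an .arff file. Experimentation (you): Weka produces results and a best model. Grading (us): S1 … SN go through Your Feature Extractor to produce a Test .arff, which Weka and the best model turn into results.]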
Tips for Homework Success
● Start early
● Read instructions carefully
● Start simply
● Your system should always work
  – 80/20 Rule
  – Add features incrementally
  – This way, you always have something you can turn in.