A Short Introduction to Weka
Natural Language Processing
Thursday, September 25th

What is weka?
● Java-based Machine Learning Tool
● Implements numerous classifiers
● 3 modes of operation
  – GUI
  – Command Line
  – Java API (not discussed here)
● Google: ‘weka java’

weka Homepage
● http://www.cs.waikato.ac.nz/ml/weka/
● To run:
  – java -Xmx1024M -jar ~cs4705/bin/weka.jar &

.arff file format
● http://www.cs.waikato.ac.nz/~ml/weka/arff.html

    % 1. Title: Iris Plants Database
    %
    @RELATION iris

    @ATTRIBUTE sepallength NUMERIC
    @ATTRIBUTE sepalwidth NUMERIC
    @ATTRIBUTE petallength NUMERIC
    @ATTRIBUTE petalwidth NUMERIC
    @ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}

    @DATA
    5.1,3.5,1.4,0.2,Iris-setosa

.arff file format
● @attribute attrName {numeric, string, <nominal>, date}
  – numeric: a number
  – nominal: a (finite) set of strings, e.g. {Iris-setosa,Iris-versicolor,Iris-virginica}
  – string: <arbitrary strings>
  – date: (default ISO-8601) yyyy-MM-dd'T'HH:mm:ss

Example Arff Files
● ~cs4705/bin/weka-3-4-11/data/
  – iris.arff
  – soybean.arff
  – weather.arff

To Classify with weka GUI
1. Run weka GUI
2. Click 'Explorer'
3. 'Open file...'
4. Select 'Classify' tab
5. 'Choose' a classifier
6. Confirm options
7. Click 'Start'
8. Wait...
9. Right-click on Result list entry
   a. 'Save result buffer'
   b. 'Save model'

Classify
● Some classifiers to start with.
  – NaiveBayes
  – JRip
  – J48
  – SMO
● Find References by selecting a classifier
● Use Cross-Validation!
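To make the cross-validation advice above concrete, here is a minimal sketch in Python of how N-fold cross-validation partitions a dataset: each instance lands in exactly one test fold, and a classifier would be trained on the remaining N-1 folds. This illustrates the idea only and is not Weka's implementation (Weka does the splitting for you). The function name `n_fold_splits` and the round-robin fold assignment are assumptions made for this example.

```python
def n_fold_splits(num_instances, n_folds):
    """Yield (train_indices, test_indices) once per fold.

    Instances are assigned to folds round-robin by index, so fold
    sizes differ by at most one. Hypothetical helper for illustration.
    """
    if not 2 <= n_folds <= num_instances:
        raise ValueError("need 2 <= n_folds <= num_instances")
    for fold in range(n_folds):
        # Instance i is held out for testing in fold (i mod n_folds).
        test = [i for i in range(num_instances) if i % n_folds == fold]
        train = [i for i in range(num_instances) if i % n_folds != fold]
        yield train, test


if __name__ == "__main__":
    for train, test in n_fold_splits(num_instances=10, n_folds=5):
        print("train:", train, "test:", test)
```

Because every instance is tested exactly once, the averaged score uses all of the data without ever testing on an instance the model was trained on. In practice the data would be shuffled before splitting.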

Analyzing Results
● Important tools for Homework 2
  – Accuracy
    ● “Correctly classified instances”
  – F-measure
  – Confusion matrix
  – Save model
  – Visualization

Running weka from the Command Line
● Running an N-fold cross validation experiment
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -x N -i
● Using a predefined test set
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -T testingdata.arff
● Saving the model
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -d output.model
● Classifying a test set
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -l input.model -T testingdata.arff
● Getting help
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -?

Homework 2 Weka Workflow
[Diagram; labels listed by stage, arrows inferred from label order]
● Preprocessing (you): T1 … TN → Your Feature Extractor → .arff
● Experimentation (you): .arff → Weka → results, best model
● Grading (us): S1 … SN → Your Feature Extractor → Test .arff → Weka → results

Tips for Homework Success
● Start early
● Read instructions carefully
● Start simply
● Your system should always work
  – 80/20 Rule
  – Add features incrementally
  – This way, you always have something you can turn in.
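
The feature-extractor stage of the workflow above has to emit .arff files for Weka. The following is a minimal sketch in Python of writing one, assuming only numeric and nominal attributes and assuming that names and values need no quoting (no spaces, commas, or quote characters). The function name `write_arff`, its parameters, and the example data are hypothetical, chosen for illustration; none of this is part of Weka.

```python
import io


def write_arff(out, relation, attributes, rows):
    """Write a simple ARFF file to the text stream `out`.

    attributes: list of (name, type) pairs, where type is either the
        string "NUMERIC" or a list of nominal values, e.g. ["yes", "no"].
    rows: list of value lists, one value per attribute, in order.
    Assumes no name or value needs quoting (hypothetical helper).
    """
    out.write(f"@RELATION {relation}\n\n")
    for name, attr_type in attributes:
        if isinstance(attr_type, str):
            out.write(f"@ATTRIBUTE {name} {attr_type}\n")
        else:
            # Nominal attribute: list the allowed values inside braces.
            values = ",".join(attr_type)
            out.write(f"@ATTRIBUTE {name} {{{values}}}\n")
    out.write("\n@DATA\n")
    for row in rows:
        if len(row) != len(attributes):
            raise ValueError("row length does not match attribute count")
        out.write(",".join(str(v) for v in row) + "\n")


if __name__ == "__main__":
    buf = io.StringIO()
    write_arff(
        buf,
        "iris",
        [
            ("sepallength", "NUMERIC"),
            ("sepalwidth", "NUMERIC"),
            ("petallength", "NUMERIC"),
            ("petalwidth", "NUMERIC"),
            ("class", ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]),
        ],
        [[5.1, 3.5, 1.4, 0.2, "Iris-setosa"]],
    )
    print(buf.getvalue(), end="")
```

Starting with one or two features written this way and adding more incrementally fits the advice above: at every step there is a valid .arff file that Weka can load.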