weka-intro - Lecturer EEPIS

WEKA 3.5.5
(source: Machine Learning with WEKA)
What is WEKA?
• Weka is a collection of machine learning algorithms for
data mining tasks.
• Weka contains tools for
– data pre-processing,
– classification,
– regression,
– clustering,
– association rules, and
– visualization.
• It is also well-suited for developing new machine learning
schemes.
Dataset
• A dataset is roughly equivalent to a two-dimensional
spreadsheet or database table.
• A dataset is a collection of examples.
• The external representation of an Instances class is an
ARFF file, which consists of a header describing the
attribute types, followed by the data as a comma-separated list.
Dataset - ARFF
• The ARFF Header Section
The ARFF Header section of the file contains the relation
declaration and attribute declarations.
– The @relation Declaration
The relation name is defined as the first line.
– The @attribute Declarations
Each attribute in the data set has its own @attribute
statement, which uniquely defines the attribute's name and its data
type. The order in which the attributes are declared determines their
column position in the data section of the file.
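The header layout described above can be sketched in a few lines of Python. This is only an illustration, not WEKA's own parser: the sample header, the attribute name sepalwidth, and the helper name parse_header are assumptions made for the example, and attribute names containing spaces are not handled.

```python
# Minimal sketch: read the @relation and @attribute lines of an ARFF header.
# SAMPLE_HEADER and parse_header are illustrative names, not part of WEKA.

SAMPLE_HEADER = """\
@RELATION iris
@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE sepalwidth NUMERIC
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
"""


def parse_header(text):
    """Return (relation_name, [(attribute_name, datatype), ...])."""
    relation = None
    attributes = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        parts = line.split(None, 1)
        keyword = parts[0].lower()
        rest = parts[1].strip() if len(parts) > 1 else ""
        if keyword == "@relation":
            relation = rest
        elif keyword == "@attribute":
            name_and_type = rest.split(None, 1)
            name = name_and_type[0]
            datatype = name_and_type[1].strip() if len(name_and_type) > 1 else ""
            attributes.append((name, datatype))
    return relation, attributes


relation, attributes = parse_header(SAMPLE_HEADER)
# The index of an attribute in this list is its column position in the data section.
for position, (name, datatype) in enumerate(attributes):
    print(position, name, datatype)
```

Keeping the attributes in a list (rather than a dictionary) preserves the declaration order, which is what ties each attribute to its column.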
ARFF - Data Types
• The <datatype> can be any of the following types:
– Numeric: real or integer numbers.
• integer is treated as numeric
• real is treated as numeric
– Nominal
– String
– Date
• The keywords numeric, real, integer, string and date
are case insensitive.
ARFF - Data Types Example
• @ATTRIBUTE sepallength NUMERIC
• @ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
• @ATTRIBUTE LCC string
• @attribute <name> date [<date-format>]
default format: yyyy-MM-dd'T'HH:mm:ss
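To make the data types concrete, the following sketch converts a raw value according to its declared datatype. It is a simplified illustration under stated assumptions: convert_value is a hypothetical helper, the strptime pattern is an assumed Python equivalent of the default format yyyy-MM-dd'T'HH:mm:ss, and a custom <date-format> is not handled.

```python
from datetime import datetime

# Assumed Python equivalent of the default ARFF date format yyyy-MM-dd'T'HH:mm:ss.
DEFAULT_DATE_FORMAT = "%Y-%m-%dT%H:%M:%S"


def convert_value(raw, datatype):
    """Convert one raw value according to its declared ARFF datatype (sketch)."""
    declared = datatype.strip()
    kind = declared.lower()  # the type keywords are case insensitive
    if kind in ("numeric", "real", "integer"):
        return float(raw)  # integer and real are both treated as numeric
    if kind == "string":
        return raw
    if kind == "date":
        return datetime.strptime(raw, DEFAULT_DATE_FORMAT)
    if declared.startswith("{") and declared.endswith("}"):
        # Nominal: the value must be one of the listed labels (case sensitive).
        allowed = [label.strip() for label in declared[1:-1].split(",")]
        if raw not in allowed:
            raise ValueError(f"{raw!r} is not one of {allowed}")
        return raw
    raise ValueError(f"unsupported datatype: {datatype!r}")


print(convert_value("5.1", "NUMERIC"))
print(convert_value("Iris-setosa", "{Iris-setosa,Iris-versicolor,Iris-virginica}"))
print(convert_value("2001-04-03T12:12:12", "date"))
```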
ARFF - Data Section
• The ARFF Data section of the file contains the data
declaration line and the actual instance lines.
– The @data Declaration
The @data declaration is a single line denoting the start of the
data segment in the file.
– The instance data
• Each instance is written on a single line.
• Attribute values are delimited by commas.
• Values must appear in the same order as the attribute declarations in the header section.
• Missing values are represented by a single question mark.
• Values of string and nominal attributes are case sensitive, and any value that contains a space must be quoted.
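The rules for the data section can be sketched with Python's standard csv module. This is a rough illustration rather than a full ARFF reader: the sample rows and the helper name parse_data are made up for the example, and the sketch assumes that quoted values use single quotes only.

```python
import csv

# Illustrative data section; the values are invented for this example.
SAMPLE_DATA = """\
@data
5.1,3.5,Iris-setosa
?,3.0,'Iris versicolor'
"""


def parse_data(text):
    """Return the instances after @data as lists; missing values become None."""
    lines = text.splitlines()
    # The @data declaration is a single line that starts the data segment.
    start = next(i for i, line in enumerate(lines) if line.strip().lower() == "@data")
    body = [line for line in lines[start + 1:] if line.strip()]
    # Assumption: values containing spaces are wrapped in single quotes.
    reader = csv.reader(body, quotechar="'", skipinitialspace=True)
    return [[None if value == "?" else value for value in row] for row in reader]


instances = parse_data(SAMPLE_DATA)
for instance in instances:
    print(instance)
```

Each returned list has one entry per attribute, in the order the attributes were declared in the header.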
Create an ARFF file
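An ARFF file can be written with any text editor. As a programmatic alternative, the sketch below writes one from Python. The helper names write_arff and format_value, the relation name weather, and the sample rows are all hypothetical; the sketch assumes that values contain no quote characters or commas.

```python
import io


def format_value(value):
    """Render one value for the data section (sketch)."""
    if value is None:
        return "?"  # missing values are written as a single question mark
    text = str(value)
    if " " in text:
        return f"'{text}'"  # values containing a space must be quoted
    return text


def write_arff(out, relation, attributes, rows):
    """Write a header (relation, attributes) followed by the data section."""
    out.write(f"@relation {relation}\n\n")
    for name, datatype in attributes:
        out.write(f"@attribute {name} {datatype}\n")
    out.write("\n@data\n")
    for row in rows:
        out.write(",".join(format_value(value) for value in row) + "\n")


buffer = io.StringIO()
write_arff(
    buffer,
    "weather",
    [("outlook", "{sunny,overcast,rainy}"), ("temperature", "numeric"), ("note", "string")],
    [["sunny", 85, "very hot"], ["overcast", None, "mild"]],
)
print(buffer.getvalue())
```

Passing an open file object instead of the StringIO buffer would write the same text to disk.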
WEKA 3.5.5
Program
• LogWindow: opens a log window that captures everything
printed to stdout or stderr. Useful for environments like
MS Windows, where WEKA is not started from a
terminal.
• Exit: closes WEKA.
Applications
• Explorer: for exploring data with WEKA.
• Experimenter: for performing experiments and
conducting statistical tests between learning schemes.
• KnowledgeFlow: supports essentially the same
functions as the Explorer but with a drag-and-drop
interface. One advantage is that it supports incremental
learning.
• SimpleCLI: provides a simple command-line interface
that allows direct execution of WEKA commands on
operating systems that do not provide their own
command-line interface.
Tools
• ArffViewer: an MDI application for viewing ARFF files in
spreadsheet format.
• SqlViewer: an SQL worksheet for querying
databases via JDBC.
• EnsembleLibrary: an interface for generating setups for
Ensemble Selection.
Visualization
• Plot: for plotting a 2D plot of a dataset.
• ROC: displays a previously saved ROC curve.
• TreeVisualizer: for displaying directed graphs, e.g., a
decision tree.
• GraphVisualizer: visualizes XML BIF or DOT format
graphs, e.g., for Bayesian networks.
• BoundaryVisualizer: allows the visualization of classifier
decision boundaries in two dimensions.
Windows
• Minimize: minimizes all current windows.
• Restore: restores all minimized windows.