Machine Learning in Practice
Lecture 3
Carolyn Penstein Rosé
Language Technologies Institute / Human-Computer Interaction Institute
Plan for Today

- Announcements
  - Assignment
  - Quiz
- Weka helpful hints
- Topic of the day: Input and Output
- More on cross-validation
- ARFF format

Weka Helpful Hints
Increase Heap Size
Weka Helpful Hint: Documentation!!
Click on More button!
Output Predictions Option
Important note: Because of the way Weka randomizes the data for
cross-validation, the only circumstance under which you can match
the instance numbers to positions in your data is if you have
separate train and test sets so the order will be preserved!
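
To make the note above concrete, here is a minimal Python sketch of why instance numbers stop lining up once the data are shuffled for cross-validation. It is not Weka's implementation: the seed, the way indices are dealt into folds, and the helper name `shuffled_folds` are all assumptions made for illustration. The only point is that a position inside a shuffled fold maps back to an original row only if the permutation is kept.

```python
import random

def shuffled_folds(n_instances, n_folds, seed=1):
    """Shuffle row indices, then deal them into folds (hypothetical helper)."""
    order = list(range(n_instances))
    random.Random(seed).shuffle(order)  # randomization step before splitting
    # Fold k receives every n_folds-th index of the shuffled order.
    return [order[k::n_folds] for k in range(n_folds)]

folds = shuffled_folds(n_instances=10, n_folds=5)

# Position i inside a fold is generally NOT row i of the original data.
for fold_number, original_rows in enumerate(folds, start=1):
    for position, row in enumerate(original_rows, start=1):
        print(f"fold {fold_number}, instance {position} -> original row {row}")

# With a separate, unshuffled test set the mapping is simply the identity.
test_rows = list(range(4))
```

Without the `folds` lists, the printed instance numbers alone could not be traced back to rows of the original table.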
View Classifier Errors
Input and Output Representations

Concept: the rule you want to learn
Instance: one data point from your training or testing data (row in table)
Attribute: one of the features that an instance is composed of (column in table)
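
Since the plan for today mentions the ARFF format, the following Python sketch shows how instances (rows) and attributes (columns) could be written out as a small ARFF file. The relation name, attribute names, and data values are all invented for illustration; only the `@relation`, `@attribute`, and `@data` keywords come from the ARFF format itself.

```python
# Hypothetical toy data: every name and value below is invented for illustration.
attributes = [
    ("age", "numeric"),            # numeric attribute
    ("color", ["blue", "green"]),  # nominal attribute with its allowed values
    ("label", ["yes", "no"]),      # class attribute
]
instances = [
    (23, "blue", "yes"),   # one instance = one row
    (41, "green", "no"),
]

def to_arff(relation, attributes, instances):
    """Render attribute declarations and rows as ARFF text."""
    lines = [f"@relation {relation}", ""]
    for name, kind in attributes:
        if isinstance(kind, list):
            # Nominal attributes list their values in curly braces.
            kind = "{" + ",".join(kind) + "}"
        lines.append(f"@attribute {name} {kind}")
    lines += ["", "@data"]
    for row in instances:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines)

arff_text = to_arff("toy", attributes, instances)
print(arff_text)
```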
Numeric versus Nominal Attributes
What kind of reasoning does your representation enable?
- Numeric attributes allow instances to be ordered
- Numeric attributes allow you to measure distance between instances
- Sometimes numeric attributes make too fine-grained a distinction

[Figure: number line with the values .2, .25, .28, .31, .35, .45, .47, .52, .6, .63]
Numeric versus Nominal Attributes

Numeric attributes can be discretized into nominal values
- Then you lose ordering and distance
- Another option is applying a function that maps a range of values into a single numeric value

Nominal attributes can be mapped into numbers
- e.g., decide that blue=1 and green=2
- But are inferences made based on this valid?

[Figure: number line with the values .2, .25, .28, .31, .35, .45, .47, .52, .6, .63]
[Figure: the same number line, with ranges of values mapped to the single values .2, .3, .5, .6]
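
The two options for numeric attributes, and the caveat about numbering nominal values, can be sketched in Python as follows. The values are the ones on the number line; the cut points and the nominal bin names are assumptions, since the slides do not say where the range boundaries lie.

```python
values = [0.2, 0.25, 0.28, 0.31, 0.35, 0.45, 0.47, 0.52, 0.6, 0.63]

# Assumed cut points; the slides do not state where the boundaries are.
cut_points = [0.27, 0.4, 0.55]
bin_names = ["low", "mid_low", "mid_high", "high"]  # nominal labels (invented)
bin_values = [0.2, 0.3, 0.5, 0.6]                   # one numeric value per range

def bin_index(x):
    """Return the index of the first range whose upper cut point exceeds x."""
    for i, cut in enumerate(cut_points):
        if x < cut:
            return i
    return len(cut_points)

# Option 1: discretize into nominal values (ordering and distance are lost).
nominal = [bin_names[bin_index(x)] for x in values]

# Option 2: map each range to a single numeric value (ordering is kept).
coarse = [bin_values[bin_index(x)] for x in values]

print(nominal)
print(coarse)

# Mapping nominal values to numbers makes arithmetic possible but not meaningful.
color_code = {"blue": 1, "green": 2}
print(color_code["green"] - color_code["blue"])  # a "distance" with no real meaning
```

Under option 1 a learner sees only four unrelated labels, while under option 2 it can still compare and subtract the coarsened values.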
Example!
Problem: Learn a rule that predicts how much time a person spends doing math problems each day
- Attributes: You know gender, age, socioeconomic status of parents, and chosen field, if any
- How would you represent age, and why?
What would you expect the target rule to look like?

Styles of Learning
- Classification – learn rules from labeled instances that allow you to assign new instances to a class
- Association – look for relationships between features, not just rules that predict a class from an instance (more general)
- Clustering – look for instances that are similar (involves comparisons of multiple features)
- Numeric Prediction (regression models)

Food Web
http://www.cas.psu.edu/DOCS/WEBCOURSE/WETLAND/WET1/identify.html
Food Web
What else would be affected if wheat were to disappear?
Food Web
How would you represent this data?
Food Web
What would the learned rule look like?
Food Web
What if you wanted a more general rule, e.g., Affects(Entity1, Entity2)?
122 rows altogether!
Now let’s look at the learned rule…
Does it have to be this complicated?
Food Web
What would your representation for Affects(Entity1, Entity2) look like?
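
One possible answer to the representation question, sketched in Python: one instance per ordered pair of entities, with two nominal attributes and a yes/no class. This is only an illustration, not the representation used in the lecture. Only "wheat" comes from the slides; the other entities, the links between them, and the choice to count indirect effects are assumptions.

```python
from itertools import permutations

# Hypothetical food web: only "wheat" appears in the slides; the other
# entities and all links are invented for illustration.
eaten_by = {
    "wheat": ["mouse"],
    "mouse": ["owl"],
    "owl": [],
}

def affected(entity):
    """Everything reachable from `entity` by following eaten_by links."""
    seen, stack = set(), list(eaten_by[entity])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(eaten_by[current])
    return seen

# One instance per ordered pair: Entity1, Entity2, and the class label.
rows = [
    (e1, e2, "yes" if e2 in affected(e1) else "no")
    for e1, e2 in permutations(eaten_by, 2)
]
for row in rows:
    print(row)
```

With only three entities this gives six rows; the table grows quickly as entities are added, which is one reason a pairwise representation can become large.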
More on Cross-Validation
Cross-Validation Exercise
[Figure: panels numbered 1 through 5]
What is the same?
What is different?
What surprises you?
Compare Folds with Tree Trained on Whole Set
[Figure: panels numbered 1 through 5]
Train Versus Test
Performance on Training Data
Performance on Testing Data
Which Model Do You Think Will Perform Best on Test Set?
[Figure: panels numbered 1 through 5]
Fold 1
Fold 2
Fold 3
Fold 4
Fold 5
Total Performance
What do you notice?
Total Performance
Average Kappa = .5
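
As a reminder of what a kappa value measures, here is a minimal Python sketch that computes Cohen's kappa from a confusion matrix and averages it over folds. The confusion matrices are made up, and averaging per-fold kappas is an assumption about how an average like the one above might be obtained; it is not necessarily how Weka reports its totals.

```python
def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows: actual, columns: predicted)."""
    total = sum(sum(row) for row in confusion)
    n = len(confusion)
    # Observed agreement: share of instances on the diagonal.
    observed = sum(confusion[i][i] for i in range(n)) / total
    # Chance agreement: product of row and column marginals, summed over classes.
    expected = sum(
        (sum(confusion[i]) / total) * (sum(row[i] for row in confusion) / total)
        for i in range(n)
    )
    return (observed - expected) / (1 - expected)

# Hypothetical per-fold confusion matrices; the numbers are invented.
fold_matrices = [
    [[8, 2], [2, 8]],
    [[9, 1], [1, 9]],
    [[6, 4], [4, 6]],
]
kappas = [cohen_kappa(m) for m in fold_matrices]
average_kappa = sum(kappas) / len(kappas)
print([round(k, 2) for k in kappas], round(average_kappa, 2))
```

Note how a single weak fold pulls the average down even when the other folds look fine.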
Starting to think about Error Analyses



Step 1: Look at the confusion matrix
Where are most of the errors occurring?
What are possible explanations for systematic errors you see?
- Are the instances in the confusable classes too similar to each other? If so, how can we distinguish them?
- Are we paying attention to the wrong features?
- Are we missing features that would allow us to see commonalities within classes that we are missing?
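
Step 1 can be sketched in Python as follows: build the confusion matrix from actual and predicted labels, then list the off-diagonal cells from most to least frequent. The labels and class names are invented for illustration.

```python
from collections import Counter

# Hypothetical gold labels and predictions; the class names are invented.
actual    = ["a", "a", "a", "b", "b", "b", "c", "c", "c", "c"]
predicted = ["a", "b", "b", "b", "b", "a", "c", "c", "c", "b"]

# Confusion matrix stored as (actual, predicted) -> count.
confusion = Counter(zip(actual, predicted))

# Keep only the off-diagonal cells, i.e. the errors.
errors = {pair: n for pair, n in confusion.items() if pair[0] != pair[1]}

# Print error cells from most to least frequent.
for (gold, guess), n in sorted(errors.items(), key=lambda kv: -kv[1]):
    print(f"actual={gold} predicted={guess}: {n}")

most_confused = max(errors, key=errors.get)
```

The pair in `most_confused` is where the questions above are worth asking first.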
What went wrong on Fold 3?
[Figure: panels numbered 1 through 5]
What went wrong on Fold 3?
Training Set Performance
Testing Set Performance
Hypotheses?
What’s the difference?
Hypothesis: Problem with first cut
Some Examples
What do you conclude?
The problem with Fold 3 was probably just a sampling fluke: the distribution of classes differed between the training and test sets.
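
A quick way to check for this kind of fluke is to compare class proportions in the training and test portions of a fold. The Python sketch below does that; the labels are invented to show a skewed split and do not reproduce the data from the lecture.

```python
from collections import Counter

def class_distribution(labels):
    """Map each class to its share of the labels."""
    counts = Counter(labels)
    return {cls: n / len(labels) for cls, n in counts.items()}

# Hypothetical fold: labels invented to show a skewed split.
train_labels = ["yes"] * 30 + ["no"] * 10
test_labels = ["yes"] * 3 + ["no"] * 7

train_dist = class_distribution(train_labels)
test_dist = class_distribution(test_labels)

# Report the gap in class share between train and test for every class.
for cls in sorted(set(train_dist) | set(test_dist)):
    train_share = train_dist.get(cls, 0.0)
    test_share = test_dist.get(cls, 0.0)
    gap = abs(train_share - test_share)
    print(f"{cls}: train={train_share:.2f} test={test_share:.2f} gap={gap:.2f}")
```

A large gap for any class suggests the fold was split unevenly; stratified cross-validation is the usual way to keep class proportions similar across folds.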