Computational Methods for Data Analysis
Massimo Poesio
INTRO TO MACHINE LEARNING
WHAT IS LEARNING?
• Memorizing something
• Learning facts through observation and exploration
• Developing motor and/or cognitive skills through practice
• Organizing new knowledge into general, effective representations
MACHINE LEARNING
- Grew out of work in AI
- A new capability for computers: solving problems where rules can’t be written by hand (e.g., spam filtering)
Examples:
- Database mining
Large datasets from the growth of automation and the web, e.g., web click data, medical records, biology, engineering
- Applications that can’t be programmed by hand
E.g., autonomous helicopters, handwriting recognition, most of Natural Language Processing (NLP), computer vision
- Self-customizing programs
E.g., Amazon and Netflix product recommendations
- Understanding human learning (the brain, real AI)
Machine Learning definition
• Arthur Samuel (1959). Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed.
• Tom Mitchell (1998). Well-posed Learning Problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
“A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.”
Suppose your email program watches which emails you do or do not mark as spam, and based on that learns how to better filter spam. What is the task T in this setting?
• Classifying emails as spam or not spam.
• Watching you label emails as spam or not spam.
• The number (or fraction) of emails correctly classified as spam/not spam.
• None of the above; this is not a machine learning problem.
A SPATIAL VIEW OF LEARNING
• Learning to discriminate between spam and non-spam can be pictured as learning how to discriminate between different types of objects in a space
A SPATIAL VIEW OF LEARNING
The task of the learner is to learn a function that divides the space of examples into black and red
EXAMPLE: SPAM
[Figure: spam and non-spam examples plotted as points in a two-dimensional feature space]
A MORE DIFFICULT EXAMPLE
ONE SOLUTION
ANOTHER SOLUTION
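The “more difficult example” and its two solutions can be reproduced in code: two different classifiers fit to the same points draw different boundaries, both consistent with the training data. Below is a minimal sketch, assuming scikit-learn is available; the 2D toy data and the choice of a linear vs. an RBF-kernel SVM are illustrative assumptions, not taken from the slides.

# Illustrative sketch: two different decision functions for the same toy data.
# The data below is invented; scikit-learn is assumed to be installed.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping 2D clusters, standing in for spam vs. non-spam points.
X = np.vstack([rng.normal([0, 0], 0.8, size=(50, 2)),
               rng.normal([2, 2], 0.8, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

# "One solution": a straight-line (linear) boundary.
linear_clf = SVC(kernel="linear").fit(X, y)
# "Another solution": a curved (RBF-kernel) boundary.
rbf_clf = SVC(kernel="rbf", gamma=1.0).fit(X, y)

for name, clf in [("linear boundary", linear_clf), ("curved boundary", rbf_clf)]:
    print(name, "training accuracy:", clf.score(X, y))

Both models fit the same examples; which boundary generalizes better to new points is a separate question from fitting the training data.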
LEARNING A FUNCTION
• Given a set of input/output pairs, find a function that does a good job of expressing the relationship:
– Word sense disambiguation as a function from words (the input) to their senses (the output); a toy sketch follows this list
– Categorizing email messages as a function from emails to their category (spam, useful)
– A checkers-playing strategy as a function from moves to their values (winning, losing)
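As a concrete toy version of the first bullet, the sketch below learns a function from a word in context (the input) to its sense (the output). The tiny ‘bank’ examples, the sense labels, and the bag-of-words + Naive Bayes model are all illustrative assumptions; scikit-learn is assumed to be available.

# Toy sketch: word sense disambiguation as a function from contexts to senses.
# The examples and sense labels are invented for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

contexts = [
    "he deposited cash at the bank this morning",
    "the bank approved her mortgage application",
    "they had a picnic on the bank of the river",
    "fish were jumping near the muddy bank",
]
senses = ["bank/FINANCE", "bank/FINANCE", "bank/RIVER", "bank/RIVER"]

# Bag-of-words features + Naive Bayes: one simple way to approximate the function.
wsd = make_pipeline(CountVectorizer(), MultinomialNB())
wsd.fit(contexts, senses)

print(wsd.predict(["she opened an account at the local bank"]))  # FINANCE sense
print(wsd.predict(["reeds grew along the bank of the stream"]))  # RIVER sense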
WAYS OF LEARNING A FUNCTION
• SUPERVISED: given a set of example input/output pairs, find a rule that does a good job of predicting the output associated with an input
• UNSUPERVISED learning or CLUSTERING: given a set of examples, but no labelling, group the examples into “natural” clusters
• REINFORCEMENT LEARNING: an agent interacting with the world makes observations, takes action, and is rewarded or punished; the agent learns to take action in order to maximize reward (a toy sketch follows this list)
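Supervised and unsupervised learning are illustrated in the slides that follow; reinforcement learning is not, so here is a minimal sketch of the idea on a made-up toy problem: a five-cell corridor where the agent is rewarded only for reaching the rightmost cell. Tabular Q-learning is used purely as an illustration, and only numpy is assumed.

# Toy reinforcement learning sketch: tabular Q-learning on an invented 5-cell corridor.
# The agent starts in cell 0, can move left (action 0) or right (action 1),
# and receives reward +1 only when it reaches cell 4 (which ends the episode).
import numpy as np

n_states, n_actions = 5, 2
alpha, gamma, epsilon = 0.5, 0.9, 0.1   # learning rate, discount factor, exploration rate
rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions))     # estimated value of each (state, action) pair

for episode in range(500):
    s = 0
    while s != n_states - 1:
        # Epsilon-greedy: mostly take the best-known action, occasionally explore.
        a = rng.integers(n_actions) if rng.random() < epsilon else int(Q[s].argmax())
        s_next = max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Q-learning update: move Q[s, a] toward reward + discounted best future value.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

# The learned greedy policy should be "move right" in every non-terminal state.
print(Q.argmax(axis=1)[:-1])   # expected: [1 1 1 1]

No labelled examples are ever given; the agent learns purely from the rewards it observes while acting.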
Supervised Learning
EXAMPLE: HOUSING PRICE PREDICTION
[Figure: house price ($ in 1000’s, from 0 to 400) plotted against size in square feet (from 0 to 2500)]
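The housing example can be sketched in code: fit a line to (size, price) pairs and use it to predict the price of a new house. A minimal sketch, assuming scikit-learn; the data points are invented to roughly resemble the plot.

# Toy regression sketch: predict house price (in $1000s) from size (in square feet).
# The data points are invented to roughly resemble the slide's plot.
import numpy as np
from sklearn.linear_model import LinearRegression

sizes = np.array([[500], [750], [1000], [1250], [1500], [1750], [2000], [2250]])
prices = np.array([110, 150, 190, 230, 260, 300, 330, 370])   # in $1000s

model = LinearRegression().fit(sizes, prices)
predicted = float(model.predict([[1600]])[0])
print("predicted price of a 1600 sq ft house:", round(predicted, 1), "($1000s)")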
Supervised Learning
• “Right answers” given
• Regression: predict a continuous-valued output (e.g., price)
• Classification: predict a discrete-valued output (0 or 1), e.g., spam vs. non-spam
[Figure: spam? (1 = Y, 0 = N) plotted against length of message in words]
Possible features for the spam classifier (a toy sketch follows this list):
- Length of message (words)
- Occurrence of the word ‘Nigeria’
- Occurrence of the phrase ‘million dollars’
- The From: address
- …
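Classification can be sketched in the same style, using features like the ones listed above (length of message, whether ‘Nigeria’ or ‘million dollars’ occurs). All feature values and labels below are invented for illustration, and scikit-learn is assumed to be available.

# Toy classification sketch: spam (1) vs. non-spam (0) from hand-built features.
# Each row: [length in words, contains "Nigeria" (0/1), contains "million dollars" (0/1)].
# All values are invented for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([
    [120, 1, 1],   # long message mentioning both phrases -> spam
    [200, 1, 0],   # spam
    [ 90, 0, 1],   # spam
    [ 35, 0, 0],   # short ordinary message -> non-spam
    [ 60, 0, 0],   # non-spam
    [ 15, 0, 0],   # non-spam
])
y = np.array([1, 1, 1, 0, 0, 0])

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict([[150, 1, 0], [20, 0, 0]]))   # expected: [1 0]

Unlike the regression example, the output here is a discrete label (0 or 1) rather than a continuous value.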
You’re running a company, and you want to develop learning algorithms to address each of two problems.
Problem 1: You have a large inventory of identical items. You want to predict how many of these items will sell over the next 3 months.
Problem 2: You’d like software to examine individual customer accounts, and for each account decide if it has been hacked/compromised.
Should you treat these as classification or as regression problems?
• Treat both as classification problems.
• Treat problem 1 as a classification problem, problem 2 as a regression problem.
• Treat problem 1 as a regression problem, problem 2 as a classification problem.
• Treat both as regression problems.
Unsupervised Learning
Supervised vs. Unsupervised Learning
[Figure: supervised learning shows points in (x1, x2) space with class labels; unsupervised learning shows the same kind of points without labels, to be grouped into clusters]
[Figure: clustering gene expression data, grouping individuals by genes; source: Daphne Koller]
Example applications of clustering (a toy sketch follows this list):
• Organize computing clusters
• Social network analysis
• Market segmentation
• Astronomical data analysis
(Image credit: NASA/JPL-Caltech/E. Churchwell, Univ. of Wisconsin, Madison)
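A minimal clustering sketch for the unlabelled (x1, x2) picture: k-means groups the points without ever seeing class labels. The two Gaussian blobs below are invented toy data, and scikit-learn is assumed to be available.

# Toy unsupervised learning sketch: k-means clustering of unlabelled 2D points.
# The two Gaussian blobs are invented toy data.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 0.5, size=(50, 2)),    # one natural cluster
               rng.normal([3, 3], 0.5, size=(50, 2))])   # another natural cluster

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("cluster sizes:", np.bincount(kmeans.labels_))        # roughly [50 50]
print("cluster centres:\n", kmeans.cluster_centers_.round(2))

The algorithm is given only the points, never the labels; it discovers the two groups on its own.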
Of the following examples, which would you address using an unsupervised learning algorithm? (Check all that apply.)
• Given email labeled as spam/not spam, learn a spam filter.
• Given a set of news articles found on the web, group them into sets of articles about the same story.
• Given a database of customer data, automatically discover market segments and group customers into different market segments.
• Given a dataset of patients diagnosed as either having diabetes or not, learn to classify new patients as having diabetes or not.
History of Machine Learning
• 1950s
– Samuel’s checker player
– Selfridge’s Pandemonium
• 1960s:
– Neural networks: Perceptron
– Pattern recognition
– Learning in the limit theory
– Minsky and Papert prove limitations of Perceptron
• 1970s:
– Symbolic concept induction
– Winston’s arch learner
– Expert systems and the knowledge acquisition bottleneck
– Quinlan’s ID3
– Michalski’s AQ and soybean diagnosis
– Scientific discovery with BACON
– Mathematical discovery with AM
History of Machine Learning (cont.)
• 1980s:
– Advanced decision tree and rule learning
– Explanation-based Learning (EBL)
– Learning and planning and problem solving
– Utility problem
– Analogy
– Cognitive architectures
– Resurgence of neural networks (connectionism, backpropagation)
– Valiant’s PAC Learning Theory
– Focus on experimental methodology
• 1990s
– Data mining
– Adaptive software agents and web applications
– Text learning
– Reinforcement learning (RL)
– Inductive Logic Programming (ILP)
– Ensembles: Bagging, Boosting, and Stacking
– Bayes Net learning
History of Machine Learning (cont.)
• 2000s
– Support vector machines
– Kernel methods
– Graphical models
– Statistical relational learning
– Transfer learning
– Sequence labeling
– Collective classification and structured outputs
– Computer systems applications
• Compilers
• Debugging
• Graphics
• Security (intrusion, virus, and worm detection)
– E-mail management
– Personalized assistants that learn
– Learning in robotics and vision
READINGS
• English:
– T. Mitchell, Machine Learning, McGraw-Hill, ch. 1
• Italian:
– R. Basili & A. Moschitti, Apprendimento automatico, in F. Bianchini et al., Instrumentum vocale
THANKS
• I used materials from
– Andrew Ng’s Coursera course at Stanford
– Ray Mooney’s ML course at the University of Texas at Austin