lecture 2 not ready - Villanova Department of Computing Sciences

advertisement
Lecture 2:
History and Overview of Machine Learning
CSC 4510 – Machine Learning
Dr. Mary-Angela Papalaskari
Department of Computing Sciences
Villanova University
Course website:
www.csc.villanova.edu/~map/4510/
CSC 4510 - M.A. Papalaskari - Villanova University
1
“It won’t truly be an autonomous
vehicle until you instruct it to drive
to work and it heads to the beach
instead.”
-Brad Templeton, Software designer and a consultant for the Google
project on Autonomous Vehicles
-NYTimes 1/24/12
-http://www.nytimes.com/2012/01/24/technology/googles-autonomous-vehicles-draw-skepticism-at-legal-symposium.html?_r=2&nl=technology&emc=techupdateema22
CSC 4510 - M.A. Papalaskari - Villanova University
2
What are the goals of AI research?
Artifacts that THINK
like HUMANS
Artifacts that ACT
like HUMANS
Artifacts that THINK
RATIONALLY
Artifacts that ACT
RATIONALLY
CSC 4510 - M.A. Papalaskari - Villanova University
3
A Bit of History
• Arthur Samuel (1959) wrote a program that learnt to play
checkers well enough to beat him.
CSC 4510 - M.A. Papalaskari - Villanova University
4
1940s
Advances in mathematical logic, information theory, concept of
neural computation
1943: McCulloch & Pitts Neuron
1948: Shannon: Information Theory
1949: Hebbian Learning
cells that fire together, wire together
1950s
Early computers. Dartmouth conference coins the phrase “artificial
intelligence” and Lisp is proposed as the AI programming language
1950: Turing Test
1956: Dartmouth Conference
1958: Friedberg: Learn Assembly Code
1959: Samuel: Learning Checkers
CSC 4510 - M.A. Papalaskari - Villanova University
5
1960s
A.I. funding increased (mainly military). Famous quote: “Within a
generation ... the problem of creating 'artificial intelligence' will
substantially be solved.”
Early symbolic reasoning approaches.
Logic Theorist, GPS, Perceptrons
1969: Minsky & Papert “Perceptrons”
1970s
A.I. “winter” – Funding dries up as people realize this is a hard
problem!
Limited computing power and dead-end frameworks lead to
failures.
eg: Machine Translation Failure
CSC 4510 - M.A. Papalaskari - Villanova University
6
1980s
Rule based “expert systems” used in medical / legal professions.
Bio-inspired algorithms (Neural networks, Genetic Algorithms).
Again: A.I. promises the world – lots of commercial investment
Expert Systems (Mycin, Dendral, EMYCIN
Knowledge Representation and reasoning:
Frames, Eurisko, Cyc, NMR, fuzzy logic
Speech Recognition (HEARSAY, HARPY, HWIM)
ML:
1982: Hopfield Nets, Decision Trees, GA & GP.
1986: Backpropagation, Explanation-Based Learning
CSC 4510 - M.A. Papalaskari - Villanova University
7
1990s
Some concrete successes begin to emerge. AI diverges into
separate fields: Computer Vision, Automated Reasoning, Planning
systems, Natural Language processing, Machine Learning…
…Machine Learning begins to overlap with statistics / probability
theory.
1992: Koza & Genetic Programming
1995: Vapnik: Support Vector Machines
CSC 4510 - M.A. Papalaskari - Villanova University
8
2000s
First commercial-strength applications: Google, Amazon, computer games, routefinding, credit card fraud detection, spam filters, etc…
Tools adopted as standard by other fields e.g. biology
CSC 4510 - M.A. Papalaskari - Villanova University
9
2010s…. ??????
CSC 4510 - M.A. Papalaskari - Villanova University
10
• Using machine learning to detect spam emails.
To: you@gmail.com
GET YOUR DIPLOMA TODAY!
If you are looking for a fast and cheap way to
get a diploma, this is the best way out for you.
Choose the desired field and degree and call
us right now: For US: 1.845.709.8044 Outside
US: +1.845.709.8044 "Just leave your NAME
& PHONE NO. (with CountryCode)" in the
voicemail. Our staff will get back to you in next
few days!
ALGORITHM
Naïve Bayes
Rule mining
CSC 4510 - M.A. Papalaskari - Villanova University
11
• Using machine learning to recommend books.
ALGORITHMS
Collaborative Filtering
Nearest Neighbour
Clustering
CSC 4510 - M.A. Papalaskari - Villanova University
12
• Using machine learning to identify faces and expressions.
ALGORITHMS
Decision Trees
Adaboost
CSC 4510 - M.A. Papalaskari - Villanova University
13
• Using machine learning to identify vocal patterns
ALGORITHMS
Feature Extraction
Probabilistic Classifiers
Support Vector Machines
+ many more….
CSC 4510 - M.A. Papalaskari - Villanova University
14
• ML for working with social network data: detecting
fraud, predicting click-thru patterns, targeted
advertising, etc etc etc .
ALGORITHMS
Support Vector Machines
Collaborative filtering
Rule mining algorithms
Many many more….
CSC 4510 - M.A. Papalaskari - Villanova University
15
Samuel’s definition of ML is still relevant
• Arthur Samuel (1959). Machine Learning:
Field of study that gives computers the ability
to learn without being explicitly programmed.
CSC 4510 - M.A. Papalaskari - Villanova University
16
Tom Mitchell (1998):
Well-posed Learning Problem
A computer program is said to learn from
experience E with respect to some task T and
some performance measure P, if its
performance on T, as measured by P,
improves with experience E.
CSC 4510 - M.A. Papalaskari - Villanova University
17
Defining the Learning Task
Improve on task, T, with respect to
performance metric, P, based on experience, E.
T: Playing checkers
P: Percentage of games won against an arbitrary opponent
E: Playing practice games against itself
T: Recognizing hand-written words
P: Percentage of words correctly classified
E: Database of human-labeled images of handwritten words
T: Driving on four-lane highways using vision sensors
P: Average distance traveled before a human-judged error
E: A sequence of images and steering commands recorded while
observing a human driver.
T: Determine which students like oranges or apples
P: Percentage of students’ preferences guessed correctly
E: Student attribute data
CSC 4510 - M.A. Papalaskari - Villanova University
18
Designing a Learning System
• Choose the training experience
• Choose exactly what is too be learned, i.e. the target function.
• Choose a learning algorithm to infer the target function from the
experience.
• A learning algorithm will also determine a performance measure
Learner
Environment/
Experience
Knowledge
Performance
Element
CSC 4510 - M.A. Papalaskari - Villanova University
19
Quick check:
Improve on task, T, with respect to
performance metric, P, based on experience, E.
Suppose your email program watches which emails you do or
do not mark as spam, and based on that learns how to better
filter spam. What is the task T in this setting?
• Watching you label emails as spam or not spam.
•Classifying emails as spam or not spam
• The number (or fraction) of emails correctly classified as
spam/not spam.
• None of the above—this is not a machine learning problem.
CSC 4510 - M.A. Papalaskari - Villanova University
20
Machine learning
• Supervised Learning
– Classification
– Regression
• Unsupervised learning
Others: Reinforcement learning, recommender
systems.
Also talk about: Practical advice for applying
learning algorithms.
CSC 4510 - M.A. Papalaskari - Villanova University
21
Machine learning
• Supervised Learning
– Classification
– Regression
• Unsupervised learning
Others: Reinforcement learning, recommender
systems.
Also talk about: Practical advice for applying
learning algorithms.
CSC 4510 - M.A. Papalaskari - Villanova University
22
Classification
• Example: Credit
scoring
• Differentiating
between low-risk
and high-risk
customers from
their income and
savings
Discriminant: IF income > θ1 AND savings > θ2
THEN low-risk ELSE high-risk
CSC 4510 - M.A. Papalaskari - Villanova University
23
Classification
• Example: Iris data
• 4 attributes
–
–
–
–
sepal length
sepal width
petal length
petal width
• Differentiating
between 3 different
types of iris
CSC 4510 - M.A. Papalaskari - Villanova University
24
Iris Data
more plots:
CSC 4510 - M.A. Papalaskari - Villanova University
25
Classification Tree
CSC 4510 - M.A. Papalaskari - Villanova University
26
Face Recognition
Training examples of a person
Test images
ORL dataset,
AT&T Laboratories, Cambridge UK
CSC 4510 - M.A. Papalaskari - Villanova University
27
Housing price prediction.
400
300
Price ($)
in 1000’s 200
100
0
0
500
1000
1500
2000
2500
Size in feet2
Supervised Learning
“right answers” given
Regression: Predict continuous
valued output (price)
CSC 4510 - M.A. Papalaskari - Villanova University
28
Machine learning
• Supervised Learning
– Classification
– Regression
• Unsupervised learning
Others: Reinforcement learning, recommender
systems.
Also talk about: Practical advice for applying
learning algorithms.
CSC 4510 - M.A. Papalaskari - Villanova University
29
Regression
• Example: Price
of a used car
• x : car attributes
y : price
y = g (x | q )
g ( ) model,
q parameters
y = wx+w0
30
CSC 4510 - M.A. Papalaskari - Villanova University
Regression Applications
• Navigating a car: Angle of the steering
• Kinematics of a robot arm
CSC 4510 - M.A. Papalaskari - Villanova University
31
Supervised Learning: Uses
• Prediction of future cases: Use the rule to
predict the output for future inputs
• Knowledge extraction: The rule is easy to
understand
• Compression: The rule is simpler than the data
it explains
• Outlier detection: Exceptions that are not
covered by the rule, e.g., fraud
CSC 4510 - M.A. Papalaskari - Villanova University
32
Quick check:
You’re running a company, and you want to develop learning algorithms to
address each of two problems.
Problem 1: You have a large inventory of identical items. You want to predict
how many of these items will sell over the next 3 months.
Problem 2: You’d like software to examine individual customer accounts, and for
each account decide if it has been hacked/compromised.
Should you treat these as classification or as regression problems?
•Treat both as classification problems.
•Treat problem 1 as a classification problem, problem 2 as a regression
problem.
•Treat problem 1 as a regression problem, problem 2 as a classification
problem.
•Treat both as regression problems.
Machine learning
• Supervised Learning
– Classification
– Regression
• Unsupervised learning
Others: Reinforcement learning, recommender
systems.
Also talk about: Practical advice for applying
learning algorithms.
CSC 4510 - M.A. Papalaskari - Villanova University
34
Supervised Learning
x
2
x
CSC 4510 - M.A. Papalaskari - Villanova University
1
35
Unsupervised Learning
x
2
x
CSC 4510 - M.A. Papalaskari - Villanova University
1
36
Unsupervised Learning
•
•
•
•
Learning “what normally happens”
No output
Clustering: Grouping similar instances
Example applications
– Customer segmentation
– Image compression: Color quantization
– Bioinformatics: Learning motifs
CSC 4510 - M.A. Papalaskari - Villanova University
37
CSC 4510 - M.A. Papalaskari - Villanova University
38
CSC 4510 - M.A. Papalaskari - Villanova University
39
CSC 4510 - M.A. Papalaskari - Villanova University
40
CSC 4510 - M.A. Papalaskari - Villanova University
41
Genes
Individuals
CSC 4510 - M.A. Papalaskari - Villanova University
[Source: Su-In Lee, Dana Pe’er, Aimee Dudley, George Church, Daphne Koller]
42
Organize computing clusters
Social network analysis
Image credit: NASA/JPL-Caltech/E. Churchwell (Univ. of Wisconsin, Madison)
Market segmentation
Astronomical data analysis
CSC 4510 - M.A. Papalaskari - Villanova University
43
Quick check:
Of the following examples, which would you address using an
unsupervised learning algorithm? (Check all that apply.)
•Given email labeled as spam/not spam, learn a spam filter.
Given a database of customer data, automatically discover market
segments and group customers into different market segments.
•Given a set of web pages found on the web, automatically
detect the ones that are syllabi for AI or software engineering
courses
•Given a dataset of patients diagnosed as either having diabetes or
not, learn to classify new patients as having diabetes or not.
•Given a database of nutrition data, automatically discover categories
of food items.
Machine learning
• Supervised Learning
– Classification
– Regression
• Unsupervised learning
Others: Reinforcement learning, recommender
systems.
Also talk about: Practical advice for applying
learning algorithms.
CSC 4510 - M.A. Papalaskari - Villanova University
45
Reinforcement Learning
•
•
•
•
•
•
Learning a policy: A sequence of outputs
No supervised output but delayed reward
Credit assignment problem
Game playing
Robot in a maze
Multiple agents, partial observability, ...
CSC 4510 - M.A. Papalaskari - Villanova University
46
Machine learning
• Supervised Learning
– Classification
– Regression
• Unsupervised learning
Others: Reinforcement learning, recommender
systems.
Also talk about: Practical advice for applying
learning algorithms.
CSC 4510 - M.A. Papalaskari - Villanova University
47
Supervised or Unsupervised learning?
Iris Data
Summary
• ML grew out of work in AI
Optimize a performance criterion using
example data or past experience.
• Types of learning
– Supervised
– Unsupervised
• Role of Statistics: Inference from a sample
• Role of Computer science:
– Data representation and modeling
– Efficient algorithms to solve optimization problems
– Representing and evaluating the model for inference
CSC 4510 - M.A. Papalaskari - Villanova University
49
Resources: Datasets
• UCI Repository:
http://www.ics.uci.edu/~mlearn/MLRepository.html
• UCI KDD Archive:
http://kdd.ics.uci.edu/summary.data.application.html
• Statlib: http://lib.stat.cmu.edu/
• Delve: http://www.cs.utoronto.ca/~delve/
CSC 4510 - M.A. Papalaskari - Villanova University
50
Resources: Journals
•
•
•
•
•
•
Journal of Machine Learning Research www.jmlr.org
Machine Learning
Neural Computation
Neural Networks
IEEE Transactions on Neural Networks
IEEE Transactions on Pattern Analysis and Machine
Intelligence
• Annals of Statistics
• Journal of the American Statistical Association
• ...
CSC 4510 - M.A. Papalaskari - Villanova University
51
Resources: Conferences
•
•
•
•
•
•
International Conference on Machine Learning (ICML)
European Conference on Machine Learning (ECML)
Neural Information Processing Systems (NIPS)
Uncertainty in Artificial Intelligence (UAI)
Computational Learning Theory (COLT)
International Conference on Artificial Neural Networks
(ICANN)
• International Conference on AI & Statistics (AISTATS)
• International Conference on Pattern Recognition (ICPR)
• ...
Some of the slides in this presentation are adapted from:
•
Prof. Frank Klassner’s ML class at Villanova
•
the University of Manchester ML course http://www.cs.manchester.ac.uk/ugt/COMP24111/
•
The Stanford online ML course http://www.ml-class.org/
CSC 4510 - M.A. Papalaskari - Villanova University
52
Download