
Amit Sethi, EEE, IIT G @ Cepstrum, Oct 16, 2011


Objectives:

 Understand what machine learning is

 Motivate why it has become so important

 Identify types of learning, salient frameworks and algorithms, and their utility

 Take a sneak peek at the next set of problems


 What is learning?

 Why learn?

 Types of learning and salient frameworks

 Frontiers


 Example: Learning to ride a bicycle

 T: Task of learning to ride a bicycle

 P: Performance of balancing while moving

 E: Experience of riding in many situations

 Is it wise to memorize all situations and appropriate responses by observing an expert?


Improve on task, T, with respect to performance metric, P, based on experience, E.

T: Playing checkers

P: Percentage of games won against an arbitrary opponent

E: Playing practice games against itself

T: Recognizing hand-written words

P: Percentage of words correctly classified

E: Database of human-labeled images of handwritten words

T: Driving on four-lane highways using vision sensors

P: Average distance traveled before a human-judged error

E: A sequence of images and steering commands recorded while observing a human driver.

T: Categorize email messages as spam or legitimate.

P: Percentage of email messages correctly classified.

E: Database of emails, some with human-given labels

Source: Introduction to Machine Learning by Raymond J. Mooney


Determine f such that y_n = f(x_n) and a loss g(y, x) is minimized for unseen (x, y) pairs.

The form of f is fixed, but some of its parameters can be tuned:

 So y = f_θ(x), where x is observed and y needs to be inferred

 e.g. y = 1 if mx > c, 0 otherwise, so θ = (m, c)

Machine learning is concerned with designing algorithms that learn “better” values of θ given “more” x (and y) for a given problem.
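To make this concrete, here is a minimal sketch (my own toy example, not from the slides) that tunes θ = (m, c) for the threshold rule above with a perceptron-style update:

```python
import numpy as np

# Hypothetical 1-D data: x is observed, y is the label to infer
rng = np.random.default_rng(0)
x = rng.uniform(-5, 5, 200)
y = (x > 1.5).astype(int)            # true rule: y = 1 if x > 1.5

m, c = 1.0, 0.0                      # initial guess for theta = (m, c)
lr = 0.01                            # learning rate

for epoch in range(100):
    for xi, yi in zip(x, y):
        pred = 1 if m * xi > c else 0
        err = yi - pred              # perceptron-style update on mistakes
        m += lr * err * xi
        c -= lr * err

print(f"learned theta: m = {m:.2f}, c = {c:.2f}")  # boundary x = c/m near 1.5
```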


What is the scope of the task?

How will performance be measured?

How should learning be approached?

 Scalability:

 How can we learn fast?

 How many resources are needed to learn?

 Generalization:

 How will it perform in unseen situations?

 Online learning:

 Can it learn and improve while performing the task?


Artificial Intelligence

Data Mining

Probability and Statistics

Information theory

Numerical optimization

Adaptive Control Theory

Neurobiology

Psychology (cognitive, perceptual, developmental)

Linguistics


 What is learning?

 Why learn?

 Types of learning and salient frameworks

 Frontiers


Develop systems that are too difficult/expensive to construct manually because they require specific detailed skills or knowledge tuned to a specific task (the knowledge engineering bottleneck).

Develop systems that can automatically adapt and customize themselves to individual users.

 Personalized news or mail filter

 Personalized tutoring

Discover new knowledge from large databases (data mining).

 Market basket analysis (e.g. diapers and beer)

 Medical text mining (e.g. migraines to calcium channel blockers to magnesium)

Source: Introduction to Machine Learning by Raymond J. Mooney


 Computational studies of learning may help us understand learning in humans and other biological organisms.

 Hebbian neural learning

▪ “Neurons that fire together, wire together.”

 Power law of practice: skill improves as a power function of the number of training trials (a straight line on a log-log plot)

Source: Introduction to Machine Learning by Raymond J. Mooney
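The Hebbian rule has a one-line form, Δw = η·x·y: a weight grows when its pre-synaptic input x and the post-synaptic activity y are active together. A toy sketch (entirely my own illustration; the constant drive term is an assumption):

```python
import numpy as np

eta = 0.1                            # learning rate
w = np.zeros(3)                      # synaptic weights
x = np.array([1.0, 0.0, 1.0])        # pre-synaptic activity pattern

for _ in range(10):
    y_post = w @ x + 1.0             # post-synaptic firing (toy drive term)
    w += eta * x * y_post            # Hebb: co-active connections strengthen

print(w)   # weights for co-active inputs grow; the silent input stays at 0
```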


 Many basic, effective, and efficient algorithms are available

 Large amounts of data available

 Large amounts of computational resources available

Source: Introduction to Machine Learning by Raymond J. Mooney


Automatic vehicle navigation

• Road recognition

• Automatic navigation

Speech recognition

• Speech to text

• Automated services over the phone

Face detection

• Facebook face tagging suggestions

• Camera autofocus for portraits


 What is learning?

 Why learn?

 Types of learning and salient frameworks

 Frontiers


 Remember y = f_θ(x)?

 y can be continuous or categorical

 y may be known for some x or none at all

 f can be simple (e.g. linear) or complex

 f can incorporate some knowledge of how x was generated, or be blind to it

 etc…


Supervised learning:

 For y = f_θ(x), a set of pairs (x_i, y_i) (y_i usually class labels) is known

 Now predict y_j for a new x_j

Examples:

 Two classes of proteins with given amino acid sequences

 Labeled male and female face images
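A minimal sketch of this setting (the data and the nearest-centroid rule are my own toy assumptions, not from the slides): "train" on labeled pairs (x_i, y_i), then predict y_j for a new x_j:

```python
import numpy as np

# Hypothetical labeled training pairs (x_i, y_i), two classes
X_train = np.array([[1.0, 2.0], [1.2, 1.8], [4.0, 5.0], [4.2, 4.8]])
y_train = np.array([0, 0, 1, 1])

# "Training": theta is just the two class centroids
centroids = np.array([X_train[y_train == k].mean(axis=0) for k in (0, 1)])

def predict(x_new):
    """Predict y_j for a new x_j: the label of the nearest centroid."""
    return int(np.argmin(np.linalg.norm(centroids - x_new, axis=1)))

print(predict(np.array([1.1, 2.1])))   # -> 0
print(predict(np.array([4.1, 5.1])))   # -> 1
```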


In a nutshell:

 Input is non-linearly transformed by the hidden layers; each hidden unit is typically a “fuzzy” (e.g. sigmoid) linear classifier of its inputs

 Output is a linear combination of the hidden layer

Use when:

 Want to model a non-linear function

 Labeled data is available

 Don’t want to write new software

Variations:

 Competitive learning for classification

 Many more…
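A tiny forward pass showing that structure (layer sizes and random weights are arbitrary assumptions for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical tiny MLP: 2 inputs -> 3 hidden units -> 1 output
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)   # hidden-layer parameters
w2, b2 = rng.normal(size=3), 0.0                # output-layer parameters

def forward(x):
    h = sigmoid(W1 @ x + b1)   # each hidden unit: a "fuzzy" linear classifier
    return w2 @ h + b2         # output: linear combination of the hidden layer

print(forward(np.array([0.5, -1.0])))
```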


In a nutshell:

 Learns the optimal (maximum-margin) boundary between two classes

Use when:

 Labeled class data is available

 Want to minimize the chance of error on unseen test cases

Variations:

 Non-linear mapping of the input vectors using “Kernels”
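A short sketch using scikit-learn's SVC (the toy data is an assumption; swapping kernel="linear" for "rbf" gives the kernelized, non-linear variant mentioned above):

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical labeled data for two classes
X = np.array([[0, 0], [1, 1], [2, 2], [8, 8], [9, 9], [10, 10]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear")     # kernel="rbf" etc. gives non-linear boundaries
clf.fit(X, y)
print(clf.predict([[3, 3], [7, 7]]))   # -> [0 1]
```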


Unsupervised learning:

 For y = f_θ(x), only a set of x_i is known

 Predict y such that y is simpler than x but retains its essence

Examples:

 Clustering (when y is a class label)

 Dimensionality reduction (when y is continuous)


In a nutshell:

 Grouping similar objects based on a definition of similarity

 That is, trading off intra- vs. inter-cluster similarity, e.g. distance from the cluster center

Use when:

 Class labels are not available, but you have a desired number of clusters in mind

Variations:

 Different similarity measures

 Automatic detection of number of clusters

 Online clustering
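A minimal k-means sketch (toy data and Euclidean distance are assumptions): the classic loop of assigning points to the nearest center and recomputing centers, as described above:

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Classic loop: assign points to the nearest center, recompute centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None] - centers, axis=2)  # point-center distances
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):                           # guard against empty clusters
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Hypothetical unlabeled data with two obvious groups
X = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]])
labels, centers = kmeans(X, k=2)
print(labels)    # two clusters, e.g. [0 0 1 1]
```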


In a nutshell:

 High-dimensional data where not all dimensions are independent, e.g. (x_1, x_2, x_3) where x_3 = a·x_1 + b·x_2 + c

Use when:

 You want to perform linear dimensionality reduction

Variations:

 ICA

 Online PCA
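A minimal PCA sketch via the SVD (the data, with x_3 nearly a linear function of x_1 and x_2, mirrors the example above; the coefficients are assumptions):

```python
import numpy as np

# Hypothetical 3-D data where x3 is (nearly) a linear function of x1 and x2
rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=100), rng.normal(size=100)
x3 = 2 * x1 - 0.5 * x2 + 1 + 0.01 * rng.normal(size=100)
X = np.column_stack([x1, x2, x3])

# PCA via SVD of the mean-centered data matrix
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
print(S)                 # third singular value is ~0: data is effectively 2-D
X_2d = Xc @ Vt[:2].T     # project onto the top two principal components
```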


In a nutshell:

 Learning a lower-dimensional manifold (e.g. surface) close to which the data lies

Use when:

 You want to perform nonlinear dimensionality reduction

Variations:

 SOM
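A short sketch of non-linear dimensionality reduction using scikit-learn's Isomap (the spiral data is an assumption; SOM, mentioned above, is a different algorithm in the same spirit):

```python
import numpy as np
from sklearn.manifold import Isomap

# Hypothetical data: a 1-D curve (spiral) embedded in 3-D
t = np.linspace(0, 3 * np.pi, 200)
X = np.column_stack([t * np.cos(t), t * np.sin(t), np.sin(2 * t)])

# "Unroll" the curve onto a single manifold coordinate
emb = Isomap(n_neighbors=10, n_components=1).fit_transform(X)
print(emb.shape)    # (200, 1)
```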


Generative models:

 For y = f_θ(x), we have some idea of how x was generated given y and θ

Examples:

 HMMs: Given phonemes and {age, gender}, we know how the speech can be generated

 Bayesian Networks: Given {gender, age, race} we have some idea of what a face will look like for different emotions
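A minimal sketch of the generative idea (toy 1-D data and class-conditional Gaussians are my own assumptions): model how x is generated for each class, then classify with Bayes' rule:

```python
import numpy as np

# Hypothetical 1-D data: one class-conditional Gaussian models p(x | y)
X = np.array([1.0, 1.2, 0.8, 4.0, 4.2, 3.8])
y = np.array([0, 0, 0, 1, 1, 1])

mu = np.array([X[y == k].mean() for k in (0, 1)])
sd = np.array([X[y == k].std() for k in (0, 1)])
prior = np.array([np.mean(y == k) for k in (0, 1)])

def classify(x_new):
    """Bayes' rule: pick the class whose generative story best explains x_new."""
    log_lik = -0.5 * ((x_new - mu) / sd) ** 2 - np.log(sd)
    return int(np.argmax(log_lik + np.log(prior)))

print(classify(1.1), classify(3.9))   # -> 0 1
```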


Discriminative Models:

 Do not care about how the data was generated

 Finding the right features is of prime importance

 Followed by finding the right classifier

Examples:

 SVM

 MLP

Source: “Automatic Recognition of Facial Actions in Spontaneous Expressions” by Bartlett et al., Journal of Multimedia, Sep 2006


 What is learning?

 Why learn?

 Types of learning and salient frameworks

 Frontiers


1980s:

 Advanced decision tree and rule learning

 Explanation-based Learning (EBL)

 Learning and planning and problem solving

 Utility problem

 Analogy

 Cognitive architectures

 Resurgence of neural networks (connectionism, backpropagation)

 Valiant’s PAC Learning Theory

 Focus on experimental methodology

1990s:

 Data mining

 Adaptive software agents and web applications

 Text learning

 Reinforcement learning (RL)

 Inductive Logic Programming (ILP)

 Ensembles: Bagging, Boosting, and Stacking

 Bayes Net learning

Source: Introduction to Machine Learning by Raymond J. Mooney


2000s:

 Support vector machines

 Kernel methods

 Graphical models

 Statistical relational learning

 Transfer learning

 Sequence labeling

 Collective classification and structured outputs

 Computer Systems Applications

▪ Compilers

▪ Debugging

▪ Graphics

▪ Security (intrusion, virus, and worm detection)

 Email management

 Personalized assistants that learn

 Learning in robotics and vision

Source: Introduction to Machine Learning by Raymond J. Mooney


Bioinformatics

• Gene expression prediction (just scratched the surface)

• Automated drug discovery

Speech recognition

• Context recognition, e.g. for digital personal assistants (Siri?)

• Translation better than Google Translate; imagine visiting Brazil

Image and video processing

• Automatic event detection in video

• “Seeing” software for the blind


Robotics

• Where is my iRobot?

• Would you raise a “robot” child and make it learn?

Advanced scientific calculations

• Weather modeling through prediction

• Vector field or FEM calculation through prediction

Who knows…

• Always in search of new problems


 Learning the structure of classifiers

 Automatic feature discovery and active learning

 Discovering the limits of learning

 Information theoretic bounds?

 Learning that never ends

 Explaining human learning

 Computer languages with ML primitives

Adapted from: “The Discipline of Machine Learning” by Tom Mitchell, 2006


Thank you!


Inference: Using a system to get the output variable for a given input variable

Learning: Changing parameters according to an algorithm to improve performance

Training: Using a machine learning algorithm to learn function parameters from an input and (optionally) output dataset known as the “training set”

Validation and Testing: Using inference (without training) to measure the performance of the learned system on held-out data

Offline learning: When all training happens prior to testing, and no learning takes place during testing

Online learning: When learning continues while the system performs the task, i.e. training and testing happen on the same data stream
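A toy contrast of the last two terms, using a running mean as the “parameter” (entirely my own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, size=1000)   # a stream of observations

# Offline learning: fit the parameter on the whole training set, then freeze it
theta_offline = data.mean()

# Online learning: update the parameter one sample at a time, while in use
theta, n = 0.0, 0
for xi in data:
    n += 1
    theta += (xi - theta) / n           # incremental (running) mean update

print(theta_offline, theta)             # both estimates are ~3.0
```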

