Amit Sethi, EEE, IIT G @ Cepstrum, Oct 16, 2011

Objectives:

 Understand what machine learning is

 Motivate why it has become so important

 Identify types of learning and salient frameworks, algorithms, and their utility

 Take a sneak peek at the next set of problems

 What is learning?

 Why learn?

 Types of learning and salient frameworks

 Frontiers

 Example: Learning to ride a bicycle

 T: Task of learning to ride a bicycle

 P: Performance of balancing while moving

 E: Experience of riding in many situations

 Is it wise to memorize all situations and appropriate responses by observing an expert?

Improve on task, T, with respect to performance metric, P, based on experience, E.

T: Playing checkers

P: Percentage of games won against an arbitrary opponent

E: Playing practice games against itself

T: Recognizing hand-written words

P: Percentage of words correctly classified

E: Database of human-labeled images of handwritten words

T: Driving on four-lane highways using vision sensors

P: Average distance traveled before a human-judged error

E: A sequence of images and steering commands recorded while observing a human driver.

T: Categorize email messages as spam or legitimate.

P: Percentage of email messages correctly classified.

E: Database of emails, some with human-given labels

Source: Introduction to Machine Learning by Raymond J. Mooney

Determine f such that y_n = f(x_n) and g(y, x) is minimized for unseen (x, y) pairs.

Form of f is fixed, but some parameters can be tuned:

 So, y = f_θ(x), where x is observed and y needs to be inferred

 e.g. y = 1 if mx > c, 0 otherwise, so θ = (m, c)

Machine Learning is concerned with designing algorithms that learn “better” values of θ given “more” x (and y) for a given problem
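
The threshold rule above can be sketched in code. This is only a minimal illustration under stated assumptions: the function names (`predict`, `error_count`, `fit_threshold`), the toy data, and the choice of g as a count of misclassified pairs are all invented for the example and are not from the lecture.

```python
def predict(theta, x):
    """y = f_theta(x): 1 if m*x > c, else 0."""
    m, c = theta
    return 1 if m * x > c else 0


def error_count(theta, xs, ys):
    """Assumed form of g: number of misclassified (x, y) pairs."""
    return sum(predict(theta, x) != y for x, y in zip(xs, ys))


def fit_threshold(xs, ys):
    """Brute-force search for theta = (m, c) with the fewest training errors."""
    pts = sorted(set(xs))
    # Candidate cut points: midpoints between neighbours, plus one beyond each end.
    cuts = [pts[0] - 1.0] + [(p + q) / 2 for p, q in zip(pts, pts[1:])] + [pts[-1] + 1.0]
    best = None
    for m in (1.0, -1.0):  # in 1-D only the sign of m matters
        for cut in cuts:
            theta = (m, m * cut)  # m*x > m*cut means x > cut (m=1) or x < cut (m=-1)
            err = error_count(theta, xs, ys)
            if best is None or err < best[0]:
                best = (err, theta)
    return best[1]


# Toy training data (made up): larger x tends to mean y = 1.
train_x = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
train_y = [0, 0, 0, 1, 1, 1]
theta = fit_threshold(train_x, train_y)

# Inference on unseen x values.
print(theta, predict(theta, 2.0), predict(theta, 5.0))
```

Giving the search "more" (x, y) pairs adds more candidate cut points, which is one simple sense in which more data can lead to "better" values of θ.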

What is the scope of the task?

How will performance be measured?

How should learning be approached?

 Scalability:

 How can we learn fast?

 How many resources are needed to learn?

 Generalization:

 How will it perform in unseen situations?

 Online learning:

 Can it learn and improve while performing the task?

Artificial Intelligence

Data Mining

Probability and Statistics

Information theory

Numerical optimization

Adaptive Control Theory

Neurobiology

Psychology (cognitive, perceptual, dev.)

Linguistics

Why learn?

Develop systems that are too difficult/expensive to construct manually because they require specific detailed skills or knowledge tuned to a specific task (the knowledge engineering bottleneck).

Develop systems that can automatically adapt and customize themselves to individual users.

 Personalized news or mail filter

 Personalized tutoring

Discover new knowledge from large databases (data mining).

 Market basket analysis (e.g. diapers and beer)

 Medical text mining (e.g. migraines to calcium channel blockers to magnesium)


 Computational studies of learning may help us understand learning in humans and other biological organisms.

 Hebbian neural learning

▪ “Neurons that fire together, wire together.”

 Power law of practice: log(performance time) falls roughly linearly with log(# training trials)


 Many basic effective and efficient algorithms available

 Large amounts of data available

 Large amounts of computational resources available


Automatic vehicle navigation

• Road recognition

• Automatic navigation

Speech recognition

• Speech to text

• Automated services over the phone

Face detection

• Facebook face tagging suggestions

• Camera autofocus for portraits

Types of learning and salient frameworks

 Remember, y = f_θ(x)?

 y can be continuous or categorical

 y may be known for some x or none at all

 f can be simple (e.g. linear) or complex

 f can incorporate some knowledge of how x was generated or be blind to the generation

 etc…

Supervised learning:

 For y = f_θ(x), a set of pairs (x_i, y_i) is known (the y_i are usually classes)

 Now predict y_j for new x_j

Examples:

 Two classes of protein with given amino acid sequences

 Labeled male and female face images

Neural networks (multilayer perceptrons):

In a nutshell:

 Input is non-linearly transformed by hidden layers; each hidden unit is usually a “fuzzy” (soft-thresholded) linear classifier of its inputs

 Output is a linear combination of the hidden layer

Use when:

 Want to model a non-linear function

 Labeled data is available

 You don’t want to write new software

Variations:

 Competitive learning for classification

 Many more…
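
The description above matches a multilayer perceptron with one hidden layer. A minimal sketch of its forward pass follows, assuming sigmoid hidden units and hand-picked (not learned) weights. The function names and the XOR example are illustrative choices, not taken from the lecture.

```python
import math


def sigmoid(z):
    """Soft threshold: a 'fuzzy' version of the step function."""
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(x, hidden_w, hidden_b, out_w, out_b):
    """One hidden layer of sigmoid units, then a linear output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(hidden_w, hidden_b)]
    return sum(w * h for w, h in zip(out_w, hidden)) + out_b


# Hand-picked weights: hidden unit 0 acts like OR, hidden unit 1 like AND.
# Their difference approximates XOR, which no single linear classifier can model.
hidden_w = [[20.0, 20.0], [20.0, 20.0]]
hidden_b = [-10.0, -30.0]
out_w = [1.0, -1.0]
out_b = 0.0

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, round(mlp_forward(x, hidden_w, hidden_b, out_w, out_b), 3))
```

In practice the weights would be learned from labeled data (for example by backpropagation) rather than set by hand; the sketch only shows the inference step.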

Support vector machines (SVM):

In a nutshell:

 Learns the optimal (maximum-margin) boundary between two classes (the red line in the original figure)

Use when:

 Labeled class data is available

 Want to minimize chance of error in the test case

Variations:

 Non-linear mapping of the input vectors using “Kernels”
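
The boundary-learning idea above corresponds to a support vector machine. A rough sketch of a linear one follows, assuming full-batch subgradient descent on the regularized hinge loss; this is a simple stand-in for the quadratic-programming solvers normally used. The data, the function names, and the hyperparameters (`lam`, `lr`, `epochs`) are made up for illustration.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=300):
    """Subgradient descent on lam/2 * |w|^2 + mean hinge loss (labels are +1/-1)."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(points)
    for _ in range(epochs):
        grad_w = [lam * w[0], lam * w[1]]
        grad_b = 0.0
        for (x0, x1), y in zip(points, labels):
            # Only points on the wrong side of the margin contribute to the gradient.
            if y * (w[0] * x0 + w[1] * x1 + b) < 1:
                grad_w[0] -= y * x0 / n
                grad_w[1] -= y * x1 / n
                grad_b -= y / n
        w = [w[0] - lr * grad_w[0], w[1] - lr * grad_w[1]]
        b -= lr * grad_b
    return w, b


def classify(w, b, x):
    """Which side of the learned boundary w.x + b = 0 does x fall on?"""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else -1


# Toy, linearly separable data (invented for the example).
points = [(2, 2), (3, 3), (3, 2), (0, 0), (-1, 0), (0, -1)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
print(w, b, classify(w, b, (4, 4)), classify(w, b, (-2, -2)))
```

A kernel variant would replace the dot product with a kernel function, which this linear sketch does not attempt.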

Unsupervised learning:

 For y = f_θ(x), only a set of x_i is known

 Predict y, such that y is simpler than x but retains its essence

Examples:

 Clustering (when y is a class label)

 Dimensionality reduction (when y is continuous)

Clustering (e.g. k-means):

In a nutshell:

 Grouping similar objects based on a definition of similarity

 That is, high intra-cluster vs. low inter-cluster similarity, e.g. measured by distance from the center of the cluster

Use when:

 Class labels are not available, but you have a desired number of clusters in mind

Variations:

 Different similarity measures

 Automatic detection of number of clusters

 Online clustering
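
Grouping by distance from a cluster center, with the number of clusters chosen in advance, is what k-means does. A minimal sketch follows, assuming squared Euclidean distance as the similarity measure and a naive initialization from the first k points; the function names and data are illustrative only.

```python
def sq_dist(p, q):
    """Squared Euclidean distance, the assumed (dis)similarity measure."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def kmeans(points, k, iters=20):
    centers = list(points[:k])  # naive initialization: the first k points
    for _ in range(iters):
        labels = assign(points, centers)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = centroid(members)
    return centers, assign(points, centers)


# Two well-separated groups (invented data); no class labels are supplied.
points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
centers, labels = kmeans(points, k=2)
print(centers, labels)
```

Swapping `sq_dist` for another function is one way to explore the "different similarity measures" variation.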

Principal component analysis (PCA):

In a nutshell:

 High-dimensional data, where not all dimensions are independent, e.g. (x_1, x_2, x_3), where x_3 = a x_1 + b x_2 + c

Use when:

 You want to perform linear dimensionality reduction

Variations:

 ICA

 Online PCA
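
The redundancy in the example x_3 = a x_1 + b x_2 + c can be checked numerically. The sketch below, under assumed values of a, b, c and a made-up grid of points, builds the covariance matrix, shows that the variance along the direction (a, b, -1) vanishes, and finds the leading principal component by power iteration. All function names are hypothetical, and power iteration is just one simple way to get an eigenvector.

```python
import math


def covariance(data):
    """Sample covariance matrix of the rows in `data`."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def variance_along(cov, v):
    """Variance of the data projected onto unit vector v, i.e. v^T C v."""
    d = len(v)
    return sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))


def top_component(cov, iters=100):
    """Leading eigenvector of cov (first principal component) by power iteration."""
    v = [1.0] * len(cov)
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Assumed coefficients and a 5x5 grid of (x1, x2) values.
a, b, c = 2.0, -1.0, 3.0
data = [(x1, x2, a * x1 + b * x2 + c) for x1 in range(5) for x2 in range(5)]
cov = covariance(data)

# The data lies on a plane whose normal is (a, b, -1), so it has no spread there.
scale = math.sqrt(a * a + b * b + 1.0)
normal = [a / scale, b / scale, -1.0 / scale]
flat_variance = variance_along(cov, normal)
pc1 = top_component(cov)
print(flat_variance, pc1, variance_along(cov, pc1))
```

Because one direction carries no variance, two components describe this three-dimensional data without loss, which is the sense in which PCA reduces dimensionality linearly.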

Manifold learning:

In a nutshell:

 Learning a lower-dimensional manifold (e.g. surface) close to which the data lies

Use when:

 You want to perform nonlinear dimensionality reduction

Variations:

 SOM

Generative models:

 For y = f_θ(x), we have some idea of how x was generated given y and θ

Examples:

 HMMs: Given phonemes and {age, gender}, we know how the speech can be generated

 Bayesian Networks: Given {gender, age, race} we have some idea of what a face will look like for different emotions
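
HMMs and Bayesian networks are too large for a short example, so the sketch below swaps in a much simpler generative model: one 1-D Gaussian per class, with classification by Bayes' rule. It is only meant to illustrate the idea of modeling how x is generated given y and θ. The data and all function names are invented for the example.

```python
import math
import random


def fit_gaussian(values):
    """Estimate the mean and variance of a 1-D Gaussian from samples."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def log_density(x, mean, var):
    """Log of the Gaussian density p(x | y) for one class."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def fit_generative(xs, ys):
    """theta holds, per class y, the prior p(y) and the Gaussian for p(x | y)."""
    theta = {}
    for label in sorted(set(ys)):
        members = [x for x, y in zip(xs, ys) if y == label]
        theta[label] = (len(members) / len(xs), fit_gaussian(members))
    return theta


def classify(theta, x):
    """Bayes' rule: pick the y that maximizes log p(y) + log p(x | y)."""
    return max(theta, key=lambda y: math.log(theta[y][0]) + log_density(x, *theta[y][1]))


def generate(theta, label, rng):
    """Sample a new x for a given class, using the same model in the other direction."""
    mean, var = theta[label][1]
    return rng.gauss(mean, math.sqrt(var))


# Made-up 1-D measurements for two classes.
xs = [1.0, 1.2, 0.8, 1.1, 0.9, 3.0, 3.2, 2.8, 3.1, 2.9]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
theta = fit_generative(xs, ys)
print(classify(theta, 1.3), classify(theta, 2.7), generate(theta, 1, random.Random(0)))
```

The same fitted θ supports both inference (`classify`) and synthesis (`generate`), which is the practical difference from a model that only learns a decision boundary.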

Discriminative Models:

 Do not care about how the data was generated

 Finding the right features is of prime importance

 Followed by finding the right classifier

Examples:

 SVM

 MLP

Source: “Automatic Recognition of Facial Actions in Spontaneous Expressions” by Bartlett et al in Journal of Multimedia, Sep 2006

Frontiers

1980s:

 Advanced decision tree and rule learning

 Explanation-based Learning (EBL)

 Learning and planning and problem solving

 Utility problem

 Analogy

 Cognitive architectures

 Resurgence of neural networks (connectionism, backpropagation)

 Valiant’s PAC Learning Theory

 Focus on experimental methodology

1990s:

 Data mining

 Adaptive software agents and web applications

 Text learning

 Reinforcement learning (RL)

 Inductive Logic Programming (ILP)

 Ensembles: Bagging, Boosting, and Stacking

 Bayes Net learning


2000s:

 Support vector machines

 Kernel methods

 Graphical models

 Statistical relational learning

 Transfer learning

 Sequence labeling

 Collective classification and structured outputs

 Computer systems applications

▪ Compilers

▪ Debugging

▪ Graphics

▪ Security (intrusion, virus, and worm detection)

 Email management

 Personalized assistants that learn

 Learning in robotics and vision

Bioinformatics

• Gene expression prediction (just scratched the surface)

• Automated drug discovery

Speech recognition

• Context recognition, e.g. for digital personal assistants (Siri?)

• Translation better than Google Translate; imagine visiting Brazil

Image and video processing

• Automatic event detection in video

• “Seeing” software for the blind

Robotics

• Where is my iRobot?

• Would you raise a “robot” child and make it learn?

Advanced scientific calculations

• Weather modeling through prediction

• Vector field or FEM calculation through prediction

Who knows…

• Always in search of new problems

 Learning the structure of classifiers

 Automatic feature discovery and active learning

 Discovering the limits of learning

 Information theoretic bounds?

 Learning that never ends

 Explaining human learning

 Computer languages with ML primitives

Adapted from: “The Discipline of Machine Learning” by Tom Mitchell, 2006

Thank you!

Inference: Using a system to get the output variable for a given input variable

Learning: Changing parameters according to an algorithm to improve performance

Training: Using a machine learning algorithm to learn function parameters from an input and (optionally) output dataset known as the “training set”

Validation and Testing: Using inference (without training) to measure the performance of the learned system on data held out from training

Offline learning: When all training happens prior to testing, and no learning takes place during testing

Online learning: When learning and testing happen for the same data
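
The terms above can be made concrete with a small sketch. It assumes a perceptron as the learner and a tiny made-up dataset; both are illustrative choices and not something the lecture specifies. The offline path trains first and then only runs inference on test data, while the online path predicts and then learns on each example as it arrives.

```python
def predict(w, b, x):
    """Inference: map an input x to an output label using fixed parameters."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else -1


def learn_step(w, b, x, y):
    """Learning: perceptron rule, parameters change only after a mistake."""
    if predict(w, b, x) != y:
        w = [w[0] + y * x[0], w[1] + y * x[1]]
        b = b + y
    return w, b


def train_offline(train_set, epochs=5):
    """Training: all learning happens here, before any testing."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in train_set:
            w, b = learn_step(w, b, x, y)
    return w, b


def accuracy(w, b, data):
    """Testing: inference only, no parameter changes."""
    return sum(predict(w, b, x) == y for x, y in data) / len(data)


def run_online(stream):
    """Online learning: test on each example first, then learn from it."""
    w, b = [0.0, 0.0], 0.0
    mistakes = 0
    for x, y in stream:
        if predict(w, b, x) != y:
            mistakes += 1
        w, b = learn_step(w, b, x, y)
    return mistakes


# Made-up, linearly separable data with labels +1 / -1.
train_set = [((2, 1), 1), ((1, 3), 1), ((3, 2), 1),
             ((-1, -2), -1), ((-2, -1), -1), ((-3, -1), -1)]
test_set = [((2, 2), 1), ((-2, -2), -1)]

w, b = train_offline(train_set)
offline_accuracy = accuracy(w, b, test_set)
online_mistakes = run_online(train_set + test_set)
print(offline_accuracy, online_mistakes)
```

Counting mistakes made before each update is one common way to score an online learner, since there is no separate test phase.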
