COE 292 Introduction to Artificial Intelligence
Machine Learning
Content on these slides was mostly developed by Dr. Akram F. Ahmed, COE Dept.
9/13/2022 INTRODUCTION TO AI

Outline
• Introduction
• AI and ML
• ML Strategies and Paradigms
• ML Training and Evaluation
• Supervised Learning
• Unsupervised Learning

Introduction
• Learning is one of the most important activities of human beings, and of living beings in general; it helps us adapt to the environment.
• Learning involves changing the way we learn in order to improve inference and knowledge acquisition.
• Most real-world knowledge is not formalized and not available in textual form, which makes it difficult for computers to learn and infer.
• The concept of learning is based on the principles of training computing machines and enabling them to teach themselves.

Machine Learning (ML)
• Machine learning aims to create theories and procedures (learning algorithms) that allow machines to learn.
• The computation undertaken by a learning system can be viewed as occurring at two distinct times: training time and consultation time.
• Training Time:
  ◦ The time prior to consultation time, during which data points are presented to train the machine learning algorithm. In other words, it is the time a new system spends training to get ready for consultation time.
• Consultation Time:
  ◦ The time between when a data point is presented to the system and when the inference is completed.

AI and Machine Learning
[Figure: AI and Machine Learning]

Learning Strategies
• In the learning process, the learner transforms the information provided by a teacher (or the environment) into some new form that can be used in the future.
• The nature of this knowledge and of the transformation are the deciding factors for the type of learning strategy used.

Learning Strategies
• Rote Learning: the original knowledge is copied in the same form and stored in the knowledge base (KB); when needed in the future, it is retrieved exactly as it was stored, without change (memorization).
• Learning by Instruction: requires the knowledge to be transformed into an operational form before it can be integrated into the knowledge base.
  ◦ An example is students in a class where the teacher presents a number of facts. The basic transformations performed by a learner are selection and reformulation of the information provided by the teacher.

Fundamental Strategies of Learning
• Learning by Deduction: is carried out through a sequence of deductive inference steps using known facts.
  ◦ Example: if our KB contains the sentence (i.e. a rule) "a father's father is a grandfather," and we have the case that x is the father of y and y is the father of z, then we can deduce that x is the grandfather of z.
• Learning by Analogy: provides learning of new concepts through the use of similar known concepts or solutions.
  ◦ Example: solving exam questions, where we attempt to solve a new problem based on the solutions of somewhat similar problems we have already done at home or in class.

Fundamental Strategies of Learning
• Learning by Induction (or Similarity): is often observed after the learner experiences a number of instances or examples of the same problem and formulates a general concept.
  ◦ Example: learning that students who complete their homework by themselves and attend all classes on time achieve a higher Cumulative Percentile Index (CPI) is an example of inductive learning.
• Reinforcement Learning: is the study of decision making over time with consequences.
  ◦ Example: whenever a teacher appreciates a student's efforts by saying "very good" for asking an interesting question or answering a question, it is reinforcement-based learning for the student.

Machine Learning Paradigms
• Machine learning can be divided into the following classes:
  ◦ Supervised
    ▪ Learning is achieved while a teacher is present.
    ▪ Ground truth is available for the given training data.
    ▪ The algorithm learns the relationship between features and labels during training.
    ▪ The algorithm iteratively makes predictions on the training data and is corrected by the teacher.
• Ground Truth: the real (true) label(s), value(s) or class(es) associated with the given data.
  ◦ Example: given Ahmed's face picture, the ground truth label could be his name and/or ID#.

Machine Learning Paradigms
• Machine learning can be divided into the following classes:
  ◦ Unsupervised
    ▪ Ground truth is not available.
    ▪ Learning is achieved without a teacher.
    ▪ The algorithm learns patterns or groupings in the data during training.
    ▪ Example: clustering methods such as k-means.

Machine Learning Paradigms
  ◦ Reinforcement
    ▪ "learning what to do -- how to map situations to actions -- so as to maximize a numerical reward signal." [Sutton & Barto]
  ◦ Semi-Supervised
    ▪ Ground truth data may be scarce or only partially available.
  ◦ Self-Supervised
    ▪ Ground truth data is not available, but pseudo-labels are generated to conduct the learning process.
    ▪ "Obtains supervisory signals from the data itself, often leveraging the underlying structure in the data" [Meta AI]

Machine Learning Paradigms
  ◦ Self-Supervised
    ▪ Examples:
      ◦ "we can hide part of a sentence and predict the hidden words from the remaining words."
      ◦ "We can also predict past or future frames in a video (hidden data) from current ones (observed data)."

Typical Problems for ML?
• Classification: predict a class label for an input.

Supervised vs. Unsupervised
• Supervised
  ◦ Example: distinguishing between two plant types
• Unsupervised
  ◦ Example: grouping data into classes

Supervised vs. Unsupervised – Examples
• You're running a real-estate company, and you want to develop ML algorithms to address each of two problems:
  ◦ Problem 1: You have a large database of housing prices for houses of different sizes. You want to predict the prices of houses given their sizes.
  ◦ Problem 2: You'd like the software to examine specific houses in your database and decide which of two categories each house belongs to: highly desirable or undesirable.
• Should you treat these as classification or as regression problems?

Supervised vs. Unsupervised – Examples
• Of the following examples, which would you address using an unsupervised ML algorithm? (Check all that apply.)
  ☐ Given emails labeled as spam/not spam, learn a spam filter.
  ☐ Given a set of news articles found on the web, group them into sets of articles about the same story.
  ☐ Given a dataset of patients diagnosed as either having heart disease or not, learn to classify new patients as having heart disease or not.
  ☐ Given a database of university students, automatically discover sport skills and group students into different sport segments.

ML Training and Evaluation
• Cross Validation
• Underfitting and Overfitting

ML Training and Evaluation – Cross Validation
• In any machine learning problem, we are given a set of data with labels that tell us what this data means to an expert in the field.
  ◦ Example: Let's assume that we have collected data about heart disease from a large sample that covers all possible causes of the disease. Furthermore, assume that we represent the entire collected data by the blue bar below, where each dot in the bar represents the data collected from one person.
[Figure: blue bar of dots, one dot per person]

ML Training and Evaluation – Cross Validation
• In machine learning we need to do two things with this data:
  1. Estimate the parameters of the machine learning method, i.e. use the data to guess the shape of the curve that best fits it if a 2D estimator is used.
    ◦ In machine learning, parameter estimation is called training the algorithm.
  2. Evaluate how well the learned parameters work, i.e. test how good a job the estimated curve will do when we present it with data it has never seen before.
    ◦ In machine learning, evaluating a method is called testing the algorithm.

ML Training and Evaluation – Cross Validation
• Therefore, in machine learning:
  ◦ We need data to train the machine learning algorithm.
  ◦ We need to test the trained ML model on data it has not seen in training, to make sure that the algorithm performs well.
• Question: where can we get this data?
  ◦ Using the same data for training and testing does not work, since we would not know how the algorithm performs on data it has not been trained on.
  ◦ Using all the data for training will not leave any data for testing.

ML Training and Evaluation – Cross Validation
• Answer: we use the labeled data provided by an expert. In the heart disease example, we divide the collected data into a training set and a testing set.
  ◦ A common practice in machine learning is to use 75% of the data for training and 25% for testing.
  ◦ The question is: which 25% should we choose for testing, and which 75% for training?

ML Training and Evaluation – Cross Validation
• We use the cross-validation method:
  ◦ Divide the data into a number of subsets.
  ◦ In each fold, assign some (different) subsets for training and leave the rest for testing.
• Example: in four-fold cross validation, the data is divided into FOUR equal sets. For the heart disease example they are shown below:
[Figure: the blue data bar divided into four equal sets]

ML Training and Evaluation – Cross Validation
• We then train the machine learning algorithm and test the result as follows:
  ◦ Sets 1, 2, 3 for training and Set 4 for testing
  ◦ Sets 1, 2, 4 for training and Set 3 for testing
  ◦ Sets 1, 3, 4 for training and Set 2 for testing
  ◦ Sets 2, 3, 4 for training and Set 1 for testing
• At the end, we find which testing set results in the least error and choose the respective trained model as the final one. (More commonly, the test errors of all folds are averaged to estimate how well the method generalizes.)
• In practice, it is common to use 10-fold cross validation, where the data is divided into 10 sets and each set is used once for testing to yield the winning training set.

ML Training and Evaluation – Underfitting and Overfitting
• Suppose we have the dataset shown below.
  ◦ The data is labeled: red circles and blue circles.
• How can we train and obtain the best classifier?
[Figure: scatter plot of red and blue circles]
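
As a side note on the four-fold cross-validation procedure described above, the splitting and the train/test loop can be sketched in a few lines of Python. This is only an illustrative sketch: the helper names (`k_fold_splits`, `cross_validate`, `fit_mean`, `mean_abs_error`), the toy data, and the "model" that simply predicts the mean label are all assumptions made for the example, not part of the slides.

```python
from typing import Callable, List, Sequence, Tuple


def k_fold_splits(n_samples: int, n_folds: int) -> List[Tuple[List[int], List[int]]]:
    """Return one (train_indices, test_indices) pair per fold."""
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, n_folds)
    splits = []
    start = 0
    for fold in range(n_folds):
        # Spread any leftover samples over the first `remainder` folds.
        stop = start + fold_size + (1 if fold < remainder else 0)
        test_idx = indices[start:stop]
        train_idx = indices[:start] + indices[stop:]
        splits.append((train_idx, test_idx))
        start = stop
    return splits


def cross_validate(xs: Sequence, ys: Sequence, n_folds: int,
                   fit: Callable, error: Callable) -> List[float]:
    """Train on all folds but one, test on the held-out fold, for every fold."""
    fold_errors = []
    for train_idx, test_idx in k_fold_splits(len(xs), n_folds):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        fold_errors.append(
            error(model, [xs[i] for i in test_idx], [ys[i] for i in test_idx]))
    return fold_errors


# Hypothetical stand-in "model": always predict the mean training label.
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)


def mean_abs_error(model, test_x, test_y):
    return sum(abs(y - model) for y in test_y) / len(test_y)


xs = list(range(8))                # eight toy samples
ys = [1, 1, 1, 1, 3, 3, 3, 3]      # toy labels
errors = cross_validate(xs, ys, n_folds=4, fit=fit_mean, error=mean_abs_error)
print("per-fold test error:", errors)
print("average test error:", sum(errors) / len(errors))
```

Each sample is used for testing exactly once, and no fold is ever tested on data it was trained on. Averaging the per-fold errors gives a single estimate of how the method behaves on unseen data.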

ML Training and Evaluation – Underfitting and Overfitting
• Idea 1: Let us try a linear classifier represented by a straight line:
  ◦ As can be seen, many blue points above the line are misclassified.
  ◦ No matter how we rotate or shift the line, we will always have a high misclassification rate.
  ◦ This is known as underfitting: the model oversimplifies the complexity in the data.

ML Training and Evaluation – Underfitting and Overfitting
• Idea 2:
  ◦ Let's use a curve that best separates the red class from the blue class.
  ◦ Let us divide our data into training and testing sets as shown below.
[Figure: data split into a Training part and a Testing part]

ML Training and Evaluation – Underfitting and Overfitting
• Idea 2:
  ◦ We can find the "wavy" curve that best fits all the points in the training set, as shown below. It fits the training data very well.

ML Training and Evaluation – Underfitting and Overfitting
• Idea 2:
  ◦ Now, if we test the curve with the testing data, it does not do well.
  ◦ As can be seen, many test points are not classified correctly.
  ◦ This is what we call overfitting.

ML Training and Evaluation – Underfitting and Overfitting
• Idea 3:
  ◦ Allow for some misclassification, and we can get a smoother curve.
  ◦ This curve neither overfits nor underfits.
  ◦ There are some misclassifications, but we can live with this error.

Supervised Learning - Classification
• Classification predicts the classes (categories) to which the given examples belong and then assigns the examples to those categories.
• It assumes the following:
  ◦ the existence of some teacher (environment),
  ◦ a fitness function to measure the fitness of an example for a class, and
  ◦ some external method of classifying the training instances.
• A classifier typically learns with the help of a training set containing examples in which any given target word has been manually tagged with a sense from the sense inventory of some reference dictionary.

Supervised Learning - Classification
• An example of supervised learning for text classification is a classifier that picks one word at a time, performs Word Sense Disambiguation on it, and then performs a classification task to assign the appropriate sense to each instance of the word.
  ◦ Disadvantage: the learning process requires some intervention from the user, who must either
    ▪ pre-process and tag the examples occurring in the training set, or
    ▪ tag examples online, dynamically, as needed during the learning process.

Supervised Learning - Classification
• Examples of supervised learning algorithms:
  ◦ k-Nearest Neighbor (k-NN)
  ◦ Support Vector Machines (SVM)

K-Nearest Neighbor (k-NN)
• Uses the k closest points (nearest neighbors) to perform classification.
• k-Nearest Neighbor algorithms classify a new example by comparing it to all previously seen examples.
• The classifications of the k most similar previous cases are used to predict the classification of the current example.

K-Nearest Neighbor (k-NN)
• The training examples are used for
  ◦ providing a library of sample cases
  ◦ re-scaling the similarity function to maximize performance

K-Nearest Neighbor Algorithm
1. Calculate the distance between the test point and every training instance.
2. Pick the k closest (nearest) training examples and assign the test instance to the most common category among these nearest neighbors.
  ◦ Voting among multiple neighbors (k > 1) helps decrease susceptibility to noise.
  ◦ An odd value of k is usually used to avoid ties (in two-class problems).
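
The two steps of the k-NN algorithm above can be sketched directly in plain Python. This is a minimal illustration, not the course notebook: the function names (`euclidean`, `knn_classify`) and the six toy training points are assumptions made for the example, and Euclidean distance is assumed as the distance function.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_classify(train_points, train_labels, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    if not 1 <= k <= len(train_points):
        raise ValueError("k must be between 1 and the number of training points")
    # Step 1: distance from the query to every training instance.
    distances = [(euclidean(point, query), label)
                 for point, label in zip(train_points, train_labels)]
    # Step 2: keep the k nearest neighbors and take the most common label.
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: three "red" and three "blue" points.
train_points = [(1, 1), (1, 2), (2, 1), (5, 5), (6, 5), (5, 6)]
train_labels = ["red", "red", "red", "blue", "blue", "blue"]

print(knn_classify(train_points, train_labels, (2, 2), k=3))  # red
print(knn_classify(train_points, train_labels, (5, 4), k=3))  # blue
```

Note that there is no training step: the function only stores nothing beyond the data it is given and does all its work at query time, scanning every training point once per query.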

K-Nearest Neighbor (k-NN) – Examples
• Example: Given training data (shown as solid circles) belonging to two different classes, a new point (hollow circle) has to be classified.
  ◦ It is assigned the most frequent label among its k nearest neighbors, as shown below.
  ◦ Note that changing k may lead to a different classification.

K-Nearest Neighbor: Distance Metrics
• k-NN methods assume a function for determining the similarity or distance between any two instances.
• Euclidean distance is the generic choice.
• Consider two patterns $x = (x_1, x_2, \dots, x_m)$ and $z = (z_1, z_2, \dots, z_m)$.
• The Euclidean distance between them is given by
  $d(x, z) = \sqrt{\sum_{i=1}^{m} (x_i - z_i)^2}$
  where m is the number of dimensions.

K-Nearest Neighbor: Distance Metrics
• Example: Find the distance between x = (3, 5) and z = (1, 2).
  ◦ The Euclidean distance in 2 dimensions is
    $d(x, z) = \sqrt{(3 - 1)^2 + (5 - 2)^2} = \sqrt{13} \approx 3.61$
• Euclidean distance in higher dimensions
  ◦ Example: Find the Euclidean distance between two 784-dimensional vectors x and z:
    $d(x, z) = \sqrt{\sum_{i=1}^{784} (x_i - z_i)^2}$

K-Nearest Neighbor: Distance Metrics
• For 1-NN, we can identify regions (decision surfaces) such that any point lying within a region is assigned the class of that region's training point, its nearest neighbor.
• In the following example, the dots represent the training points used for classification, and the X is the new point that needs to be classified.
• Different distance metrics can change the decision surface, as shown on the right, where the metric used is given under each picture.

K-Nearest Neighbor: Examples
• Example: Given the following data on patients with a disease, find the survivability of the new patient (shown in green) using:
  ◦ 1-NN
  ◦ 3-NN
  ◦ 48-NN
• As we can see, 1-NN classifies the new patient as "Did Not Survive" while 3-NN classifies the patient as "Survived."
• Selecting 48-NN compares the new patient with the average of the data, and so on.
[Figure: patient data with the new patient shown in green; a callout on the slide reads "underfitting"]
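
The distance computations from the Distance Metrics slides can be checked with a short Python sketch. The function names are assumptions made for the example. The Manhattan distance is included only as one alternative metric for comparison, and the two 784-dimensional vectors are made-up values chosen so the result is easy to verify by hand.

```python
import math


def euclidean(x, z):
    """Euclidean distance: square root of the summed squared differences."""
    if len(x) != len(z):
        raise ValueError("x and z must have the same number of dimensions")
    return math.sqrt(sum((xi - zi) ** 2 for xi, zi in zip(x, z)))


def manhattan(x, z):
    """Manhattan distance: sum of the absolute differences per dimension."""
    if len(x) != len(z):
        raise ValueError("x and z must have the same number of dimensions")
    return sum(abs(xi - zi) for xi, zi in zip(x, z))


# The 2-dimensional example from the slides: x = (3, 5), z = (1, 2).
x, z = (3, 5), (1, 2)
print(euclidean(x, z))   # sqrt(13), about 3.606
print(manhattan(x, z))   # 5

# The same function works unchanged in 784 dimensions.
# Hypothetical vectors: all zeros versus all ones.
x784 = [0.0] * 784
z784 = [1.0] * 784
print(euclidean(x784, z784))  # sqrt(784) = 28.0
```

The formula does not change with the number of dimensions; only the number of terms in the sum grows, which is why the same function handles both the 2-dimensional and the 784-dimensional case.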

K-Nearest Neighbor: Overfitting and Underfitting
• Depending on the dataset you have, the value of k determines whether you face underfitting or overfitting.
• In this example:
  ◦ If k = 1, the result may be considered overfitting, since a single outlier can cause new points to be misclassified, especially points close to the outlier.
  ◦ If k is set too large, you may have underfitting.

K-Nearest Neighbor: Overfitting and Underfitting
• When k is equal to the number of samples in the data, k-NN becomes an average comparator: every new point is assigned the majority class of the whole dataset.
• Good values of k for this example may be 3, 4, 5, 6, or 7.

K-Nearest Neighbor: Overfitting and Underfitting
• Conclusion
  ◦ The selection of the value of k is crucial to ensure that the classifier works correctly and as per our needs.

K-Nearest Neighbor: Efficiency
• Very efficient in training
  ◦ Only stores the training data
• Not so efficient in testing
  ◦ Requires computing the distance to every training example
  ◦ Much more expensive than, e.g., rule learning
• Simplest way of finding the nearest neighbor:
  ◦ Linear scan of the data
  ◦ Classification takes time proportional to the product of the number of instances in the training and test sets

K-Nearest Neighbor: Pros and Cons
• Pros
  ◦ It is extremely easy to implement.
  ◦ It requires no training prior to making real-time predictions.
  ◦ This makes the k-NN algorithm much faster to set up than algorithms that require training, e.g., SVM, linear regression, etc.
  ◦ Since the algorithm requires no training before making predictions, new data can be added seamlessly.
  ◦ Only two parameters are required to implement k-NN: the value of k and the distance function (e.g. Euclidean, Manhattan, etc.).
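
Since k and the distance function are the only two parameters, the effect of k discussed on the preceding slides is easy to try out. The sketch below is illustrative only: `knn_predict`, the class names "A" and "B", and the nine data points (including one deliberately placed outlier) are assumptions made for the example.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(points, labels, query, k, distance=euclidean):
    """Majority label among the k training points nearest to `query`.

    `distance` may be any function taking two points and returning a number.
    """
    ranked = sorted(zip(points, labels), key=lambda pl: distance(pl[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical data: a cluster of "A", a cluster of "B", and one "B" outlier
# sitting inside the "A" cluster.
points = [(1, 1), (1, 2), (2, 1), (2, 2), (1.5, 1.5),   # class A
          (6, 6), (6, 7), (7, 6),                       # class B
          (2.5, 2.5)]                                   # class B outlier
labels = ["A"] * 5 + ["B"] * 3 + ["B"]

near_outlier = (2.6, 2.6)   # lies in the A cluster, right next to the outlier
inside_b = (6.5, 6.5)       # lies in the middle of the B cluster

for k in (1, 3, len(points)):
    print(f"k={k}:",
          "near outlier ->", knn_predict(points, labels, near_outlier, k),
          "| inside B ->", knn_predict(points, labels, inside_b, k))
```

With this data, k = 1 labels the point near the outlier as "B" because it follows that single outlier, while k = 3 outvotes the outlier and returns "A". With k equal to the dataset size (9), every query returns "A", the majority class, even for the point in the middle of the "B" cluster.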

K-Nearest Neighbor: Pros and Cons
• Cons
  ◦ The k-NN algorithm doesn't work well with high-dimensional data: with a large number of dimensions, distances become costly to compute and less informative for telling near points from far ones.
  ◦ The k-NN algorithm doesn't work well with categorical features, since it is difficult to define a distance along dimensions with categorical values.

K-Nearest Neighbor: Implementation
• On Jupyter Notebook: kNN-Implementation.ipynb

K-Nearest Neighbor: Evaluating the Algorithm
• For evaluating an algorithm, the confusion matrix, precision, recall and F1 score are the most commonly used metrics.
  ◦ Confusion Matrix: a matrix in which the (m, n)th element is the number of examples of the mth class that were labeled, by the classifier, as belonging to the nth class.
  ◦ For the k-NN example, the rows and columns of the confusion matrix are the classes Iris-setosa, Iris-versicolor and Iris-virginica. The entries recoverable from the slide are, by row:
      Iris-setosa:      12
      Iris-versicolor:  11
      Iris-virginica:   1, 6

K-Nearest Neighbor: Evaluating the Algorithm
• Accuracy: overall performance of the model
• Precision: how accurate the positive predictions are
• Recall: coverage of the actual positive samples

Confusion Matrix for a Binary Classifier
• Suppose that the correct label is either 0 or 1.
  Then the confusion matrix is just 2×2.
• In the box for correct label 1 and classified as 0, you would write the number of examples of class 1 that were misclassified as class 0.

False Positives & False Negatives
• TP (True Positives) = examples that were correctly labeled as "1"
• FN (False Negatives) = examples that should have been "1", but were labeled as "0"
• FP (False Positives) = examples that should have been "0", but were labeled as "1"
• TN (True Negatives) = examples that were correctly labeled as "0"

                       Classified as 0    Classified as 1
  Correct label 0            TN                 FP
  Correct label 1            FN                 TP
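
The four counts above, and the accuracy, precision and recall named on the evaluation slides, can be computed with a short Python sketch. The formulas used are the standard definitions (the slides describe the metrics only in words), and the function names and the ten example labels are assumptions made for illustration.

```python
def confusion_counts(correct, predicted):
    """Return (TN, FP, FN, TP) for binary labels 0/1."""
    tn = fp = fn = tp = 0
    for c, p in zip(correct, predicted):
        if c == 1 and p == 1:
            tp += 1          # correctly labeled as 1
        elif c == 1 and p == 0:
            fn += 1          # should have been 1, labeled as 0
        elif c == 0 and p == 1:
            fp += 1          # should have been 0, labeled as 1
        else:
            tn += 1          # correctly labeled as 0
    return tn, fp, fn, tp


def safe_div(numerator, denominator):
    """Avoid division by zero when a class is never predicted or never occurs."""
    return numerator / denominator if denominator else 0.0


def metrics(tn, fp, fn, tp):
    """Standard accuracy, precision, recall and F1 from the four counts."""
    accuracy = safe_div(tp + tn, tp + tn + fp + fn)
    precision = safe_div(tp, tp + fp)   # how accurate the positive predictions are
    recall = safe_div(tp, tp + fn)      # coverage of the actual positives
    f1 = safe_div(2 * precision * recall, precision + recall)
    return accuracy, precision, recall, f1


# Hypothetical labels for ten examples.
correct   = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tn, fp, fn, tp = confusion_counts(correct, predicted)
print("TN, FP, FN, TP =", tn, fp, fn, tp)   # 5 1 1 3
print("accuracy, precision, recall, F1 =", metrics(tn, fp, fn, tp))
```

For these labels the counts are TN = 5, FP = 1, FN = 1, TP = 3, giving an accuracy of 0.8 and a precision, recall and F1 of 0.75 each.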