1. What are the issues in Machine Learning?

Ans. Machine Learning is one of the most popular technologies among data scientists and machine learning enthusiasts. It is an effective Artificial Intelligence technique for building systems that learn from experience and make future decisions without being explicitly programmed for every case. It can be viewed as a family of algorithms that automatically construct predictive software from past experience and training data. Machine learning is used in almost every industry, including healthcare, education, finance, automobiles, marketing, shipping, infrastructure, and automation, and nearly all large companies, such as Amazon, Facebook, Google, and Adobe, apply machine learning techniques to grow their businesses. But like any technology it has a dark side as well: machine learning offers great opportunities, yet several issues still need to be solved. Although it helps organizations make more informed, data-driven choices than classical methodologies allow, the following common problems cannot be ignored by professionals building ML skills or creating an application from scratch.

1. Inadequate Training Data
The major issue in applying machine learning algorithms is a lack of both quality and quantity of data. Data plays a vital role in training, and many data scientists report that inadequate, noisy, and unclean data severely degrade machine learning algorithms. For example, a simple task may require thousands of samples, while an advanced task such as speech or image recognition needs millions of examples. Data quality is equally important, and poor quality is common in practice. It can be affected by factors such as:
- Noisy data: leads to inaccurate predictions and hurts both decisions and classification accuracy.
- Incorrect data: produces faulty training and results, which in turn reduces the accuracy of the model.
- Difficulty generalizing output data: sometimes generalizing from the outputs becomes complex, which results in comparatively poor future decisions.

2. Poor Quality of Data
As discussed above, data plays a significant role in machine learning, and it must be of good quality. Noisy, incomplete, inaccurate, and unclean data lead to lower classification accuracy and low-quality results. Hence, data quality is a major common problem when training machine learning algorithms.

3. Non-representative Training Data
To ensure that the trained model generalizes well, the training sample must be representative of the new cases it will be asked to handle: it should cover all cases that have occurred and are likely to occur. Using non-representative training data produces less accurate predictions. A machine learning model is considered ideal if it predicts well on unseen, general cases. If there is too little training data, the model suffers from sampling noise, the so-called non-representative training set, and its predictions will not be accurate; as a result, the model will be biased toward one class or group. Hence, we should use representative training data to protect the model against bias and keep its predictions accurate and free of drift.
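To make the data-quality points concrete, here is a minimal sketch assuming a small pandas DataFrame with hypothetical column names (`price`, `label`); it counts missing values, drops incomplete rows, and flags numeric outliers with a robust median-based rule:

```python
import pandas as pd
import numpy as np

# Hypothetical dataset: column names and values are illustrative only.
df = pd.DataFrame({
    "price": [10.0, 12.5, np.nan, 11.0, 950.0, 12.0],  # 950.0 looks like noise
    "label": ["apple", "apple", "mango", None, "apple", "mango"],
})

# 1. Quantify missing values per column.
print(df.isna().sum())

# 2. Drop incomplete rows (or impute instead, e.g. with df.fillna(...)).
clean = df.dropna()

# 3. Flag numeric outliers: points far from the median relative to the
#    median absolute deviation (MAD), a rule robust to the outlier itself.
med = clean["price"].median()
mad = (clean["price"] - med).abs().median()
clean = clean[(clean["price"] - med).abs() <= 5 * mad]
print(clean)
```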
4. Overfitting and Underfitting

Overfitting: Overfitting is one of the most common issues faced by machine learning engineers and data scientists. When a model is trained on a huge amount of data, it starts capturing the noise and inaccuracies present in the training set, which degrades its performance. As a simple example, suppose the training set contains 1,000 mangoes, 1,000 apples, 1,000 bananas, and 5,000 papayas. There is then a considerable probability of an apple being identified as a papaya, because the training set is heavily biased toward papayas, so predictions suffer. Overfitting is often driven by highly flexible non-linear methods that build unrealistic data models; it can be reduced by preferring simpler linear or parametric algorithms where appropriate. Methods to reduce overfitting:
- Increase the amount of training data.
- Reduce model complexity by choosing a model with fewer parameters.
- Apply Ridge or Lasso regularization.
- Use early stopping during the training phase.
- Reduce noise in the data.
- Reduce the number of attributes in the training data.
- Constrain the model.

Underfitting: Underfitting is the opposite of overfitting. When a model is trained on too little data, or is too simple to capture the underlying structure of the data, its output is incomplete and inaccurate, which destroys the model's accuracy; like an undersized pair of pants, the model is too small for the data. This generally happens when we have limited data and try to fit a linear model to non-linear data. In such scenarios the model's rules are too simple for the data set, and it starts making wrong predictions. Methods to reduce underfitting:
- Increase model complexity.
- Remove noise from the data.
- Train on more and better features.
- Reduce the constraints on the model.
- Increase the number of training epochs.

The sketch below contrasts an overfitted model with a regularized one.
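This is a minimal scikit-learn sketch on synthetic data; the dataset, polynomial degree, and regularization strength are illustrative choices, not prescriptions:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=60)  # noisy non-linear data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A degree-15 polynomial is flexible enough to memorize the training noise;
# Ridge shrinks the coefficients and narrows the train/test gap.
for name, reg in [("plain", LinearRegression()), ("ridge", Ridge(alpha=1.0))]:
    model = make_pipeline(PolynomialFeatures(degree=15), reg).fit(X_tr, y_tr)
    print(name,
          "train MSE: %.3f" % mean_squared_error(y_tr, model.predict(X_tr)),
          "test MSE: %.3f" % mean_squared_error(y_te, model.predict(X_te)))
```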
5. Monitoring and Maintenance
Since generalized output is mandatory for any machine learning model, regular monitoring and maintenance are compulsory. Different results for different actions require changes to the data; hence, editing the code and allocating resources for monitoring also become necessary.

6. Getting Bad Recommendations
A machine learning model operates in a specific context, and when that context shifts, it starts producing bad recommendations; this is known as concept drift. For example, a customer is looking for certain gadgets at one point in time, but the customer's requirements change over time while the model keeps showing the same recommendations. This phenomenon is called data drift. It generally occurs when new data is introduced or the interpretation of the data changes. We can mitigate it by regularly monitoring the data and updating the model to match current expectations.

7. Lack of Skilled Resources
Although Machine Learning and Artificial Intelligence are growing continuously in the market, these industries are still younger than most others, and the absence of skilled manpower is an issue. We need people with in-depth knowledge of mathematics, science, and technology to develop and manage machine learning systems.

8. Customer Segmentation
Customer segmentation is also an important issue when developing a machine learning algorithm: we must distinguish customers who acted on the recommendations shown by the model from those who never even viewed them. Hence, an algorithm is needed that recognizes customer behaviour and triggers relevant recommendations based on past experience.

9. Process Complexity of Machine Learning
The machine learning process itself is complex, which is another major issue faced by engineers and data scientists. Machine Learning and Artificial Intelligence are young technologies, still partly experimental and continuously changing. Much of the work proceeds by trial and error, so the probability of mistakes is higher than expected. The process also involves analysing the data, removing data bias, preparing training data, and applying complex mathematical calculations, all of which make the procedure complicated and tedious.

10. Data Bias
Data bias is another big challenge in machine learning. These errors arise when certain elements of the dataset are weighted more heavily, or given more importance, than others. Biased data leads to inaccurate results, skewed outcomes, and other analytical errors. We can address bias by locating where the dataset is actually biased and then taking steps to reduce it. Methods to remove data bias:
- Research customer segmentation more thoroughly.
- Be aware of your general use cases and potential outliers.
- Combine inputs from multiple sources to ensure data diversity.
- Include bias testing in the development process.
- Analyse the data regularly and keep tracking errors so they can be resolved easily.
- Review the collected and annotated data.
- Use multi-pass annotation for tasks such as sentiment analysis, content moderation, and intent recognition.

11. Lack of Explainability
A model's outputs often cannot be easily understood, because the model is trained in opaque ways to produce outputs under certain conditions. This lack of explainability reduces the credibility of machine learning algorithms.

12. Slow Implementations and Results
This issue is also very common. Machine learning models can be highly accurate but time-consuming: slow training, excessive requirements, and overloaded data mean results take longer than expected, and the model needs continuous maintenance and monitoring to keep delivering accurate results.

13. Irrelevant Features
Although machine learning models are intended to give the best possible outcome, feeding garbage data as input yields garbage output. Hence, we should use relevant features in the training sample: a model is considered good when the training data has a strong set of features and few to no irrelevant ones.

2. Explain Regression Line, Scatter Plot, Error in Prediction and Best Fitting Line.

Ans. In machine learning, a regression line is a straight line that represents the relationship between a dependent variable (also called the target or output variable) and one or more independent variables (also called features or input variables). The goal of regression analysis is to find the best-fit line that minimizes the difference between the actual observed values and the values predicted by the line. The equation of a simple linear regression line, which involves one independent variable, is often written as:
\[ Y = mX + b \]
where:
- \( Y \) is the dependent variable (output),
- \( X \) is the independent variable (input),
- \( m \) is the slope of the line,
- \( b \) is the y-intercept.
The slope (\( m \)) represents the rate at which the dependent variable changes with respect to changes in the independent variable, and the y-intercept (\( b \)) is the value of the dependent variable when the independent variable is zero. In the case of multiple linear regression, where there is more than one independent variable, the equation becomes:
\[ Y = b_0 + b_1X_1 + b_2X_2 + \ldots + b_nX_n \]
Here:
- \( Y \) is the dependent variable,
- \( X_1, X_2, \ldots, X_n \) are the independent variables,
- \( b_0 \) is the y-intercept, and
- \( b_1, b_2, \ldots, b_n \) are the coefficients for each independent variable.
The regression line is determined during the training phase, where the model learns the optimal values of the coefficients (\( m \) and \( b \) in simple linear regression, or \( b_0, b_1, \ldots, b_n \) in multiple linear regression) from the training data. Once trained, the model makes predictions on new data by applying the learned regression equation; a minimal least-squares sketch follows.
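This sketch fits \( Y = mX + b \) by least squares using the closed-form solution; the data (hours studied versus exam score) is made up purely for illustration:

```python
import numpy as np

# Illustrative data: x could be hours studied, y the exam score.
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([52, 58, 61, 68, 74], dtype=float)

# Closed-form least squares for Y = mX + b:
# m = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2),  b = mean(y) - m * mean(x)
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - m * x.mean()
print(f"Y = {m:.2f} X + {b:.2f}")

# Apply the learned equation to a new input, e.g. X = 6.
print("prediction at X = 6:", m * 6 + b)
```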
Scatter Plot
A scatter plot is a type of data visualization used in machine learning and statistics to display the relationship between two continuous variables. It is a graphical representation in which individual data points are plotted on a two-dimensional plane, with one variable on the x-axis and the other on the y-axis; each point represents a pair of values for the two variables. In machine learning, scatter plots are often used to visually inspect the pattern, trend, or correlation between two variables, since the shape and direction of the plotted points give insight into the nature of the relationship. Common patterns include:
1. **Positive Correlation:** Points tend to form a rising pattern from left to right, indicating that as one variable increases, the other tends to increase as well.
2. **Negative Correlation:** Points tend to form a descending pattern from left to right, indicating that as one variable increases, the other tends to decrease.
3. **No Correlation:** Points are scattered randomly with no clear pattern, indicating no significant correlation between the variables.
Practitioners often use scatter plots during the exploratory data analysis (EDA) phase to understand the data before building models; scatter plots also help identify outliers, clusters, and other interesting patterns. In Python, libraries like Matplotlib or Seaborn are commonly used to create scatter plots; a minimal Matplotlib example follows.
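The data here is synthetic, constructed to show a positive correlation:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 50)
y = 2 * x + rng.normal(scale=2.0, size=50)  # positively correlated by construction

plt.scatter(x, y)
plt.xlabel("independent variable X")
plt.ylabel("dependent variable Y")
plt.title("Scatter plot showing a positive correlation")
plt.show()
```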
Error in Prediction
In machine learning, an error measures how accurately an algorithm can make predictions on a previously unseen dataset. On the basis of these errors, we select the machine learning model that performs best on the particular dataset. There are two main types of error in machine learning:
- Reducible errors: errors that can be reduced to improve model accuracy; they are further classified into bias and variance.
- Irreducible errors: errors that remain regardless of which algorithm is used; they are caused by unknown variables whose influence cannot be removed.

What is Bias?
In general, a machine learning model analyses the data, finds patterns in it, and makes predictions. During training the model learns these patterns from the dataset and applies them to test data. The difference that arises between the values predicted by the model and the actual or expected values is known as bias error, or error due to bias. It can be defined as the inability of algorithms such as linear regression to capture the true relationship between the data points. Every algorithm begins with some amount of bias, because bias arises from the assumptions the model makes to keep the target function simple to learn.

What is a Variance Error?
Variance specifies how much the prediction would change if different training data were used: it tells how much a random variable differs from its expected value. Ideally, a model should not vary too much from one training dataset to another, meaning the algorithm should be good at capturing the hidden mapping between input and output variables. Low variance means only a small variation in the predicted target function when the training set changes, while high variance means a large variation. A high-variance model learns too much from the training data: it performs well on the training set but does not generalize to unseen data, giving good results on the training set but high error rates on the test set. Because it learns too much from the dataset, high variance leads to overfitting. A model with high variance has the following problems:
- it leads to overfitting;
- it increases model complexity;
- non-linear algorithms, which have a lot of flexibility in fitting the data, usually have high variance.

The four bias-variance combinations are:
- Low bias, low variance: the ideal machine learning model, though it is practically unattainable.
- Low bias, high variance: predictions are inconsistent though accurate on average; this occurs when the model learns a large number of parameters, and it leads to overfitting.
- High bias, low variance: predictions are consistent but inaccurate on average; this occurs when the model does not learn well from the training data or uses too few parameters, and it leads to underfitting.
- High bias, high variance: predictions are both inconsistent and inaccurate on average.
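Bias and variance can be estimated empirically. The sketch below assumes a synthetic sine-shaped ground truth: the same model is refit on many resampled training sets and its predictions at one fixed test point are summarized. A straight line (degree 1) shows high bias and low variance; a degree-12 polynomial shows the reverse:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
true_f = lambda x: np.sin(x)        # the (normally unknown) target function
x_test = np.array([[1.5]])          # fixed query point

def predictions(degree, n_repeats=200):
    """Refit the same model on many freshly sampled training sets."""
    preds = []
    for _ in range(n_repeats):
        X = rng.uniform(0, np.pi, size=(20, 1))
        y = true_f(X).ravel() + rng.normal(scale=0.2, size=20)
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        preds.append(model.fit(X, y).predict(x_test)[0])
    return np.array(preds)

for degree in (1, 12):
    p = predictions(degree)
    bias2 = (p.mean() - true_f(x_test)[0, 0]) ** 2   # squared bias at x_test
    print(f"degree {degree}: bias^2 = {bias2:.4f}, variance = {p.var():.4f}")
```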
Bias-Variance Trade-Off
While building a machine learning model, it is important to manage bias and variance in order to avoid overfitting and underfitting. If the model is very simple, with few parameters, it tends to have low variance and high bias; if the model has a large number of parameters, it tends to have high variance and low bias. We therefore need to strike a balance between bias and variance errors, and this balance is known as the bias-variance trade-off.

3. Describe the essential steps of the K-means algorithm for clustering analysis.

STEP 1: Pick the number of clusters k (say K=2) into which the dataset is to be separated, and select two random points to serve as the initial cluster centroids.
STEP 2: Assign each data point to its nearest centroid. Geometrically, this can be done by drawing the median line (perpendicular bisector) between the two centroids.
STEP 3: Points on the left side of the line are closest to one centroid (call it blue), while points on the right side are closest to the other (call it yellow); the left cluster therefore takes the blue centroid and the right cluster the yellow one.
STEP 4: Recompute the centroids: the new centroid of each cluster is its centre of gravity, i.e. the mean of its points.
STEP 5: Re-assign every data point to its new nearest centroid, again using a median line; a yellow data point falling on the blue side of the line moves into the blue cluster.
STEP 6: After this reassignment, repeat the step of locating new centroids.
STEP 7: Recompute the centres of gravity of the clusters as before.
STEP 8: As in the previous stages, draw the median line and reassign the data points based on the new centroids.
STEP 9: Repeat until the assignments stop changing: points are finally grouped by their distance from the median line, two distinct clusters are established, and no dissimilar points share a group. These stable groups are the final clusters. A minimal implementation of these steps appears below.
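This NumPy sketch follows the steps above (random initialization, nearest-centroid assignment, centre-of-gravity update, repeat until stable); the sample points are illustrative:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # STEP 1: pick k random data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # STEP 2: assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        # STEPS 4-8: move each centroid to the centre of gravity of its points
        # (keep the old centroid if a cluster happens to be empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # STEP 9: stop once the centroids (and hence assignments) are stable.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

X = np.array([[1, 1], [1.5, 2], [3, 4], [5, 7], [3.5, 5], [4.5, 5], [3.5, 4.5]])
labels, centroids = kmeans(X, k=2)
print(labels)
print(centroids)
```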
4. What is SVM? Explain the following terms: hyperplane, supporting hyperplane, margin and support vectors with suitable examples.

Support Vector Machine (SVM) is a supervised machine learning algorithm used for both classification and regression; although it handles regression problems, it is best suited for classification. The main objective of the SVM algorithm is to find the optimal hyperplane in an N-dimensional feature space that separates the data points of the different classes, such that the margin between the hyperplane and the closest points of each class is as large as possible. The dimension of the hyperplane depends on the number of features: with two input features the hyperplane is just a line, with three it is a 2-D plane, and beyond three features it becomes difficult to visualize. A hyperplane separates the data space with a flat surface of one dimension less than the space itself, enabling linear classification. A separating hyperplane is defined by two terms: an intercept term \( b \) and a normal vector \( w \) of the decision hyperplane, commonly referred to as the weight vector in machine learning; among all hyperplanes perpendicular to the normal vector, \( b \) selects the particular one. Every point \( x \) lying on the hyperplane must satisfy the equation:
\[ w^\top x + b = 0 \]
Support vectors are the data points closest to the decision boundary; they are the points most difficult to classify, and they hold the key to making the SVM's decision surface optimal. The two parallel hyperplanes \( w^\top x + b = +1 \) and \( w^\top x + b = -1 \) that pass through the support vectors on either side are called the supporting hyperplanes, and the margin is the distance between them, \( 2/\|w\| \). The optimal hyperplane comes from the function class with the lowest capacity, i.e., the minimum number of independent features or parameters.
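Here is a minimal scikit-learn sketch, reusing the data points from question 12 further below, with class labels assumed for illustration (the first four points as one class, the rest as the other); a large C approximates the hard-margin classifier:

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable classes; the labels are an assumption for this demo.
X = np.array([[1, 1], [2, 1], [1, -1], [2, -1],
              [4, 0], [5, 1], [5, -1], [6, 0]], dtype=float)
y = np.array([-1, -1, -1, -1, 1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("normal vector w:", w)                   # orientation of the hyperplane
print("intercept b:", b)                       # w.x + b = 0 is the boundary
print("support vectors:\n", clf.support_vectors_)
print("margin width:", 2 / np.linalg.norm(w))  # distance between the
                                               # supporting hyperplanes
```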
5. Explain in detail Temporal Difference Learning.

Temporal Difference (TD) Learning is a technique commonly used in reinforcement learning to predict the total reward expected over the future; it can, however, be used to predict other quantities as well. It is essentially a way to learn to predict a quantity that depends on the future values of a given signal, and a method for computing the long-term utility of a pattern of behaviour from a series of intermediate rewards. In short, TD Learning focuses on predicting a variable's future value across a sequence of states. Temporal difference learning was a major breakthrough in solving the reward-prediction problem: it employs a mathematical trick that replaces complicated reasoning about the future with a simple learning procedure that produces the same results. The trick is that, rather than attempting to compute the total future reward, TD learning simply tries to predict the combination of the immediate reward and its own reward prediction at the next moment in time. When that next moment arrives, bringing fresh information with it, the new prediction is compared against the expected one. If the two predictions differ, the algorithm computes how different they are and uses this temporal difference to adjust the old prediction toward the new one. The algorithm always aims to bring the expected prediction and the new prediction closer together, matching expectations with reality and gradually increasing the accuracy of the entire chain of predictions. In TD learning, the training signal for a prediction is thus a future prediction. The method combines ideas from the Monte Carlo (MC) method and Dynamic Programming (DP): Monte Carlo methods adjust their estimates only after the final outcome is known, whereas temporal difference methods adjust predictions to match later, more accurate predictions long before the final outcome is known; this is essentially a form of bootstrapping. Temporal difference learning got its name from the way it uses the changes, or differences, in predictions over successive time steps to drive learning: the prediction at any given time step is updated to bring it nearer to the prediction of the same quantity at the next time step.
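A minimal sketch of the simplest variant, TD(0), on a classic five-state random walk (a toy problem chosen for illustration): each update nudges V(s) toward the bootstrapped target r + gamma * V(s'):

```python
import numpy as np

# Random walk: non-terminal states 1..5, terminal states 0 and 6.
# Every episode starts in the middle and steps left or right at random;
# the only reward is +1 for terminating on the right.
alpha, gamma = 0.1, 1.0
V = np.zeros(7)                      # value estimates, terminals stay 0
rng = np.random.default_rng(0)

for _ in range(5000):
    s = 3
    while s not in (0, 6):
        s_next = s + rng.choice([-1, 1])
        r = 1.0 if s_next == 6 else 0.0
        # TD(0): adjust the old prediction toward reward + next prediction.
        V[s] += alpha * (r + gamma * V[s_next] - V[s])
        s = s_next

print(V[1:6])   # approaches the true values 1/6, 2/6, ..., 5/6
```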
6. Create a decision tree for the attribute "class" using the respective values:

7. What are the different Hidden Markov Models?

Hidden Markov Model (HMM) is a statistical model used to describe the probabilistic relationship between a sequence of observations and a sequence of hidden states. It is often used when the underlying system or process that generates the observations is unknown or hidden, hence the name "Hidden Markov Model." It is used to predict future observations, or to classify sequences, based on the underlying hidden process that generates the data. An HMM consists of two types of variables: hidden states and observations. The hidden states are the underlying variables that generate the observed data, but they are not directly observable; the observations are the variables that are actually measured. The relationship between the hidden states and the observations is modeled using two sets of probabilities: the transition probabilities and the emission probabilities. The transition probabilities describe the probability of moving from one hidden state to another, while the emission probabilities describe the probability of producing a given observation from a given hidden state. Modeling observations in these two layers, one visible and the other invisible, is very useful, since many real-world problems involve classifying raw observations into categories, or class labels, that are more meaningful to us. Consider, for example, the speech recognition problem, for which HMMs have been used extensively for several decades [1]. In speech recognition we want to predict the uttered word from a recorded speech signal; the recognizer tries to find the sequence of phonemes (states) that gave rise to the actual uttered sound (observations). Since actual pronunciation varies widely, the original phonemes, and ultimately the uttered word, cannot be observed directly and must be predicted. The same approach is useful for modeling biological sequences such as proteins and DNA. A biological sequence typically consists of smaller substructures with different functions, and different functional regions often display distinct statistical properties; it is well known, for instance, that proteins generally consist of multiple domains. Given a new protein, it is interesting to predict its constituent domains (corresponding to one or more states in an HMM) and their locations in the amino acid sequence (the observations), and perhaps also the protein family to which the sequence belongs. HMMs have proved very effective in representing biological sequences, just as they have for speech signals, and as a result they have become increasingly popular in computational molecular biology, where many state-of-the-art sequence analysis algorithms are built on HMMs.
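Below is a minimal sketch of an HMM evaluated with the forward algorithm; the weather/activity states and every probability are invented for illustration:

```python
import numpy as np

# Toy HMM: hidden weather states, observed activities.
# states: 0 = Rainy, 1 = Sunny; observations: 0 = walk, 1 = shop, 2 = clean
A = np.array([[0.7, 0.3],        # transition probabilities P(s_t | s_{t-1})
              [0.4, 0.6]])
B = np.array([[0.1, 0.4, 0.5],   # emission probabilities P(o_t | s_t)
              [0.6, 0.3, 0.1]])
pi = np.array([0.6, 0.4])        # initial state distribution

def forward(obs):
    """Forward algorithm: likelihood of an observation sequence under the HMM."""
    alpha = pi * B[:, obs[0]]            # initialize with the first observation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # propagate states, weight by emission
    return alpha.sum()

print(forward([0, 1, 2]))   # P(walk -> shop -> clean)
```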
8. What is Reinforcement Learning? Explain with the help of an example.

Reinforcement learning is an area of machine learning concerned with taking suitable actions to maximize reward in a particular situation. It is employed by various software systems and machines to find the best possible behavior or path to take in a specific situation. Reinforcement learning differs from supervised learning in that supervised training data carries the answer key, so the model is trained on the correct answers themselves, whereas in reinforcement learning there is no answer: the reinforcement agent decides what to do to perform the given task, and in the absence of a training dataset it is bound to learn from its own experience. Reinforcement Learning (RL) is the science of decision making: learning the optimal behavior in an environment so as to obtain maximum reward. In RL, data is accumulated by the learning system itself through trial and error; data is not part of the input as it would be in supervised or unsupervised machine learning. Reinforcement learning uses algorithms that learn from outcomes and decide which action to take next: after each action the algorithm receives feedback that helps it determine whether its choice was correct, neutral, or incorrect. It is a good technique for automated systems that have to make many small decisions without human guidance. In short, reinforcement learning is an autonomous, self-teaching system that learns by trial and error, performing actions with the aim of maximizing rewards: learning by doing in order to achieve the best outcomes.

Example: We have an agent and a reward, with many hurdles in between, and the agent is supposed to find the best possible path to reach the reward. Picture a robot, a diamond, and fire: the goal of the robot is to reach the reward, the diamond, while avoiding the hurdles, the fire. The robot learns by trying all possible paths and then choosing the path that reaches the reward with the fewest hurdles. Each right step earns the robot a reward, and each wrong step subtracts from its reward; the total reward is tallied when it reaches the final reward, the diamond.

Main points in Reinforcement Learning:
- Input: the input is an initial state from which the model starts.
- Output: there are many possible outputs, since there are many possible solutions to a given problem.
- Training: training is based on the input; the model returns a state, and the user rewards or punishes the model based on its output. The model continues to learn, and the best solution is selected on the basis of the maximum reward.
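A minimal Q-learning sketch of the robot example, simplified to a one-dimensional corridor with the diamond at one end and the fire at the other; the rewards and hyperparameters are illustrative:

```python
import numpy as np

# Corridor of states 0..4: the "fire" sits at state 0 (-10), the "diamond"
# at state 4 (+10); each intermediate move costs -1. Actions: 0 = left, 1 = right.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.5, 0.9, 0.2
rng = np.random.default_rng(0)

def step(s, a):
    s2 = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
    if s2 == n_states - 1: return s2, 10.0, True    # reached the diamond
    if s2 == 0:            return s2, -10.0, True   # stepped into the fire
    return s2, -1.0, False                          # small cost per move

for _ in range(500):
    s, done = 2, False
    while not done:
        # epsilon-greedy: mostly exploit the best known action, sometimes explore.
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s2, r, done = step(s, a)
        # Q-learning update: bootstrap on the best action in the next state.
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

print(Q)   # the learned values should favour moving right, toward the diamond
```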
9. Apply the K-means algorithm on the given data for k=3. Use C1(2), C2(16) and C3(38) as initial cluster centres. Data: 2, 4, 6, 3, 31, 12, 15, 16, 38, 35, 14, 21, 23, 25, 30
10. Explain with a suitable example the advantages of the Bayesian approach over classical approaches to probability.
11. Explain in detail Principal Component Analysis for Dimension Reduction.
12. Find the optimal hyperplane for the data points: {(1,1), (2,1), (1,-1), (2,-1), (4,0), (5,1), (5,-1), (6,0)}
13. Short note on Machine Learning Applications
14. Short note on Classification using the Back Propagation Algorithm
15. Short note on Issues in Decision Trees
16. What are the key tasks of Machine Learning?
17. What are the key terminologies of Support Vector Machine?
18. Explain in brief the Linear Regression technique.
19. Explain in brief the elements of Reinforcement Learning.
20. Explain the steps required for selecting the right machine learning algorithm.
21. For the given data, determine the entropy after classification using each attribute separately, and find which attribute is best as the decision attribute for the root by computing information gain with respect to the entropy of Temperature as the reference attribute.
22. Explain in detail Principal Component Analysis for Dimension Reduction
23.
24. Apply the Agglomerative clustering algorithm on the given data and draw the dendrogram. Show three clusters with their allocated points. Use the single-link method.
25. Explain classification using the Back Propagation algorithm with a suitable example.
26. Detailed notes on the quadratic programming solution for finding maximum margin separation in SVM
27. Detailed notes on applications of machine learning algorithms
28. Detailed notes on the Hidden Markov Model
29. Explain classification using a Bayesian Belief Network with an example.
30. Define SVM and further explain the maximum margin linear separators concept.
31. Explain in detail PCA for Dimension Reduction
32. Explain reinforcement learning in detail along with the various elements involved in forming the concept. Also define what is meant by a partially observable state.
33. Detailed notes on Hierarchical Clustering algorithms
34. Detailed notes on Model-Based Learning
35. What is machine learning? Explain how supervised learning is different from unsupervised learning.
36. Explain Bayes' theorem.
37. What are the elements of reinforcement learning?
38. Describe two methods of reducing dimensionality.
39. The following table shows the midterm and final exam grades obtained by students in a database course. Use the method of least squares regression to predict the final exam grade of a student who received 86 on the midterm exam.
40. Explain the steps in developing a machine learning application.
41. For the sunburn dataset given below, construct a decision tree.
42. What is SVM? How is the margin computed?
43. Explain Principal Component Analysis (PCA) to arrive at the transformed matrix for the given matrix A.
1. Explain how the Back Propagation algorithm helps in classification
2. For the given set of points, identify clusters using complete link and average link agglomerative clustering.
3.
4. Short note on Temporal Difference learning
5. Short note on Logistic Regression
6. Short note on Machine Learning Applications
7. Define a well-posed learning problem. Hence define the robot-driving learning problem.
8. Explain in brief Bayesian Belief Networks
9. Write a short note on Temporal Difference Learning
10. Explain the procedure to construct decision trees.
11. Explain how SVM can be used to find the optimal hyperplane to classify linearly separable data. Give a suitable example.
12. Explain the procedure to design a machine learning system.
13. What is linear regression? Find the best fitted line for the following example:
14. What is a decision tree? How will you choose the best attribute for a decision tree classifier? Give a suitable example.
15. Explain the K-means clustering algorithm with a suitable example. Also explain how K-means clustering differs from hierarchical clustering.
16. What is a kernel? How can a kernel be used with SVM to classify non-linearly separable data? Also list the standard kernel functions.
17. What is Q-learning? Explain the algorithm for learning Q.
18. Explain the following terms with respect to Reinforcement Learning: delayed rewards, exploration, and partially observable states.
19. Short notes on soft margin SVM
20. Short notes on Radial Basis Functions
21. Short notes on Independent Component Analysis
22. Short notes on Logistic Regression
23. Define machine learning. Briefly explain the types of learning.
24. What is independent component analysis?
25. What are the issues in decision tree induction?
26. What are the requirements of clustering algorithms?
27. The values of the independent variable x and dependent variable y are given below. Find the least squares regression line y = ax + b. Estimate the value of y when x is 10.
28.
29. What are the steps in designing a machine learning problem? Explain with the checkers problem.
30.
31. What is the goal of SVM? How is the margin computed?
32. For the given set of points, identify clusters using complete link and average link agglomerative clustering.
33.
34. What is the role of the radial basis function in separating non-linear patterns?
35. Use Principal Component Analysis (PCA) to arrive at the transformed matrix for the given matrix A.
36.
37. What are the elements of reinforcement learning?
38. Short notes on Logistic Regression
39. Short notes on the Back Propagation algorithm
40. Short notes on issues in machine learning
41. Define Machine Learning and explain, with an example, the importance of Machine Learning
42. Explain the Multilayer Perceptron with a neat diagram
43. Why is SVM more accurate than logistic regression?
44. Explain Radial Basis Function with an example
45. What is dimensionality reduction? Describe how principal component analysis is carried out to reduce the dimensionality of data sets.
46. Find the singular value decomposition of the given matrix.
47.
48. For an unknown tuple t = <Outlook = Sunny, Temperature = Cool, Wind = Strong>, use the Naïve Bayes classifier to find whether the class for PlayTennis is yes or no.
49. The dataset is given below.
50.
51. List some advantages of derivative-based optimization techniques. Explain the Steepest Descent method for optimization.
52. Given the following data for the sales of cars by an automobile company for six consecutive years, predict the sales for the next two consecutive years.
53. Explain the various basic evaluation measures of supervised learning algorithms for classification.
54. Consider the following table for binary classification. Calculate the root of the decision tree using the Gini index.
55.
56. Define SVM. Explain how the margin is computed and the optimal hyperplane is decided.
57. Short notes on Hidden Markov Model
58. Short notes on EM Algorithm
59. Short notes on Logistic Regression
60. Short notes on McCulloch-Pitts Neuron Model
61. Short notes on Downhill Simplex method
62. Define Machine Learning (ML). Briefly explain the types of learning.
63. "Entropy is a thermodynamic function used to measure the disorder of a system in Chemistry." How do you suitably clarify the concept of entropy in ML?
64. State the principle of Occam's Razor. Which ML algorithm uses this principle?
65. Explain Bayesian Belief Network with an example. [5]
66. Q2 a) Use the k-means clustering algorithm and Euclidean distance to cluster the following eight examples into three clusters: A1 = (2, 10), A2 = (2, 5), A3 = (8, 4), A4 = (5, 8), A5 = (7, 5), A6 = (6, 4), A7 = (1, 2), A8 = (4, 9). Find the new centroid at every new point entry into the cluster group. Assume the initial cluster centers to be A1, A4 and A7. [10]
67. b) Compare and contrast Linear and Logistic regression with respect to their mechanisms of prediction. [10]
68. Q3 a) Find the predicted value of Y for one epoch and the RMSE using Linear regression.
69.
70. Find the new revised theta for the given problem using the Expectation-Maximization algorithm for one epoch.
71.
72. For the given set of points, identify clusters using single linkage and draw the dendrogram with the cluster separation line at 1.3. Find how many clusters are formed below the line. [10]
73. b) Use Principal Component Analysis (PCA) to arrive at the transformed matrix for the given matrix A, where \( A^\top = \begin{pmatrix} 2 & 1 & 0 & -1 \\ 4 & 3 & 1 & 0.5 \end{pmatrix} \). [10]
74. Q5 a) Find the optimal hyperplane for the following points: {(1, 1), (2, 1), (1, -1), (2, -1), (4, 0), (5, 1), (6, 0)} [10]
75. b) The following table consists of training data from an employee database. The data have been generalized; for example, "31 . . . 35" for age represents the age range 31 to 35. For a given row entry, count represents the number of data tuples having the values for department, status, age, and salary given in that row. Let status be the class-label attribute. (i) Design a multilayer feed-forward neural network for the given data, labelling the nodes in the input and output layers. (ii) Using the multilayer feed-forward neural network obtained in (i), show the weight values after one iteration of the back propagation algorithm, given the training instance "(sales, senior, 31 . . . 35, 46K . . . 50K)". Assume initial weight values and biases, and a learning rate of 0.9. Use binary inputs and draw the neural network (one input layer, one hidden layer, one output layer). Solve the problem for one epoch.
76.
77. Short notes on Machine Learning applications
78. Short notes on Temporal Difference learning
79. Short notes on Independent Component Analysis
80. a. What are the issues in Machine Learning?
81. b. Explain Regression line, Scatter plot, Error in prediction and Best fitting line
82. c. Explain the concept of margin and support vector.
83. d. Explain the distance metrics used in clustering.
84. e. Explain Logistic Regression
85. Q2. a. Explain the steps of developing Machine Learning applications. [10]
86. b. Explain Linear regression along with an example. [10]
87. Q3. a. Create a decision tree using the Gini Index to classify the following dataset.
88. Describe Multiclass classification
89. Explain the Random Forest algorithm in detail.
90. Explain the different ways to combine classifiers
91. Compute the Linear Discriminant projection for the following two-dimensional dataset: X1 = (x1, x2) = {(4,1), (2,4), (2,3), (3,6), (4,4)} and X2 = (x1, x2) = {(9,10), (6,8), (9,5), (8,7), (10,8)}
92. Explain the EM algorithm
93. Detailed note on Performance Metrics for Classification
94. Principal Component Analysis for Dimension Reduction
95. DBSCAN
96. A How to choose the right ML algorithm?
97. B Explain Regression line, Scatter plot, Error in prediction and Best fitting line.
98. C Explain the concept of feature selection and extraction
99. D Explain the K-means algorithm.
100. E Explain the concept of Logistic Regression
101. Q2 A Explain any five applications of Machine Learning. [10]
102. B Explain the Multivariate Linear regression method. [10]
103. Q3 A Create a decision tree using the Gini Index to classify the following dataset for profit.
104. B Find the SVD of the given matrix.
105. Explain the Random Forest algorithm in detail
106. Explain the concept of bagging and boosting
107. Describe Multiclass classification
108. Explain the concept of the Expectation Maximization Algorithm
109. Detailed note on Linear Regression
110. Detailed note on Linear Discriminant Analysis for Dimension Reduction
111. Detailed note on DBSCAN