a) To calculate the summed input of the hidden layer neuron, we take the dot product of the inputs and the weights.
Summed input = (i1 * w1) + (i2 * w2) + (i3 * w3)
Summed input = (5 * 0.1) + (3.2 * 0.2) + (0.1 * 0.3)
Summed input = 0.5 + 0.64 + 0.03
Summed input = 1.17

b) In supervised learning, the model learns from labeled training data and applies this learning to predict outcomes for unseen data. Examples of supervised learning algorithms include regression (linear and logistic), Support Vector Machines (SVM), Decision Trees, and Random Forests. In contrast, unsupervised learning works with unlabeled data: the model tries to identify patterns and structure within the data itself. Examples of unsupervised learning algorithms include clustering methods (K-Means, hierarchical clustering) and dimensionality reduction methods (PCA, t-SNE).

c) Here's a point-by-point comparison:

Definition
• AI: Broad field aimed at making machines mimic human intelligence.
• Machine Learning: Subset of AI; uses statistical methods to enable machines to improve at tasks with experience.
• Deep Learning: Subset of ML; uses neural networks with many layers ("deep" structures).

Data Dependency
• AI: Can be rule-based and doesn't always require data.
• Machine Learning: Requires moderate amounts of data.
• Deep Learning: Requires large amounts of data.

Interpretability
• AI: Can be easily interpreted, depending on the system.
• Machine Learning: Moderately interpretable, depending on the algorithm.
• Deep Learning: Often considered a black box; interpretations are generally difficult.

Use Cases
• AI: Expert systems, speech recognition, image recognition.
• Machine Learning: Spam detection, Google search algorithms, credit scoring.
• Deep Learning: Image recognition, speech recognition, natural language processing.

d) Precision, sensitivity, and accuracy are defined as follows:
Precision: Proportion of true positives over total positive predictions, TP / (TP + FP).
Sensitivity (also known as recall): Proportion of true positives over actual positives, TP / (TP + FN).
Accuracy: Proportion of correct predictions over total predictions, (TP + TN) / (TP + TN + FP + FN).

Given in your case:
True Positives (TP): 20
False Negatives (FN): 12 (of the 32 tumor patients, 12 were not detected)
True Negatives (TN): 60
False Positives (FP): 8 (of the 68 non-tumor patients, 8 were incorrectly flagged as tumor)

Therefore:
Precision = TP / (TP + FP) = 20 / (20 + 8) = 0.714 (or 71.4%)
Sensitivity = TP / (TP + FN) = 20 / (20 + 12) = 0.625 (or 62.5%)
Accuracy = (TP + TN) / (TP + TN + FP + FN) = (20 + 60) / (20 + 60 + 8 + 12) = 0.8 (or 80%)

2) Let's break down each step.

Step 1: Calculate Prior Probabilities
This is the probability of each class before any new evidence is considered: here, the proportion of cars that were stolen (Yes) and not stolen (No) in the sample.
P(Yes) = Count(Yes) / Total = 5 / 10 = 0.5, i.e., half the cars in our dataset were stolen.
P(No) = Count(No) / Total = 5 / 10 = 0.5, i.e., half the cars in our dataset were not stolen.

Step 2: Calculate Conditional Probabilities
This is the probability of an event given that another event has occurred. Here we calculate the probability of each feature value (color, type, origin) given the class (stolen, not stolen).

For Stolen = Yes:
P(Red|Yes) = Count(Red and Yes) / Count(Yes) = 3 / 5 = 0.6. Of the cars that were stolen, 60% were Red.
P(SUV|Yes) = Count(SUV and Yes) / Count(Yes) = 1 / 5 = 0.2. Of the cars that were stolen, 20% were SUVs.
P(Domestic|Yes) = Count(Domestic and Yes) / Count(Yes) = 2 / 5 = 0.4. Of the cars that were stolen, 40% were Domestic.

For Stolen = No:
P(Red|No) = Count(Red and No) / Count(No) = 1 / 5 = 0.2. Of the cars that were not stolen, 20% were Red.
P(SUV|No) = Count(SUV and No) / Count(No) = 2 / 5 = 0.4. Of the cars that were not stolen, 40% were SUVs.
P(Domestic|No) = Count(Domestic and No) / Count(No) = 3 / 5 = 0.6. Of the cars that were not stolen, 60% were Domestic.

Step 3: Apply the Naive Bayes Rule
Bayes' theorem, combined with the "naive" assumption that the features are conditionally independent given the class, is used to score each class once the evidence is taken into account. Here we score the car being stolen or not, given the features (Red, SUV, Domestic). Note that these products are proportional to the posterior probabilities (the common denominator P(Red, SUV, Domestic) is omitted), so they are comparison scores rather than calibrated probabilities.

P(Yes|Red,SUV,Domestic) ∝ P(Red|Yes) * P(SUV|Yes) * P(Domestic|Yes) * P(Yes) = 0.6 * 0.2 * 0.4 * 0.5 = 0.024.
P(No|Red,SUV,Domestic) ∝ P(Red|No) * P(SUV|No) * P(Domestic|No) * P(No) = 0.2 * 0.4 * 0.6 * 0.5 = 0.024.

Since both scores are equal, the model cannot confidently predict whether a Red, Domestic SUV would be stolen or not based on the current data. A short Python sketch of this calculation follows.
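The sketch below is a minimal, hand-coded version of the scoring above. It hard-codes the counts from Steps 1 and 2 rather than the full 10-row table (which is not reproduced here), and naive_bayes_score is a helper name introduced just for illustration.

```python
# Minimal Naive Bayes sketch built directly from the counts in Steps 1-3.

# Priors: 5 stolen, 5 not stolen out of 10 cars.
priors = {"Yes": 5 / 10, "No": 5 / 10}

# Conditional probabilities P(feature value | class) from Step 2.
likelihoods = {
    "Yes": {"Red": 3 / 5, "SUV": 1 / 5, "Domestic": 2 / 5},
    "No":  {"Red": 1 / 5, "SUV": 2 / 5, "Domestic": 3 / 5},
}

def naive_bayes_score(features, cls):
    """Unnormalized posterior: P(class) * product of P(feature | class)."""
    score = priors[cls]
    for f in features:
        score *= likelihoods[cls][f]
    return score

query = ["Red", "SUV", "Domestic"]
for cls in ("Yes", "No"):
    print(cls, naive_bayes_score(query, cls))   # both scores are ~0.024
```

Because both classes score the same, normalizing the two values would give 0.5 each, which is exactly why no confident prediction can be made.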
3) The entropy of a dataset is a measure of the amount of uncertainty or disorder in the data. We first compute the entropy of the entire dataset, then the weighted entropy after splitting on each feature (A and B), and finally the information gain, which is the decrease in entropy from splitting on a particular feature.

Step 1: Compute the entropy of the entire dataset
The formula for entropy is -p(+) * log2(p(+)) - p(-) * log2(p(-)), where p(+) is the probability of the positive class (True) and p(-) is the probability of the negative class (False). Our dataset has 4 instances, with one positive case (True) and three negative cases (False), so p(+) = 1/4 = 0.25 and p(-) = 3/4 = 0.75. The entropy of the dataset is:
E(Dataset) = -0.25 * log2(0.25) - 0.75 * log2(0.75) = 0.5 + 0.311 = 0.811.

Step 2: Compute the entropy of each feature split
For Feature A, there are two values, False and True:
Entropy for A = False: both instances with A = False give 'False' for 'A and B', so the entropy is 0 (no uncertainty).
Entropy for A = True: one positive case and one negative case, so the entropy is 1 (maximum uncertainty).
The entropy of a split is the weighted average of the entropies of its branches, weighted by the proportion of instances in each branch:
E(A) = 2/4 * E(A=False) + 2/4 * E(A=True) = 0.5 * 0 + 0.5 * 1 = 0.5.

We do the same for Feature B:
Entropy for B = False: both instances with B = False give 'False' for 'A and B', so the entropy is 0.
Entropy for B = True: one positive case (where A = True) and one negative case (where A = False), so the entropy is 1.
E(B) = 2/4 * E(B=False) + 2/4 * E(B=True) = 0.5 * 0 + 0.5 * 1 = 0.5.

Step 3: Compute Information Gain
The information gain is the decrease in entropy caused by partitioning the examples according to a feature: the entropy of the dataset minus the weighted entropy of the split.
Information Gain for A = E(Dataset) - E(A) = 0.811 - 0.5 = 0.311.
Information Gain for B = E(Dataset) - E(B) = 0.811 - 0.5 = 0.311.
By the symmetry of the AND function, both features provide the same information gain. A short sketch of these calculations is shown below.
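For completeness, here is a minimal Python sketch of the entropy and information gain calculations. It assumes the dataset is exactly the four-row AND truth table described above; entropy and information_gain are helper names introduced for this sketch.

```python
import math

def entropy(labels):
    """Shannon entropy of a list of boolean labels."""
    n = len(labels)
    result = 0.0
    for value in (True, False):
        p = labels.count(value) / n
        if p > 0:
            result -= p * math.log2(p)
    return result

def information_gain(rows, labels, feature_index):
    """E(Dataset) minus the weighted entropy after splitting on one feature."""
    total = entropy(labels)
    n = len(rows)
    remainder = 0.0
    for value in (True, False):
        subset = [labels[i] for i in range(n) if rows[i][feature_index] == value]
        if subset:
            remainder += len(subset) / n * entropy(subset)
    return total - remainder

# The four instances (A, B) and the target column 'A and B'.
rows = [(False, False), (False, True), (True, False), (True, True)]
labels = [a and b for a, b in rows]

print(entropy(labels))                    # ~0.811
print(information_gain(rows, labels, 0))  # gain for A, ~0.311
print(information_gain(rows, labels, 1))  # gain for B, ~0.311
```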
Step 4: Construct the Decision Tree
The decision tree starts with the feature that provides the highest information gain. Since A and B tie here (both give 0.311), either can serve as the root. Splitting on the root and then on the remaining feature reproduces the AND function: the root's False branch predicts False directly, and its True branch splits on the other feature, predicting True only when that feature is also True.

4) Converting data into a suitable format for machine learning models is a crucial step known as preprocessing.

1. Images: Image data is typically high-dimensional. Each pixel is a feature, and color images have a separate channel for each color (red, green, and blue). An image can therefore be represented as a numerical array in which each element is a pixel intensity; for instance, a 64x64 color image becomes a 64x64x3 tensor. Image data is also often normalized by dividing by 255 (the maximum pixel value), so that pixel values fall in the range [0, 1].

2. Audio: Audio files can be converted into spectrograms, which are visual representations of the spectrum of frequencies of a signal as it varies over time. Spectrograms are 2D images and can be treated like any other image. Another common representation is Mel-Frequency Cepstral Coefficients (MFCCs), a type of feature widely used in automatic speech and speaker recognition.

3. Text: Text must be converted into a numerical representation before it can be used in machine learning models. There are several methods for doing this (a small sketch of the first two is shown after this section):
• One-Hot Encoding: Each word in the vocabulary is represented by a vector whose length equals the vocabulary size, with a 1 in the position corresponding to that word and 0s elsewhere.
• Bag-of-Words (BoW): The text is represented as a 'bag' (multiset) of its words, disregarding grammar and word order but keeping track of frequency.
• TF-IDF: Similar to BoW, but each word is weighted by its importance in the document relative to the whole collection.
• Word Embeddings: Words are represented by dense vectors (embeddings) learned so as to capture the context and semantics of each word. Word2Vec and GloVe are examples of methods that learn such embeddings.

Depending on the specific task and the nature of the data, additional preprocessing steps such as noise removal, dimensionality reduction, or data augmentation might be necessary.
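As a small illustration of the first two text representations, here is a minimal Python sketch. The two-sentence corpus, the vocabulary it produces, and the helper names are invented for this example only.

```python
# Toy illustration of one-hot encoding and bag-of-words on a made-up corpus.
corpus = ["the cat sat on the mat", "the dog sat"]

# Build the vocabulary: one index per distinct word.
vocab = sorted({word for sentence in corpus for word in sentence.split()})
index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    """Vector with a 1 at the word's vocabulary position, 0s elsewhere."""
    vec = [0] * len(vocab)
    vec[index[word]] = 1
    return vec

def bag_of_words(sentence):
    """Word-count vector: order is discarded, frequency is kept."""
    vec = [0] * len(vocab)
    for word in sentence.split():
        vec[index[word]] += 1
    return vec

print(vocab)                    # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(one_hot("cat"))           # [1, 0, 0, 0, 0, 0]
print(bag_of_words(corpus[0]))  # 'the' occurs twice, so its count is 2
```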
Fully Connected Layer: After several convolutional, ReLU, and pooling layers, the high-level reasoning in the network is done via fully connected layers.

For the 5x5 input matrix and the 3x3 filter, we perform a convolution operation: we slide the filter over the image and, at each position, multiply the overlapping elements and sum the products. For the first output element (top-left corner), the calculation is:
(1*1) + (1*0) + (1*1) + (0*0) + (1*1) + (1*0) + (0*1) + (0*0) + (1*1) = 4.
Remember that the output size of the convolution depends on the stride (how far the filter moves at each step) and the padding (extra pixels added around the input image). Here a stride of 1 and no padding were assumed, so the 5x5 input and 3x3 filter yield a 3x3 feature map. A code sketch of this operation follows below.
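To make the sliding-window arithmetic concrete, here is a minimal NumPy sketch of a stride-1, no-padding convolution (implemented as cross-correlation, as in the worked example). The filter and the top-left 3x3 image patch below are reconstructed from the products in the calculation above; since the full 5x5 input is not reproduced in this text, only the first output element is verified.

```python
import numpy as np

def conv2d(image, kernel):
    """Stride-1, no-padding sliding-window sum of elementwise products."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1
    out = np.zeros((oh, ow))
    for r in range(oh):
        for c in range(ow):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# 3x3 filter and the top-left 3x3 patch of the input, as implied by the
# products (1*1) + (1*0) + (1*1) + ... in the worked example above.
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])
patch = np.array([[1, 1, 1],
                  [0, 1, 1],
                  [0, 0, 1]])

print(conv2d(patch, kernel))  # [[4.]] -- the first element of the feature map
```

Called on the full 5x5 input, the same function would return the complete 3x3 feature map.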