
AI Solutions

a) To calculate the summed input of the hidden layer neuron, we perform the dot product of the inputs and the weights.
Summed input = (i1 * w1) + (i2 * w2) + (i3 * w3)
Summed input = (5 * 0.1) + (3.2 * 0.2) + (0.1 * 0.3)
Summed input = 0.5 + 0.64 + 0.03
Summed input = 1.17
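For reference, here is a minimal Python sketch of the same weighted sum (the variable names inputs and weights are only illustrative):

    inputs = [5, 3.2, 0.1]     # i1, i2, i3
    weights = [0.1, 0.2, 0.3]  # w1, w2, w3

    # Dot product of inputs and weights: the summed input of the neuron
    summed_input = sum(i * w for i, w in zip(inputs, weights))
    print(round(summed_input, 2))  # 1.17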
b) In supervised learning, the model learns from labeled training data and applies this learning to
predict outcomes for unseen data. Examples of supervised learning algorithms include regression
(Linear and Logistic), Support Vector Machines (SVM), Decision Trees, and Random Forest.
In contrast, unsupervised learning involves learning from unlabeled data; the model tries to
identify patterns and structures within the data. Examples of unsupervised learning algorithms include
clustering methods (K-Means, Hierarchical Clustering) and dimensionality reduction methods (PCA, t-SNE).
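To make the distinction concrete, here is a minimal scikit-learn sketch (the toy data and variable names are illustrative and not part of the question):

    import numpy as np
    from sklearn.linear_model import LinearRegression  # supervised: needs labels y
    from sklearn.cluster import KMeans                 # unsupervised: uses X only

    X = np.array([[1.0], [2.0], [3.0], [4.0]])
    y = np.array([2.1, 3.9, 6.2, 8.1])  # labels are available for supervised learning

    reg = LinearRegression().fit(X, y)                          # learns a mapping X -> y
    clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)   # finds structure in X alone

    print(reg.predict([[5.0]]), clusters)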
c) Here's a comparison of AI, Machine Learning, and Deep Learning:

Definition:
• AI: Broad field aimed at making machines mimic human intelligence.
• Machine Learning: Subset of AI; uses statistical methods to enable machines to improve at tasks with experience.
• Deep Learning: Subset of ML; uses neural networks with many layers ("deep" structures).

Data Dependency:
• AI: Can be rule-based and doesn't always require data.
• Machine Learning: Requires moderate amounts of data.
• Deep Learning: Requires large amounts of data.

Interpretability:
• AI: Can be easily interpreted, depending on the system.
• Machine Learning: Moderately interpretable, depends on the algorithm.
• Deep Learning: Often considered a black box; interpretations are generally difficult.

Use Cases:
• AI: Expert Systems, Speech Recognition, Image Recognition.
• Machine Learning: Spam detection, Google search algorithms, Credit scoring.
• Deep Learning: Image Recognition, Speech Recognition, Natural Language Processing.
d) Precision, sensitivity, and accuracy are defined as follows:
Precision: Proportion of true positives over total positive predictions (true positives + false positives).
Sensitivity (also known as recall): Proportion of true positives over actual positives (true positives + false
negatives).
Accuracy: Proportion of correct predictions over total predictions, i.e. (true positives + true negatives) / (true
positives + true negatives + false positives + false negatives).
Given the case described:
True Positives (TP): 20
False Negatives (FN): 12 (from the 32 tumor patients, 12 were not detected)
True Negatives (TN): 60
False Positives (FP): 8 (from the 68 non-tumor patients, 8 were incorrectly detected as tumor)
Therefore, calculations are:
Precision = TP / (TP + FP) = 20 / (20 + 8) = 0.714 (or 71.4%)
Sensitivity = TP / (TP + FN) = 20 / (20 + 12) = 0.625 (or 62.5%)
Accuracy = (TP + TN) / (TP + TN + FP + FN) = (20 + 60) / (20 + 60 + 8 + 12) = 0.8 (or 80%)
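As a quick check, here is a minimal Python sketch of these calculations, using the counts listed above:

    # Confusion-matrix counts from the case above
    TP, FN, TN, FP = 20, 12, 60, 8

    precision = TP / (TP + FP)                   # 20 / 28
    sensitivity = TP / (TP + FN)                 # 20 / 32 (recall)
    accuracy = (TP + TN) / (TP + TN + FP + FN)   # 80 / 100

    print(round(precision, 3), round(sensitivity, 3), accuracy)  # 0.714 0.625 0.8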
2)
Let's break down each step.
Step 1: Calculate Prior Probabilities
This is the initial probability we have about the event before any new evidence is considered. Here, it's
the proportion of cars that were stolen (Yes) and not stolen (No) in the total sample.
P(Yes) = Count(Yes) / Total = 5 / 10 = 0.5. This is saying that in our dataset, half the cars were stolen.
P(No) = Count(No) / Total = 5 / 10 = 0.5. This is saying that in our dataset, half the cars were not stolen.
Step 2: Calculate Conditional Probabilities
This is the probability of an event given that another event has occurred. Here we're calculating the
probability of each feature (color, type, origin) given the class (stolen, not stolen).
For Stolen = Yes:
P(Red|Yes) = Count(Red and Yes) / Count(Yes) = 3 / 5 = 0.6. This is saying that of the cars that were
stolen, 60% were Red.
P(SUV|Yes) = Count(SUV and Yes) / Count(Yes) = 1 / 5 = 0.2. Of the cars that were stolen, 20% were
SUVs.
P(Domestic|Yes) = Count(Domestic and Yes) / Count(Yes) = 2 / 5 = 0.4. Of the cars that were stolen, 40%
were Domestic.
For Stolen = No:
P(Red|No) = Count(Red and No) / Count(No) = 1 / 5 = 0.2. Of the cars that were not stolen, 20% were
Red.
P(SUV|No) = Count(SUV and No) / Count(No) = 2 / 5 = 0.4. Of the cars that were not stolen, 40% were
SUVs.
P(Domestic|No) = Count(Domestic and No) / Count(No) = 3 / 5 = 0.6. Of the cars that were not stolen,
60% were Domestic.
Step 3: Apply Bayes' theorem with the naive independence assumption
Bayes' theorem gives the posterior probability, i.e. the probability of the event after the new evidence is
taken into account. Here we compare, for each class, the product of the feature likelihoods and the prior;
the shared denominator P(Red, SUV, Domestic) is the same for both classes, so it can be ignored when
comparing them.
For Stolen = Yes: P(Red|Yes) * P(SUV|Yes) * P(Domestic|Yes) * P(Yes) = 0.6 * 0.2 * 0.4 * 0.5 = 0.024. This is
the unnormalized posterior score for a Red, Domestic SUV being stolen.
For Stolen = No: P(Red|No) * P(SUV|No) * P(Domestic|No) * P(No) = 0.2 * 0.4 * 0.6 * 0.5 = 0.024. This is
the unnormalized posterior score for a Red, Domestic SUV not being stolen.
Since both scores are equal (each normalizes to a posterior of 0.5), we cannot confidently predict whether
the car would be stolen or not based on our current data.
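A minimal Python sketch of the same calculation, using the counts from the worked example above (the variable names are illustrative):

    # Naive Bayes scores for a Red, Domestic SUV
    p_yes, p_no = 5 / 10, 5 / 10

    likelihood_yes = (3 / 5) * (1 / 5) * (2 / 5)  # P(Red|Yes) * P(SUV|Yes) * P(Domestic|Yes)
    likelihood_no = (1 / 5) * (2 / 5) * (3 / 5)   # P(Red|No)  * P(SUV|No)  * P(Domestic|No)

    score_yes = likelihood_yes * p_yes            # unnormalized posterior, ~0.024
    score_no = likelihood_no * p_no               # unnormalized posterior, ~0.024

    # Normalize the scores to obtain posterior probabilities
    posterior_yes = score_yes / (score_yes + score_no)
    print(round(score_yes, 3), round(score_no, 3), round(posterior_yes, 3))  # 0.024 0.024 0.5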
3)
The Entropy of a dataset is a measure of the amount of uncertainty or disorder in the data. We'll need to
compute the entropy for the entire dataset first and then for each feature (A and B). After that, we can
calculate the information gain which is the decrease in entropy after splitting on a particular feature.
Let's proceed.
Step 1: Compute the entropy of the entire dataset
The formula for entropy is: -p(+)*log2(p(+)) - p(-)*log2(p(-)), where p(+) is the probability of the positive
class (True) and p(-) is the probability of the negative class (False).
In our case, the dataset has 4 instances with one positive case (True) and three negative cases (False).
Therefore,
p(+) = 1 / 4 = 0.25 and p(-) = 3 / 4 = 0.75.
The entropy of the dataset (E(Dataset)) is:
E(Dataset) = -0.25 * log2(0.25) - 0.75 * log2(0.75) = 0.5 + 0.311 = 0.811.
Step 2: Compute the entropy of each feature
Let's start with Feature A. We have two values, False and True. We'll calculate the entropy for each:
Entropy for A = False:
Here, both instances of A = False result in 'False' for 'A and B', so the entropy is 0 (there's no
uncertainty).
Entropy for A = True:
Here, we have one positive case and one negative case, so the entropy is 1 (maximum uncertainty).
The entropy of a feature is the weighted average of the entropies of all its values, with the weights being
the proportions of instances with each value. Therefore,
E(A) = [2/4 * E(A=False) + 2/4 * E(A=True)] = 0.5.
We do the same for Feature B:
Entropy for B = False is 0 (no uncertainty, as both instances of B = False result in 'False' for 'A and B').
Entropy for B = True is 1 (maximum uncertainty, as we have one positive case and one negative case).
Therefore, E(B) = [2/4 * E(B=False) + 2/4 * E(B=True)] = 0.5.
Step 3: Compute Information Gain
The Information Gain is the decrease in entropy caused by partitioning the examples according to a
feature. It's computed as the entropy of the dataset minus the entropy of the feature.
Information Gain for A = E(Dataset) - E(A) = 0.811 - 0.5 = 0.311.
Information Gain for B = E(Dataset) - E(B) = 0.811 - 0.5 = 0.311.
Step 4: Construct Decision Tree
The decision tree starts with the feature that provides the highest information gain. Here both features
give the same gain (0.311), so either can serve as the root; suppose we pick A.
Given this, the decision tree tests A at the root: A = False leads to the leaf 'False', and A = True leads to a
node testing B, where B = False gives the leaf 'False' and B = True gives the leaf 'True'.
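A short Python sketch that reproduces these numbers for the 'A and B' dataset (the row layout and function names are illustrative):

    from math import log2

    def entropy(labels):
        # Shannon entropy of a list of boolean labels
        if not labels:
            return 0.0
        p = sum(labels) / len(labels)
        if p in (0.0, 1.0):
            return 0.0
        return -p * log2(p) - (1 - p) * log2(1 - p)

    # The four rows of the 'A and B' truth table: (A, B, A and B)
    rows = [(False, False, False), (False, True, False),
            (True, False, False), (True, True, True)]
    target = [row[2] for row in rows]

    def info_gain(feature_index):
        # Entropy of the dataset minus the weighted entropy of each split
        gain = entropy(target)
        for value in (False, True):
            subset = [row[2] for row in rows if row[feature_index] == value]
            gain -= len(subset) / len(rows) * entropy(subset)
        return gain

    print(round(entropy(target), 3))  # 0.811
    print(round(info_gain(0), 3))     # gain for A: 0.311
    print(round(info_gain(1), 3))     # gain for B: 0.311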
4)
Converting data into a suitable format for machine learning models is a crucial step
known as preprocessing.
1. Images: Image data is typically high-dimensional. Each pixel in an image is a feature and a
separate channel is used for each color (red, green, and blue) in color images. Thus, an image
can be converted into a numerical matrix, where each element of the matrix represents pixel
intensity. For instance, a 64x64 color image will be represented as a 64x64x3 tensor (3D
matrix). In addition, image data is often normalized by dividing by 255 (the maximum pixel
value), which results in the pixel values being in the range [0,1].
2. Audio: Audio files can be converted into spectrograms, which are visual representations of
the spectrum of frequencies of a signal as it varies with time. Spectrograms are 2D images
that can be treated in the same way as any other image. Another common way to represent
audio data is using Mel-Frequency Cepstral Coefficients (MFCCs), which are a type of feature
widely used in automatic speech and speaker recognition.
3. Text: Text must be converted into some form of numerical representation for use in machine
learning models. There are several methods for doing this:
• One-Hot Encoding: Here, each word in the vocabulary is represented by a vector of
length equal to the vocabulary size, with a 1 in the position corresponding to the
word in the vocabulary, and 0s elsewhere.
• Bag-of-Words (BoW): Here, the text is represented as a 'bag' (multiset) of its words,
disregarding grammar and even word order but keeping track of frequency.
• TF-IDF: This is similar to BoW, but here words are given weights that represent their
importance in the document.
• Word Embeddings: This is a more advanced technique where words are represented
by dense word vectors (also known as embeddings) that are learned in a way that
takes into account the context and semantics of the word. Examples of methods that
learn such embeddings include Word2Vec and GloVe.
Remember that depending on the specific task and the nature of the data, additional preprocessing
steps such as noise removal, dimensionality reduction, or data augmentation might be necessary.
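As a small illustration of two of these conversions, here is a hedged Python/NumPy sketch (the image shape and the toy sentences are made up for the example):

    import numpy as np

    # Image: normalize 8-bit pixel values into the range [0, 1]
    image = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)  # toy 64x64 RGB image
    image_normalized = image.astype(np.float32) / 255.0

    # Text: a tiny bag-of-words representation built by hand
    sentences = ["the cat sat", "the dog sat down"]
    vocabulary = sorted({word for s in sentences for word in s.split()})
    bow = [[s.split().count(word) for word in vocabulary] for s in sentences]

    print(image_normalized.min(), image_normalized.max())  # 0.0 ... 1.0
    print(vocabulary)
    print(bow)  # word counts per sentence, in vocabulary order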
5)
A Convolutional Neural Network (CNN) is a type of artificial neural network designed to process data
with grid-like topology, such as an image. It's exceptionally good at spatial hierarchies, making it highly
effective for tasks like image recognition.
The architecture of a CNN varies with different tasks. Generally, it consists of an input layer, multiple
hidden layers, and an output layer. Hidden layers typically consist of a series of convolutional layers,
ReLU (Rectified Linear Unit) layers for introducing nonlinearity, pooling layers for down-sampling, and
fully connected layers for classification. The number of these layers can vary greatly depending on the
specific architecture and task.
A typical Convolutional Neural Network might include the following layers:
Input Layer: Used for feeding the image into the network.
Convolution Layer: Applies various filters to the image to create feature maps.
ReLU Layer: Introduces non-linearity by replacing all negative values in the feature map with zero.
Pooling Layer: Reduces the dimensionality of the feature map while retaining important information.
Fully Connected Layer: After several convolutional, ReLU, and pooling layers, the high-level reasoning in
the network is done via fully connected layers.
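A hedged sketch of such a stack, assuming TensorFlow/Keras is available; the layer sizes and input shape are illustrative, not prescribed by the question:

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(64, 64, 3)),                       # input layer: 64x64 RGB image
        tf.keras.layers.Conv2D(16, (3, 3), activation="relu"),   # convolution + ReLU
        tf.keras.layers.MaxPooling2D((2, 2)),                    # pooling: down-sampling
        tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation="softmax"),         # fully connected classifier
    ])
    model.summary()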
For the 5x5 input matrix and a 3x3 filter, we perform a convolution operation: we slide the filter over the
image and, for each position, multiply the overlapping elements and add the products up. Here is the
calculation for the first output element (top-left corner):
(1*1) + (1*0) + (1*1) + (0*0) + (1*1) + (1*0) + (0*1) + (0*0) + (1*1) = 1 + 0 + 1 + 0 + 1 + 0 + 0 + 0 + 1 = 4.
Remember that the output size of the convolution operation depends on the stride (how much you
move the filter each time) and padding (adding extra pixels around the input image). In this case, a
stride of 1 and no padding were assumed.
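A minimal NumPy sketch of this sliding-window computation; since the matrices appear only in the original figure, the input and filter values below are assumed, chosen to match the elementwise products listed above:

    import numpy as np

    # Assumed 5x5 binary input and 3x3 filter, consistent with the products above
    image = np.array([[1, 1, 1, 0, 0],
                      [0, 1, 1, 1, 0],
                      [0, 0, 1, 1, 1],
                      [0, 0, 1, 1, 0],
                      [0, 1, 1, 0, 0]])
    kernel = np.array([[1, 0, 1],
                       [0, 1, 0],
                       [1, 0, 1]])

    # Convolution with stride 1 and no padding -> 3x3 output
    out = np.zeros((3, 3), dtype=int)
    for i in range(3):
        for j in range(3):
            out[i, j] = np.sum(image[i:i + 3, j:j + 3] * kernel)

    print(out[0, 0])  # 4, the top-left element computed above
    print(out)        # full 3x3 feature map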