MANIPAL INSTITUTE OF TECHNOLOGY
II SEM. M.C.A. IN-SEMESTER EXAMINATIONS
MARCH 2023
SUBJECT: Data Analytics [MCA-4251]
Date of Exam: 14/03/2023
Time of Exam: 10.30 AM – 11.30 AM
Max. Marks: 15
Instructions to Candidates:
• Answer ALL the questions. Missing data may be suitably assumed.
1.
An airline agency would like to increase its bookings by adjusting variables such as ticket prices,
frequency of flights, and promotional rates. Identify the type of Data Analytics to be used and justify
your choice.
Consider the following dataset containing employee details:
Table 1: Employee Dataset
2.
Identify the variable type for the following:
a. Percentage SSC
b. Board SSC
c. Gender
d. Salary
3.
Draw a boxplot for the variable Percentage HSC in Table 1.
4.
Differentiate between MSB and MSW. Give an example.
5.
Differentiate between the null and alternative hypotheses. Give an example of each.
6.
A startup company “HelpAway Pvt. Ltd.” has 12 employees on their payroll. The monthly salaries of all
the employees are given in Table 2:
Table 2: Employee Salary
Find mean, mode and median of monthly salaries for male and female employees.
2B.
Using Manhattan distance, find which two pharmacies are close to each other (Refer Table 1).
2C.
Differentiate between Business Intelligence and Data Analytics.
3A.
Consider the animals dataset in Table 3 for predicting whether one can pet an animal or not. Use the Naive
Bayes classifier to classify the following tuple:
X₁: Animal = Cow, Size = Medium, Color = Brown
Table 3: Dataset on Animals
3B.
Describe any three methods for improving the efficiency of Apriori Algorithm.
3C.
A sample of n = 15 overdue accounts in a large department store yields the following amounts (in $)
due:
55.20 4.88 271.95 18.06 180.29 365.29 28.16 399.11 807.80 44.14 97.47 9.98 61.61 56.89 82.73
i. Determine the mean amount due for the 15 accounts sampled.
ii. If there are a total of 150 overdue accounts, use the sample mean to predict the total amount
overdue for all 150 accounts.
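The two parts of this question can be checked with a short Python sketch. It uses only the 15 amounts listed in the question. The function names sample_mean and estimate_total are illustrative, and the total is estimated as the sample mean multiplied by the population size, which assumes the sample is representative of all 150 accounts.

```python
# Amounts due (in $) for the n = 15 sampled overdue accounts, copied from the question.
amounts = [55.20, 4.88, 271.95, 18.06, 180.29, 365.29, 28.16, 399.11,
           807.80, 44.14, 97.47, 9.98, 61.61, 56.89, 82.73]


def sample_mean(values):
    """Arithmetic mean of the sampled values."""
    return sum(values) / len(values)


def estimate_total(values, population_size):
    """Scale the sample mean up to the whole population (assumes a representative sample)."""
    return sample_mean(values) * population_size


mean_due = sample_mean(amounts)            # part (i)
total_due = estimate_total(amounts, 150)   # part (ii)

print(f"Sample mean amount due: ${mean_due:.2f}")
print(f"Estimated total for 150 accounts: ${total_due:.2f}")
```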
4A.
Suppose an entire population has a total of 30 instances. The dataset is to predict whether the person
will go to the gym or not. Let’s say 16 people go to the gym and 14 people don’t. Now there are two
features to predict whether he/she will go to the gym or not.
Feature 1 is “Energy” which takes 2 values “high” and “low”.
7.
With the help of a scatter plot, determine whether there is a relationship between X (a person’s salary) and Y
(his/her car price).
Table 3: Salary vs Car Price
8.
The runs scored by a cricket player Amar Rathore from 2008 to 2017 are as follows:
Table 4: Cricket Scores
Normalize the variable Runs using Min-Max, z-score, and decimal scale transformation.
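The three transformations named above can be sketched in Python as follows. Table 4 is not reproduced in this text, so the list runs below is a hypothetical placeholder and NOT the exam data. The z-score here uses the population standard deviation, which is also an assumption; some courses use the sample standard deviation instead.

```python
import statistics


def min_max(values, new_min=0.0, new_max=1.0):
    """Min-max normalization: map [min, max] linearly onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """Z-score normalization using the population standard deviation (assumption)."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def decimal_scaling(values):
    """Divide by 10^j, where j is the smallest integer giving max(|v|) < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


# Hypothetical placeholder values, NOT the scores from Table 4.
runs = [100, 200, 300, 400, 500]

print(min_max(runs))
print(z_score(runs))
print(decimal_scaling(runs))
```

Replacing runs with the values from Table 4 gives the three normalized series the question asks for.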
9.
Consider the dataset of donations made to a charity organization, Sevakarm, in May 2023. Each donation
has an ID and a donor. Each donor has a unique ID. A donor can donate in two ways: by giving money without
anything in return, or by buying a t-shirt. The payment can be made by credit card, PayPal, or cash
and can reach one of three statuses: completed, failed, or abandoned. Every donation has a value in
$.
Table 5: Donations
1. Generate a contingency table summarizing the variables Method and Status category.
2. Generate the following summary tables:
a. Grouping by type with a count of the number of observations and the sum of amount($) for
each row.
b. Grouping by date with a count of the number of observations and the mean amount($) for
each row.
3C.
Explain K-fold cross-validation. Mention any two real-time applications that use it to solve
Machine Learning problems.
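As a rough illustration of the splitting step in K-fold cross-validation, the sketch below partitions sample indices into k contiguous folds and uses each fold once as the test set. The function name k_fold_indices is hypothetical, and no shuffling is done. In practice the data is usually shuffled first, and a model is trained and scored on each split.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    # The first (n_samples % k) folds receive one extra sample each.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


for fold, (train, test) in enumerate(k_fold_indices(10, 3), start=1):
    print(f"Fold {fold}: train={train} test={test}")
```

Every sample appears in exactly one test fold, so averaging the k test scores uses all of the data for evaluation without testing on data the model was trained on.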
4A.
Given the following data:
(i) Apply the principles of Maximum Likelihood Estimation to predict the class label of the test tuple
(1.5, 2.5).
           μ₁      μ₂      σ₁     σ₂
Class A   -0.09    5.83   4.02   0.78
Class B   -2.78   -2.04   2.08   0.80
(ii) Compare and contrast Eager and Lazy learners with an example for each.
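One possible way to work part (i) is sketched below in Python. It makes several assumptions that the question does not state: each class is modelled by two independent Gaussian features, σ₁ and σ₂ are standard deviations (not variances), and the two classes have equal priors. The names gaussian_log_pdf, log_likelihood and predict are illustrative.

```python
import math

# Class parameters copied from the question; "std" assumes sigma is a standard deviation.
params = {
    "A": {"mean": (-0.09, 5.83), "std": (4.02, 0.78)},
    "B": {"mean": (-2.78, -2.04), "std": (2.08, 0.80)},
}


def gaussian_log_pdf(x, mean, std):
    """Log of the univariate normal density N(x; mean, std^2)."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def log_likelihood(point, mean, std):
    """Sum of per-feature log densities (assumes independent features)."""
    return sum(gaussian_log_pdf(x, m, s) for x, m, s in zip(point, mean, std))


def predict(point):
    """Return the class with the highest log-likelihood (equal priors assumed)."""
    scores = {label: log_likelihood(point, p["mean"], p["std"])
              for label, p in params.items()}
    return max(scores, key=scores.get), scores


label, scores = predict((1.5, 2.5))
print(label, scores)
```

Under these assumptions the test tuple is assigned to Class A. Reading σ₁ and σ₂ as variances changes the likelihood values but not the predicted class.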
4B.
Assume we have a data set D with only two classes, positive and negative. Compute the entropy values
for the following 3 different cases as given below:
(i) The data set D has 50% positive examples and 50% negative examples.
(ii) The data set D has 20% positive examples and 80% negative examples.
(iii) The data set D has 100% positive examples and 0 negative examples.
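The three cases can be verified with a small Python sketch of the two-class entropy formula, H = -p log2(p) - (1 - p) log2(1 - p), using the usual convention that 0 · log2(0) = 0. The function name binary_entropy is illustrative.

```python
import math


def binary_entropy(p_positive):
    """Entropy (in bits) of a two-class data set with the given positive fraction."""
    entropy = 0.0
    for p in (p_positive, 1.0 - p_positive):
        if p > 0:  # convention: 0 * log2(0) contributes nothing
            entropy -= p * math.log2(p)
    return entropy


for p in (0.5, 0.2, 1.0):
    print(f"{p:.0%} positive: entropy = {binary_entropy(p):.4f}")
```

The entropy is highest (1 bit) for the evenly split data set, about 0.72 bits for the 20/80 split, and 0 for the pure data set.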
4C.
List the assumptions and/or objectives of Linear Discriminant Analysis (LDA) for the given input data.
5A.
Compute:
(i) The Gini index for the entire data set in the table given below with respect to the two classes.
(ii) The Gini index for the portion of the data set with age at least 50.
(iii) Repeat the computations in (i) and (ii) using the entropy criterion.
Name     Age   Salary   Donor?
Nancy    21    37,000   N
Jim      27    41,000   N
Allen    43    61,000   N
Jane     38    55,000   N
Steve    41    36,000   Y
Peter    51    56,000   Y
Sayani   56    74,000   Y
Lata     53    25,000   Y
Mary     61    68,000   Y
Victor   51    60,000   Y
Dale     63    51,000   Y
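The computations in this question can be sketched in Python as follows. The records are the 11 rows of the table as transcribed here. The Gini index is taken as 1 minus the sum of squared class proportions and the entropy as the negative sum of p · log2(p) over the classes. The function names are illustrative.

```python
import math
from collections import Counter

# (name, age, salary, donor?) rows as transcribed in the table above.
records = [
    ("Nancy", 21, 37000, "N"), ("Jim", 27, 41000, "N"), ("Allen", 43, 61000, "N"),
    ("Jane", 38, 55000, "N"), ("Steve", 41, 36000, "Y"), ("Peter", 51, 56000, "Y"),
    ("Sayani", 56, 74000, "Y"), ("Lata", 53, 25000, "Y"), ("Mary", 61, 68000, "Y"),
    ("Victor", 51, 60000, "Y"), ("Dale", 63, 51000, "Y"),
]


def class_proportions(rows):
    """Fraction of rows in each Donor? class that is present."""
    counts = Counter(row[3] for row in rows)
    return [count / len(rows) for count in counts.values()]


def gini(rows):
    """Gini index: 1 minus the sum of squared class proportions."""
    return 1.0 - sum(p * p for p in class_proportions(rows))


def entropy(rows):
    """Entropy in bits over the classes present in the rows."""
    return sum(-p * math.log2(p) for p in class_proportions(rows))


older = [row for row in records if row[1] >= 50]  # age at least 50

print(f"Gini (all): {gini(records):.4f}, Gini (age >= 50): {gini(older):.4f}")
print(f"Entropy (all): {entropy(records):.4f}, Entropy (age >= 50): {entropy(older):.4f}")
```

With these rows, every person aged 50 or more is a donor, so both impurity measures are zero for that subset.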