Uploaded by Aditi Kumari

Answer key 2021 final paper

Set A
Name of the Course : B. A. (H) Economics
Semester : III
Name of the Paper : Data Analysis (SEC)
Unique Paper Code : 12273303
Duration: 3 hours
Maximum Marks: 65
Marking Scheme
1a Population of interest: the 4000 full-time students of the engineering college. 2 marks
Probability sampling methods: Simple Random Sampling, Systematic Random Sampling, Stratified Sampling, Cluster Sampling (Levine et al. (2017), Section 1.3) 4*2=8 marks
1b Both the RAND and RANDBETWEEN functions generate random numbers in Excel. RAND returns a random number between 0 and 1, whereas RANDBETWEEN returns a random integer between two specified numbers. For example, =RAND() will generate a number like 0.422245717, while =RANDBETWEEN(1,100) will generate a random integer between 1 and 100. Both functions return a new random number every time the worksheet is recalculated. 2 marks
Simple random sample of size 200 with replacement: =RANDBETWEEN(1,4000), entered in 200 cells 1 mark
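As a cross-check outside Excel, the same draw can be sketched in Python. This is an illustrative sketch only: the function name `draw_sample` and the seed are assumptions, and the paper itself expects the Excel function above.

```python
import random

POPULATION_SIZE = 4000  # full-time students, as stated in part 1a
SAMPLE_SIZE = 200       # sample size asked for in part 1b


def draw_sample(seed=None):
    """Draw student numbers with replacement (hypothetical helper)."""
    rng = random.Random(seed)
    # randint is inclusive at both ends, like RANDBETWEEN(1, 4000);
    # repeated draws may return the same student number more than once.
    return [rng.randint(1, POPULATION_SIZE) for _ in range(SAMPLE_SIZE)]


sample = draw_sample(seed=1)
print(len(sample), min(sample), max(sample))
```

Because each draw is independent, duplicates are possible, which is what "with replacement" means here.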
1c R commands: 1*3=3 marks
i Round off 22/7 to the nearest 3 digits after decimal: round(22/7, digits = 3)
ii Round off 18/7 to the greatest integer: ceiling(18/7)
iii Round off 17/5 to the least integer: floor(17/5)
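The values these three commands should return can be checked with a short Python sketch. The Python functions used here are stand-ins for R's round(), ceiling() and floor(); the variable names are illustrative.

```python
import math

rounded = round(22 / 7, 3)       # counterpart of round(22/7, digits = 3)
rounded_up = math.ceil(18 / 7)   # counterpart of ceiling(18/7)
rounded_down = math.floor(17 / 5)  # counterpart of floor(17/5)

print(rounded, rounded_up, rounded_down)  # 3.143 3 3
```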
2a
i Contingency table based on percentage of row total: 2 marks
          High    Low
Male      41.67   58.33
Female    50      50
Contingency table based on percentage of column total: 2 marks
          High    Low
Male      55.56   63.64
Female    44.44   36.36
Contingency table based on percentage of overall total: 2 marks
          High    Low
Male      25      35
Female    20      20
Males are at greater risk of high stress. 2 marks
ii Percentage of employees who are female and have a low stress level = 40/200 = 20% 2 marks
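The three percentage tables can be reproduced with a short Python sketch. The cell counts below are an assumption: they are recovered from the overall percentages and the total of 200 employees used in part (ii), not quoted from the question paper.

```python
# Assumed cell counts: overall percentages (25, 35, 20, 20) applied to 200 employees.
counts = {
    "Male": {"High": 50, "Low": 70},
    "Female": {"High": 40, "Low": 40},
}
levels = ["High", "Low"]

grand_total = sum(sum(row.values()) for row in counts.values())
row_totals = {g: sum(row.values()) for g, row in counts.items()}
col_totals = {lvl: sum(counts[g][lvl] for g in counts) for lvl in levels}

# Each cell as a percentage of its row total, column total, and the grand total.
row_pct = {g: {l: round(100 * counts[g][l] / row_totals[g], 2) for l in levels} for g in counts}
col_pct = {g: {l: round(100 * counts[g][l] / col_totals[l], 2) for l in levels} for g in counts}
overall_pct = {g: {l: round(100 * counts[g][l] / grand_total, 2) for l in levels} for g in counts}

for name, table in [("row", row_pct), ("column", col_pct), ("overall", overall_pct)]:
    print(name, table)
```

Row percentages condition on gender, column percentages condition on stress level, and overall percentages divide every cell by the same grand total.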
2b Constructing the frequency contingency table by gender and stress level using the COUNTIFS Excel function: 3 marks
=COUNTIFS(B2:B200, "Male", C2:C200, "High")
=COUNTIFS(B2:B200, "Female", C2:C200, "High")
=COUNTIFS(B2:B200, "Male", C2:C200, "Low")
=COUNTIFS(B2:B200, "Female", C2:C200, "Low")
3 marks for any 3 of the above formulas
2c Command to import the Excel data file (saved as CSV) into R: data = read.csv("filename.csv") 1 mark
Contingency table command: table(data$gender, data$stress) 2 marks
Marks should not be deducted if the student merely mentions table(gender, stress)
3a
i Mean and standard deviation of 3 types of balls: (1 mark for Mean + 2 marks for S.D.)*3 = 9 marks
                     Red     Blue    Green
Mean                 37.8    35.8    35.2
Standard Deviation   6.142   6.285   3.425
If the last value for red balls was 35 instead of 50, mean falls to 36.3 1 mark
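The revised mean can be verified without the raw data. The sketch below assumes 10 red-ball observations; that count is not stated in this key but is what the two quoted means imply.

```python
N_RED = 10        # assumed number of red-ball observations (implied, not stated)
old_mean = 37.8   # mean of the red balls from the table above
old_value = 50
new_value = 35

# Replacing one observation changes the total by (new_value - old_value).
new_total = old_mean * N_RED - old_value + new_value
new_mean = round(new_total / N_RED, 1)

print(new_mean)  # 36.3
```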
ii A column (bar) chart is the best way to represent the mean diameters of the three types of balls. Reason: scatter plots show the relationship between two variables, and line plots show a trend over time. The column chart is the only applicable option, since its three columns compare values across categories. 3 marks
3b R commands: (1.5*2)=3 marks
i bag = rep(c("Red","Green","Blue"), times = c(10,10,10))
ii urnsamples(bag, size = 5, replace = FALSE) OR urnsamples(bag, size = 5)
Either of the two commands is acceptable.
3 a(ii) In lieu of Q3, Part (a)(ii), only for Visually impaired students: Difference between discrete and continuous numerical variables: Levine et al. (2017), P-38 3 marks
4a Comparing data characteristics to theoretical properties: (Levine et al. (2017), P 228-229) (2*3)=6 marks
i The mean of 19.19 is greater than the median of 14.47. (In a normal distribution, the mean and median are equal.)
ii The interquartile range of 14.2 is approximately 1.18 standard deviations. (In a normal distribution, the interquartile range is approximately 1.33 standard deviations.)
iii The range of 42.97 is approximately 3.57 standard deviations. (In a normal distribution, the range is approximately 6 standard deviations.)
Hence, not normally distributed.
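The three comparisons can be reproduced numerically. In the sketch below the standard deviation of 12.04 is an assumption: it is not quoted in this key and is backed out from the stated ratios (14.2/1.18 and 42.97/3.57 both give roughly 12.04).

```python
mean, median = 19.19, 14.47
iqr, data_range = 14.2, 42.97
sd = 12.04  # assumed: inferred from the ratios quoted in the key

iqr_in_sd = round(iqr / sd, 2)           # a normal distribution gives about 1.33
range_in_sd = round(data_range / sd, 2)  # a normal distribution gives about 6
mean_exceeds_median = mean > median      # a normal distribution has mean == median

print(mean_exceeds_median, iqr_in_sd, range_in_sd)  # True 1.18 3.57
```

All three checks depart from the normal benchmarks, which is the basis for the conclusion above.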
4b Explanation of measure of skewness and kurtosis of a distribution. (Levine et al. (2017), P-132) 4 marks
Excel functions: SKEW, KURT (1*2)=2 marks
4c R commands:
i 4X4 matrix A using sequence of numbers from 1 to 16.
A = matrix(seq(1,16), nrow = 4, ncol = 4) 2.5 marks
ii Matrix B, which is transpose of matrix A.
B=t(A) 1 mark
iii Matrix C, which is obtained by multiplication of matrix A with B:
C = A %*% B 1 mark
Marks should be deducted if a student simply writes A*B
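The reason A*B loses marks is that it multiplies element by element rather than forming the matrix product. A minimal Python sketch of the difference, using plain nested lists and mimicking R's column-wise fill (the variable names are illustrative):

```python
N = 4

# R's matrix() fills column by column, so entry (i, j) is 1 + i + 4*j (0-based).
A = [[1 + i + N * j for j in range(N)] for i in range(N)]

# Transpose: rows of B are the columns of A, like t(A).
B = [list(col) for col in zip(*A)]

# Matrix product, the counterpart of A %*% B.
C = [[sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)] for i in range(N)]

# Element-wise product, the counterpart of A * B.
elementwise = [[A[i][j] * B[i][j] for j in range(N)] for i in range(N)]

print(C[0])            # first row of the matrix product
print(elementwise[0])  # first row of the element-wise product
```

The two results differ in almost every cell, so the operators are not interchangeable.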
5a 95 % CI for population mean= [0.905, 0.948] 4 marks
1 litre does not lie in the 95 % CI and hence the distributor has the right to complain. 2 marks
5b Sampling error explanation. (Levine et al. (2017), P-265) 2 marks
When n = 900, the sampling error is 0.02156.
When n = 1089, the sampling error becomes 0.0196. Hence, the sampling error falls as the sample size rises. 2 marks
Excel function to calculate sampling error= CONFIDENCE() 2 marks
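The two sampling errors follow from e = z·σ/√n. The sketch below assumes z = 1.96 (95 % confidence) and σ = 0.33; neither value is quoted in this key, but together they reproduce both stated results.

```python
import math

Z_95 = 1.96   # assumed critical value for 95 % confidence
SIGMA = 0.33  # assumed standard deviation, inferred from the key's results


def sampling_error(n, z=Z_95, sigma=SIGMA):
    """Half-width of the confidence interval for the mean: z * sigma / sqrt(n)."""
    return z * sigma / math.sqrt(n)


error_900 = sampling_error(900)
error_1089 = sampling_error(1089)

print(round(error_900, 5), round(error_1089, 4))  # 0.02156 0.0196
```

Since n appears under a square root in the denominator, a larger sample always gives a smaller sampling error for the same z and σ.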
5c R command(s) for constructing a neatly labelled and colourful histogram with unequal bins. 4.5 marks
hist(marks, breaks = c(0,33,50,60,75,100), col = "tomato", main = "Number of students scoring marks", xlab = "Marks", ylab = "Number of Students")
Key options that the answer must contain: hist(), breaks, main, xlab, ylab, col
5c For Visually Impaired students: Explanation of the use of the following R commands: getwd() and setwd().
(Gardener, P-35) 4.5 marks
6a Let x1 and x2 be the mean revenue earned from City A and City B, respectively. Then the hypotheses are: 2 marks
H0 : x1 − x2 ≤ 0
H1 : x1 − x2 > 0
6b The hypotheses to test for a difference in mean revenue are:
H0 : x1 − x2 = 0; H1 : x1 − x2 ≠ 0 2 marks
t-stat = 1.76; t-critical (at the 5 % level) = 2.05.
Since the t-stat is less than the t-critical value, we do not reject H0. Hence, there is no evidence of a difference in revenue earned in the two cities, and the firm is not justified in focusing on one city. 2 marks
6c The p-value for the two-tail test is 0.09.
0.01 < 0.09 < 0.10. Thus, do not reject H0 at the 0.01 level of significance, but reject H0 at the 0.10 level. 2 marks
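The p-value decision rule used in 6b and 6c (reject H0 when the p-value is below the significance level) can be written out as a short Python sketch. The dictionary name and the decision labels are illustrative.

```python
p_value = 0.09  # two-tail p-value quoted in 6c

decisions = {}
for alpha in (0.01, 0.05, 0.10):
    # Reject H0 only when the p-value is smaller than the significance level.
    decisions[alpha] = "reject H0" if p_value < alpha else "do not reject H0"

for alpha, decision in decisions.items():
    print(f"alpha = {alpha:.2f}: {decision}")
```

The 5 % result agrees with the critical-value comparison in 6b, where the t-stat of 1.76 falls short of 2.05.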
6d Suppose CityArevenue and CityBrevenue are the variable names.
t.test(CityArevenue, CityBrevenue, var.equal = TRUE, alternative = "greater") 2.5 marks
t.test(CityArevenue, CityBrevenue, var.equal = TRUE, alternative = "greater", conf.level = 0.99) 1 mark
t.test(CityArevenue, CityBrevenue, var.equal = TRUE, alternative = "greater", conf.level = 0.90) 1 mark
Key options to check: t.test(), var.equal, alternative, conf.level
6e Excel functions used for getting the Student’s-t distribution: T.DIST.2T/T.DIST/TDIST 2 marks
Excel functions used for getting the inverse of Student’s-t distribution: T.INV.2T/T.INV/TINV 2 marks
(Levine et al. (2017), P -329).
Full credit to be given for explanation of any one of the above mentioned functions for student’s t-distribution,
and any one for inverse of t-distribution.