Set A
Name of the Course : B. A. (H) Economics
Semester : III
Name of the Paper : Data Analysis (SEC)
Unique Paper Code : 12273303
Duration : 3 hours
Maximum Marks : 65

Marking Scheme

1a Population of interest: the 4000 full-time students of the engineering college. 2 marks
   Probability sampling methods: Simple Random Sampling, Systematic Random Sampling, Stratified Sampling, Cluster Sampling (Levine et al. (2017), Section 1.3). 4*2=8 marks

1b Both the RAND and RANDBETWEEN functions are used to create random numbers in Excel. RAND returns a random number between 0 and 1, whereas RANDBETWEEN returns a random integer between any two specified numbers. For example, =RAND() will generate a number like 0.422245717, while =RANDBETWEEN(1,100) will generate a random integer between 1 and 100. Both functions return a new random number every time the worksheet is calculated. 2 marks
   Simple random sample of size 200 with replacement: =RANDBETWEEN(1,4000) 1 mark

1c R commands: 1*3=3 marks
   i   Round off 22/7 to the nearest 3 digits after decimal: round(22/7, digits = 3)
   ii  Round off 18/7 to the greatest integer: ceiling(18/7)
   iii Round off 17/5 to the least integer: floor(17/5)

2a i Contingency table based on percentage of row total: 2 marks
                High     Low
       Male     41.67    58.33
       Female   50.00    50.00

     Contingency table based on percentage of column total: 2 marks
                High     Low
       Male     55.56    63.64
       Female   44.44    36.36

     Contingency table based on percentage of overall total: 2 marks
                High     Low
       Male     25       35
       Female   20       20

     Males are at greater risk of high stress.
2 marks
   ii Percentage of employees who are female and have a low stress level = 40/200 = 20%. 2 marks

2b Constructing the frequency contingency table by gender and stress level using the Excel COUNTIFS function: 3 marks
   =COUNTIFS(B2:B200, "Male", C2:C200, "High")
   =COUNTIFS(B2:B200, "Female", C2:C200, "High")
   =COUNTIFS(B2:B200, "Male", C2:C200, "Low")
   =COUNTIFS(B2:B200, "Female", C2:C200, "Low")
   3 marks for any 3 of the above formulas.

2c Command to import the Excel data file into R: data = read.csv("filename.csv") 1 mark
   Contingency table command: table(data$gender, data$stress) 2 marks
   Marks should not be deducted if the student merely writes table(gender, stress).

3a i Mean and standard deviation of the 3 types of balls: (1 mark for mean + 2 marks for S.D.)*3 = 9 marks
                            Red      Blue     Green
       Mean                 37.8     35.8     35.2
       Standard Deviation   6.142    6.285    3.425
     If the last value for red balls were 35 instead of 50, the mean would fall to 36.3. 1 mark
   ii A column (bar) plot is the best choice for representing the mean diameters of the three types of balls. Reason: scatter plots show the relationship between two variables, and line plots show a trend over time. The column plot is the only applicable option, since its three columns allow the values to be compared across categories. 3 marks

3b R commands: (1.5*2)=3 marks
   i  bag = rep(c("Red","Green","Blue"), times=c(10,10,10))
   ii urnsamples(bag, size = 5, replace = FALSE) OR urnsamples(bag, size = 5)
      Either of the two commands is acceptable.

3a(ii) In lieu of Q3, Part (a)(ii), only for visually impaired students: Difference between discrete and continuous numerical variables (Levine et al. (2017), P-38). 3 marks

4a Comparing data characteristics to theoretical properties (Levine et al. (2017), P 228-229): (2*3)=6 marks
   i   The mean of 19.19 is greater than the median of 14.47. (In a normal distribution, the mean and median are equal.)
   ii  The interquartile range of 14.2 is approximately 1.18 standard deviations. (In a normal distribution, the interquartile range is 1.33 standard deviations.)
   iii The range of 42.97 is equal to 3.57 standard deviations. (In a normal distribution, the range is approximately 6 standard deviations.)
   Hence, the data are not normally distributed.

4b Explanation of the measures of skewness and kurtosis of a distribution (Levine et al. (2017), P-132). 4 marks
   Excel functions: SKEW, KURT (1*2)=2 marks

4c R commands:
   i   4X4 matrix A using the sequence of numbers from 1 to 16: A = matrix(seq(1,16), nrow = 4, ncol = 4) 2.5 marks
   ii  Matrix B, which is the transpose of matrix A: B = t(A) 1 mark
   iii Matrix C, which is obtained by multiplying matrix A by B: C = A %*% B 1 mark
       Marks should be deducted if a student simply writes A*B.

5a 95% CI for the population mean = [0.905, 0.948] 4 marks
   1 litre does not lie in the 95% CI and hence the distributor has the right to complain. 2 marks

5b Sampling error explanation (Levine et al. (2017), P-265). 2 marks
   When N=900, the sampling error is 0.02156. When N=1089, the sampling error becomes 0.0196. Hence, the sampling error falls. 2 marks
   Excel function to calculate the sampling error: CONFIDENCE() 2 marks

5c R command(s) for constructing a neatly labelled and colourful histogram with unequal bins: 4.5 marks
   hist(marks, breaks = c(0,33,50,60,75,100), col = "tomato", main = "Number of students scoring marks", xlab = "Marks", ylab = "Number of Students")
   Key options that the answer must contain: hist(), breaks, main, xlab, ylab, col

5c For visually impaired students: Explanation of the use of the following R commands: getwd() and setwd() (Gardener, P-35). 4.5 marks

6a Let x1 and x2 be the mean revenue earned from City A and City B, respectively. Then the hypotheses are: 2 marks
   H0 : x1 − x2 ≤ 0
   H1 : x1 − x2 > 0

6b The hypotheses to test the difference in mean revenue are: 2 marks
   H0 : x1 − x2 = 0
   H1 : x1 − x2 ≠ 0
   t-stat = 1.76; t-critical (at the 5% level) = 2.05. Since the t-stat is less than t-critical, we do not reject H0. Hence, there is no evidence of a difference in the revenue earned in the two cities, and it is not justified for the firm to focus on one city. 2 marks

6c The p-value for the two-tail test is 0.09. Since 0.01 < 0.09 < 0.10, do not reject H0 at the 0.01 level of significance, but reject H0 at the 0.10 level. 2 marks

6d Suppose CityArevenue and CityBrevenue are the variable names.
   t.test(CityArevenue, CityBrevenue, var.equal = TRUE, alternative = "greater")
   t.test(CityArevenue, CityBrevenue, var.equal = TRUE, alternative = "greater", conf.level = .99)
   t.test(CityArevenue, CityBrevenue, var.equal = TRUE, alternative = "greater", conf.level = .90)
   Key options to check: t.test(), var.equal, alternative, conf.level 2.5 marks, 1 mark, 1 mark

6e Excel functions used for getting the Student's t distribution: T.DIST.2T/T.DIST/TDIST 2 marks
   Excel functions used for getting the inverse of the Student's t distribution: T.INV.2T/T.INV/TINV 2 marks
   (Levine et al. (2017), P-329). Full credit is to be given for an explanation of any one of the above functions for the Student's t distribution, and any one for the inverse of the t distribution.
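
Illustrative check for 6b, 6c and 6e (a minimal R sketch, not part of the marking scheme): the decisions in 6b and 6c can be reproduced from the reported t-stat of 1.76 using qt() and pt(), which play the same role as Excel's T.INV.2T and T.DIST.2T. The degrees of freedom are not stated in the scheme; df = 28 is an assumption, chosen only because it gives a two-tailed 5% critical value close to the reported 2.05.

```r
# Assumption: 28 degrees of freedom (hypothetical; the scheme does not state df).
t_stat <- 1.76   # t statistic reported in 6b
df     <- 28     # assumed value, e.g. two samples of 15 with pooled variance

# Two-tailed critical value at the 5% level (Excel analogue: T.INV.2T(0.05, df))
t_crit <- qt(1 - 0.05 / 2, df)

# Two-tailed p-value (Excel analogue: T.DIST.2T(1.76, df))
p_two <- 2 * pt(-abs(t_stat), df)

# Decisions at the significance levels discussed in 6b and 6c
reject_05 <- abs(t_stat) > t_crit   # 5% level, critical-value approach
reject_01 <- p_two < 0.01           # 1% level, p-value approach
reject_10 <- p_two < 0.10           # 10% level, p-value approach

print(round(c(t_crit = t_crit, p_two = p_two), 3))
print(c(reject_05 = reject_05, reject_01 = reject_01, reject_10 = reject_10))
```

Under this assumed df, the critical value is close to 2.05 and the p-value is close to 0.09, so H0 is not rejected at the 5% or 1% level but is rejected at the 10% level, in line with 6b and 6c.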