Note: This is a sample exam only. Do not treat it as an exact indication of any future exam.

DEAKIN UNIVERSITY
Faculty of Business and Law
School of Information Systems

MSQ791 – DATA ANALYSIS FOR MANAGERS
SAMPLE EXAMINATION PAPER

Question   1.   2.   3.   4.   5.   6.   7.   8.   9.   10.   Total
Mark

Instructions for candidates:

1. Answer questions in the spaces provided on the question sheets. If you need more space use the back of the page opposite the question, but only if necessary, for example, for extra calculations. Do not use a script book.
2. Dot/bullet point answers for written questions are acceptable.
3. Answer ALL ten (10) questions on this Sample Exam. Show sufficient working for each.
4. This exam totals 100 marks and is worth 60% of the final assessment for this unit.

MSQ791 Data Analysis For Managers – Sample Exam

QUESTION 1 (1+1+1+1+1+1+1+1+1+1 = 10 marks)

Conrobar Pty Ltd has requested you to investigate any difference in years of education of male management staff and female management staff. The first table below gives the actual years of education for the 14 male managers, and for the 10 female managers, in a random sample of employees. The second table gives relevant summary measures for the two groups.

Sample data:

Males   Females
12      14
12      15
11      16
10      13
19      13
12      14
11      14
16      14
10      15
17      14
10
12
15
12

PaceXL Ungrouped Data: Years of education

Sample Data:                             Males     Females
Number of Data Points                    14        10
Minimum                                  10
Maximum                                  19
Total                                    179       142
Arithmetic mean                          12.786
Median                                   12.0
Mode                                     12.0
First Quartile                           11.0
Third Quartile                           15.0      15.0
Range                                    9         3
Inter Quartile Range                     4.0
Variance (Sample)                        8.027     0.844
Standard Deviation (Sample)              2.833
Coefficient of Variation (Sample)        0.222
Skewness Coeff (Pearson's, Sample)       0.832

Find the following missing summary measures for years of education for Females.

(a) Minimum
(b) Maximum
(c) Arithmetic mean
(d) Median
(e) Mode
(f) First quartile
(g) Interquartile range
(h) Standard deviation
(i) Coefficient of variation
(j) Skewness coefficient

QUESTION 2 (1+1+1+1+1+1+1+1+1+1 = 10 marks)

Following are graphs marked A, B, C, D, E and F of six different scatterplots depicting relationship situations between the two variables Y and X.

For the ten questions that follow, circle just one (1) of the letters given. Choose the scatterplot which is the best example of its type or is the most appropriate for what is required. If none is appropriate, circle the letter Z.

(a) Perfect positive linear relationship                        A B C D E F Z
(b) No relationship with r = 0                                  A B C D E F Z
(c) r = -0.79                                                   A B C D E F Z
(d) Curvilinear positive relationship                           A B C D E F Z
(e) Wide positive scatter                                       A B C D E F Z
(f) As X increases Y tends to neither increase nor decrease     A B C D E F Z
(g) Quadratic (parabolic) model appropriate                     A B C D E F Z
(h) R² close to 100%                                            A B C D E F Z
(i) Inverse positive relationship                               A B C D E F Z
(j) Zero linear correlation with strong relationship            A B C D E F Z

QUESTION 3 (3+4+3 = 10 marks)

Faster Parcels, a courier service company, classifies customers by "method of payment" (for Credit and Non-Credit customers) and by "regularity of custom" (for Regular and Non-Regular customers). We know the following:

  70% of customers are Credit customers
  60% of customers are Regular customers
  40% of customers are both Regular and Credit customers

(a) Set up a contingency table for these data by completing the following table as a "percent of total table":

              Regular    Non-Regular    Row Total
Credit
Non-Credit
Col. Total                              100%

(b) If we were to choose one customer at random, find the probabilities of each of the following events occurring.
    (i) The customer is a Credit customer.
        i.e. P(Credit) =

    (ii) A Regular but Non-Credit customer
        i.e. P(Regular and Non-Credit) =

    (iii) A Non-Credit and a Non-Regular customer
        i.e. P(Non-Credit and Non-Regular) =

    (iv) A Credit customer given that they are a Regular customer.
        i.e. P(Credit | Regular) =

(c) Does there appear to be any relationship between "method of payment" and "regularity of custom"? Explain your answer by using just two (2) of the results in (b) above.

QUESTION 4 (5+5 = 10 marks)

For Faster Parcels (the courier company in Question 3), it is assumed that the delivery time for overseas deliveries will follow a normal distribution, with a mean of 10 days and a standard deviation of 2.5 days. Under these assumptions:

(a) What is the probability that a parcel will be delivered within 7 to 14 days?

(b) The company intends to offer a double refund of the freight charge for overseas deliveries, if the delivery time is excessive. It only wants to commit itself to doing this for 1% of deliveries. What is the number of days (calculated to two decimal places) beyond which it would have to offer this refund?

QUESTION 5 (4+2+2+2 = 10 marks)

The Council for White Waters, a (fictitious) city in greater Brisbane with approximately 22,000 households, intends to conduct a sample survey of households to gather relevant data from residents in regard to a range of issues pertaining to waste management planning, including rubbish collection, non-burnable waste and recycling. The survey has great importance because it will determine household waste management within the city for many years to come. Data will be collected by asking households to complete an extensive questionnaire.

You are in charge of the Council's research staff. Your research team has identified three key stages to the decision making process.

I.   Data collection (including developing the questionnaire, distribution and collection).
II.  Data processing (including coding and entering questionnaire results for computer analysis, computer generation of relevant tables, charts, calculations, etc.).
III. Analysis (including interpretation of the results and presentation of findings).

A precise budget for the cost of the survey has been proposed by your team, but the Finance Manager for the Council has insisted that there are insufficient funds available. There is no alternative: one of these stages must be scaled back significantly.

(a) Which of the three stages given above is the least important, and therefore the one that you think is the most amenable to funding cuts? Provide a full explanation.

(b) You have arranged to meet with the Finance Manager again, who puts the following three suggestions to you. In each case, give your response and explanation for that response.

    (i) Abandon the pilot study for the survey.

    (ii) Replace the random sample survey of households with a sample survey of shoppers, visitors to the local library, and visitors to the Council chambers.

    (iii) Put the survey out to tender to consultants and accept the lowest bid.

QUESTION 6 (3+5 = 8 marks)

Following is a table containing key selected PaceXL summary measures for the annual rates ($) collected in 2005 from a random sample of households broken into low density housing (that is, typical houses on standard sized blocks), medium density housing (that is, townhouses and units on small blocks) and high density housing (that is, flats and apartments in multi-storey blocks) in the City of White Waters.
PaceXL Ungrouped Data: Annual rates ($)

SAMPLE DATA           Low Density    Medium Density    High Density
Minimum               181.1          174.2             89.5
Maximum               485.5          399.9             415.0
Arithmetic mean       324.5923       263.875           235.8533
Median                283.0          241.85            219.3
Mode                  Nil            206.4             Nil
First Quartile        232.2          206.4             159.1
Third Quartile        430.2          309.85            328.9
Standard Deviation    104.3829       74.2333           102.7881

(a) The following diagram contains boxplots for annual rates for Low and High Density Housing. Complete the diagram by drawing in the boxplot for the rates for Medium Density Housing.

[Diagram: boxplots of annual rates, one row each for LOW, MEDIUM and HIGH density housing; horizontal axis RATES ($), marked 0, 100, 200, 300, 400, 500.]

(b) Using the graph above, and the results from the table of summary measures, write a brief report in plain language comparing and contrasting the rates received from Low, Medium and High density housing in 2005. (Dot point answers are acceptable.)

QUESTION 7 (3+3+2+2 = 10 marks)

Council for White Waters eventually took a random sample of 400 households. One of the aims of the sampling exercise is to investigate whether or not there is any difference in the amount of household waste produced by low density housing and medium density housing. Of those households sampled, 125 were low density housing, while 160 were medium density.

(a) As indicated, of those sampled, 125 were low density housing. The mean amount of household waste per week was 13.3 kilograms, with a standard deviation of 2.5 kilograms. Calculate the 99% confidence interval of the mean value of weekly household waste per low density household.

(b) As indicated, of those sampled, 160 were medium density housing. The mean amount of household waste per week was 12.3 kilograms, with a standard deviation of 2.2 kilograms. Calculate the 99% confidence interval of the mean value of weekly household waste per medium density household.
(c) The CEO of White Waters has insisted on maximum confidence and minimal risk, hence the use of 99% confidence. On the basis of these two sample results, and your subsequent confidence intervals, does there appear to be any difference in the mean amount of weekly household waste produced by low density housing compared to medium density housing? Explain your conclusion, and provide a recommendation you would make to management.

(d) Alternatively, say you wished to perform a hypothesis test as to whether or not there is any difference in the mean amount of waste for low density housing compared to medium density housing. Without performing any calculations, or completing the test, what would be the null and alternative hypotheses? (Here use Population 1 as Low Density housing, and Population 2 as Medium Density housing.) Give the hypotheses in words and in symbol form.

QUESTION 8 (2+2+2+2+2 = 10 marks)

The White Waters Council currently receives on-going complaints about the inadequacy of the garbage collection system. You have been asked by the Council to assess the extent of the problem and have included an appropriate question in the survey. The chief engineer has indicated to you that he believes that no more than 30% of households "experience any problems". You have decided to test the validity of this figure.

Of the 400 households sampled, 145 owners reported that they had "experienced problems" with the garbage collection system. You decided to conduct an appropriate hypothesis test. Following is the output you have generated from a statistics package. Complete the questions that follow.
PaceXL Intervals and Tests

Data Input
  Significance Level (alpha) %    1.0%
  Ho : Proportion, p%             30.0
  Sample Size, n                  400
  Sample pBar, %                  36.25
  Sample Count                    145

Test and Results
  Ho : p <= 30.0
  Ha : p > 30.0
  Test Type : z : Upper-tail
  alpha, a : 0.01
  CV of z : 2.326
  Statistics:
    - pBar%              36.25
    - Standard Error%    2.291
    - z-Statistic        2.728
    - CV pBar%           35.33
    - p-Value            0.0032
  Conclusion: p-Value v alpha:
    - Reject Ho

(a) Is this problem about a categorical variable or a numerical variable? Briefly explain your answer. Include an explanation as to why the two types of variables must be treated differently.

(b) Under your test, you have set the null and alternative hypotheses as follows:

        H0: p ≤ 30%
        Ha: p > 30%

    and therefore used an "upper tail" test. Explain why you have set up the two hypotheses in this way, and why it is referred to as an "upper tail" test.

(c) With a hypothesis test it is usual to specify a level of significance. What level of significance have you used in this test? On what basis would you have chosen this value, and could other levels have been appropriate?

(d) Explain the following two concepts and include calculations to show their derivations:

    (i) p (sample) % = 36.25
        Calculation:
        Interpretation:

    (ii) z statistic = 2.7277
        Calculation:
        Interpretation:

(e) What conclusion have you come to from this test? Please explain in plain language, and include an explanation of the reasons why you came to this conclusion, based on the output from your statistical test.

QUESTION 9 (3+3 = 6 marks)

The following details are required for Q. 9 and Q. 10. Use a 5% level of significance for all tests.

Your firm manufactures microprocessors and, as production manager, you are concerned about the proportion of defective microprocessors being produced on the production line.
Each item is tested the moment it comes off the production line, but any defectives must be thrown away. Because of the nature of the product, there is a large defective rate. A level of up to 10% defectives is considered to be acceptable for this product.

Demand for your firm's product is running very high, yet the factory is not large. As a result, for the last 12 months, the company has been operating on the basis of three shifts a day, for 5 days a week. The three shifts are the Day shift (8.00 a.m. - 4.00 p.m.), the Evening shift (4.00 p.m. - 12.00 midnight) and the Morning shift (12.00 a.m. - 8.00 a.m.). The Day shift is the largest, in terms of staff on the production line and total output. The Morning shift is the smallest, and is run with close to the minimum number of staff to keep the production line going; the absolute minimum is 10. While every attempt is made to keep the staff number at 15 on the Morning shift, absenteeism is a problem on that shift.

A major concern is the proportion of defectives being produced on the Morning shift. Using data over the past four months (equal to 90 days), you have derived the following table:

Observed Frequencies/Count
                                           Shift
Level of Defectives               Day    Evening    Morning
Acceptable (10% or less)          76     72         62
Unacceptable (More than 10%)      14     18         28
Total days                        90     90         90

(a) From a descriptive statistics point of view, would you say that there is any reason to believe there is any difference in the defective rate between shifts? Explain your answer.

(b) From an inferential statistics point of view, would you say that there is any reason to believe that there is any difference in the defective rate between shifts? To assist your answer, the sample Chi-square statistic is 6.6857.

QUESTION 10 (2+2+2+2+2+2+2+2 = 16 marks)

Please read this question in conjunction with the introduction from Question 9.
From a period of months, you have randomly selected 12 Morning shifts and collected the following data, with a view to detecting any underlying factors behind the defective rate for the Morning shift. In this problem, the dependent variable is the defective rate, measured as the percentage that defective items are of total production.

Defective Rate          Total         Number of    Production          Minimum Outside
(Per cent of Total      Production    Staff on     Supervisor          Temperature
Pd'n) %                 (Units)       Duty         that Shift          (°C)
11%                     380           15           Smith, 0            5
13%                     394           14           Smith, 0            10
8%                      280           15           Jones, 1            7
10%                     352           13           Jones, 1            6
18%                     375           10           Smith, 0            12
7%                      369           12           Jones, 1            8
10%                     410           15           Smith, 0            11
15%                     395           11           Smith, 0            7
11%                     406           14           Jones, 1            6
8%                      295           15           Jones, 1            8
11%                     367           12           Jones, 1            15
13%                     383           12           Smith, 0            9

(Note: Smith and Jones are the two production supervisors in charge of the Morning shift and alternate on a weekly basis. In the regression model, 0 is used for Smith and 1 for Jones.)

The data above were analysed using the Regression and Correlation routine of PaceXL, and relevant output is reproduced below.

PaceXL: Correlation

Correlation Coefficient, r
              DefRate%    TotalPd'n    NumStaff    Supervisor    MinTemp
DefRate%       1.000       0.493       -0.654      -0.694         0.300
TotalPd'n      0.493       1.000       -0.303      -0.571         0.155
NumStaff      -0.654      -0.303        1.000       0.199        -0.382
Supervisor    -0.694      -0.571        0.199       1.000        -0.120
MinTemp        0.300       0.155       -0.382      -0.120         1.000

Analysis of Variance, ANOVA
              Degrees        Sum of         Mean
              Freedom, df    Squares, SS    Square, MS    F-Ratio    p-Value
Regression    4               82.203        20.551        5.523      0.02500
Error         7               26.047         3.721
Total         11             108.250

Regression Equation Results
Dependent Variable, Y: DefRate%

Indep. X                      Standard                              95% Conf.    95% Conf.
Variables     Coefficient     Error       t-Statistic    p-Value    Lower        Upper        VIF
Intercept     25.498          9.798        2.6024        0.03530     2.329       48.667
TotalPd'n     -0.001          0.018       -0.0387        0.97021    -0.043        0.042       1.572
NumStaff      -0.948          0.373       -2.5396        0.03868    -1.831       -0.065       1.261
Supervisor    -3.545          1.358       -2.6105        0.03489    -6.756       -0.334       1.487
MinTemp        0.03           0.217        0.1401        0.89252    -0.483        0.544       1.174

R-Squared                     75.94%
Multiple R                    0.8714
Adj. R-Squared                62.19%
Standard Error of Estimate    1.929
Durbin-Watson                 1.998
Number of Observations        12

(a) What does the F-ratio, and associated results, tell us?

(b) For the regression printout shown, all four independent variables have been included in the regression model. Write down the multiple regression equation:

(c) Is there any evidence of multi-collinearity in this model? Explain fully.

(d) Interpret each of the coefficients in this equation, from a practical point of view:

(e) Interpret R² for this regression equation.

(f) Which of the four independent variables are significant in this relationship? Justify briefly.

(g) Under the best conditions, namely with 15 staff, total output for the Morning shift should be 380 microprocessors. Assuming an outside temperature of 8°C, what is the expected defective rate:

    (i) If Smith is the supervisor?

    (ii) If Jones is the supervisor?

(h) What conclusions would you draw about the underlying factors behind the defective rate in the Morning shift? Write a brief report for management.

END OF SAMPLE EXAMINATION PAPER
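
Note: The PaceXL figures printed in Questions 8 and 10 are internally consistent, and this can be checked by hand. The following is a minimal Python sketch of such a check. It assumes the standard textbook formulas: a standard error for a proportion computed under Ho, and the usual ANOVA identities. Whether PaceXL computes its output in exactly this way is an assumption, and all variable names are illustrative only.

```python
import math

# Question 8: one-sample z test for a proportion.
# Assumption: the standard error is computed under Ho as sqrt(p0 * (1 - p0) / n).
p0 = 0.30      # hypothesised proportion from Ho
n = 400        # sample size
count = 145    # households reporting problems

p_bar = count / n
std_err = math.sqrt(p0 * (1 - p0) / n)
z_stat = (p_bar - p0) / std_err

# Question 10: ANOVA identities, using the sums of squares from the printout.
ss_reg, ss_err, ss_total = 82.203, 26.047, 108.250
df_reg, df_err = 4, 7

ms_reg = ss_reg / df_reg
ms_err = ss_err / df_err
f_ratio = ms_reg / ms_err
r_squared = ss_reg / ss_total
# Adjusted R-squared rescales the unexplained share by (n - 1) / (n - k - 1).
adj_r_squared = 1 - (1 - r_squared) * (df_reg + df_err) / df_err
std_err_estimate = math.sqrt(ms_err)

print(f"pBar%            = {100 * p_bar:.2f}")
print(f"Standard Error%  = {100 * std_err:.3f}")
print(f"z-Statistic      = {z_stat:.4f}")
print(f"F-Ratio          = {f_ratio:.3f}")
print(f"R-Squared        = {100 * r_squared:.2f}%")
print(f"Adj. R-Squared   = {100 * adj_r_squared:.2f}%")
print(f"Std Error of Est = {std_err_estimate:.3f}")
```

Each printed value can be compared with the corresponding entry in the PaceXL output. Small differences in the last decimal place would reflect rounding in the printed sums of squares.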