STAT20028 Assignment 2, Term 2 2008

Question 1 (5 marks)

(a) The histogram below shows a sample of annual percentage returns on investment portfolios chosen by 50 investment managers from Queensland. Note that the class interval 0 to less than 6 is represented by the class mark 3, and so on.

[Figure: Histogram of class mark of annual return. Horizontal axis: class mark of annual return (3, 9, 15, 21, 27); vertical axis: frequency (0 to 25).]

(i) For this sample, what is the mean annual return? (1 mark)

Answer: Mean $= \sum X / n = \sum C f / n = 750/50 = 15$

(ii) What is the standard deviation for this sample? (1 mark)

Answer: Standard deviation, $S = \sqrt{\sum (X - \bar{X})^2 / (n - 1)} = \sqrt{2160/49} = 6.6394$

Class mark C   Frequency f   C x f   (C - Xbar)^2 x f
     3              5          15         720
     9             10          90         360
    15             20         300           0
    21             10         210         360
    27              5         135         720
Total:             50         750        2160

Mean: 15
Standard deviation: 6.6394

(b) Assume that David has contracted to use a computer dating service. He knows that his personality, background, interests, and so on should make him initially compatible with 30% of the general population. If the computer dating service is doing nothing more than making random assignments and its clientele is representative of the general population, what is the probability that, out of 10 independent uses of the dating service, David would be matched with at least two persons with whom he would be initially compatible? (1.5 marks)

Answer: p = 0.30, n = 10

$$P(\text{at least } 2) = 1 - P(0) - P(1)$$
$$= 1 - {}^{10}C_{0}\,(0.3)^{0}(0.7)^{10} - {}^{10}C_{1}\,(0.3)^{1}(0.7)^{9}$$
$$= 1 - 0.7^{10} - 10(0.3)(0.7)^{9}$$
$$= 1 - 0.0282475 - 0.1210608 = 0.8507$$

(c) The defects in the rubber covering on a particular type of telephone cable follow a Poisson distribution with a mean rate of 0.00065 per lineal metre. What is the probability of fewer than 3 defects in a spool of 10,000 lineal metres? (1.5 marks)

Answer: Mean, $\lambda = 0.00065 \times 10{,}000 = 6.5$

$$P(\text{fewer than } 3) = P(0) + P(1) + P(2)$$
$$= \frac{e^{-6.5}(6.5)^{0}}{0!} + \frac{e^{-6.5}(6.5)^{1}}{1!} + \frac{e^{-6.5}(6.5)^{2}}{2!}$$
$$= 0.001503 + 0.009772 + 0.031760 = 0.043$$

Question 2 (5 marks)

(a) A study shows that employees who begin their work day at 8:00am vary their times of arrival uniformly over the range 7:30am to 8:15am.
What is the probability that a random employee arrives between 8:00am and 8:10am? (1 mark)

Answer: Probability = (8:10 - 8:00)/(8:15 - 7:30) = 10/45 = 0.2222

(b) The errors made in filling a bag of rice are assumed to be normally (Gaussian) distributed with a mean of zero and a standard deviation of 20 grams. What proportion of bags will have errors within 30 grams of the desired weight? (1 mark)

Answer: 86.64% of bags will have errors within ±30 grams.

$Z_1 = -30/20 = -1.5$, $Z_2 = 30/20 = 1.5$

From the standard normal table, $P(Z_1 < Z < Z_2) = 0.9332 - 0.0668 = 0.8664$

(c) An automatic soft drink dispensing machine is supposed to fill 350 ml cups with 325 ml of soft drink. Due to machine tolerance, the amount actually filled is normally distributed with a variance of 400 ml². What proportion of cups will overflow? (1 mark)

Answer: 10.56% of cups will overflow.

$$Z_1 = \frac{350 - 325}{\sqrt{400}} = 1.25$$
$$P(Z > 1.25) = 1 - P(Z < 1.25) = 1 - 0.8944 = 0.1056$$

(d) The CPI for major Australian cities for the March quarter 2008 is given below. Can the data be considered to be normally distributed? (Source: ABS) (1 mark)

CPI, index number for major Australian cities, Mar Qtr 2008

City        Index
Sydney      161.7
Melbourne   160.6
Brisbane    165.6
Adelaide    165.5
Perth       162.5
Hobart      161.3
Darwin      158.5
Canberra    163.0

Answer: The normal probability plot given below, obtained with PHStat, is not really a straight line. Therefore, the data are not normally distributed in a strict sense, but they can be considered roughly normal. However, more observations are needed for a conclusive decision.

(e) Given an arrival process follows an exponential distribution that has a mean of 20, what is the probability that the arrival time will be less than 2? (1 mark)

Answer: Taking the stated mean as the rate parameter, $\lambda = 20$ arrivals per unit of time,

$$P(\text{arrival time} < 2) = 1 - e^{-\lambda(2)} = 1 - e^{-40} = 1 - 4.248 \times 10^{-18} \approx 1.00$$

Question 3 (5 marks)

(a) The population of a variable is known to have a Gamma distribution. We wish to make some inferences about the variable using the theory of the normal distribution. How can we achieve that through sampling?
(1 mark)

Answer: The Gamma distribution is a skewed distribution. For normal distribution theory to apply to the sample mean, the Central Limit Theorem must take effect, and for that purpose the sample size must be sufficiently large. Specifically, the sample size should be greater than 30.

(b) The price-earnings ratios of a very large (essentially infinite) set of common stocks are approximately normally distributed with a mean of 15 and a standard deviation of 5. If a large number of samples, each of size 25, are selected from the population of stocks and a sample mean is computed for each sample, what will be the expected value and the variance of the large number of sample means for the price-earnings ratios? (2 marks)

Answer:
The expected value of the sample means = 15
The variance of the sample means $= (5/\sqrt{25})^2 = 1$

(c) Fifty-five percent of Rocky Fitness Centre customers are women. Many samples with n = 25 were taken from this Centre's customer list. What is the standard deviation of the proportion of women customers found in these samples? (2 marks)

Answer: Standard deviation $= \sqrt{\dfrac{P(1 - P)}{n}} = \sqrt{\dfrac{0.55(1 - 0.55)}{25}} = 0.099$

Question 4 (5 marks)

(a) A health service provider has been reimbursed by Medibank Private for a large number of services. It appears that overpayments may have occurred on many of the reimbursements due to errors made during the billing process or owing to upcoding (billing for a more complicated service than was actually provided). Find the 95% confidence interval estimate of the mean overpayment per service given the following random sample of overpayments. (2 marks)

$110, $130, $90, $100, $120, $105, $95, $115, $125

Answer: The sample size is small and the population standard deviation is not known; therefore, the t-distribution should be used.

Sample mean = 110, sample standard deviation = 13.693

$$\text{Confidence interval} = \bar{X} \pm t_{n-1,\,0.025}\,\frac{S}{\sqrt{n}} = 110 \pm 2.306 \times \frac{13.693}{\sqrt{9}} = (99.47,\ 120.53)$$

(b) A local government agency has just purchased a new computer software product.
The agency chief is interested in determining how many of the local government's 3,900 employees have previous experience with a similar software product. She takes a sample of 50 people and finds 6 who have had this experience. Construct a 95% confidence interval for the proportion of people who have had this experience. (2 marks)

Answer: p = 6/50 = 0.12

$$\text{Confidence interval} = p \pm Z\sqrt{\frac{p(1 - p)}{n}} = 0.12 \pm 1.96\sqrt{\frac{0.12(1 - 0.12)}{50}} = (0.03,\ 0.21)$$

(c) BP wishes to estimate the mean amount of water that has seeped into the fuel storage tanks at its refineries in Brisbane. A preliminary sample of n = 21 tanks showed a standard deviation of s = 45 litres. How much larger should the sample be in order to estimate the mean water content of the tanks to within ±10 litres with 95% confidence? (1 mark)

Answer: Sampling error e = 10 and s = 45. The t-value depends on the sample size, which is not yet known, so start with the Z-value, which is 1.96. Therefore,

$$n = \frac{Z^2 s^2}{e^2} = \frac{(1.96)^2 (45)^2}{(10)^2} = 77.79 \approx 78$$

(Since the required sample size is greater than 30, we do not have to refine the result with the t-value instead of the Z-value.) Thus, the sample has to be increased by 57 (= 78 - 21) additional tanks.
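
The three Question 4 calculations can be checked with a short script. The following is a minimal sketch in Python, which is not part of the original assignment (the answers refer to PHStat and printed tables). The function names are hypothetical, and the critical values 2.306 and 1.96 are copied from the answers rather than computed.

```python
import math
import statistics

# Critical values copied from the worked answers (table look-ups, not computed here).
T_8_025 = 2.306  # t-value, 8 degrees of freedom, upper-tail area 0.025
Z_025 = 1.96     # standard normal value, upper-tail area 0.025


def mean_ci(sample, t_crit):
    """Interval for a mean: x_bar +/- t * s / sqrt(n)."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


def proportion_ci(successes, n, z_crit):
    """Interval for a proportion: p +/- z * sqrt(p(1 - p) / n)."""
    p = successes / n
    margin = z_crit * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


def required_sample_size(s, e, z_crit):
    """Smallest whole n with z * s / sqrt(n) <= e, i.e. n >= (z * s / e)^2."""
    return math.ceil((z_crit * s / e) ** 2)


# Question 4(a): overpayments sample
overpayments = [110, 130, 90, 100, 120, 105, 95, 115, 125]
print(mean_ci(overpayments, T_8_025))

# Question 4(b): 6 of 50 employees with prior experience
print(proportion_ci(6, 50, Z_025))

# Question 4(c): total sample size, and the increase over the preliminary 21 tanks
n_needed = required_sample_size(45, 10, Z_025)
print(n_needed, n_needed - 21)
```

Rounded to two decimal places, the printed intervals agree with (99.47, 120.53) and (0.03, 0.21), and the last line gives the required sample size of 78 and the 57 additional tanks.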