SINGAPORE MANAGEMENT UNIVERSITY
Sample Final Examination 2
COR-STAT1202 Introductory Statistics

Instructions to Candidates

(i) This test paper contains ten (10) true-or-false questions, ten (10) multiple-choice questions, ten (10) fill-in-the-blanks questions and three (3) problem-solving questions. It comprises twelve (12) printed pages.
(ii) Candidates must answer all 33 questions.
(iii) Each true-or-false question carries 1 mark. Each multiple-choice question carries 3 marks. Each fill-in-the-blanks question carries 3 marks. Each problem-solving question carries 10 marks. The total mark for this paper is 100.
(iv) The formula sheets are provided.
(v) The normal statistical table and t-distribution table are provided.
(vi) All answers must be written in the answer books.
(vii) You are given 120 minutes to write.

Section A: True or False (Write ‘T’ on the answer books if the statement is true and ‘F’ if the statement is false.)

[1.] (1 mark) If the regression line is estimated to be $\hat{Y}_i = 0.76 - 0.43X_i$ and the coefficient of determination $r^2 = 0.49$, the coefficient of correlation between X and Y is 0.7.

[2.] (1 mark) When sampling without replacement from a finite population, the use of the finite population correction factor will reduce the standard error of the sample mean.

[3.] (1 mark) A binomial distribution with parameters n = 800 and p = 0.1 is highly right-skewed.

[4.] (1 mark) The larger the p-value, the weaker the evidence to reject the null hypothesis.

[5.] (1 mark) The test statistic measures how close the computed sample statistic has come to the hypothesized population parameter.

[6.] (1 mark) When constructing a confidence interval for a population proportion, if we decrease the confidence level, the estimated standard error of the sample proportion would increase.

[7.] (1 mark) A Type I error is committed when we do not reject the null hypothesis that is false.

[8.] (1 mark) In regression analysis, the Total Sum of Squares (SST) does not depend on the values of the independent variable $X_i$.

[9.] (1 mark) If we are performing a two-tailed test of whether p = 0.5, the probability of detecting a shift of the proportion to 0.6 will be less than the probability of detecting a shift of the proportion to 0.7.

[10.] (1 mark) The process of using sample statistics to draw conclusions about true population parameters is called descriptive statistics.

Section B: Multiple Choice (Write your choice on the answer books.)

[11.] (3 marks) Which one of the following statements is false?
(A) Sample mean X̄ is an unbiased estimator of the population mean µ regardless of the population size and sample size.
(B) The sampling distribution of X̄ is approximately normal provided the sample size is sufficiently large, according to the Central Limit Theorem.
(C) The coefficient of correlation must be between -1 and 1, inclusive.
(D) As the number of degrees of freedom decreases, the t-distribution approaches the standard normal distribution.

[12.] (3 marks) An auto analyst is conducting a satisfaction survey, sampling a list of 1,000 new car buyers. The list includes 250 Nissan buyers, 250 BMW buyers, 250 Honda buyers, and 250 Toyota buyers. The analyst selects a sample of 80 car buyers by randomly sampling 20 buyers of each brand. Is this an example of a simple random sample?
(A) Yes, because each buyer in the sample was randomly sampled.
(B) Yes, because car buyers of every brand were equally represented in the sample.
(C) No, because every possible 80-buyer sample did not have an equal chance of being chosen.
(D) No, because the population consisted of purchasers of four different brands of car.

[13.] (3 marks) Other things being equal, which of the following actions will reduce the power of a hypothesis test?
(I) Increasing sample size
(II) Increasing significance level
(III) Decreasing sample size
(IV) Decreasing significance level
(A) I and II
(B) I and IV
(C) II and III
(D) III and IV

[14.] (3 marks) A national consumer magazine reported the following correlation analysis: The correlation between car weight and car reliability is -0.34. The correlation between car weight and annual maintenance cost is 0.21. Which of the following statements are true?
(I) Heavier cars tend to be less reliable.
(II) Heavier cars tend to cost more to maintain.
(III) Car weight is related more strongly to reliability than to maintenance cost.
(A) I only
(B) II only
(C) I and II
(D) I, II and III

[15.] (3 marks) If we randomly choose a number from 2 to 8, inclusive, the probability distribution of the 7 possible outcomes is given as:

Outcome        2    3    4    5    6    7    8
Probability  0.1  0.2  0.2  0.1  0.1  0.2  0.1

Events $G_1$, $G_2$, $G_3$, and $G_4$ are defined as follows.
$G_1$: {The number is odd}
$G_2$: {The number is less than 5}
$G_3$: {The number is less than 4 or it is even}
$G_4$: {The number is more than 3 and it is odd}
Which one of the following statements is false?
(A) Events $G_2$ and $G_4$ are mutually exclusive.
(B) Events $G_1$ and $G_4$ are statistically independent.
(C) Events $G_1$ and $G_3$ are collectively exhaustive.
(D) $P(G_2 \mid G_3) \neq P(G_3 \mid G_2)$

[16.] (3 marks) A major metropolitan newspaper selected a simple random sample of 1,000 readers from their list of subscribers. They asked whether the paper should increase its coverage of local news. The sample survey finds that 420 readers wanted more local news. What is the 99% confidence interval for the proportion of readers who would like more coverage of local news?
(A) 0.3798 to 0.4602
(B) 0.3850 to 0.4550
(C) 0.3894 to 0.4506
(D) 0.3943 to 0.4457

[17.] (3 marks) It has been claimed that 65% of homeowners would prefer to heat with electricity instead of gas. A study finds that 71% of 200 homeowners prefer electric heating to gas.
In a two-tail test, can we conclude that the percentage who prefer electric heating may differ from 65%? Determine the p-value for the test.
(A) p-value = 0.075; At the 0.05 level of significance, we have sufficient evidence to conclude that the percentage who prefer electric heating may differ from 65%.
(B) p-value = 0.075; At the 0.05 level of significance, we do not have sufficient evidence to conclude that the percentage who prefer electric heating may differ from 65%.
(C) p-value = 0.0375; At the 0.01 level of significance, we have sufficient evidence to conclude that the percentage who prefer electric heating may differ from 65%.
(D) p-value = 0.0375; At the 0.01 level of significance, we do not have sufficient evidence to conclude that the percentage who prefer electric heating may differ from 65%.

[18.] (3 marks) A developmental psychologist believes that the age at which a normal child begins to speak words clearly is closely related to the age at which the child first begins to use complete sentences. A random sample of 20 normal children was taken, and careful records were kept for each. Let X (in months) be the age at which words are first clearly used, and let Y (in months) be the age at which complete sentences are used. The sample provides the following statistics:
$\sum x_i = 260.9$, $\sum x_i^2 = 3423.09$, $\sum y_i = 495$, $\sum y_i^2 = 12280.08$, $\sum x_i y_i = 6473.26$
Based on the simple linear regression analysis, predict the age at which complete sentences are used if a child first clearly uses words at the age of 14.5 months.
(A) 24.74 months
(B) 25.93 months
(C) 26.48 months
(D) 27.26 months

[19.] (3 marks) Referring to the information given in question 18, determine what percentage of the variation in Y can be explained by the independent variable X in this model.
(A) 45.11%
(B) 43.25%
(C) 44.78%
(D) 46.24%

[20.] (3 marks) A group of high school students registered for a special SAT mathematics preparatory course offered in their school.
They took a sample SAT the first day and then took another the last day. The scores and their differences (After Scores minus Before Scores) were as follows:

Student       1    2    3    4    5    6    7    8    9   10   Mean   Standard deviation
Before      540  460  520  580  670  590  640  490  530  540    556   64.84169
After       570  510  530  570  680  610  660  520  540  580    577   57.74465
Difference   30   50   10  -10   10   20   20   30   10   40     21   17.2884

We would like to test whether the course has any impact on the SAT scores at the 5% significance level. Determine the test statistic for this test, its degrees of freedom, and the decision of the test.
(A) |t| = 3.8412 with 9 degrees of freedom; reject H0
(B) |t| = 3.8412 with 9 degrees of freedom; do not reject H0
(C) |t| = 0.7648 with 18 degrees of freedom; reject H0
(D) |t| = 0.7648 with 18 degrees of freedom; do not reject H0

Section C: Fill in the blanks (Write your final answer to four decimal places on the answer book).

[21.] (3 marks) Seven of the 15 campus police officers available for assignment to the auditorium in which a local politician is to speak have received advanced training in crowd control. If 5 officers are randomly selected for service during the speech, what is the probability that at least 3 of them will have had advanced training in crowd control?

[22.] (3 marks) During a study of auto accidents, the Highway Safety Council found that 60 percent of all accidents occur at night, 52 percent are alcohol-related, and 36 percent occur at night and are alcohol-related. What is the probability that an accident was not alcohol-related, given that it occurred at night?

[23.] (3 marks) Martin Coleman, credit manager for Beck’s, knows that the company uses three methods to encourage collection of delinquent accounts. From past collection records, he learns that 70 percent of the accounts are called on personally, 20 percent are phoned, and 10 percent are sent a letter.
The probabilities of collecting an overdue amount from an account with the three methods are 0.75, 0.60, and 0.40 respectively. Mr Coleman has just received payment from a past-due account. What is the probability that this account was called on personally?

[24.] (3 marks) Robertson Employment Service customarily gives standard intelligence and aptitude tests to all people who seek employment through the firm. The firm has collected data for several years and has found that the distribution of scores is not normal, but is skewed to the left with a mean of 86 and a standard deviation of 16. What is the probability that in a sample of 100 applicants who take the test, the mean score will be less than 84 or greater than 90?

[25.] (3 marks) A psychologist wrote a computer program to simulate the way a person responds to a standard IQ test. To test the program, he gave the computer 15 different forms of a popular IQ test and computed its IQ score (X) from each form. The results are as follows: $\sum_{i=1}^{15} x_i = 2{,}145$, $\sum_{i=1}^{15} x_i^2 = 307{,}125$. Based on these sample results, what is the coefficient of variation?

[26.] (3 marks) Arrivals at a walk-in optometry department in a shopping mall have been found to be Poisson distributed with a mean of 2.5 potential customers arriving per hour. What is the probability that the time interval between two potential customers is more than 20 minutes but less than 30 minutes?

[27.] (3 marks) A criminologist has developed a questionnaire for predicting whether a teenager will become a delinquent. Scores on the questionnaire can range from 0 to 100, with higher values reflecting a presumably greater criminal tendency. As a rule of thumb, the criminologist decides to classify a teenager as a potential delinquent if his or her score exceeds 75. The questionnaire has already been tested on a large sample of teenagers, both delinquent and nondelinquent.
Among those considered nondelinquent, scores were normally distributed with a mean of 60 and a standard deviation of 10. Among those considered delinquent, scores were normally distributed with a mean of 80 and a standard deviation of 5. In a randomly selected group of four considered delinquents, what is the probability that the criminologist will classify all of them as delinquents?

[28.] (3 marks) In a study of the effects of a medication on the body temperature of normal adults, a scientist wishes to be 95% sure that the estimates made from a sample are within 0.01°C of the population mean. The population under study is believed to have a standard deviation in body temperature of 0.07°C. At least how many subjects should be used in the sample if these conditions are to be met?

[29.] (3 marks) A dental experiment involves coating patients’ teeth with a special compound which is intended to reduce formation of plaque and so reduce the number of cavities. The compound is applied to the teeth of a sample group of 65 volunteers. After 3 years, these patients developed a mean of 3.2 cavities with a standard deviation of 1.4. A 99% confidence interval for the mean number of cavities developed by all similar patients using this compound for 3 years is then calculated. Find the margin of error.

[30.] (3 marks) The Chevrolet dealers of a large city are conducting a study to determine the proportion of car owners in the city who are considering the purchase of a new car within the next year. If the population proportion is believed to be 0.15, how many owners must be included in a simple random sample if the dealers want to be 90% confident that the margin of error will be no more than 0.02?

Section D: Problem Solving (Answer all problems in the answer books)

[31.] (10 marks) The following table contains the probability distribution for the number of traffic accidents (X) daily in a small city:

x           0     1     2     3     4
P(X = x)  0.20  0.25  0.30  0.15  0.10

(a) Suppose you randomly select 100 days and observe the number of traffic accidents in each day. What is the probability that the sample mean of traffic accidents per day is more than 1.55? State clearly whether any assumption is needed and explain why.
(b) If you only randomly select 5 days, instead of 100 days in the sample, what is the probability that the sample has exactly two days without any traffic accident?
(c) In a sample of 5 days, the daily observations are 0, 4, 2, 3 and 2 traffic accidents. Find the sample mean and the estimated standard error of the sample mean.

[32.] (10 marks) Two research laboratories have independently produced drugs that provide pain relief to arthritis sufferers. In laboratory 1, Drug A was tested on a group of 60 arthritis sufferers and produced a mean of 8.5 hours of relief, and a sample standard deviation of 1.8 hours. In laboratory 2, Drug B was tested on 40 arthritis sufferers, producing a mean of 7.4 hours of relief, and a sample standard deviation of 2.1 hours. Assume that the variances of the two populations are equal even though they are unknown.
(a) At the 0.05 level of significance, can we conclude that the average lengths of relief period to arthritis sufferers provided by the two laboratories are different? If we can, which laboratory provides a longer period of relief to arthritis sufferers in general? State clearly whether any additional assumption is needed and explain why.
(b) Based on the decision made in part (a), what type of error of the test is possibly committed? Explain why.
(c) You are also given the following information about the characteristics of the arthritis sufferers tested in both laboratories:

Laboratory               1      2
Age Range                40-45  60-65
Ratio of Male to Female  1:2    4:1

Would the above additional information affect the conclusion made in part (a)? Explain briefly.

[33.] (10 marks) A firm administers a test to sales trainees before they go into the field. The management of the firm is interested in determining the relationship between the test scores (X) and the sales (Y in units sold) made by the trainees at the end of one year in the field. Data are collected for 10 sales personnel who have been in the field one year. The simple linear regression model $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ is used to model the situation. The data provide the following statistics:
$\sum x_i = 784$, $\sum x_i^2 = 62348$, $\sum y_i = 2998$, $\sum y_i^2 = 913394$, $\sum x_i y_i = 238393$
(a) Write down the regression equation used to predict the sales as a function of the test scores.
(b) At the 0.05 level of significance, do the data present sufficient evidence to conclude that the test score contributes useful information for the prediction of the sales? State clearly all the assumptions needed for your hypothesis testing.
(c) Construct a 95% confidence interval estimate of the increase in sales for every 1 point increase in test scores.
(d) Construct a 95% prediction interval of the sales of a particular salesperson with a test score of 78.

–END–