ST104A ZA

BSc DEGREES AND GRADUATE DIPLOMAS IN ECONOMICS, MANAGEMENT, FINANCE AND THE SOCIAL SCIENCES, THE DIPLOMA IN ECONOMICS AND SOCIAL SCIENCES AND THE CERTIFICATE IN EDUCATION IN SOCIAL SCIENCES

Summer 2021 Online Assessment Instructions

ST104A Statistics 1

Tuesday, 4 May 2021: 15:00 – 19:00 (BST)

The assessment will be an open-book take-home online assessment within a 4-hour window. The requirements for this assessment remain the same as for the closed-book exam, with an expected time/effort of 2 hours.

Candidates should answer THREE questions: all parts of Section A (50 marks in total) and TWO questions from Section B (25 marks each). Candidates are strongly advised to divide their time accordingly.

You should complete this paper using pen and paper. Please use BLACK INK only.

Handwritten work then needs to be scanned, converted to PDF and uploaded to the VLE as ONE individual file including the coversheet. Each scanned sheet should have your candidate number written clearly in the header. Please do not write your name anywhere on your submission.

You have until 19:00 (BST) on Tuesday, 4 May 2021 to upload your file into the VLE submission portal. However, you are advised not to leave your submission to the last minute.

Workings should be submitted for all questions requiring calculations. Any necessary assumptions introduced in answering a question are to be stated.

A list of formulae and extracts from statistical tables are provided after the final question of this paper.

You may use any calculator for any appropriate calculations, but you may not use any computer software to obtain solutions. Credit will only be given if all workings are shown.
© University of London 2021    UL21/0184

If you think there is any information missing or any error in any question, then you should indicate this but proceed to answer the question, stating any assumptions you have made.

The assessment has been designed with a duration of 4 hours to provide a more flexible window in which to complete the assessment and to appropriately test the course learning outcomes. As an open-book exam, the expected amount of effort required to complete all questions and upload your answers during this window is no more than 2 hours. Organise your time well. You are assured that there will be no benefit in you going beyond the expected 2 hours of effort. Your assessment has been carefully designed to help you show what you have learned in the hours allocated.

This is an open-book assessment and as such you may have access to additional materials including, but not limited to, subject guides and any recommended reading. However, the work you submit is expected to be 100% your own. Therefore, unless instructed otherwise, you must not collaborate or confer with anyone during the assessment. The University of London will carry out checks to ensure the academic integrity of your work. Many students who break the University of London’s assessment regulations did not intend to cheat but did not properly understand the University of London’s regulations on referencing and plagiarism. The University of London considers all forms of plagiarism, whether deliberate or otherwise, a very serious matter and can apply severe penalties that might impact on your award.
The University of London 2020-21 Procedure for the consideration of Allegations of Assessment Offences is available online at: Assessment Offence Procedures - University of London

SECTION A

Answer all parts of Question 1 (50 marks in total).

1. (a) Suppose that $x_1 = -3$, $x_2 = 9$, $x_3 = 16$, and $y_1 = -2$, $y_2 = 1$, $y_3 = 0.5$. Calculate the following quantities:
   i. $\left(\sum_{i=1}^{3} x_i\right)^2$
   ii. $\sum_{i=2}^{3} \sqrt{x_i}\, y_i$
   iii. $|x_1| + \sum_{i=2}^{3} \dfrac{y_i^3}{y_i^2}$
      (6 marks)

(b) Classify each one of the following variables as either measurable (continuous) or categorical. If a variable is categorical, further classify it as either nominal or ordinal. Justify your answer. (No marks will be awarded without a justification.)
   i. Age brackets of 18–30, 31–50, 51–70, 70+.
   ii. Passport number.
   iii. A country’s inflation rate.
      (6 marks)

(c) State whether the following statements are true or false, and provide a brief explanation. (No marks will be awarded for a simple true/false answer.)
   i. For a set of observations $x_1, x_2, \ldots, x_n$, with mean $\bar{x}$, then: $\sum_{i=1}^{n} (x_i - \bar{x}) > 0$.
   ii. For two independent events $A$ and $B$ such that $P(A) > 0$ and $P(B) > 0$, then: $P(A \cup B) < P(A) + P(B)$.
   iii. For a random variable $X$, $E(X^2)$ can be less than $(E(X))^2$.
   iv. Rejecting a true null hypothesis is known as the power of a test.
   v. A 4-by-2 contingency table which results in a $\chi^2$ test statistic value of 6.724 is statistically significant at the 5% significance level.
      (10 marks)

(d) $X$ is a normal random variable with a mean of $\mu = 5$. If $P(X < 1) = 0.20$, approximately what is the value of the variance, $\sigma^2$?
      (5 marks)

(e) The probability distribution of a random variable $X$ is given below.

   x           −2    −1     0     1     2
   P(X = x)     k    2k    4k    2k     k

   i. Explain why $k = 0.10$.
      (2 marks)
   ii. Given that $E(X) = 0$, calculate the standard deviation of $X$ to four decimal places.
      (3 marks)
   iii. Is it possible to calculate $E(1/X)$? If yes, calculate its value. If no, explain why.
      (3 marks)
   iv. Does $X$ have a normal distribution? Briefly explain your answer.
      (2 marks)

(f) Based on the central limit theorem, you are told that a 90% confidence interval for a population proportion is (0.7077, 0.7723).
   i. What was the sample proportion which resulted in this confidence interval?
      (2 marks)
   ii. What was the size of the sample used?
      (4 marks)

(g) It is assumed that investors are equally split between those who prefer ‘growth’ stocks and those who prefer ‘value’ stocks. In a random sample of 200 investors, 105 agreed with the statement ‘Growth stocks are better than value stocks’.
   i. Conduct a two-sided hypothesis test, at the 5% significance level, to test whether in the population of investors there are equal preferences for growth and value stocks. Show all steps of your calculation and use the ‘critical value’ approach to perform the test.
      (5 marks)
   ii. Calculate the p-value of the test statistic value calculated in part i.
      (2 marks)

SECTION B

Answer two out of the three questions from this section (25 marks each).

2. The manager of a store selling shoes is looking into the association between daily sales in the store (in hundreds of $), y, and the number of customers who visited the store on that day, x. For this reason, on 10 days selected at random the variables x and y were recorded. They appear in the table below:

   Day    # of customers (x)    Sales (y)
   #1             90               11.2
   #2             92               11.1
   #3             50                6.8
   #4             74                9.2
   #5             78                9.4
   #6             88               10.1
   #7             87                9.4
   #8             51                7.7
   #9             53                8.2
   #10            42                6.1

   The summary statistics for these data are:
   Sum of x data: 705
   Sum of the squares of x data: 53,111
   Sum of y data: 89.2
   Sum of the squares of y data: 822
   Sum of the products of x and y data: 6,573.3

(a) i. Draw a scatter diagram of these data. Label the diagram carefully.
   ii. Calculate the sample correlation coefficient.
      Interpret your findings.
   iii. Calculate the least squares line of y on x and draw the line on the scatter diagram.
   iv. Suppose that you observe more data and, when you draw the corresponding scatter diagram, a non-linear association is revealed. Discuss how this can be interpreted in the context of the problem.
      (13 marks)

(b) A study focused on perceptions of job satisfaction, which may vary between women and men. For this reason, 15 women and 13 men were selected at random and took a job satisfaction questionnaire that gave a score for each of them (higher scores indicate higher job satisfaction). Summaries of these scores are presented below.

                        Women    Men
   Sample size             15     13
   Sample mean           32.1   28.5
   Sample variance       15.2   19.3

   i. Use an appropriate hypothesis test to determine whether the mean job satisfaction scores differ between women and men. Test at two appropriate significance levels, stating clearly the hypotheses, the test statistic and its distribution under the null hypothesis. Comment on your findings.
   ii. State clearly any assumptions you made in part i.
   iii. Is it possible that there is no difference between men and women in terms of their job satisfaction? Discuss.
      (12 marks)

3. (a) Thirty people were asked about the number of hours they exercise in a week, and their answers were recorded and are listed below.

   2.0    6.0    7.5    8.5   10.5   13.0
   4.0    6.5    7.5    8.5   10.5   14.0
   4.5    6.5    8.0    9.0   11.0   17.0
   5.0    7.0    8.0    9.0   11.5   18.0
   5.5    7.0    8.5   10.0   12.0   21.0

   i. Carefully construct, draw and label a histogram of these data.
   ii. Find the mean (given that the sum of the data is 277), the median and the modal group.
   iii. Comment on the data based on the shape of the histogram and the measures you have calculated.
   iv. Name two other types of graphical displays that would be suitable to represent the data.
      (12 marks)

(b) A researcher is interested in determining whether taking additional vitamin C helps prevent the common cold. A randomised experiment was conducted to address this question. The study randomly allocated 279 people to either a group where vitamin C supplements were given, or a group where a placebo pill was given. These people were monitored and the numbers of those who got or did not get a cold were recorded. The results are summarised below:

                          Vitamin C   Placebo
   Got a cold                  17         31
   Did not get a cold         122        109

   i. Give a 95% confidence interval for the difference in the probabilities of getting a cold between the vitamin C and the placebo groups.
   ii. Carry out an appropriate hypothesis test at the 5% significance level to determine whether the probability of getting a cold is lower in the vitamin C group, compared to the probability in the placebo group. State the test hypotheses, and specify your test statistic and its distribution under the null hypothesis. Comment on your findings.
   iii. State any assumptions you made in part ii.
   iv. On the basis of the data alone, would you conclude that a vitamin C pill reduces the chances of getting a cold? Provide an explanation with your answer.
      (13 marks)

4. (a) A mental health study focused on 300 patients visiting three community mental health centres. The patients were classified into three groups according to the primary issue for which they were seen. The data are shown below.

                           Type of problem
               Social Adjustment   Stress Related   Other   Total
   Centre 1           45                 28           27      100
   Centre 2           28                 44           28      100
   Centre 3           46                 29           25      100
   Total             119                101           80      300

   i. Based on the data in the table, and without conducting a significance test, describe the differences in terms of the primary issue for which the patients were seen across the different centres.
   ii. Calculate the $\chi^2$ statistic and use it to test for independence, using a 5% significance level. What do you conclude?
      (13 marks)

(b) i. You have been asked to design a nationwide survey in your country to find out about internet use by children less than 10 years old. Provide a probability sampling scheme and a sampling frame that you would like to use. Identify a potential source of selection bias that may occur and discuss how this issue could be addressed.
   ii. Describe what a longitudinal survey is. State two ways in which panel surveys differ from longitudinal surveys.
      (12 marks)

END OF PAPER

ST104a Statistics 1 Examination Formula Sheet

Expected value of a discrete random variable:
$\mu = E(X) = \sum_{i=1}^{N} p_i x_i$

The transformation formula:
$Z = \dfrac{X - \mu}{\sigma}$

Standard deviation of a discrete random variable:
$\sigma = \sqrt{\sigma^2} = \sqrt{\sum_{i=1}^{N} p_i (x_i - \mu)^2}$

Finding Z for the sampling distribution of the sample mean:
$Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$

Finding Z for the sampling distribution of the sample proportion:
$Z = \dfrac{P - \pi}{\sqrt{\pi(1 - \pi)/n}}$

Confidence interval endpoints for a single mean (σ known):
$\bar{x} \pm z_{\alpha/2} \times \dfrac{\sigma}{\sqrt{n}}$

Confidence interval endpoints for a single mean (σ unknown):
$\bar{x} \pm t_{\alpha/2,\, n-1} \times \dfrac{s}{\sqrt{n}}$

Confidence interval endpoints for a single proportion:
$p \pm z_{\alpha/2} \times \sqrt{\dfrac{p(1 - p)}{n}}$

Sample size determination for a mean:
$n \geq \dfrac{(z_{\alpha/2})^2 \sigma^2}{e^2}$

Sample size determination for a proportion:
$n \geq \dfrac{(z_{\alpha/2})^2 p(1 - p)}{e^2}$

z test of hypothesis for a single mean (σ known):
$Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$

t test of hypothesis for a single mean (σ unknown):
$T = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}}$

z test of hypothesis for a single proportion:
$Z \cong \dfrac{P - \pi_0}{\sqrt{\pi_0(1 - \pi_0)/n}}$

z test for the difference between two means (variances known):
$Z = \dfrac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$

t test for the difference between two means (variances unknown):
$T = \dfrac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{S_p^2 (1/n_1 + 1/n_2)}}$

Pooled variance estimator:
$S_p^2 = \dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$

Confidence interval endpoints for the difference between two means:
$\bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2,\, n_1+n_2-2} \times \sqrt{s_p^2 \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$

t test for the difference in means in paired samples:
$T = \dfrac{\bar{X}_d - \mu_d}{S_d/\sqrt{n}}$

Confidence interval endpoints for the difference in means in paired samples:
$\bar{x}_d \pm t_{\alpha/2,\, n-1} \times \dfrac{s_d}{\sqrt{n}}$

z test for the difference between two proportions:
$Z = \dfrac{P_1 - P_2 - (\pi_1 - \pi_2)}{\sqrt{P(1 - P)(1/n_1 + 1/n_2)}}$

Pooled proportion estimator:
$P = \dfrac{R_1 + R_2}{n_1 + n_2}$

Confidence interval endpoints for the difference between two proportions:
$p_1 - p_2 \pm z_{\alpha/2} \times \sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}$

χ² statistic for test of association:
$\sum_{i=1}^{r} \sum_{j=1}^{c} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$

Spearman rank correlation:
$r_s = 1 - \dfrac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$

Sample correlation coefficient:
$r = \dfrac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sqrt{\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)\left(\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right)}}$

Simple linear regression line estimates:
$b = \dfrac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}$,  $a = \bar{y} - b\bar{x}$
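The last two formulas on the sheet (sample correlation coefficient and simple linear regression estimates) need only the summary sums, not the raw data. The Python sketch below is one way to evaluate them; the function name `correlation_and_line` is hypothetical, and the example inputs are the Question 2 summary statistics with n = 10.

```python
import math


def correlation_and_line(n, sum_x, sum_x2, sum_y, sum_y2, sum_xy):
    """Return (r, a, b) computed from summary statistics.

    r is the sample correlation coefficient and y = a + b*x is the
    least squares line of y on x, as given on the formula sheet.
    """
    x_bar = sum_x / n
    y_bar = sum_y / n
    # Corrected sums of squares and cross-products:
    # sum(x^2) - n*x_bar^2, sum(y^2) - n*y_bar^2, sum(xy) - n*x_bar*y_bar.
    s_xx = sum_x2 - n * x_bar ** 2
    s_yy = sum_y2 - n * y_bar ** 2
    s_xy = sum_xy - n * x_bar * y_bar
    r = s_xy / math.sqrt(s_xx * s_yy)
    b = s_xy / s_xx          # slope estimate
    a = y_bar - b * x_bar    # intercept estimate
    return r, a, b


if __name__ == "__main__":
    # Summary statistics quoted in Question 2 (10 days).
    r, a, b = correlation_and_line(10, 705, 53111, 89.2, 822, 6573.3)
    print(f"r = {r:.4f}, fitted line: y = {a:.4f} + {b:.4f}x")
```

Working from the corrected sums keeps the code term-by-term aligned with the formula sheet, so each intermediate value can be checked against a hand calculation.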