Statistics 1 Exam Paper (University of London)

ST104A ZA
BSc DEGREES AND GRADUATE DIPLOMAS IN ECONOMICS, MANAGEMENT,
FINANCE AND THE SOCIAL SCIENCES, THE DIPLOMA IN ECONOMICS AND
SOCIAL SCIENCES AND THE CERTIFICATE IN EDUCATION IN SOCIAL SCIENCES
Summer 2021 Online Assessment Instructions
ST104A Statistics 1
Tuesday, 4 May 2021: 15:00 – 19:00 (BST)
The assessment will be an open-book take-home online assessment within a 4-hour window. The requirements for this assessment remain the same as the closed-book exam, with an expected time/effort of 2 hours.
Candidates should answer THREE questions: all parts of Section A (50 marks in total)
and TWO questions from Section B (25 marks each). Candidates are strongly
advised to divide their time accordingly.
You should complete this paper using pen and paper. Please use BLACK INK only.
Handwritten work then needs to be scanned, converted to PDF and then uploaded to
the VLE as ONE individual file including the coversheet. Each scanned sheet should
have your candidate number written clearly in the header. Please do not write your
name anywhere on your submission.
You have until 19:00 (BST) on Tuesday, 4 May 2021 to upload your file into the VLE
submission portal. However, you are advised not to leave your submission to the last
minute.
Workings should be submitted for all questions requiring calculations. Any necessary
assumptions introduced in answering a question are to be stated.
A list of formulae and extracts from statistical tables are provided after the final question
of this paper.
You may use any calculator for any appropriate calculations, but you may not use any
computer software to obtain solutions. Credit will only be given if all workings are shown.
© University of London 2021
UL21/0184
If you think there is any information missing or any error in any question, then you
should indicate this but proceed to answer the question stating any assumptions you
have made.
The assessment has been designed with a duration of 4 hours to provide a more
flexible window in which to complete the assessment and to appropriately test the
course learning outcomes. As an open-book exam, the expected amount of effort
required to complete all questions and upload your answers during this window is no
more than 2 hours. Organise your time well.
You are assured that there will be no benefit in you going beyond the expected 2 hours
of effort. Your assessment has been carefully designed to help you show what you
have learned in the hours allocated.
This is an open book assessment and as such you may have access to additional
materials including but not limited to subject guides and any recommended reading. But
the work you submit is expected to be 100% your own. Therefore, unless instructed
otherwise, you must not collaborate or confer with anyone during the assessment. The
University of London will carry out checks to ensure the academic integrity of your work.
Many students that break the University of London’s assessment regulations did not
intend to cheat but did not properly understand the University of London’s regulations
on referencing and plagiarism. The University of London considers all forms of
plagiarism, whether deliberate or otherwise, a very serious matter and can apply severe
penalties that might impact on your award.
The University of London 2020-21 Procedure for the consideration of Allegations of
Assessment Offences is available online at:
Assessment Offence Procedures - University of London
SECTION A
Answer all parts of question 1 (50 marks in total).
1. (a) Suppose that x1 = −3, x2 = 9, x3 = 16, and y1 = −2, y2 = 1, y3 = 0.5.
Calculate the following quantities:
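[The three expressions for parts i. to iii. did not survive extraction. The recoverable fragments are: three sums with upper limit 3 and lower limits i = 1, i = 2 and i = 2; the terms x_i, x_i y_i, y_i^3 and y_i^2; a square root; a bracketed expression raised to the power 2; and |x_1| + at the start of part iii.]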
(6 marks)
(b) Classify each one of the following variables as either measurable
(continuous) or categorical. If a variable is categorical, further classify it as
either nominal or ordinal. Justify your answer. (No marks will be awarded
without a justification.)
i. Age brackets of 18–30, 31–50, 51–70, 70+.
ii. Passport number.
iii. A country’s inflation rate.
(6 marks)
(c) State whether the following statements are true or false, and provide a brief
explanation. (No marks will be awarded for a simple true/false answer.)
i. For a set of observations $x_1, x_2, \ldots, x_n$, with mean $\bar{x}$, then:
\[ \sum_{i=1}^{n} (x_i - \bar{x}) > 0. \]
ii. For two independent events $A$ and $B$ such that $P(A) > 0$ and $P(B) > 0$,
then:
\[ P(A \cup B) < P(A) + P(B). \]
iii. For a random variable $X$, $E(X^2)$ can be less than $(E(X))^2$.
iv. Rejecting a true null hypothesis is known as the power of a test.
v. A 4-by-2 contingency table which results in a $\chi^2$ test statistic value of
6.724 is statistically significant at the 5% significance level.
(10 marks)
(d) $X$ is a normal random variable with a mean of $\mu = 5$. If $P(X < 1) = 0.20$,
approximately what is the value of the variance, $\sigma^2$?
(5 marks)
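The standardisation step behind part (d) can be sketched as follows. This is only a way to check hand working (the paper itself requires table-based workings), and the variable names are illustrative assumptions. It uses the exact normal quantile rather than the rounded table value, so the result differs slightly from one based on z = -0.84.

```python
from statistics import NormalDist

mu = 5.0               # mean given in the question
x, prob = 1.0, 0.20    # P(X < 1) = 0.20

# z such that P(Z < z) = 0.20 for a standard normal Z (about -0.84)
z = NormalDist().inv_cdf(prob)

# Standardising gives (x - mu) / sigma = z, so sigma = (x - mu) / z
sigma = (x - mu) / z
variance = sigma ** 2

print(round(z, 4), round(sigma, 3), round(variance, 2))
```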
(e) The probability distribution of a random variable X is given below.
X = x        −2    −1    0     1     2
P(X = x)     k     2k    4k    2k    k
i. Explain why k = 0.10.
(2 marks)
ii. Given that E(X) = 0, calculate the standard deviation of X to four decimal
places.
(3 marks)
iii. Is it possible to calculate E(1/X)? If yes, calculate its value. If no, explain
why.
(3 marks)
iv. Does X have a normal distribution? Briefly explain your answer.
(2 marks)
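For parts i. and ii. of (e), a short sketch of the arithmetic is given below, assuming the distribution in the table above. Exact fractions are used so that k is not affected by rounding; the names `weights`, `probs` and `sd` are illustrative.

```python
from fractions import Fraction
from math import sqrt

values = [-2, -1, 0, 1, 2]
weights = [1, 2, 4, 2, 1]        # multiples of k from the table

# Probabilities must sum to 1, so k = 1 / (sum of the multiples)
k = Fraction(1, sum(weights))
probs = [w * k for w in weights]

# Mean, variance and standard deviation of the discrete distribution
mean = sum(p * x for p, x in zip(probs, values))
variance = sum(p * (x - mean) ** 2 for p, x in zip(probs, values))
sd = sqrt(variance)

print(k, mean, variance, round(sd, 4))
```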
(f) Based on the central limit theorem, you are told that a 90% confidence interval
for a population proportion is (0.7077, 0.7723).
i. What was the sample proportion which resulted in this confidence interval?
(2 marks)
ii. What was the size of the sample used?
(4 marks)
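Part (f) amounts to inverting the single-proportion interval from the formula sheet. The sketch below assumes that interval form and uses the exact 95th percentile of the standard normal in place of the table value 1.645; the result should be rounded to a whole number of respondents.

```python
from statistics import NormalDist

lower, upper = 0.7077, 0.7723
z = NormalDist().inv_cdf(0.95)      # two-sided 90% interval, about 1.645

p_hat = (lower + upper) / 2         # the interval is symmetric about p
half_width = (upper - lower) / 2    # equals z * sqrt(p(1 - p) / n)

# Solve half_width = z * sqrt(p_hat * (1 - p_hat) / n) for n
n = z ** 2 * p_hat * (1 - p_hat) / half_width ** 2

print(round(p_hat, 4), round(n, 1))
```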
(g) It is assumed that investors are equally split between those who prefer ‘growth’
stocks and those who prefer ‘value’ stocks. In a random sample of 200 investors,
105 agreed with the statement ‘Growth stocks are better than value stocks’.
i. Conduct a two-sided hypothesis test, at the 5% significance level, to test
whether in the population of investors there are equal preferences for
growth and value stocks. Show all steps of your calculation and use the
‘critical value’ approach to perform the test.
(5 marks)
ii. Calculate the p-value of the test statistic value calculated in part i.
(2 marks)
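A minimal sketch of the calculation in part (g), assuming the z test for a single proportion from the formula sheet with hypothesised proportion 0.5. The names `critical` and `reject_h0` are illustrative, and the normal quantile and tail probability are computed directly rather than read from tables.

```python
from math import sqrt
from statistics import NormalDist

n, successes = 200, 105
pi0 = 0.5                           # hypothesised proportion under H0

p_hat = successes / n
z = (p_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)

std_normal = NormalDist()
critical = std_normal.inv_cdf(0.975)          # two-sided test, 5% level
p_value = 2 * (1 - std_normal.cdf(abs(z)))    # two-sided p-value

reject_h0 = abs(z) > critical
print(round(z, 4), round(critical, 3), round(p_value, 4), reject_h0)
```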
SECTION B
Answer two out of the three questions from this section (25 marks each).
2. The manager of a store selling shoes is looking into the association between daily
sales (in hundreds of $) in the store, y, and the number of customers who visited
the store in that day, x. For this reason, in 10 days selected at random the variables
x and y were recorded. They appear in the table below:
Days                 #1    #2    #3   #4   #5   #6    #7   #8   #9   #10
# of customers (x)   90    92    50   74   78   88    87   51   53   42
Sales (y)            11.2  11.1  6.8  9.2  9.4  10.1  9.4  7.7  8.2  6.1
The summary statistics for these data are:
Sum of x data: 705
Sum of the squares of x data: 53,111
Sum of y data: 89.2
Sum of the squares of y data: 822
Sum of the products of x and y data: 6,573.3
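As a consistency check, the quoted summary statistics can be recomputed from the ten (x, y) pairs in the table. The sketch below does only that; the list names are illustrative.

```python
# Daily customer counts (x) and sales in hundreds of $ (y), days #1 to #10
x = [90, 92, 50, 74, 78, 88, 87, 51, 53, 42]
y = [11.2, 11.1, 6.8, 9.2, 9.4, 10.1, 9.4, 7.7, 8.2, 6.1]

sum_x = sum(x)
sum_x2 = sum(v * v for v in x)
sum_y = sum(y)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

print(sum_x, sum_x2, round(sum_y, 1), round(sum_y2, 2), round(sum_xy, 1))
```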
(a)
i. Draw a scatter diagram of these data. Label the diagram carefully.
ii. Calculate the sample correlation coefficient. Interpret your findings.
iii. Calculate the least squares line of y on x and draw the line on the scatter
diagram.
iv. Suppose that you observe more data and when you draw the corresponding
scatter diagram a non-linear association is revealed. Discuss how this can
be interpreted in the context of the problem.
(13 marks)
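Parts ii. and iii. of (a) apply the sample correlation and least squares formulas from the formula sheet to the summary statistics. A sketch of that arithmetic, with illustrative names `s_xy`, `s_xx` and `s_yy` for the corrected sums, might look like this:

```python
from math import sqrt

n = 10
sum_x, sum_x2 = 705, 53111
sum_y, sum_y2 = 89.2, 822
sum_xy = 6573.3

x_bar, y_bar = sum_x / n, sum_y / n

# Corrected sums of products and squares
s_xy = sum_xy - n * x_bar * y_bar
s_xx = sum_x2 - n * x_bar ** 2
s_yy = sum_y2 - n * y_bar ** 2

r = s_xy / sqrt(s_xx * s_yy)   # sample correlation coefficient
b = s_xy / s_xx                # slope of the least squares line of y on x
a = y_bar - b * x_bar          # intercept

print(round(r, 4), round(b, 4), round(a, 4))
```

Interpretation of r and of the fitted line in the context of the store is still left to the written answer.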
(b) A study focused on the perception of job satisfaction that may vary between
women and men. For this reason, at random 15 women and 13 men took a job
satisfaction questionnaire that gave a score for each one of them (high values
of the score indicate higher job satisfaction). Summaries of these scores are
presented below.
                   Women   Men
Sample size        15      13
Sample mean        32.1    28.5
Sample variance    15.2    19.3
i. Use an appropriate hypothesis test to determine whether the mean job
satisfaction scores differ between women and men. Test at two appropriate
significance levels, stating clearly the hypotheses, the test statistic and its
distribution under the null hypothesis. Comment on your findings.
ii. State clearly any assumptions you made in part i.
iii. Is it possible that there is no difference between men and women in terms
of their job satisfaction? Discuss.
(12 marks)
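One possible route for part (b) i. is the pooled two-sample t test from the formula sheet, which assumes equal population variances. The sketch below computes only the pooled variance, test statistic and degrees of freedom under that assumption; critical values would still come from the t tables.

```python
from math import sqrt

n1, mean1, var1 = 15, 32.1, 15.2   # women
n2, mean2, var2 = 13, 28.5, 19.3   # men

# Pooled variance estimator (assumes equal population variances)
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

# Test statistic under H0: mu1 - mu2 = 0
t_stat = (mean1 - mean2) / sqrt(pooled_var * (1 / n1 + 1 / n2))
df = n1 + n2 - 2

print(round(pooled_var, 4), round(t_stat, 3), df)
```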
3. (a) Thirty people were asked about the number of hours they exercise in a week
and their answers were recorded and listed below.
2.0    6.0    7.5    8.5    10.5   13.0
4.0    6.5    7.5    8.5    10.5   14.0
4.5    6.5    8.0    9.0    11.0   17.0
5.0    7.0    8.0    9.0    11.5   18.0
5.5    7.0    8.5    10.0   12.0   21.0
i. Carefully construct, draw and label a histogram of these data.
ii. Find the mean (given that the sum of the data is 277), the median and
the modal group.
iii. Comment on the data based on the shape of the histogram and the
measures you have calculated.
iv. Name two other types of graphical displays that would be suitable to
represent the data.
(12 marks)
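The mean and median in part (a) ii. can be checked with Python's standard library, as sketched below. The modal group depends on the class intervals chosen for the histogram, so it is not computed here.

```python
from statistics import mean, median

# The thirty reported weekly exercise times, in hours
hours = [
    2.0, 6.0, 7.5, 8.5, 10.5, 13.0,
    4.0, 6.5, 7.5, 8.5, 10.5, 14.0,
    4.5, 6.5, 8.0, 9.0, 11.0, 17.0,
    5.0, 7.0, 8.0, 9.0, 11.5, 18.0,
    5.5, 7.0, 8.5, 10.0, 12.0, 21.0,
]

total = sum(hours)
avg = mean(hours)
med = median(hours)   # average of the 15th and 16th ordered values

print(len(hours), total, round(avg, 4), med)
```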
(b) A researcher is interested in determining whether taking additional vitamin C
helps prevent the common cold. A randomised experiment was conducted to
address this question. The study randomly allocated 279 people to either a
group where vitamin C supplements were given, or a group where a placebo
pill was given. These people were monitored and the numbers of those who
got or did not get a cold were recorded. The results are summarised below:
                      Vitamin C   Placebo
Got a cold            17          31
Did not get a cold    122         109
i. Give a 95% confidence interval for the difference in the probabilities of
getting a cold between the vitamin C and the placebo groups.
ii. Carry out an appropriate hypothesis test at the 5% significance level to
determine whether the probability of getting a cold is lower in the vitamin
C group, compared to the probability in the placebo group. State the test
hypotheses, and specify your test statistic and its distribution under the
null hypothesis. Comment on your findings.
iii. State any assumptions you made in part ii.
iv. On the basis of the data alone, would you conclude that a vitamin C pill
reduces the chances of getting a cold? Provide an explanation with your
answer.
(13 marks)
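The calculations in parts i. and ii. of (b) can be sketched as follows, assuming the confidence interval and pooled z test for two proportions from the formula sheet, and a one-sided alternative that the vitamin C probability is lower. Names such as `se_ci`, `se_test` and `reject_h0` are illustrative.

```python
from math import sqrt
from statistics import NormalDist

cold_c, n_c = 17, 17 + 122     # vitamin C group
cold_p, n_p = 31, 31 + 109     # placebo group

p_c, p_p = cold_c / n_c, cold_p / n_p
diff = p_c - p_p

# 95% confidence interval using the unpooled standard error
z_975 = NormalDist().inv_cdf(0.975)
se_ci = sqrt(p_c * (1 - p_c) / n_c + p_p * (1 - p_p) / n_p)
ci = (diff - z_975 * se_ci, diff + z_975 * se_ci)

# One-sided test of H0: pi_c = pi_p against H1: pi_c < pi_p
pooled = (cold_c + cold_p) / (n_c + n_p)
se_test = sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_p))
z_stat = diff / se_test
critical = NormalDist().inv_cdf(0.05)   # lower-tail 5% critical value
reject_h0 = z_stat < critical

print([round(v, 4) for v in ci], round(z_stat, 3), reject_h0)
```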
4. (a) A mental health study focused on 300 patients visiting three community mental
health centres. The patients were classified into three groups according to the
primary issue for which they were seen. The data are shown below.
                        Type of Problem
            Social Adjustment   Stress Related   Other   Total
Centre 1    45                  28               27      100
Centre 2    28                  44               28      100
Centre 3    46                  29               25      100
Total       119                 101              80      300
i. Based on the data in the table, and without conducting a significance test,
describe the differences in terms of the primary issue for which the patients
were seen across the different centres.
ii. Calculate the $\chi^2$ statistic and use it to test for independence, using a 5%
significance level. What do you conclude?
(13 marks)
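For part (a) ii., the expected counts under independence are (row total × column total) / grand total, and the statistic sums (O − E)² / E over all cells. A sketch of that computation, with illustrative variable names, is:

```python
# Observed counts: rows are Centres 1 to 3,
# columns are Social Adjustment, Stress Related, Other
observed = [
    [45, 28, 27],
    [28, 44, 28],
    [46, 29, 25],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected counts under independence of centre and problem type
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

print(round(chi2, 3), df)
```

The statistic is then compared with the 5% critical value of the chi-squared distribution on `df` degrees of freedom from the statistical tables.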
(b)
i. You have been asked to design a nationwide survey in your country to
find out about internet use by children less than 10 years old. Provide
a probability sampling scheme and a sampling frame that you would like
to use. Identify a potential source of selection bias that may occur and
discuss how this issue could be addressed.
ii. Describe what a longitudinal survey is. State two ways in which panel
surveys differ from longitudinal surveys.
(12 marks)
END OF PAPER
ST104a Statistics 1
Examination Formula Sheet
Expected value of a discrete random variable:
\[ \mu = E(X) = \sum_{i=1}^{N} p_i x_i \]

Standard deviation of a discrete random variable:
\[ \sigma = \sqrt{\sigma^2} = \sqrt{\sum_{i=1}^{N} p_i (x_i - \mu)^2} \]

The transformation formula:
\[ Z = \frac{X - \mu}{\sigma} \]

Finding Z for the sampling distribution of the sample mean:
\[ Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \]

Finding Z for the sampling distribution of the sample proportion:
\[ Z = \frac{P - \pi}{\sqrt{\pi (1 - \pi) / n}} \]

Confidence interval endpoints for a single mean ($\sigma$ known):
\[ \bar{x} \pm z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} \]

Confidence interval endpoints for a single mean ($\sigma$ unknown):
\[ \bar{x} \pm t_{\alpha/2,\, n-1} \times \frac{s}{\sqrt{n}} \]

Confidence interval endpoints for a single proportion:
\[ p \pm z_{\alpha/2} \times \sqrt{\frac{p (1 - p)}{n}} \]

Sample size determination for a mean:
\[ n \geq \frac{(z_{\alpha/2})^2 \sigma^2}{e^2} \]

Sample size determination for a proportion:
\[ n \geq \frac{(z_{\alpha/2})^2 \, p (1 - p)}{e^2} \]

z test of hypothesis for a single mean ($\sigma$ known):
\[ Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} \]

t test of hypothesis for a single mean ($\sigma$ unknown):
\[ T = \frac{\bar{X} - \mu_0}{S / \sqrt{n}} \]

z test of hypothesis for a single proportion:
\[ Z \cong \frac{P - \pi_0}{\sqrt{\pi_0 (1 - \pi_0) / n}} \]

z test for the difference between two means (variances known):
\[ Z = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2}} \]

t test for the difference between two means (variances unknown):
\[ T = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{S_p^2 (1/n_1 + 1/n_2)}} \]

Confidence interval endpoints for the difference between two means:
\[ \bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2,\, n_1 + n_2 - 2} \times \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} \]

Pooled variance estimator:
\[ S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2} \]

t test for the difference in means in paired samples:
\[ T = \frac{\bar{X}_d - \mu_d}{S_d / \sqrt{n}} \]

Confidence interval endpoints for the difference in means in paired samples:
\[ \bar{x}_d \pm t_{\alpha/2,\, n-1} \times \frac{s_d}{\sqrt{n}} \]

z test for the difference between two proportions:
\[ Z = \frac{P_1 - P_2 - (\pi_1 - \pi_2)}{\sqrt{P (1 - P)(1/n_1 + 1/n_2)}} \]

Confidence interval endpoints for the difference between two proportions:
\[ p_1 - p_2 \pm z_{\alpha/2} \times \sqrt{\frac{p_1 (1 - p_1)}{n_1} + \frac{p_2 (1 - p_2)}{n_2}} \]

Pooled proportion estimator:
\[ P = \frac{R_1 + R_2}{n_1 + n_2} \]

$\chi^2$ statistic for test of association:
\[ \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \]

Sample correlation coefficient:
\[ r = \frac{\sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}}{\sqrt{\left( \sum_{i=1}^{n} x_i^2 - n \bar{x}^2 \right) \left( \sum_{i=1}^{n} y_i^2 - n \bar{y}^2 \right)}} \]

Spearman rank correlation:
\[ r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n (n^2 - 1)} \]

Simple linear regression line estimates:
\[ b = \frac{\sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}}{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2} \]
\[ a = \bar{y} - b \bar{x} \]