Uploaded by Tina Nguyen

STAM4000 Lecture Week 5 T3 2021 in class solution and note

STAM4000
Quantitative Methods
Week 5
Sampling distributions
Kaplan Business School (KBS), Australia
COMMONWEALTH OF AUSTRALIA
Copyright Regulations 1969
WARNING
This material has been reproduced and communicated to you by or on behalf of Kaplan
Business School pursuant to Part VB of the Copyright Act 1968 (the Act).
The material in this communication may be subject to copyright under the Act. Any further
reproduction or communication of this material by you may be the subject of copyright
protection under the Act.
Do not remove this notice.
Week 5: Sampling distributions
Learning Outcomes
#1 Recognise when to use a census or a sample
#2 Examine sampling distributions of the sample mean, X̄, read as “X bar”
#3 Examine sampling distributions of the sample proportion, P̂, read as “P hat” or “P cap”
Why does this matter?
We need sampling distributions for statistical inference: using one sample to infer or draw conclusions about the entire population.
#1
Recognise when to use a census or a sample
A census is the process of collecting information on items or individuals in the entire population.
Sampling is the process of collecting information on only a subset of the population.
Advantages of sampling versus a census:
• Can save time, money and resources.
• Does not destroy all of the product, as a census might in a research or quality control process that tests items destructively.
• Is practical: accessing the entire population is often impossible.
• In real life, calculating population parameters (measuring the whole population) is often prohibitive because populations can be very large.
Which is best: a census or sampling?
See what the Bureau of Statistics has to say about census and samples:
http://www.abs.gov.au/websitedbs/a3121120.nsf/home/statistical+language++census+and+sample
#1 Parameters and statistics
Descriptive measures calculated from
a population of data are called
population parameters.
Descriptive measures calculated from
a sample of data are called
sample statistics.
In the simplest of terms, statistical inference is when we use a sample
statistic to infer or draw conclusions about a population parameter.
Recall, there are two branches of statistics:
i) Descriptive statistics: analysis done on a sample of data to describe just that sample.
ii) Inferential statistics: analysis done on a sample of data to infer or draw conclusions about the entire population.
A number of factors determine which statistical technique should be used, but two of
these are especially important:
• Data type: The type of data being measured
• Problem objective: The purpose of the statistical inference
In many applications of statistical inference, we draw conclusions about a population
parameter by using a sample statistic.
#1 Sampling distributions
So far, we have seen a continuous random variable, X, that has N possible values, x1, x2, …, xN.
The population parameter is usually unknown and fixed.
We can take a sample of size n, with values x1, x2, …, xn, and use this sample to calculate a statistic.
If we picture taking all possible samples of the same size n from the population and calculating the same statistic for each sample, we create a sampling distribution of the statistic.
The sampling distribution of the statistic is the tool that tells us how close the statistic
is to the parameter. See video at https://www.youtube.com/watch?v=olK80ngCbXc
A sampling distribution tells us which outcomes we should expect for some sample
statistic.
Don’t confuse the distribution of a sample with the sampling distribution:
• with a sample from a population, you can find descriptive statistics to summarize
just that sample.
• with a sampling distribution, we are thinking of all the possible values that a statistic can take, based on all possible samples of the same size n. The sampling distribution is used to describe how the statistic varies.
#2 Examine sampling distributions of the sample mean, X̄
In Week 4, we learned about Z, the standard normal variable,
Z ~ NORMAL(0, 1)
We also learned about a normally distributed random variable, X,
where
X ~ NORMAL(μ, σ)
μ = population mean
σ = population standard deviation
The sampling distribution of the sample mean, X̄, is the collection of all possible sample means, x̄1, x̄2, …, for random samples taken from a population, each based on the same sample size, n.
The sampling distribution of the sample mean is the tool that tells us how close a sample mean, x̄, is to the population mean, μ. See video at https://www.youtube.com/watch?v=olK80ngCbXc
Two ways of creating a sampling distribution of the mean, X̄:
1. Draw samples from the population, calculate the mean, x̄, for each, then represent (graph or table) the distribution.
2. Use the laws of probability and expected value (long run average) to derive the distribution.
Week 4:
With X, values were x1, x2, …, xN.
Now, in Week 5, we have X̄; values here are now x̄1, x̄2, … etc.
This is so we can have a normally distributed random variable and we can
use the corresponding statistical tables that require normality, e.g.: to use Z
tables.
#2 Properties of the sampling distribution of the mean, X̄
Recall, if we have a continuous random variable, X, then the
• population mean of X is µ
• population standard deviation of X is σ
NOTE: stdev = standard deviation
Now, we have the sampling distribution of the sample mean, X̄:
• population mean of X̄ is \( \mu_{\bar{X}} = \mu \)
• population standard error of X̄ is \( \sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} \); this is smaller than the stdev of X
Note: the population standard deviation of the sampling distribution of the sample mean is usually called the “standard error of the mean”.
How about the SHAPE of X̄? Let's do an illustration first.
Properties of the sampling distribution of the mean
The population mean of X̄ is \( \mu_{\bar{X}} = \mu \).
The population mean of all the sample means is the population mean, μ, of X.
The population standard deviation (or population standard error) of X̄ is \( \sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} \).
Note that the standard deviation of X̄ decreases as the sample size increases. Why? As we increase n, each sample captures more of the characteristics of the population. This increases the information we have, which decreases variation/dispersion/deviation.
Note: the population standard deviation of the mean is called the “standard error of the mean”. This is not an error in the sense of a mistake; it is a standard error in the sense of standard variability.
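The effect of the sample size on the standard error can be sketched in a few lines of Python. This is an illustration only: the helper name `standard_error` and the value σ = 10 are assumptions chosen for the example, not values from the lecture.

```python
import math


def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    if sigma < 0 or n < 1:
        raise ValueError("sigma must be non-negative and n must be at least 1")
    return sigma / math.sqrt(n)


sigma = 10.0  # hypothetical population standard deviation
for n in (1, 4, 25, 100):
    # The standard error shrinks as the sample size grows.
    print(f"n = {n:3d}  standard error = {standard_error(sigma, n):.2f}")
```

Because n sits under a square root, the sample size has to be multiplied by four to halve the standard error.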
#2 Illustration - developing a sampling distribution
Let X be a random variable of ages of individuals (years).
Say that a population of size N = 4 (N = population size, in this context) has the following values: 18, 20, 22, 24 in years.
What does X look like? X is UNIFORM: each value occurs only once, so each has a frequency of one and a relative frequency of 1/4 = 0.25 or 25%.
[Histogram of X: X, Age (years), on the horizontal axis; P(X) = 0.25 for each of 18, 20, 22 and 24]
A uniform distribution is a distribution where each value in the data set has the same frequency. The distribution is considered as having no mode.
As we have a population of data, with N = population size (formulae are from Week 2):
• population mean, \( \mu = \dfrac{\sum x}{N} = \dfrac{18+20+22+24}{4} = 21 \) years
• population standard deviation, \( \sigma = \sqrt{\dfrac{\sum (x-\mu)^2}{N}} = \sqrt{\dfrac{(18-21)^2+(20-21)^2+(22-21)^2+(24-21)^2}{4}} = \sqrt{\dfrac{20}{4}} = \sqrt{5} = 2.236 \) years
#2 Illustration continued …
Say that a population of size N = 4 has the following values: 18, 20, 22, 24 in years.
• Now, consider all possible samples of size n = 2 (n = sample size).
• If sampling with replacement, there are 16 possible samples, EACH OF SIZE n = 2.
• With our 16 samples, we can find the mean, x̄, for each sample; we now have 16 x̄'s.
Sample means, x̄, for n = 2 (rows: 1st observation; columns: 2nd observation)
1st \ 2nd    18    20    22    24
18           18    19    20    21
20           19    20    21    22
22           20    21    22    23
24           21    22    23    24
Copyright © 2013 Pearson Australia (a division of Pearson Australia Group Pty Ltd) – 9781442549272/Berenson/Business
Statistics /2e
Note:
• Sampling with replacement occurs when we sample (select) an individual/item from the population, note its value, then return it to the population before selecting the next item/individual.
• Sampling without replacement is when we select an item/individual from the population and do not return it to the population before selecting the next item/individual. When sampling without replacement, we must be careful that the population size is large enough that sampling will not distort the proportions of characteristics/values in the population. See more on this later.
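The 16 samples of size n = 2 can also be listed by machine. The sketch below assumes ordered sampling with replacement, as in the illustration; the variable names are chosen for this example only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [18, 20, 22, 24]  # ages in years, from the illustration
n = 2                          # sample size

# All ordered samples of size n, drawn with replacement: 4 * 4 = 16 samples.
samples = list(product(population, repeat=n))

# One sample mean per sample, kept as exact fractions.
sample_means = [Fraction(sum(sample), n) for sample in samples]

# Relative frequency of each distinct sample mean.
counts = Counter(sample_means)
distribution = {
    mean: Fraction(count, len(samples)) for mean, count in sorted(counts.items())
}

for mean, probability in distribution.items():
    print(f"x-bar = {float(mean):4.0f}   P = {float(probability):.4f}")
```

Using `Fraction` keeps the relative frequencies exact, so they sum to exactly 1.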
“sampling with replacement”, refers to the process of taking the first
sample from the population, recording the values, then replacing the
values, before taking the second sample from the population. This
process is then repeated multiple times.
#2 Illustration continued …
Sample Means Distribution, X̄
Sample mean   Frequency   P(X̄)
18            1           1/16 = 0.0625
19            2           0.125
20            3           0.1875
21            4           0.25
22            3           0.1875
23            2           0.125
24            1           0.0625
Total         16          1
A probability is like a long run relative frequency: frequency / TOTAL frequency.
[Histogram of sample mean ages: Sample mean (years) on the horizontal axis, P(X̄) on the vertical axis]
X̄ is unimodal and symmetric, roughly bell-shaped like a normal distribution.
Previously, we noted that X had a uniform distribution.
Recall, to have a valid probability distribution:
• 0 ≤ P(X) ≤ 1
• ∑P(X) = 1
#2 Illustration continued - comparing summary measures
Population distribution, X:
\( \mu = \dfrac{18+20+22+24}{4} = 21 \) years
\( \sigma = \sqrt{\dfrac{(18-21)^2 + \ldots + (24-21)^2}{4}} = 2.236 \) years
Sampling distribution of the mean, X̄:
\( \mu_{\bar{X}} = \dfrac{\sum \bar{x}}{16} = \dfrac{18 + 19 + 19 + \ldots + 24}{16} = 21 \) years
\( \sigma_{\bar{X}} = \sqrt{\dfrac{\sum (\bar{x} - \mu_{\bar{X}})^2}{16}} = \sqrt{\dfrac{(18-21)^2 + (19-21)^2 + (19-21)^2 + \ldots + (24-21)^2}{16}} = 1.581 \) years
[Histogram of ages, X, beside histogram of sample mean ages, X̄; P(X) on the vertical axes]
Alternative calculation for the population standard error of the sampling distribution of the sample mean:
For our population, N = 4
For our samples, n = 2
\( \sigma_{\bar{X}} = \dfrac{\text{population standard deviation}}{\sqrt{n}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{\sqrt{\sum (x-\mu)^2 / N}}{\sqrt{n}} = \dfrac{\sqrt{\left[(18-21)^2 + (20-21)^2 + (22-21)^2 + (24-21)^2\right]/4}}{\sqrt{2}} = \dfrac{2.236}{1.414} = 1.581 \) years
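As a check on the two calculations above, the following sketch computes the standard error both ways: directly from the 16 sample means, and from σ/√n. It uses only the Python standard library; the variable names are chosen for this example.

```python
import math
from itertools import product
from statistics import mean, pstdev

population = [18, 20, 22, 24]  # ages in years, from the illustration
n = 2

# Mean of every ordered sample of size n drawn with replacement.
sample_means = [mean(sample) for sample in product(population, repeat=n)]

mu = mean(population)               # population mean of X
sigma = pstdev(population)          # population standard deviation of X
mu_xbar = mean(sample_means)        # mean of the sampling distribution
sigma_xbar = pstdev(sample_means)   # standard error, computed directly

print(f"mu = {mu}, sigma = {sigma:.3f}")
print(f"mu_xbar = {mu_xbar}, sigma_xbar = {sigma_xbar:.3f}")
print(f"sigma / sqrt(n) = {sigma / math.sqrt(n):.3f}")
```

`pstdev` divides by the number of values, matching the population formulae used in the illustration.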
#2 Central Limit Theorem
The fact that the histogram of sample means on the previous slides appears to be bell-shaped (Normal) is a consequence of the Central Limit Theorem.
Hence, for sufficiently large sample sizes (approximately n ≥ 30), the sampling distribution of the sample mean is approximately normal, even if the population distribution, X, is non-normal.
[Figure: Distribution of X (uniform distribution over the values 1 to 6) beside Distribution of X̄ (approximately normal distribution)]
More on the shape of X̄:
1. Sampling from Normal populations: If X ~ Normal, then X̄ ~ Normal. This is NOT the Central Limit Theorem.
2. Sampling from Non-Normal populations: If X is NOT Normal BUT n ≥ 30, then, by the CENTRAL LIMIT THEOREM, X̄ is approximately Normal.
#2 The Central Limit Theorem continued …
The Central Limit Theorem: when n ≥ 30, X̄ approaches normality as the sample size, n, increases, regardless of the shape of the population distribution, X.
[Figure: four population distributions of X (Normal, Uniform, Skewed, Bimodal), each shown with the sampling distribution of X̄ for n = 2 and n = 30]
As n increases, the X̄ distribution approaches normality. Usually at n = 30, X̄ ~ Normal.
The Central Limit Theorem
The sampling distribution of any mean approaches a normal distribution as the sample size, n, grows. This is true regardless of the shape of the population distribution, X.
However, if the population distribution is very skewed, it may take a sample size of dozens or even hundreds of observations for the Normal model to work well.
The Central Limit Theorem applies to the sampling distribution of the sample mean, X̄, as well as the sampling distribution of the sample proportion, P̂.
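One way to watch the Central Limit Theorem at work is to track how the skewness of X̄ shrinks towards 0 (the skewness of a normal distribution) as n grows. The sketch below is an illustration under stated assumptions: the right-skewed population is hypothetical, sampling is with replacement, and skewness is only one rough indicator of non-normality.

```python
from math import sqrt


def mean_distribution(values, n):
    """Exact distribution of the mean of n draws (with replacement) from values."""
    sums = {0: 1.0}
    p = 1.0 / len(values)  # each listed value is equally likely
    for _ in range(n):
        next_sums = {}
        for total, prob in sums.items():
            for value in values:
                next_sums[total + value] = next_sums.get(total + value, 0.0) + prob * p
        sums = next_sums
    return {total / n: prob for total, prob in sums.items()}


def skewness(dist):
    """Skewness of a discrete distribution given as {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    variance = sum((x - mu) ** 2 * p for x, p in dist.items())
    third_moment = sum((x - mu) ** 3 * p for x, p in dist.items())
    return third_moment / variance ** 1.5


population = [1, 2, 2, 3, 3, 3, 4, 12]  # hypothetical right-skewed population
for n in (1, 2, 30):
    print(f"n = {n:2d}  skewness of X-bar = {skewness(mean_distribution(population, n)):.3f}")
```

For independent draws the skewness of X̄ equals the population skewness divided by √n, which is why a very skewed population may need more than 30 observations before the Normal model works well.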
As n increases, our value of the population standard error (or standard deviation) of X̄, \( \sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} \), gets smaller and smaller.
As n increases, we capture more population characteristics INTO our sample, so our sample has LESS sampling error (variability).
#2 Conditions to check before using X̄
Random Sample Condition:
• The data values must be sampled randomly.
• This is to try and ensure that the samples are independent and representative of the population. If random, they should be INDEPENDENT, and hopefully are REPRESENTATIVE of the population.
10% Condition:
• If sampling without replacement, the sample size, n, should be no more than 10% of the population size.
• We can also satisfy this condition by checking whether the population size is at least n × 10.
• This is to ensure that if we are sampling without returning items/individuals to the population before sampling the next item/individual, the proportions of characteristics in the population are not DISTORTED by the non-replacement and we can assume independence.
Normal or Large Enough Sample Condition:
• If the population is normal, then a small n is ok.
• If the population is non-normal or unknown, we need n ≥ 30 to apply the Central Limit Theorem.
o Note: if the population is very skewed, we may need a sample size greater than 30 for our sampling distribution to be approximately normal by the Central Limit Theorem.
If any of the conditions are not satisfied, this should be noted, with a statement that we proceed with caution.
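The three conditions above can be written as a small checklist function. This is a sketch only: the function name, its parameters and the dictionary keys are hypothetical, and an unknown population size is reported as `None` because the 10% condition then has to be assumed rather than verified.

```python
def check_mean_conditions(n, is_random, population_is_normal, population_size=None):
    """Report which conditions for using X-bar are met.

    population_size=None means the size is unknown, so the 10% condition
    cannot be verified and is reported as None (it must be assumed).
    """
    ten_percent = None if population_size is None else 10 * n <= population_size
    return {
        "random_sample": bool(is_random),
        "ten_percent": ten_percent,
        "normal_or_large_enough": bool(population_is_normal) or n >= 30,
    }


# A random sample of 25 from a normal population of unknown size.
print(check_mean_conditions(n=25, is_random=True, population_is_normal=True))

# A random sample of 20 from a non-normal population of 150: two conditions fail.
print(check_mean_conditions(n=20, is_random=True, population_is_normal=False,
                            population_size=150))
```

The n ≥ 30 cut-off is the lecture's rule of thumb; the function does not try to detect a very skewed population, which may need a larger sample.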
10% condition:
“sampling without replacement”: the process of taking the 1st sample from the population and NOT returning the sampled values before taking a 2nd sample from the population, then repeating the process many times.
#2 Example
Suppose that the population of coffee drinkers in a city spend an average of μ = $50 per month on coffee, with a standard deviation of σ = $6. Assume that the amount spent on coffee, X, is normally distributed. If a random sample of n = 25 coffee drinkers is chosen from this population, answer the following:
a) Check the conditions for X̄.
• Random Sample Condition: We are told the sample was randomly chosen.
• 10% Condition: If sampling without replacement, 25 coffee drinkers must be no more than 10% of the population of coffee drinkers. Or, we could say, the population size of coffee drinkers must be at least 25 × 10 = 250. As we have a city, we can assume this to be true.
• Normal or Large Enough Sample Condition: we are told to assume the amount spent on coffee, X, is normally distributed, which tells us that X̄ ~ N. (Note: this is NOT the Central Limit Theorem.)
As the conditions are satisfied, we can use the Z tables.
b) What is the mean of the sampling distribution of the mean, X̄?
\( \mu_{\bar{X}} = \mu = \$50 \)
c) What is the standard error of the sampling distribution of the mean, X̄?
\( \sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{6}{\sqrt{25}} = \$1.20 \)
#2 Probabilities and X̄
We can take samples and ask probability type questions about the mean, X̄, in the same way we solved problems related to a single random variable, X.
The new Z formula is
\( Z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}} \)
or we can write this as
\( Z = \dfrac{\bar{x} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} \)
Such that,
x̄ = sample mean
μ = population mean (since \( \mu_{\bar{X}} = \mu \))
σ = population standard deviation of X
n = sample size
\( \sigma_{\bar{X}} \) = population standard error (standard deviation) of X̄
Note: When the standard deviation of the sampling distribution of a statistic, in this case x̄, is estimated from data, the corresponding statistic is called a standard error (SE).
General structure of a
• Z value,
• Z score,
• standardized score:
\( Z = \dfrac{\text{value of variable} - \text{centre}}{\text{spread}} \)
Last week, in Week 4, we transformed an X value to a Z value using the formula:
\( Z = \dfrac{x - \mu}{\sigma} \)
Now, in Week 5, we transform an X̄ into a Z using:
\( Z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}} \)
#2 4 Steps to find probabilities about X̄
Step 1: Check relevant conditions, unless otherwise stated
Step 2: Convert X̄ to Z, using \( Z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}} \)
Step 3: Sketch a curve(s) for the area (probability)
Step 4: Find the area using Z tables
#2 Example
Suppose that the population of coffee drinkers spend on average μ = $50 per month on coffee with a standard deviation of σ = $6. Assume that the amount spent on coffee is normally distributed and the conditions are satisfied.
a) Suppose a random sample of n = 25 coffee drinkers is chosen from this population. What is the probability that the average monthly spend, X̄, on coffee is less than $47?
b) Suppose a random sample of 25 coffee drinkers is chosen from this population. What is the probability that the average monthly spend, X̄, on coffee is between $50 and $53?
#2 Example solution
a) Want P(X̄ < 47)
How to type this for ONLINE ASSESSMENT: P(X bar < 47)
Step 1: Check conditions. Told conditions are satisfied.
Step 2: Find the z value
\( z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \dfrac{47 - 50}{6 / \sqrt{25}} = \dfrac{-3}{1.2} = -2.50 \)
Step 3: Sketch a curve(s)
[Sketch: X̄ curve centred at μ = 50 with the area to the left of 47 shaded; Z curve centred at μ = 0 with the area to the left of −2.50 shaded, area = 0.0062]
Step 4: Use the Z tables
P(X̄ < 47)
= P(Z < −2.50)
= 0.0062
a) For x̄ = 47, Z = −2.5, so P(Z < −2.5) = 0.0062
b) For x̄ = 50, Z = 0, and for x̄ = 53, Z = 2.5, so P(50 < X̄ < 53) = P(0 < Z < 2.50) = P(Z < 2.50) – P(Z < 0) = 0.9938 – 0.5 = 0.4938
If asked to “SHOW WORKINGS” with ONLINE ASSESSMENT:
• Substitute into the formula
• Give the Z value
• Give probability – be careful of LAYOUT
a) P(X bar < 47)
Z = (47 – 50)/(6/sqrt(25))
Z = -2.50
P(Z < -2.50) = 0.0062
#2 Example solution continued
b) Want P(50 < X̄ < 53). Step 1: Check conditions. Told satisfied.
Step 2: Find the z values
\( z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \dfrac{50 - 50}{6 / \sqrt{25}} = 0 \)
\( z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \dfrac{53 - 50}{6 / \sqrt{25}} = 2.5 \)
Step 3: Sketch a curve
[Sketch: X̄ curve with the area between 50 and 53 shaded (WANT); equivalently the area between Z = 0 and Z = 2.5, found as 0.9938 − 0.5 = 0.4938]
Step 4: Use the Z tables
P(50 < X̄ < 53) = P(0 < Z < 2.5)
= P(Z < 2.5) − P(Z < 0)
= 0.9938 − 0.5
= 0.4938
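The table look-ups in this example can be cross-checked in software. The sketch below uses `statistics.NormalDist` from the Python standard library in place of the printed Z tables; the variable names are chosen for this example, and answers can differ from the tables in the last decimal place because the tables round Z to two decimals.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 50, 6, 25          # coffee example: dollars per month
standard_error = sigma / sqrt(n)  # 6 / 5 = 1.2
z = NormalDist()                  # standard normal, mean 0 and stdev 1

# a) P(X-bar < 47)
z_a = (47 - mu) / standard_error
p_a = z.cdf(z_a)

# b) P(50 < X-bar < 53) = P(Z < z_hi) - P(Z < z_lo)
z_lo = (50 - mu) / standard_error
z_hi = (53 - mu) / standard_error
p_b = z.cdf(z_hi) - z.cdf(z_lo)

print(f"a) z = {z_a:.2f}, P = {p_a:.4f}")
print(f"b) z from {z_lo:.2f} to {z_hi:.2f}, P = {p_b:.4f}")
```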
#2 Exercise
A petrol station is open, on average, μ = 100 hours per week with a standard deviation of σ = 12 hours per week. The opening hours are not normally distributed. A random sample of n = 36 petrol stations is taken.
a) Check conditions.
b) What is the probability that the mean of this sample is less than 105 hours?
c) What is the probability that the mean of this sample is above 102.2 hours per week?
#2 Exercise solution
a) Check conditions for X̄.
• Random Sample Condition: We are told the sample was randomly taken.
• 10% Condition: If sampling without replacement, we need 36 petrol stations to be no more than 10% of all petrol stations. Or, we can say that we need to have at least 360 petrol stations in the population. Assume this is true.
• Normal or Large Enough Sample Condition: we are told the opening hours are not normally distributed. However, as n = 36 > 30, by the Central Limit Theorem, X̄ ~ N (approximately).
As the conditions are satisfied, we can use Z tables to find probabilities.
#2 Exercise solution
b) Want P(X̄ < 105)
Conditions checked earlier.
\( z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \dfrac{105 - 100}{12 / \sqrt{36}} = 2.5 \)
[Sketch: X̄ curve centred at 100 with the area to the left of 105 shaded; equivalently the area to the left of Z = 2.5, area = 0.9938]
P(X̄ < 105) = P(Z < 2.5)
= 0.9938
#2 Exercise solution continued
c) Want P(X̄ > 102.2)
Conditions checked earlier.
\( z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \dfrac{102.2 - 100}{12 / \sqrt{36}} = 1.10 \)
[Sketch: X̄ curve centred at 100; total area is 100% or 1; the area to the left of 102.2 (Z = 1.1) is 0.8643, which we do NOT want; the right-tail area we WANT is 0.1357]
P(X̄ > 102.2) = P(Z > 1.10)
= 1 − P(Z < 1.10)
= 1 − 0.8643
= 0.1357
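Parts b) and c) of the exercise can be checked the same way. As before, `statistics.NormalDist` stands in for the Z tables and the variable names are chosen for this sketch; the normal model is only approximate here, since it rests on the Central Limit Theorem.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 100, 12, 36        # petrol station exercise: hours per week
standard_error = sigma / sqrt(n)  # 12 / 6 = 2 hours
z = NormalDist()

# b) P(X-bar < 105): area to the left of z = 2.5
p_less_than_105 = z.cdf((105 - mu) / standard_error)

# c) P(X-bar > 102.2): right tail, so subtract the left area from 1
p_above_102_2 = 1 - z.cdf((102.2 - mu) / standard_error)

print(f"b) P(X-bar < 105)   = {p_less_than_105:.4f}")
print(f"c) P(X-bar > 102.2) = {p_above_102_2:.4f}")
```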
#2 Reverse normal problems with X̄
Reverse normal problems with X̄ are like those with X.
Rearrange the correct Z formula, making the unknown variable the subject of the equation:
\( z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}} \)
E.g., if solving for x̄, rearrange this Z formula to get
\( \bar{x} = \mu + z \dfrac{\sigma}{\sqrt{n}} \)
Remember the units for X̄ are the same as the units for X.
#2 Example
Suppose that a population of coffee drinkers spend on average μ = $50 per month on coffee with a standard deviation of σ = $6. Assume that the amount spent on coffee is normally distributed. If a random sample of n = 25 coffee drinkers is chosen from this population, what is the average monthly dollar value, x̄, below which the lowest 10% of sample means fall?
The conditions were checked earlier.
[Sketch: X̄ curve centred at 50 with a left-tail area of 10% = 0.10 ≈ 0.1003 below x̄; equivalently the area to the left of Z = −1.28]
\( \bar{x} = \mu + Z \dfrac{\sigma}{\sqrt{n}} \)
\( \bar{x} = 50 - 1.28 \times \dfrac{6}{\sqrt{25}} \)
\( \bar{x} = 50 - 1.28(1.2) \)
\( \bar{x} = \$48.46 \)
Using the Z tables “backwards”, we try to find the probability of 0.10 in the body of the first set of Z tables - those with the left tail shaded, as we are interested in the left tail of the curve.
We find the closest value to 0.10 in the table is 0.1003.
From the probability of 0.1003, we work to the border of the Z table and find the corresponding Z value is −1.28.
Steps for Reverse Normal:
i) Sketch a curve and place the known values.
ii) Use the Z tables in REVERSE (inside out): look for a probability in the BODY of the Z tables, then work to the border for the Z value (respecting the sign, + or −).
iii) Substitute into the relevant formula.
iv) Solve for x̄.
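Steps ii) to iv) can be sketched in code, with `NormalDist().inv_cdf` playing the role of reading the Z tables in reverse. The function name `reverse_normal_mean` is hypothetical, and the software Z value (about −1.2816) is slightly more precise than the table value of −1.28, although both give the same answer to the nearest cent here.

```python
from math import sqrt
from statistics import NormalDist


def reverse_normal_mean(left_area, mu, sigma, n):
    """Sample mean x-bar that has the given area to its LEFT under the X-bar curve."""
    z_value = NormalDist().inv_cdf(left_area)  # Z tables used "backwards"
    return mu + z_value * sigma / sqrt(n)      # x-bar = mu + z * sigma / sqrt(n)


# Coffee example: lowest 10% of sample means, mu = $50, sigma = $6, n = 25.
x_bar = reverse_normal_mean(0.10, mu=50, sigma=6, n=25)
print(f"x-bar = ${x_bar:.2f}")
```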
#2 Exercise
A petrol station is open, on average, μ = 100 hours per week with a standard deviation of σ = 12 hours per week. The opening hours are not normally distributed. A random sample of n = 36 petrol stations is taken. Assume the conditions are satisfied.
What is the minimum average number of hours a petrol station is open for the 0.6% of longest opening hours?
#2 Exercise solution
What is the minimum average number of hours a petrol station is open for the 0.6% of longest opening hours?
Want the x̄ that has an area of 0.6% = 0.0060 in the RIGHT TAIL. “Longest opening hours” are in the RIGHT TAIL.
[Sketch: X̄ curve centred at 100 with a right-tail area of 0.0060 (TOLD) above x̄; total probability under the curve = 1, so the area to the left is 1 − 0.0060 = 0.9940; equivalently Z = 2.51]
\( Z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}} \)
\( 2.51 = \dfrac{\bar{x} - 100}{12 / \sqrt{36}} \)
\( \bar{x} = 100 + 2.51(2) = 105.02 \) hours
Here, we have 0.006 in the right tail of the Z curve.
This is equivalent to an area of 1 – 0.006 = 0.994 in the left side of the Z curve.
Using the Z tables “backwards”, we try to find the probability of 0.9940 in the body of the second set of Z tables - those with the large left shaded area, as we are interested in the area to the left of a positive Z value.
We find the exact probability of 0.9940 in the body of the Z table.
From the probability of 0.9940, we work to the border of the Z table and find the corresponding Z value is 2.51.
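The same reverse look-up can be checked for this right-tail case. The sketch converts the right-tail area to a left area first, because `NormalDist().inv_cdf` (used here in place of the Z tables) expects the area to the left; variable names are chosen for this example.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 100, 12, 36   # petrol station exercise: hours per week
right_tail_area = 0.006      # the 0.6% of longest average opening hours

left_area = 1 - right_tail_area             # 0.994 to the left of x-bar
z_value = NormalDist().inv_cdf(left_area)   # close to the table value of 2.51
x_bar = mu + z_value * sigma / sqrt(n)

print(f"z = {z_value:.2f}, x-bar = {x_bar:.2f} hours")
```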
#3 Examine sampling distributions of the sample proportion, P̂
For a categorical variable, we can find the proportion of times a specific characteristic of interest occurs.
If the objective is to describe a single population for a categorical variable, the population parameter is the proportion, p, of times that a specific characteristic of interest (success) occurs.
Population:
p = population proportion of success (of interest)
q = population proportion of failure (not of interest), where q = 1 − p
Sample:
p̂ = sample proportion of success = p hat = p cap
q̂ = sample proportion of failure = q hat = q cap, where q̂ = 1 − p̂
Note: p is estimated by p̂, and
\( \hat{p} = \dfrac{x}{n} = \dfrac{\text{the count of successes in a sample}}{\text{sample size}} \)
We have a categorical variable which has different characteristics. Our data are counts.
We are interested in a specific characteristic, that we will label as “success”.
All other characteristics are labelled as “failures”.
We count all successes in the population.
The population parameter of interest is labelled as p, the proportion of success in the population.
The corresponding sample statistic that estimates p is p̂, which may be read as “p hat” or “p cap”.
#3 Sampling distribution of the proportion, P̂
Think about the true proportion, p, and the proportion we might expect to get in a random sample, p̂.
From sample to sample, p̂ will vary.
Imagine repeated, independent samples (each the same size, n), finding p̂ for each sample and building the distribution of the sample proportion, P̂.
[Figure: Sampling Distribution of the Proportion, with p̂ from 0 to 1 on the horizontal axis and P(p̂) on the vertical axis]
Note that the sampling distribution of the sample proportion is based on a categorical variable, whereas the sampling distribution of the sample mean is based on a quantitative variable.
Proportions can only lie between 0 and 1 (or 0% and 100%) and are unit free, whereas means can lie between −∞ and ∞, with the same units as the original population, X.
Note: Technically, here we are looking at the “sampling distribution of the sample proportion”, but sometimes, for ease, we refer to this as just the “sampling distribution of the proportion”.
#3 Properties of the sampling distribution of the proportion, P̂, and the corresponding Z formula
The sampling distribution of the sample proportion, P̂, for a categorical variable:
• population (true) mean of P̂ is \( \mu_{\hat{P}} = p \)
• population standard deviation of P̂ is \( \sigma_{\hat{P}} = \sqrt{\dfrac{pq}{n}} \) = standard error of P̂
• We standardize p̂ to a Z value with the following formula: \( Z = \dfrac{\hat{p} - p}{\sqrt{pq/n}} \)
This can be re-written as \( Z = \dfrac{\hat{p} - p}{\sqrt{p(1-p)/n}} \)
Note:
• In the denominator of the Z formula, we use p, not p̂. Why? Well, p is the population proportion and is usually fixed and reliable, whereas p̂ is the sample proportion and will vary from sample to sample.
• When the standard deviation of the sampling distribution of a statistic (in this class, either x̄ or p̂) is estimated from data, the corresponding statistic is called a standard error (SE).
• With proportions, we work with the decimal form, not the percentage.
• We have the proportion value on the horizontal axis of the P̂ curve.
• We have areas or probabilities under the P̂ curve.
• We usually work with at least three decimal places when calculating with proportions.
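The standard error and Z formula for a proportion can be sketched as below. The function names and the values p = 0.40, n = 100 and p̂ = 0.45 are hypothetical, chosen only to illustrate the arithmetic; they do not come from the lecture examples.

```python
from math import sqrt
from statistics import NormalDist


def proportion_standard_error(p, n):
    """Standard error of the sample proportion: sqrt(p * q / n), with q = 1 - p."""
    return sqrt(p * (1 - p) / n)


def proportion_z(p_hat, p, n):
    """Standardize a sample proportion p_hat; the denominator uses p, not p_hat."""
    return (p_hat - p) / proportion_standard_error(p, n)


p, n = 0.40, 100   # hypothetical population proportion and sample size
p_hat = 0.45       # hypothetical sample proportion

z_value = proportion_z(p_hat, p, n)
probability = NormalDist().cdf(z_value)  # P(P-hat < 0.45) under the Normal model

print(f"standard error = {proportion_standard_error(p, n):.4f}")
print(f"z = {z_value:.2f}, P(P-hat < {p_hat}) = {probability:.4f}")
```

Proportions are entered in decimal form, as the notes above recommend.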
#3 4 Steps to find probabilities about P̂
Step 1: Check relevant conditions, unless otherwise stated
Step 2: Convert P̂ to Z, using \( Z = \dfrac{\hat{p} - p}{\sqrt{pq/n}} \)
Step 3: Sketch a curve for the area (probability)
Step 4: Find the area using Z tables
#3 Conditions for the sampling distribution of the proportion
Randomization Condition: The sample is random and representative (without bias).
10% Condition: If sampling without replacement, the sample size, n, must be no larger than
10% of the population. This ensures that, when items/individuals are not returned to the
population before the next one is sampled, the proportions of characteristics in the
population are not materially affected by the non-replacement.
Success/Failure Condition: The sample size must be big enough so that both the number of
“successes,” np, and the number of “failures,” nq, are expected to be at least 10 (i.e.,
np ≥ 10 and nq ≥ 10). This is related to the Central Limit Theorem, in that the sample size
must be large enough to rely on the Normal distribution.
Note: Only when p and q are unknown should we use p̂ and q̂ in the Success/Failure Condition.
p = population proportion of “success” (of interest)
q = population proportion of “failure” = 1 − p
p̂ = sample proportion of “success” (of interest)
q̂ = sample proportion of “failure” = 1 − p̂
n = sample size
Note: with proportions, we work with the decimal form, not the percentage.
If any of the conditions are not satisfied, this should be noted and we should proceed with caution.
Success/failure condition:
Check by SUBSTITUTING relevant values from the question INTO the condition:
• np ≥ 10
• nq ≥ 10
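As a rough sketch, the three conditions could be checked like this. The function name, its parameters and the `None` convention for "not stated" are assumptions made for illustration; randomization cannot be verified from numbers alone, so it is simply passed through.

```python
def check_proportion_conditions(n, p, population_size=None, is_random=None):
    """Check the conditions for the sampling distribution of the proportion.

    A value of None means "not stated in the question": assume it holds
    and proceed with caution.
    """
    q = 1 - p
    ten_percent = None
    if population_size is not None:
        # n must be no larger than 10% of the population
        ten_percent = 10 * n <= population_size
    return {
        "randomization": is_random,
        "ten_percent": ten_percent,
        "success_failure": n * p >= 10 and n * q >= 10,
    }


# Values in decimal form: n = 1000, p = 0.513 gives np = 513 and nq = 487
print(check_proportion_conditions(1000, 0.513))
# A hypothetical small sample: np = 50(0.10) = 5, so the condition fails
print(check_proportion_conditions(50, 0.10))
```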
#3 Example
https://www.nbcnews.com/politics/meet-the-press/did-biden-win-little-or-lot-answer-yes-n1251845
The true proportion (population proportion) of U.S. voters who supported Joe
Biden in the 2020 U.S. presidential election was p = 51.3%. If a sample of n = 1000
U.S. voters were selected, what is the probability that more than 500 of those
sampled were in favour of Joe Biden? Comment on your solution.
With proportions, we may be given a percentage value or a fractional value out of 1.
In the Conditions AND in the Z formula, we do NOT use the percentage form.
If we are given a “percentage”, we must convert this to a fractional value out of 1 before
checking conditions and using the Z formula for proportions.
E.g. In this question, p = proportion of voters in the U.S.A. population who voted for Joe Biden.
Given p = 51.3% = 0.513, USE 0.513 in the conditions and Z formula.
With proportions, please use the entire window of your calculator. It is best NOT to round
numbers, otherwise the rounding errors get compounded.
Want P(P̂ > 500/1000)
#3 Example solution
p = population proportion in favour of Joe Biden = 0.513
q = population proportion NOT in favour of Joe Biden = 1 − p = 1 − 0.513 = 0.487
n = 1000
p̂ = 500/1000 = 0.5 = p hat
Step 1: Check conditions
Random sample condition: not told whether the sample was random. Assume random and proceed with caution.
10% condition: If sampling without replacement, we know that 1000 is far less than 10% of the population of U.S. voters.
Success/Failure Condition: np = 1000(0.513) = 513 > 10 and nq = 1000(0.487) = 487 > 10
Two of the three conditions are satisfied, so we proceed with caution and conclude P̂ is approximately normal.
We can use the Z tables to find probabilities.
Note:
• With proportions, we work with the decimal form, not the percentage.
• We usually work with at least three decimal places when calculating with proportions.
#3 Example solution
Step 2: Find the z value
Z = (p̂ − p)/sqrt(pq/n)
Z = (0.5 − 0.513)/sqrt[0.513(0.487)/1000]
Z = −0.822
Step 3: Sketch a curve(s)
[Sketch: P̂ curve centred at p = 0.513 with p̂ = 0.5 marked to its left, and the matching Z curve centred at 0 with Z = −0.82 marked. Total area = 100% = 1. Do NOT want the left area, 0.2061. Want the right area, 1 − 0.2061 = 0.7939.]
Step 4: Use the Z tables
P(P̂ > 0.50) = P(Z > −0.822)
= 1 − P(Z < −0.822)
= 1 − 0.2061
= 0.7939
Interpretation: There is a 79.39% chance, or 0.7939 probability, that more than half of the
U.S. voters sampled were in favour of Joe Biden in the 2020 U.S. presidential election.
Note:
P̂ is the sampling distribution of the proportion of U.S. voters who were in favour of Joe Biden
p = population proportion of U.S. voters in favour of Joe Biden
= 51.3%
= 0.513, the form we use in our calculation with proportions.
p̂ = sample proportion of U.S. voters in favour of Joe Biden
= 50%
= 0.50, the form we use in our calculation with proportions.
Note: with proportions, be careful.
• We have the proportion value on the horizontal axis of the P̂ curve.
• We have areas or probabilities under the P̂ curve.
• We usually work with at least three decimal places when calculating with proportions.
For Online Assessment:
Z = (0.5 – 0.513)/sqrt[(0.513*0.487)/1000]
Z = -0.822
P(Z > -0.82) = 1 – 0.2061 = 0.7939
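The same calculation can be cross-checked in Python. This is a sketch rather than part of the lecture method: it uses `statistics.NormalDist` from the standard library in place of the printed Z tables, and the variable names are illustrative. Rounding Z to two decimals mimics the table lookup; the unrounded Z gives a slightly different probability.

```python
from math import sqrt
from statistics import NormalDist

p = 0.513        # population proportion, decimal form
n = 1000         # sample size
p_hat = 500 / n  # sample proportion of interest

se = sqrt(p * (1 - p) / n)  # standard deviation of P-hat
z = (p_hat - p) / se        # Z = (p_hat - p)/sqrt(pq/n)

standard_normal = NormalDist()  # mean 0, standard deviation 1
prob_table_style = 1 - standard_normal.cdf(round(z, 2))  # Z rounded as for Z tables
prob_unrounded = 1 - standard_normal.cdf(z)              # full-precision Z

print(f"z = {z:.3f}")
print(f"P(P-hat > 0.5), Z rounded to 2 d.p. = {prob_table_style:.4f}")
print(f"P(P-hat > 0.5), unrounded Z = {prob_unrounded:.4f}")
```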
#3 Exercise
Based on past experience, a bank believes that p = 7% of customers who receive loans will
not make payments on time. This is termed “defaulting” on a loan. The bank has recently
approved n = 200 loans.
a) What are the mean and standard deviation of the proportion of customers in this group who
may default on their loan?
b) Check the conditions.
c) What is the probability that over 20 of the customers sampled will default on their loan?
Interpret.
WANT P(P hat > 20/200), that is, we want P(P hat > 0.10)
https://www.glasbergen.com/ngg_tag/real-estate-cartoon-comics/
Note:
P̂ is the sampling distribution of the sample proportion of bank customers who default on their loan.
p = population proportion of bank customers who default
= 7%
= 0.07, the form we use in our calculation with proportions.
p̂ = sample proportion of bank customers who default on their loan
= 20 customers defaulting from 200 customers in the sample
= 20/200
= 0.10, the form we use in our calculation with proportions.
This question is based on Sharpe et al. “Business Statistics” Pearson International Edition 2010,
Chapter 9, Exercises 35, page 252
#3 Exercise solution
“Success” in this question is the proportion of interest, those that default.
a) population mean of P̂ is μ_P̂ = p = 7% = 0.07
population standard deviation of P̂ is σ_P̂ = sqrt(pq/n) = sqrt[0.07(1 − 0.07)/200] = 0.018
b)
Random sample condition: not told that this is a random sample. We have to
assume that these new customers are a random sample from the same population
on which the default percentage is based.
10% condition: If sampling without replacement, this bank has to have approved
at least 2000 loans in the past. Assume true.
Success/failure condition: np = 200(0.07) = 14 > 10 and nq = 200(0.93) = 186 > 10.
As the first two conditions involve assumptions, we should proceed with caution.
P̂ ~ Normal, we can use the Z tables to find probabilities.
Note:
P̂ is the sampling distribution of the sample proportion of bank customers who
default on their loan.
p = population proportion of bank customers who default
= 7%
= 0.07, the form we use in our calculation with proportions.
p̂ = sample proportion of bank customers who default on their loan
= 20/200
= 0.10, the form we use in our calculation with proportions.
Note: we usually work with at least three decimal places when calculating with
proportions.
Recall, from the question:
p = 7% = 0.07
q = 1 − p
q = 1 − 0.07
q = 0.93
p hat = 20/200
p hat = 0.10
#3 Exercise solution
p hat = 20/200 = 0.10
Step 2: Find the z value
Z = (p̂ − p)/sqrt(pq/n)
Z = (0.10 − 0.07)/sqrt[0.07(0.93)/200]
Z = 1.663 ≈ 1.66
Step 3: Sketch a curve(s)
[Sketch: P̂ curve centred at p = 0.07 with p̂ = 0.1 marked to its right, and the matching Z curve centred at 0 with Z = 1.66 marked. Total area = 100% = 1. Do NOT want the left area, 0.9515. Want the right area, 1 – 0.9515 = 0.0485.]
Step 4: Use the Z tables
P(P̂ > 0.10) = P(Z > 1.66)
= 1 − P(Z < 1.66)
= 1 − 0.9515
= 0.0485
Interpretation: There is a 4.85% chance, or 0.0485 probability, that more
than 20 out of these 200 customers sampled will default on their loan payments.
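Steps 2 and 4 of this exercise can be wrapped into one small helper. This is a sketch under assumptions: the function name is hypothetical, and `statistics.NormalDist` stands in for the Z tables, so the unrounded answer differs slightly from the table-based 0.0485.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_proportion_exceeds(p_hat, p, n):
    """Return (standard deviation of P-hat, Z value, P(P-hat > p_hat))."""
    se = sqrt(p * (1 - p) / n)           # sqrt(pq/n)
    z = (p_hat - p) / se                 # standardize p_hat
    right_tail = 1 - NormalDist().cdf(z) # area to the right of Z
    return se, z, right_tail


# Bank exercise: p = 0.07, n = 200, p_hat = 20/200 = 0.10
se, z, prob = prob_sample_proportion_exceeds(20 / 200, 0.07, 200)
print(f"sd of P-hat = {se:.3f}, z = {z:.3f}, P(P-hat > 0.10) = {prob:.4f}")
```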
Note:
• We have the proportion value on the horizontal axis of the P̂ curve.
• We have the area or probability under the P̂ curve.
P̂ is the sampling distribution of the sample proportion of bank customers who default on their loan.
p = population proportion of bank customers who default
= 7%
= 0.07, the form we use in our calculation with proportions.
p̂ = sample proportion of bank customers who default on their loan
= 20/200
= 0.10, the form we use in our calculation with proportions.
We want P(P̂ > 20/200) = P(P̂ > 0.10)
As p = 0.07 = μ, and p̂ = 0.10 > μ, we know that P(P̂ > 0.10) will be in the right tail of the curve.
Note: we usually work with at least three decimal places when calculating with proportions.
Supplementary Exercises
• Students are advised that Supplementary Exercises to this topic may be found on the
subject portal under “Weekly materials”.
• Solutions to the Supplementary Exercises may be available on the portal under “Weekly
materials” at the end of each week.
• Time permitting, the lecturer may ask students to work through some of these
exercises in class.
• Otherwise, it is expected that all students work through all Supplementary Exercises
outside of class time.
Extension
• The following slides are an extension to this week’s topic.
• The work covered in the extension:
o Is not covered in class by the lecturer.
o May be assessed.
Extension topics may be studied outside class time.
Your turn: build a sampling distribution
Say you are in a class with a population of N = 14 students, and you asked
everyone their age. The raw data are listed in the example that follows.
• From this data set, you could select a sample of size n = 5 ages & find the
sample mean age.
• You can repeat this until you have, say, 20 different samples of data.
• Now, you can create a frequency distribution using all of the sample
mean ages.
• See the example that follows, with 20 sample means.
Example of sampling distribution
Raw data on age: 20, 18, 21, 34, 26, 22, 20, 20, 21, 18, 19, 21, 19, 25
Each row is a sample, with the corresponding sample mean (sample average).
Sample       Sampled ages           Average
Sample 1     20  18  34  19  21     22.4
Sample 2     27  29  20  18  20     22.8
Sample 3     18  26  21  20  22     21.4
Sample 4     25  20  19  21  21     21.2
Sample 5     21  22  19  20  25     21.4
Sample 6     22  25  34  18  19     23.6
Sample 7     20  20  18  25  20     20.6
Sample 8     34  21  18  20  21     22.8
Sample 9     20  22  18  21  19     20
Sample 10    26  21  18  25  18     21.6
Sample 11    20  20  22  21  22     21
Sample 12    19  20  18  21  25     20.6
Sample 13    26  34  20  19  19     23.6
Sample 14    22  20  21  25  22     22
Sample 15    18  26  19  21  18     20.4
Sample 16    25  19  18  20  19     20.2
Sample 17    20  19  18  34  20     22.2
Sample 18    19  26  20  19  21     21
Sample 19    26  18  25  21  19     21.8
Sample 20    21  22  20  25  20     21.6
We have only represented 20 samples here.
The next slide has the frequency distribution and histogram.
Example continued …
Class       Frequency
>19 - 20    1
>20 - 21    6
>21 - 22    7
>22 - 23    4
>23 - 24    2
You can see the sampling distribution of mean age, X̄, is unimodal & approximately symmetric.
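The frequency distribution can be rebuilt from the 20 samples in the example. The sketch below takes the sample rows from the example table and counts the sample means in each class, treating each class as "greater than the lower limit, up to and including the upper limit". The variable names are illustrative.

```python
# The 20 samples of n = 5 ages, one row per sample, from the example
samples = [
    [20, 18, 34, 19, 21], [27, 29, 20, 18, 20], [18, 26, 21, 20, 22],
    [25, 20, 19, 21, 21], [21, 22, 19, 20, 25], [22, 25, 34, 18, 19],
    [20, 20, 18, 25, 20], [34, 21, 18, 20, 21], [20, 22, 18, 21, 19],
    [26, 21, 18, 25, 18], [20, 20, 22, 21, 22], [19, 20, 18, 21, 25],
    [26, 34, 20, 19, 19], [22, 20, 21, 25, 22], [18, 26, 19, 21, 18],
    [25, 19, 18, 20, 19], [20, 19, 18, 34, 20], [19, 26, 20, 19, 21],
    [26, 18, 25, 21, 19], [21, 22, 20, 25, 20],
]

# One sample mean per row
sample_means = [sum(row) / len(row) for row in samples]

# Classes of the form ">lower - upper"
classes = [(19, 20), (20, 21), (21, 22), (22, 23), (23, 24)]
frequencies = {
    f">{lower} - {upper}": sum(lower < m <= upper for m in sample_means)
    for lower, upper in classes
}

for label, count in frequencies.items():
    # A row of asterisks gives a rough text histogram
    print(f"{label:<10}{count:>3}  {'*' * count}")
```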
Exercise
Assume that the population mean age for all students enrolled in this
subject is 22 years with a standard deviation of 1.6 years. The distribution
of age is approximately normal. Assume the conditions are satisfied.
a) What is the probability that a sample of 5 students has a mean age less
than 23 years?
b) For samples of 5 students, below what mean age do the lowest 33% of sample means fall?
Exercise solution
a) As we are told the conditions are satisfied, X̄ ~ N, and we can use the Z tables.
Z = (x̄ − μ)/(σ/sqrt(n)) = (23 − 22)/(1.6/sqrt(5)) = 1.40
Want P(X̄ < 23) = P(Z < 1.40) = 0.9192
b) Want x̄ for P(X̄ < x̄) = 0.33
Using the Z tables backwards, the Z value for a left tail area of 0.33 is −0.44.
Z = (x̄ − μ)/(σ/sqrt(n))
−0.44 = (x̄ − 22)/(1.6/sqrt(5))
x̄ = 21.685 years
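Both parts of this exercise can be cross-checked in Python. This is a sketch, assuming `statistics.NormalDist` in place of the Z tables: part a) uses the forward lookup (`cdf`) and part b) uses the reverse lookup (`inv_cdf`). Because nothing is rounded to two decimals, the results differ slightly from the table-based working.

```python
from math import sqrt
from statistics import NormalDist

mu = 22      # population mean age (years)
sigma = 1.6  # population standard deviation (years)
n = 5        # sample size

standard_error = sigma / sqrt(n)  # standard deviation of X-bar
x_bar_distribution = NormalDist(mu, standard_error)

# a) P(X-bar < 23)
prob_a = x_bar_distribution.cdf(23)

# b) the value of x-bar with a left tail area of 0.33
x_bar_b = x_bar_distribution.inv_cdf(0.33)

print(f"a) P(X-bar < 23) = {prob_a:.4f}")
print(f"b) x-bar = {x_bar_b:.3f} years")
```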
Here is another illustration from X to X̄
[Figure: two relative-frequency histograms from the dice simulation, each with the values 1 to 6 on the horizontal axis. Left panel: “Distribution of X” (1 fair die rolled 50,000 times, n = 1), uniform distribution. Right panel: “Distribution of X̄” (mean of 3 fair dice rolled 50,000 times, n = 3), normal distribution.]
Simulate the sampling distribution of the sample mean
Let’s use some fair six-sided dice, where the sides of each are labelled 1, 2, 3, 4, 5, 6.
Note: singular, die and plural, dice.
We can simulate rolling 1 fair die 50,000 times & record the value the die lands on from each
roll (here sample size, n = 1). Note, the distribution of values, X, is uniform, where each side has
the same relative frequency.
Now, we can simulate rolling 3 fair dice 50,000 times & calculate the mean value from each roll
(now sample size, n = 3). Note, the distribution of values, X̄, is unimodal and symmetric, or
approximately normally distributed.
The fact that the histogram of sample means appears to be bell-shaped (Normal) is
a consequence of the Central Limit Theorem, even for such a small n here.
Hence, for sufficiently large sample sizes (n ≥ 30), the sampling distribution of the sample mean
is approximately normal, even if the population distribution is non-normal.
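The dice simulation described above can be sketched as follows. The seed, function name and text output are assumptions made for illustration; the slide's own figures were presumably produced with other software.

```python
import random
from collections import Counter
from statistics import fmean, pstdev


def simulate_sample_means(n_dice, n_rolls, rng):
    """Roll n_dice fair dice n_rolls times and return the mean of each roll."""
    return [
        fmean(rng.randint(1, 6) for _ in range(n_dice))
        for _ in range(n_rolls)
    ]


rng = random.Random(2021)  # fixed seed so the run is repeatable
single_die = simulate_sample_means(1, 50_000, rng)  # n = 1: distribution of X
three_dice = simulate_sample_means(3, 50_000, rng)  # n = 3: distribution of X-bar

# Relative frequency of each face for one die (roughly equal for each face)
for face, count in sorted(Counter(single_die).items()):
    print(f"X = {face:.0f}: {count / len(single_die):.3f}")

# Centre and spread of each distribution
print(f"n = 1: mean = {fmean(single_die):.3f}, sd = {pstdev(single_die):.3f}")
print(f"n = 3: mean = {fmean(three_dice):.3f}, sd = {pstdev(three_dice):.3f}")
```

Both distributions centre near 3.5, but the spread of the sample means is smaller than the spread of the individual rolls, in line with the standard error σ/sqrt(n).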
Given: σ = 8, x = 160
Area = 5.05%, prob = 0.0505 BELOW 160
Note: n not given, so we must be dealing with X.
Steps for reverse normal (from Week 4):
1. Sketch, place known values
2. Use Z tables BACKWARDS or INSIDE OUT, find Z value for left tail
area of 0.0505: Z = −1.64
3. Substitute known values into Z formula: Z = (x – μ)/σ
−1.64 = (160 – μ)/8
4. Solve for μ
−13.12 = 160 – μ
μ = 173.12
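The reverse normal steps for finding μ can be sketched in Python. This assumes `statistics.NormalDist().inv_cdf` as a stand-in for reading the Z tables backwards; the variable names are illustrative.

```python
from statistics import NormalDist

x = 160             # known value of X
sigma = 8           # population standard deviation
left_area = 0.0505  # probability BELOW x

# Step 2: find the Z value for the left tail area (negative, as the area is below 0.5)
z = NormalDist().inv_cdf(left_area)

# Steps 3 and 4: Z = (x - mu)/sigma, rearranged to mu = x - Z*sigma
mu = x - z * sigma

print(f"z = {z:.2f}, mu = {mu:.2f}")
```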
Weekly Content, Week 4, Supplementary Exercises, Q7.
a) Steps:
1) Sketch
2) Use Z tables backwards and search for the area of 1 – 0.05 = 0.95
in the LEFT SIDE; 0.95 is the average of 0.9495 and 0.9505
3) Find the Z value, CHECK SIGN OF Z: Z = (1.64 + 1.65)/2 = 1.645
4) Substitute into Z =(x – mean)/standard deviation and solve for
the standard deviation.
1.645 = (130 – 125)/σ
σ = (130 – 125)/1.645
σ = 3.04 units
Weekly Content, Week 4, Supplementary Exercises, Q7.
b) Steps:
1) Sketch
2) Use Z tables backwards and search for the area of
3) Find the Z value, CHECK SIGN OF Z: Z = -1.17 etc.
4) Substitute into Z =(x – mean)/standard deviation and solve for
the standard deviation.