STA2023 E.Philias

The normal distribution is the most important and most widely used of all probability distributions. A large number
of phenomena in the real world are normally distributed, either exactly or approximately. The continuous random
variables representing heights and weights of people, scores on an examination, weights of packages (e.g., cereal
boxes, boxes of cookies), amount of milk in a gallon, life of an item (such as a light bulb or a television set), and
time taken to complete a certain job have all been observed to have an (approximately) normal distribution.
Part 3
Chapter 7: Probability Distributions for Continuous Random Variables - The Normal Distribution
COMMON PATTERNS OF FREQUENCY DISTRIBUTION
1) Bell type or normal distributions
2) Positively skewed
3) Negatively skewed
Continuous Probability distribution
The probability distribution for a continuous random variable, x, can be represented by a
smooth curve, a function of x denoted f(x). The curve is called a density function or
frequency function. The probability that x falls between two values, a and b, i.e.,
P(a < x < b), is the area under the curve between a and b.
Normal random variable is a continuous random variable with a frequency curve that is
smooth, symmetric, and bell-shaped.
Normal or Bell Curve is the frequency curve for a normal random variable.
Normal probability distribution is the probability model for a normal random variable.
Parameters of a normal probability distribution are the mean (µ) and standard deviation
(σ) of the associated normal random variable.
Normal population is a population in which a normal random variable has been defined.
Normal distributions
1) Its peak occurs directly above the mean µ (mu)
2) The curve is symmetric about the vertical line through the mean (That is, if you fold the graph
along this line, the left half of the graph will fit exactly on the right half).
3) The curve never touches the x-axis; it extends indefinitely in both directions.
4) The area under the curve (and above the horizontal axis) is 1.
Notice: A normal distribution is completely determined by its mean μ and standard deviation σ.
The area of the shaded region under the normal curve from a to b is the probability that an
observed value will be between a and b.
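As a rough numerical check of properties 2 and 4 and of the area interpretation, the sketch below uses Python's standard-library statistics.NormalDist. The parameters (μ = 40, σ = 10), the interval from 30 to 55, and the helper name area_under_pdf are assumptions chosen only for illustration.

```python
from statistics import NormalDist


def area_under_pdf(dist, a, b, steps=20_000):
    """Approximate the area under dist.pdf between a and b (trapezoid rule)."""
    width = (b - a) / steps
    total = 0.5 * (dist.pdf(a) + dist.pdf(b))
    for i in range(1, steps):
        total += dist.pdf(a + i * width)
    return total * width


dist = NormalDist(mu=40, sigma=10)  # illustrative parameters, not from the notes

# Property 4: the total area is 1 (the tails beyond 8 sigma are negligible).
total_area = area_under_pdf(dist, 40 - 80, 40 + 80)

# Property 2: the curve has the same height at equal distances from the mean.
left_height, right_height = dist.pdf(40 - 7), dist.pdf(40 + 7)

# P(30 < x < 55) as an area, compared with the difference of cumulative areas.
area_30_55 = area_under_pdf(dist, 30, 55)
exact_30_55 = dist.cdf(55) - dist.cdf(30)

print(round(total_area, 6), round(area_30_55, 4), round(exact_30_55, 4))
```

The trapezoid rule is used here only to make the "area under the curve" idea concrete; in practice the cumulative distribution function gives the same area directly.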
Standard Normal Distribution
The normal distribution with mean μ = 0 and standard deviation σ = 1 is called the standard normal distribution.
Z Values or Z Scores
Z-scores are the values of the standard normal variable. They indicate the number of
standard deviations that any value of a normal random variable deviates from the mean
The units marked on the horizontal axis of the standard normal curve are denoted by z and are
called z values or z scores. A specific value of z gives the distance between the mean and the
point represented by z in terms of the standard deviation.
Example: A point with a value of z = 2 is two standard deviations to the right of (above) the
mean. Similarly, a point with a value of z = -2 is two standard deviations to the left of
(below) the mean.
Property of Normal Distribution
If x is a normal random variable with mean μ and standard deviation σ, then the random
variable z defined by the formula
$$z = \frac{x - \mu}{\sigma}$$
has a standard normal distribution. The value z describes the number of standard deviations
between x and μ.
Converting an x value to a z value
If a normal distribution has mean μ and standard deviation σ, then the z-score for the
number x is:
$$z = \frac{x - \mu}{\sigma}$$
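The conversion can be sketched in a few lines of Python. The function name z_score and the sample values (μ = 40, σ = 10) are illustrative assumptions.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Illustrative values: a normal distribution with mean 40 and standard deviation 10.
print(z_score(35, 40, 10))  # -0.5, half a standard deviation below the mean
print(z_score(60, 40, 10))  # 2.0, two standard deviations above the mean
```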
Area under a normal curve
The area under a normal curve between x =a and x = b is the same as the area under the
standard normal curve between the z-score for a and the z-score for b.
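A minimal check of this statement, assuming Python's standard-library statistics.NormalDist and illustrative values (μ = 40, σ = 10, a = 30, b = 55) that are not taken from the notes:

```python
from statistics import NormalDist

mu, sigma = 40, 10  # illustrative parameters
a, b = 30, 55       # illustrative interval

# Area under the normal curve with mean mu and standard deviation sigma.
x_dist = NormalDist(mu, sigma)
area_x = x_dist.cdf(b) - x_dist.cdf(a)

# Area under the standard normal curve between the matching z-scores.
z_a, z_b = (a - mu) / sigma, (b - mu) / sigma  # -1.0 and 1.5
standard = NormalDist()  # mean 0, standard deviation 1
area_z = standard.cdf(z_b) - standard.cdf(z_a)

print(round(area_x, 4), round(area_z, 4))  # both about 0.7745
```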
Procedure for solving problems based upon the Normal Distribution
1) Convert all data items to z scores using the formula $z = \frac{x - \mu}{\sigma}$.
2) Make a sketch of the normal curve. Along the horizontal axis show the mean (z = 0) and all
z scores obtained in step 1. Shade the region whose area is desired.
3) Use the Z table to fill in the appropriate percents under the curve, and answer the
question.
Technology Step by Step
The Normal Distribution
TI-83/84 Plus Finding Areas under the Normal Curve
Step 1: From the HOME screen, press 2nd VARS to access the DISTRibution menu.
Step 2: Select 2: normalcdf(
Step 3: With normalcdf( on the HOME screen, type lowerbound, upperbound, μ, σ).
For example, to find the area to the left of X = 35 under the normal curve with μ = 40 and
σ = 10, type normalcdf(-1E99, 35, 40, 10) and hit ENTER.
Note: When there is no lowerbound, enter -1E99. When there is no upperbound, enter 1E99.
The E shown is scientific notation; it is entered by pressing 2nd and then the comma key (EE).
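For readers without the calculator, the same area can be approximated in Python. This is only a sketch: the helper normalcdf below is a hypothetical function named after the calculator command and built on the standard-library statistics.NormalDist.

```python
from statistics import NormalDist


def normalcdf(lower: float, upper: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Area under the normal curve between lower and upper (hypothetical helper)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)


# Same inputs as the calculator example: area to the left of X = 35.
area_left_of_35 = normalcdf(-1e99, 35, 40, 10)
print(round(area_left_of_35, 4))  # 0.3085
```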
Finding Scores Corresponding to an Area
Step 1: From the HOME screen, press 2nd VARS to access the DISTRibution menu.
Step 2: Select 3: invNorm(
Step 3: With invNorm( on the screen, type “area left”, μ, σ).
For example, to find the score such that the area under the normal curve to the left of the score
is 0.68 with μ = 40 and σ = 10,
type invNorm(0.68, 40, 10) and hit ENTER.
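A comparable computation in Python, offered as a sketch: inv_norm is a hypothetical helper named after the calculator command, and it relies on the inv_cdf method of the standard-library statistics.NormalDist.

```python
from statistics import NormalDist


def inv_norm(area_left: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Score with the given area to its left under the normal curve (hypothetical helper)."""
    return NormalDist(mu, sigma).inv_cdf(area_left)


# Same inputs as the calculator example.
score = inv_norm(0.68, 40, 10)
print(round(score, 2))  # 44.68
```

Feeding the result back into the cumulative area confirms the round trip: the area to the left of the returned score is 0.68.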
Part 3
Chapter 8: Sampling and Sampling Distributions
Sampling Distributions Illustration
To illustrate the concept of a sampling distribution, let us construct the one for the mean of
a random sample of size n = 2 from the finite population of size N = 5, whose elements are
the numbers 1, 3, 5, 7, and 9.
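1) The mean of this population is
$$\mu = \frac{1 + 3 + 5 + 7 + 9}{5} = 5$$
2) The standard deviation is
$$\sigma = \sqrt{\frac{(1-5)^2 + (3-5)^2 + (5-5)^2 + (7-5)^2 + (9-5)^2}{5}} = \sqrt{8} \approx 2.828$$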
Note: In actual applications we never know all the values. Now, if we take a random sample
of size n = 2 from this population, there are 5C2 = 10 possibilities, and they are
1 and 3, 1 and 5, 1 and 7, 1 and 9, 3 and 5,
3 and 7, 3 and 9, 5 and 7, 5 and 9, 7 and 9.
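The means of these samples are
$$\bar{x} = \frac{1+3}{2} = 2, \quad \frac{1+5}{2} = 3, \quad \frac{1+7}{2} = 4, \quad \frac{1+9}{2} = 5, \quad \frac{3+5}{2} = 4,$$
$$\frac{3+7}{2} = 5, \quad \frac{3+9}{2} = 6, \quad \frac{5+7}{2} = 6, \quad \frac{5+9}{2} = 7, \quad \frac{7+9}{2} = 8$$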
If sampling is random, then each sample has the probability 1/10.
The sampling distribution (probability distribution of $\bar{x}$) of the mean:
$\bar{x}$      2     3     4     5     6     7     8
Probability    1/10  1/10  2/10  2/10  2/10  1/10  1/10
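The sampling distribution can be reproduced by listing every possible sample. The sketch below uses only the Python standard library; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [1, 3, 5, 7, 9]
n = 2

# All 5C2 = 10 samples of size 2, drawn without replacement.
samples = list(combinations(population, n))
sample_means = [Fraction(sum(sample), n) for sample in samples]

# Each sample is equally likely, so P(mean) = (number of samples with that mean) / 10.
counts = Counter(sample_means)
sampling_distribution = {
    mean: Fraction(count, len(samples)) for mean, count in sorted(counts.items())
}

for mean, probability in sampling_distribution.items():
    print(mean, probability)  # Fraction reduces 2/10 to 1/5 when printing
```

The means 4, 5, and 6 each arise from two different samples, which is why their probability is 2/10, and the seven probabilities add up to 1.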
Estimation of μ
If we did not know the mean of the given population and wanted to estimate it with the
mean of a random sample of two observations, the sampling distribution would give us some
idea about the size of our error.
More information
Further useful information about the sampling distribution of the mean can be obtained by
calculating its mean $\mu_{\bar{x}}$ and its standard deviation $\sigma_{\bar{x}}$.
1) Mean of $\bar{x}$: $\mu_{\bar{x}} = 5$
2) Standard deviation of $\bar{x}$: $\sigma_{\bar{x}} = 1.732$
Observe that, at least for this example,
1. $\mu_{\bar{x}}$, the mean of the sampling distribution of $\bar{x}$, equals μ, the mean of the
population.
2. $\sigma_{\bar{x}}$, the standard deviation of the sampling distribution of $\bar{x}$, is smaller than σ, the
standard deviation of the population.
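Both observations can be checked directly from the ten sample means. A minimal sketch using the Python standard library (variable names are illustrative):

```python
import math
from itertools import combinations

population = [1, 3, 5, 7, 9]
means = [sum(sample) / 2 for sample in combinations(population, 2)]

# Population mean and standard deviation (divide by N, since this is the whole population).
mu = sum(population) / len(population)
sigma = math.sqrt(sum((x - mu) ** 2 for x in population) / len(population))

# Mean and standard deviation of the sampling distribution (ten equally likely means).
mu_xbar = sum(means) / len(means)
sigma_xbar = math.sqrt(sum((m - mu_xbar) ** 2 for m in means) / len(means))

print(mu, round(sigma, 3))            # 5.0 2.828
print(mu_xbar, round(sigma_xbar, 3))  # 5.0 1.732
```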
List of Population Parameters and Corresponding Sample Statistics
                      Population Parameter    Sample Statistic
Mean                  μ                       $\bar{x}$
Variance              σ²                      s²
Standard deviation    σ                       s
Binomial proportion   p                       $\hat{p}$
Parameter is a descriptive numerical measure of the population. Parameters are fixed
numbers usually unknown because the associated population is very large.
Statistic is a descriptive numerical measure of a sample. It varies from sample to sample.
Statistics are used to estimate parameters.
Random Sampling is a method of sampling for which every possible sample has the same
probability of being selected.
Sampling Distribution is the probability distribution (model) associated with any statistic
when repeated random samples are drawn from the defined population.
Mean and standard deviation of $\bar{x}$
For random samples of size n taken from a population having mean μ and standard
deviation σ, the sampling distribution of $\bar{x}$ has the mean
$$\mu_{\bar{x}} = \mu$$
and the standard deviation
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
It is customary to refer to $\sigma_{\bar{x}}$, the standard deviation of the sampling distribution of the
mean, as the standard error of the mean.
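The standard error formula can be sketched as below. One caveat, stated here as an addition that these notes do not make: the illustration with the population 1, 3, 5, 7, 9 sampled without replacement from only N = 5 elements, so its value 1.732 differs from σ/√n = 2. The usual finite population correction factor, √((N - n)/(N - 1)), accounts for the difference. Both function names are illustrative.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean, sigma / sqrt(n) (large or infinite population)."""
    return sigma / math.sqrt(n)


def standard_error_finite(sigma: float, n: int, population_size: int) -> float:
    """Standard error with the finite population correction (sampling without replacement)."""
    correction = math.sqrt((population_size - n) / (population_size - 1))
    return standard_error(sigma, n) * correction


sigma = math.sqrt(8)  # population standard deviation of 1, 3, 5, 7, 9 (about 2.828)

print(round(standard_error(sigma, 2), 3))            # 2.0
print(round(standard_error_finite(sigma, 2, 5), 3))  # 1.732
```

When the population is much larger than the sample, the correction factor is close to 1 and the simpler formula is adequate.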
Shape of the Sampling Distribution of $\bar{x}$
The shape of the sampling distribution of $\bar{x}$ relates to the following cases.
1. The population from which samples are drawn has a normal distribution.
2. The population from which samples are drawn does not have a normal distribution.
Central Limit Theorem is a statistical property stating that the sampling distribution of
the sample mean is approximately normal when the sample size is large enough.
Central Limit Theorem
According to the central limit theorem, for a large sample size, the sampling distribution of
$\bar{x}$ is approximately normal, irrespective of the shape of the population distribution. The
mean and standard deviation of the sampling distribution are
$$\mu_{\bar{x}} = \mu \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
The sample size is usually considered to be large if n ≥ 30.
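A small simulation can make the theorem concrete. The sketch below is only an illustration under assumed settings: the population is exponential with rate 1 (strongly right-skewed, mean 1, standard deviation 1), the sample size is 40, and 5000 samples are drawn. None of these choices come from the notes.

```python
import math
import random
import statistics

random.seed(2023)  # fixed seed so the run is repeatable

n = 40              # sample size, "large" by the n >= 30 rule of thumb
repetitions = 5000  # number of samples drawn

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
mu, sigma = 1.0, 1.0

sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(repetitions)
]

mean_of_means = statistics.fmean(sample_means)  # should be close to mu
sd_of_means = statistics.pstdev(sample_means)   # should be close to sigma / sqrt(n)
standard_error = sigma / math.sqrt(n)

# For a normal distribution, about 95% of values lie within 1.96 standard deviations.
share_within = sum(
    abs(m - mu) <= 1.96 * standard_error for m in sample_means
) / repetitions

print(round(mean_of_means, 3), round(sd_of_means, 3), round(standard_error, 3))
print(round(share_within, 3))
```

Even though the population is far from bell-shaped, the sample means cluster around μ with a spread near σ/√n, and roughly 95% of them fall within 1.96 standard errors of μ.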
Chapter 9: Estimation with Confidence Intervals
In statistical inference we make generalizations based on samples and, traditionally, such inferences have been
divided into problems of estimation and tests of hypothesis.
Estimation is the process of estimating or predicting the value of a population parameter
using a random sample and an estimator.
Estimator is a formula or statistic defined on sample data with the purpose of estimating
a parameter.
Estimate is the numerical result obtained by substituting the sample data into any given
estimator.
Types of estimates: point estimate and interval estimate.
Point estimate consists of a single figure predicting the parameter value.
Structure of Interval Estimates for Means and Proportions
Point estimate ± Margin of Error
Maximum Error of Estimate (n ≥ 30)
When we use $\bar{x}$ as an estimate of μ, the probability is 1 - α that this estimate will be “off”
either way by at most
$$E = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$
The value of $z_{\alpha/2}$ used here is read from the standard normal distribution table for the
given confidence level. The critical value $z_{\alpha/2}$ denotes the positive value of z for which the area
under the standard normal curve to its right is equal to α/2 (α is the Greek lowercase letter alpha).
The three values that are most commonly used for 1 - α are 0.90, 0.95, and 0.99.
Level of Confidence, (1 - α)·100%    Area in Each Tail, α/2    Critical Value, $z_{\alpha/2}$
90%                                  0.05                      1.645
95%                                  0.025                     1.96
99%                                  0.005                     2.575
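The critical values and the maximum error of estimate can be approximated without a table. This sketch assumes Python's standard-library statistics.NormalDist; the function names and the inputs σ = 15 and n = 36 are hypothetical. The computed 99% value is about 2.576, while many tables, including the one above, list 2.575; both are common roundings.

```python
import math
from statistics import NormalDist


def critical_z(confidence: float) -> float:
    """Positive z with area alpha/2 to its right, where alpha = 1 - confidence."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def margin_of_error(sigma: float, n: int, confidence: float) -> float:
    """Maximum error of estimate E = z * sigma / sqrt(n)."""
    return critical_z(confidence) * sigma / math.sqrt(n)


for level in (0.90, 0.95, 0.99):
    print(level, round(critical_z(level), 3))  # 1.645, 1.96, 2.576

# Hypothetical inputs: sigma = 15, n = 36, 95% confidence.
E = margin_of_error(sigma=15, n=36, confidence=0.95)
print(round(E, 2))  # 4.9
```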
Sample size for estimating μ (n ≥ 30)
$$n = \left( \frac{\sigma \cdot z_{\alpha/2}}{E} \right)^2$$
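A short sketch of the sample size formula follows. The inputs (σ = 15, E = 3, 95% confidence) are hypothetical, and rounding the result up to the next whole number is the usual convention, assumed here rather than stated in the notes.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma: float, max_error: float, confidence: float) -> int:
    """Smallest n for which z * sigma / sqrt(n) does not exceed max_error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((sigma * z / max_error) ** 2)


# Hypothetical inputs: sigma = 15, maximum error E = 3, 95% confidence.
print(required_sample_size(sigma=15, max_error=3, confidence=0.95))  # 97
```

The raw formula gives about 96.04 here; rounding up to 97 keeps the margin of error at or below the target.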
Interval estimate consists of a numerical range where the parameter is expected to fall with
certain confidence.
Confidence coefficient is a probability that measures the reliability of any interval estimate.
Confidence level is the confidence coefficient expressed as a percentage.
Confidence interval is an interval estimate calculated with a specified confidence level.
Margin of error is a measure of the error of estimation that involves the given confidence
level and sample size.
Precision of any confidence interval is associated with the width of the interval estimate.
The precision is better as the margin of error is smaller.
Large-Sample Confidence Interval for μ
The (1 - α)100% confidence interval for μ is
$$\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}} \quad \text{if } \sigma \text{ is known}$$
$$\bar{x} \pm z_{\alpha/2}\,s_{\bar{x}} \quad \text{if } \sigma \text{ is not known}$$
where
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \quad \text{and} \quad s_{\bar{x}} = \frac{s}{\sqrt{n}}$$
The value of $z_{\alpha/2}$ used here is read from the standard normal distribution table for the
given confidence level. The critical value $z_{\alpha/2}$ denotes the positive value of z for which the area
under the standard normal curve to its right is equal to α/2 (α is the Greek lowercase letter alpha).
• Confidence Interval: An interval like $(\bar{x} - z_{\alpha/2}\,\sigma_{\bar{x}},\ \bar{x} + z_{\alpha/2}\,\sigma_{\bar{x}})$ is called a
confidence interval.
• Confidence Limits: $\bar{x} - z_{\alpha/2}\,\sigma_{\bar{x}}$ and $\bar{x} + z_{\alpha/2}\,\sigma_{\bar{x}}$ are called confidence limits.
• Degree of confidence: The probability 1 - α is called the degree of confidence.
• Significance level: α is called the significance level.
• If n is at least 30, then the use of s is recommended, even if the value of σ is claimed
to be known.
• Critical value: $z_{\alpha/2}$ is called a critical value.
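The large-sample interval can be sketched as follows. The function name z_interval and the inputs (sample mean 50, standard deviation 6, n = 36) are hypothetical; the standard deviation argument may be σ when it is known or s otherwise.

```python
import math
from statistics import NormalDist


def z_interval(xbar: float, sd: float, n: int, confidence: float = 0.95):
    """Confidence limits xbar -/+ z * sd / sqrt(n) for a large sample."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sd / math.sqrt(n)
    return xbar - margin, xbar + margin


# Hypothetical sample: mean 50, standard deviation 6, n = 36, so the standard error is 1.
lower, upper = z_interval(xbar=50, sd=6, n=36, confidence=0.95)
print(round(lower, 2), round(upper, 2))  # 48.04 51.96
```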
Interpretation of a Confidence Interval for a Population Mean
When we form a (1 - α)100% confidence interval for μ, we usually express our
confidence in the interval with a statement such as, “We can be (1 - α)100% confident
that μ lies between the lower and upper bounds of the confidence interval.”
For example: With a 95% confidence interval and 0.476 < μ < 0.544, we can state that
“We are 95% confident that the interval from 0.476 to 0.544 actually does contain the true
value of μ.”
Confidence Intervals for Means (Small samples)
The t distribution
The t distribution is a specific type of bell-shaped distribution with a lower height and a
wider spread than the standard normal distribution. As the sample size becomes larger, the
t distribution approaches the standard normal distribution. The t distribution has only one
parameter, called the degrees of freedom (df). The mean of the t distribution is equal to 0
and its standard deviation is $\sqrt{df/(df - 2)}$ (defined for df > 2).
The t distribution is used to make a confidence interval about μ if
1. The population from which the sample is drawn is (approximately) normally
distributed.
2. The sample size is small (that is, n < 30).
3. The population standard deviation, σ, is not known.
Small-Sample Confidence Intervals for Means (μ)
The (1 - α)100% confidence interval for μ is
$$\bar{x} \pm t_{\alpha/2}\,s_{\bar{x}}$$
where
$$s_{\bar{x}} = \frac{s}{\sqrt{n}}$$
The value $t_{\alpha/2}$ is obtained from the t distribution table for n - 1 degrees of freedom and the
given confidence level.
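A sketch of the small-sample interval is given below. Because Python's standard library has no t distribution, the critical value is passed in after being read from a t table, as the notes describe. The function name and the sample values (mean 20, s = 4, n = 16) are hypothetical; 2.131 is the table value for 15 degrees of freedom at 95% confidence.

```python
import math


def t_interval(xbar: float, s: float, n: int, t_critical: float):
    """Confidence limits xbar -/+ t * s / sqrt(n); t_critical comes from a t table."""
    margin = t_critical * s / math.sqrt(n)
    return xbar - margin, xbar + margin


# Hypothetical sample: mean 20, s = 4, n = 16, so df = n - 1 = 15.
# From the t table, the 95% critical value for df = 15 is 2.131.
lower, upper = t_interval(xbar=20, s=4, n=16, t_critical=2.131)
print(round(lower, 3), round(upper, 3))  # 17.869 22.131
```

For comparison, the z critical value at 95% is 1.96, so the t interval is wider, reflecting the extra uncertainty from estimating σ with s in a small sample.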
Population and Sample Proportions
The population and sample proportions, denoted by p and $\hat{p}$ (pronounced p hat),
respectively, are calculated as
$$p = \frac{X}{N} \quad \text{and} \quad \hat{p} = \frac{x}{n}$$
where
N = total number of elements in the population
n= total number of elements in the sample
X = number of elements in the population that possess a specific characteristic
x= number of elements in the sample that possess a specific characteristic
Sampling distribution of $\hat{p}$
The probability distribution of the sample proportion, $\hat{p}$, is called its sampling distribution.
It gives the various values that $\hat{p}$ can assume and their probabilities.
Large-Sample Confidence Interval for p
The (1 - α)100% confidence interval for the population proportion, p, is
$$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
Approximate Maximum Error of Estimate using $\hat{p} = \frac{x}{n}$ to estimate p
$$E = z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
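The interval for a proportion can be sketched in the same way. The function name and the counts (x = 120 successes in n = 400 trials) are hypothetical.

```python
import math
from statistics import NormalDist


def proportion_interval(x: int, n: int, confidence: float = 0.95):
    """Return p_hat, the margin of error E, and the confidence limits for p."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin, (p_hat - margin, p_hat + margin)


# Hypothetical sample: 120 of 400 sampled elements have the characteristic.
p_hat, E, (lower, upper) = proportion_interval(x=120, n=400)
print(round(p_hat, 3), round(E, 4))      # 0.3 0.0449
print(round(lower, 4), round(upper, 4))  # 0.2551 0.3449
```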
Technology Step by Step
Confidence Intervals about μ, σ Known (Large Samples)
TI-83/84 Plus
Step 1: If necessary, enter raw data in L1.
Step 2: Press STAT, highlight TESTS, and select 7: ZInterval.
Step 3: If the data are raw, highlight DATA. Make sure List1 is set to L1 and Freq to 1. If
summary statistics are known, highlight STATS and enter the summary statistics. Following
σ:, enter the population standard deviation.
Step 4: Enter the confidence level following C-Level:.
Step 5: Highlight Calculate; press ENTER.
Confidence Intervals about μ, σ Unknown (Small Samples)
Step 1: If necessary, enter raw data in L1.
Step 2: Press STAT, highlight TESTS, and select 8: TInterval.
Step 3: If the data are raw, highlight DATA. Make sure List1 is set to L1 and Freq to 1. If
summary statistics are known, highlight STATS and enter the summary statistics.
Step 4: Enter the confidence level following C-Level:.
Step 5: Highlight Calculate; press ENTER.
Confidence Intervals about p
Step 1: Press STAT, highlight TESTS, and select A: 1-PropZInt.
Step 2: Enter the values of x and n.
Step 3: Enter the confidence level following C-Level:.
Step 4: Highlight Calculate; press ENTER.