lecture 1

University of Khartoum
Faculty of Mathematical Science
Department of Information Technology
Applied Statistics (STAT 301)
Azza Osman Mohamed
Course content

o Statistical Estimates.
o Tests of Hypotheses.
o Correlation.
o Simple Linear Regression Analysis.
o Analysis of Variance.
o Non-parametric Tests.
o Statistical package SPSS.

Course aim:
The aim of this course is to develop further understanding of
statistical methods.

Outcome: By the end of this course you will be able to:
o Understand inferential statistics.
o Describe common measures of correlation and association, and perform simple regression analysis.
o Understand the workings of the analysis of variance table and its application to one-way and two-way ANOVA situations.
o Understand the workings of non-parametric methods.
o Perform statistical analysis using SPSS.
o Present and interpret the results.
Course evaluation:
o Assignments.
o Labs.
o Mid-term exam.
o Final exam.
Session 1
Learning Objectives

At the end of sessions 1 and 2 you will be able to:
o State the estimation process.
o Introduce the properties of point estimates.
o Explain confidence interval estimates.
o Compute confidence interval estimates for the population mean ($\sigma$ known and unknown).
o Compute confidence interval estimates for the population proportion.

Introduction to
Estimation
Point Estimation
Statistical Methods
[Diagram: Statistical Methods → Descriptive Statistics, Inferential Statistics; Inferential Statistics → Estimation, Hypothesis Testing]
Statistical Inference…
Statistical inference is the process by which we acquire information and
draw conclusions about populations from samples.
[Diagram: Data → Statistics → Information. A sample is drawn from the population; inference runs from the sample statistic back to the population parameter.]

In order to do inference, we require the skills and knowledge of descriptive
statistics, probability distributions, and sampling distributions.
Inference Process
[Diagram: Population → Sample → Sample statistics ($\bar{X}$, $p_s$) → Estimates & tests → Population]
Thinking Challenge

Suppose you’re interested in the average amount of money that students in this class (the population) have on them. How would you find out?
Estimation Methods
[Diagram: Estimation → Point Estimation, Interval Estimation]
Estimation…




The objective of estimation is to determine the approximate value
of a population parameter on the basis of a sample statistic.
An estimator is a method for producing a best guess about a
population value.
An estimate is a specific value provided by an estimator.
Example: We said that the sample mean is a good estimate of the
population mean
o The sample mean is an estimator
o A particular value of the sample mean is an estimate
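The distinction between an estimator and an estimate can be sketched in a few lines of Python. The function name `sample_mean_estimator` and both samples are hypothetical, made up only for illustration: the function plays the role of the estimator, and each number it returns is an estimate.

```python
from statistics import mean

def sample_mean_estimator(sample):
    """The estimator: a rule that maps any sample to a guess for the population mean."""
    return mean(sample)

# Two hypothetical samples (made-up values, for illustration only).
sample_a = [12, 15, 11, 14, 13]
sample_b = [10, 16, 12, 15, 17]

estimate_a = sample_mean_estimator(sample_a)  # one estimate
estimate_b = sample_mean_estimator(sample_b)  # a different estimate from the same estimator
print(estimate_a, estimate_b)
```

The same rule gives a different value for each sample, which is why the estimator has a sampling distribution while an estimate is just one number.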
Point Estimator…
Definition:

A point estimator draws inferences about a population by
estimating the value of an unknown parameter using a single value
or point.

Gives no information about how close the value is to the unknown population parameter.

Example: the sample mean ($\bar{X}$) is employed to estimate the population mean ($\mu$).
Population Parameters Are Estimated with Point Estimators

Estimate population parameter      with sample statistic
Mean          $\mu$                $\bar{X}$
Proportion    $p$                  $p_s$
Variance      $\sigma^2$           $s^2$
Differences   $\mu_1 - \mu_2$      $\bar{X}_1 - \bar{X}_2$
Point Estimator…

Question: Is there a unique estimator for a population parameter?
For example, is there only one estimator for the population mean?

The answer is that there may be many possible estimators

Those estimators must be ranked in terms of some desirable
properties that they should exhibit
Properties of Point Estimators

The choice of point estimator is based on the following criteria
o Unbiasedness
o Efficiency
o Consistency
Unbiased Estimators
Definition
A point estimator $\hat{\theta}$ is said to be an unbiased estimator of the population parameter $\theta$ if its expected value (the mean of its sampling distribution) is equal to the population parameter it is trying to estimate:

$$E(\hat{\theta}) = \theta$$

We can also define the bias of an estimator as follows:

$$\text{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$$
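The bias definition can be illustrated by simulation. The sketch below is not from the slides: it assumes a normal population with variance 4 and samples of size 5 (both arbitrary choices), and compares two variance estimators, one dividing the sum of squares by n and one dividing by n - 1. Averaging many estimates approximates the expected value of each estimator.

```python
import random

def variance_estimates(sample):
    """Return (divide-by-n, divide-by-(n-1)) variance estimates for one sample."""
    n = len(sample)
    m = sum(sample) / n
    ss = sum((x - m) ** 2 for x in sample)
    return ss / n, ss / (n - 1)

rng = random.Random(0)          # fixed seed so the run is repeatable
true_var = 4.0                  # population variance (sigma = 2), chosen for illustration
n, reps = 5, 20000

total_n = total_n1 = 0.0
for _ in range(reps):
    sample = [rng.gauss(0, 2) for _ in range(n)]
    v_n, v_n1 = variance_estimates(sample)
    total_n += v_n
    total_n1 += v_n1

mean_v_n = total_n / reps       # approximates E of the divide-by-n estimator
mean_v_n1 = total_n1 / reps     # approximates E of the divide-by-(n-1) estimator
print(f"bias (divide by n):   {mean_v_n - true_var:+.3f}")
print(f"bias (divide by n-1): {mean_v_n1 - true_var:+.3f}")
```

The divide-by-n version should come out clearly negative (it underestimates the variance on average), while the divide-by-(n-1) version should be close to zero.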
Properties of Point Estimators

To select the “best unbiased” estimator, we use the criterion of
efficiency
Efficiency:
Definition
An unbiased estimator is efficient if no other unbiased estimator of the particular population parameter has a lower sampling distribution variance.
If $\hat{\theta}_1$ and $\hat{\theta}_2$ are two unbiased estimators of the population parameter $\theta$, then $\hat{\theta}_1$ is more efficient than $\hat{\theta}_2$ if

$$V(\hat{\theta}_1) < V(\hat{\theta}_2)$$

The unbiased estimator of a population parameter with the lowest
variance out of all unbiased estimators is called the most efficient
or minimum variance unbiased estimator (MVUE).
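As a rough illustration of efficiency, the sketch below compares two estimators of the mean of a normal population: the sample mean and the sample median. The choice of these two estimators and all numeric settings (standard normal population, n = 25, 5000 repetitions) are assumptions made for the example, not values from the slides. Both estimators are unbiased for a symmetric population, so the one with the smaller sampling variance is the more efficient.

```python
import random
from statistics import median, pvariance

rng = random.Random(0)                      # fixed seed so the run is repeatable
mu, sigma, n, reps = 0.0, 1.0, 25, 5000     # illustrative values, not from the slides

means, medians = [], []
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    means.append(sum(sample) / n)
    medians.append(median(sample))

var_mean = pvariance(means)       # sampling variance of the sample mean
var_median = pvariance(medians)   # sampling variance of the sample median
print(f"V(mean)   ~ {var_mean:.4f}")
print(f"V(median) ~ {var_median:.4f}")
```

For normal data the sample mean should show the smaller variance, close to sigma^2 / n = 0.04.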
Properties of Point Estimators
Consistency:
Definition:
We say that an estimator is consistent if the probability of obtaining estimates close to the population parameter increases as the sample size increases.
One measure of the expected closeness of an estimator to the population parameter is its mean squared error.

The problem of selecting the most appropriate estimator
for a population parameter is quite complicated
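For reference, the mean squared error mentioned above is commonly defined as follows. This is the standard textbook definition, which the slides do not spell out; it splits into a variance part and a squared-bias part:

```latex
\text{MSE}(\hat{\theta}) = E\left[(\hat{\theta} - \theta)^2\right]
                         = V(\hat{\theta}) + \left[\text{Bias}(\hat{\theta})\right]^2
```

For example, the sample mean of n independent observations with variance $\sigma^2$ has zero bias and $\text{MSE}(\bar{X}) = \sigma^2 / n$, which shrinks toward 0 as n grows.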
Session 2
Introduction to
Estimation
Interval Estimation
Confidence Interval Estimation Process
[Diagram: Population (mean $\mu$ is unknown) → Random sample (mean $\bar{X} = 50$) → "I am 95% confident that $\mu$ is between 40 & 60."]
Interval Estimator…

An interval estimator draws inferences about a population by
estimating the value of an unknown parameter using an interval.
[Diagram: Confidence Interval, running from the Confidence Limit (Lower) through the Sample Statistic (Point Estimate) to the Confidence Limit (Upper)]

Provides us with a range of values that we believe, with a given level of confidence, contains the true value.

That is, we say (with some ___% certainty) that the population parameter of interest is between some lower and upper bounds.

Gives information about closeness to the unknown population parameter.
Point & Interval Estimation…

For example, suppose we want to estimate the mean summer income of a class of IT students. For n = 25 students, $\bar{X}$ is calculated to be 400 $/week. This is a point estimate.
An alternative statement is: the mean income is between 380 and 420 $/week. This is an interval estimate.
Confidence Interval (CI)

The interval $(\hat{\theta}_L, \hat{\theta}_U)$ is called the confidence interval for the parameter $\theta$.

The probability that the "true" parameter $\theta$ is in the interval $(\hat{\theta}_L, \hat{\theta}_U)$ is equal to $1 - \alpha$:

$$P(\hat{\theta}_L \le \theta \le \hat{\theta}_U) = 1 - \alpha$$

$1 - \alpha$ is called the confidence level (or confidence coefficient): the probability that the interval $(\hat{\theta}_L, \hat{\theta}_U)$ contains the parameter $\theta$.

The limits of the interval are called the lower and upper confidence limits.

Confidence Interval (CI)
An actual realization of the interval $(\hat{\theta}_L, \hat{\theta}_U)$ is called a $100(1 - \alpha)\%$ confidence interval.
We are $100(1 - \alpha)\%$ confident that the unknown parameter lies within the interval.
For example, we are 95% confident that the 95% confidence interval will include the population parameter.
$\alpha = 5\%$ is the probability that the parameter is not within the interval.
Typical confidence levels are 99%, 95%, 90%, …
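Interval and Level of Confidence
[Figure: sampling distribution of the mean, centered at $\mu_{\bar{X}} = \mu$, with central area $1 - \alpha$ between $\mu - Z_{\alpha/2}\sigma_{\bar{X}}$ and $\mu + Z_{\alpha/2}\sigma_{\bar{X}}$ and area $\alpha/2$ in each tail. Intervals extend from $\bar{X} - Z\sigma_{\bar{X}}$ to $\bar{X} + Z\sigma_{\bar{X}}$; $100(1 - \alpha)\%$ of intervals constructed contain $\mu$; $100\alpha\%$ do not.]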
Confidence Intervals
Know Central Intervals of the Normal Distribution
$\bar{X} = \mu \pm Z\sigma_{\bar{x}}$
[Figure: normal curve centered at $\mu$ with nested central intervals]
90% Confidence: $\mu \pm 1.65\sigma_{\bar{x}}$
95% Confidence: $\mu \pm 1.96\sigma_{\bar{x}}$
99% Confidence: $\mu \pm 2.58\sigma_{\bar{x}}$
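The statement that $100(1 - \alpha)\%$ of intervals constructed this way contain $\mu$ can be checked with a small simulation. This is a sketch under assumed values: the population ($\mu = 50$, $\sigma = 10$), the sample size and the number of repetitions are arbitrary choices, and the helper name `coverage` is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage(mu, sigma, n, level, reps, seed=1):
    """Fraction of simulated z-intervals (sigma known) that contain the true mean mu."""
    rng = random.Random(seed)                       # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # z_{alpha/2}
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        x_bar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / reps

rate = coverage(mu=50, sigma=10, n=25, level=0.95, reps=4000)
print(f"share of intervals containing mu: {rate:.3f}")
```

The printed share should land near 0.95. Each individual interval either contains $\mu$ or it does not; the confidence level describes the long-run share of intervals that do.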
Factors Affecting Interval Width

Intervals extend from $\bar{X} - Z\sigma_{\bar{X}}$ to $\bar{X} + Z\sigma_{\bar{X}}$.

1. Data Dispersion: measured by $\sigma_X$
2. Sample Size: $\sigma_{\bar{X}} = \sigma_X / \sqrt{n}$
3. Level of Confidence $(1 - \alpha)$: affects $Z$
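The three factors can be seen directly in the width of the interval, $2 Z_{\alpha/2}\,\sigma_X / \sqrt{n}$. The sketch below varies one factor at a time; the baseline values ($\sigma = 10$, n = 25, 95%) and the function name `interval_width` are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

def interval_width(sigma, n, level):
    """Width of the z-interval: 2 * z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return 2 * z * sigma / sqrt(n)

base = interval_width(sigma=10, n=25, level=0.95)
print(round(base, 2))                                          # baseline
print(round(interval_width(sigma=20, n=25, level=0.95), 2))    # more dispersion  -> wider
print(round(interval_width(sigma=10, n=100, level=0.95), 2))   # larger sample    -> narrower
print(round(interval_width(sigma=10, n=25, level=0.99), 2))    # more confidence  -> wider
```

Because n enters through $\sqrt{n}$, the sample size has to be quadrupled to halve the width.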
Confidence Interval Estimates
[Diagram: Confidence Intervals → Mean ($\sigma_x$ known, $\sigma_x$ unknown), Proportion, Variance]
Estimating μ when σ is known…
The quantities involved are:
o $Z$: known, i.e. standard normal distribution
o $\sigma$: known, i.e. it is assumed we know the population standard deviation…
o $\bar{X}$: known, i.e. sample mean
o $\mu$: unknown, i.e. we want to estimate the population mean
o $n$: known, i.e. the number of items sampled
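Confidence Interval Estimator for μ

$$\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

Usually represented with a “plus/minus” ( ± ) sign:
o upper confidence limit (UCL): $\bar{X} + Z_{\alpha/2}\,\sigma/\sqrt{n}$
o lower confidence limit (LCL): $\bar{X} - Z_{\alpha/2}\,\sigma/\sqrt{n}$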
Four commonly used confidence levels…
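As a sketch, the critical value $Z_{\alpha/2}$ that goes with a confidence level can be computed from the standard normal distribution using Python's standard library. The four levels used here (0.90, 0.95, 0.98, 0.99) are an assumption chosen for illustration, and `z_critical` is a hypothetical helper name.

```python
from statistics import NormalDist

def z_critical(level):
    """Return z_{alpha/2} for a two-sided interval with the given confidence level."""
    alpha = 1 - level
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.98, 0.99):   # assumed set of levels
    alpha = 1 - level
    print(f"1-alpha={level:.2f}  alpha={alpha:.2f}  "
          f"alpha/2={alpha / 2:.3f}  z={z_critical(level):.3f}")
```

A higher confidence level leaves less area in the tails, so the critical value, and with it the interval width, grows.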
Example …

A computer company samples demand during lead time over 25 time periods:

235  374  309  499  253
421  361  514  462  369
394  439  348  344  330
261  374  302  466  535
386  316  296  332  334

It is known that the standard deviation of demand over lead time is 75 computers. We want to estimate the mean demand over lead time with 95% confidence in order to set inventory levels…
Example …

“We want to estimate the mean demand over lead time with 95%
confidence in order to set inventory levels…”

Thus, the parameter to be estimated is the population mean μ.
And so our confidence interval estimator will be:

$$\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
Example …

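In order to use our confidence interval estimator, we need the following pieces of data:
o $\bar{X} = 370.16$ (calculated from the data)
o $Z_{\alpha/2} = 1.96$ (for 95% confidence)
o $\sigma = 75$ and $n = 25$ (given)

therefore:

$$\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 370.16 \pm 1.96 \cdot \frac{75}{\sqrt{25}} = 370.16 \pm 29.40$$

The lower and upper confidence limits are 340.76 and 399.56.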
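The calculation above can be reproduced with a short script. The data, $\sigma = 75$ and the critical value 1.96 come from the example; the variable names are only illustrative.

```python
from math import sqrt

# Demand during lead time over 25 periods (from the example).
demand = [235, 374, 309, 499, 253,
          421, 361, 514, 462, 369,
          394, 439, 348, 344, 330,
          261, 374, 302, 466, 535,
          386, 316, 296, 332, 334]

sigma = 75      # known population standard deviation (given)
z = 1.96        # z_{alpha/2} for 95% confidence

n = len(demand)
x_bar = sum(demand) / n
margin = z * sigma / sqrt(n)
lcl, ucl = x_bar - margin, x_bar + margin
print(f"x_bar = {x_bar:.2f}, LCL = {lcl:.2f}, UCL = {ucl:.2f}")
```

This prints a sample mean of 370.16 with limits 340.76 and 399.56, matching the figures in the example.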
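Thinking Challenge
The mean of a random sample of n = 25 is $\bar{X} = 50$. Set up a 95% confidence interval estimate for $\mu_X$ if $\sigma_X^2 = 100$.

$$\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

$$50 - 1.96 \cdot \frac{10}{\sqrt{25}} \le \mu \le 50 + 1.96 \cdot \frac{10}{\sqrt{25}}$$

$$46.08 \le \mu \le 53.92$$

What is the interval for sample size = 100?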
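One way to work the follow-up question, assuming the sample mean (50) and the population variance (100) stay the same and only n changes to 100:

```latex
50 \pm 1.96 \cdot \frac{10}{\sqrt{100}} = 50 \pm 1.96
\quad\Longrightarrow\quad
48.04 \le \mu \le 51.96
```

Under these assumptions, quadrupling the sample size from 25 to 100 halves the width of the interval.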
Confidence Interval for Mean of a Normal Distribution with Unknown Variance

If the sample size is large (n ≥ 30):
o The population variance need not be known.
o The sample standard deviation will be a sufficiently good estimator of the population standard deviation.

$$Z = \frac{\bar{X} - \mu}{s / \sqrt{n}}$$

Thus, the confidence interval for the population mean is:

$$\bar{X} - Z_{\alpha/2}\frac{s}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\frac{s}{\sqrt{n}}$$
Confidence Interval for Mean of a Normal
Distribution with Unknown Variance

If the sample size is small and the population variance is unknown,
we cannot use the standard normal distribution

If we replace the unknown $\sigma$ with the sample standard deviation $s$, the following quantity

$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}}$$

follows Student’s t distribution with (n – 1) degrees of freedom.

The t-distribution has mean 0 and (n – 1) degrees of freedom

As degrees of freedom increase, the t-distribution approaches the
standard normal distribution
Student’s t Distribution
Estimates the distribution of the sample mean, $\bar{X}$, when the distribution to be sampled is normal.
[Figure: standard normal (Z) curve compared with t (df = 13) and t (df = 5) curves, all centered at 0. The t curves are bell-shaped and symmetric, with ‘fatter’ tails.]
Confidence Interval for Mean of a Normal
Distribution with Unknown Variance

A $100(1 - \alpha)\%$ confidence interval for the population mean $\mu$ when we draw small samples from a normal distribution with an unknown variance $\sigma^2$ is given by

$$\bar{X} \pm t_{n-1,\,\alpha/2}\frac{s}{\sqrt{n}}$$
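Student’s t Table

Upper-tail area $\alpha/2$:
v    t.10     t.05     t.025
1    3.078    6.314    12.706
2    1.886    2.920     4.303
3    1.638    2.353     3.182

Assume: n = 3, df = n - 1 = 2, $\alpha$ = .10, $\alpha/2$ = .05. The table gives t = 2.920.
[Figure: t curve centered at 0 with upper-tail area $\alpha/2$ = .05 to the right of t = 2.920]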
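Estimation Example
Mean ($\sigma$ Unknown)
A random sample of n = 25 has $\bar{X} = 50$ and s = 8. Set up a 95% confidence interval estimate for $\mu$.

$$\bar{X} - t_{n-1,\,\alpha/2}\frac{s}{\sqrt{n}} \le \mu \le \bar{X} + t_{n-1,\,\alpha/2}\frac{s}{\sqrt{n}}$$

$$50 - 2.064 \cdot \frac{8}{\sqrt{25}} \le \mu \le 50 + 2.064 \cdot \frac{8}{\sqrt{25}}$$

$$46.70 \le \mu \le 53.30$$

with 95% confidence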
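The same arithmetic can be written as a small helper. This is a sketch: `t_interval` is a hypothetical function name, and because Python's standard library has no Student's t quantile function, the critical value is passed in as read from a t-table (2.064 for df = 24 and $\alpha/2$ = .025, the value used in the example).

```python
from math import sqrt

def t_interval(x_bar, s, n, t_crit):
    """Interval x_bar ± t_crit * s / sqrt(n); t_crit is read from a t-table (df = n - 1)."""
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# t_{24, .025} = 2.064 is the table value used in the example above.
lower, upper = t_interval(x_bar=50, s=8, n=25, t_crit=2.064)
print(f"{lower:.2f} <= mu <= {upper:.2f}")
```

The margin is 2.064 × 8 / 5 = 3.3024, so the limits are 46.6976 and 53.3024 before rounding.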
Thinking Challenge
For a sample where the sample size = 9, the sample mean = 28 and the sample s.d. = 3, what is the closest 95% confidence interval of the mean?
Select:
A for [27, 29]
B for [26.5, 29.5]
C for [26, 30]
D for [25.25, 30.75]
E for [24.5, 31.5]

Confidence Interval
For the Population Proportion

If we want to estimate the unknown population proportion and n is large, then:

$$\hat{p} = \frac{x}{n} \quad \text{and} \quad Z = \frac{\hat{p} - p}{\sqrt{\hat{p}(1 - \hat{p})/n}}$$

where x is the number of successes.

Confidence interval estimate (with $\hat{q} = 1 - \hat{p}$):

$$\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \le p \le \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$

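Example ….
A random sample of 400 graduates showed 32 went to graduate school. Set up a 95% confidence interval estimate for p.

$$\hat{p} = \frac{32}{400} = .08, \quad \hat{q} = 1 - \hat{p} = .92$$

$$\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \le p \le \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$

$$.08 - 1.96\sqrt{\frac{.08 \cdot .92}{400}} \le p \le .08 + 1.96\sqrt{\frac{.08 \cdot .92}{400}}$$

$$.053 \le p \le .107$$

with 95% confidence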
Thinking Challenge
You’re a production manager for a newspaper. You want to find the % defective. Of 200 newspapers, 35 had defects. What is the 90% confidence interval estimate of the population proportion defective?
Solution ….

$$\hat{p} = \frac{35}{200} = .175, \quad \hat{q} = 1 - \hat{p} = .825$$

$$\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \le p \le \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$

$$.175 - 1.645\sqrt{\frac{.175 \cdot (.825)}{200}} \le p \le .175 + 1.645\sqrt{\frac{.175 \cdot (.825)}{200}}$$

$$.1308 \le p \le .2192$$

with 90% confidence
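Both proportion intervals can be checked with one helper. This is a sketch using only the standard library; `proportion_interval` is a hypothetical name, and the critical value is computed from the normal distribution rather than read from a table, so it differs from 1.96 and 1.645 only beyond the third decimal.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, level):
    """Large-sample interval p_hat ± z_{alpha/2} * sqrt(p_hat * q_hat / n)."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

grad_low, grad_high = proportion_interval(32, 400, 0.95)   # graduate-school example
news_low, news_high = proportion_interval(35, 200, 0.90)   # newspaper example
print(f"{grad_low:.3f} <= p <= {grad_high:.3f}")
print(f"{news_low:.4f} <= p <= {news_high:.4f}")
```

The results agree with the hand calculations: roughly .053 to .107 for the graduates and .1308 to .2192 for the newspapers.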