
Review all chapters s2-2021-PRINT

Lê Thị Mai Trang
REVIEW CHAPTER 1: OVERVIEW AND DESCRIPTIVE STATISTICS
Two types of statistics are descriptive statistics and inferential statistics.
A. Descriptive statistics: (page 2)
1. Pictorial and Tabular Methods:
1.1 Stem-and-Leaf Displays
1.2 Dotplots
1.3 Histogram
- Histogram for discrete data
- Histogram for continuous data: select the class interval [a;b)
+ Class widths are equal.
+ Class widths are unequal.
+ Histogram shapes: 1 peak, 2 peaks, more than two peaks, symmetric, positively skewed, negatively skewed.
+ Multivariate Data
2. Measurement
2.1 Measures of location: (page 28)
- Mean: x̄ = (1/n)·Σᵢ₌₁ⁿ xᵢ
- Median (x̃): The sample median is obtained by first ordering the n observations from smallest to largest (with any repeated values included so that every sample observation appears in the ordered list). Then,
x̃ = the ((n + 1)/2)th ordered value, if n is odd
x̃ = the average of the (n/2)th and (n/2 + 1)th ordered values, if n is even
Both mean and median describe where the data is centered, but they will not in general be equal, because they focus on different aspects of the sample. The median is the middle value in the sample, and it is very insensitive to outliers.
- Trimmed means (a%): The mean is quite sensitive to a single outlier, whereas the median is impervious to many outliers. A trimmed mean is a compromise between the mean and the median.
- Quartiles, percentiles: The median (population or sample) divides the data set into two parts of equal size; quartiles divide the data set into four equal parts. Similarly, a data set (sample or population) can be even more finely divided using percentiles.
- Sample proportion: f = m/n
2.2 Measures of Variability: (page 35)
- The sample variance: s² = Σᵢ₌₁ⁿ (xᵢ − x̄)² / (n − 1)
  Shortcut: s² = Sₓₓ/(n − 1), where Sₓₓ = Σ(xᵢ − x̄)² = Σxᵢ² − (Σxᵢ)²/n
- The sample standard deviation: s = √(s²)
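A quick numeric check of these formulas (the data here are made up for illustration):

```python
import statistics

# Hypothetical sample.
data = [10.2, 9.8, 11.5, 10.0, 14.3, 9.9]

n = len(data)
mean = sum(data) / n                                  # x-bar
median = statistics.median(data)                      # middle value of the ordered sample
var = sum((x - mean) ** 2 for x in data) / (n - 1)    # sample variance s^2
sd = var ** 0.5                                       # sample standard deviation s

# The shortcut Sxx = sum(x^2) - (sum(x))^2 / n gives the same s^2:
sxx = sum(x * x for x in data) - sum(data) ** 2 / n
assert abs(var - sxx / (n - 1)) < 1e-6
```

Note how the single large value 14.3 pulls the mean (10.95) above the median (10.1), illustrating the mean's sensitivity to outliers.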
2.3 Boxplots:
There are two ways to make a boxplot, depending on whether or not the median is included in both halves of the data.
- Boxplots that show outliers
B. Inferential statistics: estimation, tests of hypotheses, regression
C. Software: Minitab, R, SAS, S-Plus
REVIEW CHAPTER 2: PROBABILITY
2.1 Sample spaces and events: (page 51)
- Experiment: An experiment is any activity or process whose outcome is subject to uncertainty.
  Experiment ⇒ outcomes ⇒ all the outcomes = sample space
- Sample space Ω: the set of all possible outcomes.
- Event: an event is a subset of Ω.
- Simple event: an event consisting of exactly one outcome.
- Compound event: an event consisting of more than one outcome.
- Empty set ∅: an event consisting of no outcome.
Some relations from set theory: (page 53)
a. The union: C = A + B or C = A ∪ B (at least one of the events occurs)
   Probability of a union of events: P(A ∪ B) = P(A) + P(B) − P(AB)
   More generally: P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(AB) − P(AC) − P(BC) + P(ABC)
   If A and B are mutually exclusive events: P(A ∪ B) = P(A) + P(B).
b. The intersection: C = A·B or C = A ∩ B
c. The complement: Ā or Aᶜ ; A ∪ Ā = Ω ; A·Ā = ∅.
d. Mutually exclusive (or disjoint) events: A·B = ∅.
e. Independence: Let A and B be two events in some probability space. Event A is independent of event B if the occurrence of A does not affect the probability that B occurs, and vice versa. In particular: P(A·B) = P(A)·P(B)
f. Partition: Events A₁, A₂, …, Aₙ are said to form a partition of Ω if
   A₁ ∪ A₂ ∪ … ∪ Aₙ = Ω and AᵢAⱼ = ∅ for all 1 ≤ i < j ≤ n (mutually exclusive).
2.2 Axioms, Interpretations and Properties of Probability: (page 55)
- Definition: P(A) = |A|/|Ω| = m_A/n
- Properties: ∀A: 0 ≤ P(A) ≤ 1 ; P(Ω) = 1, P(∅) = 0 ; P(Ā) = 1 − P(A)
2.3 Counting techniques: (page 64)
a. Addition rule (event 1 or 2 or … or k will occur): m = m₁ + m₂ + … + m_k
b. Multiplication rule (event 1 and 2 and … and k will occur): n = n₁·n₂·…·n_k
c. Permutation: Pₙ = n!
   Permutations with repeated elements: n!/(k!·l!), where k + l = n.
d. Arrangement (k-permutations of n): Aₙᵏ = P_{k,n} = n!/(n − k)!, k ≤ n
e. Combination: Cₙᵏ = (n choose k) = n!/(k!·(n − k)!), k ≤ n
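These counting formulas can be checked directly with Python's standard library (the values of n and k below are arbitrary):

```python
from math import comb, perm, factorial

n, k = 10, 3

# Permutation of all n objects: P_n = n!
assert factorial(n) == 3628800

# Arrangement (k-permutations of n): A_n^k = n!/(n-k)!
a = perm(n, k)                                        # 10 * 9 * 8
assert a == factorial(n) // factorial(n - k) == 720

# Combination: C_n^k = n!/(k!(n-k)!)
c = comb(n, k)
assert c == factorial(n) // (factorial(k) * factorial(n - k)) == 120

# Permutations with repeated elements: k identical items of one kind and
# l of another, with k + l = n, give n!/(k! l!) = C(n, k) arrangements.
k2, l2 = 4, 6
assert factorial(n) // (factorial(k2) * factorial(l2)) == comb(n, k2)
```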
2.4 Conditional probability: (page 73)
- Conditional probability: The conditional probability of A given B is defined by
  P(A|B) = P(AB)/P(B)
- Probability of an intersection of events: P(AB) = P(A)·P(B|A)
  More generally: P(ABC) = P(A)·P(B|A)·P(C|AB)
- Event A is independent of event B if and only if: P(AB) = P(A)·P(B)
- Bernoulli formula: P(X = k) = Cₙᵏ·pᵏ·(1 − p)ⁿ⁻ᵏ
- The law of total probability:
  P(B) = Σᵢ₌₁ⁿ P(Aᵢ)·P(B|Aᵢ) = P(A₁)·P(B|A₁) + … + P(Aₙ)·P(B|Aₙ)
- Bayes' formula: P(Aᵢ|B) = P(Aᵢ)·P(B|Aᵢ)/P(B) , i = 1, …, n
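A small worked example of total probability and Bayes' formula (the machines and rates below are invented for illustration):

```python
# Hypothetical setup: machines A1, A2, A3 produce 50%, 30%, 20% of all items,
# with defect rates 1%, 2%, 3%. Let B = "a randomly chosen item is defective".
prior = [0.5, 0.3, 0.2]       # P(Ai): the Ai form a partition of the sample space
lik   = [0.01, 0.02, 0.03]    # P(B | Ai)

# Law of total probability: P(B) = sum of P(Ai) * P(B|Ai)
p_b = sum(p * l for p, l in zip(prior, lik))

# Bayes' formula: P(Ai | B) = P(Ai) * P(B|Ai) / P(B)
posterior = [p * l / p_b for p, l in zip(prior, lik)]

assert abs(p_b - 0.017) < 1e-12     # 0.005 + 0.006 + 0.006
assert abs(sum(posterior) - 1.0) < 1e-12
```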
REVIEW CHAPTER 3: DISCRETE RANDOM VARIABLES and PROBABILITY
DISTRIBUTIONS
3.1 Random variables: (page 93)
A random variable is a real-valued function on Ω.
Classification of random variables:
- Discrete random variable: the set of possible values of X is finite or countably infinite.
- Continuous random variable: the set of possible values of X is uncountably infinite.
3.2 Probability distributions for discrete random variables: (page 96)
a/ The probability mass function of X (pmf) is p(x) = P(X = x), with
p(x) ≥ 0 and Σᵢ₌₁ⁿ p(xᵢ) = 1.
Table of the pmf:
X:      x₁     …  xₙ
p(xᵢ):  p(x₁)  …  p(xₙ)
b/ Cumulative distribution function (CDF): F(x)  (for both discrete and continuous random variables)
F(x) = P(X ≤ x)
Note: If X is a discrete random variable, then F(x) = P(X ≤ x) = Σ_{y ≤ x} p(y), ∀x ∈ ℝ
3.3 Expected value and variance: (page 106)
a/ Expected value (mean value):
E(X) = μ_X = Σ_{x∈D} x·p(x) = x₁·p(x₁) + x₂·p(x₂) + … + xₙ·p(xₙ)
Properties:
1/ E(c) = c , c = const
2/ E(c·X) = c·E(X)
3/ E(X + Y) = E(X) + E(Y)
4/ E(X·Y) = E(X)·E(Y) if X and Y are independent.
5/ E[h(X)] = Σ_{x∈D} h(x)·p(x)
b/ Variance:
V(X) = σ² = E(X²) − (EX)² , with E(X²) = Σ_{x∈D} x²·p(x).
The standard deviation (SD): σ = √V(X)
Note: V(X) = Σ_D (x − EX)²·p(x) = E[(X − EX)²] = E(X²) − (EX)²
Properties:
1/ V(X) ≥ 0
2/ V(c) = 0 , c = const
3/ V(c·X) = c²·V(X)
4/ V(X + Y) = V(X) + V(Y) if X, Y are independent.
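A short check that the two variance formulas agree, using an invented pmf table:

```python
# Pmf table for a hypothetical discrete rv X.
xs = [0, 1, 2, 3]
ps = [0.1, 0.3, 0.4, 0.2]
assert abs(sum(ps) - 1.0) < 1e-12                   # a pmf must sum to 1

ex  = sum(x * p for x, p in zip(xs, ps))            # E(X)
ex2 = sum(x * x * p for x, p in zip(xs, ps))        # E(X^2)
var = ex2 - ex ** 2                                 # V(X) = E(X^2) - (EX)^2

# Same variance via the definition E[(X - EX)^2]:
var2 = sum((x - ex) ** 2 * p for x, p in zip(xs, ps))
assert abs(var - var2) < 1e-12
```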
3.4 -3.6 The probability distributions ( discrete r.v )
1/ The binomial probability distribution: X ~ Bin(n, p) (page 114)
A binomial experiment: The experiment consists of a sequence of n smaller experiments called trials. Each trial can result in one of the same two possible outcomes, denoted success (S) and failure (F). The probability of success P(S) is the same value p on every trial, and P(F) = 1 − p.
Binomial distribution: Let X denote the total number of successes in the n trials. The distribution of X is called the binomial distribution with parameters n and p.
The pmf of X is b(x; n, p):
b(x; n, p) = P(X = x) = Cₙˣ·pˣ·(1 − p)ⁿ⁻ˣ for x = 0, 1, 2, …, n, and 0 otherwise  (Bernoulli formula)
The CDF of X is B(x; n, p):
B(x; n, p) = P(X ≤ x) = Σ_{y=0}^{x} b(y; n, p) ; x = 0, 1, …, n
Properties: EX = μ = np ; V(X) = σ² = npq  (where q = 1 − p)
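A numeric check of the binomial pmf and its mean and variance (n and p chosen arbitrarily):

```python
from math import comb

def binom_pmf(x, n, p):
    """b(x; n, p) = C(n, x) * p^x * (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.3
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]
assert abs(sum(pmf) - 1.0) < 1e-12                  # probabilities sum to 1

mean = sum(x * q for x, q in zip(range(n + 1), pmf))
var  = sum((x - mean) ** 2 * q for x, q in zip(range(n + 1), pmf))
assert abs(mean - n * p) < 1e-9                     # EX = np
assert abs(var - n * p * (1 - p)) < 1e-9            # V(X) = npq

# CDF: B(3; n, p) = sum of b(y; n, p) for y <= 3
cdf_3 = sum(pmf[:4])
```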
2/ Hypergeometric distribution: (page 122)
If X has a hypergeometric distribution, then
P(X = x) = h(x; n, M, N) = (C_Mˣ · C_{N−M}^{n−x}) / C_Nⁿ
EX = n·M/N ; V(X) = n·p·(1 − p)·(N − n)/(N − 1) , with p = M/N
3/ The negative binomial distribution: (page 125) ( different from your book)
Let Y be the number of trials in a sequence of independent and identically distributed Bernoulli trials until the r-th success occurs.
P(Y = n) = C_{n−1}^{r−1}·p^{r−1}·(1 − p)^{n−r}·p , for n ≥ r.
Properties: EY = r/p ; V(Y) = r·(1 − p)/p²
4/ The Poisson distribution: X ~ P(λ) with parameter λ > 0 (page 128)
X is the number of successful trials.
p(x; λ) = P(X = x) = e^{−λ}·λˣ/x!  (x = 0, 1, 2, …)
Properties: EX = V(X) = λ
Note:
- X ~ Bin(n, p), with n and p known: when n is very large (n > 50), p is very small, and np = λ ≤ 5, then X ≈ P(λ).
- X ~ Bin(n, p), with n and p unknown but the average λ given: then X ≈ P(λ).
REVIEW CHAPTER 4: CONTINUOUS RANDOM VARIABLES and
PROBABILITY DISTRIBUTIONS
4.1 Probability density function (pdf): f_X (page 138)
X is a continuous-type random variable if X has a pdf satisfying
1/ f(x) ≥ 0 , ∀x ∈ ℝ
2/ ∫_{−∞}^{+∞} f(x)dx = 1
3/ P(a ≤ X ≤ b) = P(a < X ≤ b) = P(a ≤ X < b) = P(a < X < b) = ∫_a^b f(x)dx
4/ P(X = a) = 0
4.2 Cumulative distribution functions (CDF) and expected values:
a/ Cumulative distribution functions F(x) (X is a continuous r.v) (page 143)
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(y)dy
Note: F′(x) = f(x)
Properties: P(X > a) = 1 − F(a) ; P(a ≤ X ≤ b) = F(b) − F(a)
b/ Percentiles of a Continuous Distribution: (page 146)
Let p be a number between 0 and 1. The (100p)th percentile of the distribution of a continuous rv X, denoted by η(p), is defined by p = F(η(p)) = ∫_{−∞}^{η(p)} f(y)dy
c/ Median:
The median of a continuous distribution, denoted by μ̃, is the 50th percentile, so μ̃ satisfies F(μ̃) = 0.5. That is, half the area under the density curve is to the left of μ̃ and half is to the right of μ̃.
d/ Expected value (page 148): μ_X = E(X) = ∫_{−∞}^{+∞} x·f(x)dx.
Note: μ_{h(X)} = E[h(X)] = ∫_{−∞}^{+∞} h(x)·f(x)dx
e/ Variance (page 150):
V(X) = σ_X² = E(X²) − (EX)² = ∫_{−∞}^{+∞} x²·f(x)dx − (∫_{−∞}^{+∞} x·f(x)dx)²
Note: The second way is V(X) = E[(X − μ)²] = ∫_{−∞}^{+∞} (x − μ)²·f(x)dx
f/ The standard deviation (SD): σ = √V(X).
4.3 The Probability Distributions: ( continuous r.v):
1/ Uniform distribution: (page 140)
X is uniformly distributed over the interval [a; b] if
f_X(u) = 1/(b − a) for a ≤ u ≤ b, and 0 elsewhere.
Properties: EX = (a + b)/2 ; Var X = (b − a)²/12
2/ The normal distribution: (page 152)
Z  N (0;1)
a/ The standard normal distribution:
-
Pdf of Z :
f (z;0,1) 
1
2
,   0 ;  1
(page 153)
2
e z /2
z
-
CDF of Z : P ( Z  z )   f ( y; 0,1) dy   ( z )

P(Z  a)   (a)
P(Z  b)  1   (b)
-
Properties:
P ( a  Z  b )   (b )   ( a )
 (z)  1 when z  3.49
 (z)  0 when z  3.49
b/ Percentiles of the standard normal distribution: (page 155)
z will denote the value on the z axis for which  of the area under the z curve
lies to the right of z . Thus z is the 100(1-  )th percentile of the standard normal
distribution.
X  N ( ,  2 )
c/ The normal distribution:
1
2
e ( x  ) /(2 )
-
Pdf of X : f ( x;  ,  ) 
-
Properties: EX   ; V ( X )   2
2
2
(page 152)
Note: If X  N (  ,  2 ) , then the standardized version of X ,
namely Z 
X 

 N (0;1) is a standard normal random variable.
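Normal probabilities via standardization can be computed with the error function, since Φ(z) = (1 + erf(z/√2))/2 (the μ and σ below are hypothetical):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# If X ~ N(mu, sigma^2), then P(a <= X <= b) = Phi((b-mu)/sigma) - Phi((a-mu)/sigma).
mu, sigma = 100.0, 15.0                                  # hypothetical parameters
p = phi((130 - mu) / sigma) - phi((70 - mu) / sigma)     # P(70 <= X <= 130) = P(|Z| <= 2)

assert abs(phi(0.0) - 0.5) < 1e-12
assert phi(3.49) > 0.999            # Phi(z) is essentially 1 for z >= 3.49
assert abs(p - (2 * phi(2.0) - 1)) < 1e-12
```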
d/ Approximating the Binomial Distribution: (page 160)
Let X be a binomial rv based on n trials with success probability p. Then, if the binomial probability histogram is not too skewed, X has approximately a normal distribution with μ = np and σ² = npq.
In particular, for a possible value x of X,
P(X ≤ x) = B(x; n, p) ≈ Φ((x + 0.5 − np)/√(np(1 − p)))
In practice, the approximation is adequate provided that both np ≥ 10 and n(1 − p) ≥ 10, since there is then enough symmetry in the underlying binomial distribution.
3/ The exponential distribution : (page 165)
- X has the exponential distribution with parameter λ > 0 if its pdf is given by
  f(x) = λ·e^{−λx} for x > 0, and 0 otherwise.
- The CDF of X is F(x; λ) = P(X ≤ x) = 1 − e^{−λx} for x ≥ 0, and 0 for x < 0.
- Properties: μ = 1/λ ; σ² = 1/λ².
4/ The Gamma distributions: (page 167)
- For α > 0, the gamma function is defined by Γ(α) = ∫₀^∞ x^{α−1}·e^{−x} dx.
- A continuous random variable X is said to have a gamma distribution if the pdf of X is
  f(x; α, β) = (1/(β^α·Γ(α)))·x^{α−1}·e^{−x/β} for x ≥ 0, and 0 otherwise,
  where the parameters satisfy α > 0, β > 0.
- Properties: E(X) = μ = αβ ; V(X) = σ² = αβ²
- The standard gamma distribution has β = 1, so
  f(x; α, 1) = x^{α−1}·e^{−x}/Γ(α) for x ≥ 0, and 0 otherwise.
- When X is a standard gamma rv, the cdf of X is F(x; α) = ∫₀ˣ (y^{α−1}·e^{−y}/Γ(α)) dy , x > 0.
- Proposition: Let X have a gamma distribution with parameters α, β. Then for any x > 0, the cdf of X is given by P(X ≤ x) = F(x; α, β) = F(x/β; α), where F(·; α) is the incomplete gamma function.
5/ The chi-squared distribution: X ~ χ² with parameter ν (page 169)
X is said to have a chi-squared distribution with parameter ν if the pdf of X is the gamma density with α = ν/2, β = 2. The pdf of a chi-squared rv is thus
f(x; ν) = (1/(2^{ν/2}·Γ(ν/2)))·x^{ν/2−1}·e^{−x/2} for x > 0, and 0 for x ≤ 0.
The parameter ν is called the number of degrees of freedom (df) of X. The symbol χ² is often used in place of "chi-squared."
6/ The Student distribution:
4.4 Other continuous distributions: (read book) the Weibull distribution, the Beta distribution, the lognormal distribution.
REVIEW CHAPTER 5: JOINT PROBABILITY DISTRIBUTIONS AND
RANDOM SAMPLES (read book)
REVIEW CHAPTER 6,7: ESTIMATION
- Population: (page 3) An investigation will typically focus on a well-defined collection of objects constituting a population of interest.
- Sample: a subset of the population, selected in some prescribed manner.
Notation:
- Population: N = size of the population; M_A (or X) = the number of population successes; p = M_A/N = population proportion; μ = population mean; σ² = population variance; σ = population standard deviation.
- Sample: n = size of a random sample; m_A = the number of sample successes; p̂ = m_A/n (or f = m_A/n) = sample proportion; x̄ = sample mean; s² = sample variance; s = sample standard deviation (xσ_{n−1} on the calculator).
 Calculator fx-570ES for statistics:
Step 1: Shift Mode ▼ 4 (STAT) 1 (ON)  (to turn on the frequency column)
Step 2: Mode 3 (STAT) 1 (1-VAR). Input the data, then press AC.
Step 3: Shift 1 (STAT) 5 (Var): 1. n ; 2. x̄ ; 4. xσ_{n−1} or s
Note: Shift 1 (STAT) 3 (Edit) 2 (Del) deletes the data; Mode 1 exits.
 Calculator fx-580:
Step 1: Shift Menu ▼ 1  (frequency column)
Step 2: Menu 6 1. Input the data, then press AC.
Step 3: OPTN 2 2: you can see n, x̄, xσ_{n−1} or s.
1. Point estimation: (page 240)
- Unbiased estimators: (page 243) A point estimator θ̂ is said to be an unbiased estimator of θ if E(θ̂) = θ for every possible value of θ.
x̄ is an unbiased estimator of μ.
s² is an unbiased estimator of σ².
p̂ is an unbiased estimator of p.
2. Interval estimation or confidence interval (CI): (page 267)
- Confidence level: γ, where γ = 1 − α (α is the significance level)
2.1. Confidence interval for the population mean μ: (page 270)
a/ Two-sided confidence bound:
Let μ be the population mean (μ is unknown). Find the CI for μ with confidence level γ = 1 − α.
- Sample mean: x̄
- Precision (or error of estimation) ε:
Case 1 (page 272): σ² is known: ε = z_{α/2}·σ/√n  (read ex 7.3 page 272)
Case 2 (page 277): σ² is unknown, n is sufficiently large (n > 40): ε = z_{α/2}·s/√n  (using normal table Z) (read ex 7.6 page 278)
Case 3 (page 285): σ² is unknown, n is small: ε = t_{α/2; n−1}·s/√n  (using the T-Student table) (read ex 7.11 page 288; ex 7.12 page 289)
- Conclusion: The CI for the population mean is (x̄ − ε < μ < x̄ + ε)
(See the Student distribution in your book, page 286)
b/ One-sided confidence bound: (page 283) (read ex 7.10 page 283)
- An upper confidence bound for μ is: μ < x̄ + z_α·s/√n  or  μ < x̄ + t_{α; n−1}·s/√n
- A lower confidence bound for μ is: μ > x̄ − z_α·s/√n  or  μ > x̄ − t_{α; n−1}·s/√n
Note: Φ(z_{α/2}) = 1 − α/2 ; Φ(z_α) = 1 − α  (using table Z)
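A worked large-sample CI for μ (Case 2), using invented summary statistics and the standard z_{α/2} = 1.96 for 95% confidence:

```python
from math import sqrt

# Hypothetical summary statistics from a large sample.
xbar, s, n = 25.3, 4.2, 100
z_half = 1.96                      # z_{alpha/2} for gamma = 0.95 (alpha = 0.05)

eps = z_half * s / sqrt(n)         # precision (error of estimation)
lo, hi = xbar - eps, xbar + eps    # CI: (xbar - eps, xbar + eps)

assert lo < xbar < hi
assert abs((hi - lo) - 2 * eps) < 1e-12
```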
c/ Find the sample size, confidence level:
- Find the sample size n: ε = z_{α/2}·s/√n ⇒ n = (z_{α/2}·s/ε)²
  The width is w = 2ε = 2·z_{α/2}·s/√n ⇒ n = 4·z²_{α/2}·s²/w²
  (read ex 7.4 page 273; ex 7.7 page 279)
- Find the confidence level when the precision is known:
  ε = z_{α/2}·s/√n ⇒ z_{α/2} = ε·√n/s ⇒ Φ(z_{α/2}) = 1 − α/2 ⇒ γ = ? (two-sided)
  ε = z_α·s/√n ⇒ z_α = ε·√n/s ⇒ Φ(z_α) = 1 − α ⇒ γ = ? (one-sided)
2.2. CI for the population proportion p:
a/ Two-sided confidence bound: (page 280)
Let p be the proportion of "successes" in a population (p is unknown); find the CI for p with confidence level γ = 1 − α.
- Sample proportion: p̂ = m_A/n
- Precision (error of estimation): ε = z_{α/2}·√(p̂(1 − p̂)/n)  (using table Z)
- Conclusion: The CI for the population proportion is (p̂ − ε < p < p̂ + ε)
b/ One-sided confidence bound:
- An upper confidence bound for p is: p < p̂ + z_α·√(p̂(1 − p̂)/n)
- A lower confidence bound for p is: p > p̂ − z_α·√(p̂(1 − p̂)/n)
c/ Find the sample size, confidence level:
- Find the sample size n: the width is w = 2ε, then n = 4·z²_{α/2}·p̂(1 − p̂)/w²
- Find the confidence level when the precision is known:
  ε = z_{α/2}·√(p̂(1 − p̂)/n) ⇒ z_{α/2} = … ⇒ Φ(z_{α/2}) = … ⇒ γ = ? (two-sided)
  ε = z_α·√(p̂(1 − p̂)/n) ⇒ z_α = … ⇒ Φ(z_α) = … ⇒ γ = ? (one-sided)
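A worked proportion CI and sample-size calculation (the counts m_A and n, and the target width w, are invented):

```python
from math import sqrt, ceil

# Hypothetical sample: m_A = 60 successes out of n = 200 trials.
m_a, n = 60, 200
p_hat = m_a / n                                  # sample proportion
z_half = 1.96                                    # z_{alpha/2} for 95% confidence

eps = z_half * sqrt(p_hat * (1 - p_hat) / n)     # precision
ci = (p_hat - eps, p_hat + eps)

# Sample size for a desired width w: n = 4 z_{alpha/2}^2 p_hat (1 - p_hat) / w^2
w = 0.08
n_needed = ceil(4 * z_half**2 * p_hat * (1 - p_hat) / w**2)

assert ci[0] < p_hat < ci[1]
assert n_needed >= n        # a narrower interval requires a larger sample
```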
Homework:
Population mean: Exercises 5a page 276 ; 12,13,14,15 page 283
; 34,36a,37a page 293 ;
48,49c page 297
Population proportion: Exercise 19,20,21,22,23,25a page 284 ; 51a,54,56b page 297
2.3. CI for the population variance σ²:
a/ Two-sided confidence bound:
Let σ² be the population variance (σ² is unknown). The CI for σ² with confidence level γ = 1 − α is
(n − 1)s²/χ²_{α/2; n−1} < σ² < (n − 1)s²/χ²_{1−α/2; n−1}
Use the chi-squared table χ² = χ²(n − 1) to find χ²_{α/2} and χ²_{1−α/2}.
b/ One-sided confidence bound:
- An upper confidence bound for σ² is: σ² < (n − 1)s²/χ²_{1−α; n−1}
- A lower confidence bound for σ² is: σ² > (n − 1)s²/χ²_{α; n−1}
c/ CI for the population standard deviation σ: take square roots of the limits of the CI for σ²:
√((n − 1)s²/χ²_{α/2; n−1}) < σ < √((n − 1)s²/χ²_{1−α/2; n−1})
CHAPTER 8: TESTS OF HYPOTHESES BASED ON A SINGLE SAMPLE
1. Definitions: (page 301)
-
A statistical hypothesis: is a claim or assertion either about the value of a single parameter
(population characteristic or characteristic of a probability distribution), about the values of
several parameters, or about the form of an entire probability distribution.
-
In any hypothesis-testing problem, there are two contradictory hypotheses under consideration.
The null hypothesis, denoted by H₀, is the claim that is initially assumed to be true (the "prior belief" claim). The alternative hypothesis, denoted by Hₐ, is the assertion that is contradictory to H₀.
The null hypothesis will be rejected in favor of the alternative hypothesis only if sample
evidence suggests that H 0 is false. If the sample does not strongly contradict H 0 , we will
continue to believe in the plausibility of the null hypothesis. The two possible conclusions
from a hypothesis-testing analysis are then reject H 0 or fail to reject H 0 .
- The alternative to the null hypothesis H₀: θ = θ₀ will look like one of the following three assertions:
1. Hₐ: θ ≠ θ₀
2. Hₐ: θ > θ₀
3. Hₐ: θ < θ₀
- A test procedure is specified by the following: (page 303)
1. A test statistic, a function of the sample data on which the decision (reject H₀ or do not reject H₀) is to be based
2. A rejection region, the set of all test statistic values for which H 0 will be rejected
The null hypothesis will then be rejected if and only if the observed or computed test statistic
value falls in the rejection region.
- Errors in Hypothesis Testing:
A type I error consists of rejecting the null hypothesis H 0 when it is true.
A type II error involves not rejecting H 0 when H 0 is false.
                          Reality: H₀ is true    Reality: H₀ is false
Test: do not reject H₀    correct decision       Type II error
Test: reject H₀           Type I error           correct decision
- Significance level α: P(type I error) = α  (and P(type II error) = β ; γ = 1 − α) (page 307)
2. Test about the population mean μ:
Case 1: X has a normal distribution with known σ². (page 310) (read ex 8.6 page 312)
The null hypothesis: H₀: μ = μ₀
The statistic Z: z = (x̄ − μ₀)·√n/σ
The alternative hypothesis and rejection region for H₀ at level α:
- Hₐ: μ > μ₀ (upper-tailed): if z ≥ z_α, reject H₀ and accept Hₐ; if z < z_α, accept H₀
- Hₐ: μ < μ₀ (lower-tailed): if z ≤ −z_α, reject H₀ and accept Hₐ; if z > −z_α, accept H₀
- Hₐ: μ ≠ μ₀ (two-tailed): if |z| ≥ z_{α/2}, reject H₀ and accept Hₐ; if |z| < z_{α/2}, accept H₀
Case 2: Large sample (n > 40), X has a normal distribution with unknown σ²: (page 314) (read ex 8.8 page 315)
The null hypothesis: H₀: μ = μ₀
The statistic Z: z = (x̄ − μ₀)·√n/s
Rejection region for H₀ at level α:
- Hₐ: μ > μ₀ (upper-tailed): z ≥ z_α
- Hₐ: μ < μ₀ (lower-tailed): z ≤ −z_α
- Hₐ: μ ≠ μ₀ (two-tailed): |z| ≥ z_{α/2}
Case 3: Small sample, unknown σ² (the statistic has a Student distribution): (page 316) (read ex 8.9 page 317)
The null hypothesis: H₀: μ = μ₀
The statistic T: t = (x̄ − μ₀)·√n/s
Rejection region for H₀ at level α:
- Hₐ: μ > μ₀ (upper-tailed): t ≥ t_{α, n−1}
- Hₐ: μ < μ₀ (lower-tailed): t ≤ −t_{α, n−1}
- Hₐ: μ ≠ μ₀ (two-tailed): |t| ≥ t_{α/2, n−1}
Homework:
Page 321: exercises 19a, 20, 22b, 23, 24, 26, 28, 29a, 31, 32
3. Test concerning a population proportion p: (large sample: n·p₀ ≥ 10 ; n·(1 − p₀) ≥ 10) (page 323)
The null hypothesis: H₀: p = p₀
The statistic Z: z = (p̂ − p₀)·√n/√(p₀(1 − p₀))
Rejection region for H₀ at level α:
- Hₐ: p > p₀: if z ≥ z_α, reject H₀ and accept Hₐ; if z < z_α, accept H₀
- Hₐ: p < p₀: if z ≤ −z_α, reject H₀ and accept Hₐ; if z > −z_α, accept H₀
- Hₐ: p ≠ p₀: if |z| ≥ z_{α/2}, reject H₀ and accept Hₐ; if |z| < z_{α/2}, accept H₀
Read example 8.11 page 324
Homework: 39,37a,38ab,39,42a page 327
4. P-value: (page 328)
- The P-value is a probability.
- This probability is calculated assuming that the null hypothesis is true.
- Beware: The P-value is not the probability that H 0 is true, nor is it an error probability!
- The smaller the P-value, the more evidence there is in the sample data against the null
hypothesis and for the alternative hypothesis.
- The P-value is the smallest significance level α at which the null hypothesis can be rejected.
Because of this, the P-value is alternatively referred to as the observed significance level
(OSL) for the data.
- Decision rule based on the P-value:
Select a significance level α (as before, the desired type I error probability). Then:
do not reject H₀ if P-value > α
reject H₀ if P-value ≤ α
- The two procedures, the rejection region method and the P-value method, are in fact identical.
- P-value for Z tests (normal):
  P-value = 1 − Φ(z) for an upper-tailed z test; Φ(z) for a lower-tailed z test; 2·[1 − Φ(|z|)] for a two-tailed z test
- P-value for a T test (Student): analogous, with Φ(z) replaced by P(T ≤ t) (as in the Chapter 9 table).
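The z-test P-value formulas can be evaluated directly (the observed z below is a hypothetical value):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

z = 2.3   # hypothetical observed z statistic

p_upper = 1 - phi(z)              # upper-tailed z test
p_lower = phi(z)                  # lower-tailed z test
p_two   = 2 * (1 - phi(abs(z)))   # two-tailed z test

# Decision rule: reject H0 when P-value <= alpha.
alpha = 0.05
assert p_upper <= alpha           # about 0.0107 <= 0.05, so reject H0
assert p_two == 2 * p_upper       # for z > 0 the two-tailed P doubles the upper tail
```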
REVIEW CHAPTER 9: INFERENCES BASED ON TWO SAMPLES
For each case below we list H₀, the test statistic, the alternatives Hₐ, the rejection region, and the P-value.

1/ Tests for a difference between two population means: (page 346)
Case 1: Populations N(μ₁, σ₁²) and N(μ₂, σ₂²), with σ₁², σ₂² known.
H₀: μ₁ − μ₂ = Δ₀ ; z = (x̄ − ȳ − Δ₀)/√(σ₁²/n₁ + σ₂²/n₂)
- Hₐ: μ₁ − μ₂ ≠ Δ₀: |z| ≥ z_{α/2} ; P = 2·(1 − Φ(|z|))
- Hₐ: μ₁ − μ₂ > Δ₀: z ≥ z_α ; P = 1 − Φ(z)
- Hₐ: μ₁ − μ₂ < Δ₀: z ≤ −z_α ; P = Φ(z)
Case 2: Large samples (n₁ > 40, n₂ > 40), σ₁², σ₂² unknown.
H₀: μ₁ − μ₂ = Δ₀ ; z = (x̄ − ȳ − Δ₀)/√(s₁²/n₁ + s₂²/n₂)
Same alternatives, rejection regions, and P-values as Case 1.
Case 3: Small samples, σ₁², σ₂² unknown.
H₀: μ₁ − μ₂ = Δ₀ ; (**) t = (x̄ − ȳ − Δ₀)/√(s₁²/n₁ + s₂²/n₂), with ν degrees of freedom given by (*)
- Hₐ: μ₁ − μ₂ ≠ Δ₀: |t| ≥ t_{α/2; ν} ; P = 2·(1 − P(T ≤ |t|))
- Hₐ: μ₁ − μ₂ > Δ₀: t ≥ t_{α; ν} ; P = 1 − P(T ≤ t)
- Hₐ: μ₁ − μ₂ < Δ₀: t ≤ −t_{α; ν} ; P = P(T ≤ t)
(*) ν = (s₁²/n₁ + s₂²/n₂)² / [ (s₁²/n₁)²/(n₁ − 1) + (s₂²/n₂)²/(n₂ − 1) ]  (round ν down to the nearest integer).
Ex: n₁ = 10; n₂ = 10; s₁ = 0.79; s₂ = 3.59 ⇒ ν = 9.87 ⇒ ν = 9
⇒ t_{α/2; ν} = t_{0.025; 9} = 2.262
Read examples 9.1, 9.2 page 348; ex 9.4 page 351; ex 9.7 page 359
Homework: 2b, 3, 6a, 7, 8a page 354 ; 19, 28, 32 page 362
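The degrees-of-freedom formula (*) can be verified against the worked example (n₁ = n₂ = 10, s₁ = 0.79, s₂ = 3.59):

```python
def df_two_sample_t(s1, n1, s2, n2):
    """Degrees of freedom nu from formula (*) for the small-sample two-means t test."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

nu = df_two_sample_t(0.79, 10, 3.59, 10)
assert abs(nu - 9.87) < 0.001     # matches the example: nu = 9.87

nu_used = int(nu)                 # round nu down to the nearest integer
assert nu_used == 9
```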
2/ Inference concerning a difference between population proportions: (page 375)
H₀: p₁ − p₂ = 0 ; z = (p̂₁ − p̂₂)/√(p̄·q̄·(1/n₁ + 1/n₂))
where p̂₁ = x/n₁ ; p̂₂ = y/n₂ ; p̄ = (x + y)/(n₁ + n₂) ; q̄ = 1 − p̄
- Hₐ: p₁ ≠ p₂: |z| ≥ z_{α/2} ; P = 2·(1 − Φ(|z|))
- Hₐ: p₁ > p₂: z ≥ z_α ; P = 1 − Φ(z)
- Hₐ: p₁ < p₂: z ≤ −z_α ; P = Φ(z)
Read example 9.11 page 376
Exercise: 49, 51, 53a page 380
3/ Inferences concerning two population variances: (page 382)
H₀: σ₁² = σ₂² ; (***) f = s₁²/s₂²
- Hₐ: σ₁² ≠ σ₂²: f ≥ F_{α/2; n₁−1, n₂−1} or f ≤ F_{1−α/2; n₁−1, n₂−1} ; P = 2·(1 − P(F ≤ f))
- Hₐ: σ₁² > σ₂²: f ≥ F_{α; n₁−1, n₂−1} ; P = 1 − P(F ≤ f)
- Hₐ: σ₁² < σ₂²: f ≤ F_{1−α; n₁−1, n₂−1} ; P = P(F ≤ f)
4/ Analysis of paired data:
a/ A paired T test: (page 366)
Let D = X − Y, where X and Y are the first and second observations, respectively, within an arbitrary pair. Then the expected difference is μ_D = μ₁ − μ₂.
To test hypotheses about μ₁ − μ₂ when the data are paired, form the differences D₁, D₂, …, Dₙ and carry out a one-sample t test (based on n − 1 df) on these differences.
b/ The paired t confidence interval: (page 368)
The paired t CI for μ_D is d̄ ± t_{α/2, n−1}·s_D/√n
A one-sided confidence bound results from retaining the relevant sign and replacing t_{α/2} by t_α.
REVIEW CHAPTER 12: SIMPLE LINEAR REGRESSION and
CORRELATION
1. The simple linear regression model: (page 469)
- The variable whose value is fixed by the experimenter will be denoted by x and will be
called the independent, predictor, or explanatory variable.
- For fixed x, the second variable will be random; we denote this random variable and its
observed value by Y and y, respectively, and refer to it as the dependent or response
variable.
- A picture of the data
( x1 , y1 ), ( x2 , y2 ),..., ( xn , yn ) called a
scatter plot gives preliminary
impressions about the nature of any
relationship.
- It appears that the value of y could be predicted from x by finding a line that is
reasonably close to the points in the plot. In other words, there is evidence of a substantial
linear relationship between the two variables.
- Using the method of least squares to estimate the parameters of the regression line (page 477), the estimated regression line is ŷ = A + Bx (or ŷ = β̂₀ + β̂₁x).
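The least-squares estimates can be computed by hand, using the data of EX1 at the end of this section as a check:

```python
def least_squares(xs, ys):
    """Fit y = A + B x by least squares: B = Sxy/Sxx, A = ybar - B * xbar."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar
    sxx = sum(x * x for x in xs) - n * xbar**2
    b = sxy / sxx                 # slope B
    a = ybar - b * xbar           # intercept A
    return a, b

# Data from EX1 below.
xs = [1, 3, 4, 6, 8, 9, 11, 14]
ys = [1, 2, 4, 4, 5, 7, 8, 9]
a, b = least_squares(xs, ys)

assert abs(b - 0.6364) < 1e-4             # matches the slope in the answer to EX1
assert abs(a - 0.5455) < 1e-4             # matches the intercept
assert abs((a + b * 12) - 8.18) < 0.01    # predicted y at x = 12
```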
2. Using a calculator to find the regression equation:
Casio fx-570ES:
Step 1: Shift Mode ▼ 4 (STAT) 1 (ON)  (to turn on the frequency column)
Step 2: Mode 3 (STAT) 2 (A+BX). Do the data entry, then press AC.
Step 3: Shift 1 (STAT) 7 (Reg): 1. A ; 2. B ; 3. r (correlation)
Note: The linear regression equation is Y = A + BX
EX1: Observe a sample (X, Y):
X: 1  3  4  6  8  9  11  14
Y: 1  2  4  4  5  7  8   9
Find the linear regression equation of X and Y. When x = 12, find y.
Answer: y = 0.6364x + 0.5455 ; y = 8.1823