Lê Thị Mai Trang – Probability and Statistics

REVIEW CHAPTER 1: OVERVIEW AND DESCRIPTIVE STATISTICS

Two types of statistics are descriptive statistics and inferential statistics.

A. Descriptive statistics: (page 2)
1. Pictorial and Tabular Methods:
1.1 Stem-and-Leaf Displays
1.2 Dotplots
1.3 Histograms
- Histogram for discrete data
- Histogram for continuous data: select class intervals [a; b)
  + Class widths are equal.
  + Class widths are unequal.
  + Histogram shapes: one peak, two peaks, more than two peaks; symmetric, positively skewed, negatively skewed.
  + Multivariate data
2. Measurement
2.1 Measures of location: (page 28)
- Mean: x̄ = (1/n)·Σᵢ₌₁ⁿ xᵢ
- Median (x̃): the sample median is obtained by first ordering the n observations from smallest to largest (with any repeated values included so that every sample observation appears in the ordered list). Then
  x̃ = the ((n + 1)/2)th ordered value if n is odd;
  x̃ = the average of the (n/2)th and (n/2 + 1)th ordered values if n is even.
  Both the mean and the median describe where the data are centered, but in general they are not equal because they focus on different aspects of the sample. The median is the middle value of the sample and is very insensitive to outliers.
- Trimmed mean (a%): the mean is quite sensitive to a single outlier, whereas the median is impervious to many outliers; a trimmed mean is a compromise between the mean and the median.
- Quartiles and percentiles: the median (population or sample) divides the data set into two parts of equal size; quartiles divide the data set into four equal parts. Similarly, a data set (sample or population) can be divided even more finely using percentiles.
- Sample proportion: f = m/n
2.2 Measures of variability: (page 35)
- The sample variance: s² = Σᵢ₌₁ⁿ (xᵢ − x̄)²/(n − 1) = Sₓₓ/(n − 1), where Sₓₓ = Σxᵢ² − (Σxᵢ)²/n
- The sample standard deviation: s = √s²
2.3 Boxplots: there are two ways to make a boxplot (the median is either included in both halves or excluded from both).
- Boxplots that show outliers
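The Chapter 1 measures can be checked numerically with a short Python sketch. The data set below is hypothetical, chosen with one large outlier to show how the median and trimmed mean resist it while the mean does not:

```python
import statistics

# Hypothetical sample with one large outlier (40.0)
data = [2.5, 3.1, 3.4, 3.6, 4.0, 4.2, 4.8, 5.6, 6.0, 40.0]
n = len(data)

mean = sum(data) / n                     # x̄ = Σ xᵢ / n
median = statistics.median(data)         # middle value of the ordered sample

# Sample variance with the n − 1 denominator, and the shortcut S_xx form
s2 = sum((x - mean) ** 2 for x in data) / (n - 1)
Sxx = sum(x * x for x in data) - sum(data) ** 2 / n
s = s2 ** 0.5                            # sample standard deviation

# 10% trimmed mean: drop the smallest 10% and the largest 10% of observations
k = int(0.10 * n)
trimmed = sorted(data)[k:n - k]
trimmed_mean = sum(trimmed) / len(trimmed)
```

Here the mean ≈ 7.72 is pulled above nine of the ten observations by the outlier, while the median = 4.1 and the 10% trimmed mean ≈ 4.34 stay near the center, matching the remark above that a trimmed mean is a compromise between the mean and the median.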
B. Inferential statistics: estimation, tests of hypotheses, regression
C. Software: Minitab, R, SAS, S-Plus

REVIEW CHAPTER 2: PROBABILITY

2.1 Sample spaces and events: (page 51)
- Experiment: any activity or process whose outcome is subject to uncertainty. Experiment => outcomes => all the outcomes = sample space.
- Sample space Ω: the set of all possible outcomes.
- Event: a subset of Ω.
  Simple event: an event consisting of exactly one outcome.
  Compound event: an event consisting of more than one outcome.
  Empty set ∅: an event consisting of no outcome.

Some relations from set theory: (page 53)
a. The union: C = A + B or C = A ∪ B (at least one of the events occurs).
   Probability of a union of events: P(A ∪ B) = P(A) + P(B) − P(AB)
   More generally: P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(AB) − P(AC) − P(BC) + P(ABC)
   If A and B are mutually exclusive events: P(A ∪ B) = P(A) + P(B).
b. The intersection: C = A·B or C = A ∩ B
c. The complement: Ā (or A^c) = Ω \ A, so A ∪ Ā = Ω and A·Ā = ∅.
d. Mutually exclusive (or disjoint) events: A·B = ∅.
e. Independence: let A and B be two events in the same probability space. A is independent of B if the occurrence of A does not affect the probability that B occurs, and vice versa. In particular: P(A·B) = P(A)·P(B)
f. Partition: events A₁, A₂, …, Aₙ are said to form a partition of Ω if
   A₁ ∪ A₂ ∪ … ∪ Aₙ = Ω and Aᵢ·Aⱼ = ∅ for 1 ≤ i < j ≤ n (mutually exclusive).

2.2 Axioms, interpretations and properties of probability: (page 55)
- Definition: P(A) = m_A/n, where m_A is the number of outcomes in A and n is the number of equally likely outcomes in Ω.
- Properties: 0 ≤ P(A) ≤ 1; P(Ω) = 1, P(∅) = 0; P(Ā) = 1 − P(A)

2.3 Counting techniques: (page 64)
a. Addition rule (event 1 or 2 or … or k will occur): m = m₁ + m₂ + … + m_k
b. Multiplication rule (event 1 and 2 and … and k will occur): n = n₁·n₂·…·n_k
c. Permutation: Pₙ = n!
   Permutation with repeated objects: n!/(k!·l!), such that k + l = n.
d. Arrangement (k-permutations of n): Aₙᵏ = n!/(n − k)!, k ≤ n
e. Combination: Cₙᵏ = n!/(k!(n − k)!), k ≤ n
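The counting rules and the union formula above can be verified directly in Python; the die-roll events are hypothetical examples, not from the text:

```python
from fractions import Fraction
from math import comb, factorial, perm

# Counting techniques: arrangements and combinations for n = 10, k = 3
n, k = 10, 3
A_nk = perm(n, k)    # A_n^k = n!/(n−k)!
C_nk = comb(n, k)    # C_n^k = n!/(k!(n−k)!)
assert A_nk == factorial(n) // factorial(n - k)
assert C_nk == A_nk // factorial(k)

# Classical probability P(E) = m_E / n on a fair die, checking
# inclusion-exclusion P(A∪B) = P(A) + P(B) − P(AB) by enumeration
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}        # "even"
B = {4, 5, 6}        # "greater than 3"
P = lambda E: Fraction(len(E), len(omega))
union = P(A | B)
incl_excl = P(A) + P(B) - P(A & B)
```

Using exact fractions avoids any floating-point doubt: both sides of the inclusion-exclusion identity come out to exactly 2/3.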
2.4 Conditional probability: (page 73)
- Conditional probability of A given B: P(A|B) = P(AB)/P(B)
- Probability of an intersection of events: P(AB) = P(A)·P(B|A)
  More generally: P(ABC) = P(A)·P(B|A)·P(C|AB)
  Event A is independent of event B if and only if P(AB) = P(A)·P(B).
- Bernoulli formula: P(X = k) = Cₙᵏ·pᵏ·(1 − p)ⁿ⁻ᵏ
- The law of total probability:
  P(B) = Σᵢ₌₁ⁿ P(Aᵢ)·P(B|Aᵢ) = P(A₁)·P(B|A₁) + … + P(Aₙ)·P(B|Aₙ)
- Bayes' formula: P(Aᵢ|B) = P(Aᵢ)·P(B|Aᵢ)/P(B), i = 1, …, n

REVIEW CHAPTER 3: DISCRETE RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS

3.1 Random variables: (page 93)
A random variable is a real-valued function on Ω.
Classification of random variables:
- Discrete random variable: the set of possible values of X is finite or countably infinite.
- Continuous random variable: the set of possible values of X is uncountably infinite.

3.2 Probability distributions for discrete random variables: (page 96)
a/ The probability mass function (pmf) of X is p(x) = P(X = x), with p(x) ≥ 0 and Σᵢ₌₁ⁿ p(xᵢ) = 1.
Table of the pmf:
X     | x₁    | … | xₙ
p(xᵢ) | p(x₁) | … | p(xₙ)
b/ Cumulative distribution function (CDF), for both discrete and continuous random variables:
F(x) = P(X ≤ x)
Note: if X is a discrete random variable, then F(x) = P(X ≤ x) = Σ_{y ≤ x} p(y).

3.3 Expected value and variance: (page 106)
a/ Expected value (mean value): E(X) = μ_X = Σ_{x∈D} x·p(x) = x₁p(x₁) + x₂p(x₂) + … + xₙp(xₙ)
Properties:
1/ E(c) = c, c constant
2/ E(c·X) = c·E(X)
3/ E(X ± Y) = E(X) ± E(Y)
4/ E(X·Y) = E(X)·E(Y) if X and Y are independent.
5/ E[h(X)] = Σ_{x∈D} h(x)·p(x)
b/ Variance: V(X) = σ² = E(X²) − (EX)², with E(X²) = Σ_{x∈D} x²·p(x).
The standard deviation (SD): σ = √V(X)
Note: V(X) = Σ_D (x − EX)²·p(x) = E[(X − EX)²] = E(X²) − (EX)²
Properties:
1/ V(X) ≥ 0
2/ V(c) = 0, c constant
3/ V(c·X) = c²·V(X)
4/ V(X ± Y) = V(X) + V(Y) if X, Y are independent.
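The law of total probability and Bayes' formula can be sketched as follows; the three-machine setting and all its numbers are hypothetical, invented for illustration:

```python
from fractions import Fraction as F

# Hypothetical: machines A1, A2, A3 produce 50%, 30%, 20% of all items
# (a partition of Ω), with defect rates 1%, 2%, 3%. B = "item is defective".
prior = [F(50, 100), F(30, 100), F(20, 100)]   # P(A_i)
like  = [F(1, 100),  F(2, 100),  F(3, 100)]    # P(B | A_i)

# Law of total probability: P(B) = Σ P(A_i)·P(B|A_i)
pB = sum(p * l for p, l in zip(prior, like))

# Bayes' formula: P(A_1 | B) = P(A_1)·P(B|A_1) / P(B)
post1 = prior[0] * like[0] / pB
```

With exact fractions, P(B) = 17/1000 and P(A₁|B) = 5/17: even though machine 1 makes half the output, it accounts for well under half of the defectives.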
3.4–3.6 Probability distributions of discrete random variables

1/ The binomial distribution: X ~ Bin(n, p) (page 114)
A binomial experiment consists of a sequence of n smaller experiments called trials:
. Each trial results in one of the same two possible outcomes, denoted success (S) and failure (F).
. The probability of success P(S) = p is the same on every trial, and P(F) = 1 − p.
Binomial distribution: let X denote the total number of successes in the n trials. The distribution of X is called the binomial distribution with parameters n and p.
Pmf of X, b(x; n, p):
b(x; n, p) = P(X = x) = Cₙˣ·pˣ·(1 − p)ⁿ⁻ˣ for x = 0, 1, 2, …, n; 0 otherwise (Bernoulli formula)
CDF of X, B(x; n, p):
B(x; n, p) = P(X ≤ x) = Σ_{y=0}^{x} b(y; n, p); x = 0, 1, …, n
Properties: EX = np; V(X) = σ² = npq (where q = 1 − p)

2/ The hypergeometric distribution: (page 122)
If X has a hypergeometric distribution, then
P(X = x) = h(x; n, M, N) = C_Mˣ·C_{N−M}^{n−x}/C_Nⁿ
EX = n·M/N; V(X) = n·p·(1 − p)·(N − n)/(N − 1), with p = M/N

3/ The negative binomial distribution: (page 125) (different from your book)
Let Y be the number of trials in a sequence of independent and identically distributed Bernoulli trials until the "r"th success occurs.
P(Y = n) = C_{n−1}^{r−1}·p^{r−1}·(1 − p)^{n−r}·p for n ≥ r.
Properties: EY = r/p; V(Y) = r(1 − p)/p²

4/ The Poisson distribution: X ~ P(λ) with parameter λ > 0 (page 128)
X is the number of successful trials.
p(x; λ) = P(X = x) = e^{−λ}·λˣ/x! (x = 0, 1, 2, …)
Properties: EX = V(X) = λ
Note:
- X ~ Bin(n, p) with n and p known: when n is very large (n > 50) and p is very small, with np < 5, then X ≈ P(λ) with λ = np.
- X ~ Bin(n, p) with n and p unknown, but the average number of successes λ is known: then X ≈ P(λ).
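The binomial and Poisson formulas above, and the Poisson approximation rule, can be verified by direct computation (the parameter values here are hypothetical):

```python
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    # b(x; n, p) = C_n^x p^x (1−p)^(n−x)
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    # p(x; λ) = e^{−λ} λ^x / x!
    return exp(-lam) * lam ** x / factorial(x)

# Check EX = np and V(X) = npq by summing over the pmf of Bin(10, 0.3)
n, p = 10, 0.3
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]
EX = sum(x * q for x, q in enumerate(pmf))
VX = sum(x * x * q for x, q in enumerate(pmf)) - EX ** 2

# Poisson approximation: n large, p small, λ = np.
# Compare P(X = 3) under Bin(100, 0.02) and under P(λ = 2).
b3 = binom_pmf(3, 100, 0.02)
p3 = poisson_pmf(3, 2.0)
```

The two pmf values at x = 3 differ by less than 0.002 here, illustrating why the approximation is considered safe when n > 50 and np < 5.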
REVIEW CHAPTER 4: CONTINUOUS RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS

4.1 Probability density function (pdf), f_X: (page 138)
X is a continuous random variable if X has a pdf f satisfying
1/ f(x) ≥ 0 for all x ∈ R
2/ ∫_{−∞}^{∞} f(x)dx = 1
3/ P(a ≤ X ≤ b) = P(a ≤ X < b) = P(a < X ≤ b) = P(a < X < b) = ∫_a^b f(x)dx
4/ P(X = a) = 0

4.2 Cumulative distribution functions (CDF) and expected values:
a/ Cumulative distribution function F(x) (X a continuous r.v.): (page 143)
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(y)dy
Note: F′(x) = f(x)
Properties: P(X > a) = 1 − F(a); P(a ≤ X ≤ b) = F(b) − F(a)
b/ Percentiles of a continuous distribution: (page 146)
Let p be a number between 0 and 1. The (100p)th percentile of the distribution of a continuous rv X, denoted by η(p), is defined by
p = F(η(p)) = ∫_{−∞}^{η(p)} f(y)dy
c/ Median: the median of a continuous distribution, denoted by μ̃, is the 50th percentile, so μ̃ satisfies F(μ̃) = 0.5. That is, half the area under the density curve is to the left of μ̃ and half is to the right of μ̃.
d/ Expected value: (page 148)
μ_X = E(X) = ∫_{−∞}^{∞} x·f(x)dx
Note: E[h(X)] = ∫_{−∞}^{∞} h(x)·f(x)dx
e/ Variance: (page 150)
V(X) = σ_X² = E(X²) − (EX)² = ∫_{−∞}^{∞} x²·f(x)dx − (∫_{−∞}^{∞} x·f(x)dx)²
Note: the second way is V(X) = E(X − μ)² = ∫_{−∞}^{∞} (x − μ)²·f(x)dx
f/ The standard deviation (SD): σ = √V(X).

4.3 Probability distributions of continuous random variables:

1/ The uniform distribution: (page 140)
X is uniformly distributed over the interval [a; b] if
f_X(u) = 1/(b − a) for a ≤ u ≤ b; 0 else.
Properties: EX = (a + b)/2; Var X = (b − a)²/12

2/ The normal distribution: (page 152)
a/ The standard normal distribution: Z ~ N(0; 1), μ = 0, σ = 1 (page 153)
- Pdf of Z: f(z; 0, 1) = (1/√(2π))·e^{−z²/2}, −∞ < z < ∞
- CDF of Z: Φ(z) = P(Z ≤ z) = ∫_{−∞}^{z} f(y; 0, 1)dy
- Properties: P(Z ≤ a) = Φ(a); P(Z > b) = 1 − Φ(b); P(a ≤ Z ≤ b) = Φ(b) − Φ(a);
  Φ(z) ≈ 1 when z ≥ 3.49; Φ(z) ≈ 0 when z ≤ −3.49
b/ Percentiles of the standard normal distribution: (page 155)
z_α denotes the value on the z axis for which α of the area under the z curve lies to the right of z_α. Thus z_α is the 100(1 − α)th percentile of the standard normal distribution.
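Python's standard library exposes Φ and its inverse through statistics.NormalDist, which makes the standard normal facts above easy to check (the numeric values are the usual table entries):

```python
from statistics import NormalDist

Z = NormalDist(0, 1)       # standard normal: μ = 0, σ = 1
phi = Z.cdf                # Φ(z) = P(Z ≤ z)

# P(a ≤ Z ≤ b) = Φ(b) − Φ(a); e.g. the familiar 95% between ±1.96
central = phi(1.96) - phi(-1.96)

# Φ(z) ≈ 1 for z ≥ 3.49 and Φ(z) ≈ 0 for z ≤ −3.49
tail_hi = phi(3.49)
tail_lo = phi(-3.49)

# z_α: the value with area α to its right, i.e. the 100(1 − α)th percentile
alpha = 0.05
z_alpha = Z.inv_cdf(1 - alpha)   # ≈ 1.645
```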
c/ The normal distribution X ~ N(μ, σ²): (page 152)
- Pdf of X: f(x; μ, σ) = (1/(σ√(2π)))·e^{−(x−μ)²/(2σ²)}
- Properties: EX = μ; V(X) = σ²
Note: if X ~ N(μ, σ²), then the standardized version of X, namely Z = (X − μ)/σ ~ N(0; 1), is a standard normal random variable.
d/ Approximating the binomial distribution: (page 160)
Let X be a binomial rv based on n trials with success probability p. Then if the binomial probability histogram is not too skewed, X has approximately a normal distribution with μ = np and σ² = npq. In particular, for a possible value x of X,
P(X ≤ x) = B(x; n, p) ≈ Φ((x + 0.5 − np)/√(np(1 − p)))
In practice, the approximation is adequate provided that both np ≥ 10 and n(1 − p) ≥ 10, since there is then enough symmetry in the underlying binomial distribution.

3/ The exponential distribution: (page 165)
- X has the exponential distribution with parameter λ > 0 if its pdf is
  f(x) = λ·e^{−λx} for x ≥ 0; 0 otherwise.
- CDF of X: F(x; λ) = P(X ≤ x) = 1 − e^{−λx} for x ≥ 0; 0 for x < 0.
- Properties: μ = 1/λ; σ² = 1/λ².

4/ The gamma distributions: (page 167)
- For α > 0, the gamma function is defined by Γ(α) = ∫₀^∞ x^{α−1}·e^{−x}dx.
- A continuous random variable X is said to have a gamma distribution if the pdf of X is
  f(x; α, β) = (1/(β^α·Γ(α)))·x^{α−1}·e^{−x/β} for x ≥ 0; 0 otherwise,
  where the parameters satisfy α > 0, β > 0.
- Properties: E(X) = αβ; V(X) = αβ²
- The standard gamma distribution has β = 1, so
  f(x; α, 1) = (1/Γ(α))·x^{α−1}·e^{−x} for x ≥ 0; 0 otherwise.
  When X is a standard gamma r.v., the cdf of X is F(x; α) = ∫₀^x (y^{α−1}·e^{−y}/Γ(α))dy for x > 0.
- Proposition: let X have a gamma distribution with parameters α, β. Then for any x > 0, the cdf of X is given by P(X ≤ x) = F(x; α, β) = F(x/β; α), where F(·; α) is the incomplete gamma function.

5/ The chi-squared distribution: X ~ χ²(ν) with parameter ν (page 169)
X is said to have a chi-squared distribution with parameter ν if the pdf of X is the gamma density with α = ν/2, β = 2. The pdf of a chi-squared rv is thus
f(x; ν) = (1/(2^{ν/2}·Γ(ν/2)))·x^{ν/2−1}·e^{−x/2} for x ≥ 0; 0 for x < 0.
The parameter ν is called the number of degrees of freedom (df) of X. The symbol χ² is often used in place of "chi-squared."

6/ The Student t distribution

Other continuous distributions (read book): the Weibull distribution, the beta distribution, the lognormal distribution.

REVIEW CHAPTER 5: JOINT PROBABILITY DISTRIBUTIONS AND RANDOM SAMPLES (read book)

REVIEW CHAPTERS 6, 7: ESTIMATION
- Population: (page 3) an investigation will typically focus on a well-defined collection of objects constituting a population of interest.
- Sample: a subset of the population, selected in some prescribed manner.

Notation:
Population: N = size of the population; M_A = number of population successes; p = M_A/N = population proportion; μ = population mean; σ² = population variance; σ = population standard deviation
Sample: n = size of a random sample; m_A (or X) = number of sample successes; p̂ = m_A/n (or f = m_A/n) = sample proportion; x̄ = sample mean; s² = sample variance; s = sample standard deviation (or σ_{n−1})

Calculator fx-570ES for statistics:
Step 1: (frequency column) Shift Mode ▼ 4 1:ON
Step 2: Mode 3:STAT 1. Input the data, then press AC.
Step 3: Shift 1 5:Var, then 1:n, 2:x̄, 4:xσn−1 (= s)
Note: Shift 1 3:Edit 2:Del (delete data); Mode 1 (exit)
Calculator fx-580:
Step 1: (frequency column) Shift Menu
Step 2: Menu 6 1. Input the data, then press AC.
Step 3: OPTN 2 2: you can see n, x̄, xσn−1 (= s)

1. Point estimation: (page 240)
- Unbiased estimators: (page 243) a point estimator θ̂ is said to be an unbiased estimator of θ if E(θ̂) = θ for every possible value of θ.
  x̄ is an unbiased estimator of μ; s² is an unbiased estimator of σ²; p̂ is an unbiased estimator of p.
2. Interval estimation, or confidence interval (CI): (page 267)
- Confidence level: 1 − α (α is the significance level)
2.1 Confidence interval for the population mean μ: (page 270)
a/ Two-sided confidence bound: let μ be the population mean (μ is unknown). Find the CI for μ with confidence level 1 − α.
- Sample mean: x̄
- Precision (or error of estimation) ε:
Case 1 (page 272): σ² known: ε = z_{α/2}·σ/√n (read ex 7.3 page 272)
Case 2 (page 277): σ² unknown, n sufficiently large (n > 40): ε = z_{α/2}·s/√n (using the normal table Z) (read ex 7.6 page 278)
Case 3 (page 285): σ² unknown, n small: ε = t_{α/2; n−1}·s/√n (using the Student t table) (read ex 7.11 page 288; ex 7.12 page 289)
- Conclusion: the CI for the population mean is (x̄ − ε; x̄ + ε). (See the Student distribution in your book, page 286.)
b/ One-sided confidence bounds: (page 283) (read ex 7.10 page 283)
- An upper confidence bound for μ: μ < x̄ + z_α·s/√n (or μ < x̄ + t_{α; n−1}·s/√n)
- A lower confidence bound for μ: μ > x̄ − z_α·s/√n (or μ > x̄ − t_{α; n−1}·s/√n)
Note: Φ(z_{α/2}) = 1 − α/2; Φ(z_α) = 1 − α (using table Z)
c/ Finding the sample size or the confidence level:
- Find the sample size n: from ε ≥ z_{α/2}·s/√n we get √n ≥ z_{α/2}·s/ε, so n ≥ (z_{α/2}·s/ε)².
  The width is w = 2ε = 2·z_{α/2}·s/√n, which gives n = 4·z²_{α/2}·s²/w².
  (read ex 7.4 page 273; ex 7.7 page 279)
- Find the confidence level when the precision ε is known:
  ε = z_{α/2}·s/√n ⇒ z_{α/2} = ε·√n/s ⇒ Φ(z_{α/2}) = 1 − α/2 ⇒ 1 − α = ? (two-sided)
  ε = z_α·s/√n ⇒ z_α = ε·√n/s ⇒ Φ(z_α) = 1 − α ⇒ 1 − α = ? (one-sided)
2.2 CI for the population proportion p:
a/ Two-sided confidence bound: (page 280) let p be the proportion of "successes" in a population (p is unknown); find the CI for p with confidence level 1 − α.
- Sample proportion: p̂ = m_A/n
- Precision (error of estimation): ε = z_{α/2}·√(p̂(1 − p̂)/n) (using table Z)
- Conclusion: the CI for the population proportion is (p̂ − ε; p̂ + ε).
b/ One-sided confidence bounds:
- An upper confidence bound for p: p < p̂ + z_α·√(p̂(1 − p̂)/n)
- A lower confidence bound for p: p > p̂ − z_α·√(p̂(1 − p̂)/n)
c/ Finding the sample size or the confidence level:
- Find the sample size n: the width is w = 2ε, so n = 4·z²_{α/2}·p̂(1 − p̂)/w².
- Find the confidence level when the precision ε is known:
  ε = z_{α/2}·√(p̂(1 − p̂)/n) ⇒ z_{α/2} = … ⇒ Φ(z_{α/2}) = … ⇒ 1 − α = ? (two-sided)
  ε = z_α·√(p̂(1 − p̂)/n) ⇒ z_α = … ⇒ Φ(z_α) = … ⇒ 1 − α = ? (one-sided)
Homework:
Population mean: exercises 5a page 276; 12, 13, 14, 15 page 283; 34, 36a, 37a page 293; 48, 49c page 297
Population proportion: exercises 19, 20, 21, 22, 23, 25a page 284; 51a, 54, 56b page 297
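The CI recipes above can be sketched in Python, with statistics.NormalDist supplying z_{α/2}. The data summaries (n, x̄, s, and the success count) are hypothetical, invented for illustration:

```python
from math import sqrt
from statistics import NormalDist

z = NormalDist().inv_cdf          # z_p such that Φ(z_p) = p
alpha = 0.05                      # confidence level 1 − α = 95%

# Two-sided CI for μ, large sample: hypothetical n = 64, x̄ = 50.2, s = 4.0
n, xbar, s = 64, 50.2, 4.0
eps = z(1 - alpha / 2) * s / sqrt(n)        # precision ε = z_{α/2}·s/√n
ci_mu = (xbar - eps, xbar + eps)

# Two-sided CI for p: hypothetical m = 160 successes in n2 = 400 trials
n2, m = 400, 160
phat = m / n2
eps_p = z(1 - alpha / 2) * sqrt(phat * (1 - phat) / n2)
ci_p = (phat - eps_p, phat + eps_p)

# Sample size for a desired interval width w: n = 4·z_{α/2}²·s²/w²
w = 1.0
n_needed = 4 * z(1 - alpha / 2) ** 2 * s ** 2 / w ** 2   # round up in practice
```

Here ε ≈ 0.98, so the 95% CI for μ is about (49.22; 51.18), and holding the width to w = 1 would require n ≈ 246 observations (rounding n_needed up).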
2.3 CI for the population variance σ²:
a/ Two-sided confidence bound: let σ² be the population variance (σ² is unknown). The CI for σ² with confidence level 1 − α is
((n − 1)s²/χ²_{α/2; n−1} ; (n − 1)s²/χ²_{1−α/2; n−1})
Use the chi-squared table χ²(n − 1) to find χ²_{α/2; n−1} and χ²_{1−α/2; n−1}.
b/ One-sided confidence bounds:
- An upper confidence bound for σ²: σ² < (n − 1)s²/χ²_{1−α; n−1}
- A lower confidence bound for σ²: σ² > (n − 1)s²/χ²_{α; n−1}
c/ CI for the population standard deviation σ: take square roots of the endpoints:
(√((n − 1)s²/χ²_{α/2; n−1}) ; √((n − 1)s²/χ²_{1−α/2; n−1}))

CHAPTER 8: TESTS OF HYPOTHESES BASED ON A SINGLE SAMPLE
1. Definitions: (page 301)
- A statistical hypothesis is a claim or assertion either about the value of a single parameter (a population characteristic or a characteristic of a probability distribution), about the values of several parameters, or about the form of an entire probability distribution.
- In any hypothesis-testing problem there are two contradictory hypotheses under consideration. The null hypothesis, denoted by H₀, is the claim that is initially assumed to be true (the "prior belief" claim). The alternative hypothesis, denoted by Hₐ, is the assertion that contradicts H₀. The null hypothesis will be rejected in favor of the alternative hypothesis only if sample evidence suggests that H₀ is false. If the sample does not strongly contradict H₀, we continue to believe in the plausibility of the null hypothesis. The two possible conclusions from a hypothesis-testing analysis are therefore "reject H₀" or "fail to reject H₀".
- The alternative to the null hypothesis H₀: θ = θ₀ will look like one of the following three assertions:
1. Hₐ: θ ≠ θ₀
2. Hₐ: θ > θ₀
3. Hₐ: θ < θ₀
- A test procedure is specified by the following: (page 303)
1. A test statistic, a function of the sample data on which the decision (reject H₀ or do not reject H₀) is to be based
2. A rejection region, the set of all test statistic values for which H₀ will be rejected
The null hypothesis will then be rejected if and only if the observed or computed test statistic value falls in the rejection region.
- Errors in hypothesis testing: a type I error consists of rejecting the null hypothesis H₀ when it is true; a type II error involves not rejecting H₀ when H₀ is false.

Decision \ Reality | H₀ is true       | H₀ is false
Do not reject H₀   | correct decision | type II error
Reject H₀          | type I error     | correct decision

- Significance level α: α = P(type I error), and β = P(type II error); the power of the test is 1 − β. (page 307)

2. Tests about a population mean μ:
Case 1: X has a normal distribution with known σ². (page 310) (read ex 8.6 page 312)
The null hypothesis: H₀: μ = μ₀
The test statistic: z = (x̄ − μ₀)·√n/σ
The alternative hypothesis and rejection region for H₀ at level α:
- Hₐ: μ > μ₀ (upper-tailed): if z ≥ z_α, reject H₀ and accept Hₐ; if z < z_α, do not reject H₀.
- Hₐ: μ < μ₀ (lower-tailed): if z ≤ −z_α, reject H₀ and accept Hₐ; if z > −z_α, do not reject H₀.
- Hₐ: μ ≠ μ₀ (two-tailed): if |z| ≥ z_{α/2}, reject H₀ and accept Hₐ; if |z| < z_{α/2}, do not reject H₀.
Case 2: large sample (n > 40), σ² unknown: (page 314) (read ex 8.8 page 315)
The null hypothesis: H₀: μ = μ₀
The test statistic: z = (x̄ − μ₀)·√n/s
Rejection regions for H₀ at level α:
- Hₐ: μ > μ₀ (upper-tailed): z ≥ z_α
- Hₐ: μ < μ₀ (lower-tailed): z ≤ −z_α
- Hₐ: μ ≠ μ₀ (two-tailed): |z| ≥ z_{α/2}
Case 3: small sample, σ² unknown (Student t distribution): (page 316) (read ex 8.9 page 317)
The null hypothesis: H₀: μ = μ₀
The test statistic: t = (x̄ − μ₀)·√n/s
Rejection regions for H₀ at level α:
- Hₐ: μ > μ₀ (upper-tailed): t ≥ t_{α; n−1}
- Hₐ: μ < μ₀ (lower-tailed): t ≤ −t_{α; n−1}
- Hₐ: μ ≠ μ₀ (two-tailed): |t| ≥ t_{α/2; n−1}
Homework: page 321, exercises 19a, 20, 22b, 23, 24, 26, 28, 29a, 31, 32
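The large-sample z test (Case 2 above) can be sketched as follows; the data summary (n = 50, x̄ = 21.2, s = 3.5, μ₀ = 20) is a hypothetical example, not one from the book:

```python
from math import sqrt
from statistics import NormalDist

phi = NormalDist().cdf

# H0: μ = 20 vs Ha: μ > 20 (upper-tailed), large sample, σ² unknown
n, xbar, s, mu0, alpha = 50, 21.2, 3.5, 20.0, 0.05

z = (xbar - mu0) * sqrt(n) / s              # test statistic z = (x̄ − μ0)·√n/s
z_alpha = NormalDist().inv_cdf(1 - alpha)   # critical value z_α
reject = z >= z_alpha                       # upper-tailed rejection region
p_value = 1 - phi(z)                        # upper-tailed P-value
```

Here z ≈ 2.42 exceeds z_{0.05} ≈ 1.645, so H₀ is rejected at the 5% level; the P-value ≈ 0.008 agrees, since it falls below α.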
3. Test concerning a population proportion p (large sample: np₀ ≥ 10 and n(1 − p₀) ≥ 10): (page 323)
The null hypothesis: H₀: p = p₀
The test statistic: z = (p̂ − p₀)·√n/√(p₀(1 − p₀))
Rejection regions for H₀ at level α:
- Hₐ: p > p₀: if z ≥ z_α, reject H₀ and accept Hₐ; otherwise do not reject H₀.
- Hₐ: p < p₀: if z ≤ −z_α, reject H₀ and accept Hₐ; otherwise do not reject H₀.
- Hₐ: p ≠ p₀: if |z| ≥ z_{α/2}, reject H₀ and accept Hₐ; otherwise do not reject H₀.
Read example 8.11 page 324.
Homework: 37a, 38ab, 39, 42a page 327

4. P-value: (page 328)
- The P-value is a probability.
- This probability is calculated assuming that the null hypothesis is true.
- Beware: the P-value is not the probability that H₀ is true, nor is it an error probability!
- The smaller the P-value, the more evidence there is in the sample data against the null hypothesis and for the alternative hypothesis.
- The P-value is the smallest significance level α at which the null hypothesis can be rejected. Because of this, the P-value is alternatively referred to as the observed significance level (OSL) for the data.
- Decision rule based on the P-value: select a significance level α (as before, the desired type I error probability).
  Then: do not reject H₀ if P-value > α; reject H₀ if P-value ≤ α.
- The two procedures, the rejection-region method and the P-value method, are in fact identical.
- P-value for z tests (normal):
  P-value = 1 − Φ(z) for an upper-tailed z test; Φ(z) for a lower-tailed z test; 2[1 − Φ(|z|)] for a two-tailed z test.
- P-values for t tests (Student) are computed the same way from the t curve.

REVIEW CHAPTER 9: INFERENCES BASED ON TWO SAMPLES

1/ Tests for a difference between two population means: (page 346)

Case 1: normal populations N(μ₁, σ₁²) and N(μ₂, σ₂²) with σ₁², σ₂² known.
H₀: μ₁ − μ₂ = Δ₀; test statistic z = (x̄ − ȳ − Δ₀)/√(σ₁²/n₁ + σ₂²/n₂)
- Hₐ: μ₁ − μ₂ ≠ Δ₀: reject if |z| ≥ z_{α/2}; P = 2(1 − Φ(|z|))
- Hₐ: μ₁ − μ₂ > Δ₀: reject if z ≥ z_α; P = 1 − Φ(z)
- Hₐ: μ₁ − μ₂ < Δ₀: reject if z ≤ −z_α; P = Φ(z)

Case 2: large samples (n₁ > 40, n₂ > 40), σ₁², σ₂² unknown.
H₀: μ₁ − μ₂ = Δ₀; test statistic z = (x̄ − ȳ − Δ₀)/√(s₁²/n₁ + s₂²/n₂)
Same rejection regions and P-values as Case 1.

Case 3: small samples, σ₁², σ₂² unknown.
H₀: μ₁ − μ₂ = Δ₀; test statistic t = (x̄ − ȳ − Δ₀)/√(s₁²/n₁ + s₂²/n₂), with degrees of freedom
ν = (s₁²/n₁ + s₂²/n₂)² / [(s₁²/n₁)²/(n₁ − 1) + (s₂²/n₂)²/(n₂ − 1)] (round ν down to the nearest integer).
- Hₐ: μ₁ − μ₂ ≠ Δ₀: reject if |t| ≥ t_{α/2; ν}; P = 2(1 − P(T ≤ |t|))
- Hₐ: μ₁ − μ₂ > Δ₀: reject if t ≥ t_{α; ν}; P = 1 − P(T ≤ t)
- Hₐ: μ₁ − μ₂ < Δ₀: reject if t ≤ −t_{α; ν}; P = P(T ≤ t)
Ex: n₁ = 10, n₂ = 10, s₁ = 0.79, s₂ = 3.59 ⇒ ν ≈ 9.87 → ν = 9; t_{α/2; ν} = t_{0.025; 9} = 2.262

Read examples 9.1, 9.2 page 348; ex 9.4 page 351; ex 9.7 page 359
Homework: 2b, 3, 6a, 7, 8a page 354; 19, 28, 32 page 362
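The degrees-of-freedom formula ν for the small-sample case can be checked against the worked numbers above (n₁ = n₂ = 10, s₁ = 0.79, s₂ = 3.59):

```python
from math import floor, sqrt

# Degrees of freedom for the two-sample t statistic with
# unequal, unknown variances (Case 3 above)
n1, n2, s1, s2 = 10, 10, 0.79, 3.59
a = s1 ** 2 / n1                 # s1²/n1
b = s2 ** 2 / n2                 # s2²/n2

nu = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
df = floor(nu)                   # round down to the nearest integer

# The t statistic itself would be (x̄ − ȳ − Δ0) / se with:
se = sqrt(a + b)
```

With df = 9, the two-tailed critical value at α = 0.05 is t_{0.025; 9} = 2.262, as in the example.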
2/ Inferences concerning a difference between population proportions: (page 375)
H₀: p₁ − p₂ = 0; test statistic
z = (p̂₁ − p̂₂)/√(p̂·q̂·(1/n₁ + 1/n₂))
where p̂₁ = x/n₁, p̂₂ = y/n₂, p̂ = (x + y)/(n₁ + n₂), q̂ = 1 − p̂.
- Hₐ: p₁ ≠ p₂: reject if |z| ≥ z_{α/2}; P = 2(1 − Φ(|z|))
- Hₐ: p₁ > p₂: reject if z ≥ z_α; P = 1 − Φ(z)
- Hₐ: p₁ < p₂: reject if z ≤ −z_α; P = Φ(z)
Read example 9.11 page 376.
Exercises: 49, 51, 53a page 380

3/ Inferences concerning two population variances: (page 382)
H₀: σ₁² = σ₂²; test statistic f = s₁²/s₂²
- Hₐ: σ₁² ≠ σ₂²: reject if f ≥ F_{α/2; n₁−1, n₂−1} or f ≤ F_{1−α/2; n₁−1, n₂−1}; P = 2(1 − P(F ≤ f))
- Hₐ: σ₁² > σ₂²: reject if f ≥ F_{α; n₁−1, n₂−1}; P = 1 − P(F ≤ f)
- Hₐ: σ₁² < σ₂²: reject if f ≤ F_{1−α; n₁−1, n₂−1}; P = P(F ≤ f)

4/ Analysis of paired data:
a/ The paired t test: (page 366)
Let D = X − Y, where X and Y are the first and second observations, respectively, within an arbitrary pair. Then the expected difference is μ_D = μ₁ − μ₂.
To test hypotheses about μ₁ − μ₂ when the data are paired, form the differences D₁, D₂, …, Dₙ and carry out a one-sample t test (based on n − 1 df) on these differences.
b/ The paired t confidence interval: (page 368)
The paired t CI for μ_D is d̄ ± t_{α/2; n−1}·s_D/√n.
A one-sided confidence bound results from retaining the relevant sign and replacing t_{α/2} by t_α.

REVIEW CHAPTER 12: SIMPLE LINEAR REGRESSION AND CORRELATION
1. The simple linear regression model: (page 469)
- The variable whose value is fixed by the experimenter will be denoted by x and will be called the independent, predictor, or explanatory variable.
- For fixed x, the second variable is random; we denote this random variable and its observed value by Y and y, respectively, and refer to it as the dependent or response variable.
- A picture of the data (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ), called a scatter plot, gives preliminary impressions about the nature of any relationship.
- If it appears that the value of y could be predicted from x by finding a line that is reasonably close to the points in the plot, there is evidence of a substantial linear relationship between the two variables.
- Using the method of least squares to estimate the parameters of the regression line (page 477), the estimated regression line is ŷ = A + Bx (or ŷ = β̂₀ + β̂₁x).
2. Using a calculator to find the regression equation:
Casio fx-570ES:
Step 1: (frequency column) Shift Mode ▼ 4 1:ON
Step 2: Mode 3:STAT 2:A+BX. Do the data entry, then press AC.
Step 3: Shift 1 7:Reg, then 1:A, 2:B, 3:r (correlation)
Note: the linear regression equation is Y = A + BX.

EX1: Observe a sample (X, Y):
X | 1 | 3 | 4 | 6 | 8 | 9 | 11 | 14
Y | 1 | 2 | 4 | 4 | 5 | 7 | 8 | 9
Find the linear regression equation of Y on X. When X = 12, find Y.
Answer: y = 0.6364x + 0.5455; when x = 12, y ≈ 8.1823.
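The least-squares coefficients in EX1 can be reproduced with a short computation (plain Python rather than the calculator procedure):

```python
# Least-squares fit ŷ = A + Bx for the EX1 data
xs = [1, 3, 4, 6, 8, 9, 11, 14]
ys = [1, 2, 4, 4, 5, 7, 8, 9]
n = len(xs)

xbar = sum(xs) / n
ybar = sum(ys) / n
Sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
Sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n

B = Sxy / Sxx          # slope B = S_xy/S_xx = 84/132 = 7/11 ≈ 0.6364
A = ybar - B * xbar    # intercept A = ȳ − B·x̄ = 6/11 ≈ 0.5455

y_at_12 = A + B * 12   # prediction at x = 12
```

Note that the answer 8.1823 in EX1 comes from plugging the rounded coefficients 0.6364 and 0.5455 into the line; with full precision the prediction at x = 12 is (7/11)·12 + 6/11 = 90/11 ≈ 8.1818.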