Lê Thị Mai Trang – Probability and Statistics

REVIEW CHAPTER 1: OVERVIEW AND DESCRIPTIVE STATISTICS

Two types of statistics are descriptive statistics and inferential statistics.

A. Descriptive statistics: (page 2)
1. Pictorial and Tabular Methods:
1.1 Stem-and-Leaf Displays
1.2 Dotplots
1.3 Histograms
- Histogram for discrete data
- Histogram for continuous data: select class intervals [a; b)
  + Class widths are equal.
  + Class widths are unequal.
  + Histogram shapes: one peak, two peaks, more than two peaks; symmetric, positively skewed, negatively skewed.
  + Multivariate data
2. Measurement
2.1 Measures of location: (page 28)
- Mean: x̄ = (1/n)·Σᵢ₌₁ⁿ xᵢ
- Median (x̃): the sample median is obtained by first ordering the n observations from smallest to largest (with any repeated values included so that every sample observation appears in the ordered list). Then
  x̃ = the ((n + 1)/2)th ordered value if n is odd;
  x̃ = the average of the (n/2)th and (n/2 + 1)th ordered values if n is even.
  Both the mean and the median describe where the data are centered, but in general they are not equal because they focus on different aspects of the sample. The median is the middle value of the sample and is very insensitive to outliers.
- Trimmed mean (a%): the mean is quite sensitive to a single outlier, whereas the median is impervious to many outliers; a trimmed mean is a compromise between the mean and the median.
- Quartiles and percentiles: the median (population or sample) divides the data set into two parts of equal size; quartiles divide the data set into four equal parts. Similarly, a data set (sample or population) can be divided even more finely using percentiles.
- Sample proportion: f = m/n
2.2 Measures of variability: (page 35)
- The sample variance: s² = Σᵢ₌₁ⁿ (xᵢ − x̄)²/(n − 1) = Sₓₓ/(n − 1), where Sₓₓ = Σxᵢ² − (Σxᵢ)²/n
- The sample standard deviation: s = √s²
2.3 Boxplots: there are two ways to make a boxplot (the median is either included in both halves or excluded from both).
- Boxplots that show outliers
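The Chapter 1 measures can be checked numerically with a short Python sketch. The data set below is hypothetical, chosen with one large outlier to show how the median and trimmed mean resist it while the mean does not:

```python
import statistics

# Hypothetical sample with one large outlier (40.0)
data = [2.5, 3.1, 3.4, 3.6, 4.0, 4.2, 4.8, 5.6, 6.0, 40.0]
n = len(data)

mean = sum(data) / n                     # x̄ = Σ xᵢ / n
median = statistics.median(data)         # middle value of the ordered sample

# Sample variance with the n − 1 denominator, and the shortcut S_xx form
s2 = sum((x - mean) ** 2 for x in data) / (n - 1)
Sxx = sum(x * x for x in data) - sum(data) ** 2 / n
s = s2 ** 0.5                            # sample standard deviation

# 10% trimmed mean: drop the smallest 10% and the largest 10% of observations
k = int(0.10 * n)
trimmed = sorted(data)[k:n - k]
trimmed_mean = sum(trimmed) / len(trimmed)
```

Here the mean ≈ 7.72 is pulled above nine of the ten observations by the outlier, while the median = 4.1 and the 10% trimmed mean ≈ 4.34 stay near the center, matching the remark above that a trimmed mean is a compromise between the mean and the median.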
B. Inferential statistics: estimation, tests of hypotheses, regression
C. Software: Minitab, R, SAS, S-Plus

REVIEW CHAPTER 2: PROBABILITY

2.1 Sample spaces and events: (page 51)
- Experiment: any activity or process whose outcome is subject to uncertainty. Experiment => outcomes => all the outcomes = sample space.
- Sample space Ω: the set of all possible outcomes.
- Event: a subset of Ω.
  Simple event: an event consisting of exactly one outcome.
  Compound event: an event consisting of more than one outcome.
  Empty set ∅: an event consisting of no outcome.

Some relations from set theory: (page 53)
a. The union: C = A + B or C = A ∪ B (at least one of the events occurs).
   Probability of a union of events: P(A ∪ B) = P(A) + P(B) − P(AB)
   More generally: P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(AB) − P(AC) − P(BC) + P(ABC)
   If A and B are mutually exclusive events: P(A ∪ B) = P(A) + P(B).
b. The intersection: C = A·B or C = A ∩ B
c. The complement: Ā (or A^c) = Ω \ A, so A ∪ Ā = Ω and A·Ā = ∅.
d. Mutually exclusive (or disjoint) events: A·B = ∅.
e. Independence: let A and B be two events in the same probability space. A is independent of B if the occurrence of A does not affect the probability that B occurs, and vice versa. In particular: P(A·B) = P(A)·P(B)
f. Partition: events A₁, A₂, …, Aₙ are said to form a partition of Ω if
   A₁ ∪ A₂ ∪ … ∪ Aₙ = Ω and Aᵢ·Aⱼ = ∅ for 1 ≤ i < j ≤ n (mutually exclusive).

2.2 Axioms, interpretations and properties of probability: (page 55)
- Definition: P(A) = m_A/n, where m_A is the number of outcomes in A and n is the number of equally likely outcomes in Ω.
- Properties: 0 ≤ P(A) ≤ 1; P(Ω) = 1, P(∅) = 0; P(Ā) = 1 − P(A)

2.3 Counting techniques: (page 64)
a. Addition rule (event 1 or 2 or … or k will occur): m = m₁ + m₂ + … + m_k
b. Multiplication rule (event 1 and 2 and … and k will occur): n = n₁·n₂·…·n_k
c. Permutation: Pₙ = n!
   Permutation with repeated objects: n!/(k!·l!), such that k + l = n.
d. Arrangement (k-permutations of n): Aₙᵏ = n!/(n − k)!, k ≤ n
e. Combination: Cₙᵏ = n!/(k!(n − k)!), k ≤ n
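The counting rules and the union formula above can be verified directly in Python; the die-roll events are hypothetical examples, not from the text:

```python
from fractions import Fraction
from math import comb, factorial, perm

# Counting techniques: arrangements and combinations for n = 10, k = 3
n, k = 10, 3
A_nk = perm(n, k)    # A_n^k = n!/(n−k)!
C_nk = comb(n, k)    # C_n^k = n!/(k!(n−k)!)
assert A_nk == factorial(n) // factorial(n - k)
assert C_nk == A_nk // factorial(k)

# Classical probability P(E) = m_E / n on a fair die, checking
# inclusion-exclusion P(A∪B) = P(A) + P(B) − P(AB) by enumeration
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}        # "even"
B = {4, 5, 6}        # "greater than 3"
P = lambda E: Fraction(len(E), len(omega))
union = P(A | B)
incl_excl = P(A) + P(B) - P(A & B)
```

Using exact fractions avoids any floating-point doubt: both sides of the inclusion-exclusion identity come out to exactly 2/3.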
2.4 Conditional probability: (page 73)
- Conditional probability of A given B: P(A|B) = P(AB)/P(B)
- Probability of an intersection of events: P(AB) = P(A)·P(B|A)
  More generally: P(ABC) = P(A)·P(B|A)·P(C|AB)
  Event A is independent of event B if and only if P(AB) = P(A)·P(B).
- Bernoulli formula: P(X = k) = Cₙᵏ·pᵏ·(1 − p)ⁿ⁻ᵏ
- The law of total probability:
  P(B) = Σᵢ₌₁ⁿ P(Aᵢ)·P(B|Aᵢ) = P(A₁)·P(B|A₁) + … + P(Aₙ)·P(B|Aₙ)
- Bayes' formula: P(Aᵢ|B) = P(Aᵢ)·P(B|Aᵢ)/P(B), i = 1, …, n

REVIEW CHAPTER 3: DISCRETE RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS

3.1 Random variables: (page 93)
A random variable is a real-valued function on Ω.
Classification of random variables:
- Discrete random variable: the set of possible values of X is finite or countably infinite.
- Continuous random variable: the set of possible values of X is uncountably infinite.

3.2 Probability distributions for discrete random variables: (page 96)
a/ The probability mass function (pmf) of X is p(x) = P(X = x), with p(x) ≥ 0 and Σᵢ₌₁ⁿ p(xᵢ) = 1.
Table of the pmf:
X     | x₁    | … | xₙ
p(xᵢ) | p(x₁) | … | p(xₙ)
b/ Cumulative distribution function (CDF), for both discrete and continuous random variables:
F(x) = P(X ≤ x)
Note: if X is a discrete random variable, then F(x) = P(X ≤ x) = Σ_{y ≤ x} p(y).

3.3 Expected value and variance: (page 106)
a/ Expected value (mean value): E(X) = μ_X = Σ_{x∈D} x·p(x) = x₁p(x₁) + x₂p(x₂) + … + xₙp(xₙ)
Properties:
1/ E(c) = c, c constant
2/ E(c·X) = c·E(X)
3/ E(X ± Y) = E(X) ± E(Y)
4/ E(X·Y) = E(X)·E(Y) if X and Y are independent.
5/ E[h(X)] = Σ_{x∈D} h(x)·p(x)
b/ Variance: V(X) = σ² = E(X²) − (EX)², with E(X²) = Σ_{x∈D} x²·p(x).
The standard deviation (SD): σ = √V(X)
Note: V(X) = Σ_D (x − EX)²·p(x) = E[(X − EX)²] = E(X²) − (EX)²
Properties:
1/ V(X) ≥ 0
2/ V(c) = 0, c constant
3/ V(c·X) = c²·V(X)
4/ V(X ± Y) = V(X) + V(Y) if X, Y are independent.
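The law of total probability and Bayes' formula can be sketched as follows; the three-machine setting and all its numbers are hypothetical, invented for illustration:

```python
from fractions import Fraction as F

# Hypothetical: machines A1, A2, A3 produce 50%, 30%, 20% of all items
# (a partition of Ω), with defect rates 1%, 2%, 3%. B = "item is defective".
prior = [F(50, 100), F(30, 100), F(20, 100)]   # P(A_i)
like  = [F(1, 100),  F(2, 100),  F(3, 100)]    # P(B | A_i)

# Law of total probability: P(B) = Σ P(A_i)·P(B|A_i)
pB = sum(p * l for p, l in zip(prior, like))

# Bayes' formula: P(A_1 | B) = P(A_1)·P(B|A_1) / P(B)
post1 = prior[0] * like[0] / pB
```

With exact fractions, P(B) = 17/1000 and P(A₁|B) = 5/17: even though machine 1 makes half the output, it accounts for well under half of the defectives.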
3.4–3.6 Probability distributions of discrete random variables

1/ The binomial distribution: X ~ Bin(n, p) (page 114)
A binomial experiment consists of a sequence of n smaller experiments called trials:
. Each trial results in one of the same two possible outcomes, denoted success (S) and failure (F).
. The probability of success P(S) = p is the same on every trial, and P(F) = 1 − p.
Binomial distribution: let X denote the total number of successes in the n trials. The distribution of X is called the binomial distribution with parameters n and p.
Pmf of X, b(x; n, p):
b(x; n, p) = P(X = x) = Cₙˣ·pˣ·(1 − p)ⁿ⁻ˣ for x = 0, 1, 2, …, n; 0 otherwise (Bernoulli formula)
CDF of X, B(x; n, p):
B(x; n, p) = P(X ≤ x) = Σ_{y=0}^{x} b(y; n, p); x = 0, 1, …, n
Properties: EX = np; V(X) = σ² = npq (where q = 1 − p)

2/ The hypergeometric distribution: (page 122)
If X has a hypergeometric distribution, then
P(X = x) = h(x; n, M, N) = C_Mˣ·C_{N−M}^{n−x}/C_Nⁿ
EX = n·M/N; V(X) = n·p·(1 − p)·(N − n)/(N − 1), with p = M/N

3/ The negative binomial distribution: (page 125) (different from your book)
Let Y be the number of trials in a sequence of independent and identically distributed Bernoulli trials until the "r"th success occurs.
P(Y = n) = C_{n−1}^{r−1}·p^{r−1}·(1 − p)^{n−r}·p for n ≥ r.
Properties: EY = r/p; V(Y) = r(1 − p)/p²

4/ The Poisson distribution: X ~ P(λ) with parameter λ > 0 (page 128)
X is the number of successful trials.
p(x; λ) = P(X = x) = e^{−λ}·λˣ/x! (x = 0, 1, 2, …)
Properties: EX = V(X) = λ
Note:
- X ~ Bin(n, p) with n and p known: when n is very large (n > 50) and p is very small, with np < 5, then X ≈ P(λ) with λ = np.
- X ~ Bin(n, p) with n and p unknown, but the average number of successes λ is known: then X ≈ P(λ).
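The binomial and Poisson formulas above, and the Poisson approximation rule, can be verified by direct computation (the parameter values here are hypothetical):

```python
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    # b(x; n, p) = C_n^x p^x (1−p)^(n−x)
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    # p(x; λ) = e^{−λ} λ^x / x!
    return exp(-lam) * lam ** x / factorial(x)

# Check EX = np and V(X) = npq by summing over the pmf of Bin(10, 0.3)
n, p = 10, 0.3
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]
EX = sum(x * q for x, q in enumerate(pmf))
VX = sum(x * x * q for x, q in enumerate(pmf)) - EX ** 2

# Poisson approximation: n large, p small, λ = np.
# Compare P(X = 3) under Bin(100, 0.02) and under P(λ = 2).
b3 = binom_pmf(3, 100, 0.02)
p3 = poisson_pmf(3, 2.0)
```

The two pmf values at x = 3 differ by less than 0.002 here, illustrating why the approximation is considered safe when n > 50 and np < 5.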
REVIEW CHAPTER 4: CONTINUOUS RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS

4.1 Probability density function (pdf), f_X: (page 138)
X is a continuous random variable if X has a pdf f satisfying
1/ f(x) ≥ 0 for all x ∈ R
2/ ∫_{−∞}^{∞} f(x)dx = 1
3/ P(a ≤ X ≤ b) = P(a ≤ X < b) = P(a < X ≤ b) = P(a < X < b) = ∫_a^b f(x)dx
4/ P(X = a) = 0

4.2 Cumulative distribution functions (CDF) and expected values:
a/ Cumulative distribution function F(x) (X a continuous r.v.): (page 143)
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(y)dy
Note: F′(x) = f(x)
Properties: P(X > a) = 1 − F(a); P(a ≤ X ≤ b) = F(b) − F(a)
b/ Percentiles of a continuous distribution: (page 146)
Let p be a number between 0 and 1. The (100p)th percentile of the distribution of a continuous rv X, denoted by η(p), is defined by
p = F(η(p)) = ∫_{−∞}^{η(p)} f(y)dy
c/ Median: the median of a continuous distribution, denoted by μ̃, is the 50th percentile, so μ̃ satisfies F(μ̃) = 0.5. That is, half the area under the density curve is to the left of μ̃ and half is to the right of μ̃.
d/ Expected value: (page 148)
μ_X = E(X) = ∫_{−∞}^{∞} x·f(x)dx
Note: E[h(X)] = ∫_{−∞}^{∞} h(x)·f(x)dx
e/ Variance: (page 150)
V(X) = σ_X² = E(X²) − (EX)² = ∫_{−∞}^{∞} x²·f(x)dx − (∫_{−∞}^{∞} x·f(x)dx)²
Note: the second way is V(X) = E(X − μ)² = ∫_{−∞}^{∞} (x − μ)²·f(x)dx
f/ The standard deviation (SD): σ = √V(X).

4.3 Probability distributions of continuous random variables:

1/ The uniform distribution: (page 140)
X is uniformly distributed over the interval [a; b] if
f_X(u) = 1/(b − a) for a ≤ u ≤ b; 0 else.
Properties: EX = (a + b)/2; Var X = (b − a)²/12

2/ The normal distribution: (page 152)
a/ The standard normal distribution: Z ~ N(0; 1), μ = 0, σ = 1 (page 153)
- Pdf of Z: f(z; 0, 1) = (1/√(2π))·e^{−z²/2}, −∞ < z < ∞
- CDF of Z: Φ(z) = P(Z ≤ z) = ∫_{−∞}^{z} f(y; 0, 1)dy
- Properties: P(Z ≤ a) = Φ(a); P(Z > b) = 1 − Φ(b); P(a ≤ Z ≤ b) = Φ(b) − Φ(a);
  Φ(z) ≈ 1 when z ≥ 3.49; Φ(z) ≈ 0 when z ≤ −3.49
b/ Percentiles of the standard normal distribution: (page 155)
z_α denotes the value on the z axis for which α of the area under the z curve lies to the right of z_α. Thus z_α is the 100(1 − α)th percentile of the standard normal distribution.
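Python's standard library exposes Φ and its inverse through statistics.NormalDist, which makes the standard normal facts above easy to check (the numeric values are the usual table entries):

```python
from statistics import NormalDist

Z = NormalDist(0, 1)       # standard normal: μ = 0, σ = 1
phi = Z.cdf                # Φ(z) = P(Z ≤ z)

# P(a ≤ Z ≤ b) = Φ(b) − Φ(a); e.g. the familiar 95% between ±1.96
central = phi(1.96) - phi(-1.96)

# Φ(z) ≈ 1 for z ≥ 3.49 and Φ(z) ≈ 0 for z ≤ −3.49
tail_hi = phi(3.49)
tail_lo = phi(-3.49)

# z_α: the value with area α to its right, i.e. the 100(1 − α)th percentile
alpha = 0.05
z_alpha = Z.inv_cdf(1 - alpha)   # ≈ 1.645
```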
c/ The normal distribution X ~ N(μ, σ²): (page 152)
- Pdf of X: f(x; μ, σ) = (1/(σ√(2π)))·e^{−(x−μ)²/(2σ²)}
- Properties: EX = μ; V(X) = σ²
Note: if X ~ N(μ, σ²), then the standardized version of X, namely Z = (X − μ)/σ ~ N(0; 1), is a standard normal random variable.
d/ Approximating the binomial distribution: (page 160)
Let X be a binomial rv based on n trials with success probability p. Then if the binomial probability histogram is not too skewed, X has approximately a normal distribution with μ = np and σ² = npq. In particular, for a possible value x of X,
P(X ≤ x) = B(x; n, p) ≈ Φ((x + 0.5 − np)/√(np(1 − p)))
In practice, the approximation is adequate provided that both np ≥ 10 and n(1 − p) ≥ 10, since there is then enough symmetry in the underlying binomial distribution.

3/ The exponential distribution: (page 165)
- X has the exponential distribution with parameter λ > 0 if its pdf is
  f(x) = λ·e^{−λx} for x ≥ 0; 0 otherwise.
- CDF of X: F(x; λ) = P(X ≤ x) = 1 − e^{−λx} for x ≥ 0; 0 for x < 0.
- Properties: μ = 1/λ; σ² = 1/λ².

4/ The gamma distributions: (page 167)
- For α > 0, the gamma function is defined by Γ(α) = ∫₀^∞ x^{α−1}·e^{−x}dx.
- A continuous random variable X is said to have a gamma distribution if the pdf of X is
  f(x; α, β) = (1/(β^α·Γ(α)))·x^{α−1}·e^{−x/β} for x ≥ 0; 0 otherwise,
  where the parameters satisfy α > 0, β > 0.
- Properties: E(X) = αβ; V(X) = αβ²
- The standard gamma distribution has β = 1, so
  f(x; α, 1) = (1/Γ(α))·x^{α−1}·e^{−x} for x ≥ 0; 0 otherwise.
  When X is a standard gamma r.v., the cdf of X is F(x; α) = ∫₀^x (y^{α−1}·e^{−y}/Γ(α))dy for x > 0.
- Proposition: let X have a gamma distribution with parameters α, β. Then for any x > 0, the cdf of X is given by P(X ≤ x) = F(x; α, β) = F(x/β; α), where F(·; α) is the incomplete gamma function.

5/ The chi-squared distribution: X ~ χ²(ν) with parameter ν (page 169)
X is said to have a chi-squared distribution with parameter ν if the pdf of X is the gamma density with α = ν/2, β = 2. The pdf of a chi-squared rv is thus
f(x; ν) = (1/(2^{ν/2}·Γ(ν/2)))·x^{ν/2−1}·e^{−x/2} for x ≥ 0; 0 for x < 0.
The parameter ν is called the number of degrees of freedom (df) of X. The symbol χ² is often used in place of "chi-squared."

6/ The Student t distribution

Other continuous distributions (read book): the Weibull distribution, the beta distribution, the lognormal distribution.

REVIEW CHAPTER 5: JOINT PROBABILITY DISTRIBUTIONS AND RANDOM SAMPLES (read book)

REVIEW CHAPTERS 6, 7: ESTIMATION
- Population: (page 3) an investigation will typically focus on a well-defined collection of objects constituting a population of interest.
- Sample: a subset of the population, selected in some prescribed manner.

Notation:
Population: N = size of the population; M_A = number of population successes; p = M_A/N = population proportion; μ = population mean; σ² = population variance; σ = population standard deviation
Sample: n = size of a random sample; m_A (or X) = number of sample successes; p̂ = m_A/n (or f = m_A/n) = sample proportion; x̄ = sample mean; s² = sample variance; s = sample standard deviation (or σ_{n−1})

Calculator fx-570ES for statistics:
Step 1: (frequency column) Shift Mode ▼ 4 1:ON
Step 2: Mode 3:STAT 1. Input the data, then press AC.
Step 3: Shift 1 5:Var, then 1:n, 2:x̄, 4:xσn−1 (= s)
Note: Shift 1 3:Edit 2:Del (delete data); Mode 1 (exit)
Calculator fx-580:
Step 1: (frequency column) Shift Menu
Step 2: Menu 6 1. Input the data, then press AC.
Step 3: OPTN 2 2: you can see n, x̄, xσn−1 (= s)

1. Point estimation: (page 240)
- Unbiased estimators: (page 243) a point estimator θ̂ is said to be an unbiased estimator of θ if E(θ̂) = θ for every possible value of θ.
  x̄ is an unbiased estimator of μ; s² is an unbiased estimator of σ²; p̂ is an unbiased estimator of p.
2. Interval estimation, or confidence interval (CI): (page 267)
- Confidence level: 1 − α (α is the significance level)
2.1 Confidence interval for the population mean μ: (page 270)
a/ Two-sided confidence bound: let μ be the population mean (μ is unknown). Find the CI for μ with confidence level 1 − α.
- Sample mean: x̄
- Precision (or error of estimation) ε:
Case 1 (page 272): σ² known: ε = z_{α/2}·σ/√n (read ex 7.3 page 272)
Case 2 (page 277): σ² unknown, n sufficiently large (n > 40): ε = z_{α/2}·s/√n (using the normal table Z) (read ex 7.6 page 278)
Case 3 (page 285): σ² unknown, n small: ε = t_{α/2; n−1}·s/√n (using the Student t table) (read ex 7.11 page 288; ex 7.12 page 289)
- Conclusion: the CI for the population mean is (x̄ − ε; x̄ + ε). (See the Student distribution in your book, page 286.)
b/ One-sided confidence bounds: (page 283) (read ex 7.10 page 283)
- An upper confidence bound for μ: μ < x̄ + z_α·s/√n (or μ < x̄ + t_{α; n−1}·s/√n)
- A lower confidence bound for μ: μ > x̄ − z_α·s/√n (or μ > x̄ − t_{α; n−1}·s/√n)
Note: Φ(z_{α/2}) = 1 − α/2; Φ(z_α) = 1 − α (using table Z)
c/ Finding the sample size or the confidence level:
- Find the sample size n: from ε ≥ z_{α/2}·s/√n we get √n ≥ z_{α/2}·s/ε, so n ≥ (z_{α/2}·s/ε)².
  The width is w = 2ε = 2·z_{α/2}·s/√n, which gives n = 4·z²_{α/2}·s²/w².
  (read ex 7.4 page 273; ex 7.7 page 279)
- Find the confidence level when the precision ε is known:
  ε = z_{α/2}·s/√n ⇒ z_{α/2} = ε·√n/s ⇒ Φ(z_{α/2}) = 1 − α/2 ⇒ 1 − α = ? (two-sided)
  ε = z_α·s/√n ⇒ z_α = ε·√n/s ⇒ Φ(z_α) = 1 − α ⇒ 1 − α = ? (one-sided)
2.2 CI for the population proportion p:
a/ Two-sided confidence bound: (page 280) let p be the proportion of "successes" in a population (p is unknown); find the CI for p with confidence level 1 − α.
- Sample proportion: p̂ = m_A/n
- Precision (error of estimation): ε = z_{α/2}·√(p̂(1 − p̂)/n) (using table Z)
- Conclusion: the CI for the population proportion is (p̂ − ε; p̂ + ε).
b/ One-sided confidence bounds:
- An upper confidence bound for p: p < p̂ + z_α·√(p̂(1 − p̂)/n)
- A lower confidence bound for p: p > p̂ − z_α·√(p̂(1 − p̂)/n)
c/ Finding the sample size or the confidence level:
- Find the sample size n: the width is w = 2ε, so n = 4·z²_{α/2}·p̂(1 − p̂)/w².
- Find the confidence level when the precision ε is known:
  ε = z_{α/2}·√(p̂(1 − p̂)/n) ⇒ z_{α/2} = … ⇒ Φ(z_{α/2}) = … ⇒ 1 − α = ? (two-sided)
  ε = z_α·√(p̂(1 − p̂)/n) ⇒ z_α = … ⇒ Φ(z_α) = … ⇒ 1 − α = ? (one-sided)
Homework:
Population mean: exercises 5a page 276; 12, 13, 14, 15 page 283; 34, 36a, 37a page 293; 48, 49c page 297
Population proportion: exercises 19, 20, 21, 22, 23, 25a page 284; 51a, 54, 56b page 297
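The CI recipes above can be sketched in Python, with statistics.NormalDist supplying z_{α/2}. The data summaries (n, x̄, s, and the success count) are hypothetical, invented for illustration:

```python
from math import sqrt
from statistics import NormalDist

z = NormalDist().inv_cdf          # z_p such that Φ(z_p) = p
alpha = 0.05                      # confidence level 1 − α = 95%

# Two-sided CI for μ, large sample: hypothetical n = 64, x̄ = 50.2, s = 4.0
n, xbar, s = 64, 50.2, 4.0
eps = z(1 - alpha / 2) * s / sqrt(n)        # precision ε = z_{α/2}·s/√n
ci_mu = (xbar - eps, xbar + eps)

# Two-sided CI for p: hypothetical m = 160 successes in n2 = 400 trials
n2, m = 400, 160
phat = m / n2
eps_p = z(1 - alpha / 2) * sqrt(phat * (1 - phat) / n2)
ci_p = (phat - eps_p, phat + eps_p)

# Sample size for a desired interval width w: n = 4·z_{α/2}²·s²/w²
w = 1.0
n_needed = 4 * z(1 - alpha / 2) ** 2 * s ** 2 / w ** 2   # round up in practice
```

Here ε ≈ 0.98, so the 95% CI for μ is about (49.22; 51.18), and holding the width to w = 1 would require n ≈ 246 observations (rounding n_needed up).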
2.3 CI for the population variance σ²:
a/ Two-sided confidence bound: let σ² be the population variance (σ² is unknown). The CI for σ² with confidence level 1 − α is
((n − 1)s²/χ²_{α/2; n−1} ; (n − 1)s²/χ²_{1−α/2; n−1})
Use the chi-squared table χ²(n − 1) to find χ²_{α/2; n−1} and χ²_{1−α/2; n−1}.
b/ One-sided confidence bounds:
- An upper confidence bound for σ²: σ² < (n − 1)s²/χ²_{1−α; n−1}
- A lower confidence bound for σ²: σ² > (n − 1)s²/χ²_{α; n−1}
c/ CI for the population standard deviation σ: take square roots of the endpoints:
(√((n − 1)s²/χ²_{α/2; n−1}) ; √((n − 1)s²/χ²_{1−α/2; n−1}))

CHAPTER 8: TESTS OF HYPOTHESES BASED ON A SINGLE SAMPLE
1. Definitions: (page 301)
- A statistical hypothesis is a claim or assertion either about the value of a single parameter (a population characteristic or a characteristic of a probability distribution), about the values of several parameters, or about the form of an entire probability distribution.
- In any hypothesis-testing problem there are two contradictory hypotheses under consideration. The null hypothesis, denoted by H₀, is the claim that is initially assumed to be true (the "prior belief" claim). The alternative hypothesis, denoted by Hₐ, is the assertion that contradicts H₀. The null hypothesis will be rejected in favor of the alternative hypothesis only if sample evidence suggests that H₀ is false. If the sample does not strongly contradict H₀, we continue to believe in the plausibility of the null hypothesis. The two possible conclusions from a hypothesis-testing analysis are therefore "reject H₀" or "fail to reject H₀".
- The alternative to the null hypothesis H₀: θ = θ₀ will look like one of the following three assertions:
1. Hₐ: θ ≠ θ₀
2. Hₐ: θ > θ₀
3. Hₐ: θ < θ₀
- A test procedure is specified by the following: (page 303)
1. A test statistic, a function of the sample data on which the decision (reject H₀ or do not reject H₀) is to be based
2. A rejection region, the set of all test statistic values for which H₀ will be rejected
The null hypothesis will then be rejected if and only if the observed or computed test statistic value falls in the rejection region.
- Errors in hypothesis testing: a type I error consists of rejecting the null hypothesis H₀ when it is true; a type II error involves not rejecting H₀ when H₀ is false.

Decision \ Reality | H₀ is true       | H₀ is false
Do not reject H₀   | correct decision | type II error
Reject H₀          | type I error     | correct decision

- Significance level α: α = P(type I error), and β = P(type II error); the power of the test is 1 − β. (page 307)

2. Tests about a population mean μ:
Case 1: X has a normal distribution with known σ². (page 310) (read ex 8.6 page 312)
The null hypothesis: H₀: μ = μ₀
The test statistic: z = (x̄ − μ₀)·√n/σ
The alternative hypothesis and rejection region for H₀ at level α:
- Hₐ: μ > μ₀ (upper-tailed): if z ≥ z_α, reject H₀ and accept Hₐ; if z < z_α, do not reject H₀.
- Hₐ: μ < μ₀ (lower-tailed): if z ≤ −z_α, reject H₀ and accept Hₐ; if z > −z_α, do not reject H₀.
- Hₐ: μ ≠ μ₀ (two-tailed): if |z| ≥ z_{α/2}, reject H₀ and accept Hₐ; if |z| < z_{α/2}, do not reject H₀.
Case 2: large sample (n > 40), σ² unknown: (page 314) (read ex 8.8 page 315)
The null hypothesis: H₀: μ = μ₀
The test statistic: z = (x̄ − μ₀)·√n/s
Rejection regions for H₀ at level α:
- Hₐ: μ > μ₀ (upper-tailed): z ≥ z_α
- Hₐ: μ < μ₀ (lower-tailed): z ≤ −z_α
- Hₐ: μ ≠ μ₀ (two-tailed): |z| ≥ z_{α/2}
Case 3: small sample, σ² unknown (Student t distribution): (page 316) (read ex 8.9 page 317)
The null hypothesis: H₀: μ = μ₀
The test statistic: t = (x̄ − μ₀)·√n/s
Rejection regions for H₀ at level α:
- Hₐ: μ > μ₀ (upper-tailed): t ≥ t_{α; n−1}
- Hₐ: μ < μ₀ (lower-tailed): t ≤ −t_{α; n−1}
- Hₐ: μ ≠ μ₀ (two-tailed): |t| ≥ t_{α/2; n−1}
Homework: page 321, exercises 19a, 20, 22b, 23, 24, 26, 28, 29a, 31, 32
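The large-sample z test (Case 2 above) can be sketched as follows; the data summary (n = 50, x̄ = 21.2, s = 3.5, μ₀ = 20) is a hypothetical example, not one from the book:

```python
from math import sqrt
from statistics import NormalDist

phi = NormalDist().cdf

# H0: μ = 20 vs Ha: μ > 20 (upper-tailed), large sample, σ² unknown
n, xbar, s, mu0, alpha = 50, 21.2, 3.5, 20.0, 0.05

z = (xbar - mu0) * sqrt(n) / s              # test statistic z = (x̄ − μ0)·√n/s
z_alpha = NormalDist().inv_cdf(1 - alpha)   # critical value z_α
reject = z >= z_alpha                       # upper-tailed rejection region
p_value = 1 - phi(z)                        # upper-tailed P-value
```

Here z ≈ 2.42 exceeds z_{0.05} ≈ 1.645, so H₀ is rejected at the 5% level; the P-value ≈ 0.008 agrees, since it falls below α.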
3. Test concerning a population proportion p (large sample: np₀ ≥ 10 and n(1 − p₀) ≥ 10): (page 323)
The null hypothesis: H₀: p = p₀
The test statistic: z = (p̂ − p₀)·√n/√(p₀(1 − p₀))
Rejection regions for H₀ at level α:
- Hₐ: p > p₀: if z ≥ z_α, reject H₀ and accept Hₐ; otherwise do not reject H₀.
- Hₐ: p < p₀: if z ≤ −z_α, reject H₀ and accept Hₐ; otherwise do not reject H₀.
- Hₐ: p ≠ p₀: if |z| ≥ z_{α/2}, reject H₀ and accept Hₐ; otherwise do not reject H₀.
Read example 8.11 page 324.
Homework: 37a, 38ab, 39, 42a page 327

4. P-value: (page 328)
- The P-value is a probability.
- This probability is calculated assuming that the null hypothesis is true.
- Beware: the P-value is not the probability that H₀ is true, nor is it an error probability!
- The smaller the P-value, the more evidence there is in the sample data against the null hypothesis and for the alternative hypothesis.
- The P-value is the smallest significance level α at which the null hypothesis can be rejected. Because of this, the P-value is alternatively referred to as the observed significance level (OSL) for the data.
- Decision rule based on the P-value: select a significance level α (as before, the desired type I error probability).
  Then: do not reject H₀ if P-value > α; reject H₀ if P-value ≤ α.
- The two procedures, the rejection-region method and the P-value method, are in fact identical.
- P-value for z tests (normal):
  P-value = 1 − Φ(z) for an upper-tailed z test; Φ(z) for a lower-tailed z test; 2[1 − Φ(|z|)] for a two-tailed z test.
- P-values for t tests (Student) are computed the same way from the t curve.

REVIEW CHAPTER 9: INFERENCES BASED ON TWO SAMPLES

1/ Tests for a difference between two population means: (page 346)

Case 1: normal populations N(μ₁, σ₁²) and N(μ₂, σ₂²) with σ₁², σ₂² known.
H₀: μ₁ − μ₂ = Δ₀; test statistic z = (x̄ − ȳ − Δ₀)/√(σ₁²/n₁ + σ₂²/n₂)
- Hₐ: μ₁ − μ₂ ≠ Δ₀: reject if |z| ≥ z_{α/2}; P = 2(1 − Φ(|z|))
- Hₐ: μ₁ − μ₂ > Δ₀: reject if z ≥ z_α; P = 1 − Φ(z)
- Hₐ: μ₁ − μ₂ < Δ₀: reject if z ≤ −z_α; P = Φ(z)

Case 2: large samples (n₁ > 40, n₂ > 40), σ₁², σ₂² unknown.
H₀: μ₁ − μ₂ = Δ₀; test statistic z = (x̄ − ȳ − Δ₀)/√(s₁²/n₁ + s₂²/n₂)
Same rejection regions and P-values as Case 1.

Case 3: small samples, σ₁², σ₂² unknown.
H₀: μ₁ − μ₂ = Δ₀; test statistic t = (x̄ − ȳ − Δ₀)/√(s₁²/n₁ + s₂²/n₂), with degrees of freedom
ν = (s₁²/n₁ + s₂²/n₂)² / [(s₁²/n₁)²/(n₁ − 1) + (s₂²/n₂)²/(n₂ − 1)] (round ν down to the nearest integer).
- Hₐ: μ₁ − μ₂ ≠ Δ₀: reject if |t| ≥ t_{α/2; ν}; P = 2(1 − P(T ≤ |t|))
- Hₐ: μ₁ − μ₂ > Δ₀: reject if t ≥ t_{α; ν}; P = 1 − P(T ≤ t)
- Hₐ: μ₁ − μ₂ < Δ₀: reject if t ≤ −t_{α; ν}; P = P(T ≤ t)
Ex: n₁ = 10, n₂ = 10, s₁ = 0.79, s₂ = 3.59 ⇒ ν ≈ 9.87 → ν = 9; t_{α/2; ν} = t_{0.025; 9} = 2.262

Read examples 9.1, 9.2 page 348; ex 9.4 page 351; ex 9.7 page 359
Homework: 2b, 3, 6a, 7, 8a page 354; 19, 28, 32 page 362
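The degrees-of-freedom formula ν for the small-sample case can be checked against the worked numbers above (n₁ = n₂ = 10, s₁ = 0.79, s₂ = 3.59):

```python
from math import floor, sqrt

# Degrees of freedom for the two-sample t statistic with
# unequal, unknown variances (Case 3 above)
n1, n2, s1, s2 = 10, 10, 0.79, 3.59
a = s1 ** 2 / n1                 # s1²/n1
b = s2 ** 2 / n2                 # s2²/n2

nu = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
df = floor(nu)                   # round down to the nearest integer

# The t statistic itself would be (x̄ − ȳ − Δ0) / se with:
se = sqrt(a + b)
```

With df = 9, the two-tailed critical value at α = 0.05 is t_{0.025; 9} = 2.262, as in the example.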
2/ Inferences concerning a difference between population proportions: (page 375)
H₀: p₁ − p₂ = 0; test statistic
z = (p̂₁ − p̂₂)/√(p̂·q̂·(1/n₁ + 1/n₂))
where p̂₁ = x/n₁, p̂₂ = y/n₂, p̂ = (x + y)/(n₁ + n₂), q̂ = 1 − p̂.
- Hₐ: p₁ ≠ p₂: reject if |z| ≥ z_{α/2}; P = 2(1 − Φ(|z|))
- Hₐ: p₁ > p₂: reject if z ≥ z_α; P = 1 − Φ(z)
- Hₐ: p₁ < p₂: reject if z ≤ −z_α; P = Φ(z)
Read example 9.11 page 376.
Exercises: 49, 51, 53a page 380

3/ Inferences concerning two population variances: (page 382)
H₀: σ₁² = σ₂²; test statistic f = s₁²/s₂²
- Hₐ: σ₁² ≠ σ₂²: reject if f ≥ F_{α/2; n₁−1, n₂−1} or f ≤ F_{1−α/2; n₁−1, n₂−1}; P = 2(1 − P(F ≤ f))
- Hₐ: σ₁² > σ₂²: reject if f ≥ F_{α; n₁−1, n₂−1}; P = 1 − P(F ≤ f)
- Hₐ: σ₁² < σ₂²: reject if f ≤ F_{1−α; n₁−1, n₂−1}; P = P(F ≤ f)

4/ Analysis of paired data:
a/ The paired t test: (page 366)
Let D = X − Y, where X and Y are the first and second observations, respectively, within an arbitrary pair. Then the expected difference is μ_D = μ₁ − μ₂.
To test hypotheses about μ₁ − μ₂ when the data are paired, form the differences D₁, D₂, …, Dₙ and carry out a one-sample t test (based on n − 1 df) on these differences.
b/ The paired t confidence interval: (page 368)
The paired t CI for μ_D is d̄ ± t_{α/2; n−1}·s_D/√n.
A one-sided confidence bound results from retaining the relevant sign and replacing t_{α/2} by t_α.

REVIEW CHAPTER 12: SIMPLE LINEAR REGRESSION AND CORRELATION
1. The simple linear regression model: (page 469)
- The variable whose value is fixed by the experimenter will be denoted by x and will be called the independent, predictor, or explanatory variable.
- For fixed x, the second variable is random; we denote this random variable and its observed value by Y and y, respectively, and refer to it as the dependent or response variable.
- A picture of the data (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ), called a scatter plot, gives preliminary impressions about the nature of any relationship.
- If it appears that the value of y could be predicted from x by finding a line that is reasonably close to the points in the plot, there is evidence of a substantial linear relationship between the two variables.
- Using the method of least squares to estimate the parameters of the regression line (page 477), the estimated regression line is ŷ = A + Bx (or ŷ = β̂₀ + β̂₁x).
2. Using a calculator to find the regression equation:
Casio fx-570ES:
Step 1: (frequency column) Shift Mode ▼ 4 1:ON
Step 2: Mode 3:STAT 2:A+BX. Do the data entry, then press AC.
Step 3: Shift 1 7:Reg, then 1:A, 2:B, 3:r (correlation)
Note: the linear regression equation is Y = A + BX.

EX1: Observe a sample (X, Y):
X | 1 | 3 | 4 | 6 | 8 | 9 | 11 | 14
Y | 1 | 2 | 4 | 4 | 5 | 7 | 8 | 9
Find the linear regression equation of Y on X. When X = 12, find Y.
Answer: y = 0.6364x + 0.5455; when x = 12, y ≈ 8.1823.
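The least-squares coefficients in EX1 can be reproduced with a short computation (plain Python rather than the calculator procedure):

```python
# Least-squares fit ŷ = A + Bx for the EX1 data
xs = [1, 3, 4, 6, 8, 9, 11, 14]
ys = [1, 2, 4, 4, 5, 7, 8, 9]
n = len(xs)

xbar = sum(xs) / n
ybar = sum(ys) / n
Sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
Sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n

B = Sxy / Sxx          # slope B = S_xy/S_xx = 84/132 = 7/11 ≈ 0.6364
A = ybar - B * xbar    # intercept A = ȳ − B·x̄ = 6/11 ≈ 0.5455

y_at_12 = A + B * 12   # prediction at x = 12
```

Note that the answer 8.1823 in EX1 comes from plugging the rounded coefficients 0.6364 and 0.5455 into the line; with full precision the prediction at x = 12 is (7/11)·12 + 6/11 = 90/11 ≈ 8.1818.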