Review of Chapters 1 – 7 (Fall 2007)

Chapter 1: Introduction

What? Who? Why?

Some Jargon
o Subject: A unit (a person, animal, plant, etc.) on which we make observations or measurements.
o Population: The set of all subjects of interest.
o Sample: A part of the population.
o Random sample: A sample of population units selected according to some rule of probability.
o Simple random sample (SRS): A sample selected in such a way that every unit in the population has an equal chance of being chosen.
o Random variable: A measurement or observation on units in a random sample or in a population.
o Parameter: A number that summarizes the observations on a population. Parameters belong to a population; a parameter can be calculated only if we have population data.
o Statistic: A number that summarizes the observations in a sample. A statistic belongs to a sample; it is calculated using sample data.

Chapter 2: Summarizing Data

Identifying types of data
o Categorical (ordinal, nominal)
o Quantitative (discrete, continuous)

Graphical summaries
o Bar graph and pie chart [categorical data]
o Dot plot, histogram, box-plot, or stem-and-leaf display (stem-plot) [quantitative data]
o Some common shapes:
  - Mound-shaped (bell-shaped)
  - Left skewed
  - Right skewed
o Checking for normality (very important for small samples)

Numerical summaries
o Sample mean = X̄
o Sample variance = S² ≥ 0
o Sample standard deviation = S = √(S²) ≥ 0
o Sample proportion = p̂ = X / n, where X = number of successes in the sample

Learn to find the sample mean and sample standard deviation using your calculators.

Chapter 3: Relation Between Two Variables

(We will go over this chapter in detail later.)

Chapter 4: Gathering Data

Through randomization (ALWAYS)

Experimental vs. observational studies

Simple random sampling
o Sample, random sample, simple random sample (SRS)
o Sampling error (ME = margin of error)
o Non-sampling errors (sources of bias)

Statistically significant difference

Experimental design
o Technical terms
o Some types of experiments

Some types of observational studies
o Cross-sectional studies
o Case-control studies
o Prospective studies

Chapter 5: Probability

Statistical experiments
o Two or more outcomes
o Uncertainty

Sample space and events
o Sample space = S = {all possible outcomes of a statistical experiment}
o Event: any subset of the sample space (it may contain one, several, all, or none of the elements of S). Capital letters at the beginning of the alphabet are used to denote events.
o Impossible event = ∅ = { }
o Definite (sure) event = sample space = S

Probability of an event A
o P(A) = f / n if the n outcomes are equally likely and f of them are favorable to A.
o P(A) = lim_{n→∞} f / n = long-run relative frequency.

Basic rules of probability
o General rule: For any event A, 0 ≤ P(A) ≤ 1.
o Complement rule: P(A^c) = 1 − P(A).
o Conditional probability: the probability of observing an event given that (conditional on) another event has occurred:
  P(A | B) = P(A and B) / P(B),   P(B | A) = P(A and B) / P(A).
o Multiplication rule: cross-multiplying the definition of conditional probability gives
  P(A and B) = P(A) P(B | A) = P(B) P(A | B).
  Special case: IF A and B are INDEPENDENT, then P(A and B) = P(A) P(B). [See the definition of independence below.]
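To make the complement, conditional-probability, and multiplication rules concrete, here is a minimal Python sketch built on a hypothetical two-way table of 200 students; none of these counts come from the course notes, they are made up purely for illustration.

```python
# Hypothetical two-way classification of 200 students (illustrative
# numbers only; they do not come from the course notes).
#                   Passed (B)   Did not pass
# Studied (A)           90            30
# Did not study         40            40
total = 200
p_A = (90 + 30) / total           # P(A) = P(studied) = 0.60
p_B = (90 + 40) / total           # P(B) = P(passed)  = 0.65
p_A_and_B = 90 / total            # P(A and B)        = 0.45

# Complement rule: P(A^c) = 1 - P(A)
p_not_A = 1 - p_A                                   # 0.40

# Conditional probability: P(A | B) = P(A and B) / P(B)
p_A_given_B = p_A_and_B / p_B                       # about 0.692

# Multiplication rule: P(A and B) = P(B) * P(A | B)
assert abs(p_A_and_B - p_B * p_A_given_B) < 1e-12

# Independence (defined in the next section) would require
# P(A and B) = P(A) * P(B); here 0.45 != 0.39, so A and B are dependent.
print(p_not_A, round(p_A_given_B, 3), p_A * p_B)
```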
o Addition rule: P(A or B) = P(A) + P(B) − P(A and B).

Independence of events
o Four equivalent statements:
  - A and B are independent events
  - P(A and B) = P(A) × P(B)
  - P(A | B) = P(A)
  - P(B | A) = P(B)
o The above statements are true ONLY when the events A and B are independent.
o If one of the statements is true, then all are true.
o If one of the statements is false, then all are false.

Chapter 6: Probability Distributions

Random variables (rv)

Assume we know the values of the population parameters (e.g., μ and σ).

Distribution of a discrete rv
o Discrete uniform distribution
o Binomial distribution

Distribution of a continuous rv
o Uniform distribution
o The normal distribution N(μ, σ)
o The t-distribution
o More to come

Finding probabilities given a value of the rv
o Always sketch the problem.
o Using the standard normal distribution: for example, find P(Z < 1.23).
o Using the t-distribution: for example, find P(T > 1.23).

Finding the value of the rv given a probability
o The opposite of the above processes, e.g., find the constants c and d such that P(Z > c) = 0.05 and P(T < d) = 0.01.

Chapter 7: Statistical Inference: Confidence Intervals

The following are some important concepts you should have learned in Chapter 7 (some were also used in earlier chapters):

A parameter
o It is a numerical summary of the population.
o We calculate parameters using population data. However, we usually (almost always) do not have population data, so the values of population parameters are almost always unknown.
o So we estimate the population parameters using data from a random sample.

A statistic
o It is a numerical summary of a sample.
o We calculate the values of sample statistics using data from random samples.
o We use sample statistics to make statistical inferences about the unknown population parameters.

Interpret the following: "Statistics are everywhere, statistics is nowhere." (Richard L. Sheaffer)

Statistical inference is the process of making a statement about one or more population parameters using one or more sample statistics, obtained from a random and representative sample.

Types of statistical inference:
o Point estimation: gives just one number as an estimate of the parameter.
o Interval estimation (or confidence interval): gives an interval of the number line as possible values of the parameter, with some fixed confidence.
o Significance tests (or tests of hypotheses): a process that yields a decision on whether a claim about the value of the parameter is supported by data observed from a random sample.

Some point estimators of parameters:

Population parameter (unknown)  →  Sample statistic (point estimator)
o Mean: μ = (Σ_{i=1}^{N} X_i) / N  →  X̄ = (Σ_{i=1}^{n} X_i) / n
o Standard deviation: σ = √( Σ_{i=1}^{N} (X_i − μ)² / N )  →  S = √( Σ_{i=1}^{n} (X_i − X̄)² / (n − 1) )
o Proportion: p = (Σ_{i=1}^{N} X_i) / N, where X_i = 1 if the ith outcome is a "Success" and X_i = 0 if the ith outcome is a "Failure"  →  p̂ = X / n, where X = number of "Success"es in the sample.

o The sample mean is an unbiased point estimator of the population mean.
o The sample standard deviation is a point estimator of the population standard deviation.
o The sample proportion is an unbiased point estimator of the population proportion.
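As a quick illustration of the point-estimator formulas above, the following Python sketch computes X̄, S (note the n − 1 divisor), and p̂ from small made-up samples; all data values are invented purely for illustration.

```python
import math

# Made-up quantitative sample of size n = 8 (illustration only).
x = [4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.2, 4.7]
n = len(x)

# Sample mean: X-bar = (sum of the X_i) / n
x_bar = sum(x) / n

# Sample standard deviation: S = sqrt( sum (X_i - X-bar)^2 / (n - 1) )
s = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))

# Made-up Bernoulli sample: 1 = "Success", 0 = "Failure".
outcomes = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
p_hat = sum(outcomes) / len(outcomes)   # X / n = 7 / 10 = 0.70

print(round(x_bar, 3), round(s, 3), p_hat)   # point estimates of mu, sigma, p
```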
Properties of estimators:
o Unbiasedness: An estimator is said to be unbiased if the sampling distribution of the estimator is centered at the parameter.
  - The sample mean is an unbiased estimator of the population mean μ, since E(X̄) = μ_X̄ = μ. In words, μ_X̄ (the average of the population of ALL possible sample means) equals the population mean μ.
  - The sample proportion is an unbiased estimator of the population proportion p, because E(p̂) = μ_p̂ = p. In words, μ_p̂ (the average of the population of ALL possible sample proportions) equals p.
  - The sample standard deviation, S, is not unbiased; it has a small bias that decreases as n (the sample size) increases.
o Small standard error: The standard error of a statistic (an estimator) is the standard deviation of the sampling distribution of the statistic. It describes the variability in the possible values of the statistic. A good estimator has a small standard error. The estimators mentioned above all have small standard errors.
o Standard errors of the estimators:
  - SE(X̄) = σ_X̄ = σ / √n, estimated by Est. SE(X̄) = S / √n.
  - SE(p̂) = √( p(1 − p) / n ), estimated by Est. SE(p̂) = √( p̂(1 − p̂) / n ).

Interval estimation:
o General form: Estimator ± ME.
o ME = margin of error: measures how accurate the estimate is likely to be in estimating the parameter.
o ME = (table value) × SE(Estimator).
o CI for μ: X̄ ± ME = X̄ ± t* · S / √n.
o CI for p: p̂ ± ME = p̂ ± z* · √( p̂(1 − p̂) / n ).
o Make sure you remember how to use the tables of the standard normal distribution and the t-distribution.
o More to come later.

Significance tests (or tests of hypotheses)
o A hypothesis is a statement (a sentence) about one or more population parameters.
o Null hypothesis (Ho): the statement that reflects the status quo (current knowledge or belief). Equality always goes with Ho.
o Alternative hypothesis (Ha): a statement of change (a claim).
o Examples:
  - Ho: μ = μ0 vs. Ha: μ < μ0; or Ho: μ = μ0 vs. Ha: μ > μ0 [1-sided Ha]; or Ho: μ = μ0 vs. Ha: μ ≠ μ0 [2-sided Ha].
  - Ho: p = p0 vs. Ha: p < p0; or Ho: p = p0 vs. Ha: p > p0 [1-sided Ha]; or Ho: p = p0 vs. Ha: p ≠ p0 [2-sided Ha].
o More in Chapters 8 – 14.

Determining the sample size (n):
o For estimating μ: n = ( z* σ / m )², where m is the desired margin of error.
o For estimating p: note that σ² = p(1 − p). This is largest when p = 1/2, i.e., σ² ≤ 1/4 and hence σ ≤ 1/2. Thus the formula for determining the sample size is n = z*² p(1 − p) / m² ≤ (1/4) · z*² / m² for any p.
o If n is not an integer, always round it up to the next integer.
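The sample-size formulas above translate directly into a few lines of Python. In this sketch the confidence level, the target margin of error m, and the planning value for σ are all assumptions chosen only for illustration.

```python
import math

z_star = 1.96        # z* for 95% confidence (from the standard normal table)
m = 0.03             # assumed target margin of error

# Estimating mu: n = (z* * sigma / m)^2, using a guessed planning value for sigma.
sigma_guess = 0.5
n_mean = (z_star * sigma_guess / m) ** 2
print("n for estimating mu:", math.ceil(n_mean))   # always round UP

# Estimating p: n = z*^2 * p(1 - p) / m^2.
# With no prior guess for p, use p = 1/2, since p(1 - p) <= 1/4 for any p.
p_guess = 0.5
n_prop = z_star ** 2 * p_guess * (1 - p_guess) / m ** 2
print("n for estimating p:", math.ceil(n_prop))    # conservative (largest) n
```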
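Finally, to tie together the interval-estimation formulas from earlier in this chapter, here is a minimal sketch that computes a 95% t-interval for μ and a 95% z-interval for p. It assumes SciPy is available for looking up the t* and z* critical values, and all of the data are made up for illustration.

```python
import math
from scipy.stats import norm, t    # used only to look up z* and t*

# 95% CI for mu: X-bar +/- t* * S / sqrt(n)   (made-up sample)
x = [4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.2, 4.7]
n = len(x)
x_bar = sum(x) / n
s = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
t_star = t.ppf(0.975, df=n - 1)               # t* with n - 1 degrees of freedom
me_mu = t_star * s / math.sqrt(n)
print("CI for mu:", (round(x_bar - me_mu, 3), round(x_bar + me_mu, 3)))

# 95% CI for p: p-hat +/- z* * sqrt( p-hat (1 - p-hat) / n )   (made-up counts)
successes, n_trials = 42, 100
p_hat = successes / n_trials
z_star = norm.ppf(0.975)                      # z* is about 1.96
me_p = z_star * math.sqrt(p_hat * (1 - p_hat) / n_trials)
print("CI for p:", (round(p_hat - me_p, 3), round(p_hat + me_p, 3)))
```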