AP Statistics Notes: Data Analysis, Modeling, and More

ADVANCED PLACEMENT Statistics Notes
by: Jeremiah James dela Rosa, Temecula, CA, jeydicos@gmail.com
Thank you, Stats Medic, Luke & Lindsey!

CHAPTER 1: DATA ANALYSIS

Displaying Categorical Data
· Two-Way Table: counts for two categorical variables, with a Total row and a Total column.
· Frequency: counts. Relative frequency: percent/proportion.
· Graphs: bar graph, pie chart, side-by-side bar graph, segmented bar graph, mosaic plot. (Label the axes.)
· Marginal relative frequency: the percent or proportion of individuals that have a specific value for one categorical variable.
· Joint relative frequency: the percent or proportion of individuals that have a specific value for one categorical variable and a specific value for another categorical variable.
· Conditional relative frequency: the percent or proportion of individuals that have a specific value for one categorical variable among individuals who share the same value of another categorical variable (the condition).
· Association: there is an association between two variables if knowing the value of one variable helps predict the value of the other.

Displaying Quantitative Data with Graphs
· DOTPLOT
· STEMPLOT (stem-and-leaf plot): stem | leaves
· HISTOGRAM: ranges of values on the horizontal axis, frequency on the vertical axis

When the question says "Describe the distribution," use SOCV + context:
· Shape: "The distribution of (context) is (shape) with a peak at (highest point)." Shapes: roughly symmetric, right-skewed, left-skewed, uniform, unimodal (single peak), bimodal (double-peaked).
· Outliers: "There seems to be (an outlier / no outliers) at (values)." Also mention gaps.
· Center: "The (mean/median) of the distribution of (context) is (value + units)."
· Variability: "The (SD/IQR/range) of the distribution of (context) is (value + units)."

When the question says "Compare the distributions," do the same thing with SOCV + context for both distributions:
· CENTER: which is greater (lesser).
· VARIABILITY: which varies more.
· Always write about the SHAPE of both distributions and about OUTLIERS in both distributions.

Describing Quantitative Data with Numbers
Measures of Center:
· mean (average): $\bar{x} = \frac{\sum x_i}{n}$. Non-resistant: the mean is greatly affected by outliers.
· median: the middle value (odd number of data values) or the mean of the two middle values (even number of data values). Resistant.
Measures of Variability:
· range = maximum - minimum
· standard deviation (SD): $s_x = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$. Non-resistant.
· interquartile range (IQR): IQR = Q3 - Q1. Resistant.
· variance: the square of the standard deviation.
· The mean & SD are greatly affected by outliers. The median & IQR are not affected by outliers.
· If the distribution is roughly symmetrical and there are no outliers, use the mean & SD. If the distribution is skewed or there are outliers, use the median & IQR.
· Interpreting SD: "The (context) typically varies from the mean by about (SD + unit)."

How to find outliers? (1.5 x IQR Rule)
· low outlier < Q1 - (1.5 x IQR)
· high outlier > Q3 + (1.5 x IQR)

Five-number summary: minimum, Q1 (Quartile 1; 25th percentile), median, Q3 (Quartile 3; 75th percentile), maximum.

BOXPLOTS
· What if the min. or max. is an outlier? What will be your min./max.? Remove your outliers and label them on the boxplot. The new min. is the lowest data value that is not an outlier (same for the max.).

· parameter: a number (or statement) that describes a population.
· statistic: a number (or statement) that describes a sample and estimates a parameter.

Calculator:
① STAT > Edit > L1 or L2 (type your data)
② STAT > Calc > 1: 1-Var Stats (Sx is the SD)
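The five-number summary and the 1.5 x IQR rule can be checked with a short script. This is only a sketch: the data values are made up, the function names are hypothetical, and the quartiles are assumed to be the medians of the lower and upper halves of the sorted data (the median itself is left out when the number of values is odd).

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                              # values below the median position
    upper = xs[half + 1:] if n % 2 else xs[half:]  # values above the median position
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def find_outliers(data):
    """Return the values flagged by the 1.5 x IQR rule."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr    # anything below this is a low outlier
    high_fence = q3 + 1.5 * iqr   # anything above this is a high outlier
    return [x for x in data if x < low_fence or x > high_fence]

scores = [1, 2, 3, 4, 5, 6, 7, 8, 50]   # hypothetical data, not from the notes
print(five_number_summary(scores))
print(find_outliers(scores))
```

For a boxplot of this made-up data, 50 would be plotted as a separate point and the upper whisker would stop at 8, the largest value that is not an outlier.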
CHAPTER 2: MODELING DISTRIBUTIONS OF QUANTITATIVE DATA

Describing Location in a Distribution
Two ways to describe location: percentiles and standardized scores (z-scores).
· percentile: the pth percentile of a distribution is the value with p% of observations less than or equal to it.
· Q1 = 25th percentile; Median = 50th percentile; Q3 = 75th percentile.
· Cumulative relative frequency graph (OGIVE): allows you to examine location in a distribution. It lets you estimate the percentile of an individual value and vice versa.

Example:
Age      Freq.   Relative Freq.   Cumulative Freq.   Cumulative Relative Freq.
40-45      2         4.4%                2                    4.4%
45-50      7        15.6%                9                   20.0%
50-55     13        28.9%               22                   48.9%
55-60     12        26.7%               34                   75.6%
60-65      7        15.6%               41                   91.1%
65-70      3         6.7%               44                   97.8%
70-75      1         2.2%               45                  100.0%
[Ogive: Age (40 to 75) on the horizontal axis, cumulative relative frequency (0% to 100%) on the vertical axis.]

· standardized score (z-score): $z = \frac{x - \text{mean}}{\text{SD}} = \frac{x - \mu}{\sigma}$. It tells us how many standard deviations from the mean an individual value falls, and in what direction.
· Interpretation: "(value in context) is (z-score) standard deviations (above/below) the mean of (μ + unit)."

Transformation of Data
                               SHAPE        CENTER & LOCATION    VARIABILITY
ADDITION/SUBTRACTION (a)       no change    change by a          no change
MULTIPLICATION/DIVISION (b)    no change    change by b          change by b
· Center & location: mean, five-number summary, percentiles. Variability: range, IQR, standard deviation.

Density Curves
· A density curve models the distribution of a quantitative variable with a curve that is always on or above the horizontal axis and has an area of exactly 1 underneath it.
· The area under the curve and above any interval of values on the horizontal axis estimates the proportion of all observations that fall in that interval.
· Median of a density curve: the equal-areas point, the point that divides the area under the curve in half.
· Mean of a density curve: the balance point, the point at which the curve would balance if made of solid material.
· roughly symmetric: $\bar{x} = \text{med}$; left-skewed: $\bar{x} < \text{med}$; right-skewed: $\bar{x} > \text{med}$.
· Uniform density curve: why is the height 1/2 when the distance on the horizontal axis is 2 units? Since the area under the curve is equal to 1, the height should be equal to the reciprocal of the distance on the horizontal axis: $w \times h = 1$, so $2 \times h = 1$ and $h = \frac{1}{2}$.

Normal Distributions
· Normal curve: roughly symmetric, single-peaked, bell-shaped. Any Normal distribution is completely specified by two parameters: mean (μ) & SD (σ).
· Empirical Rule (68-95-99.7 Rule): in a Normal distribution, approximately 68% of the values are within mean ± 1(SD), 95% are within mean ± 2(SD), and 99.7% are within mean ± 3(SD).
· Finding the area (probability) under the curve: always find the z-score first. Calculator: normalcdf(lower, upper, μ, σ).
  - less than: lower: -1000
  - greater than: upper: 1000
  - between: lower and upper are the two values
  - If using the actual values, use μ: mean & σ: SD. If using z-scores, use μ: 0 & σ: 1.
· Finding a value from an area: 3: invNorm(area, μ, σ).

Assessing Normality
· Count the percent of values within mean ± 1(SD), mean ± 2(SD), and mean ± 3(SD). If these values are close to 68-95-99.7, then the distribution is approximately Normal.
· Normal probability plot: a scatterplot of ordered pairs (data value, expected z-score) for each individual in a quantitative data set. Look for a linear form: if it's almost linear, then the distribution is approximately Normal.

CHAPTER 3: EXPLORING TWO-VARIABLE QUANTITATIVE DATA
· Explanatory variable (input): helps predict or explain changes in a response variable.
· Response variable (predicted output): measures the outcome of a study.
· Residual = Actual y - Predicted y = $y - \hat{y}$.
· Interpretation of a residual: "The actual (y-context) was (residual) (above/below) the value predicted by the LSRL for x = (# in context)."
· Interpretation of correlation: "The correlation of r = (#) confirms that the linear association between (explanatory) and (response variable) is (weak/moderate/strong) and (positive/negative)."
How to describe a scatterplot?
· Direction: positive, negative, or none.
· Form: linear or nonlinear.
· Strength: weak, moderate, or strong.
· Unusual Features: outliers.
· "There is a (weak/moderate/strong), (positive/negative), (linear/nonlinear) association between (explanatory variable) and (response variable). There doesn't seem to be any unusual features in this relationship."

Correlation (r)
· Describes the direction & strength of a linear association; preferably, describe the association using r if it is given.
· Scale: -1 (perfect, negative) ... strong ... moderate ... weak ... 0 (none) ... weak ... moderate ... strong ... 1 (perfect, positive).
· Correlation only applies to a linear association between two quantitative variables.
· Correlation is a number: it has NO UNITS.
· Correlation does not imply causation.

Least-Squares Regression Line (LSRL)
· Equation: $\hat{y} = a + bx$, where x is the explanatory variable and $\hat{y}$ is the predicted value of the response variable.
· To find the LSRL from $r$, $s_x$, $s_y$, $\bar{x}$, $\bar{y}$, use these formulas: $b = r\frac{s_y}{s_x}$ and $a = \bar{y} - b\bar{x}$.
· slope (b): "For every 1-unit increase in (x-context), the predicted (y-context) (increases/decreases) by (slope)."
· y-int (a): "When (x-context) is 0, the predicted (y-context) is (y-int)."
· standard deviation of the residuals (s): "The actual (y-context) is typically about (s + unit) away from the value predicted by the LSRL."
· coefficient of determination ($r^2$): "About ($r^2$%) of the variability in (y-context) is accounted for by the LSRL with (x-context)."
· RESIDUAL PLOT: it identifies if a LINEAR MODEL is APPROPRIATE. We look for NO LEFTOVER CURVED PATTERN. Always check the scatterplot & residual plot before concluding if a LINEAR MODEL is APPROPRIATE.
· Extrapolation: using the LSRL to predict for values of the explanatory variable outside the range of data from which the LSRL was calculated.
· Influential points: outliers (large residuals) and high leverage points (very large or very small x-values) can greatly affect correlation and regression calculations.

Transforming to achieve linearity:
· Power Model: Option 1: raise the values of the explanatory variable to an integer, p. Option 2: take the pth root of the values of the response variable.
· Exponential & Logarithmic Models: take the logarithm (log (base 10) or ln (base e)) of one or both variables.

Calculator:
1) Type the explanatory variable data in L1 and the response variable data in L2.
2) Go to STAT > Calc.
3) Choose 8: LinReg(a + bx).
4) Highlight Calculate and press ENTER. Go back to your LSRL equation: $\hat{y} = a + bx$.
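The LSRL formulas can be tried out in a few lines of Python. This is a rough sketch with made-up data and hypothetical function names: it computes r from standardized values, then uses b = r(s_y/s_x) and a = ȳ - b·x̄, and finishes with one residual and r².

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = sum of the products of z-scores, divided by (n - 1)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    n = len(xs)
    return sum(((x - x_bar) / sx) * ((y - y_bar) / sy)
               for x, y in zip(xs, ys)) / (n - 1)

def lsrl(xs, ys):
    """Return (a, b, r) for the line y-hat = a + b*x."""
    r = correlation(xs, ys)
    b = r * stdev(ys) / stdev(xs)    # slope
    a = mean(ys) - b * mean(xs)      # y-intercept
    return a, b, r

x = [1, 2, 3, 4, 5]   # hypothetical explanatory values
y = [2, 4, 5, 4, 5]   # hypothetical response values

a, b, r = lsrl(x, y)
predicted = a + b * 3            # predicted y when x = 3
residual = 5 - predicted         # actual y - predicted y for the point (3, 5)
print(f"y-hat = {a:.2f} + {b:.2f}x, r = {r:.3f}, r^2 = {r * r:.2f}")
print(f"residual at x = 3: {residual:.2f}")
```

A positive residual means the actual value was above the value predicted by the LSRL; a negative residual means it was below.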
CHAPTER 4: COLLECTING DATA

· simple random sample (SRS)

HOW TO CHOOSE AN SRS?
· Technology:
  - Label: Label each individual in the population with a distinct numerical label from 1 to N.
  - Randomize: Use an RNG to get n different integers (ignore repeats, if necessary).
  - Select: Choose the individuals that correspond to the randomly selected integers.
· Table D:
  - Label: Label each individual with a distinct numerical label with the same number of digits (e.g., if using two digits, use 01 to NN, or three digits, use 001 to NNN).
  - Randomize: Read consecutive groups of digits of the appropriate length from left to right across a line in Table D (ignore repeats, if necessary) until the desired sample size is selected.
  - Select: Choose the individuals that correspond to the selected labels.
· Slips of paper (for random assignment): individuals take a slip of paper (no replacement). Select: group individuals based on the slip of paper they got.

EXPERIMENTAL DESIGNS
· COMPLETELY RANDOMIZED DESIGN: the experimental units are assigned to the treatments completely at random.
  [Diagram: 20 companies -> random assignment (RNG or Table D) -> Group 1 (10) -> Treatment 1 (additional lighting); Group 2 (10) -> Treatment 2 (same lighting) -> compare worker productivity.]
· RANDOMIZED BLOCK DESIGN: the random assignment of treatments is done separately within each block.
  [Diagram: students blocked by grade level -> Block 1: Freshmen (100), Block 2: Sophomores (400) -> random number generator within each block -> Treatment 1 (online) or Treatment 2 (in-class) -> compare scores within each block -> combine results & compare.]
· MATCHED PAIRS DESIGN: a common experimental design for comparing two treatments that uses blocks of size 2. In some matched pairs designs, two very similar experimental units are paired and the two treatments are randomly assigned within each pair. In others, each experimental unit receives both treatments in a random order.
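The Technology steps for choosing an SRS (label, randomize, select) can be sketched in Python. This is a minimal illustration, assuming a made-up list of 30 students; the function name and the seed are hypothetical, and `random.sample` plays the role of the RNG that ignores repeats.

```python
import random

def choose_srs(population, n, seed=None):
    """Choose an SRS of size n without replacement."""
    rng = random.Random(seed)
    labels = range(1, len(population) + 1)   # Label: 1 to N
    chosen = rng.sample(labels, n)           # Randomize: n different integers
    return [population[label - 1] for label in chosen]   # Select

# Hypothetical population of 30 students
students = [f"Student {i}" for i in range(1, 31)]
sample = choose_srs(students, n=5, seed=42)
print(sample)
```

Because the labels are drawn without replacement, no individual can be selected twice, which matches the "ignore repeats" instruction in the notes.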
TYPES OF STUDIES
· Observational study: observes individuals and measures variables of interest but does not attempt to influence the responses.
· Experiment: deliberately imposes treatments (conditions) on individuals to measure their responses.

WE MIGHT AVOID BIAS, BUT WHAT ELSE CAN GO WRONG???
· undercoverage: occurs when some members of the population are less likely to be chosen or cannot be chosen in a sample.
· nonresponse: occurs when an individual chosen for the sample can't be contacted or refuses to participate.
· response bias: occurs when there's a systematic pattern of inaccurate answers to a survey question.
· systematic random sampling: randomly select a value from 1 to k to identify the first individual, and choose every kth individual after that. If there's a pattern in the way the population is ordered, the sample may not be representative of the population.
· Convenience sampling and voluntary response sampling lead to BIAS, which leads to an overestimate or underestimate.

VOCABS!!!
· sampling variability: different random samples of the same size from the same pop. produce different estimates.
· confounding: occurs when two variables are associated in such a way that their effects on a response variable cannot be distinguished from each other.
· experimental unit: the object to which a treatment is randomly assigned.
· subjects: experimental units that are human beings.
· treatment: a specific condition applied to the individuals in an experiment.
· factor: an explanatory variable that is manipulated and may cause a change in the response variable.
· levels: different values of a factor. Factors x levels = combinations of treatments.
· placebo: a treatment that has no active ingredient, but is otherwise like other treatments.
· placebo effect: describes the fact that some subjects in an experiment will respond favorably to any treatment, even an inactive treatment.
· control group: used to provide a baseline for comparing the effects of other treatments.
· control: keeping other variables constant for all experimental units.
· random assignment: experimental units are assigned to treatments using a chance process.
· replication: giving each treatment to enough experimental units so that any difference in the effects of the treatments can be distinguished from chance differences between groups.
· double-blind: neither the subjects nor those who interact with them and measure the response variable know which treatment a subject is receiving.
· single-blind: either the subjects or the people who interact with them and measure the response variable don't know which treatment a subject is receiving.
· block: a group of experimental units that are known before the experiment to be similar in some way that is expected to affect the response to the treatments.

BASIC PRINCIPLES OF EXPERIMENTAL DESIGN:
· Comparison: Use a design that compares two or more treatments.
· Random Assignment: Use a chance process (slips of paper, RNG, or Table D) to assign experimental units to treatments. This helps create roughly equivalent groups before treatments are imposed.
· Control: Keep other variables the same for all groups. Control helps avoid confounding and reduces variation in responses, making it easier to decide if a treatment is effective.
· Replication: Impose each treatment on enough experimental units so the effects of the treatments can be distinguished from chance differences between groups.

COMPLETELY RANDOMIZED DESIGN example (20 companies, additional lighting vs. same lighting):
Description:
- Number the companies from 1 to 20 (20 individuals).
- Use a random number generator to produce 10 different random integers from 1 to 20 (random assignment).
- Select the first 10 different integers (Group 1) and assign them to additional lighting (Treatment 1).
- Select the remaining companies (Group 2) and assign them to the same lighting (Treatment 2).
- Compare the increase in worker productivity between the two groups.

MATCHED PAIRS DESIGN example:
A track coach wants to know whether his long-distance runners are faster running the track clockwise or counterclockwise. Design an experiment that uses a matched-pairs design to investigate this question. Explain your method of pairing.
Description: Have each long-distance runner race 1 mile in each direction. Some runners are faster than others, so using each runner as his or her own "pair" accounts for variation in 1-mile race times among the runners. For each runner, randomly assign the order in which the treatments (clockwise and counterclockwise) are assigned by flipping a coin. Heads indicates the runner will race clockwise first and counterclockwise second; tails indicates the runner will race counterclockwise first and clockwise second. Allow adequate recovery time between the races. For each runner, record the 1-mile race times for each direction.

RANDOMIZED BLOCK DESIGN example (online vs. teacher-taught geometry; Freshmen (100), Sophomores (400)):
Description:
- Form blocks based on grade level (Individuals + Blocks) because scores on the geometry final exam are likely to vary by grade level, since Freshmen who take geometry tend to be more advanced in their math coursework.
- Label each Freshman from 1 to 100. Use a random number generator to obtain 50 different random integers (random assignment), select these students, and assign them to online (Block 1 + Treatment 1). The remaining students are assigned to teacher taught (Block 1 + Treatment 2).
- Label each Sophomore from 1 to 400. Use a random number generator again to obtain 200 different random integers (random assignment), select these students, and assign them to online (Block 2 + Treatment 1). The remaining students are assigned to teacher taught (Block 2 + Treatment 2).
- At the end of the course, let them take the same geometry final exam and compare the scores (compare).
- Once all students have taken the test and the scores have been compared for each treatment within each block, combine the results and compare (combine and compare).
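The random assignment in the randomized block example can be sketched in Python. This is an illustration only: the block sizes (100 Freshmen, 400 Sophomores) come from the example, while the function name, the dictionary layout, and the seed are assumptions made for the sketch.

```python
import random

def randomly_assign(labels, n_first, rng):
    """Split labels into two groups using a chance process (no repeats)."""
    first = set(rng.sample(labels, n_first))            # Treatment 1 group
    second = [x for x in labels if x not in first]      # everyone else
    return sorted(first), second

rng = random.Random(2024)                       # fixed seed so the run can be repeated
blocks = {"Freshmen": 100, "Sophomores": 400}   # block sizes from the example

assignment = {}
for block, size in blocks.items():
    labels = list(range(1, size + 1))           # label students 1 to N within the block
    online, teacher = randomly_assign(labels, size // 2, rng)
    assignment[block] = {"online": online, "teacher taught": teacher}

for block, groups in assignment.items():
    print(block, {name: len(members) for name, members in groups.items()})
```

The key design choice is that the random assignment happens separately inside each block, so each treatment gets half of the Freshmen and half of the Sophomores.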
SAMPLING WELL
POPULATION & SAMPLE

TYPES OF SAMPLING
· simple random sample (SRS): gives every possible sample of a given size the same chance to be chosen.
  * Make sure to do SAMPLING WITHOUT REPLACEMENT when doing an SRS.
  - Slips of paper: Label: Write corresponding numbers or letters on identical slips of paper. Randomize: Put the slips in a hat or bowl and shuffle the papers. Select: Choose the individuals who correspond to the selected labels.
· stratified random sampling: divide the population into non-overlapping groups of individuals (strata) that are similar in some way that might affect their response. Then choose a separate SRS from each stratum and combine these SRSs.
  * Strata are similar within (HOMOGENEOUS), but different between.
  * Stratified random samples tend to give more precise estimates of unknown population values than SRSs.
· cluster sampling: divide the population into non-overlapping groups of individuals that are located near each other (clusters). Randomly select clusters, and all the individuals in the chosen clusters are included.
  * Clusters are different within (HETEROGENEOUS), but similar between.
  * Saves time & money.
· systematic random sampling: selects every kth individual, with k based on the population size & desired sample size.
· Convenience sampling: chooses individuals easiest to reach.
· voluntary response sampling: lets individuals choose to be a part of the study b/c of an open invitation.

SAMPLING VARIABILITY & SAMPLE SIZE
· Larger random samples tend to produce estimates that are closer to the true population value than smaller random samples. In other words, estimates from larger samples are more precise.
· margin of error: creates an interval of plausible values around the sample estimate.

STATISTICALLY SIGNIFICANT
· Statistically significant: the observed results of a study are too unusual to be explained by chance alone.
· PROCESS OF IDENTIFYING STATISTICAL SIGNIFICANCE (P-VALUES / PERCENTAGE):
  1) Identify the difference in means from the study.
  2) Make a simulation (dotplot) of what the difference in means might be if the random assignment alone is the cause.
  3) Identify how many dots are greater than or equal to the mean difference from step 1.
  4) Calculate the percentage of dots that are greater than or equal to the mean difference.
  5) Compare the percentage to 5% and state if it is statistically significant or not in the context of the problem.
  * If % < 5%: yes, it is statistically significant; the result is too unusual to have happened by chance alone.
  * If % ≥ 5%: no, it is not statistically significant; it may have happened by chance alone (by coincidence only).

THE SCOPE OF INFERENCE SUMMARY:
· Random selection of individuals allows inference about the population from which the individuals were chosen.
· Random assignment of individuals to groups allows inference about cause and effect.

                                      WERE INDIVIDUALS RANDOMLY ASSIGNED TO GROUPS?
                                      YES                                 NO
WERE INDIVIDUALS       YES   Inference about population: YES     Inference about population: YES
RANDOMLY SELECTED?           Inference about cause & effect: YES  Inference about cause & effect: NO
                       NO    Inference about population: NO      Inference about population: NO
                             Inference about cause & effect: YES  Inference about cause & effect: NO

WHEN WE CAN'T DO AN EXPERIMENT: CRITERIA FOR ESTABLISHING CAUSATION
· The association between the explanatory variable and the response variable is strong.
· The association is consistent. Many studies of different kinds link the explanatory variable to the response variable.
· Larger values of the explanatory variable are associated with stronger responses.
· The alleged cause precedes the effect in time.
· The alleged cause is plausible (believable).

DATA ETHICS
- All planned studies must be reviewed in advance by an institutional review board charged with protecting the safety and well-being of the subjects.
- All individuals who are subjects in a study must give their informed consent before data are collected.
- All individual data must be kept confidential. Only statistical summaries for groups of subjects may be made public.

* RANDOM IS ALWAYS IMPORTANT!!! There is a lot of information here, but it's very essential to know these things: Chapter 4 is all about collecting your data, and these notes will be used again in Chapters 7 through 12, on the AP exam, and in your future projects and studies.

CHAPTER 5: PROBABILITY

Definitions; Formulas:
· random process: generates outcomes that are determined purely by chance.
· probability: a number between 0 and 1 that describes the proportion of times an outcome would occur in a very long series of trials.
· law of large numbers: if we observe more and more trials of any random process, the proportion of times that a specific outcome occurs approaches the true probability.
· simulation: imitates a random process in such a way that simulated outcomes are consistent with real-world outcomes.
· simulation process:
  ① Describe how you will simulate one trial (one repetition) and what you will record.
  ② Perform many trials (repetitions).
  ③ Use the result of the simulation to answer the question.
· sample space: list of all possible outcomes.
· The probability of each outcome is between 0 and 1, and the probabilities of all outcomes must add to 1.
· $P(A) = \frac{\text{number of outcomes in event } A}{\text{total number of outcomes in sample space}}$
· Complement rule: $P(A^C) = 1 - P(A)$
· mutually exclusive: no outcomes in common; the events can't happen at the same time.
· Addition Rule for mutually exclusive events: $P(A \cup B) = P(A) + P(B)$
· General Addition Rule: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
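The simulation process and the law of large numbers can be seen in a small Python sketch. The random process here (rolling two dice and checking for a sum of 7) is a hypothetical example chosen because its true probability, 6/36 = 1/6, is easy to compute; the function name and seed are also assumptions.

```python
import random

def estimate_probability(trials, seed=None):
    """Estimate P(sum of two dice = 7) from many simulated trials."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One trial: roll two fair dice and record whether the sum is 7
        total = rng.randint(1, 6) + rng.randint(1, 6)
        if total == 7:
            hits += 1
    return hits / trials   # proportion of trials in which the event occurred

# More trials -> the proportion settles closer to the true probability 1/6
for n in (100, 10_000, 100_000):
    print(n, round(estimate_probability(n, seed=1), 4))
print("true probability:", round(1 / 6, 4))
```

With only 100 trials the estimate can be noticeably off; with many more trials it tends to sit close to 1/6, which is what the law of large numbers describes.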
· Conditional probability ("given that"): the probability that one event happens given that another event is known to have happened. $P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{P(\text{both events occur})}{P(\text{given event occurs})}$
· Independent events: knowing whether or not one event has occurred does not change the probability that the other event will happen. $P(A \mid B) = P(A \mid B^C) = P(A)$ OR $P(B \mid A) = P(B \mid A^C) = P(B)$
· General Multiplication Rule (AND): $P(A \cap B) = P(A) \cdot P(B \mid A)$
· Multiplication Rule for Independent Events: $P(A \cap B) = P(A) \cdot P(B)$
· At least one Rule: $P(\text{at least one}) = 1 - P(\text{none})$
· union (∪): A or B. intersection (∩): A and B.
  [Venn diagram of events A and B; tree diagram with branches P(A), P(A^C), P(B | A), P(B | A^C) leading to P(A ∩ B), P(A^C ∩ B).]

HOW SMALL DOES A PROBABILITY HAVE TO BE FOR US TO SAY THAT THIS IS UNUSUAL? LESS THAN 5%.

How to know if there is convincing evidence?
① Identify the percentage (or proportion) from the question.
② Perform the simulation.
③ Count the number of dots out of the total number of dots from the simulation (proportion of dots).
④ IF:
  * proportion of dots from #3 < 5%, it is statistically significant based on the question.
  * proportion of dots from #3 ≥ 5%, it is not statistically significant based on the question.

CHAPTER 6: Random Variables & Probability Distributions

DEFINITIONS; FORMULAS:
· random variable: takes numerical values that describe the outcome of a random process.
· Probability distribution: gives the possible values of a random variable and their probabilities.

Discrete Random Variables
· discrete random variable: takes a fixed set of possible values with gaps bet. them (e.g. shoe size, number of children in your family, etc.).
· $P(X \ge k) = P(X = k) + P(X = k + 1) + \dots + P(X = k + n)$, and $P(X \ge k) = 1 - P(X < k)$.
· mean (expected value): $\mu_X = E(X) = (x_1)(p_1) + (x_2)(p_2) + \dots + (x_i)(p_i)$. The average value of the random variable after many, many trials of the random process.
· standard deviation: $\sigma_X = \sqrt{(x_1 - \mu_X)^2(p_1) + (x_2 - \mu_X)^2(p_2) + \dots + (x_i - \mu_X)^2(p_i)}$. Measures how much the value of the random variable typically varies from the mean after many, many trials.

Continuous Random Variables
· continuous random variable: can take any value in an interval (e.g. foot length, salary, height, time, etc.).
· Probabilities are areas under a density curve:
  - Uniform density curve: the height is the same over the interval.
  - Normal density curve: used when the mean and standard deviation are given (find z-scores, $z = \frac{x - \mu}{\sigma}$).

Transforming Random Variables
· Linear transformation: $Y = a + bX$
                SHAPE    CENTER                       VARIABILITY
  $Y = a + bX$  same     mean: $\mu_Y = a + b\mu_X$   s.d.: $\sigma_Y = |b|\sigma_X$

Combining Random Variables
· independent random variables: knowing the value of one variable does not change the probability distribution of the other variable (knowing X cannot help us predict the value of Y).
· Sum ($S = X + Y$): mean: $\mu_S = \mu_X + \mu_Y$; if X and Y are independent, variance: $\sigma_S^2 = \sigma_X^2 + \sigma_Y^2$ and s.d.: $\sigma_S = \sqrt{\sigma_X^2 + \sigma_Y^2}$.
· Difference ($D = X - Y$): mean: $\mu_D = \mu_X - \mu_Y$; if X and Y are independent, variance: $\sigma_D^2 = \sigma_X^2 + \sigma_Y^2$ and s.d.: $\sigma_D = \sqrt{\sigma_X^2 + \sigma_Y^2}$.
· Linear combination ($aX + bY$): mean: $a\mu_X + b\mu_Y$; if X and Y are independent, s.d.: $\sqrt{a^2\sigma_X^2 + b^2\sigma_Y^2}$.
· variance ($\sigma^2$): the square of the standard deviation.
· What do we do when combining Normal random variables?
  ① Find the mean ($\mu_S$ or $\mu_D$) and s.d. ($\sigma_S$ or $\sigma_D$) of the sum or the difference of the random variables.
  ② Find the z-score, or use normalcdf to find the probability.

Binomial Random Variables
· binomial setting: when we perform n independent trials of the same random process and count the number of times that a particular outcome ("success") occurs. Check BINS:
  - Binary? "success" or "fail".
  - Independent? Knowing the outcome of one trial does not tell us anything about the outcome of other trials.
  - Number? Fixed number of trials, n.
  - Same probability? Same p of success on each trial.
· binomial random variable: the count of successes X in a binomial setting. The possible values of X are 0 to n.
· binomial distribution: specified by n and p. $P(X = x) = \binom{n}{x}(p)^x(1 - p)^{n - x}$
· mean (expected value): $\mu_X = E(X) = np$. The average value of a binomial random variable after many trials.
· standard deviation: $\sigma_X = \sqrt{np(1 - p)}$. How much the value of a binomial random variable typically varies from the mean.
· Calculator:
  P(X = x) -> binompdf(trials: n, p: p, x-value: x)
  P(X ≤ x) -> binomcdf(trials: n, p: p, x-value: x)
  P(X < x) -> binomcdf(trials: n, p: p, x-value: x - 1)
  P(X > x) -> 1 - binomcdf(trials: n, p: p, x-value: x)
  P(X ≥ x) -> 1 - binomcdf(trials: n, p: p, x-value: x - 1)
· 10% condition: n < 0.10N, where n = sample size; N = population size. When sampling without replacement the trials are not independent, so it is not exactly a binomial setting; the 10% condition helps us treat each individual trial as independent so we can proceed with the binomial calculations.
· Large Counts Condition (LCC): $np \ge 10$ and $n(1 - p) \ge 10$. When it is met, the distribution of X is approximately Normal.

Geometric Random Variables
· Geometric setting: when we perform independent trials of the same random process and record the number of trials until we get a success. Check BITS:
  - Binary? "success" or "fail".
  - Independent? Knowing the outcome of one trial does not tell us anything about the outcome of other trials.
  - Trials? Number of trials it took to get a success.
  - Same probability? Same p of success on each trial.
· Geometric random variable: the number of trials it takes to get a success. The x-values start with 1 and can be any whole number after that.
· Geometric distribution: specified by the probability of success p on any trial. $P(X = x) = (1 - p)^{x - 1}(p)$
· mean (expected value): $\mu_X = E(X) = \frac{1}{p}$
· standard deviation: $\sigma_X = \frac{\sqrt{1 - p}}{p}$
· Calculator:
  P(X = x) -> geometpdf(p: p, x-value: x)
  P(X ≤ x) -> geometcdf(p: p, x-value: x)
  P(X < x) -> geometcdf(p: p, x-value: x - 1)
  P(X > x) -> 1 - geometcdf(p: p, x-value: x)
  P(X ≥ x) -> 1 - geometcdf(p: p, x-value: x - 1)
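The binomial and geometric calculator commands can be mirrored in Python to check answers by hand. This is a sketch, not the calculator's own implementation: the function names are borrowed from the calculator commands in the notes, and the values n = 10, p = 0.5, and p = 0.2 are made up.

```python
from math import comb, sqrt

def binompdf(n, p, x):
    """P(X = x) for a binomial random variable."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binomcdf(n, p, x):
    """P(X <= x): add up P(X = 0) through P(X = x)."""
    return sum(binompdf(n, p, k) for k in range(0, x + 1))

def geometpdf(p, x):
    """P(X = x): first success on trial x (x starts at 1)."""
    return (1 - p)**(x - 1) * p

def geometcdf(p, x):
    """P(X <= x): at least one success in the first x trials."""
    return 1 - (1 - p)**x

n, p = 10, 0.5                       # hypothetical binomial setting
print(binompdf(n, p, 5))             # P(X = 5)
print(binomcdf(n, p, 5))             # P(X <= 5)
print(1 - binomcdf(n, p, 5 - 1))     # P(X >= 5)
print(n * p, sqrt(n * p * (1 - p)))  # mean and standard deviation

print(geometpdf(0.2, 3))             # P(X = 3) with p = 0.2
print(1 - geometcdf(0.2, 3))         # P(X > 3)
```

Note how "at least x" uses x - 1 inside binomcdf, the same adjustment the calculator lines in the notes make.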
INTERPRETATIONS IN CONTEXT:
· Probability: "There is a (probability/percentage) chance/probability of (context)."
· Mean: "If many, many (units) were randomly selected, the average (context) is about (μ + unit)."
· Standard Deviation: "If many, many (units) were randomly selected, the (context) typically varies by about (σ + unit) from the mean of (μ + unit)."
· Describing the SHAPE of a binomial distribution when the sample size is small (without applying the Large Counts Condition): p < 0.5, shape is right-skewed; p = 0.5, shape is symmetric; p > 0.5, shape is left-skewed. When the Large Counts Condition is met, the shape is approximately Normal.
· Shape of the geometric distribution: always right-skewed.
· Describe Random Variables (Discrete, Continuous, Binomial or Geometric): SHAPE, CENTER & VARIABILITY.

CHAPTER 7: SAMPLING DISTRIBUTIONS

DEFINITIONS:
· Parameter (p, μ, σ): number that describes some char. of the POPULATION.
· Statistic ($\hat{p}$, $\bar{x}$, $s_x$): number that describes some char. of a SAMPLE.
· Population Distribution: values of the variable for ALL individuals in the population.
· Distribution of a sample: values of the variable for the individuals in a sample.
· Sampling Distribution: values of the statistic in ALL POSSIBLE samples of the same size from the same population.
  * Increasing the sample size decreases the variability of the sampling distribution.
· unbiased estimator: the center (mean) of the sampling distribution ($\mu_{\hat{p}}$ or $\mu_{\bar{x}}$) is equal to the true value of the parameter (p or μ).
· Central limit theorem (CLT): if the population distribution is not Normal, but the sample size is large enough ($n \ge 30$), the sampling distribution of $\bar{x}$ is approx. Normal.

CLAIMS:
① Assume that the claim is true.
② Make a simulation of the sampling distribution.
③ From the simulation, count the number of dots out of the total number of dots (proportion of dots).
IF:
  * proportion of dots < 5%, there is convincing evidence based on the question.
  * proportion of dots ≥ 5%, there is no convincing evidence based on the question.

FORMULAS:
                         PROPORTION                                        MEAN
Mean                     $\mu_{\hat{p}} = p$                               $\mu_{\bar{x}} = \mu$
Standard Deviation       $\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$    $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
10% condition            $n < 0.10N$                                       $n < 0.10N$
Shape                    Approx. Normal by Large Counts Condition (LCC):   NORMAL if the POPULATION is NORMAL;
                         $np \ge 10$ and $n(1 - p) \ge 10$                 if skewed: Approx. Normal by CLT, $n \ge 30$
z-score                  $z = \frac{\hat{p} - p}{\sigma_{\hat{p}}}$        $z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$
* Proportion: proportions/fractions/percents. Mean: averages/means.
* Use normalcdf(lower, upper, μ: 0, σ: 1) to find the probability that a value is less than, greater than, or in between two values (use two z-scores).
* On questions, the values of $\hat{p}$ and $\bar{x}$ are usually found with the words "less than" or "greater than."

Given two statistics:
                         DIFFERENCE IN PROPORTIONS                                                          DIFFERENCE IN MEANS
Mean                     $\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$                                          $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$
Standard Deviation       $\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$   $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
Shape                    Approx. Normal by LCC: $n_1p_1 \ge 10$, $n_1(1 - p_1) \ge 10$,                     NORMAL if BOTH populations are NORMAL;
                         $n_2p_2 \ge 10$, $n_2(1 - p_2) \ge 10$                                             Approximately Normal by CLT: $n_1 \ge 30$, $n_2 \ge 30$
z-score                  $z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sigma_{\hat{p}_1 - \hat{p}_2}}$  $z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}}$
* When you need the probability that one statistic is less than or greater than the other, work with their difference: $\bar{x}_1 > \bar{x}_2$ means $\bar{x}_1 - \bar{x}_2 > 0$, and $\bar{x}_1 < \bar{x}_2$ means $\bar{x}_1 - \bar{x}_2 < 0$.

10% CONDITION:
· What is sampling without replacement? It's when we take out one card from a standard deck of cards and do not put it back (no replacement). The probability of an event then changes because the deck has one card less. Meaning that when an event occurs, the probability of the next event changes, so the trials are no longer INDEPENDENT.
· "Assuming that (sample size, n) is less than 10% of (population in context), the 10% condition is met."
· The "wording" of the question helps us identify if we need to do the 10% condition. When you see a statement like "these (subjects)," the 10% condition doesn't need to be shown, since we're only going to generalize to the given sample, not the population.

INTERPRETATIONS (Standard Deviation):
· one statistic: "In SRSs of size (sample size, n), the sample (proportion/mean) of (subject in context) typically varies by about (SD + unit) from the true (proportion/mean) of (p/μ + unit)."
· two statistics: "The difference in the sample (proportions/means) of (subject in context) typically varies by about (SD + unit) from the true difference of ($p_1 - p_2$ / $\mu_1 - \mu_2$ + unit)."

[Sketches: Normal curves with p/μ at the center and shaded areas for lower, upper, and between two z-scores.]

CHAPTER 8: ESTIMATING PROPORTIONS WITH CONFIDENCE

vocabulary:
· point estimator: a chosen statistic ($\hat{p}$, $\bar{x}$, $s_x$) that will provide a reasonable estimate of the unknown population parameter (p, μ, σ).
· Confidence Interval (CI): gives an interval of plausible (believable) values for a parameter based on sample data.
· confidence level (C): success rate (capture rate) of the method that produces the interval.
· margin of error: only accounts for sampling variability, not other sources of error like nonresponse, undercoverage, or response bias.

Formulas; reminders:
· critical value ($z^*$): invNorm(area, μ: 0, σ: 1), where the area is based on the confidence level.
  80%: $z^*$ = 1.282    90%: $z^*$ = 1.645    95%: $z^*$ = 1.960    99%: $z^*$ = 2.576
· Given the interval (A, B), where A is the lower value and B is the upper value:
  point estimate = $\frac{A + B}{2}$        margin of error = $\frac{B - A}{2}$
· sample size: solve $z^*\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \le ME$ for n (when $\hat{p}$ is unknown, use $\hat{p} = 0.5$). If n has decimals, round up.
· margin of error: decreases (narrower interval) with a larger sample size or a lower confidence level; a higher confidence level gives a wider interval.
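The reminders about critical values, intervals, and sample size can be checked with a short Python sketch. The function names are hypothetical, and the interval (0.52, 0.58) and the target margin of error 0.03 are made-up numbers; `NormalDist().inv_cdf` stands in for the calculator's invNorm.

```python
from math import ceil
from statistics import NormalDist

def critical_value(conf_level):
    """z* for a confidence level such as 0.95 (like invNorm with mean 0, SD 1)."""
    tail = (1 - conf_level) / 2            # area in each tail
    return NormalDist(0, 1).inv_cdf(1 - tail)

def estimate_and_margin(a, b):
    """Point estimate and margin of error from the interval (A, B)."""
    return (a + b) / 2, (b - a) / 2

def sample_size(margin, conf_level, p_hat=0.5):
    """Smallest n with z* * sqrt(p_hat(1 - p_hat)/n) <= margin (always round up)."""
    z = critical_value(conf_level)
    return ceil(z**2 * p_hat * (1 - p_hat) / margin**2)

for level in (0.80, 0.90, 0.95, 0.99):
    print(level, round(critical_value(level), 3))

print(estimate_and_margin(0.52, 0.58))   # hypothetical interval
print(sample_size(0.03, 0.95))           # p-hat unknown, so 0.5 is used
```

Using p-hat = 0.5 when it is unknown gives the largest possible value of p-hat(1 - p-hat), so the resulting sample size is big enough no matter what the true proportion turns out to be.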
Conditions (two-sample z interval for $p_1 - p_2$):
- Random: independent random samples of ($n_1$ + context) and ($n_2$ + context), so we can generalize to the population.
- 10%: $n_1 \le 0.10N_1$ and $n_2 \le 0.10N_2$, so sampling without replacement is OK for the SD formula.
- Large Counts: $n_1\hat{p}_1 \ge 10$, $n_1(1-\hat{p}_1) \ge 10$, $n_2\hat{p}_2 \ge 10$, $n_2(1-\hat{p}_2) \ge 10$, so the sampling distribution of $\hat{p}_1 - \hat{p}_2$ is approx. Normal.

DO: calculations (two options; Option 1 is recommended)
- With the formula (be careful with the formula): point estimate (statistic) ± margin of error
  $(\hat{p}_1 - \hat{p}_2) \pm z^*\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
- On the calculator: STAT > TESTS > B: 2-PropZInt, with x1: $n_1\hat{p}_1$, n1: $n_1$, x2: $n_2\hat{p}_2$, n2: $n_2$, C-Level: C
- Interval: (A, B)

CONCLUDE: interpretation of the interval (in context, in sentences)
- "We are C% confident that the interval from A to B captures the true difference in [parameter in context]."

Interpretation of the confidence level:
- "If we take many, many samples of size n and calculate a C% confidence interval for each, about C% of them will capture the [parameter in context]."

ONE-SAMPLE z INTERVAL FOR p
DO: calculations (two options; Option 1 is recommended)
- With the formula (be careful with the formula): point estimate (statistic) ± margin of error $= \hat{p} \pm z^*(SE_{\hat{p}})$, where the standard error of the sample proportion is $SE_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
- On the calculator: STAT > TESTS > A: 1-PropZInt, with x: $n\hat{p}$, n: n, C-Level: C
- Interval: (A, B)

Decision making with the margin of error (ME):
- Larger sample size -> smaller ME (narrower interval).
- Higher confidence level -> larger ME (wider interval).

True parameter: p = true proportion, μ = true mean of the [parameter in context].
* If the interval for $p_1 - p_2$ contains 0, there is not convincing evidence of a difference.
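The interval formulas (point estimate ± z* × standard error) can be sketched in Python to double-check calculator output. This is a minimal sketch with hypothetical function names, assuming the Random, 10%, and Large Counts conditions have already been checked by hand.

```python
from math import sqrt
from statistics import NormalDist


def z_star(conf_level):
    """Critical value z* for a confidence level such as 0.95 (about 1.960)."""
    return NormalDist().inv_cdf(1 - (1 - conf_level) / 2)


def one_prop_z_interval(x, n, conf_level):
    """One-sample z interval for p: p_hat +/- z* * sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = x / n
    margin = z_star(conf_level) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def two_prop_z_interval(x1, n1, x2, n2, conf_level):
    """Two-sample z interval for p1 - p2 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = z_star(conf_level) * se
    return (p1 - p2) - margin, (p1 - p2) + margin


# Hypothetical data: 50 successes in 100 trials, 95% confidence
low, high = one_prop_z_interval(50, 100, 0.95)
```

The pair `(low, high)` corresponds to the interval (A, B) that the notes ask you to report and interpret in context.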
ONE-SAMPLE z INTERVAL FOR p (continued)
STATE: parameter & confidence level. C% confidence interval for p = true proportion of [context].
PLAN: inference method & conditions. Method: One-sample z interval for p.
- Random: random sample of (n + context), so we can generalize to the population.
- 10%: $n \le 0.10N$, so sampling without replacement is OK for the SD formula.
- Large Counts: $n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$, so the sampling distribution of $\hat{p}$ is approx. Normal.
* If the problem says "... like these," do not show the 10% condition. The same applies when sampling with replacement.
* On the calculator, x = $n\hat{p}$ must be a whole number, so round it.

Some reminders:
- Critical values: 80%: z* = 1.282; 90%: z* = 1.645; 95%: z* = 1.960
- Critical value (z*): invNorm(area: (1 - C)/2, μ: 0, σ: 1), then use the positive value.
- Sample size: when $\hat{p}$ is unknown, use $\hat{p} = 0.5$, and always round n up.
- Margin of error: only accounts for sampling variability, not for sources of error like bias (nonresponse, undercoverage, response bias).

Vocabulary:
- population parameter: unknown (p, μ, σ)
- confidence interval: an interval of plausible (believable) values FOR the parameter (p, μ, σ)
- confidence level (C%): success rate of the method
* When the question says to construct and interpret a C% confidence interval, do the FOUR-STEP PROCESS: STATE, PLAN, DO, CONCLUDE.

Is there convincing evidence of a difference in the proportions? Look at the signs of the interval for $p_1 - p_2$:
- (+, +) -> 1st proportion is greater
- (-, -) -> 2nd proportion is greater
- (-, +) -> no convincing evidence of a difference b/c the interval contains 0

CHAPTER 9: TESTING CLAIMS ABOUT PROPORTIONS

* When the question asks "Is there convincing evidence...?", do the FOUR-STEP PROCESS: STATE, PLAN, DO, CONCLUDE.

Vocabulary:
- significance test: procedure for using observed data to decide between two competing claims (hypotheses).
- null hypothesis ($H_0$): the claim we look for evidence against. $H_0$: parameter = null value.
ONE-SAMPLE z TEST FOR p
Conditions:
- Random: random sample of (n + context), so we can generalize to the population.
- 10% condition: $n \le 0.10N$, so sampling without replacement is OK for the SD formula.
- Large Counts: $np_0 \ge 10$ and $n(1-p_0) \ge 10$, so the sampling distribution of $\hat{p}$ is approx. Normal. * Use $p_0$, do NOT use $\hat{p}$ here!!!
* If the problem says "... like these," do not show the 10% condition. The same applies when sampling with replacement.

DO: calculations (two options; Option 1 is recommended)
- With the formula (be careful with the formula):
  standardized test statistic $z = \frac{\text{statistic} - \text{parameter}}{\text{SD}} = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$
  P-value = normalcdf(lower: , upper: , μ: 0, σ: 1)
- On the calculator: STAT > TESTS > 5: 1-PropZTest, with $p_0$: $p_0$, x: $n\hat{p}$, n: n, prop: $\ne p_0$ (two-sided) or $< p_0$ / $> p_0$ (one-sided)
- Report: z = , P-value =

CONCLUDE: interpretation (in context, in sentences)
- "Because the P-value < α, we REJECT $H_0$. There is convincing evidence for $H_a$ (in context)."
- "Because the P-value > α, we FAIL TO REJECT $H_0$. There is not convincing evidence for $H_a$ (in context)."

Significance level (α): the value we compare the P-value to.
- Possible values: 0.01, 0.05, 0.10
* IF α is not given, use α = 0.05.

Type I Error: we reject $H_0$ when $H_0$ is true; the test gives convincing evidence for $H_a$ (in context) when $H_0$ is true.
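The one-sample z test calculation (standardized test statistic, then a Normal tail area) can be sketched in Python as follows. This is an illustrative sketch only: the function name and the `alternative` labels are hypothetical, and it assumes the conditions have been checked using $p_0$.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_test(x, n, p0, alternative):
    """Return (z, p_value) for H0: p = p0.

    alternative is "less", "greater", or "two-sided" (labels chosen for this sketch).
    """
    p_hat = x / n
    sd = sqrt(p0 * (1 - p0) / n)        # uses p0, not p_hat
    z = (p_hat - p0) / sd
    standard_normal = NormalDist(0, 1)
    if alternative == "less":
        p_value = standard_normal.cdf(z)
    elif alternative == "greater":
        p_value = 1 - standard_normal.cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - standard_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")
    return z, p_value


# Hypothetical data: 60 successes in 100 trials, H0: p = 0.5, Ha: p > 0.5
z, p_value = one_prop_z_test(60, 100, 0.5, "greater")
reject_h0 = p_value < 0.05  # compare the P-value to alpha
```

Here `reject_h0` only mirrors the decision rule; the CONCLUDE sentence still has to be written in context.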
ONE-SAMPLE z TEST FOR p (continued)
STATE: hypotheses, α, parameter
- "We want to test $H_0: p = p_0$ versus $H_a: p\ (<, >, \ne)\ p_0$ using α = ___, where p = true proportion of ALL (context)."
* Do not forget the word "ALL".
PLAN: inference method & conditions. Method: One-sample z test for p.

Vocabulary:
- alternative hypothesis ($H_a$): the claim we look for evidence FOR.
  - $H_a$: parameter < null value or $H_a$: parameter > null value (one-sided)
  - $H_a$: parameter ≠ null value (two-sided)
- P-value (probability): probability of getting evidence for $H_a$ as strong as or stronger than the observed evidence when $H_0$ is true.
- Type II Error: we fail to reject $H_0$ when $H_a$ is true; the test does not give convincing evidence for $H_a$ when $H_a$ is true.
- Power of a test: probability that the test will find convincing evidence for $H_a$ when a specific alternative value of the parameter is true.
- P(Type I Error) = α
- Power = 1 - P(Type II Error)
- Increasing the power: increase the sample size (n), increase α, or increase the distance between the null value and the alternative parameter value.

Truth table:
                      | H0 true          | Ha true
Reject H0             | Type I Error     | Correct (power)
Fail to reject H0     | Correct          | Type II Error

Interpretations:
- P-value: "Assuming $H_0$ is true (context), there is a (P-value) probability of getting a (statistic) or (less/greater) by chance alone."
- Power: P(Reject $H_0$ | $H_a$ true). "If $H_a$ is true (at a specific value in context), there is a (power) probability of finding convincing evidence to reject $H_0$ (context)."
- Type I Error: "The (context) $H_0$ is true, but we find convincing evidence for $H_a$ (context)."
- Type II Error: "The alternative hypothesis (context) is true, but we do not find convincing evidence for $H_a$ (context)."

TWO-SAMPLE z TEST FOR $p_1 - p_2$
STATE: hypotheses, α, parameter
- "We want to test $H_0: p_1 - p_2 = 0$ versus $H_a: p_1 - p_2\ (<, >, \ne)\ 0$ using α = ___, where $p_1$ = true proportion of (context 1), $p_2$ = true proportion of (context 2), and $p_1 - p_2$ = true difference (1 - 2 context)."
PLAN: inference method & conditions. Method: Two-sample z test for $p_1 - p_2$.
- Random: independent random samples of ($n_1$ + context) and ($n_2$ + context), so we can generalize to the population; OR random assignment of (context), so we can show causation.
- 10% condition: $n_1 \le 0.10N_1$ and $n_2 \le 0.10N_2$, so sampling without replacement is OK for the SD formula (not needed for random assignment).
- Large Counts: $n_1\hat{p}_c \ge 10$, $n_1(1-\hat{p}_c) \ge 10$, $n_2\hat{p}_c \ge 10$, $n_2(1-\hat{p}_c) \ge 10$, so the sampling distribution is approx. Normal, where $\hat{p}_c = \frac{x_1 + x_2}{n_1 + n_2}$.

DO: calculations (two options; Option 1 is recommended)
- With the formula (be careful with the formula):
  $z = \frac{\text{statistic} - \text{parameter}}{\text{SD}} = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\frac{\hat{p}_c(1-\hat{p}_c)}{n_1} + \frac{\hat{p}_c(1-\hat{p}_c)}{n_2}}}$
  P-value = normalcdf(lower: , upper: , μ: 0, σ: 1)
- On the calculator: STAT > TESTS > 6: 2-PropZTest, with x1: $n_1\hat{p}_1$, n1: $n_1$, x2: $n_2\hat{p}_2$, n2: $n_2$, p1: $\ne p_2$ (two-sided) or $< p_2$ / $> p_2$ (one-sided)
- Report: z = , P-value =

CONCLUDE: interpretation (in context, in sentences)
- "Because the P-value of (P-value) is less than (<) α = ___, we reject $H_0$. There is convincing evidence that the true difference in ... ($H_a$ in context)."
- "Because the P-value of (P-value) is greater than (>) α = ___, we fail to reject $H_0$. There is not convincing evidence that the true difference in ... ($H_a$ in context)."

CHAPTER 10: ESTIMATING MEANS WITH CONFIDENCE

Vocabulary:
- margin of error (ME): $t^*(SE_{\bar{x}})$. The formula needs σ, but we do not have it, so we use $s_x$; and since we are using $s_x$, we use t* instead of z*.
- standard error ($SE_{\bar{x}}$): $SE_{\bar{x}} = \frac{s_x}{\sqrt{n}}$
- degrees of freedom: think about this: if I had five students and five varieties of gummy bears, the first student has 5 choices, the next student has 4, and it goes down until the last student, who has only one to choose from.
- paired data: result from recording two values of the same quantitative variable for each individual OR for each pair of similar individuals. Inference is based on the differences ($\mu_{diff}$), not on the difference of the treatment means ($\mu_1 - \mu_2$).
* For confidence intervals for means, do the FOUR-STEP PROCESS: STATE, PLAN, DO, CONCLUDE.

ONE-SAMPLE t INTERVAL FOR μ
Conditions:
- Random: random sample of (n + context), so we can generalize to the population.
- 10%: $n \le 0.10N$, so sampling without replacement is OK for the SD formula. * If the problem says "... like these," do not show the 10% condition.
- Normal/Large Sample (only one of these needs to be satisfied): the population distribution is Normal; OR $n \ge 30$, so the sampling distribution is approx. Normal by the CLT; OR a graph of the given sample data doesn't show strong skewness or outliers.

DO: calculations (two options; Option 1 is recommended)
- List: $\bar{x}$ = , $s_x$ = , n = , df = n - 1 = , t* =
- With the formula (be careful with the formula): point estimate (statistic) ± margin of error $= \bar{x} \pm t^*(SE_{\bar{x}}) = \bar{x} \pm t^*\frac{s_x}{\sqrt{n}}$
- t* from the calculator: invT(area: (1 - C)/2, df: n - 1), then use the positive value; or use Table B.
- On the calculator: STAT > TESTS > 8: TInterval
- Interval: (A, B)

CONCLUDE: interpretation (in context, in sentences)
- "We are C% confident that the interval from A + unit to B + unit captures the true mean of [parameter in context]."

TWO-SAMPLE t INTERVAL FOR $\mu_1 - \mu_2$
PLAN: inference method & conditions. Method: Two-sample t interval for $\mu_1 - \mu_2$.
- Random: independent random samples of ($n_1$ + context) and ($n_2$ + context), OR a randomized experiment.
- 10%: $n_1 \le 0.10N_1$ and $n_2 \le 0.10N_2$ (both must hold; the condition fails if either one fails).
- Normal/Large Sample (for BOTH): BOTH populations are Normal; OR $n_1 \ge 30$ and $n_2 \ge 30$, so the sampling distribution is approx. Normal by the CLT; OR the graphs of both sets of sample data don't show strong skewness or outliers.

DO: calculations (two options; Option 1 is recommended)
- List: $\bar{x}_1$ = , $s_1$ = , $n_1$ = , $\bar{x}_2$ = , $s_2$ = , $n_2$ = , df = , t* =
- With the formula (be careful with the formula): point estimate ± margin of error $= (\bar{x}_1 - \bar{x}_2) \pm t^*\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
- df = n - 1, where n is the smaller sample size between the two; you can also use the df from the calculator.
- t* = invT(area, df) on the calculator, or Table B. * In Table B, if the df is not listed, always round down.
- On the calculator: STAT > TESTS > 0: 2-SampTInt
- Interval: (A, B)
ONE-SAMPLE t INTERVAL FOR μ: STATE & PLAN
STATE: parameter & confidence level. C% confidence interval for μ = true mean of [context].
PLAN: inference method & conditions. Method: One-sample t interval for μ.
- Random: random sample of (n + context), so we can generalize; OR random assignment in an experiment, so we can show cause.
- 10% condition: $n \le 0.10N$
- Normal/Large Sample: $n \ge 30$, so the sampling distribution is approx. Normal by the CLT.

TWO-SAMPLE t INTERVAL FOR $\mu_1 - \mu_2$: STATE
STATE: parameter & confidence level. C% confidence interval for $\mu_1 - \mu_2$ = true difference in the means of [context].

PAIRED DATA
- Paired data result from recording two values of the same quantitative variable for each individual or for each pair of similar individuals.
- Inference is based on the mean difference ($\mu_{diff}$, estimated by $\bar{x}_{diff}$), not on the difference of means ($\bar{x}_1 - \bar{x}_2$).
- Graph a dotplot OF the differences.

Some formulas and reminders:
- If you do not know the population "standard deviation" (σ) from the problem, replace it with $s_x$ and use t*, not z*.
- t* is the critical value for means. The t distribution replaces the standard Normal (z) distribution.
- Since we are given the sample SD ($s_x$), our sampling distribution will vary more, and so the t distribution will vary more than the standard Normal distribution.
- degrees of freedom: if I had five students and five varieties of gummy bears, I'll let the first student choose; they have 5 choices. The next student will have 4 choices, and it goes down until the last student, who has only one to choose from: no more "freedom" to choose. Thus, df = n - 1.
* Margin of error doesn't account for bias, only sampling variability.
* ME is larger with a higher confidence level, assuming everything else remains the same.
* ME is proportional to $\frac{1}{\sqrt{n}}$: to have half the ME, we need 4 times the sample size (n × 4).
PAIRED t INTERVAL FOR $\mu_{diff}$
* For paired data, the process is the same as a one-sample t interval for μ, except for the following:
- Use $\mu_{diff}$ instead of μ only (parameter: $\mu_{diff}$ = the true mean difference).
- Use $\bar{x}_{diff}$ instead of $\bar{x}$ only.
- Use $s_{diff}$ instead of $s_x$ only.
- Use $n_{diff}$ instead of n.
- The inference method is called: paired t interval for $\mu_{diff}$.
- When graphing, graph the differences, not the individual data of each set.

CONCLUDE: interpretation
- One-sample or paired: "We are C% confident that the interval from A + unit to B + unit captures the true mean (difference) of [parameter in context]."
- Two-sample: "We are C% confident that the interval from A + unit to B + unit captures the true difference in the means of [context] (1 - 2 context)."

Is there convincing evidence of a difference in the means? Look at the signs of the interval for $\mu_1 - \mu_2$:
- (+, +) -> Yes, convincing; $\mu_1 > \mu_2$
- (-, -) -> Yes, convincing; $\mu_1 < \mu_2$
- (-, +) -> no convincing evidence of a difference b/c the interval contains 0

CHAPTER 11: TESTING CLAIMS ABOUT MEANS

A few things to remember about the 10% condition:
- Apply the 10% condition when: sampling without replacement; random samples (so observations are not independent events). It is used for both means and proportions.
- Do not apply the 10% condition when: experiments (random assignment); samples "like these".
* Note: always read the problem carefully to see if it will apply!

TWO-SAMPLE t TEST FOR $\mu_1 - \mu_2$
STATE: hypotheses, α, parameter
- $H_a: \mu_1 - \mu_2 > 0$: use in STATE; in CONCLUDE use $\mu_1 > \mu_2$ in context.
- $H_a: \mu_1 - \mu_2 < 0$: use in STATE; in CONCLUDE use $\mu_1 < \mu_2$ in context.

P-value reminders:
* If it is two-sided, you need the two-sided tail probability: multiply the one-tail P-value by 2.
* If you see "E" at the end of the P-value, e.g. 1.24...E-7, it means $1.24 \times 10^{-7} \approx 0$.

CONCLUDE: interpretation
- "Because the P-value of (P-value) is less than (<) α = ___, we reject $H_0$. There is convincing evidence that the true mean ... ($H_a$ in context of mean + unit)."
- "Because the P-value of (P-value) is greater than (>) α = ___, we fail to reject $H_0$. There is not convincing evidence that the true mean ... ($H_a$ in context of mean + unit)."
TWO-SAMPLE t TEST FOR $\mu_1 - \mu_2$ (continued)
STATE: "We want to test $H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2\ (<, >, \ne)\ 0$ using α = ___, where $\mu_1 - \mu_2$ = true difference in the means of (context) (1 - 2 context)."
- $H_a: \mu_1 - \mu_2 \ne 0$: use in STATE; in CONCLUDE use $\mu_1 \ne \mu_2$ in context.
PLAN: inference method & conditions. Method: Two-sample t test for $\mu_1 - \mu_2$ (difference of means) OR paired t test for $\mu_{diff}$ (mean difference).
- Random: independent random samples (in context), OR random assignment/experiment (in context).
- 10% condition: $n_1 \le 0.10N_1$ and $n_2 \le 0.10N_2$ (not needed if the groups are randomly assigned/experiment).
- Normal/Large Sample: BOTH population distributions are Normal; OR $n_1 \ge 30$ and $n_2 \ge 30$, so BOTH sampling distributions are approx. Normal by the CLT; OR the graph of each sample shows no strong skewness or outliers.

DATA form or STATISTICS form:
- DATA form: the data are given (two lists of data). If the data are paired, subtract first, then find $\bar{x}_{diff}$ & $s_{diff}$.
- STATISTICS form: the mean and SD of each sample are given ($\bar{x}_1$, $s_{x_1}$, $\bar{x}_2$, $s_{x_2}$), or ($\bar{x}_{diff}$, $s_{diff}$) for paired data.

DO: calculations (two options; Option 1 is recommended)
- List: $\bar{x}_1$ = , $s_{x_1}$ = , $n_1$ = , $\bar{x}_2$ = , $s_{x_2}$ = , $n_2$ =
- On the calculator: STAT > TESTS > 4: 2-SampTTest. Report t = (calc), df = (calc), P-value = (calc).
- With the formula (non-calculator; be careful with the formula): test statistic $t = \frac{\text{statistic} - \text{parameter (null value)}}{\text{standard error}}$
  - two-sample t test for $\mu_1 - \mu_2$: $t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
  - paired t test for $\mu_{diff}$: $t = \frac{\bar{x}_{diff} - 0}{\frac{s_{diff}}{\sqrt{n_{diff}}}}$
  - df = n - 1, where n here is the smaller sample size (two-sample), or $n_{diff}$ (paired)
  - P-value = tcdf(lower: , upper: , df: ); use lower: -1000 or upper: 1000 for the open end.
* If it is TWO-SIDED (≠), multiply the P-value by 2.

CONCLUDE: interpretation
- "Because the P-value of (P-value) is less than (<) α = ___, we reject $H_0$. There is convincing evidence that ... ($H_a$ in context; use $\mu_1 < \mu_2$, $\mu_1 > \mu_2$, or $\mu_1 \ne \mu_2$ in context)."
- "Because the P-value of (P-value) is greater than (>) α = ___, we fail to reject $H_0$. There is not convincing evidence that ... ($H_a$ in context)."
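The two test-statistic formulas (two-sample and paired) can be sketched in Python. This is a hedged illustration, not the author's method: the function names are hypothetical, the two-sample df uses the conservative "smaller sample size minus 1" rule from the notes, and the P-value is left to tcdf on the calculator because Python's standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev


def two_sample_t_statistic(xbar1, s1, n1, xbar2, s2, n2):
    """t = ((xbar1 - xbar2) - 0) / sqrt(s1^2/n1 + s2^2/n2), with conservative df."""
    standard_error = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    t = (xbar1 - xbar2) / standard_error
    df = min(n1, n2) - 1          # smaller sample size minus 1
    return t, df


def paired_t_statistic(first, second):
    """Subtract first, then t = (xbar_diff - 0) / (s_diff / sqrt(n_diff))."""
    diffs = [a - b for a, b in zip(first, second)]
    n_diff = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n_diff))
    return t, n_diff - 1


# Hypothetical summary statistics and paired data
t_two, df_two = two_sample_t_statistic(10, 2, 16, 8, 2, 16)
t_paired, df_paired = paired_t_statistic([5, 7, 9], [4, 5, 6])
```

With `t` and `df` in hand, the P-value would come from tcdf(lower, upper, df), doubled for a two-sided alternative.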
ONE-SAMPLE t TEST FOR μ AND PAIRED t TEST FOR $\mu_{diff}$
STATE (two-sample reminder): "We want to test $H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2\ (<, >, \ne)\ 0$, where $\mu_1 - \mu_2$ = true difference in the means."
PLAN: inference method & conditions. Method: One-sample t test for μ, OR paired t test for $\mu_{diff}$ (paired data), OR two-sample t test for $\mu_1 - \mu_2$.
- Random: random sample of (n + context), so we can generalize to the population; OR random assignment (context) in experiments.
- 10% condition: $n \le 0.10N$ (not needed for assigned groups/experiments).
- Normal/Large Sample: the population distribution is Normal; OR $n \ge 30$, so the sampling distribution is approx. Normal by the CLT; OR the graph of the sample shows no strong skewness or outliers.

DO: calculations (two options; Option 1 is recommended)
- List: $\bar{x}$ = (or $\bar{x}_{diff}$ = ), $s_x$ = (or $s_{diff}$ = ), n =
- On the calculator: STAT > TESTS > 2: T-Test. Report t = (calc), df = n - 1, P-value = (calc).
- With the formula (be careful with the formula): test statistic $t = \frac{\text{statistic} - \text{parameter}}{\text{standard error}}$
  - one-sample: $t = \frac{\bar{x} - \mu_0}{\frac{s_x}{\sqrt{n}}}$
  - paired: $t = \frac{\bar{x}_{diff} - \mu_{diff}}{\frac{s_{diff}}{\sqrt{n}}}$, with $\mu_{diff} = 0$ under $H_0$
  - df = n - 1
  - P-value = tcdf(lower: , upper: , df: n - 1); use lower: -1000 or upper: 1000 for the open end.
* If it is TWO-SIDED (≠), multiply the P-value by 2 (×2).
* If you see "E-#" in the P-value (e.g., 1.24E-6), write the whole thing or write P-value ≈ 0 (< 0.00001); it's still below α. Check whether the test is one-sided or two-sided.

CONNECTION BETWEEN CI & ST (confidence intervals and significance tests):
- two-sided: α = 1 - CL. α = 0.01 <-> CL: 99%; α = 0.05 <-> CL: 95%; α = 0.10 <-> CL: 90%
- one-sided: you need CL = 1 - 2α (e.g., α = 0.05 <-> CL: 90%)
- If the interval does NOT contain the $H_0$ (null) value, the null value is not a plausible value, so we reject $H_0$: there is convincing evidence for $H_a$ (in context).
- If the interval contains the $H_0$ (null) value, it is a plausible value, so we fail to reject $H_0$: there is not convincing evidence for $H_a$ (in context).
* The interval and the test give the same decision.

CONCLUDE: interpretation
- "Because the P-value of (P-value) is less than (<) α = ___, we reject $H_0$. There is convincing evidence that the true mean ... ($H_a$ in context)."
- "Because the P-value of (P-value) is greater than (>) α = ___, we fail to reject $H_0$. There is not convincing evidence that the true mean ... ($H_a$ in context)."
STATE (one-sample and paired tests): hypotheses, α, parameter
- One-sample: "We want to test $H_0: \mu = \mu_0$ + unit versus $H_a: \mu\ (<, >, \ne)\ \mu_0$ + unit using α = ___, where μ = true mean of (context)."
- Paired: "We want to test $H_0: \mu_{diff} = 0$ versus $H_a: \mu_{diff}\ (<, >, \ne)\ 0$ using α = ___, where $\mu_{diff}$ = true mean difference of (context)."
Reminders:
* Connection between CI & ST: IF the interval contains the null value, we fail to reject $H_0$.
* If the problem is based on volunteers or subjects "like these," the 10% condition does not need to be shown.

CHAPTER 12: INFERENCE FOR DISTRIBUTIONS & RELATIONSHIPS

* Do not always look at the sample to know which test to use. Always identify the chi-square test from the wording of the question.

CHI-SQUARE TEST FOR HOMOGENEITY
- Two (or more) samples, one variable; e.g. "2 Brands of Gummy Bears", "color".
- The question might state: "Do we have convincing evidence of a difference in the distribution of...?"
STATE: hypotheses & significance level
- $H_0$: There is no difference in the distribution of (categorical variable) for population 1 and population 2.
- $H_a$: There is a difference in the distribution of (categorical variable) for population 1 and population 2.
PLAN: inference method & conditions. Method: Chi-square test for homogeneity.
- Random: randomly selected samples of (n + context), or a randomized experiment.
- 10%: $n \le 0.10N$ (when sampling without replacement; not needed for randomized experiments).
- Large Counts: all expected counts are ≥ 5 (make a table of expected counts).
- Expected counts $= \frac{(\text{row total})(\text{column total})}{\text{table total}} = \frac{(RT)(CT)}{TT}$
DO:
- $\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$
- df = (R - 1)(C - 1), where R = number of rows and C = number of columns (do not include totals)
- P-value = χ²cdf(lower: $\chi^2$, upper: 1000, df: )
- On the calculator: 2nd > MATRIX > EDIT > [A]: enter the observed counts (do not include totals). Then STAT > TESTS > C: χ²-Test. The expected counts are stored in [B] automatically; do not edit [B].
CONCLUDE:
- "Because the P-value of (P-value) < α = ___, we reject $H_0$. There is convincing evidence that ($H_a$ in context)."
- "Because the P-value of (P-value) > α = ___, we fail to reject $H_0$. There is not convincing evidence that ($H_a$ in context)."
CHI-SQUARE TEST FOR INDEPENDENCE
- One sample, two variables; e.g. "Seniors": "Taco Tongue" and "Evil Eyebrow".
- The question might state: "Do we have convincing evidence of an association...?"
STATE: hypotheses & significance level
- $H_0$: There is no association between (variable 1) and (variable 2) in (population).
- $H_a$: There is an association between (variable 1) and (variable 2) in (population).
PLAN: inference method & conditions. Method: Chi-square test for independence.
- Random: randomly selected sample of (n + context), so we can generalize to the population; OR randomized experiment, so we can infer causation.
- 10% condition: $n \le 0.10N$, so sampling without replacement is OK (this rule applies the same way as in the previous tests; not needed for randomized experiments).
- Large Counts: all expected counts are ≥ 5, so the sampling distribution is approximately chi-square. Make a table of expected counts.
- Expected counts $= \frac{(RT)(CT)}{TT}$, where RT = row total, CT = column total, TT = table total.
DO:
- $\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$
- df = (R - 1)(C - 1)
- P-value = χ²cdf(lower: $\chi^2$, upper: 1000, df: )
- On the calculator: 2nd > MATRIX > EDIT > [A]: observed counts (do not include the row or column totals), with R = number of rows and C = number of columns. Then STAT > TESTS > C: χ²-Test. Expected counts go to [B] automatically; do not edit [B]. P-value from the calculator.
CONCLUDE:
- "Because the P-value of (P-value) < α = ___, we reject $H_0$. There is convincing evidence that ($H_a$ in context)."
- "Because the P-value of (P-value) > α = ___, we fail to reject $H_0$. There is not convincing evidence that ($H_a$ in context)."

CHI-SQUARE GOF TEST (calculator reminders)
- df = k - 1, where k is the number of categories of your categorical variable.
- Enter the observed counts in L1 and the expected counts in L2 (list them).
- STAT > TESTS > D: χ²GOF-Test; P-value from the calculator, or P-value = χ²cdf(lower: $\chi^2$, upper: 1000, df: k - 1).
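The expected-count rule (row total × column total ÷ table total) and the chi-square statistic for a two-way table can be sketched in Python. This is a minimal sketch with hypothetical function names and made-up counts; it stops at the statistic and df, since the P-value would still come from χ²cdf on the calculator.

```python
def expected_counts(observed):
    """Expected count for each cell: (row total)(column total) / (table total)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    return [[rt * ct / table_total for ct in col_totals] for rt in row_totals]


def chi_square_statistic(observed):
    """Return (chi_square, df, large_counts_ok) for a two-way table without totals."""
    expected = expected_counts(observed)
    chi_square = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)   # (R - 1)(C - 1)
    large_counts_ok = all(e >= 5 for row in expected for e in row)
    return chi_square, df, large_counts_ok


# Hypothetical observed counts (rows and columns only, no totals)
table = [[10, 20], [20, 10]]
chi_sq, df, ok = chi_square_statistic(table)
```

Each term `(o - e) ** 2 / e` is one component (contribution), which is what a follow-up analysis would compare across cells.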
CHI-SQUARE TEST FOR GOODNESS OF FIT (GOF)
- One sample, one variable; e.g. "M&M Bag", "color".
- The question might state: "Do the data give convincing evidence about the distribution of...?" or "perform a test for..."
* Always identify which chi-square test to use. Look at the wording of the question to know which one applies.
STATE: hypotheses & significance level
- $H_0$: The claimed distribution of (context) is true (correct). * This can be changed to wording like "equally likely," etc.
- $H_a$: The claimed distribution of (context) is not true (correct).
PLAN: inference method & conditions. Method: Chi-square test for goodness of fit.
- Random: randomly selected sample of (n + context), so we can generalize to the population; OR randomized experiment, so we can infer causation.
- 10% condition: $n \le 0.10N$, so sampling without replacement is OK (this rule applies the same way as in the previous tests).
- Large Counts: all expected counts are ≥ 5.
- Expected counts: $E_i = n \cdot p_i$, where n = sample size and $p_i$ = claimed proportion for each individual category.
DO:
- $\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$
- df = k - 1
- P-value from the calculator.
* The sampling distribution is not approximately Normal; it should be approximately chi-square. The chi-square distribution has non-negative values and is right-skewed.
* Follow-up analysis: if the test is statistically significant, it's possible to find the largest component (contribution) $\frac{(O_i - E_i)^2}{E_i}$, compare its observed count to its expected count, and explain.
CONCLUDE:
- "Because the P-value of (P-value) < α = ___, we reject $H_0$. There is convincing evidence that ($H_a$ in context)."
- "Because the P-value of (P-value) > α = ___, we fail to reject $H_0$. There is not convincing evidence that ($H_a$ in context)."

CHAPTER 12: INFERENCE FOR DISTRIBUTIONS & RELATIONSHIPS (continued): INFERENCE FOR SLOPE

Sampling distribution of the slope (b):
- Population regression line: $\mu_y = \alpha + \beta x$, where α = y-intercept, β = slope, and σ = SD of the residuals.
- SHAPE: approximately Normal
- CENTER: $\mu_b = \beta$
- VARIABILITY: $\sigma_b = \frac{\sigma}{\sigma_x\sqrt{n}}$, where $\sigma_x$ is the SD of the x-variable.

Conditions:
* LINEAR: the scatterplot shows a linear relationship. Also, the residual plot needs to show no leftover curved pattern.
L.I.N.E.R. conditions (continued):
* INDEPENDENT: each observation is independent of each other. When sampling without replacement, check the 10% condition.
* NORMAL: a dotplot of the residuals cannot show strong skew or outliers.
* EQUAL SD: the residual plot shows roughly equal variability for each x-value.
* RANDOM: the data came from a random sample or a randomized experiment.

Symbols:
              | Parameter | Statistic
slope         | β         | b
y-intercept   | α         | a
SD            | σ         | s

Computer output:
Predictor   | Coef | SE Coef | T | P
Constant    | a    | SE_a    |   |
x-variable  | b    | SE_b    | t | P-value (two-sided test)
S = s (SD of residuals), R-sq = r²

Some formulas:
- SD OF RESIDUALS: $s = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$
- STANDARD ERROR OF THE SLOPE: $SE_b = \frac{s}{s_x\sqrt{n - 1}}$
- degrees of freedom: df = n - 2

t INTERVAL FOR THE SLOPE
STATE: parameter & confidence level. C% confidence interval for β = true slope of the population LSRL relating [y-context] to [x-context].
PLAN: inference method & conditions. Method: One-sample t interval for slope. Conditions: Linear, Independent, Normal, Equal SD, Random.
DO: calculator or formulas
- Calculator: L1: x-values; L2: y-values. STAT > TESTS > G: LinRegTInt.
- Formula: $b \pm t^*(SE_b)$, with df = n - 2.
- Use this format if you were given computer output: b = , $SE_b$ = , t* = , df = n - 2 = . * t* is not in the output.
CONCLUDE: interpretation, always in context
- "We are C% confident that the interval from A to B captures the true slope of the population LSRL relating [y-context] to [x-context]."

t TEST FOR THE SLOPE
STATE: hypotheses for the slope
- $H_0: \beta = 0$ (no linear relationship)
- $H_a: \beta > 0$ (positive relationship), $H_a: \beta < 0$ (negative relationship), or $H_a: \beta \ne 0$ (some relationship)
PLAN: inference method & conditions. Method: One-sample t test for slope.
DO: calculator or formulas
- Calculator: L1: x-values; L2: y-values. STAT > TESTS > F: LinRegTTest.
- Formula: $t = \frac{b - 0}{SE_b}$, df = n - 2, P-value = tcdf(lower: , upper: , df: ).
- Use this format if you were given computer output: b = , $SE_b$ = , t = , df = n - 2 = , P-value = . * This t is also the T from the output, and the P in the output is for a two-sided test.
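When computer output is given, the slope calculations reduce to arithmetic on b and SE_b, which can be sketched in Python. The function names and numbers below are hypothetical; t* is treated as an input (from invT or Table B) because Python's standard library has no t distribution.

```python
def slope_t_interval(b, se_b, t_star):
    """Confidence interval for the slope: b +/- t* * SE_b."""
    margin = t_star * se_b
    return b - margin, b + margin


def slope_t_statistic(b, se_b, n):
    """Test statistic for H0: beta = 0, with df = n - 2."""
    t = (b - 0) / se_b
    df = n - 2
    return t, df


# Hypothetical computer output: b = 2.0, SE_b = 0.5, n = 20, and t* = 2.0 supplied by hand
interval = slope_t_interval(2.0, 0.5, 2.0)
t, df = slope_t_statistic(2.0, 0.5, 20)
```

The P-value would then come from tcdf with this `t` and `df`, doubled for a two-sided alternative.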
INFERENCE FOR SLOPE (continued)

Interpretations:
- Standard error of the slope ($SE_b$): "The slope of the sample LSRL typically varies by $SE_b$ from the true slope of the population LSRL relating [y-context] to [x-context]."
- SD of residuals (s): "The actual (y-context) typically varies by s from the value predicted by the LSRL using (x-var context)."
- R-sq ($r^2$) and R-sq(adj): see the residual and $r^2$ notes in Chapter 3.

Confidence interval for the slope:
STATE: parameter & confidence level (C%).
* The margin of error is $t^*(SE_b)$.

Significance test for the slope:
PLAN: inference method & conditions. Method: One-sample t test for slope. Conditions: Linear, Independent, Normal, Equal SD, Random.
DO:
- standardized test statistic: $t = \frac{b - 0}{SE_b}$
- P-value = tcdf(lower: t, upper: 1000, df: n - 2) or tcdf(lower: -1000, upper: t, df: n - 2)
* If it is TWO-SIDED (≠), multiply the P-value by 2.
* If the P-value has "E-#" at the end (e.g., E-6), write the whole thing; it's still below α.
CONCLUDE: interpretation, always in context
- "Because the P-value of (P-value) is less than (<) α = ___, we reject $H_0$. There is convincing evidence of a ($H_a$) relationship between [x-context] and [y-context]."
- "Because the P-value of (P-value) is greater than (>) α = ___, we fail to reject $H_0$. There is not convincing evidence of a ($H_a$) relationship between [x-context] and [y-context]."

Congratulations! You have finished the AP Statistics Course!
- Mr. Jeremiah James dela Rosa
Thank you to Stats Medic, Luke Wilcox, and Lindsey Gallas!
