STATISTICS FOR INTRODUCTORY COURSES

STATISTICS - A set of tools for collecting, organizing, presenting, and analyzing numerical facts or observations.
1. Descriptive Statistics - procedures used to organize and present data in a convenient, usable, and communicable form.
2. Inferential Statistics - procedures employed to arrive at broader generalizations or inferences from sample data to populations.
STATISTIC - A number describing a sample characteristic. Results from the manipulation of sample data according to certain specified procedures.
DATA - Characteristics or numbers that are collected by observation.
POPULATION - A complete set of actual or potential observations.
PARAMETER - A number describing a population characteristic; typically inferred from a sample statistic.
SAMPLE - A subset of the population selected according to some scheme.
RANDOM SAMPLE - A subset selected in such a way that each member of the population has an equal opportunity to be selected. Ex. lottery numbers in a fair lottery.
VARIABLE - A phenomenon that may take on different values.

MEASURES OF CENTRAL TENDENCY
MEAN - The point in a distribution of measurements about which the summed deviations are equal to zero. Average value of a sample or population.
POPULATION MEAN: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
SAMPLE MEAN: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Note: The mean is very sensitive to extreme measurements that are not balanced on both sides.
WEIGHTED MEAN - Sum of a set of observations multiplied by their respective weights, divided by the sum of the weights:
$\bar{x}_w = \frac{\sum_{i=1}^{G} w_i x_i}{\sum_{i=1}^{G} w_i}$
where $w_i$ = weight, $x_i$ = observation, and $G$ = number of observation groups. Calculated from a population, sample, or groupings in a frequency distribution. Ex. In the Frequency Distribution below, the mean is 80.3, calculated by using frequencies for the $w_i$'s. When grouped, use class midpoints for the $x_i$'s.
MEDIAN - Observation or potential observation in a set that divides the set so that the same number of observations lie on each side of it. For an odd number of values, it is the middle value; for an even number, it is the average of the middle two. Ex. In the Frequency Distribution table below, the median is 79.5.
MODE - Observation that occurs with the greatest frequency. Ex. In the Frequency Distribution table below, the mode is 88.
SUM OF SQUARES (SS) - Deviations from the mean, squared and summed:
Population SS: $\sum (x_i - \mu)^2$ or $\sum x_i^2 - \frac{(\sum x_i)^2}{N}$
Sample SS: $\sum (x_i - \bar{x})^2$ or $\sum x_i^2 - \frac{(\sum x_i)^2}{n}$
VARIANCE - The average of squared differences between observations and their mean.
POPULATION VARIANCE: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$
SAMPLE VARIANCE: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
VARIANCES FOR GROUPED DATA
POPULATION: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{G} f_i (m_i - \mu)^2$
SAMPLE: $s^2 = \frac{1}{n-1}\sum_{i=1}^{G} f_i (m_i - \bar{x})^2$
where $f_i$ = frequency and $m_i$ = midpoint of the ith class.
STANDARD DEVIATION - Square root of the variance. Ex. Pop. S.D.: $\sigma = \sqrt{\sigma^2}$

GROUPING OF DATA
FREQUENCY DISTRIBUTION - Shows the number of times each observation occurs when the values of a variable are arranged in order according to their magnitudes.
[Table: frequency distribution of x = 65 to 100, with tallies and frequencies f]
CUMULATIVE FREQUENCY DISTRIBUTION - A distribution which shows the total frequency through the upper real limit of each class.
CUMULATIVE PERCENTAGE DISTRIBUTION - A distribution which shows the total percentage through the upper real limit of each class.
BAR GRAPH - A form of graph that uses bars to indicate the frequency of occurrence of observations.
• Histogram - a form of bar graph used with interval or ratio-scaled variables.
• Interval Scale - a quantitative scale that permits the use of arithmetic operations. The zero point in the scale is arbitrary.
GROUPED FREQUENCY DISTRIBUTION - A frequency distribution in which the values of the variable have been grouped into classes.
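The measures of central tendency and dispersion defined above can be sketched in a few lines of Python. This is a minimal illustration, not part of the chart: the function names and the sample values are hypothetical, chosen only to show that the deviation form and the raw-score form of the sum of squares agree.

```python
import math

def mean(xs):
    """Arithmetic mean: the point about which the deviations sum to zero."""
    return sum(xs) / len(xs)

def weighted_mean(xs, ws):
    """Sum of observations times their weights, divided by the sum of the weights."""
    return sum(w * x for x, w in zip(xs, ws)) / sum(ws)

def sum_of_squares(xs):
    """SS, deviation form: squared deviations from the mean, summed."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

def sum_of_squares_raw(xs):
    """SS, raw-score form: sum of x^2 minus (sum of x)^2 / n."""
    return sum(x * x for x in xs) - sum(xs) ** 2 / len(xs)

def variance(xs, sample=True):
    """Sample variance divides SS by n - 1; population variance divides by N."""
    divisor = len(xs) - 1 if sample else len(xs)
    return sum_of_squares(xs) / divisor

def std_dev(xs, sample=True):
    """Standard deviation: square root of the variance."""
    return math.sqrt(variance(xs, sample))

# Hypothetical sample used only for illustration.
scores = [43, 74, 42, 65]
print(mean(scores), sum_of_squares(scores), variance(scores), std_dev(scores))
```

For grouped data the same `weighted_mean` applies with class midpoints as the observations and class frequencies as the weights.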
Grouped frequency distribution, cumulative frequency (Cum f), and cumulative percentage (%):

CLASS     f    Cum f    %
65-67     3      3      4.84
68-70     8     11     17.74
71-73     5     16     25.81
74-76     9     25     40.32
77-79     6     31     50.00
80-82     4     35     56.45
83-85     8     43     69.35
86-88     8     51     82.26
89-91     6     57     91.94
92-94     1     58     93.55
95-97     2     60     96.77
98-100    2     62    100.00

• Ratio Scale - same as interval scale except that there is a true zero point.
FREQUENCY CURVE - A form of graph representing a frequency distribution in the form of a continuous line that traces a histogram.
• Cumulative Frequency Curve - a continuous line that traces a histogram where bars in all the lower classes are stacked up in the adjacent higher class. It cannot have a negative slope.
• Normal curve - bell-shaped curve.
• Skewed curve - departs from symmetry and tails off at one end.
[Figures: normal curve; skewed curve (left skew)]

PROBABILITY
Probability of occurrence of Event A: $p(A) = \frac{\text{Number of outcomes favoring Event A}}{\text{Total number of outcomes}}$
RANDOM VARIABLES - A mapping or function that assigns one and only one numerical value to each outcome in an experiment.
SAMPLE SPACE - All possible outcomes of an experiment.
TYPE OF EVENTS
• Exhaustive - two or more events are said to be exhaustive if all possible outcomes are considered. Symbolically, p(A or B or ...) = 1.
• Non-Exhaustive - two or more events are said to be non-exhaustive if they do not exhaust all possible outcomes.
• Mutually Exclusive - Events that cannot occur simultaneously: p(A and B) = 0; and p(A or B) = p(A) + p(B). Ex. males, females.
• Non-Mutually Exclusive - Events that can occur simultaneously: p(A or B) = p(A) + p(B) - p(A and B). Ex. males, brown eyes.
• Independent - Events whose probability is unaffected by occurrence or nonoccurrence of each other: p(A|B) = p(A); p(B|A) = p(B); and p(A and B) = p(A) p(B). Ex. gender and eye color.
• Dependent - Events whose probability changes depending upon the occurrence or non-occurrence of each other: p(A|B) differs from p(A); p(B|A) differs from p(B); and p(A and B) = p(A) p(B|A) = p(B) p(A|B). Ex. race and eye color.
JOINT PROBABILITIES - Probability that 2 or more events occur simultaneously.
MARGINAL PROBABILITIES or Unconditional Probabilities - summation of probabilities.
CONDITIONAL PROBABILITIES - Probability of A given the existence of S, written p(A|S).
EXAMPLE - Given the numbers 1 to 9 as observations in a sample space:
• Events mutually exclusive and exhaustive. Example: p(all odd numbers); p(all even numbers).
• Events mutually exclusive but not exhaustive. Example: p(an even number); p(the numbers 7 and 5).
• Events neither mutually exclusive nor exhaustive. Example: p(an even number or a 2).

DISCRETE RANDOM VARIABLES - Involves rules or probability models for assigning or generating only distinct values (not fractional measurements).
BINOMIAL DISTRIBUTION - A model for the sum of a series of n independent trials where each trial results in a 0 (failure) or 1 (success). Ex. Coin toss.
$p(s) = \binom{n}{s} \pi^s (1-\pi)^{n-s}$
where p(s) is the probability of s successes in n trials with a constant probability $\pi$ per trial, and where $\binom{n}{s} = \frac{n!}{s!(n-s)!}$
Binomial mean: $\mu = n\pi$
Binomial variance: $\sigma^2 = n\pi(1-\pi)$
As n increases, the Binomial approaches the Normal distribution.
HYPERGEOMETRIC DISTRIBUTION - A model for the sum of a series of n trials where each trial results in a 0 or 1 and is drawn from a small population with N elements split between $N_1$ successes and $N_2$ failures. Then the probability of splitting the n trials between $x_1$ successes and $x_2$ failures is:
$p(x_1 \text{ and } x_2) = \frac{\dfrac{N_1!}{x_1!(N_1-x_1)!} \cdot \dfrac{N_2!}{x_2!(N_2-x_2)!}}{\dfrac{N!}{n!(N-n)!}}$
Hypergeometric mean: $\mu = E(x_1) = \frac{nN_1}{N}$ and variance: $\sigma^2 = \left[\frac{N-n}{N-1}\right]\left[\frac{nN_1}{N}\right]\left[\frac{N_2}{N}\right]$
POISSON DISTRIBUTION - A model for the number of occurrences of an event x = 0, 1, 2, ..., when the probability of occurrence is small, but the number of opportunities for the occurrence is large:
$p(x) = \frac{\lambda^x e^{-\lambda}}{x!}$ for x = 0, 1, 2, 3, ... and $\lambda > 0$; otherwise p(x) = 0.

LEVEL OF SIGNIFICANCE - A probability value considered rare in the sampling distribution specified under the null hypothesis, where one is willing to acknowledge the operation of chance factors. Common significance levels are 1%, 5%, 10%. Alpha ($\alpha$) level: the lowest level for which the null hypothesis can be rejected. The significance level determines the critical region.
NULL HYPOTHESIS ($H_0$) - A statement that specifies hypothesized value(s) for one or more of the population parameters. [Ex. $H_0$: a coin is unbiased. That is, p = 0.5.]
ALTERNATIVE HYPOTHESIS ($H_1$) - A statement that specifies that the population parameter is some value other than the one specified under the null hypothesis. [Ex. $H_1$: a coin is biased. That is, p ≠ 0.5.]
1. NONDIRECTIONAL HYPOTHESIS - an alternative hypothesis ($H_1$) that states only that the population parameter is different from the one specified under $H_0$. Ex. $H_1$: $\mu \ne \mu_0$. Two-Tailed Probability Value is employed when the alternative hypothesis is non-directional.
2. DIRECTIONAL HYPOTHESIS - an alternative hypothesis that states the direction in which the population parameter differs from the one specified under $H_0$. Ex. $H_1$: $\mu > \mu_0$ or $H_1$: $\mu < \mu_0$. One-Tailed Probability Value is employed when the alternative hypothesis is directional.
NOTION OF INDIRECT PROOF - Strict interpretation of hypothesis testing reveals that the null hypothesis can never be proved. [Ex. If we toss a coin 200 times and tails comes up 100 times, it is no guarantee that heads will come up exactly half the time in the long run; small discrepancies might exist. A bias can exist even at a small magnitude. We can make the assertion, however, that NO BASIS EXISTS FOR REJECTING THE HYPOTHESIS THAT THE COIN IS UNBIASED.
(The null hypothesis is not rejected.) When employing the 0.05 level of significance, reject the null hypothesis when a given result occurs by chance 5% of the time or less.]
TWO TYPES OF ERRORS
- Type I Error (Type α Error) = the rejection of $H_0$ when it is actually true. The probability of a type I error is given by $\alpha$.
- Type II Error (Type β Error) = the acceptance of $H_0$ when it is actually false. The probability of a type II error is given by $\beta$.

Poisson mean and variance: $\lambda$.
For continuous variables, frequencies are expressed in terms of areas under a curve.

SAMPLING DISTRIBUTION - A theoretical probability distribution of a statistic that would result from drawing all possible samples of a given size from some population.
THE STANDARD ERROR OF THE MEAN - A theoretical standard deviation of sample means of a given sample size, drawn from some specified population.
• When based on a very large, known population, the standard error is: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
• When estimated from a sample drawn from a very large population, the standard error is: $s_{\bar{x}} = \frac{s}{\sqrt{n}}$
• The dispersion of sample means decreases as sample size is increased.

CONTINUOUS RANDOM VARIABLES - Variables that may take on any value along an uninterrupted interval of a number line.
NORMAL DISTRIBUTION - bell curve; a distribution whose values cluster symmetrically around the mean (also median and mode).
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2 / 2\sigma^2}$
where f(x) = frequency at a given value
$\sigma$ = standard deviation of the distribution
$\pi$ = approximately 3.1416
e = approximately 2.7183
$\mu$ = the mean of the distribution
x = any score in the distribution
STANDARD NORMAL DISTRIBUTION - A normal random variable Z that has a mean of 0 and standard deviation of 1.
Z-VALUES - The number of standard deviations a specific observation lies from the mean: $z = \frac{x - \mu}{\sigma}$
CENTRAL LIMIT THEOREM (for sample mean $\bar{x}$) - If $x_1, x_2, x_3, \ldots, x_n$ is a simple random sample of n elements from a large (infinite) population, with mean $\mu$ and standard deviation $\sigma$, then the distribution of $\bar{x}$ takes on the bell-shaped distribution of a normal random variable as n increases, and the distribution of the ratio $\frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ approaches the standard normal distribution as n goes to infinity. In practice, a normal approximation is acceptable for samples of 30 or larger.

Percentage Cumulative Distribution for selected Z values under a normal curve:
Z-value            -3      -2      -1       0      +1      +2      +3
Percentile Score   0.13    2.28   15.87   50.00   84.13   97.72   99.87
[Figure: critical region for rejection of $H_0$ when $\alpha$ = 0.01, two-tailed test; boundaries at z = ±2.58]

UNBIASEDNESS - Property of a reliable estimator of the parameter being estimated.
• Unbiased Estimate of a Parameter - an estimate that equals on the average the value of the parameter. Ex. the sample mean is an unbiased estimator of the population mean.
• Biased Estimate of a Parameter - an estimate that does not equal on the average the value of the parameter. Ex. the sample variance calculated with n is a biased estimator of the population variance; however, when calculated with n-1 it is unbiased.
STANDARD ERROR - The standard deviation of the estimator is called the standard error. Ex. The standard error for $\bar{x}$'s is $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
This has to be distinguished from the STANDARD DEVIATION OF THE SAMPLE: $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$
• The standard error measures the variability in the $\bar{x}$'s around their expected value $E(\bar{X})$, while the standard deviation of the sample reflects the variability in the sample around the sample's mean ($\bar{x}$).

USED WHEN THE STANDARD DEVIATION IS UNKNOWN - Use of Student's t. When $\sigma$ is not known, its value is estimated from sample data.
t-ratio - the ratio employed in the testing of hypotheses or determining the significance of a difference between means (two-sample case) involving a sample with a t-distribution. The formula is:
$t = \frac{\bar{x} - \mu}{s_{\bar{x}}}$
where $\mu$ = population mean under $H_0$ and $s_{\bar{x}} = \frac{s}{\sqrt{n}}$
• Distribution - symmetrical distribution with a mean of zero and standard deviation that approaches one as degrees of freedom increase (i.e., approaches the Z distribution).
• Assumption and condition required in assuming t-distribution: Samples are drawn from a normally distributed population and $\sigma$ (population standard deviation) is unknown.
• Homogeneity of Variance - If 2 samples are being compared, the assumption in using the t-ratio is that the variances of the populations from which the samples are drawn are equal.
• Estimated $\sigma_{\bar{x}_1 - \bar{x}_2}$ (that is, $s_{\bar{x}_1 - \bar{x}_2}$) is based on the unbiased estimate of the population variance.
• Degrees of Freedom (df) - the number of values that are free to vary after placing certain restrictions on the data.
Example. The sample (43, 74, 42, 65) has n = 4. The sum is 224 and mean = 56. Using these 4 numbers and determining deviations from the mean, we'll have 4 deviations, namely (-13, 18, -14, 9), which sum up to zero. Deviations from the mean is one restriction we have imposed, and the natural consequence is that the sum of these deviations should equal zero. For this to happen, we can choose any number, but our freedom to choose is limited to only 3 numbers because one is restricted by the requirement that the sum of the deviations should equal zero. We use the equality:
$(x_1 - \bar{x}) + (x_2 - \bar{x}) + (x_3 - \bar{x}) + (x_4 - \bar{x}) = 0$
So given a mean of 56, if the first 3 observations are 43, 74, and 42, the last observation has to be 65. This single restriction in this case helps us determine df. The formula is n less the number of restrictions. In this case, it is n - 1 = 4 - 1 = 3 df.
• t-Ratio is a robust test - This means that statistical inferences are likely valid despite fairly large departures from normality in the population distribution. If normality of population distribution is in doubt, it is wise to increase the sample size.
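The t-ratio and the degrees-of-freedom argument above can be checked numerically. The sketch below reuses the sample (43, 74, 42, 65) from the df example; the hypothesized mean of 50 and the function names are assumptions made only for illustration.

```python
import math

def t_ratio(sample, mu0):
    """One-sample t-ratio: (sample mean - mu0) / (s / sqrt(n)), with df = n - 1."""
    n = len(sample)
    xbar = sum(sample) / n
    # Unbiased sample standard deviation (divides SS by n - 1).
    s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
    standard_error = s / math.sqrt(n)
    return (xbar - mu0) / standard_error, n - 1

def last_observation(first_values, mean, n):
    """Given the mean and n - 1 observations, the remaining one is forced."""
    return n * mean - sum(first_values)

sample = [43, 74, 42, 65]
deviations = [x - sum(sample) / len(sample) for x in sample]
t, df = t_ratio(sample, mu0=50)  # mu0 = 50 is a hypothetical null value

print(deviations, sum(deviations))           # deviations sum to zero
print(last_observation([43, 74, 42], 56, 4))  # the fourth value is not free
print(round(t, 3), df)
```

The second print shows why only n - 1 values are free to vary: once the mean is fixed, the last observation is determined by the others.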
USED WHEN THE STANDARD DEVIATION IS KNOWN: When $\sigma$ is known it is possible to describe the form of the distribution of the sample mean as a Z statistic. The sample must be drawn from a normal distribution or have a sample size (n) of at least 30.
$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$
where $\mu$ = population mean (either known or hypothesized under $H_0$) and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.
• Critical Region - the portion of the area under the curve which includes those values of a statistic that lead to the rejection of the null hypothesis.
- The most often used significance levels are 0.01, 0.05, and 0.1. For a one-tailed test using the z-statistic, these correspond to z-values of 2.33, 1.65, and 1.28 respectively. For a two-tailed test, the critical region of 0.01 is split into two equal outer areas marked by z-values of |2.58|.
Example 1. Given a population with $\mu$ = 250 and $\sigma$ = 50, what is the probability of drawing a sample of n = 100 values whose mean ($\bar{x}$) is at least 255? In this case, z = 1.00. Looking at Table A, the given area for z = 1.00 is 0.3413. To its right is 0.1587 (= 0.5 - 0.3413) or 15.87%. Conclusion: there are approximately 16 chances in 100 of obtaining a sample mean ≥ 255 from this population when n = 100.
Example 2. Assume we do not know the population mean. However, we suspect that the sample may have been selected from a population with $\mu$ = 250 and $\sigma$ = 50, but we are not sure. The hypothesis to be tested is whether the sample mean was selected from this population. Assume we obtained from a sample (n) of 100 a sample mean of 263. Is it reasonable to assume that this sample was drawn from the suspected population?
1. $H_0$: $\mu$ = 250 (that the actual mean of the population from which the sample is drawn is equal to 250). $H_1$: $\mu$ not equal to 250 (the alternative hypothesis is that it is greater than or less than 250, thus a two-tailed test).
2. The z-statistic will be used because the population $\sigma$ is known.
3. Assume the significance level ($\alpha$) to be 0.01. Looking at Table A, we find that the area beyond a z of 2.58 is approximately 0.005. To reject $H_0$ at the 0.01 level of significance, the absolute value of the obtained z must be equal to or greater than $|z_{0.01}|$ or 2.58. Here the value of z corresponding to sample mean = 263 is 2.60.

Table A. Normal Curve Areas (area from mean to z)
z     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
0.0  .0000  .0040  .0080  .0120  .0160  .0199  .0239  .0279  .0319  .0359
0.1  .0398  .0438  .0478  .0517  .0557  .0596  .0636  .0675  .0714  .0753
0.2  .0793  .0832  .0871  .0910  .0948  .0987  .1026  .1064  .1103  .1141
0.3  .1179  .1217  .1255  .1293  .1331  .1368  .1406  .1443  .1480  .1517
0.4  .1554  .1591  .1628  .1664  .1700  .1736  .1772  .1808  .1844  .1879
[rows for z = 0.5 to 1.5 not reproduced]
1.6  .4452  .4463  .4474  .4484  .4495  .4505  .4515  .4525  .4535  .4545
1.7  .4554  .4564  .4573  .4582  .4591  .4599  .4608  .4616  .4625  .4633
1.8  .4641  .4649  .4656  .4664  .4671  .4678  .4686  .4693  .4699  .4706
1.9  .4713  .4719  .4726  .4732  .4738  .4744  .4750  .4756  .4761  .4767
2.0  .4772  .4778  .4783  .4788  .4793  .4798  .4803  .4808  .4812  .4817
2.1  .4821  .4826  .4830  .4834  .4838  .4842  .4846  .4850  .4854  .4857
2.2  .4861  .4864  .4868  .4871  .4875  .4878  .4881  .4884  .4887  .4890
2.3  .4893  .4896  .4898  .4901  .4904  .4906  .4909  .4911  .4913  .4916
2.4  .4918  .4920  .4922  .4925  .4927  .4929  .4931  .4932  .4934  .4936
2.5  .4938  .4940  .4941  .4943  .4945  .4946  .4948  .4949  .4951  .4952
2.6  .4953  .4955  .4956  .4957  .4959  .4960  .4961  .4962  .4963  .4964
2.7  .4965  .4966  .4967  .4968  .4969  .4970  .4971  .4972  .4973  .4974
2.8  .4974  .4975  .4976  .4977  .4977  .4978  .4979  .4979  .4980  .4981
2.9  .4981  .4982  .4982  .4983  .4984  .4984  .4985  .4985  .4986  .4986
3.0  .4987  .4987  .4987  .4988  .4988  .4989  .4989  .4989  .4990  .4990

Example. Given $\bar{x}$ = 108, s = 15, and n = 26, estimate a 95% confidence interval for the population mean. Since the population variance is unknown, the t-distribution is used. The resulting interval, using a t-value of 2.060 from Table B (row 25, two-tailed 0.05 column), is approximately 102 to 114.
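The arithmetic in the worked examples above can be reproduced with a short script. This is a sketch under stated assumptions: the helper names are hypothetical, the normal area is computed from the error function rather than read from Table A, and the t critical value 2.060 is simply taken from the example as given.

```python
import math

def normal_cdf(z):
    """Cumulative area under the standard normal curve up to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def z_for_mean(xbar, mu, sigma, n):
    """z statistic for a sample mean when sigma is known."""
    return (xbar - mu) / (sigma / math.sqrt(n))

def t_interval(xbar, s, n, t_crit):
    """Confidence limits xbar +/- t_crit * s / sqrt(n); t_crit comes from a t table."""
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Example 1: probability that a sample mean is at least 255.
z1 = z_for_mean(255, 250, 50, 100)
upper_tail = 1.0 - normal_cdf(z1)

# Example 2: two-tailed test at the 0.01 level (critical |z| = 2.58).
z2 = z_for_mean(263, 250, 50, 100)
reject = abs(z2) >= 2.58

# Confidence interval example: xbar = 108, s = 15, n = 26, t = 2.060.
low, high = t_interval(108, 15, 26, 2.060)

print(z1, round(upper_tail, 4))
print(z2, reject)
print(round(low, 2), round(high, 2))
```

The computed upper-tail area agrees with 0.5 minus the Table A area for z = 1.00, and the interval rounds to the 102 to 114 quoted in the example.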
Consequently, any hypothesized $\mu$ between 102 and 114 is tenable on the basis of this sample. Any hypothesized $\mu$ below 102 or above 114 would be rejected at 0.05 significance.

CONCLUSION (Example 2) - Since this obtained z falls within the critical region, we may reject $H_0$ at the 0.01 level of significance.

CONFIDENCE INTERVAL - Interval within which we may consider a hypothesis tenable. Common confidence intervals are 90%, 95%, and 99%. Confidence Limits: limits defining the confidence interval.
$(1-\alpha)100\%$ confidence interval for $\mu$:
$\bar{x} - z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) \le \mu \le \bar{x} + z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$
where $z_{\alpha/2}$ is the value of the standard normal variable Z that puts a proportion $\alpha/2$ in each tail of the distribution. The confidence interval is the complement of the critical regions. A t-statistic may be used in place of the z-statistic when $\sigma$ is unknown and s must be used as an estimate. (But note the caution in that section.)

COMPARISON BETWEEN t AND z DISTRIBUTIONS - Although both distributions are symmetrical about a mean of zero, the t-distribution is more spread out than the normal distribution (z-distribution). Thus a much larger value of t is required to mark off the bounds of the critical region of rejection. As df increases, differences between z- and t-distributions are reduced. Table A (z) may be used instead of Table B (t) when n > 30. To use either table when n < 30, the sample must be drawn from a normal population.

Table B. Critical values of t
Level of significance for one-tailed test:   0.025     0.01      0.005
Level of significance for two-tailed test:   0.05      0.02      0.01
df
1      12.706    31.821    63.657
2       4.303     6.965     9.925
3       3.182     4.541     5.841
4       2.776     3.747     4.604
5       2.571     3.365     4.032
6       2.447     3.143     3.707
7       2.365     2.998     3.499
8       2.306     2.896     3.355
9       2.262     2.821     3.250
10      2.228     2.764     3.169
11      2.201     2.718     3.106
12      2.179     2.681     3.055
13      2.160     2.650     3.012
14      2.145     2.624     2.977
15      2.131     2.602     2.947
16      2.120     2.583     2.921
17      2.110     2.567     2.898
18      2.101     2.552     2.878
19      2.093     2.539     2.861
20      2.086     2.528     2.845
21      2.080     2.518     2.831
22      2.074     2.508     2.819
23      2.069     2.500     2.807
24      2.064     2.492     2.797
25      2.060     2.485     2.787
26      2.056     2.479     2.779
27      2.052     2.473     2.771
28      2.048     2.467     2.763
29      2.045     2.462     2.756
30      2.042     2.457     2.750
inf     1.960     2.326     2.576

SAMPLING DISTRIBUTION OF THE DIFFERENCE BETWEEN MEANS - If a number of pairs of samples were taken from the same population or from two different populations, then:
• The distribution of differences between pairs of sample means tends to be normal (z-distribution).
• The mean of these differences between means, $\mu_{\bar{x}_1 - \bar{x}_2}$, is equal to the difference between the population means, that is $\mu_1 - \mu_2$.
Z-DISTRIBUTION: $\sigma_1$ and $\sigma_2$ are known
• The standard error of the difference between means: $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
• Where $(\mu_1 - \mu_2)$ represents the hypothesized difference in means, the following statistic can be used for hypothesis tests: $z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}}$
• When $n_1$ and $n_2$ are > 30, substitute $s_1$ and $s_2$ for $\sigma_1$ and $\sigma_2$, respectively.
POOLED t-TEST
• Distribution is normal
• n < 30
• $\sigma_1$ and $\sigma_2$ are not known but assumed equal
- The hypothesis test may be 2-tailed (= vs. ≠) or 1-tailed: $H_0$ is $\mu_1 \le \mu_2$ and the alternative is $\mu_1 > \mu_2$ (or $H_0$ is $\mu_1 \ge \mu_2$ and the alternative is $\mu_1 < \mu_2$).
- Degrees of freedom (df): $(n_1 - 1) + (n_2 - 1) = n_1 + n_2 - 2$.
- Use the formula below for estimating $\sigma_{\bar{x}_1 - \bar{x}_2}$ to determine $s_{\bar{x}_1 - \bar{x}_2}$.
- Determine the critical region for rejection by assigning an acceptable level of significance and looking at the t-table with df = $n_1 + n_2 - 2$.
• Use the following formula for the estimated standard error:
$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\left[\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]\left[\frac{n_1 + n_2}{n_1 n_2}\right]}$
Since $(n_i - 1)s_i^2 = SS_i$, the numerator of the first factor may also be written $SS_1 + SS_2$. (To obtain sum of squares (SS), see Measures of Central Tendency.)
HETEROGENEITY OF VARIANCES may be determined by using the F-test: $F = \frac{s_1^2 \text{ (larger variance)}}{s_2^2 \text{ (smaller variance)}}$
NULL HYPOTHESIS - Variances are equal and their ratio is one.
ALTERNATIVE HYPOTHESIS - Variances differ and their ratio is not one.
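As a rough illustration of the pooled t-test and the variance-ratio F described above, the sketch below computes the pooled standard error, the t statistic with its degrees of freedom, and the ratio of the larger to the smaller sample variance. The summary statistics and function names are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import math

def pooled_standard_error(n1, s1, n2, s2):
    """Estimated standard error of the difference between two means (pooled variance)."""
    pooled_variance = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return math.sqrt(pooled_variance * (n1 + n2) / (n1 * n2))

def pooled_t(xbar1, s1, n1, xbar2, s2, n2, hypothesized_diff=0.0):
    """t statistic for the difference in means; df = n1 + n2 - 2."""
    se = pooled_standard_error(n1, s1, n2, s2)
    t = ((xbar1 - xbar2) - hypothesized_diff) / se
    return t, n1 + n2 - 2

def variance_ratio(s1, s2):
    """F for heterogeneity of variances: larger variance over smaller variance."""
    v1, v2 = s1 ** 2, s2 ** 2
    return max(v1, v2) / min(v1, v2)

# Hypothetical summary statistics for two small samples.
t, df = pooled_t(xbar1=20.0, s1=4.0, n1=10, xbar2=16.0, s2=3.0, n2=10)
f = variance_ratio(4.0, 3.0)

print(round(t, 3), df)  # compare |t| with the Table B value for this df
print(round(f, 3))      # compare with the F table at (n1 - 1, n2 - 1) df
```

The critical values are deliberately left to the tables: the code only produces the statistics that would be looked up.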
Look at "Table C" below to determine if the variances are significantly different from each other. Use degrees of freedom from the 2 samples: $(n_1 - 1, n_2 - 1)$.
STANDARD ERROR OF THE DIFFERENCE between Means for Correlated Groups. The general formula is:
$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{s_{\bar{x}_1}^2 + s_{\bar{x}_2}^2 - 2 r s_{\bar{x}_1} s_{\bar{x}_2}}$
where r is the Pearson correlation.
• By matching samples on a variable correlated with the criterion variable, the magnitude of the standard error of the difference can be reduced.
• The higher the correlation, the greater the reduction in the standard error of the difference.

ANALYSIS OF VARIANCE (ANOVA)
PURPOSE - Indicates possibility of overall mean effect of the experimental treatments before investigating a specific hypothesis.
ANOVA - Consists of obtaining independent estimates from population subgroups. It allows for the partition of the sum of squares into known components of variation.
TYPES OF VARIANCES
• Between-Group Variance (BGV) - reflects the magnitude of the difference(s) among the group means.
• Within-Group Variance (WGV) - reflects the dispersion within each treatment group. It is also referred to as the error term.
CALCULATING VARIANCES
• Following the F-ratio, when the BGV is large relative to the WGV, the F-ratio will also be large.
$BGV = \frac{\sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x}_{tot})^2}{k - 1}$
where $\bar{x}_i$ = mean of the ith treatment group, $n_i$ = number of values in the ith treatment group, and $\bar{x}_{tot}$ = mean of all n values across all k treatment groups.
$WGV = \frac{SS_1 + SS_2 + \cdots + SS_k}{n - k}$
where the SS's are the sums of squares (see Measures of Central Tendency) of each subgroup's values around the subgroup mean.
USING F-RATIO - F = BGV/WGV
• Degrees of freedom are k-1 for the numerator and n-k for the denominator.
• If BGV is significantly larger than WGV (F exceeds the tabled critical value), the experimental treatments are taken to be responsible for the large differences among group means. Null hypothesis: the group means are estimates of a common population mean.
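The F-ratio computation above can be sketched for a one-way layout as follows. This is a minimal illustration assuming the general form of BGV with group sizes as weights; the function name and the three small treatment groups are hypothetical.

```python
def one_way_anova(groups):
    """Return (F, (df_numerator, df_denominator)) for k treatment groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group variance: weighted squared distances of group means
    # from the grand mean, divided by k - 1.
    bgv = sum(len(g) * (m - grand_mean) ** 2
              for g, m in zip(groups, group_means)) / (k - 1)

    # Within-group variance: pooled sums of squares around each group's
    # own mean, divided by n - k.
    ss_within = sum(sum((x - m) ** 2 for x in g)
                    for g, m in zip(groups, group_means))
    wgv = ss_within / (n - k)

    return bgv / wgv, (k - 1, n - k)

# Hypothetical data: three treatment groups of three observations each.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_ratio, dfs = one_way_anova(groups)
print(round(f_ratio, 3), dfs)
```

The resulting F would then be compared with the tabled critical value at the returned numerator and denominator degrees of freedom.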
In random samples of size n, the sample proportion p fluctuates around the proportion mean $\pi$ with a proportion variance of $\frac{\pi(1-\pi)}{n}$ and proportion standard error of $\sqrt{\frac{\pi(1-\pi)}{n}}$.
As the sample size n increases, the sampling distribution of p concentrates more around its target mean. It also gets closer to the normal distribution. In which case:
$z = \frac{p - \pi}{\sqrt{\pi(1-\pi)/n}}$

[Table C: .05 (top row) and .01 (bottom row) points for the distribution of F, by degrees of freedom for numerator and denominator]

CORRELATION
Definition - Correlation refers to the relationship between two variables. The Correlation Coefficient is a measure that expresses the extent to which two variables are related.
PEARSON r METHOD (Product-Moment Correlation Coefficient) - Correlation coefficient employed with interval- or ratio-scaled variables.
Ex.: Given observations to two variables X and Y, we can compute their corresponding z values: $z_x = (x - \bar{x})/s_x$ and $z_y = (y - \bar{y})/s_y$.
• The formulas for the Pearson correlation (r):
$r = \frac{\sum (z_x z_y)}{N}$ (with $s_x$ and $s_y$ computed using N in the denominator)
- Use the above formula for large samples.
$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{SS_x \, SS_y}}$
- Use this formula (also known as the Mean-Deviation Method of computing the Pearson r) for small samples.
RAW SCORE METHOD is quicker and can be used in place of the first formula above when the sample values are available:
$r = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sqrt{\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]\left[\sum y^2 - \frac{(\sum y)^2}{n}\right]}}$

CHI-SQUARE ($\chi^2$)
• Most widely used non-parametric test.
• The $\chi^2$ mean = its degrees of freedom.
• The $\chi^2$ variance = twice its degrees of freedom.
• Can be used to test one or two independent samples.
• The square of a standard normal variable is a chi-square variable.
• Like the t-distribution, it has different distributions depending on the degrees of freedom.
DEGREES OF FREEDOM (d.f.) COMPUTATION
• If chi-square tests for the goodness-of-fit to a hypothesized distribution: d.f. = g - 1 - m, where
g = number of groups, or classes, in the frequency distribution;
m = number of population parameters that must be estimated from sample statistics to test the hypothesis.
• If chi-square tests for homogeneity or contingency: d.f. = (rows - 1)(columns - 1)
GOODNESS-OF-FIT TEST - To apply the chi-square distribution in this manner, the chi-square statistic is computed as:
$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$
where $f_o$ = observed frequency of the variable and $f_e$ = expected frequency (based on hypothesized population distribution).
TESTS OF CONTINGENCY - Application of Chi-square tests to two separate populations to test statistical independence of attributes.
TESTS OF HOMOGENEITY - Application of Chi-square tests to two samples to test if they came from populations with like distributions.
RUNS TEST - Tests whether a sequence (to comprise a sample) is random. The following equations are applied:
$\bar{R} = \frac{2 n_1 n_2}{n_1 + n_2} + 1$
$s_R = \sqrt{\frac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}}$
where $\bar{R}$ = mean number of runs
$n_1$ = number of outcomes of one type
$n_2$ = number of outcomes of the other type
$s_R$ = standard deviation of the distribution of the number of runs.

quickstudy.com. February 2003.
NOTICE TO STUDENT: This QuickStudy chart covers the basics of Introductory Statistics. Due to its condensed format, however, use it as a Statistics guide and not as a replacement for assigned course work. BarCharts, Inc., Boca Raton, FL.
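Returning to the runs test described above, the sketch below counts the runs in a two-outcome sequence and evaluates the mean and standard deviation of the number of runs from the two equations given. The function names and the coin-toss string are hypothetical, and the final z value (observed runs minus expected runs, divided by the standard deviation) is an added normal-approximation step that the chart itself does not state.

```python
import math

def count_runs(sequence):
    """Number of maximal blocks of identical consecutive outcomes."""
    runs = 1
    for previous, current in zip(sequence, sequence[1:]):
        if current != previous:
            runs += 1
    return runs

def runs_test(sequence):
    """Return (observed runs, mean runs, sd of runs, z) for a two-outcome sequence."""
    kinds = sorted(set(sequence))
    if len(kinds) != 2:
        raise ValueError("sequence must contain exactly two kinds of outcome")
    n1 = sequence.count(kinds[0])
    n2 = sequence.count(kinds[1])
    observed = count_runs(sequence)
    mean_runs = 2 * n1 * n2 / (n1 + n2) + 1
    sd_runs = math.sqrt(
        2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
        / ((n1 + n2) ** 2 * (n1 + n2 - 1))
    )
    # Assumption: normal approximation for the number of runs.
    z = (observed - mean_runs) / sd_runs
    return observed, mean_runs, sd_runs, z

# Hypothetical sequence of coin tosses.
tosses = "HHTTHTHHTT"
observed, mean_runs, sd_runs, z = runs_test(tosses)
print(observed, mean_runs, round(sd_runs, 4), z)
```

An observed count far from the expected number of runs, in either direction, is what would cast doubt on the randomness of the sequence.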