o:#2*, - Cheat Sheet

advertisement
.TLI\
COURSES
TSTICS FOR INTRODUGTORY
J
STATISTICS - A set of tools for collecting,
oreanizing, presenting, and analyzing
numerical facts or observations.
I . Descriptive Statistics - procedures used to
organize and present data in a convenient,
useable. and communicable form.
2. Inferential Statistics - procedures employed
to arrive at broader generalizations or
inferences from sample data to populations.
-l STATISTIC - A number describing a sample
characteristic. Results from the manipulation
of sample data according to certain specified
procedures.
DATA - Characteristics or numbers that
a r e c o l l e c t e db y o b s e r v a t i o n .
J POPULATION - A complete set of actual
or potential observations.
J
J
- A number describing a
PARAMETER
population characteristic; typically, inferred
f r o m s a m p l es t a t i s t i c .
- A subset of the population
f SAMPLE
selectedaccording to some scheme.
SAMPLE - A subset selected
J RANDOM
in such a way that each member of the
population has an equal opportunity to be
selected. Ex.lottery numbers in afair lottery
J
VARIABLE - A phenomenon that may take
on different values.
f
MEAN -The ooint in a distribution of measurements
about which the summeddeviationsare equal to zero.
Average value of a sample or population.
POPULATION MEAN
SAMPLE MEAN
o:#2*,
p: +!,*,
Note: The mean ls very sensltlveto extrememeasurementsthat are not balancedon both sides.
I WEIGHTED MEAN - Sum of a setof observations
multiplied by their respectiveweights, divided by the
sum of the weights:
9, *, *,
-LWEIGHTED MEAN
O SUM OF SOUARES fSSr- Der iationstiom
andsummed:
the mean.squared
(I r,),
- li. r x ) ' o r I x i ',- t
PopulationSS:I(X
N
_ r, \,)2
SS:I(xi -x)2or Ixi2--Sample
O VARIANCE - The averageof squaredifferandtheir mean.
encesbetweenobservations
SAMPLEVARIANCE
POPULANONVARIANCE
,\r*'
; : n u m b e ro f
w h e r ex r , : w e i g h t , ' x ,- o b s e r v a t i o nG
o b s e r v a i i o ng r d u p s . ' C a l c u l a t ef dr o m a p o p u l a t i o n .
sample.or gr6upings in a frequencydistribution.
Ex. In the FrequencVDistribution below, the meun is
80.3: culculatbd by- using frequencies for the wis.
When grouped, use clossmidpointsJbr xis.
J MEDIAN - Observationor potenlialobservationin a
set that divides the set so that the same number of
observationslie on each side of it. For an odd number
of values.it is the middle value; for an even number it
is the averageof the middle two.
Ex. In the Frequency Distribution table below, the
median is 79.5.
f MODE - Observationthat occurs with the greatest
tiequency. Ex. In the Frequency Distributioln nble
below. the mode is 88.
VARIANCESFOH GBOUPEDDATA
SAMPLE
POPUIATION
^{G-'{G
o2:*i
lI
;_r
t , ( r , - p) t s 2 = ; 1 i t i l m ' - x ; 2
t=1
D STANDARD DEVIATION - Squareroot of
the variance:
o-
Ex. Pop. S.D.
n
Y
I
U
fi
GROUpITG
OF DATA
Shows the number of times each observation
occurs when the values ofa variable are arranged
in order according to their magnitudes.
J il {il,
x
f
1
t
- Interval Scale- a quantitative scale that
permits the use of arithmetic operations. The
zero point in the scale is arbitrary.
ut
BISTRI.
tr CUMULATUEFREOUENCY
t
11
74 11f
x
65
o
11111
75 1111
66
1
11
x
x
f
0 85
0 86
1
76
11
67
o
77
111
68
1
11 87
1
7A
I
69
111
0 88
0 89
1111111
79
11
94
111
80
93
I
92
0 91
96
95
o Histogram - a form of bar graph used rr ith
interval or ratio-scaled variables.
I a rrI.)'A .l b]|, K I 3artl LQ
100 1 83
99
98
gl
D BAR GRAPH - A form of graph that uses
bars to indicate the frequency of occurrence
of observations.
11
81
1
11
1
82
I
70 1111
71
0
72
11
BUTION -A distributionwhich showsthetotal frequencythrough the upper real limit of
eachclass.
tr CUMUIATIVE PERCENTAGE DISTRI.
BUTION-A distributionwhich showsthetotal percentagethrough the upper real limit of
eachclass.
73 111
EilSTRIBUTION
FREOUENCY
II GROTJPED
- A frequency distribution in which the values
ofthe variable have been grouped into classes.
I il {.ll
CLASS
f
I
65-67
3
8
5
9
6
4
8
8
6
1
2
2
6&70
71-73
CLASS
98-100
f
CLASS
t
lNl.l'tlz
7+76
Tt-79
80-82
83-85
86-88
89-91
92-g
95-97
9&100
!I!
Cum f
3
11
16
25
31
35
43
51
57
58
60
62
- R.atio Scale- same as interval scale excepl
that there is a true zero point.
D FREOUENCY CURVE - A form of graph
representing a frequency distribution in the form
of a continuous line that traces a histogram.
o Cumulative Frequency Curve - a continuous
line that traces a histogram where bars in all the
lower classes are stacked up in the adjacent
higher class. It cannot have a negative slop€.
o Normal curve - bell-shaped curve.
o Skewed curve - departs from symmetry and
tails-off at one end.
llrfGl:
"
4.84
15
17.74
25.81
10
40.32
50.00
-t
\
0 -att?
56.45
69.35
CURVE
SKEWED
82.26
91.94
93.55
96.77
N O R M A LC U R V E
^/T\
./
\
15
/
10
100.00
-/
J-
0
--
\
\
LEFT
\
\
z
)
RANDOM VARIABLES
Probability of occurrence^t at -Number of outcomafamring
[email protected]
oif'ent'l
EwntA
A mapping
'onlv or function that assignsone and
one-numerical value to each
outcome in an exPeriment.
D SAMPLE SPACE - All possibleoutcomesof an
experiment.
N TYPE OF EVENTS
o Exhaustive - two or more events are said to be exhaustive
if all possible outcomes are considered.
Symbolically, P (A or B or...) l.
-two or more events are said to be nonrNon-Exhausdve
exhaustive if they do not exhaust all possible outcomes.
Exclusive - Events that cannot occur
rMutually
simultaneously:p (A and B) = 0; and p (A or B) = p (A) + p (B).
Ex. males, females
Exclusive - Event-s that can occur
oNon-Mutually
simultaneously: p (A orB) = P(A) +p(B) - p(A and B)'
&x. males, brown eyes.
- Events whose probability is unaffected
Slndependent
by occurrence or nonoccurrence of each other: p(A lB) =
p(A); ptB In)= p(e); and p(A and B) = p(A) p(B).
Ex. gender and eye color
- Events whose probability changes
SDependent
deoendlns upon the occurrence or non-occurrence ofeach
other: p{.I I bl dilfers lrom AA): p(B lA) differs from
p ( B ) ; a n dp ( A a n dB ) : p ( A ) p ( B l A ) : p ( B ) A A I B )
Ex. rsce and eye colon
C JOINT PROBABILITIES - Probabilitythat2 ot
more eventsoccur simultaneously.
tr MARGINAL PROBABILITIES or Unconditional Probabilities= summationof probabilities'
- Probability
PROBABILITIES
D CONDITIONAL
of I given the existence of ,S, written, p (Al$.
fl EXAMPLE- Given the numbers I to 9 as
observations in a sample space:
.Events mutually exclusive and exhaustive'
n nurnbers)
Example: p (all odd numb ers); p ( all eu-e
.Evenls mutualty exclusive but not exhaustiveExample: p (an eien number); p (the numbers 7 and 5)
.Events ni:ither mutually exclusive or exhaustiveExample: p (an even number or a 2)
tl DISCRETE RANDOM VARIABLES - Involvesrulesor probabilitymodelsfor assigning or generatingonly distinctvalues(not fractionalmeasurements).
C BINOMIAL DISTRIBUTION - A model
for the sum of a seriesof n independenttrials
wheretrial resultsin a 0 (failure) or I (success).Ex. Coin to p ( r ) = ( ! ) n ' l - t r l " - '
"t
in n
wherep(s) is the probabilityof s success
trials with a constantn probability per trials,
n!
-a"n- dw" h
' -e' -r e(t s, 1/ \ =s,! ( n - s ) !
Binomial mean: !: nx
Binomial variance: o': n, (l - tr)
A s n i n c r e a s e s ,t h e B i n o m i a l a p p r o a c h e st h e
Normal distribution.
DISTRIBUTION
D HYPERGEOMETRIC
A model for the sum of a series of n trials where
each trial results in a 0 or I and is drawn from a
small population with N elements split between
N1 successesand N2 failures. Then the probability of splitting the n trials between xl successes
and x2 failures is:
Nl!
{_z!
p(xlandtrr:W
't
4tlv-r;lr
Hypergeometric mean: pt :E(xi
-
+
andvariance:o2:
ffit+][p]
D POISSON DISTRIBUTION - A model for
the number of occurrences of an event x :
0 , 1 , 2 , . . . ,w h e n t h e p r o b a b i l i t y o f o c c u r r e n c e
is small, but the number of opportunities for
t h e o c c u r r e n c ei s l a r g e , f o r x : 0 , 1 , 2 , 3 . . . .a n d
)v > 0 . otherwise P(x) =. 0.
e$t=ff
tr LEVEL OF SIGNIFICANCE-Aprobabilin
distribution.
rarein thesampling
valueconsidered
whereoneis
specifiedunderthenull hypothesis
theoperationof chance
willing to acknowledge
factors. Common significance levels are 170,
5 0 , l 0 o . A l p h a ( a ) l e v e l : t h e l o w e s tl e v e
for which the null hypothesis can be rejected.
The significanceleveldeterminesthecritical region.
[| NULL HYPOTHESIS (flr) - A statement
that specifies hypothesized value(s) for one or
more of the population parameter. lBx. Hs= a
coin is unbiased.That isp : 0.5.]
HYPOTHESIS (.r/1) - A
tr ALTERNATM
statement that specifies that the population
parameter is some value other than the one
specified underthe null trypothesis.[Ex. I1r: a coin
is biased That isp * 0.5.1
HYPOTHESIS I. NONDIRECTIONAL
an alternative hypothesis (H1) that states onll
that the population parameter is different from
the one ipicified under H 6. Ex. [1 f lt + !t0
Two-Tailed Probability Value is employed when
the alternative hypothesis is non-directional.
- an
HYPOTHESIS
2. DIRECTIONAL
alternative hypothesis that statesthe direction rn
which the population parameter differs fiom the
one specified under 11* Ex. Ilt: Ir > pn r-trHf lr ' t1
One-TailedProbability Value is employedu'hen
the alternative hypothesis is directional.
PROOF - Stnct
D NOTION OF INDIRECT
interpretation ofhypothesis testing reveals that thc'
null hypothesis can never be proved. [Ex. Ifwe toi.
a coin 200 times and tails comes up 100 times. it i s
no guarantee that heads will come up exactly hali
the time in the long run; small discrepancies migfrt
exist. A bias can exist even at a small magnitude.
We can make the assertion however that NO
BASIS EXISTS FOR REJECTING THE
THE
COIN IS
HYPOTHESIS THAT
UNBIASED . (The null hypothesisis not reieued.
When employing the 0.05 level of significa
reject the null hypothesis when a given res
occurs by chance 5% of the time or less.]
] TWO TYPES OF ERRORS
- Type 1 Error (Typea Error) = the rejectionof
11,whenit is actuallytrue.The probabilityof
a type 1 error is givenby a.
-TypeII Error(TypeBError)=The acceptance
offl, whenit is actuallyfalse.Theprobabilin
of a type II error is given by B.
P o i s s o nm e a n a n d r a r i a n c e : , t .
Fo r c ontinuo u s t'a ri u b I es. .fi'eq u en t' i es u re e.tp re ssed
in terms o.f areus under u t'ttt.re.
fl SAMPLING DISTRIBUTION - A theoretical
probability distribution of a statistic that would
iesult from drawing all possible samples of a
given size from some population.
THE STAIUDARDEBROR
OF THE MEAN
A theoretical standard deviation of sample mean of a
given sample si4e, drawn from some speciJied population.
DWhen based on a very large, known population, the
s t a n d a r de r r o r i s :
6
" r__ _ o
^ ln
EWhen estimated from a sample drawn from very large
population, the standard error is:
O =^ =
t-
S
'fn
lThe dispersion of sample means decreasesas sample
size is increased.
D CONTINUOUS RANDOM VARIABLES
- Variable that may take on any value along an
uninterrupted interval of a numberline.
- bell cun'e;
D NORMAL DISTRIBUTION
a distribution whose values cluster symmetrically around the mean (also median and mode).
f(x)=-1,
o"t'2x
(x-P)212o2
wheref (x): frequency.at.a givenrzalue
: s t a n d a r dd e v i a t l o no f t h e
o
distribution
lt
p
x
: approximately
I 111q
2.7183
approximately
: the meanof the distribution
: any scorein the distribution
D STANDARD NORMAL DISTRIBUTION
- A normalrandomvariableZ. thathasa mean
deviationof l.
of0. andstandard
Q Z-VALUES - The numberof standarddevialiesfrom themean:
tionsa specificobservation
': x- 11
(for sample mean X)
rlf x 1, X2, X3,... xn , is a simple random sample of n
elements from a large (infinite) population, with mean
mu(p) and standard deviation o, then the distribution of
T takes on the bell shaped distribution of a normal
random variable as n increases andthe distribution ofthe
7-!
ratio:
6l^J n
approaches the standard normal distribution as n goes
to'infinity. In practice.a normal approximation is
acceptable for samples of 30 or larger.
Percentage
Cumulative Distribution
for selected
Z-value
PercentifeScore
Z values
under a normal
curye
-3 -2
-l
0 +1 +2 +3
o-13 2.2a 15.87 50.00 a4.13 97.72 99.a7
Critical
when
region
for rejection
u : O-O7. two-tailed
.2.b8
NBIASEDNESS- Propertyof a reliableesimator beins estimated.
o Unbiased Estimate of a Parameter - an estimate
that equals on the averagethe value ofthe parameter.
Ex. the sample mesn is sn unbissed estimator of
the population mesn.
. Biased Estimate of a Parameter - an estimate
that does not equal on the averagethe value ofthe
parameter.
Ex. the sample variance calculated with n is a bissed estimator of the population variance, however,
x'hen calculated with n-I it is unbiused.
J STANDARD ERROR - The standard deviation
of the estimator is called the standard error.
Er. The standard error forT's is. o: = "/F
X
This has to be distinguished from the STAND.A,RDDEVIATION OF THE SAMPLE:
' The standard error measuresthe variability in the
Ts around their expectedvalue E(X) while the stanJard deviation of the sample reflects the variability
rn the sample around the sample'smean (x).
I -r L'sBDwHEN THE STANDARDDEvIAI rtoN IS UNKNOWN-Useof Student'sr.
from
f Wheno is notknown,itsvalueis estimated
F
f'
s a m o l ed a t a .
jm
t-ratio- the ratio employedil thq.testingof
significancebfa
Ivpotheses or determiningthe
Vrri'erence betweenmeafrsltwo--samplecase)
inrolving a samplewith a i-distribuiion.The
tbrmulaTs:
\
F
where p : population mean under H6
SX
r = . sl r l
"n6
oDistribution-symmetrical distribution with a
mean of zero lnd standard deviation that
annroachesone as degreesoffreedom increases
' i . i . a p p r o a c h e st h e Z d i s t r i b u t i o n ) .
. A , s s u m p t i o na n d c o n d i t i o n r e q u i r e d i n
r\suming r-distribution: Samplesare diawn from
a norm-allv distributed population and o
rpopulation standard deviatiori) is unknown.
o Homogeneity of Variance- If.2 samples are
b e r n c c o m o a r e d t h e a s s u m p t i o ni n u s i n g t - r a t i o
r' th?t the variances of the populatioi's from
* h e r e t h e s a m p l e sa r e d r a w n a r e e q u a l .
o E s t i m a t e d6 X - , - X ,( t h a t i s s x , - F r ) i s b a s e do n
thc unbiasedestimaie of the pofulaiion variance.
o Degrees of Freedom (dJ\-^thenumber of values
that are free to vary after placing certain
restrictions on the data.
Example. The sample (43,74,42,65) has n = 4. The
sum is 224 and mean : 56. Using these 4 numbers
and determining deviationsfrom the mean, we'll have
J deviations namely (-13,18,-14,9) which sum up to
:ero. Deviations from the mean is one restriction we
have imposed and the natural consequence is that the
sum ofthese deviations should equal zero. For this to
happen, we can choose any number but our freedom
to choose is limited to only 3 numbers because one is
restricted by the requirement that the sum ofthe deviations should equal zero. We use the equality:
(x, -x) + (x2-9 + ft t- x) + (xa--x): 0
q mean of 56,iJ'thefirst 3 observqtionsure
given
So
I
43, 74,und 42, the last observationhus to be 65. This
single restriction in this case helps us determine df,
Theformula is n lessnumber of restrictions. In this
t'ase,it is n-l= 4-l=3df
. _/-Ratiois a robust test- This meansthat statistical
inferencesarelikely valid despitefairly largedepartures
from normality in the population distribution. If normality of populationdistributionis in doubt,it is wise
to increasethe samplesize.
tr USED WHEN THE STANDARD DEVIATION IS KNOWN: When o is known it is possible to describethe form of the distribution of
the samplemeanasa Z statistic.The samplemust
be drawn from a normal distribution or have a
samplesize(n) of at least30.
whereu : populationmean(either
=r
-!
',.6 =
nro#rf or hypothesizedunder Ho) and or = o/f,.
o Critical Region - the portion of the areaunder
the curvewhich includesthosevaluesof a statistic
that leadto the rejectionofthe null hypothesis.
- The most often used significance levels are
using z0.01,0.05,and0.L Fora one-tailedtest
statistic,thesecorrespondto z-valuesof 2.33,
For a two-tailedtest,
1.65,and 1.28respectively.
the critical regionof 0.01 is split into two equal
outerareasmarkedby z-valuesof 12.581.
Example 1. Given a population with lt:250
and o: S0,whatis theprobabili6t of drawing a
sample of n:100 values whosemean (x) is at
least255?In this case,2=1.00.Looking atThble
A, the given areafor 2:1.00 is 0.3413. Tb its
right is 0.1587(=6.5-0.i413)or 15.85%.
Conclusion: there are spproximately 16
chancesin 100 of obtaining a sample mean :
255from this papulation when n = 104.
Example 2. Assume we do not know the
population me&n. However, we suspect that
it may have been selectedfrom a population
with 1t= 250 and 6= 50,but we are not sure.
The hypothesis to be tested is whether the
sample mean was selectedfrom this populatian.Assume we obtainedfrom a sample (n)
of 100, a sample ,neen of 263. Is it reasonable to &ssante that this sample was drawn
from the suspectedpopulation?
= 250 (that the actualmeanof the popu| . H o'.1t
lationfrom which the sampleis drawn is equal
to 250) Hi [t not equal to 250 (the alternative
hypothesiSis that it is greaterthan or less than
250, thus a two-tailed test).
2. e-statisticwill be usedbecausethe population o is known.
3. Assumethe significancelevel (cr)to be 0.01.
Looking at Table A, we find that the area beyond a z of 2.58 is approximately0.005.
To reject H6atthe 0.01levelof significance,t}reabsolutevalue of the obtainedz must be equalto or
greaterthanlz6.91l
or 2.58.Herethevalueof z correspondingto samplemean: 263is 2.60.
of Hs
test
+2.58
O
Normal
CurveAreas
Table
A
area tron
mean to z
.00
.ol
.o2
.(x!
.o4
.o5
,06
.o7
.6
.o9
o o o o . 0 0 4 0 . 0 0 8 0 . 0 1 2 0 0 1 6 0 0 1 9 9 . 0 2 3 9. 0 2 7 9 . 0 3 1 9. 0 3 5 9
0 3 9 8 0 4 3 8 . O 4 7 4. O 5 1 7 0 5 5 7 . 0 5 9 6 . 0 6 3 6 . 0 6 7 5 . 0 7 1 4 . 0 7 5 3
0 7 9 3 0 8 3 2 . 0 8 7 1 0 9 1 0 0948 .0987 .1026 .'tO64 .1'lO3 .1141
1 1 7 9 1 2 1 7 1 2 5 5. 1 2 9 3
1554 1591 1624.1664
o.o
o-l
o.2
o.3
o.4
.4452
.4554
.4641
.47't3
.4772
.4821
.4A61
.4893
.4918
.4938
.lgsg
.4965
,gYOC
.4974
.4974
2.1
2.2
2.3
2.4
2.5
.4463 .4474
.4564 .4573
.4649 .46S
.4719 .4726
.4774 .4743
.4826.4830
.4A64 .4A68
.4a96 .4898
.4920 .4922
.4940 .4941
.+955 .+ss6
..4966
+YOO
..4967
{Vot
.4975 .4976
.4975
.4976
.4444
.4542
.4664
.4732
.4744
.4834
.4871
.4901
.4925
.4943
assz
.4WO
.4968
.4977
.4977
.4495
.4591
.4671
.4734
.4793
.+aga
.4875
.4904
.4927
.4945
a959
.9VqY
.4969
.4977
.4977
.4505
.4599
.4674
.4744
.4798
tetz
.4A78
.4906
.4929
.4946
.+goo
.+VrV
.4970
.4978
.4978
.4515
.460a
.4646
.4750
.4803
aaa6
.4aal
.4909
.4931
.4944
.+s6r
.+rt
.4971
|
.4979
.4979
.4525
.4616
.4693
.4756
.4aAa
.taso
.4884
.4911
.4932
.4949
.+s6z
,1Vt 4
.4972
.4979
.4979
,4535
.4625
.4699
.4761
.4812
ZSsa
.4887
.4913
.4934
.4951
.csi6g
.{VtJ
.4973
.4980
.4980
.4545
.4633
.4706
.4767
.4417
ABs?
.4A90
.4916
.4936
.4952
.4964.+Vr+
.4974
l
1
.4S81 l
.4S81
.4981 .49A2 .4982 .49a3 .4984 .4984 .4985 .4985 .4986 .4986 l
.4987 .4991 .4987 .4988 .4988 .4989 .4949 .4989 .4990 .4990
Example. Given x:l08, s:l5, and n-26 estimatea
95% confidence interval for the population mean.
Since the population variance is unknown, the t-distribution is used.The resultinsinterval.usins a t-valve
of 2.060 fromTable B (row 25 of the middle-column),
is approximately 102 to 114. Consequently,any hypothesizedp between 102 to 114 is tenableon the
basis of this sample.Any hypothesizedprbelow 102
or above 114 would be rejectedat 0.05 significance.
O COMPARISON
BETWEEN I AND
z DISTRIBUTIONS
Althoueh both distributions are svmmetrical about
a meanbf zero, the f-distribution is more spread out
than the normal di stributi on (z-distributioh).
Thus a much larger value of t is required to mark off
the bounds of the critical region <if rejection.
As d/rincreases, differences between z- andt- distributions are reduced. Table A (z) may be used
instead of Table B (r; when n>30. To Lse either
table when n<30,the sample must be drawn from
a normal population.
tr CONCLUSION-Sincethisobtainedz fallswithin
thecriticalregion,we mayrejectH oatthe0.01level
of significance.
Ta^bJs
'!4
B*=l.evel
.1
tr CONFIDENCE
INTERVAL- Interval within
which we may consider a hypothesis tenable.
Common confidence intervals are 90oh,95oh,
and 99oh. Confidence Limits: limits definins
the confidence interval.
(1- cr)100% confidence interval for rr:
ii, *F t l-il. l,<i +z * ("1{n)
w h e r eZ - , i s t h e v a l u e o f t h e
standard normal variable Z-that puts crl2 percent in each tail of the distribution. The confidence interval is the complement of the critical
regions.
A t-statistic may be used in place of the z-statistic
when o is unknown and s must be used as an
estimate. (But note the caution in that section.)
1
.
3
4
-6
.Z
d
:e10
.
!t
,13
.14
.1
_
_]
q
17
18
t
20
,aa
24
.25
,?s
27
2.
.29
- 30
ini,
.
-
o.o25
o.o5
12.706
4.3Q3
3 . 18 2
2.776
2.571
2.447
0.o1
olo2
3r.tz1
6.965
4.541
1747
3.305
3.143 _
0.oo5
o.ol
6i.657
9-925
5.441
4.604
4-O3Z
3.707 _
2.994
3.499
2.306 _
?.?42
2-2?E
2.201
2..11779 9
2
2.896
2.82L _
2.76L _
2.714
2..66AA1 1
2
3.355
3.250
3.169
3.106
3..00555 5
3
z. r oo
2.650
2.145 _ 2.924
2.'131 ,
2.60?
2:12o _
?,qe3
2,110
2.5Q7
2.1o1
2.552
2.Q93_2.5392.OqQ 2.524
2.514
2.OQO
2.5Q8
2Bf4
2.o6s
2.5Oq
?.e64 _2.4e2
2.060 _ 2.1qF
2.479
?.o50
2.052
2.423
2.044
2.467
2.Q45
?,482
2.042
2.457
.
2.
_
_
-
g.ot z
?.9f7
2.947
2.921
?.e98
2.474
461
2.445
2.431
2.a1e
2-AA7
?.1e7
2.7Q7
2.779
2.771
2.763
Z.lsE
2.750
GORRELATION
OF THE
N SAMPLING DISTRIBUTION
DIFFERENCE BETWEEN MEANS- If a number of pairs of samples were taken from the same
population or from two different populations, then:
r The distribution of differences between pairs of
samplemeanstendsto be normal (z-distrjbution).
r The mean of these differences between means
F1, 1"is equal to the difference between the
population means. that is ltfl-tz.
or and ozure known
I Z-DISTRIBUTION:
o The standard error ofthe difference between means
o", - ={toi) | \ + @',)I n2
",
o Where (u, - u,) reDresents
the hvpothesizeddifferencein rirdan!.'ihefollowins statisticcan be used
for hypothesis tests:
_ (4-tr)-(ut- uz)
z=
"r.r,
o When n1 and n2 qre >30, substifuesy and s2 for
ol and 02. respecnvely.
(j;.#)
(To-qbtain sum of squaygs(SS) see Measures of Central Tendencyon'page l)
D POOLED '-TEST
o Distribution is normal
on<30
(
z
tU
I
0
r o1 and 02 are zal known but assumed equal
- The hvoothesistest mav be 2 tailed (: vs. *) or I
talled:.1i.is1t, andrhe.alternativeis 1rl > lt2 @r 1t, 2
p 2 and the alternatrves y f p2.)
- degreesof freedom(dfl : (n
-2.
r-I)+(n y 1)=n fn 2
formula below for estimating 6riz
giyen
;U.r11!9
to determrnest,-x-,.
- Determine the critical region for reiection by aslevel-ofsisnificdnceand[ooksieningan acceptable
in! at ihe r-tablb with df, nt + i2-2.
o_Use the following formula for the estimated standard error:
(n1- l)sf +(n2- l)s
nI+n2
n1*n2-2
n tn2
(
D HETEROGENEITY OF VARIANCES mav
be determinedby using the F-test:
o
s2lllarger variance'1
stgmaller
variance)
D NULL HYPOTHESIS- Variancesare equal
and their ratio is one.
HYPOTHESIS-Variances
TIALTERNATM
differ andtheir ratio is not one.
f, Look at "Table C" below to determine if the
variancesare significantly different from each
other. Use degrees of freedom from the 2
nyI).
samples:(n1-1,
D STANDARD ERROR OF THE DIFFERENCE betweenMeansfor CorrelatedGroups.
The seneralformula is:
^2
^2
n
""r* " 7r- zrsrrsr,
where r is Pearsoncorrelation
o By matching samples on a variable correlated
with the criterion variable, the magnitude of the
standard error ofthe difference can be reduced.
o The higher the correlation, the greater the
reduction in the standard error ofthe difference.
D PURPOSE- Indicatespossibility of overall
meaneffectof the experimentaltreatmentsbefore
investigatinga specific hypothesis.
D ANOVA- Consistsof obtaining independent
estimatesfrom populationsubgroups.It allowsfor
the partition of the sum of squaresinto known
components
of variation.
D TYPES OF VARIANCES
' Between4roupVariance(BGV| reflectsthemagnihrdeof thedifference(s)amongthegroupmeans.
' Within-Group Variance (WGV)- reflectsthe
dispersionwithin eachtreatmentgroup. It is also
referredto asthe error term.
tI CALCULATING VARIANCES
' Following the F-ratio, when the BGV is large
relativeto theWGV. the F-ratiowill alsobe larse.
o2(8i'r,,')'
66y=
k-1
: mean of ith treatment group and xr.,
of all n values across all k treatmeiii
WGV:
.ss,+ss, *...+^ss*
"_O
wherethe SS'sare the sumsof squares(seeMeasures of Central Tendencyon page 1) of each
subgroup'svaluesaroundthe subgroupmean.
D USING F.RATIO- F : BGV/WGV
1 Degreesof freedomare k-l for the numerator
and n-k for the denominator.
' If BGV > WGV, the experimentaltreatments
are responsiblefor the large differencesamong
group means.Null hypothesis:the group means
are estimatesof a commonpopulationmean.
In random samplesof size n, the sampleproportionp fluctuatesaroundthe proportionmean: rc
proportion
with a proportionvarianceof I#9
standarderror of ,[Wdf;
Top row=.05,Bottom row=.01
points for distribution of F
of freedom for numerator
As the sampling distribution of p increases,it
concentratesmore aroundits targetmean. It also
sets closerto the normal distribution.In which
P-ft
iooo.
wqov'
z:
^Tn(|-ttyn
ISBN t 5?eeaEql-g
Definirton - Carrelation refers to the relatianship baween
two variables,The Correlstion CoefJicientis a measurethst
exFrcssesthe extent ta which two vsriables we related
Q *PEARSON r'METHOD (Product-Moment
CorrelationCoefficient) - Cordlationcoefficient
employedwith interval-or ratio-scaledvariables.
Ex.: Givenobservations
to twovariablesXandIl
we cancomputetheir corresponding
u values:
:(x-R)/s"
Z,
andZ, :{y-D/tv.
'The formulas for the Pearsoncorrelation(r):
: {* -;Xy -y)
JSSI SSJ
- Use the above formula for large samples.
- Use this formula (also knovsn asthe Mean-Devistion
Makod of camputngthe Pearsonr) for small samples.
2( z-2,,)
r:__iL
is quicker and can
Q RAW SCORE METHOD
be used in place of the first formula above when
the sample values are available.
(IxXI/)
oMost widely-used non-parametric test.
.The y2 mean : its degreesof freedom.
oThe X2 variance : twice its degreesof fieedorl
o Can be usedto test one or two independentsamples.
o The square of a standardnormal variable is a
c h i - s q d a r ev a r i a b l e .
o Like the t-distibution" it has different distributions depending on the degrees of freedom.
D DEGREES OF FREEDO|M (d.f.) .
/
v
COMPUTATION
o lf chi-pquare tests for the goodness-of'-fitto a hr p o t h e s i z ' eddi s t r i b u t i o n .
d.f.: S - I - m, where
. g:.number of groups,or classes.in the fiequener
dlstrlbutlon.
m - number of population
- s a m n l eparametersthat lnu\r
b e e s t i m a t e df r o m
s t a t i s t i c st o t e s t t h c
h y p o t h e ssi .
o lf chi-squara testsfor homogeneityor contingener
d.f : (rows-1) (columns-I)
f, GOODNESS-OF-FIT
TEST- To anolr tht-c h i - s q u a r e d i s t r i b u t i o n i n t h i s m a n h ' e r -t .h . '
c r i t i c d l c h i - s q u a r ev a l u e i s e x p r e s s e da s :
(f-_i) '
,nh"..
,
: observedfreqd6ncy ofthe variable
/p
/" - .lp..,tSd fiequency (basedon hypothesrzcd
p o p u l a t l o no l s t n D u t r o n) .
t r T E S T S O F C O N T I N G E N C Y - A p p l i c a t i o nt ' l
Chi-squarete.ststo two separatepopulaiionsto te.r
s t a t r s t l c arl n d e o e n d e n c o
ef a t t r r b u t e s .
D T E S T S O F H O M O G E N E I T Y - A p p l i c a t i o no t
Chi-square-tests
to tWpqq.mplqs
to_test'iTtheycanrc
f r o m f o p u l a t i o n s w i t h l i k e 'd i s t r i b u t i o n s .
D R U N S T E S T - T e s t s w h e t h e r a s e q u e n c e( t t r
c o m p r i s ea s a m p l e ) . i sr a n d o m .T h e ' f o l l o ui n g
e q u a t r o n sa r e a p p l r e d :
nt'n11
l2ntnr(2ntn,
,rnR, =
r =, ,2r ", n
' r1- ' I r , t. '.tls
" r r J 1 " , * " r 1 21 n r + n , l 1
Where
tlltt
ilnfift
,llilJllflill[[llllillll
= mean number of runs
n, : number of outcomes of one type
n2 = number of outcomes of the other type
S4 = standard deviation of the distribution of the
numDer oI runs.
7
(
z
quickstudy.com
U .5.$4.95 / C A N .$7. 50
Febr uar y2003
lllllilllilffillll[l
\Oll('EIOSTIDENT:ThisQ{/(/t
chartcorersthebasrcsofIntroduer,,r'
t r s t i c s .D u e t o i 1 : c o n d c n \ c d l o r n i a t . h c t u e rc r . u s c i t r s r S t a t i s t i c sg r r i r l r , l r r dn o l a \ a r ( p l a ( r
nrrnt lor assigned course work.t l0(ll. tsar( hart\. lnc. Boca Raton. FI
C u s t o m e rH o t l i n e# 1 . 8 0 0 . 2 3 0 . 9 5 2 2
\ l r n u l l i e t u r e cb
l r I l n r i n a t r n gS e r rr c e s .I n c . l . a r g o .f l . i r n d s o l d u n d e r l i c e n s et f o r f I . r . 1
f l o l d r n g : . L L ( . . P r l o \ l t o . ( . \ l S . { u n d e r o n e o r n r o r e o l t h e t b l l o $ i n g P aL t e
S nPt sr r: r r
5 . 0 6 - 1 . 6 1o- r 5 . l l . l . - 1 l l
Download