PSYCH 230 – STATISTICS
1) If you are already registered sit down.
2) If you are on the waiting list or just showed up,
stay standing and we will see how many seats are
available.
3) We will start adding students from the waiting list.
PSYCHOLOGY 230 - STATS
Elizabeth Krupinski, PhD
Depts. Radiology & Psychology
112 Radiology Research Building
626-4498
krupinski@radiology.arizona.edu
http://radiology.arizona.edu/krupinski/psychology-230measurement-statistics
DIRECTIONS
- North on Cherry
- Left on Drachman
- First right = Ring Road (but no signs)
- Around the bend
- Lot #1 (blue) on right
- Driveway into fence on right
- Radiology Research Bldg, Room 112
[Campus map: Speedway, Drachman, and Ring Road marked, with Rad Res 112 labeled]
PREREQUISITES
1) Psych 101 or IND 101
2) Math 110 – college algebra

- +, −, ×, ÷, √, exponents, Σ (summation), | |
- positive vs negative numbers
- order of operations
- rounding: < 5 down, ≥ 5 up
- decimals: 2 places on quizzes
QUIZZES
4 quizzes
- each 25% of your grade
- 100 points each
- all of them count (none dropped)
~ 1/3 fill-in-the-blank
- comprehension of concepts
- ability to apply principles, terms, etc.
~ 2/3 problems
- ability to identify appropriate equations
- ability to carry out required math
- ability to use statistical tables
- ability to reach proper conclusions
formulas & tables provided on quizzes
EXTRA CREDIT
Assignments from Aplia
15 POINTS MAXIMUM!!!!!!
Final grade = (4 quiz grades + extra credit)/4
TEXTS
Class notes: buy in the bookstore (required)
http://radiology.arizona.edu/sites/radiology.ari
zona.edu/files/u3/notes2013.pdf
Book: Fundamental Statistics for the
Behavioral Sciences 8th Ed 2014
David C. Howell
Wadsworth Cengage Learning
CALCULATORS
DO NOT FORGET TO BRING YOUR
CALCULATOR TO THE QUIZZES!!!!!!
Required:
+, −, ×, ÷, √
Helpful:
X̄ (sometimes μ) - mean
S (SD) - standard deviation (sometimes σ)
ΣX - sum of X
ΣX² - sum of X squared
N or n - number of scores
BASIC MATH REVIEW
2 + 2 = 4          2 x 2 = 4           2 - 2 = 0          2/2 = 1
2 + (-2) = 0       2 x (-2) = (-4)     2 - (-2) = 4       2/(-2) = (-1)
(-2) + (-2) = (-4) (-2) x (-2) = 4     (-2) - (-2) = 0    (-2)/(-2) = 1

2² = 4    (-2)² = 4    √4 = 2    √(-4) = error
ALSO REFER TO APPENDIX A
IN THE BOOK
GRAPHING QUADRANTS
[Quadrant diagram: signs of (x, y) are (+, +) upper right, (−, +) upper left, (−, −) lower left, (+, −) lower right]
true limits = + / - ½ the unit of measurement
i = (hi - lo + 1) / # groups
midpoint = (hi true + lo true) / 2
PR = [cumfll + ((X - Xll) / i)(fi)] / N x 100
cumf = (PR x N) / 100
X = Xll + [[i (cumf - cumfll)] / fi]
cumfll = cum freq at lower true limit of X
X = score
Xll = score at lower true limit of X
i = width
fi = # cases in X's group
N = total # scores
- Sam wants to find out if the number of hours people study has any
effect on their grade.
- Mary wants to find out if gender has any influence on math and
verbal SAT scores.
- Dr. Jones wants to find out if her current class performs any
differently on the final compared to all past students.
- A large pharmaceutical company wants to know if their new drug
for controlling OCD is effective.
Chapters 1 & 2: Intro & Basics
- statistics: the process of collecting data & making decisions
based on the analysis of these data
descriptive → inferential (generalize)
Common Terms
- constant: # representing a construct that does not change
(e.g., π); we will see these in some formulas
- variable: measurable characteristic that changes with
person, environment, experiment e.g., height, IQ, learning
(X or Y)
- independent variable (IV): variable examined to determine
its effect on outcome of interest (DV); under control of
experimenter - manipulated variable; e.g., dose of a drug
- dependent variable (DV): outcome of interest measured to
assess effects of IV; not under experimenter control; e.g.,
how a person reacts to the drug
- subject or organismic variable: naturally occurring IV;
characteristic of people but not controlled e.g., eye color,
gender
- data: numbers, measurements collected
- population: complete set of people/objects having some
common characteristic
- parameter: value summarizing characteristic of
population; are constants; use Greek letters to represent
- sample: subset of population, share same characteristics
- statistic: value summarizing characteristic of a sample; are
variable; use Roman letters to represent
- simple random sample: subset of population selected so that
each population member has = & independent chance of
being chosen
- random assignment: assign subjects to treatments in = &
independent manner to avoid bias
- confounding: where DV is affected by variable related to IV
so can't assume that IV causes DV effects
Group 1: lecture 3x/week, taught by Dr. Smith
vs
Group 2: lecture 2x/week + lab 1x/week, taught by Dr. Jones
Results: group #2 performs better on final exam
Conclude: lecture + lab > lecture alone
WRONG!!!! Confounded by different teachers as well as
format differences
CHAPTERS 1 & 2 – HOMEWORK PART 1
NOT IN BOOK
1) Indicate whether each is a statistic, data, or inference.
a. Sample of 250 workers earn an average of $13,887
b. Based on sample of 500 workers in Tucson it is
believed that average income of all workers is
$21,564
c. A series of pitches goes 98, 93 and 100 mph
d. Ann’s tuition was $13,788 and Bud’s was $14,986
e. Based on a survey it is believed that 33,566,876
people watched last year’s Super Bowl
2) Indicate whether each is a variable or constant
a. Number days in July
b. Number shares traded on NYSE on different days
c. Age freshman entering college
d. Time to complete an assignment
e. Age someone eligible to vote in national election
f. Scores on a 100 point quiz
g. Amount money spent on textbooks by students
3) What is the difference between sample and population?
a. Can a population have only 20 subjects?
4) A researcher studies risk taking behavior by taking a
random sample of male undergrads at a large
university. She gives them a standardized test to assess
this behavior. Based on the study she would like to make
inferences about other male undergrads at the
university.
a. Are the students
i. Population, statistic, parameter or sample
b. Is measured risk behavior
i. Statistic, variable, parameter or sample
c. The individual scores obtained are
i. Data, sample, statistics or population
d. The average score of the sample is a
i. Parameter, statistic, variable, or data
e. When we generalize from the sample to make
inferences about
i. Parameter, variable, data or population
f. The average for all undergrads would be a
i. Parameter, variable, data or population
5) Classify each as manipulated, subject or not a variable
a. Amount drug used in a study
b. Value pi
c. Number days in a week
d. Diagnostic categories patients in a study
e. Gender
f. Amount reinforcement
g. Method instruction
h. Hours food deprivation
i. Scores on a test
j. Mood of subjects
NOT IN BOOK ANSWERS
1. a. statistic
b. inference
c. data
d. data
e. inference
2. a. constant
b. variable
c. variable
d. variable
e. constant
f. variable
g. variable
3. all vs subset & yes
4. a. sample
b. variable
c. data
d. statistic
e. populations
f. parameter
5. a. manipulated
b. not variable
c. not variable
d. subject variable
e. subject variable
f. manipulated
g. manipulated
h. manipulated
i. subject variable
j. subject variable
- Fred wants to find out what types of pets college students have.
- Alice wants to find out if birth order has any effect on GPA.
- Mike wants to look at temperature effects on ice cream
consumption.
- Sally wants to see how fast rats run through a maze as a function
of reward type at the end.
- Rick wants to examine how many kids people have today
compared to 50 years ago.
- Mary wants to examine how tall people are compared to 50 years
ago.
Basic Concepts
- X or Y: symbol for a variable
- Xi or Yi: represents individual observation
- N or n: # data points in a set, number
- Σ (sigma): indicates summation
EXAMPLES (X = group 1 kids, Y = group 2 kids)
X1 = 4  X2 = 6  X3 = 1  X4 = 5  X5 = 2  X6 = 3
Y1 = 3  Y2 = 4  Y3 = 6  Y4 = 1

a) Σ(i=3 to 6) Xi = 1 + 5 + 2 + 3 = 11
(i = 3 is where you start; 6 is where you stop)

b) Σ(i=1 to 3) Yi = 3 + 4 + 6 = 13

* c) Σ(i=4 to 6) Xi² = 5² + 2² + 3² = 25 + 4 + 9 = 38

* d) (Σ(i=4 to 6) Xi)² = (5 + 2 + 3)² = 10² = 100

NOT THE SAME!!!! (c sums the squares; d squares the sum)

e) Σ(i=2 to N) Xi = 6 + 1 + 5 + 2 + 3 = 17
N = go to the end; use all #s from start point
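The c) vs d) distinction is the one most often missed; a quick check in Python (the list and variable names are mine, not from the notes):

```python
X = [4, 6, 1, 5, 2, 3]  # X1..X6 from the example above

# a) sum of X3 through X6 (Python slices are 0-based): 1 + 5 + 2 + 3
a = sum(X[2:6])

# c) sum of the squares of X4..X6: 5**2 + 2**2 + 3**2
c = sum(x ** 2 for x in X[3:6])

# d) square of the sum of X4..X6: (5 + 2 + 3)**2 -- NOT the same as c)
d = sum(X[3:6]) ** 2

print(a, c, d)  # 11 38 100
```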
types of measurement scales (like inches vs cm)
a) nominal: qualitative (name); mutually exclusive without
logical order (cat, dog, fish)
b) ordinal: mutually exclusive with logical rank ordering
(<,>) (1st grade, 2nd grade; captain, major, colonel)
c) interval: quantitative with = units of measurement and
arbitrary (imaginary) zero point (thermometer, calendar); =
intervals between objects represent = differences (differences
are meaningful – diff between 10 & 20 deg same as 80 & 90)
d) ratio: quantitative with = units of measurement and
absolute (real) zero point (height, weight, length) (ratios are
meaningful)
some more terms
- reliability: degree to which repeated measurements in same
conditions give same results
- measurement error: uncontrolled recording error
- validity: degree to which a test/measure actually measures
the thing of interest
- discontinuous (discrete) variables: only whole #s allowed
e.g., # kids
- continuous variables: any values allowed
a) true limits: #s that limit where true value lies
+ / - ½ the unit of measurement
- to get unit of measurement
1) no decimals: # by which set increases
e.g., 3,4,5,6 => unit = 1
1/2 = 0.5 (limit value)
3 + 0.5 = 3.5 (upper limit) 3 - 0.5 = 2.5 (lower limit)
5,10,15,20 => unit = 5 5/2 = 2.5 (limit value)
10 + 2.5 = 12.5 (upper limit) 10 - 2.5 = 7.5 (lower limit)
2) decimals: a) anything to left = 0
b) last # on right = 1; all others = 0
e.g., 13.63 => 0.01 (unit of measurement)
0.01 / 2 = 0.005 (limit values)
13.63 + 0.005 = 13.635 (upper limit)
13.63 - 0.005 = 13.625 (lower limit)
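The true-limit rule can be wrapped in a small helper; a sketch in Python (the function name is mine):

```python
def true_limits(score, unit):
    """True limits = score +/- half the unit of measurement."""
    half = unit / 2
    return score - half, score + half

# the set 3, 4, 5, 6 increases by 1, so unit = 1
print(true_limits(3, 1))  # (2.5, 3.5)

# 13.63 is measured to the nearest 0.01, so limits ~13.625 and ~13.635
print(true_limits(13.63, 0.01))
```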
some basic descriptive statistics
1) frequency: count
class = 20
13 women; 7 men
2) ratio: 13:7 women to men; DO NOT REDUCE
(e.g., a 20:5 ratio is not reduced to 4:1)
3) proportion: fraction 13/20 = 0.65 women
DO OUT THE DIVISION
4) percentage: proportion x 100 7/20 x 100 = 35% men
CHAPTERS 1 & 2 – HOMEWORK PART 2
NOT IN BOOK
1) What scale are these based on?
a. Your height
b. Your weight
c. Your occupation
d. How one course compares to another (better, worse)
2) Are these variables continuous or discrete?
a. Distance traveled
b. Time to complete a task
c. Votes cast for 3 candidates
d. Number of votes cast
3) Find true limits for
a. 5
b. 5.0
c. 5.00
d. 0.1
e. -10
f. 0.8
4) For the following data:

Area             Male   Female
Business Admin    400      300
Education          50      150
Humanities        150      200
Science           250      300
Social Science    200      200
a) Of the total # students what % is female?
b)For only the males, what % is found in each area?
c) Of those in business what % is female?
d)What % is male in science?
IN BOOK CHAPTER 2
2.7, 2.8, 2.9, 2.15 a-c, 2.16 a-b, 2.17 a-b, 2.18 a-c, 2.19 a-e
NOT IN BOOK ANSWERS
1. a) ratio b) ratio c) nominal d) ordinal
2. a) continuous b) continuous c) discrete d) discrete
3. a) 5      1/2 = 0.5       4.5 - 5.5
   b) 5.0    0.1/2 = 0.05    4.95 - 5.05
   c) 5.00   0.01/2 = 0.005  4.995 - 5.005
   d) 0.1    0.1/2 = 0.05    0.05 - 0.15
   e) (-10)  1/2 = 0.5       (-10.5) - (-9.5)
   f) 0.8    0.1/2 = 0.05    0.75 - 0.85
4.
       BA    E    H    S   SS
men   400   50  150  250  200
women 300  150  200  300  200

a) 1150/(1150 + 1050) x 100 = 52.27%
b) BA: 400/1050 x 100 = 38.10%
   E: 50/1050 x 100 = 4.76%
   H: 150/1050 x 100 = 14.29%
   S: 250/1050 x 100 = 23.81%
   SS: 200/1050 x 100 = 19.05%
c) 300/700 x 100 = 42.86%
d) 250/550 x 100 = 45.45%
IN BOOK ANSWERS
2.7) gender of person present, gender of subject
2.8) amount of food eaten
2.9) Amount of food eaten depends on their gender as
well as gender of someone else present while eating
2.15)
a) 2.03, 1.05, 1.86
b) 14.82
c) Σ(i=1 to 10) Xi
2.16)
a) 1.73, 1.56
b) 14.63
2.17)
a) 219.63 & 23.22
b) 14.82/10 = 1.48
2.18)
a) 214.04 & 22.45
b) [22.45 - [214.04/10]]/(10 - 1) = 0.12
c) 0.35
2.19)
a) 2.85, 1.06, 4.12, 1.75, 1.00, 1.15, 2.36, 3.22, 2.54, 2.70
b) 22.75
c) 14.82 x 14.63 = 216.82
d) yes & yes
e) [22.75 - [216.82/10]]/9 = 0.12
- I have 23,184 data points from my experiment - what do I
do with all that information?
- How do I present that information to someone else?
- Mitch got a 43 on the quiz – how did he do compared to
everyone else?
- Ann was told she scored at the 75th percentile on the GRE
exam – what does that mean?
[Raw data listing (an example of unorganized data): ~280 records, each with a score (e.g., 1325.000), a classification (TP, FP, or FN), and a case label (one through six).]
Chapter 3 - Frequency Distributions & Percentiles
- exploratory data analysis: ways to arrange & display #s to
quickly organize & summarize data
- grouping data
1) frequency distribution: high - low

pet type   frequency   proportion      %
dog            20      0.43 (20/46)   43.00 (0.43 x 100)
cat            15      0.33           33.00
turtle         11      0.24           24.00
total          46      1.00          100.00
2) grouping in classes
a) aim for 12 - 15 groups
b) mutually exclusive
c) same width
d) don't omit intervals
e) make widths convenient
width = (hi - lo + 1) / # groups = i
example:

84  96  99 100 100 111 116
85  97 100 100 104 111 117
87  97 100 101 104 111 118
80  97 100 102 105 111 124
81  97 100 103 104 111 124
88  98 101 102 106 111 125
89  98 101 100 105 111 125
90  98 101 101 104 111 126
92  98 101 102 105 112 127
92  99 102 100 105 112 129
93  99 102 100 110 113 134
95  99 103 100 110 113
96  99 103 100 111 114
96  99 100 100 111 115
i = (134 - 80 + 1)/15 = 3.67 ~ 4
START AT BOTTOM WITH LOW #
midpoint = (hi true + lo true) / 2

Interval     True Limits        f   Midpoint
132 - 135    131.50 - 135.50    1    133.50
128 - 131    127.50 - 131.50    1    129.50
124 - 127    123.50 - 127.50    6    125.50
120 - 123    119.50 - 123.50    0    121.50
116 - 119    115.50 - 119.50    3    117.50
112 - 115    111.50 - 115.50    6    113.50
108 - 111    107.50 - 111.50   12    109.50
104 - 107    103.50 - 107.50    9    105.50
100 - 103     99.50 - 103.50   28    101.50
96 - 99       95.50 - 99.50    17     97.50
92 - 95       91.50 - 95.50     4     93.50
88 - 91       87.50 - 91.50     3     89.50
84 - 87       83.50 - 87.50     3     85.50
80 - 83       79.50 - 83.50     2     81.50
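The interval-building steps (compute i, start at the bottom with the low score, work up in equal widths) can be sketched in Python; `class_intervals` is my name, not the notes':

```python
def class_intervals(lo, hi, width):
    """Build grouped-class intervals from the bottom up, starting at the low score."""
    intervals = []
    start = lo
    while start <= hi:
        intervals.append((start, start + width - 1))
        start += width
    return intervals

i = round((134 - 80 + 1) / 15)      # 3.67 ~ 4, as in the example
groups = class_intervals(80, 134, i)
print(groups[0], groups[-1])  # (80, 83) (132, 135)
```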
- cumulative data

class grades    f   cum f   cum prop   cum %
91 - 100        6    32      1.00      100.00
81 - 90         4    26      0.8125     81.25
71 - 80         9    22      0.6875     68.75
61 - 70        11    13      0.4062     40.62
51 - 60         2     2      0.0625      6.25
               32
Percentiles & Percentile Ranks
- score alone means nothing, must compare to standard or
base score; can do with percentiles
- percentiles: #s that divide distribution into 100 = parts
- percentile rank: # that represents the % of cases in a
comparison group that achieved scores < the one cited
e.g., PR of 95 on SAT means 95% of those taking SAT at the
same time did worse than you & 5% did better
some symbols
cumfll = cum freq at lower true limit of X
X = score
Xll = score at lower true limit of X
i = width
fi = # cases in X's group
N = total # scores
1) Getting PR from score (X)

PR = [cumfll + ((X - Xll)/i)(fi)] / N x 100
Class (X)   limits           f   cum f   cum %
93 - 95     92.50 - 95.50    4    25     100.00
90 - 92     89.50 - 92.50    3    21      84.00
87 - 89     86.50 - 89.50    2    18      72.00
84 - 86     83.50 - 86.50    7    16      64.00
81 - 83     80.50 - 83.50    6     9      36.00
78 - 80     77.50 - 80.50    3     3      12.00
What is PR of 88?
X = 88   Xll = 86.5   cumfll = 16   fi = 2   i = 3   N = 25
PR = [16 + ((88 - 86.50) / 3)(2)] / 25 x 100
PR = 68
NB: PR goes from 0 - 100
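The same arithmetic as a small Python function (the names are mine), using the worked example's values:

```python
def percentile_rank(x, x_ll, cumf_ll, f_i, width, n):
    """PR = [cumf_ll + ((X - Xll) / i) * fi] / N * 100"""
    return (cumf_ll + ((x - x_ll) / width) * f_i) / n * 100

# X = 88: Xll = 86.5, cumfll = 16, fi = 2, i = 3, N = 25
pr = percentile_rank(88, 86.5, 16, 2, 3, 25)
print(round(pr, 2))  # 68.0
```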
2) Getting score (X) from PR
cumf = (PR x N)/100
X = Xll + [ i (cumf - cumfll) / fi ]
Class (X)   limits           f   cum f   cum %
93 - 95     92.50 - 95.50    4    25     100.00
90 - 92     89.50 - 92.50    3    21      84.00
87 - 89     86.50 - 89.50    2    18      72.00
84 - 86     83.50 - 86.50    7    16      64.00
81 - 83     80.50 - 83.50    6     9      36.00
78 - 80     77.50 - 80.50    3     3      12.00
What is score for PR of 75?
cumf = (75 x 25) / 100 = 18.75
Xll = 89.5   i = 3   cumf = 18.75   cumfll = 18   fi = 3
X = 89.5 + [ 3 (18.75 - 18) / 3 ] = 90.25
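And the reverse direction as a helper (again, the function name is mine):

```python
def score_from_pr(pr, n, x_ll, cumf_ll, f_i, width):
    """cumf = PR * N / 100, then X = Xll + i * (cumf - cumf_ll) / fi"""
    cumf = pr * n / 100
    return x_ll + width * (cumf - cumf_ll) / f_i

# PR = 75: cumf = 18.75 falls in the 90-92 class (Xll = 89.5, cumfll = 18, fi = 3)
score = score_from_pr(75, 25, 89.5, 18, 3, 3)
print(score)  # 90.25
```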
CHAPTER 3 HOMEWORK PART 1
NOT IN BOOK
1) Given the following set of data:

67 45 45 35 25 56 37 28 59 45
63 45 34 37 36 17 42 75 61 41
64 46 34 61 26 26 32 32 40 38
57 47 15 24  5  5 29 31 41 14
56 37 23 14 44 14 90 52 43 57
55 23 43 43 13 23 44 49 49 25
53 34 16 37 33 45 46 65 38 20
53 44 44 27 33 59 45 54 31 15
54 27 36 36 17 19 66 15 19 16
a. What is the class width if you want 18 groups?
b. Construct a frequency distribution
c. What is PR if X = 36?
d. What is X if PR = 98?
NOT IN BOOK ANSWERS
1 a) (90 - 5 + 1) / 18 = 4.7 ~ 5
b)
group     true limits      mdpt    f   cumf   cum%
90 - 94   89.50 - 94.50     92     1    90   100.00
85 - 89   84.50 - 89.50     87     0    89    98.89
80 - 84   79.50 - 84.50     82     0    89    98.89
75 - 79   74.50 - 79.50     77     1    89    98.89
70 - 74   69.50 - 74.50     72     0    88    97.78
65 - 69   64.50 - 69.50     67     3    88    97.78
60 - 64   59.50 - 64.50     62     4    85    94.44
55 - 59   54.50 - 59.50     57     7    81    90.00
50 - 54   49.50 - 54.50     52     5    74    82.22
45 - 49   44.50 - 49.50     47    11    69    76.67
40 - 44   39.50 - 44.50     42    11    58    64.44
35 - 39   34.50 - 39.50     37    10    47    52.22
30 - 34   29.50 - 34.50     32     9    37    41.11
25 - 29   24.50 - 29.50     27     8    28    31.11
20 - 24   19.50 - 24.50     22     5    20    22.22
15 - 19   14.50 - 19.50     17     9    15    16.67
10 - 14    9.50 - 14.50     12     4     6     6.67
5 - 9      4.50 - 9.50       7     2     2     2.22
c) what is PR if X = 36?
PR = [37 + ((36 - 34.50) / 5) (10)] / 90 x 100 = 44.44
d) what is X if PR = 98?
cumf = (98 x 90) / 100 = 88.20
X = 74.50 + [ 5 (88.2 - 88) / 1 ] = 75.50
- What types of graphs are used most often in psychology?
- Are there rules for which one to use?
- Are there rules about how to make them?
- Does the shape of the graph mean anything useful?
Chapter 3 - Graphing
- visual methods to display data
a) figure: pictorial; photo, drawing
b) table: organized numerical info
c) graph: pictorial; axes, #s etc.
- basics of graphing
a) X-axis (abscissa): horizontal; IV
b) Y-axis (ordinate): vertical; DV
c) always label axes – note the units
d) Y starts at 0; continuous, no breaks
X can change start; break; can be discrete
e) Y about 0.75 length of X
1) Bar Graph: nominal, sometimes ordinal
a) bar = category
b) height = frequency
c) bars DO NOT touch
d) if ordinal must preserve order
e) can be vertical or horizontal
[Bar graph: frequency (0 - 20) by TYPE OF PET, separate bars for Women and Men]

Pet    w    m
Dog    20   10
Cat    15   15
Fish    8    5
Bird    5   14
2) Histogram: interval, ratio data, sometimes ordinal
a) same rules as bar only bars DO touch
b) usually for discrete data
[Histogram: Frequency (0 - 25) by Grade]

Grade   Freq
F        2
D        4
C       20
B       15
A       10
3) Frequency or Line graph: interval, ratio, sometimes
ordinal
a) usually for continuous data
Weight   freq
56        2
57        2
58        4
59        6
60        5

[Line graph: Frequency (0 - 7) by Weight (56 - 60)]
4) cumulative frequency: can be bar, histogram or line, but
uses cumulative freq, proportion or %
a) the line graph version is typically s-shaped or ogive
b) always increases
e.g., 12 people on a drug to cure disease X. Left = #
cured each time period. Right = cum % cured over time.
[Left graph: # cured (0 - 3.5) vs. months on drug (1, 3, 6, 9, 12); right graph: cum % cured (0 - 80) vs. months on drug]
Forms of Frequency Curves
1) Normal (bell-shaped) curve: symmetric
a) mesokurtic: ideal
(middle)
b) leptokurtic: peaked
(leaping)
c) platykurtic: flat
(prairie)
2) skew: not symmetric
a) positive skew: fewer scores at high end;
shifted to left
b) negative skew: fewer scores at low end;
shifted to right
CHAPTER 3 HOMEWORK PART 2
IN THE BOOK
3.1 – plot the data provided assuming scores could have
decimals (even though not shown); also plot the top row as
“passage” and bottom as “non-passage” groups, where the x-axis is called subject and there are 14 subjects in each group
3.22 – plot the data provided using total households only
3.23 – plot the data using total # births only
NOT IN THE BOOK
1) Draw a graph showing
a. positive skew
b. Negative skew
c. Normal distribution
d. Platykurtic distribution
e. Leptokurtic distribution
IN THE BOOK ANSWERS
3.1
[Line graph: score (0 - 60) vs. subject (1 - 13), one line for the passage group and one for the non-passage group]
3.20
[Bar graph: # (0 - 12000) by year (1982, 1991, 2005) for white, black, na, hispanic, asian, foreign]
3.22
[Graph: # (0 - 100000) by Year (1960 - 1990)]
NOT IN THE BOOK ANSWERS
a) - e) [sketches of positive skew, negative skew, normal, platykurtic, and leptokurtic distributions]
z = (X - X̄) / s = (X - μ) / σ
SIR = (Q3 - Q1) / 2
X̄ = ΣX / n
s3 = [3(X̄ - median)] / s
Range = hi - lo
X̄w = ΣfX̄ / ntot
s4 = 3 + [ (Q3 - Q1) / 2(P90 - P10)]
md = Xll + i [ ((N/2) - cumfll) / fi]
s² = Σ(X - X̄)² / n = SS/n
SS = ΣX² - (ΣX)²/n
s = √s²
- Sid wants to know what is the average age of people in the
mall before the stores open?
- Dr. Smith has 4 classes each with a different number of
pupils. He has the average grade on the last quiz for each of
the 4 classes but wants to know the overall average.
- If we include all the billionaires in the calculation of the
average US income will it be inflated because of the few very
high values? Is there a better measure than the mean?
Chapter 4 - Central Tendency
A) Arithmetic Mean (average): X̄ = ΣX/n
4 + 2 + 6 + 4 + 5 = 21    21/5 = 4.20 = X̄
1) from ungrouped frequency distribution:
X    f    fX
10   4    40
9    2    18
8    6    48
7    2    14
6    5    30
5    1     5
    20   155

X̄ = ΣfX/n
X̄ = 155/20 = 7.75
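The fX shortcut can be checked in Python (variable names are mine):

```python
scores = [10, 9, 8, 7, 6, 5]   # X column
freqs  = [4, 2, 6, 2, 5, 1]    # f column

n = sum(freqs)                                        # 20 scores in all
mean = sum(f * x for x, f in zip(scores, freqs)) / n  # sum(fX) / n
print(n, mean)  # 20 7.75
```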
2) Weighted Mean: mean of a group of means
e.g., 4 classes with mean exam scores of 75, 78, 72, 80. What
is the overall or grand mean?
a) if each class has same # of people:
(75 + 78 + 72 + 80)/4 = 76.25
b) if each class has different # people must account for it
class X̄   f     fX̄
75        30    2250
78        40    3120
72        25    1800
80        50    4000
         145   11170

X̄w = ΣfX̄/Ntot
X̄w = 11170/145 = 77.03
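The weighted mean as a quick Python sketch (names are mine):

```python
class_means = [75, 78, 72, 80]
class_sizes = [30, 40, 25, 50]

# weight each class mean by its n, then divide by the total n
grand_mean = (sum(m * f for m, f in zip(class_means, class_sizes))
              / sum(class_sizes))
print(round(grand_mean, 2))  # 77.03
```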
B) Median: midpoint of a distribution of scores so ½ fall
above & ½ fall below = 50th percentile
1) for continuous scores
md = Xll + i [ ((N/2) - cumfll) / fi]
true limits      f    cumf
68.50 - 71.50   13    101
65.50 - 68.50   15     88
62.50 - 65.50   20     73
59.50 - 62.50   28     53
56.50 - 59.50   19     25
53.50 - 56.50    6      6

1) to find the box (the interval containing the median): N/2
101/2 = 50.50
find 50.5 in cumf column
md = 59.50 + 3 [((101/2) - 25) / 28 ] = 62.23
Good for skewed, truncated & open-ended distributions
- truncated: use only part of the distribution
- open-ended: top or bottom category has only 1 limit
e.g., 68.50 + for top category
< 53.50 for bottom category
2) median for arrays of scores
a) if N is odd => put in ascending order, find middle #
56, 6, 13, 31, 28 => 6, 13, 28, 31, 56 => md = 28
b) if N is even => ascending order, take X of 2 middle #s
6, 13, 28, 31, 56, 72 => (28 + 31) / 2 = 29.50
c) N is even but middle 2 #s are the same => use formula
1, 2, 4, 6, 6, 6, 7, 121

x     f    cumf
121   1     8
7     1     7
6     3     6
4     1     3
2     1     2
1     1     1

8/2 = 4 => box (the 6 group)
md = 5.5 + 1 [ ((8/2) - 3) / 3] = 5.83
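The grouped-median formula as a helper (name mine), applied to the tied-middle example:

```python
def grouped_median(x_ll, width, n, cumf_ll, f_i):
    """md = Xll + i * ((N/2 - cumf_ll) / fi)"""
    return x_ll + width * ((n / 2 - cumf_ll) / f_i)

# 1, 2, 4, 6, 6, 6, 7, 121: the 6-group has true limits 5.5-6.5,
# cumf below it is 3, and it holds fi = 3 scores
print(round(grouped_median(5.5, 1, 8, 3, 3), 2))  # 5.83
```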
C) Mode: most common score; crude measure
1) 1, 3, 4, 6, 7, 7, 7, 9, 9    mode = 7
   2, 2, 4, 9, 9                mode = 2, 9
2)
class          f
68.5 - 71.5   10
65.5 - 68.5   15
62.5 - 65.5    9
59.5 - 62.5   10

1) find highest f value
2) report midpoint as mode
mode = (68.5 + 65.5) / 2 = 67
- Which to use?
1) mode: quick & easy but crude; not unique - can have 2+
2) median: skewed, truncated, open-ended
3) mean: most common, normal distributions
some properties of the mean
a) summed deviations = 0
Σ(X - X̄) = 0

X    X - X̄
4    4 - 5.5 = -1.5
3    3 - 5.5 = -2.5
9    9 - 5.5 = 3.5
6    6 - 5.5 = 0.5
     sum = 0
b) sensitive to extreme values (skew)
2, 3, 5, 7, 8    X̄ = 5    md = 5
2, 3, 5, 7, 33   X̄ = 10   md = 5
c) can't use with open-ended distribution
Mean, Median & Skew relationship
a) mean > median => positive skew
b) mean < median => negative skew
c) mean = median => no skew
CHAPTER 4 HOMEWORK
IN THE BOOK
4.1
NOT IN THE BOOK
1) Find mean, median & mode for
a. 10, 8, 6, 0, 8, 3, 2, 5, 8, 0
b. 119, 5, 4, 4, 4, 3, 1, 0
2) Find the weighted mean for:

Person   X̄ items sold   # days
Amy        1.75            4
Bob        2.0             5
Carrie     2.4             5
Diana      2.5             4
Elyssa     2.0             3
Fred       1.67            3
CHAPTER 4 – HOMEWORK ANSWERS
IN THE BOOK
4.1) mean: 1193/17 = 70.18
Median:
55, 56, 56, 59, 66, 66, 71, 71, 72, 72, 72, 72, 73, 73, 75, 91, 93
md = 72 (the middle, 9th of 17 scores)
Mode: 72
NOT IN THE BOOK
1a) 0, 0, 2, 3, 5, 6, 8, 8, 8, 10
X̄ = 50/10 = 5; mode = 8; md = (5 + 6) / 2 = 5.50
b) 119, 5, 4, 4, 4, 3, 1, 0    X̄ = 140/8 = 17.50; mode = 4

X     f    cumf
119   1     8
5     1     7
4     3     6
3     1     3
1     1     2
0     1     1

8/2 = 4
md = 3.5 + 1 [ ((8/2) - 3) /3] = 3.83
2)
X̄      f    fX̄
1.75    4     7
2.0     5    10
2.4     5    12
2.5     4    10
2.0     3     6
1.67    3     5.01
       24    50.01

X̄w = 50.01/24 = 2.08
- Al calculated the average height of people in a random sample to
figure out how high he should make the pull-down security bars on
a new roller coaster. He says the average height is 5’10” but his boss
says not everyone is 5’10”. He wants to know about what height to
expect – what is the dispersion or spread of heights?
- Betty graphs data she collected on frequency of failing grades for
grammar school students as a function of tv shows watched and
finds a very peaked graph shifted to the left. She knows it’s
leptokurtic and skewed but can she attach values to say how
leptokurtic and how skewed?
Chapter 5 – Dispersion/Variability
- dispersion: spread or variability of scores around central
tendency measure
1) range: hi score - lo score    e.g., 11, 17, 9, 3, 20, 36: 36 - 3 = 33
2) semi-interquartile range (SIR) or Q2: use with median;
median + SIR cuts off middle 50% of scores
SIR = Q2 = (Q3 - Q1) / 2
PR    X
90    80
75    70
50    40
35    30
25    10
10     5

Q3 = score at 75th PR = 70
Q1 = score at 25th PR = 10
SIR = Q2 = (70 - 10) / 2 = 30
3) variance or mean square (s2 or 2) & standard deviation
or root mean square (s or )
a) use with mean
b) can use to compare distributions
c) quite precise
d) used in statistical tests later on
e) large values = high error, low precision
small values = low error, high precision
1) Mean Deviation Method: long, but shows how scores vary
from the mean
s² = Σ(X - X̄)² / n = SS/n     s = √s²

X     X - X̄     (X - X̄)²
65    -14.375    206.64
90     10.625    112.89
84      4.625     21.39
76     -3.375     11.39
81      1.625      2.64
98     18.625    346.89
82      2.625      6.89
59    -20.375    415.14
            0    1123.87 = SS

n = 8    X̄ = 79.375
s² = 1123.87/8 = 140.48
s = √140.48 = 11.85
2) Raw Score Method: easier; less intuitive about mean
SS = ΣX² - (ΣX)²/n     s² = SS/n     s = √s²

X      X²
65     4225
90     8100
84     7056
76     5776
81     6561
98     9604
82     6724
59     3481
635   51527

SS = 51527 - (635)²/8 = 1123.875
s² = 1123.875/8 = 140.48
s = √140.48 = 11.85
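Both methods give the same SS; a check in Python (variable names are mine):

```python
data = [65, 90, 84, 76, 81, 98, 82, 59]
n = len(data)
mean = sum(data) / n                                   # 79.375

# mean deviation method: SS = sum of squared deviations from the mean
ss_dev = sum((x - mean) ** 2 for x in data)

# raw score method: SS = sum(X^2) - (sum X)^2 / n
ss_raw = sum(x * x for x in data) - sum(data) ** 2 / n

variance = ss_raw / n
sd = variance ** 0.5
print(round(ss_dev, 3), round(variance, 2), round(sd, 2))  # 1123.875 140.48 11.85
```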
- homogeneous sample: data values similar => low s2 & s
- heterogeneous sample: data values dissimilar => high s2 & s
- Pearson's Coefficient of Skew: + or - and how much
s3 = [3(X̄ - median)] / s
e.g., X̄ = 20   s = 5   md = 24
s3 = [ 3(20 - 24)] / 5 = -2.40
Generally ±0.5 is ~ symmetrical/normal
- Kurtosis: peaked or flat
s4 = 3 + [ (Q3 - Q1) / 2(P90 - P10)]

X     PR
100   90
90    75
70    60
40    50
20    25
5     10

P90 = score at 90th PR
P10 = score at 10th PR
s4 = 3 + [ (90 - 20) / 2(100 - 5)] = 3.37

3 = mesokurtic
< 3 = platykurtic
> 3 = leptokurtic
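Both indices as small Python helpers (the function names are mine), reproducing the two worked examples:

```python
def pearson_skew(mean, median, s):
    """s3 = 3 * (mean - median) / s; ~0 is symmetric, the sign gives direction."""
    return 3 * (mean - median) / s

def kurtosis_index(q3, q1, p90, p10):
    """The notes' percentile-based index: 3 = mesokurtic."""
    return 3 + (q3 - q1) / (2 * (p90 - p10))

print(pearson_skew(20, 24, 5))                   # -2.4 (negative skew)
print(round(kurtosis_index(90, 20, 100, 5), 2))  # 3.37 (slightly leptokurtic)
```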
CHAPTER 5 HOMEWORK
IN THE BOOK
5.1
a) use top row of numbers only & the mean deviation method
b) use middle row of numbers & the raw score method
NOT IN BOOK
PR     X
100    90
90     85
75     70
60     50
50     40
35     20
25     10
10      5
5       2

X̄ = 30   s = 5   md = 25
1) Find SIR
2) Find SKEW
3) Find KURTOSIS
CHAPTER 5 HOMEWORK ANSWERS
IN BOOK
1a)
X     X - X̄    (X - X̄)²
54     5.33     28.41
52     3.33     11.09
51     2.33      5.43
50     1.33      1.77
36   -12.67    160.53
55     6.33     40.07
44    -4.67     21.81
46    -2.67      7.13
57     8.33     69.39
44    -4.67     21.81
43    -5.67     32.15
52     3.33     11.09
      -0.04*   410.68

* ~ 0 - not exact because of rounding

X̄ = 48.67   n = 12
s² = 410.68/12 = 34.22
s = √34.22 = 5.85
range = 57 - 36 = 21
b)
X      X²
38     1444
46     2116
55     3025
34     1156
44     1936
39     1521
43     1849
36     1296
55     3025
57     3249
36     1296
46     2116
529   24029

SS = 24029 - (529)²/12 = 708.92
s² = 708.92/12 = 59.08
s = √59.08 = 7.69
range = 57 - 34 = 23
NOT IN BOOK
PR     X
100    90
90     85
75     70
60     50
50     40
35     20
25     10
10      5
5       2

X̄ = 30   s = 5   md = 25
SIR = (70 - 10)/2 = 30
s3 = [3(30 - 25)]/5 = 3
s4 = 3 + [(70 - 10)/2(85 - 5)] = 3.38
- Is there a simpler method to examine percentile ranks and
compare values other than the PR formula?
- Mitch has the mean and standard deviation values for a
quiz that a class just took. He also has his grade on the quiz.
How can he determine how many people did worse than him
and how many did better?
- If you know a country club takes people whose income is in
the top 5% of the city and you know the average income of
the city and standard deviation, can you use your income to
figure out if you can get in the club?
Chapter 6 - z-scores or standard scores
- z-score: represents distance between score & mean relative
to s
1) can use to compare 2 different variables because z-scores
are abstract #s without units
2) if scores are normally distributed can relate directly to PR
via the "Standard Normal Distribution" = a theoretically
ideal normal distribution where:
 = 0  = 1 total area under curve = 1.0 or 100%
50% =>

<= 50%
-
+
below the mean
above the mean
68.26%
95.44%
99.74%
-4
-3
-2
-1
0
1
2
3
4

3) when you transform data to z-scores
a) mean = 0
b) sum of squared z-scores = n
c) s = 1
sample:      z = (X - X̄)/s
population:  z = (X - μ)/σ
e.g., for IQ μ = 100, σ = 15; someone got an IQ of 130
z = (130 - 100)/15 = +2.00
so they are 2 standard deviations above the mean
e.g., when 2 scores come from different distributions they are hard
to compare; z-scores let you do it
psych  μ = 50  σ = 10
bio    μ = 48  σ = 4
Bob got a 60 on psych & 56 on bio; for which course should
he expect a better grade?
Psych z = (60 - 50)/10 = +1.00
Bio z = (56 - 48)/4 = +2.00 would expect better grade bio!!!
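The comparison above is just two z-score computations; a quick sketch in Python:

```python
def z_score(x, mean, sd):
    """z = (score - mean) / standard deviation."""
    return (x - mean) / sd

# Bob's two scores, from the notes
psych_z = z_score(60, 50, 10)   # +1.00
bio_z = z_score(56, 48, 4)      # +2.00
# The higher z wins: expect the better grade in bio
print(psych_z, bio_z)
```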
e.g., of properties

ht    ht z-score   ht z²      wt       wt z-score   wt z²
6'    0.27         0.0729     200 lb   0.31         0.0961
5'    -1.1         1.21       150 lb   -0.78        0.6084
5'    -1.1         1.21       120 lb   -1.44        2.0736
6'    0.27         0.0729     210 lb   0.52         0.2704
7'    1.6          2.56       250 lb   1.39         1.9321

ht: X̄ = 5.80, s = 0.75, N = 5      wt: X̄ = 186.00, s = 45.87, N = 5
for both sets of z-scores: mean = 0, s = 1, Σz² ≈ 5 = n
=======================================================
1) assume X = 650, μ = 600, σ = 100. What % did worse than X?
z = (650 - 600) / 100 = 0.50
Table A page 548 - 549
Column a = z-score
Column b = area between μ & z
Column c = area beyond z
Area between = 0.1915 so 0.1915 + 0.5 = 0.6915 = 69.15%
did worse or PR = 69.15
2) X = 400, μ = 600, σ = 100. What % did worse?
z = (400 - 600) / 100 = -2
Area beyond = 0.0228 = 2.28% did worse or PR = 2.28
3) What % of cases fall between X = 650 and X = 400 if
μ = 600, σ = 100?
z = (650 - 600) / 100 = 0.5
z = (400 - 600) / 100 = -2
0.1915 + 0.4772 = 0.6687 = 66.87%
4) What % fall between X = 700 and X = 800 if μ = 600,
σ = 100?
z = (700 - 600) / 100 = 1   z = (800 - 600) / 100 = 2
0.4772 - 0.3413 = 0.1359 = 13.59%
RULE: both z's + or both z's - => subtract the column b values
one + and one - => add the column b values
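Table A is just the standard normal CDF; you can reproduce its areas with `math.erf` (a sketch, not part of the course materials):

```python
import math

def phi(z):
    """Cumulative area below z under the standard normal curve."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Example 1: X = 650, mu = 600, sigma = 100 -> z = 0.5
z = (650 - 600) / 100
print(round(phi(z), 4))            # 0.6915 -> PR = 69.15

# Example 4: area between z = 1 and z = 2 (same sign: subtract)
print(round(phi(2) - phi(1), 4))   # 0.1359
```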
5) Suppose a golf club takes only the top 3% of the population in
income where μ = 500k, σ = 25k. You make 520k. Can you
get in?
column c gives the area beyond, so
find 0.03 in c & get the z that goes with it: z = 1.88
so.... 1.88 = (X - 500) / 25
(1.88)(25) = X - 500
(1.88)(25) + 500 = X
X = 547k
so you cannot get in!!!
6) Suppose μ = 600, σ = 100, what is the score at the 60th
percentile? (area above = 0.40)
Column c => 0.4013 => z = 0.25
So … 0.25 = (X – 600)/100
0.25 (100) = X – 600
0.25 (100) + 600 = X
X = 625
7) Suppose μ = 600, σ = 100, between what scores do the
middle 30% lie? (0.15 on each side of the mean)
Column b => 0.15 => z = ±0.39
0.39 = (X - 600)/100 => X = 639
-0.39 = (X – 600)/100 => X = 561
8) Suppose μ = 600, σ = 100, beyond what scores do the
most extreme 20% lie? (0.10 in each tail)
Column c => 0.10 => z = ±1.28
1.28 = (X – 600)/100 => X = 728
-1.28 = (X – 600)/100 => X = 472
CHAPTER 6 - HOMEWORK
NOT IN BOOK
1) You have a normal distribution based on 1000 scores
with a mean of 50 and sd of 10.
a. find proportion of area & # cases between the mean
and 60
b. find percent of area & # cases between the mean and
25
c. find proportion of area & # cases above 70
d. find percent of area & # cases above 45
e. find proportion of area & # cases between 60 and 70
f. find percent of area & # cases between 45 and 70
2) You have a normal distribution with a mean of 72 and
sd of 12.
a. What is the score at the 25th percentile?
b. Between what scores do the middle 50% of cases lie?
c. Beyond what scores do the most extreme 10% of
cases lie?
CHAPTER 6 – HOMEWORK ANSWERS
1. a) (60 - 50) / 10 = 1
proportion = 0.3413; 0.3413 x 1000 = 341.3 cases
b) (25 - 50) / 10 = -2.5
0.4938 x 100 = 49.38%
0.4938 x 1000 = 493.8 cases
c) (70 - 50) / 10 = 2
proportion = 0.0228; 0.0228 x 1000 = 22.8 cases
d) (45 - 50) / 10 = -0.5
0.6915 x 100 = 69.15%
0.6915 x 1000 = 691.5 cases
e) (60 - 50) / 10 = 1
(70 - 50) / 10 = 2
0.4772 - 0.3413 = 0.1359; 0.1359 x 1000 = 135.9 cases
f) (45 - 50) / 10 = -0.5
(70 - 50) / 10 = 2
0.4772 + 0.1915 = 0.6687 x 100 = 66.87%
0.6687 x 1000 = 668.7 cases
2. a) area below = 0.25 => z = -0.67
-0.67 = (X - 72) / 12
X = 63.96
b) 0.25 on each side of the mean => z = ±0.68
0.68 = (X - 72) / 12 => X = 80.16
-0.68 = (X - 72) / 12 => X = 63.84
c) 0.05 in each tail => z = ±1.64
1.64 = (X - 72) / 12 => X = 91.68
-1.64 = (X - 72) / 12 => X = 52.32
sesty = sy √[N(1 - r²) / (N - 2)]
by = (r)(sy/sx)
a = Ȳ - byX̄
r = Σ(zxzy) / N
Y = a + byX
rs = 1 - [(6ΣD²) / [N(N² - 1)]]
zy' = (r)(zx)
Y' = Ȳ + (zy')(sy)
Y' = Ȳ + [(r)(sy/sx)(X - X̄)]
r = [ΣXY - [(ΣX)(ΣY) / N]] / √[[ΣX² - [(ΣX)² / N]][ΣY² - [(ΣY)² / N]]]
1 = r² + k²
- Sue wants to know if there is a relationship between how
well students do on a quiz and how much test anxiety they
report prior to taking it.
- Bill has teachers rank their students by how popular they
think they are and then wants to know if there is a
relationship between the popularity ranks and the students’
GPA.
- Sandy wants to know if there is a relationship between
number of depressed people and SES.
Chapter 9 - Correlation
- correlation: relationship between 2 variables
- correlation coefficient: measure used to express extent or
strength of relationship
1) positive correlation: 0 < r < 1; score high on 1 variable &
score high on the other; score low on 1 variable score &
score low on the other; positive slope; 1.0 = perfect
correlation
2) negative correlation: -1 < r < 0; score high on 1 variable &
score low on the other; negative slope; -1.0 = perfect
correlation
positive
negative
3) 0 = no correlation, no linear relationship
4) looking for a linear relationship - others exist (e.g., u-shaped), but correlation only measures linear
5) correlation ≠ causation
6) |r| 0 - 0.29    small correlation, weak relationship
   |r| 0.3 - 0.49  medium correlation / relationship
   |r| 0.5 - 1.0   large correlation, strong relationship
- scatter diagram: graphic means to show data points &
correlation & (later) regression
- centroid: the (X̄, Ȳ) point

Ht: 2, 4, 5, 9    (mean = 5)
Wt: 3, 7, 10, 11  (mean = 7.75)

[Scatter diagram: wt (y-axis, 0 - 12) vs ht (x-axis, 0 - 10), with the
centroid at (5, 7.75)]
1) Pearson r: for interval & ratio data
a) z-score method
r = Σ(zxzy) / N
N = # pairs

X    zx     Y    zy     zxzy
1    -1.5   4    -1.5   2.25
3    -1.0   7    -1.0   1
5    -0.5   10   -0.5   0.25
7    0      13   0      0
9    0.5    16   0.5    0.25
11   1.0    19   1.0    1
13   1.5    22   1.5    2.25
                        Σ = 7

r = 7/7 = 1.00
Good if you already have z-scores, otherwise it's a pain!
If you already have the info: Σ(zxzy) = 4.90, N = 7, then 4.9/7 = 0.70
and it's easy.
2) Raw Score Method
r = [ΣXY - [(ΣX)(ΣY) / N]] / √[[ΣX² - [(ΣX)² / N]][ΣY² - [(ΣY)² / N]]]
numerator = covariance: degree to which 2 variables share
common variance; high covariance = more linear, closer to
±1; low covariance = less linear, closer to 0
X     X²    Y    Y²     XY
1     1     7    49     7
3     9     4    16     12
5     25    13   169    65
7     49    16   256    112
9     81    10   100    90
11    121   22   484    242
13    169   19   361    247
Σ 49  455   91   1435   775

ΣX = 49     ΣX² = 455    (ΣX)² = 2401
ΣY = 91     ΣY² = 1435   (ΣY)² = 8281
ΣXY = 775   N = 7

r = [775 - [(49)(91) / 7]] / √[[455 - [2401/7]][1435 - [8281/7]]]
r = + 0.82
N.B. you can get a negative number in the numerator but not in the denominator
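A sketch of the raw-score method in Python (the helper name is mine, following the formula above):

```python
import math

def pearson_r(xs, ys):
    """Pearson r by the raw score method."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    sy2 = sum(y * y for y in ys)
    num = sxy - (sx * sy) / n                       # covariance term
    den = math.sqrt((sx2 - sx**2 / n) * (sy2 - sy**2 / n))
    return num / den

# Data from the worked example
x = [1, 3, 5, 7, 9, 11, 13]
y = [7, 4, 13, 16, 10, 22, 19]
print(round(pearson_r(x, y), 2))   # 0.82
```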
- If r = + 1 all data fall in a line; if |r| < 1 data are scattered.
There are 3 types of variation:
total
=
explained (r2)
+
unexplained (k2)
if r = + 1 all is explained; if r = 0 all is unexplained
a) r2 = coefficient of determination: proportion of 1 variable
explained by the other
b) k2 = coefficient of non-determination: proportion of 1
variable not explained by the other
total = 1 or 100% so....
1 = r2 + k2
=> k2 = 1 - r2
e.g., r = 0.84 r2 = 0.71 k2 = 1 - 0.71 = 0.29
- cautions with Pearson r
1) measures linearity so low r means not linear; could still
have a non-linear relationship
2) distribution need not be normal but must be unimodal
3) if the range is truncated you will get a spuriously low r
2) Spearman r: with ordinal data; rs
a) both variables must be rank ordered
b) non-parametric test: looks at ranks only (parametric
uses actual #s)
rs = 1 - [(6ΣD²) / [N(N² - 1)]]
ΣD = 0   N = # pairs
D = rank X - rank Y
X     rank X   Y    rank Y   D    D²
140   1        63   6        -5   25
120   5        70   3        2    4
136   2        72   1        1    1
100   6        69   4        2    4
129   3        65   5        -2   4
125   4        71   2        2    4
                    Σ        0    42

rs = 1 - [(6 × 42) / [6(36 - 1)]] = -0.20
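Ranking (with tied ranks averaged) and the rs formula can be sketched in Python (helper names are mine):

```python
def ranks(values):
    """Rank with 1 = highest score; ties get the mean of their ranks."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 + (order.count(v) - 1) / 2 for v in values]

def spearman_rs(xs, ys):
    """rs = 1 - (6 * sum(D^2)) / (N * (N^2 - 1)), D = rank X - rank Y."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - (6 * d2) / (n * (n * n - 1))

x = [140, 120, 136, 100, 129, 125]
y = [63, 70, 72, 69, 65, 71]
print(round(spearman_rs(x, y), 2))             # -0.2
print(ranks([140, 120, 136, 100, 120, 125]))   # tied 120s -> 4.5 each
```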
- Tied Scores: if tied must take this into account to be fair

X     rank X   adjusted rank X
140   1        1
120   4        4.5
136   2        2
100   6        6
120   5        4.5
125   3        3

(4 + 5) / 2 = 4.50: take the mean of the tied ranks & assign
the mean rank to each tied score
- Correlation matrix: table to visualize many correlations

           kinder   grammar   high     college
kinder     ------
grammar    0.93     ------
high       0.74     -0.63     ------
college    0.61     -0.54     0.36     ------

e.g., what 2 groups correlate the most? Grammar & kindergarten
e.g., which 2 groups correlate the least? High school & college
e.g., what is the correlation between grammar & high? -0.63
CHAPTER 9 HOMEWORK
IN THE BOOK
9.1 a) using Benin Rep to Ghana only for InfMort (x) and Income (y) only
9.2 using same as above but by hand (not SPSS); also find r2 & k2
NOT IN BOOK
1) Use Spearman Rank to find the correlation coefficient

          % recall   % recognition
Sleepy    86         91
Dopey     81         95
Grumpy    75         86
Sneezy    78         93
Doc       58         80
Happy     62         70
Bashful   38         84
2) RANK ORDER THESE
a) X: 7, 4, 6, 7, 9, 4, 2
b) X: 76, 79, 81, 76, 63, 28
c) X: -41, -38, -42, -41, -26, -26, -41
CHAPTER 9 - HOMEWORK ANSWERS
9.1a) [Scatter diagram: income (y-axis, 0 - 7000) vs infmort (x-axis, 40 - 120)]
9.2)
X      X²      Y      Y²         XY
104    10816   933    870489     97032
109    11881   965    931225     105185
80     6400    1573   2474329    125840
102    10404   1166   1359556    118932
110    12100   850    722500     93500
91     8281    1654   2735716    150514
76     5776    880    774400     66880
113    12769   628    394384     70964
61     3721    6024   36288576   367464
61     3721    1881   3538161    114741
Σ 907  85869   16554  50089336   1311052

ΣX = 907        ΣX² = 85869       (ΣX)² = 822649
ΣY = 16554      ΣY² = 50089336    (ΣY)² = 274034916
ΣXY = 1311052   N = 10

r = [1311052 - [(907 × 16554) / 10]] / √[[85869 - (822649/10)][50089336 - (274034916/10)]]
= (1311052 – 1501447.8) / √[(3604.1)(22685845)]
= -190395.8 / 285940.65
r = -0.67
r² = (-0.67)² = 0.45   k² = 1 - 0.45 = 0.55
NOT IN BOOK
1)
% recall   rank recall   % recog.   rank recog.   D²
86         1             91         3             4
81         2             95         1             1
75         4             86         4             0
78         3             93         2             1
58         6             80         6             0
62         5             70         7             4
38         7             84         5             4
                                    Σ             14

rs = 1 - [(6 × 14) / [7(49 - 1)]] = 0.75

2a)
X:    7,   4,   6,  7,   9,  4,   2
rank: 2.5, 5.5, 4,  2.5, 1,  5.5, 7

2b)
X:    76,  79, 81, 76,  63, 28
rank: 3.5, 2,  1,  3.5, 5,  6

2c)
X:    -41, -38, -42, -41, -26, -26, -41
rank: 5,   3,   7,   5,   1.5, 1.5, 5
- Joe has a set of data correlating number of books read per month with age.
He wants to plot these data on a graph and draw a line to show the general
linear trend of the data.
- Carol has a set of data on height as a function of how many grams of
protein children had on average per day. She then wants to predict the
height of an individual assuming they had 10 grams of protein on average
per day.
Chapter 10 - Regression
- regression: allows you to predict relationships
- remember Y = mX + b as the equation for a line? We rewrite it in regression analysis as Y = a + byX
X, Y = variables
by = slope (m) (tilt)
a = y-intercept (b) (where it hits y-axis)
a) if r = ±1 it's easy to predict & draw the line
if |r| < 1 you must draw a "best fit" line
b) some properties of the regression line
1) squared deviations around line are minimal
2) sum deviations = 0
3) new symbols X' & Y' for predictions
- To find regression line equation:
by = (r)(sy/sx)
a = Ȳ - byX̄
Y = a + byX

X          Y
1          5        r = -1.0
2          4
3          3
4          2
5          1
mean 3     3
s    1.41  1.41

by = (-1)(1.41/1.41) = -1
a = 3 - (-1)(3) = 6
Y = 6 + (-1)X
leave X & Y as letters
- To Draw the regression line for Y = 6 + (-1)X
1) pick 2 reasonable values for X
2) put in equation & solve for Y
3) plot the 2 pairs of X,Y points
4) connect the dots with a line
If X = 5, Y = 6 + (-1)(5) = 1
If X = 1, Y = 6 + (-1)(1) = 5
[Graph: Y (0 - 6) vs X (0 - 5); the line passes through (1, 5) and
(5, 1) and through the centroid]
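Finding the regression line from summary statistics can be sketched as (the helper name is mine):

```python
def regression_line(r, mean_x, sd_x, mean_y, sd_y):
    """Return intercept a and slope by for Y = a + by * X."""
    by = r * (sd_y / sd_x)        # by = (r)(sy/sx)
    a = mean_y - by * mean_x      # a = mean Y - by * mean X
    return a, by

# Example from the notes: r = -1, both means 3, both s = 1.41
a, by = regression_line(-1.0, 3, 1.41, 3, 1.41)
print(a, by)          # 6.0 -1.0
print(a + by * 5)     # predicted Y at X = 5 -> 1.0
```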
- In regression analysis you can also find X = a + bxY and
get 2 regression lines that have a certain relationship:
r = ±1 => the 2 lines are superimposed
r = 0 => the 2 lines are perpendicular
intersection point = (X̄, Ȳ), the centroid
[Sketch: pairs of regression lines for r = 1, 0.75, 0.25, and 0,
opening farther apart as r drops toward 0]
- To predict Y if you know X
Y' = Ȳ + [(r)(sy/sx)(X - X̄)]
Given: X̄ = 70, sx = 4, Ȳ = 75, sy = 8, r = 0.6. If Sue got a 62 on X,
what did she get on Y?
Y' = 75 + [(0.6)(8/4)(62 - 70)] = 65.40
- If you have z-scores
zy' = (r)(zx)
Y' = Ȳ + (zy')(sy)
Given: X = 62, X̄ = 70, sx = 4, zx = -2, Ȳ = 75, sy = 8, r = 0.6
a) zy' = (0.6)(-2) = -1.20
b) Y' = 75 + (-1.2)(8) = 65.40
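Both prediction routes give the same answer; a quick sketch (function names are mine):

```python
def predict_y(x, mean_x, sd_x, mean_y, sd_y, r):
    """Y' = mean_y + r * (sd_y / sd_x) * (x - mean_x)."""
    return mean_y + r * (sd_y / sd_x) * (x - mean_x)

def predict_y_from_z(zx, mean_y, sd_y, r):
    """zy' = r * zx, then Y' = mean_y + zy' * sd_y."""
    return mean_y + (r * zx) * sd_y

print(round(predict_y(62, 70, 4, 75, 8, 0.6), 2))   # 65.4
print(round(predict_y_from_z(-2, 75, 8, 0.6), 2))   # 65.4
```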
- Standard Error of the Estimate (sesty): estimate of the
standard deviation of data around the regression line; k2 was
a version of this but not really in terms of standard deviation
sesty = sy √[N(1 - r²) / (N - 2)]
r = ±1 => sesty = 0, no errors / deviation
r = 0 => sesty is maximal
Given: X̄ = 70, sx = 4, Ȳ = 75, sy = 8, N = 20, r = 0.60
sesty = 8 √[[20(1 - 0.6²)] / (20 - 2)] = 6.75
Larger sesty => less accurate predictions
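The standard error of the estimate, sketched in Python:

```python
import math

def std_error_estimate(sd_y, n, r):
    """sest_y = sd_y * sqrt(N * (1 - r^2) / (N - 2))."""
    return sd_y * math.sqrt(n * (1 - r * r) / (n - 2))

# Example from the notes: sy = 8, N = 20, r = 0.60
print(round(std_error_estimate(8, 20, 0.6), 2))   # 6.75
```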
- recall: Y' was a prediction not a fact. Using sesty we can
find an interval where we are 68% sure that the true Y will be:
Ytrue = Y' ± sesty √[1 + (1/N) + [(X - X̄)² / SSx]]
sesty & Ytrue are influenced by the magnitude of the X & Y
variance: low variance => better / lower sesty => better Ytrue
- Homoscedasticity: where variance of 1 variable is constant
at all levels of the other variable
- Heteroscedasticity: where variance of 1 variable is not
constant at all levels of the other variable
Homoscedasticity
Heteroscedasticity
- Post-Hoc Fallacy: assuming a cause & effect relationship
from correlation data
CHAPTER 10 HOMEWORK
IN THE BOOK
10.1 using Y & X1 where mean Y = 6.7, s = 0.70; mean X =
46, s = 6.29, r = 0.62 (also plot the regression line), 10.2, 10.3
NOT IN BOOK
X̄ = 20, sx = 5, X = 24, zx = 0.8, Ȳ = 50, sy = 7, r = 0.7
a) Zy’ = ?
b) Y’ = ?
CHAPTER 10 – HOMEWORK ANSWERS
10.1) by = (0.62)(0.70 / 6.29) = 0.07
a = 6.7 - (0.07)(46) = 3.48
Y = 3.48 + 0.07X
Y = 3.48 + (0.07)(1) = 3.55   Y = 3.48 + (0.07)(3) = 3.69
[Graph: the regression line through (1, 3.55) and (3, 3.69)]
10.2) sesty = 0.7 √[10(1 - 0.62²)] / (10 - 2) = 0.61
10.3) Y' = 6.7 + (0.62)(0.70 / 6.29)(70 - 46) = 8.36
NOT IN BOOK
a) zy' = (0.7)(0.8) = 0.56
b) Y' = 50 + (0.56)(7) = 53.92
2 =  [(Oi - Ei)2 / Ei]
df = (r - 1)( c - 1)
est 2 = (t2 - 1) / (t2 + N1 + N2 - 1)


2
est  = [SSbet - (k - 1)(s w)] / (SStot + s2w)
OR
2
est  = [dfbet(F - 1)] / [dfbet (F - 1) + Ntot]
2
x = / N
sx = s /  N - 1
z = (X - ) / x

HSD = q s2w / n
upper limit = X + (t 0.05)(sx)
lower limit = X - (t 0.05)(sx)
t = (X - ) / sx df = N - 1
SS1 = X12 - [(X1)2 / N1]
SS2 = X22 - [(X2)2 / N2]
Sx1x2 =  [(SS1 + SS2) / (N1 + N2 - 2)][(1/N1) + (1/N2)]
t = [(X1 - X2) - (1 - 2)] / sx1x2
df = N1 + N2 - 2
SStot = Xtot2 - [(Xtot)2 / Ntot]
dfw = Ntot - k
SSbet = [(Xi)2 /Ni] - [(Xtot)2 / Ntot]

s2bet = SSbet / dfbet

s2w = SSw / dfw
SSw = SStot - SSbet

dfbet = k - 1
F=s

2
bet /
s2w
- Are there any underlying concepts that guide our choice of
statistical tests?
- Are there standards that we can compare our results to in
order to see if there are statistically significant differences?
- Are we always right or are there errors we should be aware
of?
Chapter 8 - Inferential Statistics & Errors
- goal: estimate parameters of pop. from descriptive stats;
compare 2+ groups of data
1) hypothesis testing: compare samples for differences
- Step #1 = formulate all hypotheses
1) typically have experimental & control groups:
manipulated vs comparison groups respectively
2) hypotheses
a) null hypothesis (H0): expect no difference
b) alternative hypothesis (H1): expect a difference
1) 1-tailed / directional: states how they differ (<, >)
2) 2-tailed / non-directional: just states they differ
- Step #2 = conduct the study, collect the data, generate
summary statistics (e.g., mean, SD, etc.)
- Step #3 = choose appropriate statistical test (i.e., formulas)
that will assess the evidence (data) against the null
hypothesis by generating a test statistic = a single number
that assesses the compatibility of the data with H0
- Step #4 = generate the p-value = the likelihood/probability
that the result observed is due to random occurrence if H0 is
correct or if H0 is true what is the probability of observing a
test statistic as extreme as the one obtained in #3? p-values
typically generated by statistical software packages
- Step #5a (using software) = compare p-value to a fixed
significance level (α) at which the scientific community agrees
that there is statistical significance (most common = 0.05 &
0.01): Rule: p < α => reject H0; p > α => accept H0
α = 0.05, p = 0.03: reject H0, they are different
α = 0.01, p = 0.06: accept H0, not different
- Step #5b (by hand) =
a) each statistical test is associated with a theoretical
distribution of values (sampling distribution) of what would
happen (theoretically) if every sample of a particular size
were studied (i.e., what test statistic would you expect for a
given sample size)
b) when you generate a test statistic (using a formula)
you can then go to a table with the sampling distribution and
for a given α-level & sample size find what test statistic value
you would expect if H0 is true – if your test statistic > table value,
reject H0 = there is a statistically significant difference
- Central Limit Theorem (CLT): method to construct a
sampling distribution of the population mean, providing a
way to test H0; assumes that if random samples of fixed N
from any pop. are drawn & X̄ calculated then:
1) distribution of means becomes normal
2) grand mean approaches mean of pop.
3) standard deviation decreases
- standard error of the means: the overall standard deviation
of the sample means
Since all of this is based on probabilities there is always the
risk that you can make an error in your decisions.
- decision errors
a) Type I (α): reject H0 when it's true
b) Type II (β): accept H0 when it's false

                     true status of null
your decision    H0 true           H0 false
accept H0        correct (1 - α)   Type II / β
reject H0        Type I / α        correct (1 - β)
- α = 0.05 2-tail, p = 0.03 1-tail, H0: false
0.03 x 2 = 0.06; p > α => accept H0 => Type II
- α = 0.05 1-tail, p = 0.06 2-tail, H0: true
0.06 / 2 = 0.03; p < α => reject H0 => Type I
- α = 0.05 1-tail, p = 0.03 1-tail, H0: false
p < α => reject H0 => correct
Rule: always fix the p-value
CHAPTER 8 - HOMEWORK
NOT IN BOOK
1) For the following decide accept/reject then state if there is an
error and if it is Type I or II

      α            p              H0
a)    0.01 1-tail  0.008 1-tail   T
b)    0.05 2-tail  0.08 2-tail    T
c)    0.05 1-tail  0.06 1-tail    F
d)    0.02 1-tail  0.03 2-tail    F
e)    0.01 2-tail  0.006 1-tail   T
CHAPTER 8 – HOMEWORK ANSWERS
1a) p <  => reject => Type I
b) p >  => accept => correct
c) p >  => accept => Type II
d) p <  => reject => correct
e) p >  => accept => correct
- John has access to all the records for inductees into the US Army
since it began and knows the average IQ and standard deviation for
this population. He has a group of new inductees and wants to know
if their average IQ differs significantly from past years.
- Kelly knows that sampling errors always exist so the sample mean
will not exactly match the true population mean. Can she determine
a range of values that will cover the true mean with some degree of
confidence?
Chapter 12 - Single Sample Tests
1) z-test: know σ & X̄
σx̄ = σ / √N
z = (X̄ - μ) / σx̄
σx̄ = standard error of the mean
e.g., μ = 250, σ = 50, X̄ = 263, N = 100; do the means differ?
Use α = 0.01 2-tailed
σx̄ = 50 / √100 = 5   z = (263 - 250) / 5 = 2.60
from z-table: α at 0.05, reject if |z| > 1.96
α at 0.01, reject if |z| > 2.58
so....2.60 > 2.58 => reject null - they differ
Rule: test statistic > table value => reject null
Note: you are now getting the actual test statistic not the p-value!
Alpha guides you to a place in the table to decide if the test
statistic is < or > that criterion. Computers provide the p-value
along with answers.
2) Student's t-test: μ, X̄ & s known
sx̄ = s / √(N - 1)
t = (X̄ - μ) / sx̄
df = N - 1
e.g., X̄ = 85.1, s = 9.61, N = 10, μ = 72; do the means differ?
Use α = 0.01 1-tailed
sx̄ = 9.61 / √(10 - 1) = 3.2   t = (85.1 - 72) / 3.2 = 4.09
df = 10 - 1 = 9
go to t-table page 551
1) choose 1-tail or 2-tail row
2) get α for that row
3) find df = degrees of freedom = # of values free to vary
after certain restrictions placed on data (reflection of sample
size)
so...... 4.09 > 2.821 => reject null, they differ
df: # independent scores; e.g., if X̄ = 4.5 & n = 4 and you
know 3 of the scores are 3, 4 & 5, the total must = 18 since
18/4 = 4.5, so the last number must be 6.
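The single-sample t statistic, sketched with the course's sx̄ = s/√(N - 1) convention (s here is computed with N in its denominator, as earlier chapters do):

```python
import math

def one_sample_t(sample_mean, mu, s, n):
    """t = (sample mean - mu) / (s / sqrt(N - 1)); returns (t, df)."""
    se = s / math.sqrt(n - 1)
    return (sample_mean - mu) / se, n - 1

t, df = one_sample_t(85.1, 72, 9.61, 10)
print(round(t, 2), df)    # 4.09 9
```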
a) confidence limits for X̄: range of values representing the
probability that more samples drawn from the pop. will fall
within it

95% limits (t-table α at 0.05, ALWAYS 2-TAILED):
upper limit = X̄ + (t0.05)(sx̄)
lower limit = X̄ - (t0.05)(sx̄)

99% limits (t-table α at 0.01, ALWAYS 2-TAILED):
upper limit = X̄ + (t0.01)(sx̄)
lower limit = X̄ - (t0.01)(sx̄)

e.g., X̄ = 108, s = 15, N = 26, sx̄ = 3, df = 25
95% limits: upper = 108 + (2.06)(3) = 114.18
            lower = 108 - (2.06)(3) = 101.82
99% limits: upper = 108 + (2.787)(3) = 116.36
            lower = 108 - (2.787)(3) = 99.64
NB: 95% limits are "tighter" than 99%
[Number line: 99.64 ... 101.82 — 108 — 114.18 ... 116.36, the 95%
interval nested inside the 99% interval]
CHAPTER 12 – HOMEWORK
IN THE BOOK
12.11) where X̄ = 3.01, s = 7.18, n = 29; use μ = 5.06; α = 0.05
2-tailed
12.12) compute both 95% & 99% confidence limits
NOT IN THE BOOK
1) Using the same data from 12.11 but this time s = unknown
and μ = 5.06, σ = 7.18; α = 0.01 2-tailed
CHAPTER 12 - HOMEWORK ANSWERS
12.11) μ = 0, s = 7.18, n = 29, X̄ = 3.01, α = 0.05 2-tailed
sx̄ = 7.18 / √(29 - 1) = 1.36   t = (3.01 – 0) / 1.36 = 2.22
df = 28   2.22 > 2.048 => reject H0
12.12)
95%: upper = 3.01 + (2.048)(1.36) = 5.79
     lower = 3.01 - (2.048)(1.36) = 0.22
99%: upper = 3.01 + (2.763)(1.36) = 6.77
     lower = 3.01 - (2.763)(1.36) = -0.75
NOT IN BOOK
1) μ = 5.06, n = 29, X̄ = 3.01, σ = 7.18, α = 0.01 2-tailed
σx̄ = 7.18 / √29 = 1.33   z = (3.01 – 5.06) / 1.33 = -1.54; |-1.54| < 2.58 =>
accept H0
- Andy has two groups of rats and wants to see if what he feeds
them affects how fast they run through a maze. One group gets
mashed protein bars to eat and the other gets mashed bananas. He
runs them through the maze and times them. The protein group
runs it in 6.5 seconds on average and the banana group runs it in
10.3 seconds. Is there a significant difference?
- Is there a way to estimate the degree to which the IV really
contributes to the effect seen on the DV?
Chapter 14 - 2-Sample Tests
- Student's t-test for unknown population
SS1 = ΣX1² - [(ΣX1)² / N1]
SS2 = ΣX2² - [(ΣX2)² / N2]
sx̄1-x̄2 = √[[(SS1 + SS2) / (N1 + N2 - 2)][(1/N1) + (1/N2)]]
t = [(X̄1 - X̄2) - (μ1 - μ2)] / sx̄1-x̄2
** μ1 - μ2 = 0 **
df = N1 + N2 - 2
e.g.,
ΣX1 = 477   ΣX1² = 29845   X̄1 = 59.63   N1 = 8
ΣX2 = 11    ΣX2² = 101     X̄2 = 5.5     N2 = 2
α = 0.05 1-tail
SS1 = 29845 - [(477²)/8] = 1403.88
SS2 = 101 - [(11²)/2] = 40.50
sx̄1-x̄2 = √[[(1403.88 + 40.50) / (8 + 2 - 2)][(1/8) + (1/2)]] = 10.62
t = (59.63 - 5.50) / 10.62 = 5.10 > 1.86 => reject H0
df = 8 + 2 - 2 = 8
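The pooled two-sample t from summary sums, sketched in Python (the helper name is mine):

```python
import math

def two_sample_t(sum1, sumsq1, n1, sum2, sumsq2, n2):
    """Independent-samples t from raw sums; returns (t, df)."""
    ss1 = sumsq1 - sum1 ** 2 / n1
    ss2 = sumsq2 - sum2 ** 2 / n2
    pooled = (ss1 + ss2) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    t = (sum1 / n1 - sum2 / n2) / se
    return t, n1 + n2 - 2

t, df = two_sample_t(477, 29845, 8, 11, 101, 2)
print(round(t, 2), df)    # 5.1 8
```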
- est 2 (omega-squared): many things contribute to p-level
and whether you accept of reject the null; one is 2 or
degree to which IV accounts for variance in DV - how much
are the 2 variables related?
est 2 = (t2 - 1) / (t2 + N1 + N2 - 1)
- interpret like r2 - higher 2 means have significant findings
e.g., t = 5.097 in previous problem
est 2 = (5.0972 - 1) / (5.0972 + 8 + 2 - 1) = 0.714
IV accounts for 71.4% of variance in DV - fairly significant
Can follow this with the confidence limits
CHAPTER 14 HOMEWORK
14.11 use α = 0.05 2-tailed; also find est ω²
CHAPTER 14 – HOMEWORK ANSWERS
14.11)
ΣX1 = 169   ΣX1² = 3297   X̄1 = 18.78   N1 = 9
ΣX2 = 141   ΣX2² = 2607   X̄2 = 17.63   N2 = 8
α = 0.05 2-tailed
SS1 = 3297 - (169²) / 9 = 123.56
SS2 = 2607 - (141²) / 8 = 121.88
sx̄1-x̄2 = √[[(123.56 + 121.88) / (9 + 8 - 2)][(1/9) + (1/8)]] = 1.96
t = (18.78 – 17.63) / 1.96 = 0.59 < 2.131 => accept H0
df = 15
ω² = (0.59² - 1) / (0.59² + 9 + 8 - 1) = -0.04
- June has a new drug to control the number of manic episodes
patients experience each month, but she is not sure of the most
effective dose. She gets 30 manic patients and divides them
randomly into 3 groups. She gives one group a low dose, one group
a medium dose and one group a high dose of the drug. She then
monitors them for one month, recording the number of manic
episodes they experience. Group 1 has an average of 6 episodes,
group 2 has 3, and group 3 has 5. Do they differ significantly in
their effect on the number of manic episodes?
- Exactly which doses differ from each other?
Chapter 16 - Analysis of Variance (ANOVA)
- omnibus test: permits analysis of several variables or
variable levels at the same time
- one-way ANOVA: analysis of various levels or categories of
single treatment variables
- why not do lots of t-tests? Will give experimentwise errors =
drive up probability of making Type I errors
ANOVA: divide total variance into between & within
subjects variance

Rat   test 1   test 2   test 3   X̄     s²
1     6.3      1.3      14.6     7.4   30.1
2     8.2      2.4      18.2     9.6   42.6
3     7.1      1.9      17.3     8.8   40.9
X̄     7.2      1.9      16.7
s²    0.61     0.20     2.34

(right-hand s² column = within-subject variances; bottom s²
row = between-subject variances)

- ANOVA is based on the General Linear Model: a conceptual
mathematical model
Xij = μ + τi + εij   (εij = random error or error variance)
e.g., blood pressure study: do the 3 means differ? α = 0.05

       active (X1)   passive (X2)   relaxed (X3)   totals
ΣX     1407          1303           1308           4018
ΣX²    99723         85479          86254          271456
X̄      70.35         65.15          65.40          -------
N      20            20             20             60
Step 1: add across all rows to get totals; then do equations
1) SStot = ΣXtot² - [(ΣXtot)² / Ntot]
271456 - [(4018²) / 60] = 2383.94
2) SSbet = Σ[(ΣXi)² / Ni] - [(ΣXtot)² / Ntot]   (i = individual group)
(1407²/20) + (1303²/20) + (1308²/20) - (4018²/60) = 344.04
3) SSw = SStot - SSbet
2383.94 - 344.04 = 2039.90
4) dfbet = k - 1   (k = # conditions)
3 - 1 = 2
5) dfw = Ntot - k
60 - 3 = 57
6) s²bet = SSbet / dfbet = MSbet
344.04 / 2 = 172.02
7) s²w = SSw / dfw = MSw
2039.9 / 57 = 35.79
8) F = s²bet / s²w
172.02 / 35.79 = 4.81
9) F-table on page 558 - 560
- across top = dfbet
- down left = dfw
- light # = α at 0.05
- bold # = α at 0.01
df = 2,57
2,60 at 0.05 = 3.15
so...... 4.81 > 3.15 => reject H0 the 3 means do differ
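The nine steps above, run from the summary sums (a sketch; each group is its ΣX and N from the table, plus the grand ΣX²):

```python
def one_way_anova(groups, grand_sumsq):
    """groups: list of (sum_x, n) per condition; grand_sumsq = sum of all X^2.
    Returns (F, df_between, df_within)."""
    total_sum = sum(s for s, n in groups)
    total_n = sum(n for s, n in groups)
    correction = total_sum ** 2 / total_n
    ss_tot = grand_sumsq - correction
    ss_bet = sum(s ** 2 / n for s, n in groups) - correction
    ss_w = ss_tot - ss_bet
    df_bet, df_w = len(groups) - 1, total_n - len(groups)
    return (ss_bet / df_bet) / (ss_w / df_w), df_bet, df_w

# Blood pressure study: active, passive, relaxed
f, df_bet, df_w = one_way_anova([(1407, 20), (1303, 20), (1308, 20)], 271456)
print(round(f, 2), df_bet, df_w)   # 4.81 2 57
```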
- F was an omnibus test - it just says the 3 means differ but not
which ones; need follow-up tests to determine this
a) a priori: decide prior to study what tests or comparisons will
do; planned
b) a posteriori or post hoc: do all possible pair-wise
comparisons; not planned
- Tukey HSD (Honestly Significant Difference) Test (post hoc)
HSD = q √(s²w / n)
1) prepare a means table

         70.35    65.15    65.40
70.35    ------
65.15    5.20*    ------
65.40    4.95*    -0.25    ------

2) do HSD test
HSD = 3.40 √(35.79 / 20) = 4.54
q comes from table L on page 562 using dfw & k
Any of the difference values (| |) in the table > the HSD value gets an *
meaning they differ significantly.
- est 2: degree of association IV & DV


est 2 = [SSbet - (k - 1)(s2w)] / (SStot + s2w)
est 2 = [344.04 - (3 - 1)(35.79)] / (2383.94 + 35.79) = 0.11
OR
est 2 = [dfbet(F - 1)] / [dfbet (F - 1) + Ntot]
est 2 = [2(4.81 - 1)] / [2 (4.81 - 1) + 60] = 0.11
CHAPTER 16 HOMEWORK
16.21 use α = 0.05; also find est ω²; also create a means table
& find which means differ using α = 0.05
CHAPTER 16 – HOMEWORK ANSWERS
16.21
        ΣX     ΣX²     X̄       n
X1      433    15519   28.87   15
X2      599    29595   39.93   15
X3      713    36897   47.53   15
totals  1745   82011   ----    45

SStot = 82011 - (1745²) / 45 = 14343.78
SSbet = (433²/15) + (599²/15) + (713²/15) - (1745²/45) = 2643.38
SSw = 14343.78 – 2643.38 = 11700.40
dfbet = 3 - 1 = 2
dfw = 45 - 3 = 42
s²bet = 2643.38/2 = 1321.69
s²w = 11700.40/42 = 278.58
F = 1321.69/278.58 = 4.74
4.74 > 3.22 => reject H0
est ω² = [2643.38 - (3 - 1)(278.58)] / (14343.78 + 278.58) = 0.14

         28.87     39.93   47.53
28.87    ---
39.93    -11.06    ---
47.53    -18.66*   -7.6    ---

HSD = 3.44 √(278.58/15) = 14.82
- Ed polls a random sample of people by phone to see how much
they agree with the statement that the president is doing a good job:
very good, good, neutral, poor, very poor. Is there a difference in
the frequency with which people give responses for the different
categories?
- Kathy wants to know if people will help someone more or less as a
function of gender of the person needing help. She has Bob & Ann
pretend to drop a bag of groceries on a busy street and records how
many times people stop to help either one of them. Was there a
significant difference in helping versus non-helping for Bob vs
Ann?
Chapter 19 - Chi-Squared Test (χ²)
- nonparametric: does not require normality
- χ²: typically with frequencies or proportions from nominal data
1) one-variable χ² or "goodness of fit"
χ² = Σ[(Oi - Ei)² / Ei]
O = observed data; E = expected data; i = individual cell

strong agree   agree   undecided   disagree   strong disagree
7              12      13          13         10

expected = total answers / # categories = 55/5 = 11
χ² = (7 - 11)²/11 + (12 - 11)²/11 + (13 - 11)²/11 + (13 - 11)²/11 + (10 - 11)²/11
= 2.36
df = # categories - 1 = 5 - 1 = 4
χ² table on page 572
α at 0.05 => 9.488
2.36 < 9.488 => accept H0 no difference
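The goodness-of-fit computation, sketched in Python:

```python
def chi_square(observed, expected):
    """Chi-squared = sum((O - E)^2 / E) over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

obs = [7, 12, 13, 13, 10]
exp = [sum(obs) / len(obs)] * len(obs)   # 55/5 = 11 per category
chi2 = chi_square(obs, exp)
print(round(chi2, 2))   # 2.36
print(chi2 < 9.488)     # True -> accept H0 (df = 4, alpha = 0.05)
```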
2) multi-variable χ²: same formula but a different way to get
expected values

             drug    placebo   total
get better   a: 1    b: 17     18
get worse    c: 9    d: 12     21
total        10      29        39

1) label boxes a - d
2) find expected values: fe = fcfr/n
a) (18/39)(10) = 4.6
b) (18/39)(29) = 13.4
c) (21/39)(10) = 5.4
d) (21/39)(29) = 15.6
3) use χ² formula
(1 - 4.6)²/4.6 + (17 - 13.4)²/13.4 + (9 - 5.4)²/5.4 + (12 - 15.6)²/15.6
= 7.02
df = (r - 1)(c - 1)   r = # rows, c = # columns
df = (2 - 1)(2 - 1) = 1
7.02 > 6.635 => reject H0 they differ
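The contingency-table version, sketched in Python; with unrounded expected values the statistic comes out near 7.07 (rounding the expecteds to one decimal, as in the hand calculation, shifts it slightly):

```python
def chi_square_table(table):
    """Chi-squared for a contingency table (list of rows); returns (chi2, df)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = sum((obs - rt * ct / n) ** 2 / (rt * ct / n)
               for row, rt in zip(table, row_totals)
               for obs, ct in zip(row, col_totals))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# drug vs placebo, get better vs get worse
chi2, df = chi_square_table([[1, 17], [9, 12]])
print(round(chi2, 2), df)   # 7.07 1
```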
CHAPTER 19 – HOMEWORK
19.1, 19.8
CHAPTER 19 – HOMEWORK ANSWERS
19.1) observed: 25, 32, 10; expected = 67/3 ≈ 22 each
use α = 0.05
(25 - 22)²/22 + (32 - 22)²/22 + (10 - 22)²/22 = 11.5
df = 3 - 1 = 2; 11.5 > 5.991 => reject H0
19.8) use α = 0.01

            1     2     3     Total
Smoke       29    16    55    100
Non-Smoke   198   107   181   486
Total       227   123   236   586

expected values:
a) (100/586)(227) = 38.74
b) (100/586)(123) = 20.99
c) (100/586)(236) = 40.27
d) (486/586)(227) = 188.26
e) (486/586)(123) = 102.01
f) (486/586)(236) = 195.73

(29 – 38.74)²/38.74 + (16 – 20.99)²/20.99 + (55 – 40.27)²/40.27
+ (198 – 188.26)²/188.26 + (107 - 102.01)²/102.01 + (181 – 195.73)²/195.73
= 2.45 + 1.19 + 5.39 + 0.50 + 0.24 + 1.11 = 10.88
df = (2 - 1)(3 - 1) = 2
10.88 > 9.210 => reject H0 they differ