PSYCH 230 – STATISTICS
1) If you are already registered, sit down.
2) If you are on the waiting list or just showed up, stay standing and we will see how many seats are available.
3) We will start adding from the waiting list.

PSYCHOLOGY 230 - STATS
Elizabeth Krupinski, PhD
Depts. of Radiology & Psychology
112 Radiology Research Building
626-4498
krupinski@radiology.arizona.edu
http://radiology.arizona.edu/krupinski/psychology-230measurement-statistics

DIRECTIONS
- North on Cherry
- Left on Drachman
- First right = Ring Road (but no signs)
- Around the bend
- Lot #1 (blue) on the right
- Driveway into the fence on the right
- Radiology Research Bldg, Room 112
[campus map showing Rad Res 112, Ring Road, Drachman, Speedway]

PREREQUISITES
1) Psych 101 or IND 101
2) Math 110 - college algebra
   +, x, -, ÷, √, | |
   positive vs negative numbers
   order of operations
   rounding: < 5 down, > 5 up
   decimals: 2 places on quizzes

QUIZZES
4 quizzes - each 25% of your grade
- 100 points each
- all of them count (none dropped)
~ 1/3 fill-in-the-blank
- comprehension of concepts
- ability to apply principles, terms, etc.
~ 2/3 problems
- ability to identify the appropriate equations
- ability to carry out the required math
- ability to use statistical tables
- ability to reach proper conclusions
Formulas & tables are provided on the quizzes.

EXTRA CREDIT
Assignments from Aplia
15 POINTS MAXIMUM!!!!!!
Final grade = (4 quiz grades + extra credit)/4

TEXTS
Class notes: buy in the bookstore (required)
http://radiology.arizona.edu/sites/radiology.arizona.edu/files/u3/notes2013.pdf
Book: Fundamental Statistics for the Behavioral Sciences, 8th Ed., 2014, David C. Howell, Wadsworth Cengage Learning

CALCULATORS
DO NOT FORGET TO BRING YOUR CALCULATOR TO THE QUIZZES!!!!!!
Required: +, -, x, ÷, √
Helpful:
  X̄ (sometimes μ) - mean
  S (SD) - standard deviation (sometimes σ)
  ΣX - sum of X
  ΣX² - sum of X squared
  N or n - number of scores

BASIC MATH REVIEW
2 + 2 = 4     2 + (-2) = 0      (-2) + (-2) = (-4)
2 x 2 = 4     2 x (-2) = (-4)   (-2) x (-2) = 4
2 - 2 = 0     2 - (-2) = 4      (-2) - (-2) = 0
2/2 = 1       2/(-2) = (-1)     (-2)/(-2) = 1
2² = 4        (-2)² = 4
√4 = 2        √(-4) = error
ALSO REFER TO APPENDIX A IN THE BOOK

[diagram: the four graphing quadrants, with (x, y) signs (-, +) (+, +) in the upper half and (-, -) (+, -) in the lower half]

FORMULAS
true limits = +/- ½ the unit of measurement
i = (hi - lo + 1) / # groups
midpoint = (hi true + lo true) / 2
PR = [cumfll + ((X - Xll)/i)(fi)] / N x 100
cumf = (PR x N) / 100
X = Xll + [i(cumf - cumfll)] / fi
where: cumfll = cum freq at the lower true limit of X; X = score; Xll = score at the lower true limit of X; i = width; fi = # cases in X's group; N = total # scores

- Sam wants to find out if the number of hours people study has any effect on their grade.
- Mary wants to find out if gender has any influence on math and verbal SAT scores.
- Dr. Jones wants to find out if her current class performs any differently on the final compared to all past students.
- A large pharmaceutical company wants to know if their new drug for controlling OCD is effective.
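The sign rules in the basic math review above lend themselves to a quick machine check. This is a throwaway sketch in Python (not part of the course materials); note that math.sqrt raises an error for a negative input, matching the "√(-4) = error" entry:

```python
import math

# Sign rules from the basic math review
assert 2 + (-2) == 0
assert (-2) + (-2) == -4
assert 2 * (-2) == -4
assert (-2) * (-2) == 4
assert 2 - (-2) == 4
assert 2 / (-2) == -1
assert (-2) / (-2) == 1
assert (-2) ** 2 == 4
assert math.sqrt(4) == 2

# The square root of a negative number has no real answer
try:
    math.sqrt(-4)
except ValueError:
    print("sqrt(-4) -> error, as the review says")
```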
Chapters 1 & 2: Intro & Basics

- statistics: the process of collecting data & making decisions based on the analysis of these data
  - descriptive
  - inferential (generalize)

Common Terms
- constant: # representing a construct that does not change (e.g., π); we will see these in some formulas
- variable: measurable characteristic that changes with person, environment, experiment; e.g., height, IQ, learning (X or Y)
- independent variable (IV): variable examined to determine its effect on the outcome of interest (DV); under the control of the experimenter - a manipulated variable; e.g., dose of a drug
- dependent variable (DV): outcome of interest measured to assess the effects of the IV; not under experimenter control; e.g., how a person reacts to the drug
- subject or organismic variable: naturally occurring IV; a characteristic of people but not controlled; e.g., eye color, gender
- data: numbers, measurements collected
- population: complete set of people/objects having some common characteristic
- parameter: value summarizing a characteristic of a population; parameters are constants; use Greek letters to represent
- sample: subset of a population, sharing the same characteristics
- statistic: value summarizing a characteristic of a sample; statistics are variable; use Roman letters to represent
- simple random sample: subset of a population selected so that each population member has an equal & independent chance of being chosen
- random assignment: assign subjects to treatments in an equal & independent manner to avoid bias
- confounding: where the DV is affected by a variable related to the IV, so we can't assume that the IV causes the DV effects

  Group 1: lecture 3x/week; taught by Dr. Smith
  Group 2: lecture 2x/week + lab 1x/week; taught by Dr. Jones
  Results: group #2 performs better on the final exam
  Conclude: lecture + lab > lecture alone. WRONG!!!! Confounded by different teachers as well as format differences.

CHAPTERS 1 & 2 – HOMEWORK PART 1
NOT IN BOOK
1) Indicate whether each is a statistic, data, or inference.
   a. A sample of 250 workers earns an average of $13,887
   b. Based on a sample of 500 workers in Tucson it is believed that the average income of all workers is $21,564
   c. A series of pitches go 98, 93 and 100 mph
   d. Ann's tuition was $13,788 and Bud's was $14,986
   e. Based on a survey it is believed that 33,566,876 people watched last year's Super Bowl
2) Indicate whether each is a variable or constant
   a. Number of days in July
   b. Number of shares traded on the NYSE on different days
   c. Age of freshmen entering college
   d. Time to complete an assignment
   e. Age at which someone is eligible to vote in a national election
   f. Scores on a 100-point quiz
   g. Amount of money spent on textbooks by students
3) What is the difference between a sample and a population?
   a. Can a population have only 20 subjects?
4) A researcher studies risk-taking behavior by taking a random sample of male undergrads at a large university. She gives them a standardized test to assess this behavior. Based on the study she would like to make inferences about other male undergrads at the university.
   a. Are the students a population, statistic, parameter or sample?
   b. Is measured risk behavior a statistic, variable, parameter or sample?
   c. The individual scores obtained are data, sample, statistics or population?
   d. The average score of the sample is a parameter, statistic, variable, or data?
   e. When we generalize from the sample, we make inferences about a parameter, variable, data or population?
   f. The average for all undergrads would be a parameter, variable, data or population?
5) Classify each as manipulated, subject or not a variable
   a. Amount of drug used in a study
   b. Value of pi
   c. Number of days in a week
   d. Diagnostic categories of patients in a study
   e. Gender
   f. Amount of reinforcement
   g. Method of instruction
   h. Hours of food deprivation
   i. Scores on a test
   j. Mood of subjects

NOT IN BOOK ANSWERS
1. a) statistic b) inference c) data d) data e) inference
2. a) constant b) variable c) variable d) variable e) constant f) variable g) variable
3. all vs subset; & yes
4. a) sample b) variable c) data d) statistic e) population f) parameter
5. a) manipulated b) not a variable c) not a variable d) subject variable e) subject variable f) manipulated g) manipulated h) manipulated i) subject variable j) subject variable

- Fred wants to find out what types of pets college students have.
- Alice wants to find out if birth order has any effect on GPA.
- Mike wants to look at temperature effects on ice cream consumption.
- Sally wants to see how fast rats run through a maze as a function of the reward type at the end.
- Rick wants to examine how many kids people have today compared to 50 years ago.
- Mary wants to examine how tall people are compared to 50 years ago.

Basic Concepts
- X or Y: symbol for a variable
- Xi or Yi: represents an individual observation
- N or n: # data points in a set, number
- Σ: indicates summation

EXAMPLES (X = group 1 kids, Y = group 2 kids)
X1 = 4  X2 = 6  X3 = 1  X4 = 5  X5 = 2  X6 = 3
Y1 = 3  Y2 = 4  Y3 = 6  Y4 = 1

a) Σ(i=3 to 6) Xi = 1 + 5 + 2 + 3 = 11
   (the number on top of Σ = where you stop; the "i = 3" underneath = where you start)
b) Σ(i=1 to 3) Yi = 3 + 4 + 6 = 13
c) * Σ(i=4 to 6) Xi² = 5² + 2² + 3² = 25 + 4 + 9 = 38
   NOT THE SAME !!!!
d) * (Σ(i=4 to 6) Xi)² = (5 + 2 + 3)² = 10² = 100
e) Σ(i=2 to N) Xi = 6 + 1 + 5 + 2 + 3 = 17
   (N = go to the end; use all #s from the start point)

Types of Measurement Scales (like inches vs cm)
a) nominal: qualitative (name); mutually exclusive without logical order (cat, dog, fish)
b) ordinal: mutually exclusive with logical rank ordering (<, >) (1st grade, 2nd grade; captain, major, colonel)
c) interval: quantitative with equal units of measurement and an arbitrary (imaginary) zero point (thermometer, calendar); equal intervals between objects represent equal differences (differences are meaningful - the diff between 10 & 20 deg is the same as between 80 & 90)
d) ratio: quantitative with equal units of measurement and an absolute (real) zero point (height, weight, length) (ratios are meaningful)

Some More Terms
- reliability: degree to which repeated measurements under the same conditions give the same results
- measurement error: uncontrolled recording error
- validity: accuracy with which a test/measure actually measures the thing of interest
- discontinuous (discrete) variables: only whole #s allowed; e.g., # kids
- continuous variables: any values allowed
  a) true limits: #s that limit where the true value lies = +/- ½ the unit of measurement
  - to get the unit of measurement:
  1) no decimals: the # by which the set increases
     e.g., 3, 4, 5, 6 => unit = 1; 1/2 = 0.5 (limit value); 3 + 0.5 = 3.5 (upper limit); 3 - 0.5 = 2.5 (lower limit)
     5, 10, 15, 20 => unit = 5; 5/2 = 2.5 (limit value); 10 + 2.5 = 12.5 (upper limit); 10 - 2.5 = 7.5 (lower limit)
  2) decimals: a) anything to the left = 0; b) last # on the right = 1; all others = 0
     e.g., 13.63 => 0.01 (unit of measurement); 0.01/2 = 0.005 (limit value); 13.63 + 0.005 = 13.635 (upper limit); 13.63 - 0.005 = 13.625 (lower limit)

Some Basic Descriptive Statistics
1) frequency: count; class = 20: 13 women, 7 men
2) ratio: 13:7 women to men; DO NOT REDUCE (20:5 - do not reduce to 4:1)
3) proportion: fraction; 13/20 = 0.65 women; DO OUT THE DIVISION
4) percentage: proportion x 100; 7/20 x 100 = 35% men

CHAPTERS 1 & 2 – HOMEWORK PART 2
NOT IN BOOK
1) What scale are these based on?
   a. Your height
   b. Your weight
   c. Your occupation
   d. How one course compares to another (better, worse)
2) Are these variables continuous or discrete?
   a. Distance traveled
   b. Time to complete a task
   c. Votes cast for 3 candidates
   d. Number of votes cast
3) Find the true limits for
   a. 5  b. 5.0  c. 5.00  d. 0.1  e. -10  f. 0.8
4) For the following data:

   Area             Male   Female
   Business Admin    400     300
   Education          50     150
   Humanities        150     200
   Science           250     300
   Social Science    200     200

   a) Of the total # of students, what % is female?
   b) For only the males, what % is found in each area?
   c) Of those in business, what % is female?
   d) What % is male in science?

IN BOOK
CHAPTER 2: 2.7, 2.8, 2.9, 2.15 a-c, 2.16 a-b, 2.17 a-b, 2.18 a-c, 2.19 a-e

NOT IN BOOK ANSWERS
1. a) ratio b) ratio c) nominal d) ordinal
2. a) continuous b) continuous c) discrete d) discrete
3. a) 5: 1/2 = 0.5; 4.5 - 5.5
   b) 5.0: 0.1/2 = 0.05; 4.95 - 5.05
   c) 5.00: 0.01/2 = 0.005; 4.995 - 5.005
   d) 0.1: 0.1/2 = 0.05; 0.05 - 0.15
   e) -10: 1/2 = 0.5; (-10.5) - (-9.5)
   f) 0.8: 0.1/2 = 0.05; 0.75 - 0.85
4.       BA    E    H    S   SS
   men   400   50  150  250  200
   women 300  150  200  300  200
   a) 1150/(1150 + 1050) x 100 = 52.27%
   b) BA: 400/1050 x 100 = 38.10%; E: 50/1050 x 100 = 4.76%; H: 150/1050 x 100 = 14.29%; S: 250/1050 x 100 = 23.81%; SS: 200/1050 x 100 = 19.05%
   c) 300/700 x 100 = 42.86%
   d) 250/550 x 100 = 45.45%

IN BOOK ANSWERS
2.7) gender of the person present, gender of the subject
2.8) amount of food eaten
2.9) Amount of food eaten depends on their gender as well as the gender of someone else present while eating
2.15) a) 2.03, 1.05, 1.86 b) 14.82 c) Σ(i=1 to 10) Xi
2.16) a) 1.73, 1.56 b) 14.63
2.17) a) 219.63 & 23.22 b) 14.82/10 = 1.48
2.18) a) 214.04 & 22.45 b) [22.45 - (214.04/10)]/(10 - 1) = 0.12 c) 0.35
2.19) a) 2.85, 1.06, 4.12, 1.75, 1.00, 1.15, 2.36, 3.22, 2.54, 2.70 b) 22.75 c) 14.82 x 14.63 = 216.82 d) yes & yes e) [22.75 - (216.82/10)]/9 = 0.12

- I have 23,184 data points from my experiment - what do I do with all that information?
- How do I present that information to someone else?
- Mitch got a 43 on the quiz - how did he do compared to everyone else?
- Ann was told she scored at the 75th percentile on the GRE exam - what does that mean?

[slide: a raw, unorganized listing of several hundred observations - each a response time (e.g., 1325.000) plus a TP/FP/FN label and a condition ("one" through "six") - illustrating data in need of organizing]
Chapter 3 - Frequency Distributions & Percentiles

- exploratory data analysis: ways to arrange & display #s to quickly organize & summarize data
- grouping data

1) frequency distribution: high - low

   pet type   frequency   proportion      %
   dog           20       0.43 (20/46)   43.00 (0.43 x 100)
   cat           15       0.33           33.00
   turtle        11       0.24           24.00
                 46       1.00          100.00

2) grouping in classes
   a) aim for 12 - 15 groups
   b) mutually exclusive
   c) same width
   d) don't omit intervals
   e) make widths convenient
   width = (hi - lo + 1) / # groups = i

example:
   84  96  99 100 100 111 116
   85  97 100 100 104 111 117
   87  97 100 101 104 111 118
   80  97 100 102 105 111 124
   81  97 100 103 104 111 124
   88  98 101 102 106 111 125
   89  98 101 100 105 111 125
   90  98 101 101 104 111 126
   92  98 101 102 105 112 127
   92  99 102 100 105 112 129
   93  99 102 100 110 113 134
   95  99 103 100 110 113
   96  99 103 100 111 114
   96  99 100 100 111 115

   i = (134 - 80 + 1)/15 = 3.67 ~ 4
   START AT THE BOTTOM WITH THE LOW #

   Interval     True Limits        f    Midpoint
   132 - 135    131.50 - 135.50    1    133.50
   128 - 131    127.50 - 131.50    1    129.50
   124 - 127    123.50 - 127.50    6    125.50
   120 - 123    119.50 - 123.50    0    121.50
   116 - 119    115.50 - 119.50    3    117.50
   112 - 115    111.50 - 115.50    6    113.50
   108 - 111    107.50 - 111.50   12    109.50
   104 - 107    103.50 - 107.50    9    105.50
   100 - 103     99.50 - 103.50   28    101.50
    96 - 99      95.50 -  99.50   17     97.50
    92 - 95      91.50 -  95.50    4     93.50
    88 - 91      87.50 -  91.50    3     89.50
    84 - 87      83.50 -  87.50    3     85.50
    80 - 83      79.50 -  83.50    2     81.50

   midpoint = (hi true + lo true) / 2

- cumulative data

   class grades   f    cum f   cum prop   cum %
   91 - 100       6     32     1.00      100.00
   81 - 90        4     26     0.8125     81.25
   71 - 80        9     22     0.6875     68.75
   61 - 70       11     13     0.4062     40.62
   51 - 60        2      2     0.0625      6.25
                 32

Percentiles & Percentile Ranks
- a score alone means nothing; it must be compared to a standard or base score; we can do this with percentiles
- percentiles: #s that divide a distribution into 100 equal parts
- percentile rank (PR): # that represents the % of cases in a comparison group that achieved scores < the one cited
  e.g., a PR of 95 on the SAT means 95% of those taking the SAT at the same time did worse than you & 5% did better

some symbols:
  cumfll = cum freq at the lower true limit of X
  X = score
  Xll = score at the lower true limit of X
  i = width
  fi = # cases in X's group
  N = total # scores

1) Getting PR from a score (X):
   PR = [cumfll + ((X - Xll)/i)(fi)] / N x 100

   Class (X)   limits          f   cum f   cum %
   93 - 95     92.50 - 95.50   4    25     100.00
   90 - 92     89.50 - 92.50   3    21      84.00
   87 - 89     86.50 - 89.50   2    18      72.00
   84 - 86     83.50 - 86.50   7    16      64.00
   81 - 83     80.50 - 83.50   6     9      36.00
   78 - 80     77.50 - 80.50   3     3      12.00

   What is the PR of 88?
   X = 88, cumfll = 16, Xll = 86.5, i = 3, fi = 2, N = 25
   PR = [16 + ((88 - 86.50)/3)(2)] / 25 x 100 = 68
   NB: PR goes from 0 - 100

2) Getting a score (X) from PR:
   cumf = (PR x N)/100
   X = Xll + [i(cumf - cumfll)/fi]

   (same table as above)

   What is the score for a PR of 75?
   cumf = 75 x 25/100 = 18.75
   Xll = 89.5, i = 3, cumf = 18.75, cumfll = 18, fi = 3
   X = 89.5 + [3(18.75 - 18)/3] = 90.25

CHAPTER 3 HOMEWORK PART 1
NOT IN BOOK
1) Given the following set of data:
   67 45 45 35 25 56 37 28 59 45
   63 45 34 37 36 17 42 75 61 41
   64 46 34 61 26 26 32 32 40 38
   57 47 15 24  5  5 29 31 41 14
   56 37 23 14 44 14 90 52 43 57
   55 23 43 43 13 23 44 49 49 25
   53 34 16 37 33 45 46 65 38 20
   53 44 44 27 33 59 45 54 31 15
   54 27 36 36 17 19 66 15 19 16
   a. What is the class width if you want 18 groups?
   b. Construct a frequency distribution
   c. What is the PR if X = 36?
   d. What is X if PR = 98?

NOT IN BOOK ANSWERS
1a) (90 - 5 + 1)/18 = 4.7 ~ 5
 b)
   group     limits          mdpt    f   cumf   cum%
   90 - 94   89.50 - 94.50    92     1    90    100.00
   85 - 89   84.50 - 89.50    87     0    89     98.89
   80 - 84   79.50 - 84.50    82     0    89     98.89
   75 - 79   74.50 - 79.50    77     1    89     98.89
   70 - 74   69.50 - 74.50    72     0    88     97.78
   65 - 69   64.50 - 69.50    67     3    88     97.78
   60 - 64   59.50 - 64.50    62     4    85     94.44
   55 - 59   54.50 - 59.50    57     7    81     90.00
   50 - 54   49.50 - 54.50    52     5    74     82.22
   45 - 49   44.50 - 49.50    47    11    69     76.67
   40 - 44   39.50 - 44.50    42    11    58     64.44
   35 - 39   34.50 - 39.50    37    10    47     52.22
   30 - 34   29.50 - 34.50    32     9    37     41.11
   25 - 29   24.50 - 29.50    27     8    28     31.11
   20 - 24   19.50 - 24.50    22     5    20     22.22
   15 - 19   14.50 - 19.50    17     9    15     16.67
   10 - 14    9.50 - 14.50    12     4     6      6.67
    5 - 9     4.50 -  9.50     7     2     2      2.22
 c) PR = [37 + ((36 - 34.50)/5)(10)] / 90 x 100 = 44.44
 d) cumf = 98 x 90/100 = 88.20
    X = 74.50 + [5(88.2 - 88)/1] = 75.50

- What types of graphs are used most often in psychology?
- Are there rules for which one to use?
- Are there rules about how to make them?
- Does the shape of the graph mean anything useful?

Chapter 3 - Graphing
- visual methods to display data
  a) figure: pictorial; photo, drawing
  b) table: organized numerical info
  c) graph: pictorial; axes, #s, etc.
- basics of graphing
  a) X-axis (abscissa): horizontal; IV
  b) Y-axis (ordinate): vertical; DV
  c) always label the axes - note the units
  d) Y starts at 0; continuous, no breaks; X can change its start, can break, can be discrete
  e) Y should be about 0.75 the length of X

1) Bar Graph: nominal, sometimes ordinal
   a) bar = category
   b) height = frequency
   c) bars DO NOT touch
   d) if ordinal, must preserve the order
   e) can be vertical or horizontal

   Pet    Women   Men
   Dog      20     10
   Cat      15     15
   Fish      8      5
   Bird      5     14

   [bar graph: frequency (0 - 20) by type of pet, women vs men]

2) Histogram: interval, ratio data, sometimes ordinal
   a) same rules as bar, only the bars DO touch
   b) usually for discrete data

   Grade   Freq
   F         2
   D         4
   C        20
   B        15
   A        10

   [histogram: frequency (0 - 25) by grade]

3) Frequency or Line Graph: interval, ratio, sometimes ordinal
   a) usually for continuous data

   Weight   freq
   56        2
   57        2
   58        4
   59        6
   60        5

   [line graph: frequency (0 - 7) by weight]

4) Cumulative Frequency: can be a bar, histogram or line graph, but uses cumulative freq, proportion or %
   a) the line graph version is typically s-shaped, or an ogive
   b) always increases
   e.g., 12 people on a drug to cure disease X. Left graph = # cured in each time period. Right graph = cum % cured over time.
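The cumulative column behind an ogive is just a running total turned into percentages. A sketch in Python: the per-period "# cured" counts below are made-up stand-ins (the actual values from the slide did not survive), but the always-increasing property holds for any counts:

```python
# Cumulative frequency and cumulative % for an ogive.
# The cured-per-period counts are hypothetical (illustration only).
months = [1, 3, 6, 9, 12]
cured = [2, 3, 4, 2, 1]   # 12 patients in total

n = sum(cured)
cum_f, running = [], 0
for f in cured:
    running += f          # running total = cumulative frequency
    cum_f.append(running)
cum_pct = [100 * c / n for c in cum_f]

for m, c, p in zip(months, cum_f, cum_pct):
    print(f"month {m:2d}: cum f = {c:2d}, cum % = {p:6.2f}")

# an ogive never goes down
assert all(a <= b for a, b in zip(cum_f, cum_f[1:]))
```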
[two graphs for the cure example: left, # cured in each period; right, cum % cured; x-axis = months on drug (1, 3, 6, 9, 12)]

Forms of Frequency Curves
1) Normal (bell-shaped) curve: symmetric
   a) mesokurtic: ideal (middle)
   b) leptokurtic: peaked (leaping)
   c) platykurtic: flat (prairie)
2) skew: not symmetric
   a) positive skew: fewer scores at the high end; shifted to the left
   b) negative skew: fewer scores at the low end; shifted to the right

CHAPTER 3 HOMEWORK PART 2
IN THE BOOK
3.1 - plot the data provided assuming the scores could have decimals (even though not shown); also plot the top row as "passage" and the bottom as "non-passage" groups, where the x-axis is called subject and there are 14 subjects in each group
3.22 - plot the data provided using total households only
3.23 - plot the data using total # births only
NOT IN THE BOOK
1) Draw a graph showing
   a. Positive skew
   b. Negative skew
   c. Normal distribution
   d. Platykurtic distribution
   e. Leptokurtic distribution

IN THE BOOK ANSWERS
[3.1: line graph of score (20 - 60) by subject (1 - 13) for passage and non-passage groups]
[3.20: bar graph of # (0 - 12000) by group (white, black, na, hispanic, asian, foreign) for 1982, 1991, 2005]
[3.22: line graph of # (0 - 100000) by year (1960 - 1990)]
NOT IN THE BOOK ANSWERS
[sketches of the five requested curves: a) positive skew b) negative skew c) normal d) platykurtic e) leptokurtic]

FORMULAS
z = (X - X̄)/s = (X - μ)/σ
SIR = (Q3 - Q1)/2
X̄ = ΣX/n
s3 = [3(X̄ - median)]/s
Range = hi - lo
X̄w = ΣfX̄/ntot
s4 = 3 + [(Q3 - Q1) / 2(P90 - P10)]
md = Xll + i[((N/2) - cumfll)/fi]
s² = Σ(X - X̄)²/n
SS = ΣX² - (ΣX)²/n
s² = SS/n
s = √s²

- Sid wants to know the average age of people in the mall before the stores open.
- Dr. Smith has 4 classes, each with a different number of pupils. He has the average grade on the last quiz for each of the 4 classes but wants to know the overall average.
- If we include all the billionaires in the calculation of the average US income, will it be inflated because of the few very high values? Is there a better measure than the mean?
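Dr. Smith's problem is the weighted (grand) mean, and the billionaire problem is the mean's sensitivity to extreme values; both show up in Chapter 4. A sketch using Python's statistics module, with the class means and sizes taken from the Chapter 4 worked example and the outlier data from its "sensitive to extreme values" property:

```python
from statistics import mean, median

# Grand mean: weight each class mean by its class size (Xw = Σ f·X̄ / n_tot)
class_means = [75, 78, 72, 80]
class_sizes = [30, 40, 25, 50]
grand_mean = sum(m * n for m, n in zip(class_means, class_sizes)) / sum(class_sizes)
print(round(grand_mean, 2))   # 77.03, not the unweighted 76.25

# One extreme value drags the mean up but leaves the median alone
scores = [2, 3, 5, 7, 8]
with_outlier = [2, 3, 5, 7, 33]
print(mean(scores), median(scores))             # mean 5, median 5
print(mean(with_outlier), median(with_outlier)) # mean 10, median still 5
```

For skewed data like incomes, the median is the better "typical value", which is the answer to the billionaire question.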
Chapter 4 - Central Tendency

A) Arithmetic Mean (average): X̄ = ΣX/n
   4 + 2 + 6 + 4 + 5 = 21; 21/5 = 4.20 = X̄

1) from an ungrouped frequency distribution: X̄ = ΣfX/n

   X    f    fX
   10   4    40
    9   2    18
    8   6    48
    7   2    14
    6   5    30
    5   1     5
       20   155

   X̄ = 155/20 = 7.75

2) Weighted Mean: mean of a group of means
   e.g., 4 classes with mean exam scores of 75, 78, 72, 80. What is the overall or grand mean?
   a) if each class has the same # of people: (75 + 78 + 72 + 80)/4 = 76.25
   b) if each class has a different # of people, you must account for it:

      class X̄   f     fX̄
      75        30    2250
      78        40    3120
      72        25    1800
      80        50    4000
               145   11170

      X̄w = ΣfX̄/Ntot = 11170/145 = 77.03

B) Median: midpoint of a distribution of scores so that ½ fall above & ½ fall below = the 50th percentile

1) for continuous scores: md = Xll + i[((N/2) - cumfll)/fi]

   true limits      f    cumf
   68.50 - 71.50    13   101
   65.50 - 68.50    15    88
   62.50 - 65.50    20    73
   59.50 - 62.50    28    53   <= N/2 = 50.5 falls in this group
   56.50 - 59.50    19    25
   53.50 - 56.50     6     6

   to find the group: N/2 = 101/2 = 50.50; find 50.5 in the cumf column
   md = 59.50 + 3[((101/2) - 25)/28] = 62.23

   Good for skewed, truncated & open-ended distributions
   - truncated: use only part of the distribution
   - open-ended: the top or bottom category has only 1 limit; e.g., 68.50+ for the top category, < 53.50 for the bottom category

2) median for arrays of scores
   a) if N is odd => put in ascending order, find the middle #
      56, 6, 13, 31, 28 => 6, 13, 28, 31, 56 => md = 28
   b) if N is even => ascending order, take the mean of the 2 middle #s
      6, 13, 28, 31, 56, 72 => (28 + 31)/2 = 29.50
   c) if N is even but the middle 2 #s are the same => use the formula
      1, 2, 4, 6, 6, 6, 7, 12

      X    f   cumf
      12   1    8
       7   1    7
       6   3    6
       4   1    3
       2   1    2
       1   1    1

      8/2 = 4 => falls in the "6" group
      md = 5.5 + 1[((8/2) - 3)/3] = 5.83

C) Mode: most common score; crude measure
1) 1, 3, 4, 6, 7, 7, 7, 9, 9: mode = 7
   2, 2, 4, 9, 9: mode = 2, 9
2) class          f
   68.5 - 71.5    10
   65.5 - 68.5    15
   62.5 - 65.5     9
   59.5 - 62.5    10
   1) find the highest f value
   2) report the midpoint as the mode
   mode = (68.5 + 65.5)/2 = 67

- Which to use?
1) mode: quick & easy but crude; not unique - can have 2+
2) median: skewed, truncated, open-ended distributions
3) mean: most common; normal distributions

Some properties of the mean:
a) summed deviations = 0: Σ(X - X̄) = 0

   X    X - X̄
   4    4 - 5.5 = -1.5
   3    3 - 5.5 = -2.5
   9    9 - 5.5 = 3.5
   6    6 - 5.5 = 0.5
        sum = 0

b) sensitive to extreme values (skew)
   2, 3, 5, 7, 8   X̄ = 5    md = 5
   2, 3, 5, 7, 33  X̄ = 10   md = 5
c) can't use with an open-ended distribution

Mean, Median & Skew relationship:
a) mean > median => positive skew
b) mean < median => negative skew
c) mean = median => no skew

CHAPTER 4 HOMEWORK
IN THE BOOK
4.1
NOT IN THE BOOK
1) Find the mean, median & mode for
   a. 10, 8, 6, 0, 8, 3, 2, 5, 8, 0
   b. 119, 5, 4, 4, 4, 3, 1, 0
2) Find the weighted mean for:

   Person   X̄ items sold   # days
   Amy      1.75            4
   Bob      2.0             5
   Carrie   2.4             5
   Diana    2.5             4
   Elyssa   2.0             3
   Fred     1.67            3

CHAPTER 4 – HOMEWORK ANSWERS
IN THE BOOK
4.1) mean: 1193/17 = 70.18
     median: 55, 56, 56, 59, 66, 66, 71, 71, 72, 72, 72, 72, 73, 73, 75, 91, 93 => 72
     mode: 72
NOT IN THE BOOK
1a) 0, 0, 2, 3, 5, 6, 8, 8, 8, 10
    X̄ = 50/10 = 5; mode = 8; md = (5 + 6)/2 = 5.50
 b) 119, 5, 4, 4, 4, 3, 1, 0
    X̄ = 140/8 = 17.50; mode = 4

    X     f   cumf
    119   1    8
      5   1    7
      4   3    6
      3   1    3
      1   1    2
      0   1    1

    8/2 = 4
    md = 3.5 + 1[((8/2) - 3)/3] = 3.83
2)
    X̄      f    fX̄
    1.75    4    7
    2.0     5   10
    2.4     5   12
    2.5     4   10
    2.0     3    6
    1.67    3    5.01
           24   50.01

    X̄w = 50.01/24 = 2.08

- Al calculated the average height of people in a random sample to figure out how high he should make the pull-down security bars on a new roller coaster. He says the average height is 5'10", but his boss says not everyone is 5'10". He wants to know about what height to expect - what is the dispersion or spread of heights?
- Betty graphs data she collected on the frequency of failing grades for grammar school students as a function of TV shows watched and finds a very peaked graph shifted to the left. She knows it's leptokurtic and skewed, but can she attach values to say how leptokurtic and how skewed?
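Betty's question is exactly what the skew and kurtosis coefficients in the next chapter quantify. As a preview, a sketch using the course's own formulas and the values from the "not in book" Chapter 5 homework (mean 30, s 5, median 25; from its percentile table, Q1 = 10, Q3 = 70, P10 = 5, P90 = 85):

```python
# Pearson's coefficient of skew and the percentile coefficient of kurtosis,
# as defined in these notes (Chapter 5), applied to the homework values.
mean_x, s, median_x = 30, 5, 25
q1, q3, p10, p90 = 10, 70, 5, 85

sir = (q3 - q1) / 2                      # semi-interquartile range: 30.0
s3 = 3 * (mean_x - median_x) / s         # skew: 3.0 (positive skew)
s4 = 3 + (q3 - q1) / (2 * (p90 - p10))   # kurtosis: 3.375 ~ 3.38 (slightly leptokurtic)

print(sir, s3, round(s4, 2))
```

Since s3 is well outside ±0.5 and s4 exceeds 3, these numbers say "strongly positively skewed, slightly peaked", matching the answers given later in the notes.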
Chapter 5 – Dispersion/Variability

- dispersion: spread or variability of scores around a central tendency measure

1) range: hi score - lo score
   11, 17, 9, 3, 20, 36: 36 - 3 = 33

2) semi-interquartile range (SIR) or Q2: use with the median; median ± SIR cuts off the middle 50% of scores
   SIR = Q2 = (Q3 - Q1)/2

   PR   X
   90   80
   75   70
   50   40
   35   30
   25   10
   10    5

   Q3 = score at the 75th PR; Q1 = score at the 25th PR
   SIR = Q2 = (70 - 10)/2 = 30

3) variance or mean square (s² or σ²) & standard deviation or root mean square (s or σ)
   a) use with the mean
   b) can be used to compare distributions
   c) quite precise
   d) used in statistical tests later on
   e) large values = high error, low precision; small values = low error, high precision

   1) Mean Deviation Method: long, but shows how scores vary from the mean
      s² = Σ(X - X̄)²/n = SS/n;  s = √s²

      X    X - X̄      (X - X̄)²
      65   -14.375    206.64
      90    10.625    112.89
      84     4.625     21.39
      76    -3.375     11.39
      81     1.625      2.64
      98    18.625    346.89
      82     2.625      6.89
      59   -20.375    415.14
             0        1123.87 = SS

      n = 8; X̄ = 79.375
      s² = 1123.87/8 = 140.48
      s = √140.48 = 11.85

   2) Raw Score Method: easier; less intuitive about the mean
      SS = ΣX² - (ΣX)²/n; s² = SS/n; s = √s²

      X     X²
      65    4225
      90    8100
      84    7056
      76    5776
      81    6561
      98    9604
      82    6724
      59    3481
      635   51527

      SS = 51527 - (635)²/8 = 1123.875
      s² = 1123.875/8 = 140.48
      s = √140.48 = 11.85

- homogeneous sample: data values similar => low s² & s
- heterogeneous sample: data values dissimilar => high s² & s

- Pearson's Coefficient of Skew: + or - and how much
  s3 = [3(X̄ - median)]/s
  X̄ = 20, s = 5, md = 24
  s3 = [3(20 - 24)]/5 = -2.40
  Generally ±0.5 is ~ symmetrical/normal

- Kurtosis: peaked or flat
  s4 = 3 + [(Q3 - Q1) / 2(P90 - P10)]
  P90 = score at the 90th PR; P10 = score at the 10th PR

  PR   X
  90   100
  75    90
  60    70
  50    40
  25    20
  10     5

  s4 = 3 + [(90 - 20) / 2(100 - 5)] = 3.37
  3 = mesokurtic; < 3 = platykurtic; > 3 = leptokurtic

CHAPTER 5 HOMEWORK
IN THE BOOK
5.1 a) use the top row of numbers only & the mean deviation method
    b) use the middle row of numbers & the raw score method
NOT IN BOOK

   PR    X
   100   90
    90   85
    75   70
    60   50
    50   40
    35   20
    25   10
    10    5
     5    2

   X̄ = 30, s = 5, md = 25
   1) Find the SIR
   2) Find the SKEW
   3) Find the KURTOSIS

CHAPTER 5 HOMEWORK ANSWERS
IN BOOK
1a)
   X     X - X̄     (X - X̄)²
   54     5.33      28.41
   52     3.33      11.09
   51     2.33       5.43
   50     1.33       1.77
   36   -12.67     160.53
   55     6.33      40.07
   44    -4.67      21.81
   46    -2.67       7.13
   57     8.33      69.39
   44    -4.67      21.81
   43    -5.67      32.15
   52     3.33      11.09
         -0.04*    410.68

   * ~0 - not exact because of rounding
   X̄ = 48.67, n = 12
   s² = 410.68/12 = 34.22
   s = √34.22 = 5.85
   range = 57 - 36 = 21

 b)
   X     X²
   38    1444
   46    2116
   55    3025
   34    1156
   44    1936
   39    1521
   43    1849
   36    1296
   55    3025
   57    3249
   36    1296
   46    2116
   529  24029

   SS = 24029 - (529)²/12 = 708.92
   s² = 708.92/12 = 59.08
   s = √59.08 = 7.69
   range = 57 - 34 = 23

NOT IN BOOK
   X̄ = 30, s = 5, md = 25
   SIR = (70 - 10)/2 = 30
   s3 = [3(30 - 25)]/5 = 3
   s4 = 3 + [(70 - 10) / 2(85 - 5)] = 3.38

- Is there a simpler method to examine percentile ranks and compare values other than the PR formula?
- Mitch has the mean and standard deviation for a quiz the class just took. He also has his own grade on the quiz. How can he determine how many people did worse than him and how many did better?
- If you know a country club takes people whose income is in the top 5% of the city, and you know the average income of the city and its standard deviation, can you use your income to figure out if you can get into the club?
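The raw-score method in answer 5.1b is easy to verify by machine. A sketch in Python; note that these notes divide by n (a population-style variance), which matches statistics.pstdev rather than statistics.stdev:

```python
import math

# Raw-score method on the Chapter 5 homework data (answer 5.1b)
x = [38, 46, 55, 34, 44, 39, 43, 36, 55, 57, 36, 46]

n = len(x)
sum_x = sum(x)                    # ΣX  = 529
sum_x2 = sum(v * v for v in x)    # ΣX² = 24029
ss = sum_x2 - sum_x ** 2 / n      # SS  = ΣX² - (ΣX)²/n ~ 708.92
s2 = ss / n                       # variance ~ 59.08
s = math.sqrt(s2)                 # standard deviation ~ 7.69

print(round(ss, 2), round(s2, 2), round(s, 2))
```

The same three numbers fall out of statistics.pstdev(x) squared and rooted, which is a quick cross-check on hand calculations.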
Chapter 6 - z-scores or standard scores

- z-score: represents the distance between a score & the mean, relative to s
  1) can be used to compare 2 different variables, because z-scores are abstract numbers without units
  2) if scores are normally distributed, can relate directly to PR via the "Standard Normal Distribution" = a theoretically ideal normal distribution where:
     μ = 0    σ = 1    total area under the curve = 1.0 or 100%
     50% of the area lies below the mean (−) and 50% above it (+)
     [figure: normal curve showing 68.26% of the area within ±1σ, 95.44% within ±2σ, 99.74% within ±3σ]
  3) when you transform data to z-scores
     a) mean = 0
     b) sum of the squared z-scores = n
     c) s = 1

  z = (X − X̄)/s   (sample)        z = (X − μ)/σ   (population)

e.g., for IQ, μ = 100, σ = 15; someone got an IQ of 130
  z = (130 − 100)/15 = +2.00, so they are 2 standard deviations above the mean

e.g., 2 scores from different distributions are hard to compare; z-scores let you do it
  psych: μ = 50, σ = 10     bio: μ = 48, σ = 4
  Bob got a 60 in psych & a 56 in bio; in which course should he expect the better grade?
  psych: z = (60 − 50)/10 = +1.00
  bio:   z = (56 − 48)/4 = +2.00
  would expect the better grade in bio!!!

e.g., of the properties

   ht    z(ht)    z²(ht)        wt       z(wt)    z²(wt)
   6'     0.27    0.0729        200 lb    0.31    0.0961
   5'    −1.1     1.21          150 lb   −0.78    0.6084
   5'    −1.1     1.21          120 lb   −1.44    2.0736
   6'     0.27    0.0729        210 lb    0.52    0.2704
   7'     1.6     2.56          250 lb    1.39    1.9321
   ht: X̄ = 5.80, s = 0.75, N = 5;  wt: X̄ = 186.00, s = 45.87, N = 5
   for each z column: mean = 0, s = 1, Σz² ≈ N = 5

=======================================================

1) assume X = 650, μ = 600, σ = 100. What % did worse than X?
   z = (650 − 600) / 100 = 0.50
   Table A (pages 548–549):
     column a = z-score
     column b = area between the mean & z
     column c = area beyond z
   area between = 0.1915, so 0.1915 + 0.5 = 0.6915 ⇒ 69.15% did worse, or PR = 69.15

2) X = 400, μ = 600, σ = 100. What % did worse?
   z = (400 − 600) / 100 = −2
   area beyond = 0.0228 ⇒ 2.28% did worse, or PR = 2.28

3) What % of cases fall between X = 650 and X = 400 if μ = 600, σ = 100?
   z = (650 − 600) / 100 = 0.5
   z = (400 − 600) / 100 = −2
   0.1915 + 0.4772 = 0.6687 ⇒ 66.87%

4) What % fall between X = 700 and X = 800 if μ = 600, σ = 100?
z = (700 − 600) / 100 = 1
z = (800 − 600) / 100 = 2
0.4772 − 0.3413 = 0.1359 ⇒ 13.59%

RULE: both z's + or both − ⇒ subtract the column b values; one + and one − ⇒ add the column b values

5) Suppose a golf club takes only the top 3% of the population in income, where μ = 500k, σ = 25k. You make 520k. Can you get in?
   0.03 or 3%: column c gives the area beyond, so find 0.03 in column c & get the z that goes with it ⇒ z = 1.88
   so....
   1.88 = (X − 500) / 25
   X = (1.88)(25) + 500 = 547k
   so you cannot get in!!!

6) Suppose μ = 600, σ = 100; what is the score at the 60th percentile?
   area above = 0.40: column c ⇒ 0.4013 ⇒ z = 0.25
   0.25 = (X − 600)/100
   X = (0.25)(100) + 600 = 625

7) Suppose μ = 600, σ = 100; between what scores do the middle 30% lie?
   0.15 on each side of the mean: column b ⇒ 0.15 ⇒ z = ±0.39
    0.39 = (X − 600)/100 ⇒ X = 639
   −0.39 = (X − 600)/100 ⇒ X = 561

8) Suppose μ = 600, σ = 100; beyond what scores do the most extreme 20% lie?
   0.10 in each tail: column c ⇒ 0.10 ⇒ z = ±1.28
    1.28 = (X − 600)/100 ⇒ X = 728
   −1.28 = (X − 600)/100 ⇒ X = 472

CHAPTER 6 - HOMEWORK
NOT IN BOOK
1) You have a normal distribution based on 1000 scores with a mean of 50 and sd of 10.
   a. find the proportion of area & # of cases between the mean and 60
   b. find the percent of area & # of cases between the mean and 25
   c. find the proportion of area & # of cases above 70
   d. find the percent of area & # of cases above 45
   e. find the proportion of area & # of cases between 60 and 70
   f. find the percent of area & # of cases between 45 and 70
2) You have a normal distribution with a mean of 72 and sd of 12.
   a. What is the score at the 25th percentile?
   b. Between what scores do the middle 50% of cases lie?
   c. Beyond what scores do the most extreme 10% of cases lie?

CHAPTER 6 – HOMEWORK ANSWERS
1)
a) (60 − 50)/10 = 1
   proportion = 0.3413; 0.3413 × 1000 = 341.3 cases
b) (25 − 50)/10 = −2.5
   0.4938 × 100 = 49.38%; 0.4938 × 1000 = 493.8 cases
c) (70 − 50)/10 = 2
   proportion = 0.0228; 0.0228 × 1000 = 22.8 cases
d) (45 − 50)/10 = −0.5
   0.6915 × 100 = 69.15%; 0.6915 × 1000 = 691.5 cases
e) (60 − 50)/10 = 1;  (70 − 50)/10 = 2
   0.4772 − 0.3413 = 0.1359; 0.1359 × 1000 = 135.9 cases
f) (45 − 50)/10 = −0.5;  (70 − 50)/10 = 2
   0.4772 + 0.1915 = 0.6687; 0.6687 × 100 = 66.87%; 0.6687 × 1000 = 668.7 cases

2) a) area below = 0.25 ⇒ z = −0.67
      −0.67 = (X − 72)/12 ⇒ X = 63.96
   b) 0.25 on each side of the mean ⇒ z = ±0.68
       0.68 = (X − 72)/12 ⇒ X = 80.16
      −0.68 = (X − 72)/12 ⇒ X = 63.84
   c) 0.05 in each tail ⇒ z = ±1.64
       1.64 = (X − 72)/12 ⇒ X = 91.68
      −1.64 = (X − 72)/12 ⇒ X = 52.32

Formula sheet (for the chapters that follow):
sest y = sy √[N(1 − r²) / (N − 2)]
by = (r)(sy/sx)       a = Ȳ − by X̄       Y = a + by X
r = Σ(zx zy) / N
rs = 1 − [(6ΣD²) / [N(N² − 1)]]
zy' = (r)(zx)         Y' = Ȳ + (zy')(sy)
Y' = Ȳ + [(r)(sy/sx)(X − X̄)]
r = [ΣXY − (ΣX)(ΣY)/N] / √{[ΣX² − (ΣX)²/N][ΣY² − (ΣY)²/N]}
1 = r² + k²

- Sue wants to know if there is a relationship between how well students do on a quiz and how much test anxiety they report prior to taking it.
- Bill has teachers rank their students by how popular they think they are and then wants to know if there is a relationship between the popularity ranks and the students' GPA.
- Sandy wants to know if there is a relationship between the number of depressed people and SES.
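The Table A lookups used throughout Chapter 6 can be checked against the standard normal CDF (a sketch in Python, not part of the course materials; `NormalDist` is in the standard library from Python 3.8 on):

```python
from statistics import NormalDist

# Area below a z-score: what Table A's columns b and c are derived from.
sn = NormalDist(mu=0, sigma=1)

z = (650 - 600) / 100                # Chapter 6, problem 1
print(round(sn.cdf(z), 4))           # 0.6915 -> PR = 69.15

# Inverse lookup: the z cutting off the top 3% (the golf-club problem).
z_cut = sn.inv_cdf(1 - 0.03)
print(round(z_cut, 2))               # 1.88
print(round(500 + 25 * z_cut))       # income cutoff, about 547k
```

`cdf` replaces "0.5 + column b" for positive z, and `inv_cdf` replaces searching column c for a target area.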
Chapter 9 - Correlation

- correlation: relationship between 2 variables
- correlation coefficient: measure used to express the extent or strength of the relationship
  1) positive correlation: 0 < r < 1; score high on 1 variable & high on the other; score low on 1 variable & low on the other; positive slope; +1.0 = perfect correlation
  2) negative correlation: −1 < r < 0; score high on 1 variable & low on the other; negative slope; −1.0 = perfect correlation
  3) 0 = no correlation, no linear relationship
  4) looking for a linear relationship - others exist (e.g., U-shaped), but correlation only measures linear
  5) correlation ≠ causation
  6) |r| < 0.29: small correlation, weak relationship
     |r| 0.3 - 0.49: medium correlation/relationship
     |r| 0.5 - 1.0: large correlation, strong relationship

- scatter diagram: graphic means to show the data points & correlation & (later) regression
- centroid: the (X̄, Ȳ) point
  Ht: 2  4  5   9    (mean = 5)
  Wt: 3  7  10  11   (mean = 7.75)
  [scatter plot of wt against ht with the centroid (5, 7.75) marked]

1) Pearson r: for interval & ratio data
   a) z-score method:  r = Σ(zx zy) / N     N = # of pairs

   X    zx      Y    zy      zx·zy
   1   −1.5     4   −1.5     2.25
   3   −1.0     7   −1.0     1.00
   5   −0.5    10   −0.5     0.25
   7    0      13    0       0
   9    0.5    16    0.5     0.25
   11   1.0    19    1.0     1.00
   13   1.5    22    1.5     2.25
                        Σ =  7
   r = 7/7 = 1.00

   Good if you already have z-scores; otherwise it's a pain!
   If you already have the info Σ(zx zy) = 4.90, N = 7, then 4.9/7 = 0.70 and it's easy.

   b) raw score method
      r = [ΣXY − (ΣX)(ΣY)/N] / √{[ΣX² − (ΣX)²/N][ΣY² − (ΣY)²/N]}
      numerator = covariance: degree to which the 2 variables share common variance;
      high covariance = more linear, closer to ±1; low covariance = less linear, closer to 0

   X    X²     Y    Y²     XY
   1     1     7    49      7
   3     9     4    16     12
   5    25    13   169     65
   7    49    16   256    112
   9    81    10   100     90
   11  121    22   484    242
   13  169    19   361    247
   49  455    91  1435    775

   ΣX = 49   ΣX² = 455   (ΣX)² = 2401
   ΣY = 91   ΣY² = 1435  (ΣY)² = 8281
   ΣXY = 775   N = 7

   r = [775 − (49)(91)/7] / √{[455 − 2401/7][1435 − 8281/7]}
   r = +0.82

   N.B.
you can get a negative numerator, but never a negative denominator

- If r = ±1, all the data fall on a line; if |r| < 1, the data are scattered. There are 3 types of variation:
  total = explained (r²) + unexplained (k²)
  if r = ±1 all is explained; if r = 0 all is unexplained
  a) r² = coefficient of determination: proportion of 1 variable explained by the other
  b) k² = coefficient of non-determination: proportion of 1 variable not explained by the other
  total = 1 or 100%, so.... 1 = r² + k²  ⇒  k² = 1 − r²
  e.g., r = 0.84: r² = 0.71, k² = 1 − 0.71 = 0.29

- cautions with Pearson r
  1) it measures linearity, so a low r means not linear; there could still be a non-linear relationship
  2) the distribution need not be normal, but must be unimodal
  3) if the range is truncated you will get a spuriously low r

2) Spearman r (rs): for ordinal data
   a) both variables must be rank ordered
   b) non-parametric test: looks at the ranks only (a parametric test uses the actual numbers)
   rs = 1 − [(6ΣD²) / [N(N² − 1)]]
   D = rank X − rank Y     ΣD = 0     N = # of pairs

   X      rank X    Y    rank Y     D     D²
   140      1       63     6       −5     25
   120      5       70     3        2      4
   136      2       72     1        1      1
   100      6       69     4        2      4
   129      3       65     5       −2      4
   125      4       71     2        2      4
                               Σ = 0  ΣD² = 42

   rs = 1 − [(6 × 42) / [6(36 − 1)]] = −0.20

- Tied scores: if there are ties, must take this into account to be fair

   X      rank X    adjusted rank X
   140      1         1
   120      4         4.5      (4 + 5)/2 = 4.50
   136      2         2        take the mean of the tied ranks
   100      6         6        & assign that mean rank to each
   120      5         4.5
   125      3         3

- Correlation matrix: table to visualize many correlations

              kinder   grammar    high    college
   kinder      ----     0.93      0.74     0.61
   grammar              ----     −0.63    −0.54
   high                           ----     0.36
   college                                 ----

   e.g., which 2 groups correlate the most? grammar & kindergarten
   e.g., which 2 groups correlate the least? high school & college
   e.g., what is the correlation between grammar & high?
−0.63

CHAPTER 9 HOMEWORK
IN THE BOOK
9.1 a) using Benin Rep through Ghana only, for InfMort (X) and Income (Y) only
9.2 using the same data, but by hand (not SPSS); also find r² & k²
NOT IN BOOK
1) Use the Spearman rank correlation to find the correlation coefficient

             % recall   % recognition
   Sleepy       86           91
   Dopey        81           95
   Grumpy       75           86
   Sneezy       78           93
   Doc          58           80
   Happy        62           70
   Bashful      38           84

2) RANK ORDER THESE
   a) X: 7  4  6  7  9  4  2
   b) X: 76  79  81  76  63  28
   c) X: −41  −38  −42  −41  −26  −26  −41

CHAPTER 9 - HOMEWORK ANSWERS
9.1a) [scatter plot: income (y-axis, 0–7000) against infmort (x-axis, 40–120)]

9.2)
   X      X²       Y       Y²         XY
   104   10816     933     870489     97032
   109   11881     965     931225    105185
   80     6400    1573    2474329    125840
   102   10404    1166    1359556    118932
   110   12100     850     722500     93500
   91     8281    1654    2735716    150514
   76     5776     880     774400     66880
   113   12769     628     394384     70964
   61     3721    6024   36288576    367464
   61     3721    1881    3538161    114741
   907   85869   16554   50089336   1311052

   ΣX = 907   ΣX² = 85869   (ΣX)² = 822649
   ΣY = 16554   ΣY² = 50089336   (ΣY)² = 274034916
   ΣXY = 1311052   N = 10

   r = [1311052 − (907)(16554)/10] / √{[85869 − 822649/10][50089336 − 274034916/10]}
     = (1311052 − 1501447.8) / √[(3604.1)(22685845)]
     = −190395.8 / 285940.65
   r = −0.67
   r² = (−0.67)² = 0.45
   k² = 1 − 0.45 = 0.55

NOT IN BOOK
1)
   % recall   rank recall   % recog.   rank recog.    D²
   86            1             91          3           4
   81            2             95          1           1
   75            4             86          4           0
   78            3             93          2           1
   58            6             80          6           0
   62            5             70          7           4
   38            7             84          5           4
                                              ΣD² =   14

   rs = 1 − [(6 × 14) / [7(49 − 1)]] = 0.75

2) a) X:     7    4    6    7    9    4    2
      rank:  2.5  5.5  4    2.5  1    5.5  7
   b) X:     76   79   81   76   63   28
      rank:  3.5  2    1    3.5  5    6
   c) X:     −41  −38  −42  −41  −26  −26  −41
      rank:  5    3    7    5    1.5  1.5  5

- Joe has a set of data correlating number of books read per month with age. He wants to plot these data on a graph and draw a line to show the general linear trend of the data.
- Carol has a set of data on height as a function of how many grams of protein children had on average per day. She then wants to predict the height of an individual assuming they had 10 grams of protein on average per day.
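Both Chapter 9 coefficients can be verified in a few lines (a sketch in Python, not part of the course materials; the ranking helper uses the course convention of rank 1 for the highest score, with tied scores getting the mean rank):

```python
from math import sqrt

# Pearson r by the Chapter 9 raw-score formula.
def pearson_r(xs, ys):
    n = len(xs)
    num = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    den = sqrt((sum(x * x for x in xs) - sum(xs) ** 2 / n) *
               (sum(y * y for y in ys) - sum(ys) ** 2 / n))
    return num / den

# Ranks with rank 1 = highest score; ties receive the mean of their ranks.
def ranks(xs):
    ordered = sorted(xs, reverse=True)
    return [sum(i + 1 for i, v in enumerate(ordered) if v == x) / ordered.count(x)
            for x in xs]

# Spearman rs = 1 - 6*sum(D^2) / [N(N^2 - 1)], with D = rank X - rank Y.
def spearman_rs(xs, ys):
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))

X = [1, 3, 5, 7, 9, 11, 13]
Y = [7, 4, 13, 16, 10, 22, 19]
print(round(pearson_r(X, Y), 2))             # 0.82, as in the worked example

recall = [86, 81, 75, 78, 58, 62, 38]
recog  = [91, 95, 86, 93, 80, 70, 84]
print(round(spearman_rs(recall, recog), 2))  # 0.75, matching homework problem 1
```

With many ties the 6ΣD² shortcut is only approximate, which is why the notes adjust tied ranks by hand.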
Chapter 10 - Regression

- regression: allows you to predict relationships
- remember Y = mX + b as the equation for a line? We rewrite it in regression analysis as
  Y = a + by X
  X, Y = variables
  by = slope (m) (tilt)
  a = y-intercept (b) (where the line hits the y-axis)
  a) if r = ±1 it's easy to predict & draw the line; if |r| < 1 you must draw a "best fit" line
  b) some properties of the regression line
     1) the squared deviations around the line are minimal
     2) the sum of the deviations = 0
     3) new symbols X' & Y' for predictions

- To find the regression line equation:
  by = (r)(sy/sx)       a = Ȳ − by X̄       Y = a + by X

  X: 1  2  3  4  5     X̄ = 3   sx = 1.41
  Y: 5  4  3  2  1     Ȳ = 3   sy = 1.41     r = −1.0

  by = (−1)(1.41/1.41) = −1
  a = 3 − (−1)(3) = 6
  Y = 6 + (−1)X        leave X & Y as letters

- To draw the regression line for Y = 6 + (−1)X
  1) pick 2 reasonable values for X
  2) put them in the equation & solve for Y
  3) plot the 2 pairs of X,Y points
  4) connect the dots with a line
  If X = 5: Y = 6 + (−1)(5) = 1
  If X = 1: Y = 6 + (−1)(1) = 5
  [plot: the line through (1, 5) and (5, 1), passing through the centroid]

- In regression analysis you can also find X = a + bx Y and get 2 regression lines that have a certain relationship:
  r = ±1 ⇒ the 2 lines are superimposed
  r = 0 ⇒ the 2 lines intersect perpendicularly
  the intersection point = (X̄, Ȳ), the centroid
  [plots of the pairs of lines for r = 1, r = 0.75, r = 0.25, r = 0]

- To predict Y if you know X
  Y' = Ȳ + [(r)(sy/sx)(X − X̄)]
  Given: X̄ = 70   sx = 4   Ȳ = 75   sy = 8   r = 0.6
  If Sue got a 62 on X, what did she get on Y?
Y' = 75 + [(0.6)(8/4)(62 − 70)] = 65.40

- If you have z-scores
  zy' = (r)(zx)       Y' = Ȳ + (zy')(sy)
  Given: X = 62   X̄ = 70   sx = 4   zx = −2   Ȳ = 75   sy = 8   r = 0.6
  a) zy' = (0.6)(−2) = −1.20
  b) Y' = 75 + (−1.2)(8) = 65.40

- Standard Error of the Estimate (sest y): estimate of the standard deviation of the data around the regression line; k² was a version of this, but not really in terms of standard deviation
  sest y = sy √[N(1 − r²) / (N − 2)]
  r = ±1 ⇒ sest y = 0, no errors/deviation
  r = 0 ⇒ sest y is maximal
  Given: X̄ = 70   sx = 4   Ȳ = 75   sy = 8   N = 20   r = 0.60
  sest y = 8 √[20(1 − 0.6²) / (20 − 2)] = 6.75
  Larger sest y ⇒ less accurate predictions

- recall: Y' is a prediction, not a fact. Using sest y we can find an interval where we are 68% sure the true Y will be:
  Ytrue = Y' ± sest y √[1 + (1/N) + [(X − X̄)² / SSx]]
  sest y & Ytrue are influenced by the magnitude of the X & Y variance:
  low variance ⇒ better/lower sest y ⇒ better Ytrue

- Homoscedasticity: the variance of 1 variable is constant at all levels of the other variable
- Heteroscedasticity: the variance of 1 variable is not constant at all levels of the other variable

- Post-Hoc Fallacy: assuming a cause & effect relationship from correlation data

CHAPTER 10 HOMEWORK
IN THE BOOK
10.1 using Y & X1, where mean Y = 6.7, s = 0.70; mean X = 46, s = 6.29; r = 0.62 (also plot the regression line); 10.2; 10.3
NOT IN BOOK
X̄ = 20   sx = 5   X = 24   zx = 0.8   Ȳ = 50   sy = 7   r = 0.7
a) zy' = ?
b) Y' = ?
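The two Chapter 10 formulas used above, prediction from summary statistics and the standard error of the estimate, can be sketched in Python (not part of the course materials):

```python
from math import sqrt

# Prediction from summary statistics: Y' = Ybar + r*(sy/sx)*(X - Xbar)
def predict_y(x, x_bar, y_bar, sx, sy, r):
    return y_bar + r * (sy / sx) * (x - x_bar)

# Standard error of the estimate: sest_y = sy * sqrt(N(1 - r^2) / (N - 2))
def s_est_y(sy, n, r):
    return sy * sqrt(n * (1 - r * r) / (n - 2))

# Sue's predicted score (Xbar = 70, sx = 4, Ybar = 75, sy = 8, r = 0.6, X = 62):
print(round(predict_y(62, 70, 75, 4, 8, 0.6), 2))   # 65.4
print(round(s_est_y(8, 20, 0.6), 2))                # 6.75
```

Note how r scales the prediction: with r = 0 the best prediction is just Ȳ, and with r = ±1 the full z-score distance carries over.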
CHAPTER 10 – HOMEWORK ANSWERS
10.1) by = (0.62)(0.70/6.29) = 0.07
      a = 6.7 − (0.07)(46) = 3.48
      Y = 3.48 + 0.07X
      If X = 1: Y = 3.48 + (0.07)(1) = 3.55
      If X = 3: Y = 3.48 + (0.07)(3) = 3.69
      [plot: the line through (1, 3.55) and (3, 3.69)]
10.2) sest y = 0.7 √[10(1 − 0.62²) / (10 − 2)] = 0.61
10.3) Y' = 6.7 + (0.62)(0.70/6.29)(70 − 46) = 8.36
NOT IN BOOK
a) zy' = (0.7)(0.8) = 0.56
b) Y' = 50 + (0.56)(7) = 53.92

Formula sheet (for the chapters that follow):
χ² = Σ[(Oi − Ei)² / Ei]       df = (r − 1)(c − 1)
est ω² = (t² − 1) / (t² + N1 + N2 − 1)
est ω² = [SSbet − (k − 1)(s²w)] / (SStot + s²w)   OR   est ω² = [dfbet(F − 1)] / [dfbet(F − 1) + Ntot]
σx̄ = σ/√N       sx̄ = s/√(N − 1)       z = (X̄ − μ)/σx̄
HSD = q √(s²w/n)
upper limit = X̄ + (t0.05)(sx̄)       lower limit = X̄ − (t0.05)(sx̄)
t = (X̄ − μ)/sx̄       df = N − 1
SS1 = ΣX1² − [(ΣX1)²/N1]       SS2 = ΣX2² − [(ΣX2)²/N2]
sx̄1−x̄2 = √{[(SS1 + SS2)/(N1 + N2 − 2)][(1/N1) + (1/N2)]}
t = [(X̄1 − X̄2) − (μ1 − μ2)] / sx̄1−x̄2       df = N1 + N2 − 2
SStot = ΣXtot² − [(ΣXtot)²/Ntot]       dfw = Ntot − k
SSbet = Σ[(ΣXi)²/Ni] − [(ΣXtot)²/Ntot]       dfbet = k − 1
SSw = SStot − SSbet
s²bet = SSbet/dfbet       s²w = SSw/dfw       F = s²bet/s²w

- Are there any underlying concepts that guide our choice of statistical tests?
- Are there standards we can compare our results to in order to see if there are statistically significant differences?
- Are we always right, or are there errors we should be aware of?

Chapter 8 - Inferential Statistics & Errors

- goal: estimate parameters of the population from descriptive stats; compare 2+ groups of data
  1) hypothesis testing: compare samples for differences

- Step #1 = formulate all hypotheses
  1) typically have experimental & control groups: manipulated vs comparison groups, respectively
  2) hypotheses
     a) null hypothesis (H0): expect no difference
     b) alternative hypothesis (H1): expect a difference
        1) 1-tailed/directional: states how they differ (<, >)
        2) 2-tailed/non-directional: just states that they differ

- Step #2 = conduct the study, collect the data, generate summary statistics (e.g., mean, SD, etc.)
- Step #3 = choose the appropriate statistical test (i.e., formulas) that will assess the evidence (data) against the null hypothesis by generating a test statistic = a single number that assesses the compatibility of the data with H0

- Step #4 = generate the p-value = the likelihood/probability that the observed result is due to random occurrence if H0 is correct; or: if H0 is true, what is the probability of observing a test statistic as extreme as the one obtained in Step #3? p-values are typically generated by statistical software packages

- Step #5a (using software) = compare the p-value to a fixed significance level (α) at which the scientific community agrees there is statistical significance (most common = 0.05 & 0.01)
  Rule: p < α ⇒ reject H0;   p > α ⇒ accept H0
  α = 0.05, p = 0.03: reject H0, they differ
  α = 0.01, p = 0.06: accept H0, no difference

- Step #5b (by hand) =
  a) each statistical test is associated with a theoretical distribution of values (a sampling distribution) of what would happen (theoretically) if every sample of a particular size were studied (i.e., what test statistic you would expect for a given sample size)
  b) when you generate a test statistic (using a formula) you can then go to a table of the sampling distribution and, for a given α-level & sample size, find what test statistic value you would expect if H0 is true – if your test statistic > the table value, reject H0 = there is a statistically significant difference

- Central Limit Theorem (CLT): method to construct a sampling distribution of the population mean, providing a way to test H0; states that if random samples of fixed N are drawn from any population & X̄ is calculated for each, then:
  1) the distribution of the means becomes normal
  2) the grand mean approaches the mean of the population
  3) the standard deviation decreases

- standard error of the means: the overall standard deviation of the sample means

Since all of this is based on probabilities, there is always the risk that you can make an error in your decisions.
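Step #5a reduces to a one-line comparison. A minimal sketch (Python, not part of the course materials; the p-value itself would come from statistical software, as noted above):

```python
# Step 5a decision rule: reject H0 when p < alpha, otherwise retain (accept) H0.
def decide(p, alpha):
    return "reject H0" if p < alpha else "accept H0"

print(decide(0.03, 0.05))   # reject H0 -> a statistically significant difference
print(decide(0.06, 0.01))   # accept H0 -> no significant difference
```

Remember the tails rule from the next section: before comparing, the p-value must be converted to match the number of tails of α.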
- decision errors
  a) Type I (α): reject H0 when it's true
  b) Type II (β): accept H0 when it's false

                        true status of H0
   your decision      H0 true            H0 false
   accept H0          correct (1 − α)    Type II / β
   reject H0          Type I / α         correct (1 − β)

- α = 0.05 2-tail, p = 0.03 1-tail, H0: false
  0.03 × 2 = 0.06; p > α ⇒ accept H0 ⇒ Type II
- α = 0.05 1-tail, p = 0.06 2-tail, H0: true
  0.06/2 = 0.03; p < α ⇒ reject H0 ⇒ Type I
- α = 0.05 1-tail, p = 0.03 1-tail, H0: false
  p < α ⇒ reject H0 ⇒ correct
Rule: always fix the p-value (convert it to match the number of tails of α)

CHAPTER 8 - HOMEWORK
NOT IN BOOK
1) For the following, decide accept/reject, then state whether there is an error and whether it is Type I or II

      α              p              H0
   a) 0.01 1-tail    0.008 1-tail    T
   b) 0.05 2-tail    0.08  2-tail    T
   c) 0.05 1-tail    0.06  1-tail    F
   d) 0.02 1-tail    0.03  2-tail    F
   e) 0.01 2-tail    0.006 1-tail    T

CHAPTER 8 – HOMEWORK ANSWERS
1a) p < α ⇒ reject ⇒ Type I
 b) p > α ⇒ accept ⇒ correct
 c) p > α ⇒ accept ⇒ Type II
 d) p < α ⇒ reject ⇒ correct
 e) p > α ⇒ accept ⇒ correct

- John has access to all the records for inductees into the US Army since it began and knows the average IQ and standard deviation for this population. He has a group of new inductees and wants to know if their average IQ differs significantly from past years.
- Kelly knows that sampling errors always exist, so the sample mean will not exactly match the true population mean. Can she determine a range of values that will cover the true mean with some degree of confidence?

Chapter 12 - Single Sample Tests

1) z-test: know μ & σ
   σx̄ = σ/√N       z = (X̄ − μ)/σx̄       σx̄ = standard error of the mean
   e.g., μ = 250   σ = 50   X̄ = 263   N = 100; do the means differ? Use α = 0.01, 2-tailed
   σx̄ = 50/√100 = 5
   z = (263 − 250)/5 = 2.60
   from the z-table: at 0.05 reject if |z| > 1.96; at 0.01 reject if |z| > 2.58
   so.... 2.60 > 2.58 ⇒ reject the null - they differ
   Rule: test statistic > table value ⇒ reject the null
   Note: you are now getting the actual test statistic, not a p-value! Alpha guides you to a place in the table to decide whether the test statistic is < or > that criterion. Computers provide the p-value along with the answers.
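The single-sample z-test above is a one-liner in code (a sketch in Python, not part of the course materials), verified against the worked example:

```python
from math import sqrt

# Chapter 12 single-sample z-test (sigma known): z = (Xbar - mu) / (sigma/sqrt(N))
def z_test(x_bar, mu, sigma, n):
    se = sigma / sqrt(n)        # standard error of the mean
    return (x_bar - mu) / se

z = z_test(263, 250, 50, 100)
print(round(z, 2))              # 2.6
print(abs(z) > 2.58)            # True -> reject H0 at alpha = 0.01, 2-tailed
```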
2) Student's t-test: μ, X̄ & s known
   sx̄ = s/√(N − 1)       t = (X̄ − μ)/sx̄       df = N − 1
   e.g., X̄ = 85.1   s = 9.61   N = 10   μ = 72; do the means differ? Use α = 0.01, 1-tailed
   sx̄ = 9.61/√(10 − 1) = 3.2
   t = (85.1 − 72)/3.2 = 4.09
   df = 10 − 1 = 9
   go to the t-table (page 551):
   1) choose the 1-tail or 2-tail row
   2) get α for that row
   3) find df = degrees of freedom = # of values free to vary after certain restrictions are placed on the data (a reflection of sample size)
   so...... 4.09 > 2.821 ⇒ reject the null, they differ

   df: # of independent scores; e.g., if X̄ = 4.5 & n = 4 and you know 3 of the scores are 3, 4 & 5, the total must = 18 (since 18/4 = 4.5), so the last number must be 6.

   a) confidence limits for X̄: a range of values representing the probability that more samples drawn from the population will fall within it
      95% limits:  upper = X̄ + (t0.05)(sx̄)    lower = X̄ − (t0.05)(sx̄)
      99% limits:  upper = X̄ + (t0.01)(sx̄)    lower = X̄ − (t0.01)(sx̄)
      e.g., X̄ = 108   s = 15   N = 26   sx̄ = 3   df = 25
      95% limits (t-table at 0.05, ALWAYS 2-TAILED):
        upper = 108 + (2.06)(3) = 114.18
        lower = 108 − (2.06)(3) = 101.82
      99% limits (t-table at 0.01, ALWAYS 2-TAILED):
        upper = 108 + (2.787)(3) = 116.36
        lower = 108 − (2.787)(3) = 99.64
      NB: the 95% limits are "tighter" than the 99% limits:
      99.64   101.82   108   114.18   116.36

CHAPTER 12 – HOMEWORK
IN THE BOOK
12.11) where X̄ = 3.01, s = 7.18, n = 29; use μ = 0; α = 0.05, 2-tailed
12.12) compute both the 95% & 99% confidence limits
NOT IN THE BOOK
1) Using the same data from 12.11, but this time s is unknown and μ = 5.06, σ = 7.18; α = 0.01, 2-tailed

CHAPTER 12 - HOMEWORK ANSWERS
12.11) μ = 0   s = 7.18   n = 29   X̄ = 3.01   α = 0.05, 2-tailed
   sx̄ = 7.18/√(29 − 1) = 1.36
   t = (3.01 − 0)/1.36 = 2.22
   df = 28
   2.22 > 2.048 ⇒ reject H0
12.12) 95%: upper = 3.01 + (2.048)(1.36) = 5.79
            lower = 3.01 − (2.048)(1.36) = 0.22
       99%: upper = 3.01 + (2.763)(1.36) = 6.77
            lower = 3.01 − (2.763)(1.36) = −0.75
NOT IN BOOK
1) μ = 5.06   n = 29   X̄ = 3.01   σ = 7.18   α = 0.01, 2-tailed
   σx̄ = 7.18/√29 = 1.33
   z = (3.01 − 5.06)/1.33 = −1.54
   |−1.54| < 2.58 ⇒ accept H0

- Andy has two groups of rats and wants to see if what he
feeds them affects how fast they run through a maze. One group gets mashed protein bars to eat and the other gets mashed bananas. He runs them through the maze and times them. The protein group runs it in 6.5 seconds on average and the banana group runs it in 10.3 seconds. Is there a significant difference?
- Is there a way to estimate the degree to which the IV really contributes to the effect seen on the DV?

Chapter 14 - 2-Sample Tests

- Student's t-test for unknown populations
  SS1 = ΣX1² − [(ΣX1)²/N1]       SS2 = ΣX2² − [(ΣX2)²/N2]
  sx̄1−x̄2 = √{[(SS1 + SS2)/(N1 + N2 − 2)][(1/N1) + (1/N2)]}
  t = [(X̄1 − X̄2) − (μ1 − μ2)] / sx̄1−x̄2       ** μ1 − μ2 = 0 **
  df = N1 + N2 − 2

  e.g., ΣX1 = 477   ΣX1² = 29845   X̄1 = 59.63   N1 = 8
        ΣX2 = 11    ΣX2² = 101     X̄2 = 5.5     N2 = 2
        α = 0.05, 1-tail
  SS1 = 29845 − [(477²)/8] = 1403.88
  SS2 = 101 − [(11²)/2] = 40.50
  sx̄1−x̄2 = √{[(1403.88 + 40.50)/(8 + 2 − 2)][(1/8) + (1/2)]} = 10.62
  t = (59.63 − 5.50)/10.62 = 5.10
  df = 8 + 2 − 2 = 8
  5.10 > 1.86 ⇒ reject H0

- est ω² (omega-squared): many things contribute to the p-level and whether you accept or reject the null; one is ω², the degree to which the IV accounts for the variance in the DV - how much are the 2 variables related?
est ω² = (t² − 1) / (t² + N1 + N2 − 1)
- interpret like r²
- a higher ω² means the findings are more meaningful

e.g., t = 5.097 in the previous problem
est ω² = (5.097² − 1) / (5.097² + 8 + 2 − 1) = 0.714
The IV accounts for 71.4% of the variance in the DV - fairly substantial.
Can follow this with the confidence limits.

CHAPTER 14 HOMEWORK
14.11 use α = 0.05, 2-tailed; also find est ω²

CHAPTER 14 – HOMEWORK ANSWERS
14.11) ΣX1 = 169   ΣX1² = 3297   X̄1 = 18.78   N1 = 9
       ΣX2 = 141   ΣX2² = 2607   X̄2 = 17.63   N2 = 8
       α = 0.05, 2-tailed
   SS1 = 3297 − (169²)/9 = 123.56
   SS2 = 2607 − (141²)/8 = 121.88
   sx̄1−x̄2 = √{[(123.56 + 121.88)/(9 + 8 − 2)][(1/9) + (1/8)]} = 1.96
   t = (18.78 − 17.63)/1.96 = 0.59
   df = 15
   0.59 < 2.131 ⇒ accept H0
   est ω² = (0.59² − 1)/(0.59² + 9 + 8 − 1) = −0.04

- June has a new drug to control the number of manic episodes patients experience each month, but she is not sure of the most effective dose. She gets 30 manic patients and divides them randomly into 3 groups. She gives one group a low dose, one group a medium dose and one group a high dose of the drug. She then monitors them for one month, recording the number of manic episodes they experience. Group 1 has an average of 6 episodes, group 2 has 3, and group 3 has 5. Do they differ significantly in their effect on the number of manic episodes?
- Exactly which doses differ from each other?

Chapter 16 - Analysis of Variance (ANOVA)

- omnibus test: permits analysis of several variables or variable levels at the same time
- one-way ANOVA: analysis of various levels or categories of a single treatment variable
- why not do lots of t-tests?
It will give experimentwise error = drive up the probability of making Type I errors.

ANOVA: divide the total variance into between- & within-subjects variance

            test 1   test 2   test 3     X̄      s²
   Rat 1     6.3      1.3     14.6      7.4    30.1
   Rat 2     8.2      2.4     18.2      9.6    42.6
   Rat 3     7.1      1.9     17.3      8.8    40.9
   X̄         7.2      1.9     16.7
   s²        0.61     0.20     2.34

   row variances (30.1, 42.6, 40.9) = within-subject variances (one rat across the 3 tests)
   column variances (0.61, 0.20, 2.34) = between-subjects variances (the 3 rats within one test)

- ANOVA is based on the General Linear Model, a conceptual mathematical model:
  Xij = μ + τi + εij        εij = random error or error variance

e.g., blood pressure study: do the 3 means differ? α = 0.05

          active (X1)   passive (X2)   relaxed (X3)   totals
   ΣX        1407          1303           1308         4018
   ΣX²      99723         85479          86254       271456
   X̄         70.35         65.15          65.40       ------
   N           20            20             20           60

Step 1: add across all the rows to get the totals; then do the equations
1) SStot = ΣXtot² − [(ΣXtot)²/Ntot]
   271456 − [(4018²)/60] = 2383.94
2) SSbet = Σ[(ΣXi)²/Ni] − [(ΣXtot)²/Ntot]        i = each group
   (1407²/20) + (1303²/20) + (1308²/20) − (4018²/60) = 344.04
3) SSw = SStot − SSbet
   2383.94 − 344.04 = 2039.90
4) dfbet = k − 1 = 3 − 1 = 2        k = # of conditions
5) dfw = Ntot − k = 60 − 3 = 57
6) s²bet = SSbet/dfbet = 344.04/2 = 172.02        (s²bet = MSbet)
7) s²w = SSw/dfw = 2039.9/57 = 35.79        (s²w = MSw)
8) F = s²bet/s²w = 172.02/35.79 = 4.81
9) F-table on pages 558–560
   - across the top = dfbet
   - down the left = dfw
   - light # = α at 0.05
   - bold # = α at 0.01
   df = 2,57 → use 2,60; at 0.05 = 3.15
   so......
4.81 > 3.15 ⇒ reject H0: the 3 means do differ

- F was an omnibus test - it just says the 3 means differ, but not which ones; need follow-up tests to determine this
  a) a priori: decide prior to the study which tests or comparisons you will do; planned
  b) a posteriori or post hoc: do all possible pair-wise comparisons; not planned

- Tukey HSD (Honestly Significant Difference) Test (post hoc)
  HSD = q √(s²w/n)
  1) prepare a means table

            70.35    65.15    65.40
    70.35   -----    5.20*    4.95*
    65.15            -----   −0.25
    65.40                     -----

  2) do the HSD test
     HSD = 3.40 √(35.79/20) = 4.54
     q comes from table L on page 562, using dfw & k
     Any difference | | in the table > the HSD value gets an *, meaning those means differ significantly.

- est ω²: degree of association between IV & DV
  est ω² = [SSbet − (k − 1)(s²w)] / (SStot + s²w)
         = [344.04 − (3 − 1)(35.79)] / (2383.94 + 35.79) = 0.11
  OR
  est ω² = [dfbet(F − 1)] / [dfbet(F − 1) + Ntot]
         = [2(4.81 − 1)] / [2(4.81 − 1) + 60] = 0.11

CHAPTER 16 HOMEWORK
16.21 use α = 0.05; also find est ω²; also create a means table & find which means differ using α = 0.05

CHAPTER 16 – HOMEWORK ANSWERS
16.21)
           ΣX     ΣX²      X̄       n
   X1      433   15519   28.87    15
   X2      599   29595   39.93    15
   X3      713   36897   47.53    15
   totals 1745   82011   ----     45

   SStot = 82011 − (1745²)/45 = 14343.78
   SSbet = (433²/15) + (599²/15) + (713²/15) − (1745²/45) = 2643.38
   SSw = 14343.78 − 2643.38 = 11700.40
   dfbet = 3 − 1 = 2        dfw = 45 − 3 = 42
   s²bet = 2643.38/2 = 1321.69
   s²w = 11700.40/42 = 278.58
   F = 1321.69/278.58 = 4.74
   4.74 > 3.22 ⇒ reject H0
   est ω² = [2643.38 − (3 − 1)(278.58)] / (14343.78 + 278.58) = 0.14

   HSD = 3.44 √(278.58/15) = 14.82

            28.87    39.93    47.53
    28.87   -----   −11.06   −18.66*
    39.93            -----    −7.6
    47.53                     -----

- Ed polls a random sample of people by phone to see how much they agree with the statement that the president is doing a good job: very good, good, neutral, poor, very poor. Is there a difference in the frequency with which people give responses for the different categories?
- Kathy wants to know if people will help someone more or less as a function of the gender of the person needing help. She has Bob & Ann pretend to drop a bag of groceries on a busy street and records how many times people stop to help each of them. Was there a significant difference in helping versus non-helping for Bob vs Ann?

Chapter 19 - Chi-Squared Test (χ²)

- nonparametric: does not require normality
- χ²: typically used with frequencies or proportions from nominal data

1) one-variable χ², or "goodness of fit"
   χ² = Σ[(Oi − Ei)²/Ei]        O = observed data   E = expected data   i = each category

   strongly agree   agree   undecided   disagree   strongly disagree
        7            12        13          13             10

   expected = total answers / # of categories = 55/5 = 11
   χ² = (7 − 11)²/11 + (12 − 11)²/11 + (13 − 11)²/11 + (13 − 11)²/11 + (10 − 11)²/11 = 2.3
   df = k − 1 (k = # of categories) = 5 − 1 = 4
   χ² table on page 572: at 0.05 ⇒ 9.488
   2.3 < 9.488 ⇒ accept H0, no difference

2) multi-variable χ²: same formula, but a different way to get the expected values

                drug    placebo   row totals
   get better   a: 1    b: 17        18
   get worse    c: 9    d: 12        21
   col totals     10       29        39

   1) label the boxes a - d
   2) find the expected values: fe = fc·fr/n
      a) (18/39)(10) = 4.6
      b) (18/39)(29) = 13.4
      c) (21/39)(10) = 5.4
      d) (21/39)(29) = 15.6
   3) use the χ² formula
      χ² = (1 − 4.6)²/4.6 + (17 − 13.4)²/13.4 + (9 − 5.4)²/5.4 + (12 − 15.6)²/15.6 = 7.09
      df = (r − 1)(c − 1)        r = # of rows   c = # of columns
      df = (2 − 1)(2 − 1) = 1
      7.09 > 6.635 ⇒ reject H0, they differ

CHAPTER 19 – HOMEWORK
19.1, 19.8

CHAPTER 19 – HOMEWORK ANSWERS
19.1) use α = 0.05
   observed: 25   32   10        expected = 67/3 = 22
   χ² = (25 − 22)²/22 + (32 − 22)²/22 + (10 − 22)²/22 = 11.51
   df = 3 − 1 = 2
   11.51 > 5.991 ⇒ reject H0

19.8) use α = 0.01

               1      2      3     Total
   Smoke       29     16     55     100
   Non-Smoke   198    107    181    486
   Total       227    123    236    586

   a) (100/586)(227) = 38.74
   b) (100/586)(123) = 20.99
   c) (100/586)(236) = 40.27
   d) (486/586)(227) = 188.26
   e) (486/586)(123) = 102.01
   f) (486/586)(236) = 195.73

   χ² = (29 − 38.74)²/38.74 + (16 − 20.99)²/20.99 + (55 − 40.27)²/40.27
      + (198 − 188.26)²/188.26 + (107 − 102.01)²/102.01 + (181 − 195.73)²/
195.73
      = 2.45 + 1.19 + 5.39 + 0.50 + 0.24 + 1.11 = 10.88
   df = (2 − 1)(3 − 1) = 2
   10.88 > 9.210 (χ² table at 0.01, df = 2) ⇒ reject H0, they differ
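Both Chapter 19 calculations can be sketched in a few lines (Python, not part of the course materials), checked against the worked examples:

```python
# Goodness-of-fit chi-squared with equal expected frequencies E = total / k.
def chi_square_gof(observed):
    e = sum(observed) / len(observed)
    return sum((o - e) ** 2 / e for o in observed)

obs = [7, 12, 13, 13, 10]                 # the five attitude categories
print(round(chi_square_gof(obs), 2))      # 2.36 (the notes round to 2.3); df = 4

# Contingency-table expected counts: fe = (row total)(column total) / n
def expected_counts(table):
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

drug_table = [[1, 17],    # get better: drug, placebo
              [9, 12]]    # get worse:  drug, placebo
print([[round(v, 1) for v in row] for row in expected_counts(drug_table)])
# [[4.6, 13.4], [5.4, 15.6]], the a-d expected values from the worked example
```

Using the unrounded expected counts gives χ² ≈ 7.07 for the drug table, slightly below the 7.09 in the notes, which round each fe to one decimal first.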