Research Methods
Chapter 8: Data Analysis

Two Types of Statistics
• Descriptive
  – Allows you to describe relationships between variables
• Inferential
  – Allows one to test hypotheses & see if results are generalizable

Descriptive Statistics
• Often begins with univariate analysis
  – Displays the variation of a variable
  – Several ways to display variation
    • Bar Chart, Frequency Polygon, Histogram, etc.

[Figure: Frequency polygon, "Rates of Church Affiliation, U.S., 1776-1995". X-axis: Year (1776 to 1995). Y-axis: Percent of Church Membership (0 to 70).]

  – 3 features of the shape of variation are important:
    • Central Tendency: The most common value or the value around which cases tend to center
      – a.k.a. averages like mean, median, mode
    • Variability: The degree to which cases are spread out or clustered together
    • Skewness: The extent to which cases are clustered more at one or the other end of a distribution
      » Can be either none, positive, or negative

[Figure: Negative Skew (Test Too Easy). X-axis: Score (0 to 100). Y-axis: Freq.]

[Figure: Positive Skew (Test Too Hard). X-axis: Score (0 to 100). Y-axis: Freq.]

Frequency Distribution of Voting in 1992 Presidential Election

Value           Frequency   Valid Percent
Voted               1,909        71.5%
Did not vote          762        28.5
Not eligible          183        --
Refused                10        --
Don’t know             38        --
No answer               2        --
Total               2,904       100.0%

Ungrouped and Grouped Age Distributions

Ungrouped            Grouped
Age    Percent       Age      Percent
18     0.2%          18-19    1.4
19     1.2           20-29    19.0
20     1.4           30-39    24.0
21     1.3           40-49    21.5
And so on...

Calculating the Mean
Mean (X̄) = The Sum of Scores / # of Scores
• So if you had the following test scores: (5, 10, 15, 10, 5, 10, 5, 15, 15, 10)
• What would be the mean?
• Answer: 10! (100/10)

Calculating the Mode
• Mode = The most frequent value in a distribution
• So if you had the following test scores: (10, 5, 10, 15, 10, 10, 5, 10, 5, 15, 15, 10)
• What would be the mode?
• Answer: 10!
  (There are more 10’s than any other number)

Calculating the Median
• Median = The value in the middle of a distribution
• Example: (22, 25, 34, 35, 41, 41, 46, 46, 46, 47, 49, 54, 54, 59, 60)
• Several steps to calculate the median
  – Arrange all observations in order of size, from smallest to largest
  – Determine the number of values in the distribution (N)
    • N in this case = 15
  – Plug N into the following formula
    • (N + 1)/2 = (15 + 1)/2 = 16/2 = 8
  – If you get a whole number (in this case you got an “8”), then count up to that position in the distribution
    • (22, 25, 34, 35, 41, 41, 46, 46, 46, 47, 49, 54, 54, 59, 60)
    • The 8th value is 46. Thus, the median is “46”
• If you don’t get a whole number, then you have to add a step
  • Example: 8, 13, 14, 16, 23, 26, 28, 33, 39, 61
  • Find the N (in this case, the N is “10”)
  • (N + 1)/2 = (10 + 1)/2 = 5.5
  • Thus, counting up 5.5 gets you to the point between the 5th and 6th values, “23” & “26”
  • The extra step: average the two values on either side of that point
  • (N1 + N2)/2 = (23 + 26)/2 = 49/2 = 24.5
  • Thus, the median in this case is 24.5

Determine the Mean, Median and Mode
• 2, 2, 2, 2, 2
• 1, 2, 2, 2, 5, 5, 10, 10, 15, 25
• 17, 18, 9, 9, 5
• 7, 7, 14, 3, 11, 27, 498
• 11, 67, 43, 2, 2, 2, 6

Answers
• 2, 2, 2, 2, 2
  – Mean = 10/5 = 2
  – Median position = (5 + 1)/2 = 6/2 = 3. Then: count up 3 spaces to get to “2”. Median = 2
  – Mode = 2
• 1, 2, 2, 2, 5, 5, 10, 10, 15, 25
  – Mean = 77/10 = 7.7
  – Median position = (10 + 1)/2 = 11/2 = 5.5. Then: (5 + 5)/2 = 10/2 = 5. Median = 5
  – Mode = 2
• 17, 18, 9, 9, 5 (sorted: 5, 9, 9, 17, 18)
  – Mean = 58/5 = 11.6
  – Median position = (5 + 1)/2 = 3. Then: Median = 9
  – Mode = 9
• 7, 7, 14, 3, 11, 27, 498 (sorted: 3, 7, 7, 11, 14, 27, 498)
  – Mean = 567/7 = 81
  – Median position = (7 + 1)/2 = 4. Then: Median = 11
  – Mode = 7
• 11, 67, 43, 2, 2, 2, 6 (sorted: 2, 2, 2, 6, 11, 43, 67)
  – Mean = 133/7 = 19
  – Median position = (7 + 1)/2 = 4. Then: Median = 6
  – Mode = 2

Suppose You Had the Following
   1 person making $45,000
   1 person making $15,000
   2 people making $10,000
   1 person making $5,700
   3 people making $5,000
   4 people making $3,700
   1 person making $3,000
  12 people making $2,000

What Did You Get?
• Mean =
  – $142,500 / 25 = $5,700
• Median =
  – $3,000 (there are 12 above you and 12 below you)
• Mode =
  – $2,000 (occurs the most frequently)

Mean vs. Median vs. Mode
• Generally use the mean for interval or ratio levels of measurement
  – E.g. Fahrenheit temperatures, Age, Income
• Look at the shape of the distribution first, however
  – If there are lots of outliers, the median might be preferable
    • Income if including Bill Gates
• Use the mode for nominal levels of measurement
  – E.g. Gender

Measures of Variation
• Central tendency (mean, median, mode), although valuable, only shows us a small piece of the picture
  – Relying only on central tendency may give us an incomplete and misleading picture
    • Three towns may have the same mean and median income but be very different in social character
      – One may be mostly middle class
      – One may have a few rich and many poor
      – One may have an equal number of rich, middle class, & poor
• Looking at measures of variation can help us see past the limitations of central tendency

The Four Popular Measures of Variation
1 Range
  – Calculated by taking the highest value in a distribution and subtracting the lowest value, and then adding 1
  – Shows us the range of possible values that may be encountered
  – Weakness: The range can be drastically altered by just one exceptionally high or low value (known as an “outlier”)
2 Interquartile Range
  – Avoids the problem created by outliers
  – Quartiles are the points in a distribution corresponding to the first 25%, the first 50%, and the first 75% of the cases.
    • The second quartile (50%) is the median
3 Variance
  – The average of the squared deviations from the mean

Variance

X        X - X̄     (X - X̄)²     X²
3         -6          36          9
4         -5          25         16
6         -3           9         36
12         3           9        144
20        11         121        400
Total                200        605
X̄ = 9

4 Standard Deviation
  – Gives an “average distance” between all scores and the mean
  – Calculated by taking the square root of the variance

Crosstabulation

                          Family Income
Voting       <$17,500   $17,500-$34,999   $35,000-$59,999   $60,000+
Voted           60%          73%               75%             84%
Did not         40%          27%               25%             16%
Total          100%         100%              100%            100%
(n)           (424)        (550)             (541)           (433)

Crosstabulating Variables
• Crosstabulations reveal 4 aspects of the association between 2 variables:
  – Existence: Is there a correlation?
  – Strength: How strong does the correlation appear to be?
  – Direction: Positive or negative correlation?
  – Pattern: Are changes in the percentage distribution of the dependent variable fairly regular (simply increasing or decreasing), or do they vary?

Evaluating Association
• Inferential statistics are used to determine the likelihood that an association exists in the larger population from which the sample is drawn
• Thus, researchers often calculate probability levels that estimate how likely it is that an association is due to chance
  – E.g. p < .05 means that an association this strong would be expected to arise by chance alone less than 5 times out of 100, or 5%
• Researchers generally look for p < .05 at a minimum, but some want p < .01 or p < .001

Controlling for a Third Variable
• Associations, however, do not necessarily mean causation
• Use elaboration analysis to determine whether an association is due to a causal relationship or to another variable
• Three types: intervening, extraneous, and specification

Intervening Variables
[Diagram: Income → Perceived Efficacy → Voting]

Extraneous Variables
[Diagram: Education → Income; Education → Voting]

Findings
• The 3 criteria
  – Time Order
    • Asked the following questions:
      – How long have they attended church? Used only those who had attended for a year or more
      – Eight questions about their deviant acts WITHIN THE PAST YEAR!!
  – Correlation
    • The data indicated a correlation between the two variables (church attendance and delinquency)
  – Spuriousness
    • Could another variable be the determining factor for delinquency instead of church attendance? (Elaboration Analysis)
      – Race
      – School
      – Grade
      – Gender

Findings
• The hypothesis was not supported!
• The correlation between church attendance and delinquency is spurious
  – The third variable of gender appears to be an extraneous variable
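
Checking the Descriptive Statistics in Code
• The measures of central tendency and variation covered earlier in this chapter can be checked with a few lines of code. The sketch below is one possible way to do it, not part of the original slides. It assumes Python and its standard statistics module, and it reuses the chapter's income example and the five scores from the variance table. The variable names are illustrative. The variance is computed with the population formula (divide by N), since that matches "the average of the squared deviations"; a sample variance would divide by N - 1 instead.

```python
from statistics import mean, median, mode, pvariance, pstdev

# Income example from the chapter: (income, number of people earning it).
income_groups = [
    (45_000, 1), (15_000, 1), (10_000, 2), (5_700, 1),
    (5_000, 3), (3_700, 4), (3_000, 1), (2_000, 12),
]

# Expand the grouped data into one entry per person (25 people in total).
incomes = [income for income, count in income_groups for _ in range(count)]

income_mean = mean(incomes)      # sum of scores / number of scores
income_median = median(incomes)  # middle value once the data are sorted
income_mode = mode(incomes)      # most frequent value

# Scores from the variance table in the chapter.
scores = [3, 4, 6, 12, 20]

# Range as defined in the chapter: highest minus lowest, plus 1.
score_range = max(scores) - min(scores) + 1

# Assumption: population variance (divide by N), matching the slide's
# definition "the average of the squared deviations from the mean".
score_variance = pvariance(scores)

# The standard deviation is the square root of the variance.
score_sd = pstdev(scores)

print("Income mean:", income_mean)
print("Income median:", income_median)
print("Income mode:", income_mode)
print("Score range:", score_range)
print("Score variance:", score_variance)
print("Score standard deviation:", round(score_sd, 2))
```

• The income results should match the chapter's answers ($5,700, $3,000, and $2,000). The single $45,000 income pulls the mean well above the median, which is the reason the chapter suggests the median when a distribution has outliers.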