Data Analysis Statistics

Levels of Measurement
• Nominal – Categorical; no implied rankings among the categories. Also includes written observations and written responses from qualitative interviews or open-ended survey questions.
• Ordinal – Categorical data with implied rankings, or data obtained through respondent ranking of categories. In some cases, a ranking process may be set up for a particular variable.
• Interval – No fixed zero point. Data is numerical, not categorical. Rank order among values is explicit, with an equal distance between points in the data set: -2, -1, 0, +1, +2
• Ratio – Fixed zero point; otherwise the same as interval.

In general, the type of data can be inferred using the criteria above, together with the following:
• Any categorical data is either nominal or ordinal.
• All qualitative data is nominal.
• All scores on standardized scales are either interval or ratio. (Note: almost all the scales we use in social work, except IQ scores, are ratio.)

The level of measurement determines what statistical method we can use.
In some cases, we can convert a variable into another level of measurement. We can change a variable from ratio to either ordinal or nominal.

Converting Data (Use Recode in SPSS)
Data Set: 5, 8, 4, 2, 9, 6, 10, 7

Categories   Occurrences
1 to 2       2
3 to 5       3
6 to 8       3
9 to 10      2

Advantages of using ratio data
• We can convert it to another level of data; we can't do this with nominal data.
• People can simply write down information about how they fit a particular attribute (age, income).
• We have more statistical options with ratio data. Most inferential statistics require that the dependent variable be interval or ratio.

Primary types of data analysis are:
• Qualitative
• Descriptive (used to describe the distribution of a single variable or the relationship between two nominal variables: mean, frequencies, cross-tabulation)
• Inferential (used to establish relationships among variables; assumes random sampling and a normal distribution)
• Nonparametric (used to test relationships in small samples or in data sets that are not normally distributed)

Much of what you will use in your research will be descriptive statistics. The most basic type of descriptive statistic is the frequency: the number of times a specific value, or a value within a specific category, occurs. Most often we convert frequencies to percentages. The formula is (f/n) × 100, where f = frequency and n = the total number of values in the data set. For example, if the age 25 occurs 5 times in a data set of 50 values, then 5/50 = 10%.

Examples of use of descriptive statistics
• 40% of respondents are male.
• The mean level of income was $35,000.
• 40% of all female voters cast their vote for Arnold, compared to 52% of the male voters.

*Note: the other descriptive statistic we use is the standard deviation. It describes the degree to which data points vary from the mean of a distribution. In a research article, you will see the standard deviation reported with the mean.
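The recoding and percentage steps described above can be sketched in a few lines of Python. This is only an illustration: the list of ages is made up, the age bands are an assumed recoding scheme, and the helper names recode_age and percentages are hypothetical. In SPSS the same work would be done with Recode and Frequencies.

```python
from collections import Counter

# Hypothetical ratio-level data (ages); not taken from any real data set.
ages = [25, 31, 25, 47, 68, 17, 25, 52, 34, 70]


def recode_age(age):
    """Recode a ratio-level age into an ordinal category (assumed bands)."""
    if age <= 18:
        return "0-18"
    if age <= 34:
        return "19-34"
    if age <= 64:
        return "35-64"
    return "65 & over"


def percentages(values):
    """Return {value: percent}, where percent = (f / n) * 100."""
    n = len(values)
    return {value: 100 * f / n for value, f in Counter(values).items()}


# Ratio data recoded to ordinal categories, then summarized as percentages.
age_groups = [recode_age(age) for age in ages]
group_percent = percentages(age_groups)

for group, pct in group_percent.items():
    print(f"{group}: {pct:.1f}%")

# The worked example from the text: a value occurring 5 times among 50 values.
example_percent = 100 * 5 / 50
```

Note that the recoding only goes one way: once the ages are collapsed into bands, the original ratio values cannot be recovered from the categories.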
Application of Standard Deviation (SD)
• Mean income was $35,000 with SD = $5,000
• M = $23,000, SD = $500
• This is interpreted as there being less variability in income among members of the second data set. That is, scores are grouped more tightly around the mean.

Normal Distribution
• Mean = median = mode
• Bell-shaped curve
• 50% of scores fall below and 50% fall above the mean.
• Data set can be assessed in terms of how much data falls within one, two, or three standard deviations from the mean.
• Generally is unimodal, although some distributions may be bimodal or trimodal.
• Theoretically, at least, inferential statistics may only be used when a set of scores conforms to a normal distribution. However, this assumption is often violated.

Frequencies are used in almost all types of data analysis. Frequency tables can be formatted in a variety of ways. (Some analyses add valid percent and cumulative percent.)

Age         Number   Percent
0-18        10       20.0%
19-34       15       30.0%
35-64       15       30.0%
65 & over   10       20.0%
Total       50       100%

We can also use tables to determine if there is a relationship between two nominal variables, although we cannot assess the strength of the relationship from the table alone. This is called a cross-tabulation.

Starting Salary      Female      Male
$20,000 to $29,999   19 (70%)    5 (23%)
$30,000 to $39,999   7 (26%)     14 (64%)
$40,000 to $54,999   1 (4%)      3 (13%)
Total                27 (100%)   22 (100%)

Categories, both here and in qualitative analysis, must be:
• Mutually exclusive (no overlap)
• Exhaustive (all possible categories should be included)

Cross-tabulation is the basis for chi-square. Chi-square:
• Tests whether the relationship between the two variables in the table is larger than would be expected by chance.
• Is not technically an inferential statistic (it does not require a normal distribution) but is often grouped with inferential statistics.
• Usually requires a random sample, although data collected from everyone in a population group is usually considered sufficient for a chi-square analysis.

Means can also be used to make comparisons among groups.
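Returning briefly to the starting-salary cross-tabulation, here is a rough sketch of how chi-square is built from the table: the statistic is computed by hand from the observed counts. The function name chi_square and the variable names are illustrative assumptions. In practice SPSS reports the statistic and its p value directly.

```python
# Observed counts from the starting-salary cross-tabulation
# (rows: salary bands; columns: female, male).
observed = [
    [19, 5],   # $20,000 to $29,999
    [7, 14],   # $30,000 to $39,999
    [1, 3],    # $40,000 to $54,999
]


def chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            # Expected count if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (count - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


statistic, df = chi_square(observed)
print(f"chi-square = {statistic:.2f}, df = {df}")
```

For this table the statistic comes out at roughly 11.1 with df = 2, which exceeds the tabled critical value of 5.99 at the .05 level. The expected counts in the highest salary band are below 5, so a result like this should be read with some caution.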
Income
Male     M = $35,000   SD = $5,000
Female   M = $22,000   SD = $750

You may use means on your project:
• If your variables include ratio data
• If you want to compare groups on a ratio variable
• If you want to summarize scores on a standardized instrument or a Likert scale

Some inferential statistics look at the strength of the relationship between mean scores on ratio-level variables and membership in a particular demographic group:
• T-tests (two-group comparisons)
• Analysis of variance (compares three or more groups)
These answer the question: Is the difference in means between the two (or more) groups large enough to be statistically significant?

We also use correlations to measure the strength of a relationship between two variables. Correlations can only be used:
• To assess the strength of the relationship between two ratio-level variables.
• To measure associations rather than cause-and-effect relationships.
• With data sets in which there are 30 or more observations.

Inferential statistics commonly used include:
• Independent t-test (compares two groups on one variable). (Test statistic = T)
• Paired-samples t-test (compares ratio-level scores on pre- and post-test data). (Test statistic = T)
• ANOVA (compares three or more groups on ratio data). (Test statistic = F)
• Correlation (measures the association between two ratio-level variables). (Test statistic = R)
• Regression analysis (dependent ratio variable; can include more than one independent variable, and the independent variables can be a combination of ratio, ordinal, and nominal data). (Test statistic is R², F, or partial correlation coefficients)

Inferential statistics require that we assess the probability that there is actually a relationship between two variables, rather than one produced by chance.
• State the research and null hypotheses.
• State the degree to which we will risk being wrong about whether or not a relationship actually exists between two variables (level of significance – usually .10 or lower).
• Choose an appropriate statistical test and compute it.
• Compare the probability level (p value) on your computer printout to the level of significance. If the p value is lower than your level of significance, reject the null hypothesis. If the p value is higher than the level of significance, accept the null hypothesis (that is, fail to reject it).

For example:
• There is a positive relationship between scores on the self-esteem scale and depression. Level of significance is .05. R = .75, p = .01. Reject the null hypothesis and accept the research hypothesis.
• Women will have higher test scores than men. Level of significance = .10. T = .30, p = .60. Accept the null hypothesis and reject the research hypothesis.

Other info
• Chi-square is interpreted in the same way as inferential statistics.
• Most statistics books contain tables that let you determine p values if you calculate test statistics by hand.
• SPSS printouts always contain p values for inferential statistics.
• Theoretical assumptions are often violated in research articles.
• Sample size affects whether a relationship between two or more variables is large enough to be statistically significant.
• Relationships between two variables can be either positive or negative. High positive relationships are close