Research Terminology for the Social Sciences

What is Data?
- Data is a collection of observations
- Observations have associated attributes
- These attributes are variables
- A collection of data is often called a “data set”

What are variables?
- A measure that takes different values for different observations
  - Across a population (cross-sectional)
  - Across time (cross-temporal)
  - Both! (Panel data)
- Independent/explanatory variables are variables we think have an effect on other variables
  - Control variables are a special category
- Dependent/outcome variables are the variables we are trying to explain or predict

Unpacking Variables
- Features of variables
  - Take on some set of values
  - Different values have different meanings
- Could be numerical, meaning they have number values attached
  - Continuous
  - Discrete or Limited
- Could be categorical, meaning they have descriptive terms attached
  - Ordinal (the categories have numerical ranks associated with them)
  - Typological (the categories are descriptive and do not represent some ordering/ranking/values)

Research Design
- Determine what kind of data will be needed based upon your research question
  - Quantitative?
    - Large-N
    - Measurable in a clear and consistent way
  - Qualitative?
    - Case studies
    - Not easily quantifiable
- The Holy Grail of Social Science Research: Turning Quantitative Data into Qualitative Measures

Collecting Data
- Libraries have a large collection of data sets that are ready to be used, in common software formats
- Digital Centers have software suites for all steps of the data collection process
  - Bibliographic packages
  - Data management software
  - Data analysis software
- Reference librarians are useful resources for discovery
- Sometimes, you may need to collect original data
  - Field work: going out and gathering data from observations
  - Archival work: finding the data in other information sources and aggregating it into a data set

Operationalization
- Operationalization is the process of turning theoretical concepts into measurements
- Matching theory with variables
  - Ideological framework
  - The type of problem should suggest an appropriate measure
- Matching levels
  - Macro vs. micro, and everything in between
- Matching observations
  - Individuals? Pairs? Groups?
- Matching meanings
  - This is the hardest

Using Variables
- Models are statements about the way variables relate to one another
- Two basic types in social science: analytical and formal
- Analytical Models
  - Describe the causal relationships between variables
  - Rely upon probability and statistics
- Formal Models
  - Describe a simplified version of reality
  - Variables become elements of this simplified reality
  - Rely upon theoretical frameworks
- Both types of models can be tested with data

Research Methods
- Mixed methods analysis is the “gold standard”
  - Combination of quantitative and qualitative data
- Formal models
  - Mathematical representations of decisions
  - Game theory
- Matching the research design to the hypothesis under investigation is critical
  - How questions are asked and answered
  - What counts as evidence?
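
As a recap of the terminology so far, here is a minimal Python sketch of a tiny panel data set. Everything in it is an assumption made for illustration: the countries, years, turnout figures, and regime labels are invented, and the names `data_set`, `REGIME_RANK`, `add_regime_rank`, `cross_section`, and `time_series` are hypothetical. It shows observations with variables attached, one simple way an ordinal category might be operationalized as a numerical rank, and the difference between a cross-sectional and a cross-temporal slice.

```python
# Hypothetical toy panel: every value below is invented for illustration.
# Each dict is one observation; each key is a variable (attribute).
data_set = [
    {"country": "A", "year": 2000, "turnout": 61.5, "regime": "partly free"},
    {"country": "A", "year": 2004, "turnout": 64.0, "regime": "free"},
    {"country": "B", "year": 2000, "turnout": 48.2, "regime": "not free"},
    {"country": "B", "year": 2004, "turnout": 52.9, "regime": "partly free"},
]

# One possible operationalization (an assumption, not a standard scale):
# map each ordinal category onto a numerical rank.
REGIME_RANK = {"not free": 0, "partly free": 1, "free": 2}


def add_regime_rank(rows):
    """Return copies of the rows with an ordinal 'regime_rank' variable added."""
    return [{**row, "regime_rank": REGIME_RANK[row["regime"]]} for row in rows]


def cross_section(rows, year):
    """Cross-sectional slice: all units observed at one point in time."""
    return [row for row in rows if row["year"] == year]


def time_series(rows, country):
    """Cross-temporal slice: one unit observed across time, sorted by year."""
    selected = [row for row in rows if row["country"] == country]
    return sorted(selected, key=lambda row: row["year"])


ranked = add_regime_rank(data_set)
print(cross_section(ranked, 2000))
print(time_series(ranked, "A"))
```

Here `country` is a typological variable, `regime` is ordinal, `turnout` is continuous, and the full list is panel data because it varies across both units and years.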
Discovering the Data
- Descriptive Statistics
  - These are measures designed to help you “picture” your data
  - Means, Medians, Modes
  - Standard Deviations, Variances
- Exploratory Visualization
  - These are graphs that visually depict the information contained in descriptive statistics
  - Distribution plots
    - Histograms
    - Density plots
  - Simple correlation plots
    - Graphing two variables, one on each axis (i.e., X & Y)
    - You can get more complicated later!

Analyzing the Data
- Simple inferences
  - Correlations/covariances
    - These measures show the relationships between and among variables
  - ANOVA (ANalysis Of VAriance)
    - ANOVA is about comparing two (or more) samples, groups, or populations
- Basic Linear Models
  - These models explore linear relationships between variables
  - Simple regression: one dependent variable, one independent variable
    - This is really just a correlation: the slope is the correlation rescaled by the two standard deviations
  - Multivariate (multiple) regression: one dependent variable, many independent variables
    - This technique looks at simultaneous correlations among several variables

Advanced Data Analysis
- Models for non-continuous/limited/discrete variables
  - Logit and probit models: the dependent variable can take two values
  - Tobit models: the dependent variable is censored (observed only within a limited range)
  - Ordered logit, ordered probit, and multinomial logit models: the dependent variable can take a small and discrete set of values
- Models for complex data
  - Simultaneous equations models (SEMs): the dependent variable can also affect the independent variable
    - Instrumental variables are a technique used to deal with this issue
  - Time-series and panel data models
    - The data cover multiple years and may have serial correlation (i.e., the values for one year are highly correlated with values from the previous year)
  - Non-linear models
    - The relationships between the variables are not of the form Y = mX + B
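
The descriptive statistics, correlation, and simple regression ideas above can be sketched in a few lines of Python using only the standard library. This is an illustration under stated assumptions rather than a recommended workflow: the `x` and `y` values are invented, and the helper names `covariance`, `correlation`, and `simple_regression` are hypothetical. The last line checks the claim that simple regression is closely tied to correlation, since the fitted slope equals the correlation multiplied by the ratio of the two standard deviations.

```python
import statistics

# Invented values for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]  # independent/explanatory variable
y = [2.0, 4.1, 5.9, 8.2, 9.8]  # dependent/outcome variable


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    products = ((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    return sum(products) / (len(a) - 1)


def correlation(a, b):
    """Pearson correlation: covariance rescaled to lie between -1 and 1."""
    return covariance(a, b) / (statistics.stdev(a) * statistics.stdev(b))


def simple_regression(a, b):
    """Least-squares fit of b = m * a + c; returns the pair (m, c)."""
    m = covariance(a, b) / statistics.variance(a)
    c = statistics.mean(b) - m * statistics.mean(a)
    return m, c


# Descriptive statistics that help "picture" the dependent variable.
summary = {
    "mean": statistics.mean(y),
    "median": statistics.median(y),
    "stdev": statistics.stdev(y),
    "variance": statistics.variance(y),
}

r = correlation(x, y)
slope, intercept = simple_regression(x, y)

# The regression slope is the correlation rescaled by the standard deviations.
rescaled_r = r * statistics.stdev(y) / statistics.stdev(x)

print(summary)
print(r, slope, intercept, rescaled_r)
```

The models under Advanced Data Analysis (logit, probit, Tobit, instrumental variables, and so on) are normally fitted with a dedicated statistics package rather than written by hand like this.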