is also available - Columbia University Libraries Blogs

Research Terminology for
The Social Sciences
What is Data?
 Data is a collection of observations
 Observations have associated attributes
 These attributes are variables
 A collection of data is often called a “data set”
 What are variables?
 A measure that takes different values for different observations
 Across a population (cross-sectional)
 Across time (cross-temporal)
 Both! (Panel data)
 Independent/explanatory variables are variables we think have an effect on
other variables
 Control variables are a special category of independent variable, included
to account for alternative explanations
 Dependent/outcome variables are the variables we are trying to explain or
predict
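As a minimal sketch of these ideas, the Python snippet below represents a data set as a list of observations that share the same variables, then slices it cross-sectionally and across time. The countries, years, and values are invented purely for illustration and are not real data; the choice of income as the independent variable and turnout as the dependent variable is likewise an assumption.

```python
# A tiny, invented panel data set: each dict is one observation,
# and each key is a variable (an attribute of that observation).
data_set = [
    {"country": "A", "year": 2000, "income": 10.0, "turnout": 0.55},
    {"country": "A", "year": 2001, "income": 11.0, "turnout": 0.57},
    {"country": "B", "year": 2000, "income": 20.0, "turnout": 0.70},
    {"country": "B", "year": 2001, "income": 21.5, "turnout": 0.72},
]

# Cross-sectional slice: several units observed at one point in time.
cross_section = [obs for obs in data_set if obs["year"] == 2000]

# Cross-temporal slice: one unit observed at several points in time.
time_series = [obs for obs in data_set if obs["country"] == "A"]

# Assumed roles for illustration: income explains, turnout is explained.
independent = [obs["income"] for obs in data_set]
dependent = [obs["turnout"] for obs in data_set]

print(len(cross_section), "cross-sectional observations")
print(len(time_series), "observations over time for country A")
```

Keeping both a unit identifier (country) and a time identifier (year) on every observation is what makes the same data set usable as panel data.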
Unpacking Variables
 Features of variables
 Take on some set of values
 Different values have different meanings
 Could be numerical, meaning they have number values attached
 Continuous
 Discrete or Limited
 Could be categorical, meaning they have descriptive terms attached
 Ordinal (the categories have numerical ranks associated with them)
 Typological, also called nominal (the categories are descriptive and do
not represent any ordering, ranking, or numerical value)
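The distinction between variable types can be sketched in a few lines of Python. The variable names, categories, and values below are hypothetical examples; the point is only that ordinal categories can be sorted by rank, while typological categories can only be counted.

```python
from collections import Counter

# Numerical variables (invented values).
income = [31250.50, 48900.00, 27400.75]   # continuous: any value in a range
household_size = [1, 4, 2]                # discrete: whole-number counts

# Ordinal variable: the categories carry a rank, so ordering is meaningful.
EDUCATION_RANK = {"primary": 1, "secondary": 2, "tertiary": 3}
education = ["secondary", "primary", "tertiary", "secondary"]
education_sorted = sorted(education, key=EDUCATION_RANK.get)
highest_education = max(education, key=EDUCATION_RANK.get)

# Typological variable: the categories have no order, so we only count them.
region = ["north", "south", "south", "east"]
region_counts = Counter(region)
most_common_region = region_counts.most_common(1)[0][0]

print(education_sorted)
print(highest_education, most_common_region)
```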
Research Design
 Determine what kind of data will be needed based upon your
research question
 Quantitative?
 Large-N
 Measurable in a clear and consistent way
 Qualitative?
 Case studies
 Not easily quantifiable
 The Holy Grail of Social Science Research:
Turning Qualitative Data into Quantitative Measures
Collecting Data
 Libraries have a large collection of data sets that are ready to be used,
in common software formats
 Digital Centers have software suites for all steps of the data collection
process
 Bibliographic packages
 Data management software
 Data analysis software
 Reference librarians are useful resources for discovery
 Sometimes, you may need to collect original data
 Field work: going out and gathering data from observations
 Archival work: finding the data in other information sources and
aggregating it into a data set
Operationalization
 Operationalization is the process of turning theoretical concepts
into measurements
 Matching theory with variables
 Ideological framework
 The type of problem should suggest an appropriate measure
 Matching levels
 Macro vs. micro, and everything in between
 Matching observations
 Individuals? Pairs? Groups?
 Matching meanings
 This is the hardest match to get right
Using Variables
 Models are statements about the way variables relate to one
another
 Two basic types in social science: analytical and formal
 Analytical Models
 Describe the causal relationships between variables
 Rely upon probability and statistics
 Formal Models
 Describe a simplified version of reality
 Variables become elements of this simplified reality
 Rely upon theoretical frameworks
 Both types of models can be tested with data
Research Methods
 Mixed methods analysis is the “gold standard”
 Combination of quantitative and qualitative data
 Formal models
 Mathematical representations of decisions
 Game theory
 Matching the research design to the hypothesis under
investigation is critical
 How questions are asked and answered
 What counts as evidence?
Discovering the Data
 Descriptive Statistics
 These are measures designed to help you “picture” your data
 Means, Medians, Modes
 Standard Deviations, Variances
 Exploratory Visualization
 These are graphs that visually depict the information contained in
descriptive statistics
 Distribution plots
 Histograms
 Density plots
 Simple correlation plots
 Graphing two variables, one on each axis (i.e., X & Y)
 You can get more complicated later!
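The descriptive statistics named above can be computed with Python's standard `statistics` module. This is a minimal sketch on invented values, with a crude text histogram standing in for a real distribution plot.

```python
import statistics
from collections import Counter

# Invented observations of a single numerical variable.
values = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(values)          # arithmetic average
median = statistics.median(values)      # middle value of the sorted data
mode = statistics.mode(values)          # most frequent value
variance = statistics.variance(values)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(values)      # square root of the sample variance

print("mean:", mean, "median:", median, "mode:", mode)
print("variance:", round(variance, 3), "std dev:", round(std_dev, 3))

# A rough text histogram: one '#' per observation with that value.
counts = Counter(values)
for value in sorted(counts):
    print(f"{value:>2} | {'#' * counts[value]}")
```

Note that `statistics.variance` and `statistics.stdev` use the sample formulas; `statistics.pvariance` and `statistics.pstdev` are the population versions.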
Analyzing the Data
 Simple inferences
 Correlations/covariances
 These measures show the strength and direction of the relationships between
and among variables
 Analysis of variance (ANOVA)
 ANOVA compares means across two (or more) samples, groups, or populations
 Basic Linear Models
 These models explore linear relationships between a dependent variable and
one or more independent variables
 Simple regression: one dependent variable, one independent variable
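 This is closely related to a correlation: the slope is the correlation
coefficient rescaled by the ratio of the standard deviations of Y and X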
 Multivariate regression: one dependent variable, many independent
variables
 This technique estimates the relationship between each independent variable
and the dependent variable while holding the other independent variables constant
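To make the link between correlation and simple regression concrete, here is a minimal sketch that computes both by hand on invented data. It uses only the textbook formulas for covariance, the Pearson correlation, and the least-squares line; real analyses would normally use a statistics package instead.

```python
import math

# Invented data: x is the independent variable, y the dependent variable.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sample covariance and variances (dividing by n - 1).
cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
var_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
var_y = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)

# Pearson correlation: covariance scaled by both standard deviations.
r = cov_xy / math.sqrt(var_x * var_y)

# Simple regression y = intercept + slope * x, fitted by least squares.
slope = cov_xy / var_x
intercept = mean_y - slope * mean_x

# The slope is the correlation rescaled by the ratio of standard deviations.
slope_from_r = r * math.sqrt(var_y) / math.sqrt(var_x)

print("correlation:", round(r, 4))
print("slope:", round(slope, 4), "intercept:", round(intercept, 4))
print("slope recovered from r:", round(slope_from_r, 4))
```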
Advanced Data Analysis
 Models for non-continuous/limited/discrete variables
 Logit and probit models: the dependent variable can take two values
 Tobit models: the dependent variable is censored, so its values are only
observed within a limited range
 Ordered logit and ordered probit models: the dependent variable can take a
small set of ordered categories
 Multinomial logit models: the dependent variable can take a small set of
unordered categories
 Models for complex data
 Simultaneous equations models (SEMs): the dependent variable can also affect
the independent variable
 Instrumental variables are a technique used to deal with this issue
 Time-series and panel data models
 The data cover multiple years and may have serial correlations (i.e., the values for one
year are highly correlated with values from the previous year)
 Non-linear models
 The relationships between the variables are not of the form Y = mX + B
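As one illustration of a model for a two-valued dependent variable, a logit model passes a linear combination of the independent variables through the logistic function, so predictions stay between 0 and 1 and the relationship is no longer a straight line. The sketch below assumes a single independent variable and uses hypothetical coefficients that were chosen for illustration, not estimated from any data.

```python
import math

def logistic(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients (assumed, not estimated from data).
intercept = -2.0
coef_income = 0.5

def predicted_probability(income):
    """Probability that the dependent variable equals 1 for a given income."""
    return logistic(intercept + coef_income * income)

p_low = predicted_probability(0.0)    # linear predictor is -2.0
p_mid = predicted_probability(4.0)    # linear predictor is 0.0, so 0.5
p_high = predicted_probability(10.0)  # linear predictor is 3.0

print(round(p_low, 3), round(p_mid, 3), round(p_high, 3))
```

In practice the coefficients would be estimated from data by maximum likelihood using statistical software; only the prediction step is shown here.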