Research Design - University at Albany

Research Design and Dataset
GOG502/PLN504 Youqin Huang
Group Activity #1
 Housing inequality is an important
aspect of social inequality. Suppose
we want to answer the following
research questions:
 What is the degree and pattern of
housing inequality in the U.S.? (What?)
 What are the driving forces for housing
inequality? (Why?)
Please rank the following methods
from best to worst. Why?
A) Read existing studies on housing consumption/
inequality in the U.S.
B) Analyze census data
C) Select a random sample of households and
conduct a questionnaire survey
D) Conduct in-depth interviews in a typical
neighborhood about people’s housing
consumption and views on housing inequality
Research Design
 Plan/framework/blueprint/proposal to conduct research
 Research Question
 Literature Review
 Hypothesis
 Data
 Methods of Analysis
 Sometimes, expected results
1) Research Question
 One or several clearly stated questions
 What is going on? (descriptive research)
 What is the spatial pattern of migration in the US?
 What is the current state of housing inequality in the US?
 Why did it happen? (explanatory research)
 What are the dynamics of migration in the US?
 Why has income inequality increased in recent decades?
 Significance/contribution
 How important to the development of knowledge?
 Extend existing theories to different areas/settings?
 Solve inconsistency in current interpretations?
 Help understand a new phenomenon?
 …
2) Literature Review
 Summarize existing research
 Critique:
 what is missing?
 what are the problems in existing research?
 Relates the study to the larger dialogue/debate
in the literature
 Provides a framework for establishing the
importance of your research
 Conceptually
 Methodologically
2) Literature Review
 A suggested model
 Introduce the review with a statement about the
organization of the sections
 Review literature about the independent variables
 Review literature about the dependent variables
 Review literature that relates the independent
variables to the dependent variables
 Provide a summary
 Highlight important studies
 Capture major themes
 Suggest why more research is needed
 Advance how the proposed study will fill this need
3) Hypotheses

 Predictions that the researcher makes about
the expected relationships among variables
 Dependent variable(s) (outcome)
 Independent variable(s) (causal factors)
 Nature of the relationship between the two
 E.g. Women are paid less than men (or there is
gender discrimination against women in wages;
female sex has a negative effect on wage)
 Predictions about the population values that
the researcher will estimate based on data
from a sample
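The wage example can be written as a regression hypothesis. This is a minimal sketch, not a specification from the slides: the dummy variable female_i (1 for women, 0 for men) and the symbols beta_0, beta_1, and epsilon_i are assumptions introduced here for illustration.

```latex
\begin{align*}
\text{wage}_i &= \beta_0 + \beta_1\,\text{female}_i + \varepsilon_i \\
H_0 &: \beta_1 = 0 \qquad \text{versus} \qquad H_1 : \beta_1 < 0
\end{align*}
```

Under this reading, beta_1 is the population value that the hypothesis is about, and the sample data supply an estimate of it.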
4) Data
 What kind of data is needed to test the hypotheses?
 Archival information: text, images, maps, statistics…
 Interviews
 Existing quantitative data:
 census data, CPS, ACS, GSS, AHS…
 Questionnaire survey
 Sampling method, sample size, variables
 data collection
 Cross-sectional vs. longitudinal
5) Methodology
 Three general approaches:
 Quantitative
 Qualitative
 Mixed
 Different designs for different
approaches
Research Methods
Quantitative Methods
 Instrument-based questions
 Performance, attitude, observational, and census data
 Statistical analyses
 Statistical interpretation

Mixed Methods
 Both open- and closed-ended questions
 Multiple forms of data drawing on all possibilities
 Statistical and text analyses
 Across databases interpretation

Qualitative Methods
 Open-ended questions
 Interview, observation, document, and audio-visual data
 Text and image analyses
 Themes, patterns interpretation

5) Methodology: quantitative
 Descriptive analysis
 What kinds of tables and figures will be
created?
 What kinds of summary statistics will be
calculated?
 Inferential analysis / Models
 Hypothesis test (e.g. t-test)
 Model specification (e.g. regression)
 Be very specific
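Being specific about a hypothesis test can mean writing down the exact statistic to be computed. The sketch below is one possible illustration, not the course's prescribed method: it computes group means and Welch's two-sample t statistic (an assumed choice of test) using only the Python standard library. The function name `welch_t` and the two small data lists are hypothetical and invented for illustration; they come from no real survey.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic for the difference in means (a minus b)."""
    var_a = variance(sample_a)  # sample variance, n - 1 in the denominator
    var_b = variance(sample_b)
    standard_error = sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error


# Hypothetical toy values, invented for illustration only.
group_a = [10.0, 12.0, 14.0]
group_b = [13.0, 15.0, 17.0]

# Descriptive step: summary statistics for each group.
print("mean of group A:", mean(group_a))
print("mean of group B:", mean(group_b))

# Inferential step: the test statistic for the difference in means.
print("Welch t statistic:", welch_t(group_a, group_b))
```

A negative statistic here means group A has the lower mean. Judging significance would also require the degrees of freedom and a reference distribution, which this sketch leaves out.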
Group Activity #2:
Which method is the most appropriate?
 Is there still racial discrimination in the housing
market in the US?
 What are the socio-spatial characteristics of
foreclosures that took place in the last five years?
Why?
Research Design
 1) Identify the research question and justify its
selection.
 2) Review previously published literature
 3) Clearly and explicitly specify hypotheses
 4) Describe the data and variables needed for
hypothesis test, and how the data will be
obtained.
 5) Describe the methods of analysis; expected
results
Research Paper
 Research Design
+
 Results from analyses, findings
 Interpretations
 Conclusion and discussion
Group Activity #3
 Design an “ideal” dataset you need to
study housing inequality in the U.S.
 What is the subject? Or unit of analysis?
 What is the spatial coverage?
 Which year(s)?
 What kind of variables?
 What are your outcome variables (dependent
variables)?
 What are your explanatory variables (independent
variables)?
 Control variables? (independent variables)
Getting data
 Conducting a survey
 Sampling design, questionnaire,
interview…
 Expensive, time-consuming, quality?
 Using existing data
 Easier, cheaper/free
 You may not get everything you want
Existing Datasets
 U.S. Census Bureau
 Before 2010:
 Short form: 100%, basic info, available at block
level
 SF1: basic racial category
 SF2: detailed racial category
 Long form: sample, 1/6 HHs, detailed info, no
block level
 SF3 and SF4
 2010:
 Short form + American Community Survey
Existing Datasets
 American Community Survey
 Ongoing, 250,000 households every month
 Provides estimates:
 1-yr: for areas with 65,000+ pop, every year
 3-yr: for areas with 20,000+ pop
 5-yr: all areas
 American Housing Survey (AHS)
 Economic Census (every 5 years)
 American FactFinder
 Exercise
 www2.census.gov for direct data
access
 Use Unix format
Existing Datasets
 Integrated Public Use Microdata
Series (IPUMS)
 IPUMS-USA
 Micro-level data
 Surveys from 15 censuses and ACS (2000-2010)
 IPUMS-CPS
 Yearly micro-level data
 IPUMS-International
 Must register to access data
Existing Datasets
 General Social Survey
 Social trends and attitudes
 Core questions and topic of the year
 Over time (since 1972)
 1500 – 5000 people, depending on the
year
 Dataset already in SPSS format
Existing Datasets
 Inter-university Consortium for
Political & Social Research (ICPSR)
 More than 500,000 digital files, by
individuals and institutions
 Use search to find your dataset
Existing Datasets
 National Survey of Families and
Households
 Life history
 Three waves: 1987-88, 1992-93, 2001-02
 Large sample size: >10,000
Existing datasets
 Social Science Data Archive
 List of data archives on CSDA
Data Manipulation
 You may have to manipulate your
data before you conduct analysis
 Recoding
 Race: combining different minorities into
one group
 Age: creating age groups
Recoding the RACE Variable
Original RACE variable: White: 0, Black: 1, Asian: 2, Hispanic: 3
Recoded RERACE variable: White: 0, Non-white: 1
© 2011 Taylor and Francis
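The RACE recode can be sketched in a few lines of Python. The codes (0 = White, 1 = Black, 2 = Asian, 3 = Hispanic) are the ones shown on the slide. The function name `recode_race` and the short list of sample codes are hypothetical, and the course itself may do this step in a statistics package instead.

```python
def recode_race(race):
    """Collapse the four-category RACE code into RERACE (0 = White, 1 = Non-white)."""
    if race not in (0, 1, 2, 3):
        # Fail loudly on codes the slide does not define (e.g. missing-value codes).
        raise ValueError(f"unexpected RACE code: {race!r}")
    return 0 if race == 0 else 1


# Hypothetical sample of original RACE codes, for illustration only.
race = [0, 1, 2, 3, 0]
rerace = [recode_race(code) for code in race]
print(rerace)
```

Rejecting undefined codes rather than silently mapping them to 1 keeps missing or miscoded values from being counted as "Non-white".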
Recoding Variables (Ratio → ordinal)
Original age variable: 10, 15, 20, 21, 25, 29, 32, 50, 61…
Recoded variable: 1 (<=20), 2 (21-30), 3 (>30)
© 2011 Taylor and Francis
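The age recode follows the same pattern. This sketch uses the cut points from the slide (<=20, 21-30, >30) and the age values listed there. The function name `recode_age` is hypothetical, and the code assumes ages are recorded as whole years.

```python
def recode_age(age):
    """Map an age in whole years to an ordinal group: 1 (<=20), 2 (21-30), 3 (>30)."""
    if age <= 20:
        return 1
    if age <= 30:
        return 2
    return 3


# Age values listed on the slide.
ages = [10, 15, 20, 21, 25, 29, 32, 50, 61]
age_groups = [recode_age(age) for age in ages]
print(list(zip(ages, age_groups)))
```

The recoded variable keeps only the ordering of the groups, so differences between group codes no longer measure differences in years.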
Data Manipulation
 Recoding
 Creating an index
 Combining several similar variables
 Need to be measured on the same scale
 Attitude on protest
 Housing facility index:
 Heating: 1 yes, 0 no
 Tap water: 1 yes, 0 no
 Gas/electricity for cooking: 1 yes, 0 no
 Index = heating + tap water + cooking fuel (0-3)
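The housing facility index is a sum of three 0/1 indicators, so it can be sketched as follows. The function name `housing_facility_index`, the dictionary keys, and the three example households are hypothetical and invented for illustration.

```python
def housing_facility_index(heating, tap_water, cooking_fuel):
    """Sum three 0/1 indicators into an index ranging from 0 to 3."""
    for value in (heating, tap_water, cooking_fuel):
        if value not in (0, 1):
            # All components must share the same 0/1 scale before they are added.
            raise ValueError(f"indicator must be 0 or 1, got {value!r}")
    return heating + tap_water + cooking_fuel


# Hypothetical households, for illustration only.
households = [
    {"heating": 1, "tap_water": 1, "cooking_fuel": 1},
    {"heating": 1, "tap_water": 0, "cooking_fuel": 1},
    {"heating": 0, "tap_water": 0, "cooking_fuel": 0},
]
index_values = [housing_facility_index(**household) for household in households]
print(index_values)
```

The check on each indicator reflects the requirement that the combined variables be measured on the same scale. A simple sum also weights the three facilities equally, which is itself a design choice.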
Research Paper:
Questions for yourself
 What are my research questions?
 What kind of method is the most appropriate
for my research?
 What kind of data do I need to answer my
research question? Where can I access the
data?
 What are my DVs? IDVs?
 What kind of descriptive analyses should I do?
 What kind of hypothesis tests should I do?
 What kind of regressions should I do?
Group Activity #4
 With the “ideal” dataset in mind, please
find one or two existing datasets that
can best suit your research goal.