Research Methods I

Data Collection and Sampling
Primary Data
• There are various methods for collecting
primary (original) data
– E.g. questionnaire, survey, interview
• Control over investigation much greater
• Can more easily avoid being “data-driven”
• Cost can be prohibitive
• Pilot studies can be very helpful
Choice of method
• Shipman: choice often between sampling
and case study
• Intensive versus extensive research design
• Qualitative versus quantitative data
• Interpretivists favour the former; positivists
favour the latter
• All primary research involves selection
• Most methods require sampling
Sampling: general principles
• No a priori superiority of any method
• Trade-offs: standardisation versus control,
generalisability versus flexibility
• Shipman: sampling method used dependent on
nature of study undertaken
• Basis for sample must be transparent
• Cost of surveying entire population is prohibitive
(e.g. census)
• Constraint of feasibility
Sampling: definitions
• Population: must be defined
• Finite population: e.g. voters
• Sampling unit: single potential member of the sample
• Sampling frame: list of sampling units (NB
1936 US Presidential election)
• Sample: drawn from sampling frame
Probability Sampling
• Probability of each sampling unit being
chosen is known (often equal probability)
• Simple random sampling: classic method,
regarded as most reliable, least biased
• List numbered sampling frame members
and select via random number generator
• Other probabilistic methods are available
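A minimal sketch of simple random sampling as described above: number the members of the sampling frame and let a random number generator pick them. The frame of 1,000 numbered units, the function name and the seed are illustrative assumptions, not part of the lecture.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the frame, each with equal probability."""
    rng = random.Random(seed)
    # random.sample draws without replacement, so no unit appears twice
    return rng.sample(frame, n)

# Hypothetical sampling frame: units numbered 1..1000
frame = list(range(1, 1001))
sample = simple_random_sample(frame, n=50, seed=42)
print(len(sample), "units drawn;", len(set(sample)), "distinct")
```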
Systematic sampling
• List members of sampling frame
• Choose first sample member randomly
• Then choose every Kth unit, where K = N/n
• More convenient than SRS for large populations
• Can be a systematic pattern in sample list,
leading to bias; e.g. corner shops
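The procedure can be sketched as follows, assuming K is rounded down to a whole number and the random start falls within the first K units. The second part illustrates one possible reading of the "corner shops" warning, using a made-up street list in which every 10th shop is a corner shop: with K = 10 the sample then contains either only corner shops or none.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Random start, then every Kth unit, with K = N // n."""
    N = len(frame)
    k = N // n  # sampling interval K = N/n, rounded down (assumption)
    if k < 1:
        raise ValueError("sample size exceeds frame size")
    rng = random.Random(seed)
    start = rng.randrange(k)  # random position within the first interval
    return [frame[start + i * k] for i in range(n)]

# Hypothetical frame with a periodic pattern: every 10th shop is on a corner
shops = ["corner" if i % 10 == 0 else "mid-street" for i in range(100)]
picked = systematic_sample(shops, n=10, seed=3)
corner_count = picked.count("corner")
print(corner_count, "corner shops out of", len(picked))
```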
Stratified sampling
• Divide population into groups of alike units (strata)
• Strata sizes usually proportionate to population
• Draw randomly from groups
• Cost effective
• Ensure representativeness
• Can lead to excessive number of sub-groups
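A minimal sketch of proportionate stratified sampling, assuming two made-up strata ("urban" and "rural") and a simple random draw within each. Rounding the allocation can make the total differ slightly from n when the shares are not whole numbers.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n, seed=None):
    """Proportionate allocation, then a random draw within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    N = len(units)
    sample = []
    for members in strata.values():
        # Stratum's share of the sample mirrors its share of the population
        n_h = round(n * len(members) / N)
        sample.extend(rng.sample(members, n_h))
    return sample

# Hypothetical population: 600 urban and 400 rural units
population = [("urban", i) for i in range(600)] + [("rural", i) for i in range(400)]
sample = stratified_sample(population, stratum_of=lambda u: u[0], n=50, seed=7)
urban = sum(1 for s in sample if s[0] == "urban")
rural = sum(1 for s in sample if s[0] == "rural")
print(urban, "urban,", rural, "rural")
```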
Cluster Sampling
• Select large groups
• Select sampling units from clusters
• Example: take a city, divide into areas,
number areas, select areas randomly,
number units within areas, select units
• Very cost-effective
• Very good if sampling frame poorly defined
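The city example can be sketched as a two-stage draw: select areas at random, then select units within the chosen areas. The area and household names, and the choice of 3 areas with 5 units each, are hypothetical.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: pick units within each."""
    rng = random.Random(seed)
    chosen_areas = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for area in chosen_areas:
        units = clusters[area]
        take = min(n_per_cluster, len(units))  # small clusters give what they have
        sample.extend((area, unit) for unit in rng.sample(units, take))
    return sample

# Hypothetical city: 10 numbered areas, 20 numbered households in each
city = {f"area_{a}": [f"household_{a}_{h}" for h in range(20)] for a in range(10)}
sample = two_stage_cluster_sample(city, n_clusters=3, n_per_cluster=5, seed=1)
print(len(sample), "units from", len({area for area, _ in sample}), "areas")
```

Only the selected areas need a full list of units, which is why the method copes with a poorly defined sampling frame.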
Non-probability Sampling
• Convenience sampling: select whoever is available
• Quota sampling: collect data according to
proportions of the population
• Selection of subjects absolutely crucial
• Requires great skill of interviewers
• Snowball sampling: select next subject from
previous subject
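A rough sketch of quota sampling, assuming the interviewer takes people in the order they happen to be available and stops recruiting from a group once its quota is full. The respondents, the grouping by "F"/"M" and the quota sizes are invented for illustration.

```python
def quota_sample(respondents, group_of, quotas):
    """Accept respondents in arrival order until every quota is filled."""
    remaining = dict(quotas)
    chosen = []
    for person in respondents:
        group = group_of(person)
        if remaining.get(group, 0) > 0:
            chosen.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # all quotas met
    return chosen

# Hypothetical stream of available respondents: (group, id)
passers_by = [("F", 1), ("F", 2), ("M", 3), ("F", 4), ("M", 5), ("M", 6), ("F", 7)]
chosen = quota_sample(passers_by, group_of=lambda p: p[0], quotas={"F": 2, "M": 2})
print(chosen)
```

Selection within each group is not random here, which is why the choice of subjects and the interviewer's skill matter so much.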
Non-Probability Sampling
• Theoretical sampling: select those most
likely to be affected by an issue
• Can ignore things which do not fit
• Can interpret observations according to the
• Non-prob sampling cannot claim
representativeness as easily but gives much
more discretion and control
Response Rates
• Another possible trade-off is on response rates
• R = 1 - (n-r)/n
• Even if initial sample size is appropriate (n’
= n / (1 + n/N), where n = s^2 / SE^2: see F-N
and N: 194-9), response rates can be low
• Postal questionnaires: typically 20-40%
• Non-response bias
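The formulas above can be checked with a short calculation. The slides do not define the symbols, so this sketch assumes r is the number of responses received, s the standard deviation of the variable of interest, SE the desired standard error and N the population size. All input values are invented.

```python
def response_rate(n, r):
    """R = 1 - (n - r)/n, i.e. the share of the sample that responded."""
    return 1 - (n - r) / n

def initial_sample_size(s, se):
    """n = s^2 / SE^2."""
    return s ** 2 / se ** 2

def corrected_sample_size(n, N):
    """n' = n / (1 + n/N), the adjustment for a finite population."""
    return n / (1 + n / N)

# Hypothetical inputs: s = 10, SE = 0.5, population of 2,000
n = initial_sample_size(s=10, se=0.5)
n_prime = corrected_sample_size(n, N=2000)
rate = response_rate(n=400, r=120)
print(f"n = {n:.0f}, n' = {n_prime:.1f}, R = {rate:.0%}")
```

With 120 replies from 400 questionnaires the rate falls inside the 20-40% range quoted for postal questionnaires.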
Response Rates
• Non-respondents could affect findings
• If reason for non-response is related to
issue: e.g. reluctance to interview drunks
hampers study on alcoholism
• Response rate can be improved by cover
letter, callbacks, skill of researcher, length
of questionnaire, types of question
All types of primary data require selection
If sampling used: various methods possible
Sampling method relates to research tool
Different data collection techniques:
questionnaires, interviews, etc. - all to be
studied in Research Methods 2 - all have
advantages and disadvantages
Secondary Data
• Primary quantitative data has several
advantages, particularly control; qualitative
data too
• Do not equate primary and qualitative
• Today: advantages of secondary data
• Searching on electronic data sources
including the Internet
Secondary data
• Primary/secondary is not the same distinction as quantitative/qualitative
• Qualitative can include secondary data
sources such as personal documents,
auto/biographies, etc.
• Secondary: collected by someone else, e.g.
another academic researcher, business,
government agency, etc.
Secondary data
• Used extensively in social science
– Durkheim: suicide
– Marx: wages, incomes, prices
– Weber: church records
• Economists mainly use secondary data
Advantages of Secondary Data
• Might be the only data available
• Enables longitudinal/time series work
• Cheaper (cost and time) and more convenient than
primary data
• Aids generalisation
• Arises from natural settings
(nonreactive/unobtrusive data)
• Allows replication and checking - validity
Disadvantages of Secondary Data
• May not be exactly the data required
• Differences in underlying sampling, design,
questions asked, method of ascertaining
information, etc.
• Differences lead to bias
• Method of data generation crucial to
econometric studies
Electronic Data Sources
Through the library system
Through the internet
Known versus unknown sources
Known sources via library catalogue
Problem of reliability/credibility is common
to all electronic sources (more than non-electronic sources)
Electronic Data - Literature
• You can search by author or subject across
journals, via several static websites/portals:
Electronic Data: Databases
• There are many databases available online
• Most have standardised, national data free
to download in various formats
• Common file format is .csv; but .html and
even .xls files also common
Penn World Tables:
World Bank:
US Statistical Abstract:
See Dissertation homepage/hb
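As a small illustration of working with a downloaded .csv file, the sketch below reads comma-separated text with Python's standard csv module and groups it into one time series per country. The column names and all values are placeholders, not taken from any of the databases listed; a real file would be opened with open(...) rather than built from a string.

```python
import csv
import io

# Placeholder data standing in for a downloaded .csv file (values are made up)
raw = """country,year,gdp_per_capita
A,2000,1000.0
A,2001,1100.0
B,2000,2000.0
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Group observations into one (year, value) series per country
series = {}
for row in rows:
    observation = (int(row["year"]), float(row["gdp_per_capita"]))
    series.setdefault(row["country"], []).append(observation)

for country, observations in series.items():
    print(country, observations)
```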
• Secondary data has many advantages and
disadvantages relative to primary
• There is a wide range of secondary data
• Much data is available on the internet
• Internet sources must be scrutinised more
closely than other sources
Qualitative Data
• Principles of research design and sampling
basically hold for quantitative and
qualitative data
• However, they apply most easily to
quantitative analysis
• Qualitative analysis has different foci
• Qualitative analysis is relatively little used in
economics (compared with quantitative analysis
and with other social sciences)
Qualitative techniques: types
Case study
Fieldwork (ethnography)
Unstructured interviews
Analytic induction/grounded theory
Discourse analysis
Theoretical sampling
Qualitative techniques: principles
• Qual often = not quantitative
• Can use quant for pattern detection, qual for
causal analysis
• Or use qual and quant as equals in inference
• Quantification often inappropriate
Qualitative techniques: principles
• Interpretivism, verstehen
• Used to be associated only with using
autobiography, letters, personal documents, etc.
• Ethnography fairly recent:
• Focus on cases rather than generality
Qualitative techniques: principles
• Analysis not really a separate stage of research
• Design, data collection and analysis all
simultaneous and continuous
• Open-ended approach: Theory and conclusions
formed iteratively
• Imagination is crucial
• Recognise importance of exceptions
• Context is crucial
• Study of people acting in their daily lives
• Access a group but remain somewhat
• Approach with key questions
• Teams get range of perspectives
• Danger of self-perception and bias
Participant Observation
• Adopt perspectives of subject group in order to
understand them
• Learning language, customs, behaviours, work,
leisure, etc.
• Hanging around and learning the ropes
• Being an outsider can change subjects’ behaviour
• Complete participation - researcher wholly
concealed – contamination and artificiality
Participant Observation
• Researchers can go native (internalise group perspectives)
• Covert researchers can be in danger or create
detrimental behaviour
• Researchers can be “piggy in the middle”
• Covert: recording observations can be difficult
(e.g. need hidden cameras)
• Serious ethical issues with covert observation
Employ analytic induction
• Go in with prejudices and theories
• Revise theory in light of evidence
• Generate new theories until evidence seems
to fit
• Flexibility accorded but also required by the
• Need to be open to disconfirming cases
Grounded theory
• Data collected
• Develop categories (with inevitable
theoretical priors and language)
• Categories checked by data
• Once categories seem secure and grounded
in the evidence, formulate interconnection
between categories
• Broad range of qualitative techniques
• Control over the investigation; less data driven;
flexibility much greater than quantitative studies
• Logistically difficult: Huge amounts of data
produced and problems with manipulation
(although NVivo will help with this)
• Must be careful to collect evidence widely to
avoid bias
• Can be ethical issues re: data collection and