Data Collection and Sampling

Primary Data
• There are various methods for collecting primary (original) data
  – E.g. questionnaire, survey, interview, observation
• Control over investigation much greater
• Can more easily avoid “data-driven” research
• Cost can be prohibitive
• Pilot studies can be very helpful

Choice of method
• Shipman: choice often between sampling and case study
• Intensive versus extensive research design
• Qualitative versus quantitative data
• Interpretivists favour the former; positivists favour the latter
• All primary research involves selection
• Most methods require sampling

Sampling: general principles
• No a priori superiority of any method
• Trade-offs: standardisation versus control, generalisability versus flexibility
• Shipman: sampling method used dependent on nature of study undertaken
• Basis for sample must be transparent
• Cost of surveying entire population is prohibitive (e.g. census)
• Constraint of feasibility

Sampling: definitions
• Population: must be defined
• Finite population: e.g. voters
• Sampling unit: single potential member of sample
• Sampling frame: list of sampling units (NB 1936 US Presidential election)
• Sample: drawn from sampling frame

Probability Sampling
• Probability of each sampling unit being chosen is known (often equal probability)
• Simple random sampling (SRS): classic method, regarded as most reliable, least biased
• List numbered sampling frame members and select via random number generator
• Other probabilistic methods are available

Systematic sampling
• List members of sampling frame
• Choose first sample member randomly
• Then choose every Kth unit, where K = N/n
• More convenient than SRS for large population
• Can be a systematic pattern in sample list, leading to bias; e.g.
corner shops

Stratified sampling
• Divide population into groups of alike members
• Strata sizes usually proportionate to population
• Draw randomly from groups
• Cost effective
• Ensure representativeness
• Can lead to excessive number of sub-groups

Cluster Sampling
• Select large groups
• Select sampling units from clusters randomly
• Example: take a city, divide into areas, number areas, select areas randomly, number units within areas, select units randomly
• Very cost-effective
• Very good if sampling frame poorly defined

Non-probability Sampling
• Convenience sampling: select whoever is available
• Quota sampling: collect data according to proportions of the population
• Selection of subjects absolutely crucial
• Requires great skill of interviewers
• Snowball sampling: select next subject via a previous subject

Non-probability Sampling (continued)
• Theoretical sampling: select those most likely to be affected by an issue
• Can ignore things which do not fit
• Can interpret observations according to the theory
• Non-probability sampling cannot claim representativeness as easily but gives much more discretion and control

Response Rates
• Another possible trade-off is on response rates
• R = 1 - (n - r)/n, where R = response rate, n = sample size, r = number of responses
• Even if initial sample size is appropriate (n’ = n/(1 + n/N), where n = s²/SE² and N = population size; see F-N and N: 194-9), response rates can be low
• Postal questionnaires: typically 20-40%
• Non-response bias

Response Rates (continued)
• Non-respondents could affect findings
• Problem arises if reason for non-response is related to issue: e.g. reluctance to interview drunks hampers study on alcoholism
• Response rate can be improved by cover letter, callbacks, skill of researcher, length of questionnaire, types of question

Conclusions
• All types of primary data require selection
• If sampling used: various methods possible
• Sampling method relates to research tool
• Different data collection techniques: questionnaires, interviews, etc.
  – all to be studied in Research Methods 2
  – all have advantages and disadvantages

Secondary Data

Introduction
• Primary quantitative data has several advantages, particularly control; qualitative data too
• Do not equate primary and qualitative
• Today: advantages of secondary data
• Searching on electronic data sources including the Internet

Secondary data
• The primary/secondary distinction is not the same as the qualitative/quantitative distinction
• Qualitative can include secondary data sources such as personal documents, auto/biographies, etc.
• Secondary: collected by someone else, e.g. another academic researcher, business, government agency, etc.

Secondary data (continued)
• Used extensively in social science
  – Durkheim: suicide
  – Marx: wages, incomes, prices
  – Weber: church records
• Economists mainly use secondary data

Advantages of Secondary Data
• Might be the only data available
• Enables longitudinal/time series work
• Cheaper (cost and time) and more convenient than primary data
• Aids generalisation
• Arises from natural settings (nonreactive/unobtrusive data)
• Allows replication and checking (validity)

Disadvantages of Secondary Data
• May not be exactly the data required
• Differences in underlying sampling, design, questions asked, method of ascertaining information, etc.
• Differences lead to bias
• Method of data generation crucial to econometric studies

Electronic Data Sources
• Through the library system
• Through the internet
• Known versus unknown sources
• Known sources via library catalogue
• Problem of reliability/credibility is common to all electronic sources (more than non-electronic sources)

Electronic Data - Literature
• You can search by author or subject across journals, via several static websites/portals:
• www.econlit.org/
• www.sosig.ac.uk
• www.mimas.ac.uk
• www.economics.ltsn.ac.uk
• www.esds.ac.uk

Electronic Data: Databases
• There are many databases available online
• Most have standardised, national data free to download in various formats
• Common file format is .csv; but .html and even .xls files also common
• OECD:
• ONS:
• UN:
• Penn World Tables:
• BEA (US):
• Ameristat:
• Eurostat:
• World Bank:
• CIA:
• US Statistical Abstract:
• See Dissertation homepage/hb

Conclusions
• Secondary data has many advantages and disadvantages relative to primary
• There is a wide range of secondary data available
• Much data is available on the internet
• Internet sources must be scrutinised more closely than other sources

Qualitative Data

Introduction
• Principles of research design and sampling basically hold for quantitative and qualitative data
• However, they apply most easily to quantitative analysis
• Qualitative analysis has different foci
• Qualitative analysis is relatively unused in economics (compared with quantitative analysis, and with other social sciences)

Qualitative techniques: types
• Case study
• Fieldwork (ethnography)
• Observation
• Unstructured interviews
• Analytic induction/grounded theory
• Discourse analysis
• Theoretical sampling

Qualitative techniques: principles
• “Qualitative” often just means “not quantitative”
• Can use quantitative for pattern detection, qualitative for causal analysis
• Or use qualitative and quantitative as equals in inference (triangulation)
• Quantification often inappropriate

Qualitative techniques: principles (continued)
• Interpretivism, verstehen
• Used to be
associated only with using autobiography, letters, personal documents, diaries
• Ethnography fairly recent:
• Focus on cases rather than generality

Qualitative techniques: principles (continued)
• Analysis not really a separate stage of research
• Design, data collection and analysis all simultaneous and continuous
• Open-ended approach: theory and conclusions formed iteratively
• Imagination is crucial
• Recognise importance of exceptions
• Context is crucial

Fieldwork
• Study of people acting in their daily lives
• Access a group but remain somewhat detached
• Approach with key questions
• Teams get range of perspectives
• Danger of self-perception and bias

Participant Observation
• Adopt perspectives of subject group in order to understand them
• Learning language, customs, behaviours, work, leisure, etc.
• Hanging around and learning the ropes
• Being an outsider can change subjects’ behaviour
• Complete participation: researcher wholly concealed; contamination and artificiality

Participant Observation (continued)
• Researchers can go native (internalise group lifestyle)
• Covert researchers can be in danger or create detrimental behaviour
• Researchers can be “piggy in the middle”
• Covert: recording observations can be difficult (e.g.
need hidden cameras)
• Serious ethical issues with covert observation

Employ analytic induction
• Go in with prejudices and theories
• Revise theory in light of evidence
• Generate new theories until evidence seems to fit
• Flexibility accorded but also required by the researcher
• Need to be open to disconfirming cases

Grounded theory
• Data collected
• Develop categories (with inevitable theoretical priors and language)
• Categories checked by data
• Once categories seem secure and grounded in the evidence, formulate interconnection between categories

Evaluation
• Broad range of qualitative techniques
• Control over the investigation; less data driven; flexibility much greater than quantitative studies
• Logistically difficult: huge amounts of data produced and problems with manipulation (although NVivo will help with this)
• Must be careful to collect evidence widely to avoid bias
• Can be ethical issues re: data collection and reporting
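
Supplementary sketch: simple random versus systematic sampling

As an illustration of the Probability Sampling and Systematic sampling slides, the following is a minimal Python sketch, not a definitive implementation. It assumes the sampling frame is held as a list; the function names, the fixed seed and the 100-unit frame are hypothetical and chosen only for the example. The interval K = N/n is rounded down when N is not a multiple of n.

```python
import random


def simple_random_sample(frame, n, rng):
    """Draw n distinct units, each with equal probability (SRS)."""
    return rng.sample(frame, n)


def systematic_sample(frame, n, rng):
    """Random start, then every Kth unit, where K = N/n (rounded down)."""
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("sample size must be between 1 and the frame size")
    k = N // n                   # sampling interval K
    start = rng.randrange(k)     # first sample member chosen randomly
    return [frame[start + i * k] for i in range(n)]


# Hypothetical numbered sampling frame of 100 units.
rng = random.Random(0)           # fixed seed, only so the example is repeatable
frame = [f"unit_{i:03d}" for i in range(1, 101)]

srs = simple_random_sample(frame, 10, rng)
sys_sample = systematic_sample(frame, 10, rng)

print("SRS:       ", srs)
print("Systematic:", sys_sample)
```

If the frame list has a periodic pattern (the corner shops example), every Kth unit can land on the same kind of unit; shuffling the frame first would turn the procedure back into something close to SRS.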
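
Supplementary sketch: sample size and response rate

The formulas on the Response Rates slide can be worked through numerically. The sketch below is a hedged illustration only: the function names and all input figures (s = 10, SE = 0.5, N = 2000, 120 responses) are made up for the example and do not come from the lecture or from F-N and N.

```python
def initial_sample_size(s, se):
    """n = s^2 / SE^2: sample size for a desired standard error."""
    return s ** 2 / se ** 2


def corrected_sample_size(n, N):
    """n' = n / (1 + n/N): adjustment for a finite population of size N."""
    return n / (1 + n / N)


def response_rate(n, r):
    """R = 1 - (n - r)/n, which is the same as r/n."""
    return 1 - (n - r) / n


# Hypothetical inputs, for illustration only.
n0 = initial_sample_size(s=10, se=0.5)     # 100 / 0.25 = 400
n_adj = corrected_sample_size(n0, N=2000)  # 400 / 1.2, roughly 333
rate = response_rate(n=400, r=120)         # 120 of 400 respond, about 0.3

print(f"n = {n0:.0f}, n' = {n_adj:.1f}, R = {rate:.2f}")
```

A response rate of about 0.3 sits inside the 20-40% range quoted for postal questionnaires, which shows how an adequately sized initial sample can still yield few usable responses.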