UNIVERSITY OF PRETORIA
Theme 1: Introduction to data
Presenter: TM Malatji

Contents
• Populations and samples
• Anecdotal evidence
• Sources of bias
• Confounding variables
• Observational studies
• Experimental studies
• Principles of experimental design
• Sampling strategies
• Correlation vs. Causation

Theme 1: Populations and samples
• Population – The entire group that you want to draw conclusions about.
• Sample – The specific group that you will collect data from.

Theme 1: Populations and samples
• Research question:
  – Over the last 10 years, what is the average time to complete a degree for University of Pretoria undergraduate students?
  – What is the population? What is the sample?
• Population: All undergraduate students who graduated from the University of Pretoria in the last 10 years.
• Sample: The selected alumni who will be asked about their completion time.

Theme 1: Why use samples?
• Samples are used when:
  – The population is too large to collect data from all of it.
  – We do not have access to the entire population.
  – The population is hypothetical and unlimited in size, e.g. when studying the effects of a new medical procedure.

Theme 1: Populations and samples
• Generally, samples should:
  – Be based on well-defined selection criteria.
  – Be unbiased in the make-up of the sample cases.
  – Be random, to allow fair selection of cases.
  – Include the different variations that are present in the population.

Theme 1: Sources of bias
• Non-response – Only some of the selected cases respond. Is the data representative of the population?
• Convenience sample – Cases that are easy to reach are more likely to be included. Is the data representative of the population?

Theme 1: Why random sampling?

Theme 1: Anecdotal evidence
• Anecdotal evidence is based on individual accounts rather than on reliable research or statistics, so it may not be valid.
• The data:
  – Represents only one or two cases.
  – Is not representative of the population.

Theme 1: Relationship between variables
• Independent/Explanatory variable
• Dependent/Response variable
[Diagram: Explanatory variable → (might affect) → Response variable]
• Association ≠ Causation

Theme 1: Confounding variables
• Confounding variable – A third variable that affects both the supposed explanatory variable and the response variable.
• Example: You find that more workers are employed in provinces in which the market provides higher salaries. Does this mean that higher salaries lead to higher employment rates?
[Diagram: Job sector, Average salary, Number employed. Job sector is a possible confounder of the relationship between average salary and number employed.]

Theme 1: Observational studies
• Data is collected by monitoring what happens in a sample, without intervening.
• This study comes in two forms:
  – Prospective study: A study in which events are recorded as they unfold.
    • Example: A number of workers are observed as they grind a part, to determine whether they carry out the grinding process differently, so that the process can be improved.
  – Retrospective study: A study in which data is collected after the events have occurred.
    • Example: A number of workers are interviewed and asked to describe the method they normally use to grind a part, so that the grinding process can be understood and improved.

Theme 1: Experimental studies
• A study in which a treatment is given to cases.
• A randomized experiment is one in which cases are randomly assigned to the groups being compared.
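Random assignment can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the lecture material: the function name `assign_groups`, the case labels, and the equal split between groups are all assumptions made for the example.

```python
import random


def assign_groups(cases, seed=None):
    """Randomly split cases into a treatment and a control group.

    Hypothetical helper: shuffles a copy of the cases, then cuts the
    shuffled list in half so that chance alone decides group membership.
    """
    rng = random.Random(seed)      # seeded generator, for reproducibility
    shuffled = list(cases)         # copy, so the input is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


# Made-up case labels, for illustration only.
cases = [f"case_{i}" for i in range(1, 11)]
groups = assign_groups(cases, seed=1)
print("Treatment:", groups["treatment"])
print("Control:  ", groups["control"])
```

Because group membership is decided by chance, any confounding variables are expected to be spread evenly across the two groups.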
[Figure: Treatment group and Control group]

Theme 1: Principles of experimental design
• Control possible confounders.
• Randomize cases into treatment and control groups.
• Use a sufficiently large sample, or replicate the experiment.
• Block on variables that can influence the study.
• Blinding – Single or double.

Theme 1: Sampling strategies
• Simple random sampling – Each member of the population is equally likely to be selected.

Theme 1: Sampling strategies
• Stratified sampling – The population is divided into strata whose cases share similar characteristics (homogeneous), and cases are then sampled from every stratum.

Theme 1: Sampling strategies: Stratified sampling
• Example: You are interested in how having a doctoral degree affects the wage gap between men and women among graduates of a certain university. Because only a small proportion of this university’s graduates have obtained a doctoral degree, a simple random sample would likely contain too few doctoral graduates to properly compare the differences between men and women with a doctoral degree versus those without one.

Characteristic    Strata
Gender            Female, Male
Degree            Bachelor’s, Master’s, Doctorate

Groups:
1. Male bachelor’s graduates
2. Female bachelor’s graduates
3. Male master’s graduates
4. Female master’s graduates
5. Male doctoral graduates
6. Female doctoral graduates
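The stratified approach in this example can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the lecture's own method: the helper `stratified_sample`, the graduate counts, and the choice of 10 cases per stratum are all made up for the example.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Draw a simple random sample of up to n_per_stratum cases from each stratum.

    Hypothetical helper: stratum_of maps a case to its stratum label.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for case in population:
        strata[stratum_of(case)].append(case)
    sample = []
    for label in sorted(strata):   # fixed order keeps the result reproducible
        members = strata[label]
        k = min(n_per_stratum, len(members))
        sample.extend(rng.sample(members, k))
    return sample


# Made-up population: doctoral graduates are deliberately rare.
counts = {"Bachelor's": 200, "Master's": 80, "Doctorate": 20}
population = [
    {"gender": gender, "degree": degree}
    for gender in ("Female", "Male")
    for degree, count in counts.items()
    for _ in range(count)
]

sample = stratified_sample(
    population, lambda c: (c["gender"], c["degree"]), n_per_stratum=10, seed=0
)
per_stratum = Counter((c["gender"], c["degree"]) for c in sample)
print(per_stratum)
```

Sampling the same number of cases from each of the six groups guarantees that doctoral graduates are represented, which a simple random sample of the same size would not.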
[Figure: the population divided into six strata: Female bachelor’s graduates, Female master’s graduates, Female doctoral graduates, Male bachelor’s graduates, Male master’s graduates, Male doctoral graduates]

Theme 1: Sampling strategies: Stratified sampling
[Figure: a Sample drawn from the six strata, with Salary, Gender, Job history, and Qualification recorded for each sampled case]

Theme 1: Sampling strategies
• Cluster sampling – Each cluster contains cases with diverse characteristics (non-homogeneous).

Theme 1: Sampling strategies: Cluster sampling
• Example: You are interested in how having a doctoral degree affects the wage gap between men and women among graduates of a certain university.
[Figure: the population divided into five clusters, each containing male and female bachelor’s, master’s, and doctoral graduates]

Theme 1: Sampling strategies: Cluster sampling
[Figure: the Sample, made up of clusters selected from the five clusters above]

Theme 1: Sampling strategies
• Multistage cluster sampling – Select clusters as in cluster sampling, then select cases for study from within the selected clusters.
[Figure: four clusters, each containing male and female bachelor’s, master’s, and doctoral graduates]

Theme 1: Study conclusions: Correlation vs. Causation
• Observational study:
  – A study in which cases are observed or outcomes are measured without any intervention to affect the outcomes (e.g. no treatment is given).
• Experimental study:
  – A study in which an intervention is introduced and its effects are studied.

Theme 1: Study conclusions: Correlation vs. Causation
• How does sleep deprivation affect your ability to drive? A recent study measured the effects on 19 professional drivers. Each driver participated in two experimental sessions: one after normal sleep and one after 27 hours of total sleep deprivation. The treatments were assigned in random order. In each session, performance was measured on a variety of tasks, including a driving simulation.
a. Correlation statement generalized to all drivers
  • E.g. Sleep deprivation is associated with decreased performance ability of professional drivers.
b. Causal statement generalized to all drivers
  • E.g. Sleep deprivation decreases the performance ability of professional drivers.
c. Causal statement about the sample
  • E.g. Sleep deprivation decreases the performance ability of the 19 sampled professional drivers.

Theme 1: L2 Summary
• Populations and samples
• Anecdotal evidence
• Sources of bias
• Confounding variables
• Observational studies
• Experimental studies
• Principles of experimental design
• Sampling strategies
• Correlation vs. Causation
• PS: Do complete the homework as an exercise.

Thank you! Happy studying