1: Measurement and Sampling
- What is biostatistics?
- What is measurement?
- How do we sample populations?
7/28/2016 1: Measurement & Sampling 1

HS 167 Logistics
- Syllabus: materials (text, lab workbook, calculator)
- Calendar and assignments are on www.sjsu.edu/biostat → click HS167 (become familiar with the Web site)
- Exam1 = 10/9, Exam2 = 11/13, Final = Thur 12/13 2:45
- Lab 0 and Lab 1 (Tu and We labs may have additional time to complete Lab 1)
- Text (reading): pp. 1–10, 15–19 (note vocabulary on p. 11)
- Exercises: 1.1–1.6, 1.8, 1.9, 2.1–2.3, 2.11–2.13 [due at beginning of next lecture]
- Yahoo group: send email to hs167-F07-subscribe@yahoogroups.com
- Academic integrity (do your own work)
  - Odd-numbered exercises and lab work → OK to get help from friends
  - Even-numbered exercises & exams → do NOT get help from friends
- How to get a good grade:
  - Attend all classes and labs (attendance required)
  - Stay on task
  - Read text (listed to Nancy)
  - Do Lab & HWs diligently
  - Do not cut corners

Biostatistics
- is not merely a compilation of computational techniques
- is a way of learning from data
- is concerned with many elements of study design and analysis (not just computations)
- requires more judgment than math (pay attention to vocabulary)
- is statistics applied to biological and health problems

Biostatistics involves
- A data detective element
  - Uncovering patterns and clues
  - This is a combination of exploratory data analysis (EDA) and descriptive statistics
- A data judge element
  - Confirmation of clues
  - This often requires inferential methods

Measurement
- Measurement ≡ “assigning of numbers and codes according to prior-set rules”
- Three types of statistical measurements:
  - Categorical ≡ classify observations into named (nominal) categories
    - e.g., HIV classified as “positive” or “negative”
  - Ordinal ≡ ranked categories
    - e.g., OPINION ranked 5 = strongly agree, 4 = agree, 3 = neutral,
      and so on
  - Quantitative ≡ numbers with equal spacing
    - e.g., AGE in years
    - e.g., BLOOD_PRESSURE in mm Hg

Illustrative Example: Weight Change and Heart Disease
- Source: Willett et al., 1995
- Goal: to determine the effect of weight change on coronary heart disease (CHD) risk
- 115,818 women 30 to 55 years of age, free of CHD
- Body mass index (BMI, kg/m²) determined at entry to study
- Body weight determined as of age 18
- Subjects followed for 14 years
- Number of CHD onsets (fatal and nonfatal) counted (1292 cases)

Illustrative Example (cont.)
Variables
- Categorical
  - Smoker or nonsmoker
  - Family history of heart disease (yes or no)
- Ordinal
  - Nonsmoker, light smoker, moderate smoker, heavy smoker
- Quantitative
  - BMI (kg/m²)
  - Age (years)
  - Weight presently
  - Weight at age 18

Variable, Value, Observation
- Observation ≡ the unit upon which measurements are made
  - Can be an individual (e.g., a person)
  - Can be an aggregate of individuals (e.g., a region)
- Variable ≡ the generic thing we measure
  - e.g., AGE of a person
  - e.g., HIV status of a person
- Value ≡ a realized measurement
  - e.g., “27”
  - e.g., “positive”

Data Structure (Forms)
Data Collection Form (one form each for Observation 1, Observation 2, Observation 3, Observation 4)
  Var1 (ID)          1
  Var2 (AGE)         27
  Var3 (SEX)         F
  Var4 (HIV)         Y
  Var5 (KAPOSISARC)  Y
  Var6 (REPORTDATE)  4/25/89
  Var7 (OPPORTUNIS)  N

U.S.
Census Form

Data Structure (Table)

  ID   AGE  SEX  HIV  KAPOSISARC  REPORTDATE  OPPORTUNIS
  ---  ---  ---  ---  ----------  ----------  ----------
  1    27   F    Y    Y           04/25/89    N
  2    30   F    Y    N           09/11/89    Y
  3    21   F    Y    Y           01/12/89    N
  4    30   M    Y    Y           10/08/89    Y

- Observations → rows
- Variables → columns
- Values → cells

Illustrative Example: Cigarette Consumption and Lung Cancer
Variables:
- country = name of country/region
- cig1930 = per capita cigarette consumption, 1930
- mortalit = lung cancer deaths per 100,000 in 1950
Note: The units of observation in this data set are regions (not people)

Data Quality
- An analysis is only as good as its data
- GIGO ≡ garbage in, garbage out
- Does a variable measure what it purports to?
  - Validity = freedom from systematic error
  - Objectivity = seeing things as they are, without making them conform to your worldview
- Discussion on avoiding bias when questioning
  - e.g., consider the word “jam”

Ethos: Which do you choose?
- Frankfurt, H. G. (2005). On Bullshit. Princeton University Press
- Blackburn, S. (2005). Truth. Oxford Univ. Press
The difference is intention and method: BS has a predetermined outcome. Truth is earnest in its intent and does not bend the facts to a predetermined outcome.

Truth Versus Perception
- Plato’s Allegory of the Cave
  - We observe shadows on the wall.
  - The truth lies outside.
- “I cannot give any scientist of any age any better advice than this: The intensity of the conviction that a hypothesis is true has no bearing on whether it is true or not.”
  Peter Medawar, 1915–1987

Two Types of Statistical Studies
- Surveys: quantify population characteristics
  - e.g., % of population that is overweight
  - e.g., expected life span
- Comparative studies: determine relationships between variables
  - e.g., relationship between weight gain and heart disease risk
  - e.g., relationship between alcohol consumption and esophageal cancer risk
We start by considering survey sampling

Sampling for a Survey
- We seldom (if ever) study an entire population
- Take a subset (sample) of the population
- Use characteristics of the sample to infer population characteristics
- Select a probability sample: chance determines which individuals are selected
- Avoid non-probability samples
  - Discuss volunteer bias as an example

Simple Random Sample (SRS)
- SRS (definition) = every possible sample of size n from the population has the same probability of being selected
  - this is the most basic type of probability sample
- SRSs have sampling independence: selection of one individual does not influence selection of any other
- SRSs can be done with replacement or without replacement (both methods are usually valid)
- Sampling fraction = n ÷ N = probability of selection, where
  - n = sample size
  - N = population size

SRS Method
- Compile census listing (sampling frame): individuals numbered 1, 2, . . ., N
- Generate n random numbers between 1 and N
  - Can be done with random number generator (lab) or with table of random digits
- Select individuals based on random number list
- You will take an SRS in lab this week
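
The SRS method can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in lab: the function names, the population size N = 600, the sample size n = 10, and the seed are all hypothetical. The sketch samples without replacement, so no ID number is selected twice.

```python
import random


def simple_random_sample(N, n, seed=None):
    """Draw n distinct ID numbers from a sampling frame numbered 1..N.

    Sampling is without replacement; every subset of size n is equally likely.
    """
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return sorted(rng.sample(range(1, N + 1), n))


def sampling_fraction(n, N):
    """Sampling fraction n / N, the probability that any one individual is selected."""
    return n / N


# Hypothetical frame: N = 600 individuals, sample of n = 10
N, n = 600, 10
ids = simple_random_sample(N, n, seed=1)
print("Selected IDs:", ids)
print("Sampling fraction:", sampling_fraction(n, N))
```

The selected ID numbers are then matched against the sampling frame to identify which individuals enter the sample.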