1: Measurement and Sampling
What is biostatistics?
What is measurement?
How do we sample populations?
7/28/2016
1: Measurement & Sampling
HS 167 Logistics
Syllabus: materials (text, lab workbook, calculator)
Calendar and assignments are on www.sjsu.edu/biostat → click HS167
(become familiar with the Web site)
Exam1 = 10/9, Exam2 = 11/13, Final = Thur 12/13 2:45
Lab 0 and Lab 1 (Tu and We lab may have additional time to complete Lab 1)
Text (reading): pp. 1–10, 15–19 (note vocabulary on p. 11)
Exercises: 1.1–1.6, 1.8, 1.9, 2.1–2.3, 2.11–2.13 [due at beginning of next lecture]
Yahoo group: send email to hs167-F07-subscribe@yahoogroups.com
Academic integrity (do your own work)
Odd-numbered exercises and lab work → OK to get help from friends
Even-numbered exercises & exams → do NOT get help from friends
How to get a good grade:
Attend all classes and labs (attendance required)
Stay on task
Read text (listed to Nancy)
Do Lab & HWs diligently
Do not cut corners
Biostatistics
is not merely a compilation of computational techniques
is a way of learning from data
is concerned with many elements of study design and analysis (not just computations)
requires more judgment than math (pay attention to vocabulary)
is statistics applied to biological and health problems
Biostatistics involves
A data detective element
Uncovering patterns and clues
This is a combination of exploratory data analysis (EDA) and descriptive statistics
A data judge element
Confirmation of clues
This often requires inferential methods
Measurement
Measurement ≡ “assigning of numbers and codes according to prior-set rules”
Three types of statistical measurements:
Categorical ≡ classify observations into named (nominal) categories
e.g., HIV classified as “positive” or “negative”
Ordinal ≡ ranked categories
e.g., OPINION ranked 5 = strongly agree, 4 = agree, 3 = neutral, and so on
Quantitative ≡ numbers with equal spacing
e.g., AGE in years
e.g., BLOOD_PRESSURE in mm Hg
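As a rough illustration, the three measurement types could be represented in Python as shown below. This is a sketch under stated assumptions: the names `HIV_CATEGORIES`, `Opinion`, and `ages` are hypothetical. Only the three OPINION codes given above are included, and the lower levels covered by "and so on" are not guessed. The sketch shows which comparisons each type supports. Categorical values support only membership and equality. Ordinal codes support ranking. Quantitative values also support meaningful differences.

```python
from enum import IntEnum

# Categorical (nominal): named categories with no inherent order
HIV_CATEGORIES = {"positive", "negative"}

# Ordinal: ranked categories. Only the three codes stated on the slide are
# listed here; the remaining lower levels ("and so on") are left out.
class Opinion(IntEnum):
    NEUTRAL = 3
    AGREE = 4
    STRONGLY_AGREE = 5

# Quantitative: equally spaced numbers (AGE in years; hypothetical values)
ages = [27, 30, 21, 30]

print("positive" in HIV_CATEGORIES)             # True: membership is meaningful
print(Opinion.STRONGLY_AGREE > Opinion.AGREE)   # True: order is meaningful
print(max(ages) - min(ages))                    # 9: differences in years are meaningful
```

The ordinal codes can be compared, but the gap between 4 and 5 is not assumed to equal the gap between 3 and 4. That equal spacing is what separates ordinal from quantitative measurement.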
Illustrative Example:
Weight Change and Heart Disease
Source: Willett et al., 1995
Goal: to determine the effect of weight change on coronary heart disease risk
115,818 women aged 30 to 55 years, free of CHD
Body mass index (BMI, kg/m²) determined at entry to the study
Body weight determined as of age 18
Subjects followed for 14 years
Number of CHD onsets (fatal and nonfatal) counted (1292 cases)
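The study's two weight-based measurements can be sketched in a few lines. BMI is weight in kilograms divided by the square of height in meters, which gives the kg/m² units stated above. Weight change is current weight minus weight at age 18. The function names and the participant's values below are hypothetical and do not come from Willett et al.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared, in kg/m²."""
    return weight_kg / height_m ** 2

def weight_change(weight_at_18_kg: float, weight_now_kg: float) -> float:
    """Weight gained (+) or lost (-) since age 18, in kg."""
    return weight_now_kg - weight_at_18_kg

# Hypothetical participant (values invented for illustration)
print(round(bmi(70.0, 1.75), 1))   # 22.9
print(weight_change(58.0, 70.0))   # 12.0
```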
Illustrative Example (cont.)
Variables:
Categorical: smoker or nonsmoker; family history of heart disease (yes or no)
Ordinal: non-smoker, light smoker, moderate smoker, heavy smoker
Quantitative: BMI (kg/m²); age (years); weight presently; weight at age 18
Variable, Value, Observation
Observation ≡ the unit upon which measurements are made
Can be an individual (e.g., a person)
Can be an aggregate of individuals (e.g., a region)
Variable ≡ the generic thing we measure
e.g., AGE of a person
e.g., HIV status of a person
Value ≡ a realized measurement
e.g., “27”
e.g., “positive”
Data Structure (Forms)
Data Collection Form: one form per observation (Observation 1, Observation 2, Observation 3, Observation 4). The form for Observation 1 reads:
Var1 (ID): 1
Var2 (AGE): 27
Var3 (SEX): F
Var4 (HIV): Y
Var5 (KAPOSISARC): Y
Var6 (REPORTDATE): 4/25/89
Var7 (OPPORTUNIS): N
U.S. Census Form
Data Structure (Table)
ID   AGE  SEX  HIV  KAPOSISARC  REPORTDATE  OPPORTUNIS
---  ---  ---  ---  ----------  ----------  ----------
1    27   F    Y    Y           04/25/89    N
2    30   F    Y    N           09/11/89    Y
3    21   F    Y    Y           01/12/89    N
4    30   M    Y    Y           10/08/89    Y
Observations → rows
Variables → columns
Values → cells
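The row/column/cell structure maps directly onto a list of records. In this sketch each dictionary is one observation (a row), each key is a variable (a column), and each entry is a value (a cell). The data are the four observations from the table above. The names `records` and `column` are hypothetical helpers chosen for illustration.

```python
# Each dict is one observation (row); keys are variables (columns);
# each key's entry is a value (cell). Data copied from the table above.
records = [
    {"ID": 1, "AGE": 27, "SEX": "F", "HIV": "Y", "KAPOSISARC": "Y", "REPORTDATE": "04/25/89", "OPPORTUNIS": "N"},
    {"ID": 2, "AGE": 30, "SEX": "F", "HIV": "Y", "KAPOSISARC": "N", "REPORTDATE": "09/11/89", "OPPORTUNIS": "Y"},
    {"ID": 3, "AGE": 21, "SEX": "F", "HIV": "Y", "KAPOSISARC": "Y", "REPORTDATE": "01/12/89", "OPPORTUNIS": "N"},
    {"ID": 4, "AGE": 30, "SEX": "M", "HIV": "Y", "KAPOSISARC": "Y", "REPORTDATE": "10/08/89", "OPPORTUNIS": "Y"},
]

def column(rows, variable):
    """Return every value of one variable, i.e., one column of the table."""
    return [row[variable] for row in rows]

ages = column(records, "AGE")
print(ages)                    # [27, 30, 21, 30]
print(records[1]["SEX"])       # F: the SEX cell of observation 2
print(sum(ages) / len(ages))   # 27.0: mean AGE
```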
Illustrative Example: Cigarette Consumption and Lung Cancer
Variables:
country = name of country/region
cig1930 = per capita cigarette consumption, 1930
mortalit = lung cancer deaths per 100,000 in 1950
Note: The units of observation in this data set are regions (not people)
Data Quality
An analysis is only as good as its data
GIGO ≡ garbage in, garbage out
Does a variable measure what it purports to measure?
Validity = freedom from systematic error
Objectivity = seeing things as they are, without making them conform to your worldview
Discussion: avoiding bias when questioning
e.g., consider the word “jam”
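A small simulation can show what "freedom from systematic error" means. The sketch assumes a hypothetical scale that reads 2 kg too heavy. Averaging many of its readings does not remove that bias. Readings with only random error, by contrast, average out near the true value. The true weight, the error sizes, and the seed are all assumptions made for illustration.

```python
import random
import statistics

TRUE_WEIGHT = 70.0  # hypothetical true value (kg)

def measure(n, bias, noise_sd, seed):
    """Simulate n readings = true value + systematic bias + random noise."""
    rng = random.Random(seed)
    return [TRUE_WEIGHT + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]

valid = measure(10_000, bias=0.0, noise_sd=1.0, seed=1)   # random error only
biased = measure(10_000, bias=2.0, noise_sd=1.0, seed=1)  # systematic error added

print(round(statistics.mean(valid), 2))   # close to 70
print(round(statistics.mean(biased), 2))  # close to 72: averaging does not remove bias
```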
Ethos: Which do you choose?
Frankfurt, H. G. (2005). On Bullshit. Princeton University Press.
Blackburn, S. (2005). Truth. Oxford University Press.
The difference is intention and method: BS has a predetermined outcome. Truth is earnest in its intent and does not bend the facts to a predetermined outcome.
Truth Versus Perception
Plato’s Allegory of the Cave: We observe shadows on the wall. The truth lies outside.
“I cannot give any scientist of any age any better advice than this: The intensity of the conviction that a hypothesis is true has no bearing on whether it is true or not.” Peter Medawar (1915-1987)
Two Types of Statistical Studies
Surveys – quantify population characteristics
e.g., % of population that is overweight
e.g., expected life span
Comparative Studies – determine relationships between variables
e.g., relationship between weight gain and heart disease risk
e.g., relationship between alcohol consumption and esophageal cancer risk
We start by considering survey sampling
Sampling for a Survey
We seldom (if ever) study an entire population
Take a subset (sample) of the population
Use characteristics of the sample to infer population characteristics
Select a probability sample
chance determines which individuals are selected
Avoid non-probability samples
Discuss volunteer bias as an example
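Volunteer bias can be illustrated by comparing a probability sample with a self-selected one. The sketch below rests on made-up assumptions. A hypothetical population of 10,000 people is exactly 30% overweight. Overweight people are assumed to volunteer less often (20% versus 50%). The SRS estimate lands near the true 30%, while the volunteer sample understates it, because who ends up in the sample depends on the trait being measured.

```python
import random

rng = random.Random(167)  # fixed seed so the sketch is reproducible

# Hypothetical population: 1 = overweight, 0 = not; exactly 30% overweight
N = 10_000
population = [1] * 3_000 + [0] * 7_000
true_prop = sum(population) / N

# Probability sample: SRS of n = 1,000, drawn without replacement
srs = rng.sample(population, 1_000)
srs_prop = sum(srs) / len(srs)

# Non-probability (volunteer) sample. ASSUMED volunteering rates:
# 20% for overweight people, 50% for everyone else
volunteers = [x for x in population if rng.random() < (0.2 if x == 1 else 0.5)]
vol_prop = sum(volunteers) / len(volunteers)

print(f"true {true_prop:.3f}  SRS {srs_prop:.3f}  volunteers {vol_prop:.3f}")
```

Under these assumptions the volunteer estimate settles around 0.15, about half the true value. Taking more volunteers would not fix this, because the bias comes from how individuals are selected, not from sample size.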
Simple Random Sample (SRS)
SRS (definition) = every possible sample of size n from the population has the same probability of being selected
→ this is the most basic type of probability sample
SRSs have sampling independence
selection of one individual does not influence selection of any other
SRSs can be done with replacement or without replacement (both methods are usually valid)
Sampling fraction = n ÷ N = probability of selection
where
n ≡ sample size
N ≡ population size
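For sampling without replacement, the claim that the sampling fraction equals the probability of selection can be checked by counting. There are C(N, n) equally likely samples. Of these, C(N − 1, n − 1) contain any one particular individual, and the ratio reduces exactly to n/N. The function name and the sizes N = 600 and n = 60 are hypothetical.

```python
from fractions import Fraction
from math import comb

def selection_probability(n: int, N: int) -> Fraction:
    """Probability that one particular individual is in an SRS of size n drawn
    without replacement from N: (samples containing them) / (all possible samples)."""
    return Fraction(comb(N - 1, n - 1), comb(N, n))

N, n = 600, 60                        # hypothetical population and sample sizes
print(selection_probability(n, N))    # 1/10
print(Fraction(n, N))                 # 1/10, the sampling fraction n ÷ N
```

Exact fractions are used instead of floating point, so the equality with n/N is shown without rounding error.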
SRS Method
Compile census listing (sampling frame)
individuals numbered: 1, 2, . . ., N
Generate n random numbers between 1 and N
Can be done with random number generator (lab) or with table of random digits
Select individuals based on random number list
You will take an SRS in lab this week
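The three steps above can be sketched with Python's standard `random` module, which here stands in for a lab random number generator or a table of random digits. The function `simple_random_sample` and the 20-person frame are hypothetical. `Random.sample` draws distinct numbers, so this sketch samples without replacement.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw an SRS of size n, without replacement, from a sampling frame."""
    N = len(frame)
    # Step 1: number the individuals in the frame 1, 2, ..., N
    numbered = {i: person for i, person in enumerate(frame, start=1)}
    # Step 2: generate n distinct random numbers between 1 and N
    rng = random.Random(seed)
    picks = rng.sample(range(1, N + 1), n)
    # Step 3: select the individuals whose numbers were drawn
    return [numbered[i] for i in sorted(picks)]

frame = [f"person_{i:02d}" for i in range(1, 21)]  # hypothetical frame, N = 20
sample = simple_random_sample(frame, n=5, seed=2016)
print(sample)
```

Passing a seed makes a draw reproducible, which helps when checking lab work. Omitting it gives a fresh sample each run.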