N318b Winter 2002 Nursing Statistics
Lecture 4: Normal distribution, Z-scores, Central Limit Theorem, Probability

Today’s Class
- Normal distribution
- Z-scores
- Central Limit Theorem
<< 10 min break >>
- Probability
- Applying knowledge to assigned readings (Wolfe et al., 1996)
No work group today!
School of Nursing Institute for Work & Health Nur 318b 2002 Lecture 4: page 2

A Quick Review from Last Week
- Data presentation: bar graphs, pie charts, histograms, polygons (lines), box plots
- Measures of shape: skew (asymmetry) and kurtosis

Normal Distribution: “what is all the fuss about?!”
- Statistics is a branch of applied mathematics
- Most statistical tests are based on a set of basic assumptions about the data
- Most of these assumptions concern the distribution of the data
- If the assumptions are not true, the tests are not valid!
- Review: how do you check the normality of data?

The (Standard) Normal Curve
- A hypothetical distribution that forms the basis of statistical theory (also called the Gaussian curve)
- (See Figure 3.1 in textbook, page 64)

Why use the normal curve?
- Many variables are normally distributed
- Many tests require a normal distribution
- It allows for tests of inference, since study results can be compared against it (i.e. it is a probability or “chance” distribution)
- “Understanding the normal curve prepares you for understanding the concept of hypothesis testing” (Textbook page 64)

Where did the normal curve come from?
- There is an elegant mathematical formula (theory) underlying the distribution (you don’t need to know it!)
- Discovered in the 1700s by de Moivre, later developed by Gauss (1800s), and then used by Galton (medicine)
- Another example of mathematical theory helping to explain observed phenomena

What is the normal curve used for?
- Testing whether your observed value (e.g. BP) differs from the expected value (standardized scores, or Z-scores, can be used to check this)
- Estimating the precision of an observed study mean (i.e. confidence intervals)
- Tests based on the probability (likelihood) that observed results “fit” the normal curve

What are the properties of the normal curve?
- X-axis measured in SDs (from the mean)
- Y-axis is frequency (units or counts)
- Mean, median and mode are all the same
- Symmetrical (“bell-shaped”) around the mean
- ±1 SD includes about 68% of the population
- ±2 SD includes about 95% of the population
- The “tails” hold a very small % of the population
(REMEMBER: total area under the curve = 100% or 1.0)

[Figure: standard normal curve with the mean and -2, -1, +1 and +2 SD marked; ±1 SD either side of the mean includes about 68% of the sample, ±2 SD includes about 95%.]

Z-scores
- If a variable is normally distributed, observed values (or means) can be converted to Z-scores
- WHY? To test whether your study mean (e.g. BP) differs from the expected value
- A Z-score is simply the distance from the population mean expressed in SD units

Z-scores – an example
HOW? $Z = \frac{X - \bar{X}}{SD}$, where $X$ = observed value, $\bar{X}$ = sample mean and SD = sample SD
A population has a mean systolic BP of 110 mmHg and an SD of 15 mmHg. What proportion (%) of people have a BP between 95 and 120?
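A minimal Python sketch of the Z-score conversion for this question, assuming the population mean (110 mmHg) and SD (15 mmHg) given above. Instead of the Appendix A table it uses the standard normal cumulative distribution function built from `math.erf`; the helper names `z_score`, `normal_cdf` and `proportion_between` are hypothetical, chosen for illustration. It also checks the 68% / 95% rule for ±1 and ±2 SD.

```python
import math


def z_score(x: float, mean: float, sd: float) -> float:
    """Number of SDs that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def normal_cdf(z: float) -> float:
    """Area under the standard normal curve to the LEFT of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def proportion_between(low: float, high: float, mean: float, sd: float) -> float:
    """Proportion of a normal population with values between low and high."""
    return normal_cdf(z_score(high, mean, sd)) - normal_cdf(z_score(low, mean, sd))


# Example from the slide: population mean 110 mmHg, SD 15 mmHg
MEAN, SD = 110.0, 15.0
z1 = z_score(95, MEAN, SD)    # -1.0
z2 = z_score(120, MEAN, SD)   # about 0.67 (unrounded: 0.667)
share = proportion_between(95, 120, MEAN, SD)
print(f"Z1 = {z1:.2f}, Z2 = {z2:.2f}, proportion between 95 and 120 = {share:.2%}")

# The 68 / 95 rule from the properties slide
within_1sd = proportion_between(MEAN - SD, MEAN + SD, MEAN, SD)
within_2sd = proportion_between(MEAN - 2 * SD, MEAN + 2 * SD, MEAN, SD)
print(f"within 1 SD: {within_1sd:.1%}, within 2 SD: {within_2sd:.1%}")
```

Using the unrounded Z2 (0.667) gives about 58.9%; a table lookup at the rounded value Z2 = 0.67 gives a slightly larger figure, so small differences from the hand calculation come from rounding, not from either method being wrong.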
Z-scores – an example (cont’d)
$Z_1 = \frac{95 - 110}{15} = -1.0$
$Z_2 = \frac{120 - 110}{15} = 0.67$

Z-scores – an example (cont’d)
Now we need to convert the Z-scores to % areas using a table (e.g. Appendix A, pp. 417-8 of textbook)
- Negative values are % areas to the left of the mean; positive values are areas to the right of the mean
From the table in Appendix A:
- $Z_1 = -1.0$ corresponds to 34.13% (between 95 and 110)
- $Z_2 = 0.67$ corresponds to 24.86% (between 110 and 120)
- Total area = 34.13 + 24.86 = 58.99%

Z-scores – example 2
What proportion (%) of people have a systolic BP above 140?
$Z = \frac{140 - 110}{15} = 2.0$
From the table in Appendix A, Z = 2.0 corresponds to 47.72% of the area, between 110 and 140.
But what does this represent? Only the area between the mean and 140, so the area above 140 = 50 – 47.72 = 2.28%

Central Limit Theorem
What is it?
- When large enough samples (e.g. n >= 25) are drawn from a population with a finite variance, the sample means will be approximately normally distributed
- i.e. if you plot the $\bar{X}$’s you get a bell curve
- The theorem holds even if the underlying distribution is moderately non-normal (e.g. a bit skewed)

Central Limit Theorem – what is its importance?
- We can now statistically test the likelihood of an observed (sample) mean
- The variation (“dispersion”) of sample means about the true mean is called the “standard error” (SE) of the mean
- SE (of the mean) and SD (of the sample) are directly related mathematically: $SE = \frac{SD}{\sqrt{n}}$ (where n = sample size)

Z-scores – for means
How likely is it (i.e. what %) that a sample of size n = 100 will have a mean systolic BP $\bar{X} > 113$ (assuming $\mu = 110$ and $\sigma = 15$)?
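A minimal Python sketch of the standard-error calculation behind this question and the n = 10 variation, assuming μ = 110 mmHg and σ = 15 mmHg as stated. The function names are hypothetical, and the upper-tail area comes from `math.erfc` rather than the Appendix A table. The second part is an illustrative simulation: the exponential population (mean 1, SD 1, strongly skewed) is an assumption, not from the lecture, used to show that means of samples of n = 25 still cluster around the true mean with a spread close to SD/√n, as the Central Limit Theorem predicts.

```python
import math
import random
import statistics


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: SE = SD / sqrt(n)."""
    return sd / math.sqrt(n)


def z_for_mean(sample_mean: float, mu: float, sigma: float, n: int) -> float:
    """How many standard errors a sample mean lies from the population mean."""
    return (sample_mean - mu) / standard_error(sigma, n)


def upper_tail(z: float) -> float:
    """Area under the standard normal curve to the RIGHT of z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


MU, SIGMA = 110.0, 15.0  # population mean and SD of systolic BP (mmHg)
results = {}
for n in (100, 10):
    z = z_for_mean(113.0, MU, SIGMA, n)
    results[n] = (z, upper_tail(z))
    print(f"n = {n:3d}: SE = {standard_error(SIGMA, n):.2f}, "
          f"Z = {z:.2f}, P(mean > 113) = {upper_tail(z):.2%}")

# CLT check by simulation (assumed skewed population: exponential, mean 1, SD 1)
random.seed(318)
n, reps = 25, 5000
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]
sd_of_means = statistics.stdev(sample_means)
se_theory = standard_error(1.0, n)
within_2se = sum(abs(m - 1.0) <= 2 * se_theory for m in sample_means) / reps
print(f"SD of {reps} sample means = {sd_of_means:.3f} (SD / sqrt(n) = {se_theory:.3f})")
print(f"share of sample means within 2 SE of the true mean: {within_2se:.1%}")
```

For n = 10 the unrounded Z is about 0.632, so the computed tail (roughly 26.4%) differs slightly from a table lookup at the rounded Z = 0.63.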
We want Z-scores for means: $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{113 - 110}{15 / 10} = 2.0$ (about 2!)
From the Z-score table in Appendix A, Z = 2.0 corresponds to 47.72% of the area to the right of $\mu$
But once again, what does this represent? Sample means between 110 and 113 mmHg
So for $\bar{X} > 113$: 50 – 47.72 = 2.28%

Effect of sample size on the mean?
What happens if the sample size drops to 10 (i.e. n = 10, $\bar{X} > 113$, $\mu = 110$, $\sigma = 15$)?
$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{113 - 110}{15 / 3.16} = 0.63$
From the table in Appendix A, Z = 0.63 corresponds to 23.57%
But once again, what does this represent? Sample means that fall between 110 and 113 mmHg
So for $\bar{X} > 113$: 50 – 23.57 = 26.43%

10 minute break!

Probability
- Think of it as a statistical measure of chance
- A proportion (e.g. %) that lets you make intelligent guesses about future events
- Often expressed as a “p-value”; the p-value “rules” in (quantitative) research
$P(\text{event}) = \frac{\text{number of events}}{\text{number of subjects}}$ (often expressed as a % when multiplied by 100)

Probability – cont’d
- You read a well-done clinical trial that followed 1000 women with breast cancer (BC), 200 of whom died from BC within 5 years
- You then see a woman with BC on the ward and she asks you whether she is going to live. What do you tell her?
- She has a 20% probability, or a 1 in 5 chance, of dying from BC within 5 years

Probability – cont’d
- What if she then tells you she is node negative and the tumour was small?
- Then she tells you her mother and sister both died from BC by age 45
- Probability is a way of quantifying the risk or likelihood of events occurring (usually according to a set of criteria)

Probability – Facts
- Probabilities always lie between 0 and 1 (0 = minimum value = no chance; 1 = maximum value = definite event)
- P-value = “probability due to chance”; the cut-off for significance is arbitrarily “set” at p <= 0.05 in most cases, but it can vary from 0.2 to < 0.01
- The p-value refers to the “tails” of the normal curve distribution (lower = stronger evidence against chance!)

Probability – Rules
- Conditional probability: the probability of event A given event B, P(A | B)
- Multiplication rule (independent events!): probability of A and B = P(A) × P(B)
- Addition rule (mutually exclusive events!): probability of A or B = P(A) + P(B)

Part 2: Application to the Assigned Reading

Wolfe et al. (1996)
Quick summary of the paper:
- An etiologic study aimed at exploring possible causal pathways between back pain and osteoarthritis (OA) of the knee
- A 3-year consecutive series of 368 knee OA patients via a rheumatology clinic
- Cross-sectional questionnaire assessment of key study variables (possible bias?)

Wolfe et al. (1996)
A typical example of a sophisticated multistage exploratory analysis:
- Descriptive analysis
- Exploratory univariate analysis
- Causal pathway multivariate analysis

Some questions …
- What does Figure 1 tell us?
- Why did they group BMI in quartiles?

Some questions …
- Do you understand the major features of the data in Table 1?
- What do all the columns mean? e.g. “unadjusted” vs. “adjusted”
- Odds ratios and confidence intervals will be studied later (CIs in the next lecture!)

Next Week – Lecture 5: Inference testing, Type I and Type II errors, p-values, and Confidence Intervals
For next week’s class please review:
1. Page 14 in syllabus
2. Textbook Chapter 3, pages 80-91
3. Syllabus papers:
   i) Birenbaum et al. (1996)
   ii) Gulick (1995)

Research Practicum
- Can those who signed up please stay for a few extra minutes to decide placements?
- Do those who signed up last term and did NOT get placed want to be put back in the “pool” to be placed?