Section I. Statistics: What do they mean and why are they important?

What do stats mean?
• To be an intelligent consumer of statistics, your first reflex must be to question the statistics that you encounter. The British Prime Minister Benjamin Disraeli is famously credited with saying, "There are three kinds of lies: lies, damned lies, and statistics."
• It is important to think about the numbers, their sources, and most importantly, the procedures used to generate them.

Top 10 ways you use statistics every day
• Weather forecasts
• Emergency preparedness
• Predicting disease
• Medical studies
• Genetics
• Political campaigns
• Insurance
• Consumer goods
• Quality testing
• Stock market

But I’m never going to do research!
• Six good reasons to study statistics
  – to be able to effectively conduct research,
  – to be able to read and evaluate journal articles,
  – to further develop critical thinking and analytic skills,
  – to act as an informed consumer, and
  – to know when you need to hire outside statistical help.
  – Even Florence Nightingale did it!

Why nursing research
• Increasing emphasis on evidence-based practice
  – Informs nurses’ decisions and actions
  – Empowers nurses to make clinical decisions that benefit their patients, whether individual or community
  – A research-friendly nursing environment is required for Magnet status
  – Increases recognition for nursing contribution in health care and policy

Variables
• The characteristics we are measuring
  – Vary according to the population, patient, event, or intervention
• Data levels of measurement help us measure the variables
  – Nominal
  – Ordinal
  – Interval
  – Ratio

Data levels of measurement: Nominal
• Sometimes called categorical or qualitative
  – Permissible statistics: mode, chi-squared
  – Lowest form of data, least sophisticated
• Names
• Characteristics/Descriptive (e.g. pain: throbbing, stabbing, dull)
• Letters (e.g. M/F, Y/N)
• Numbers may be assigned to designate categories but have no numerical meaning (e.g. M=1, F=2)

Data levels of measurement: Ordinal
  – Permissible statistics: median, percentile
  – Can’t be added
• Rank order: 1st, 2nd, 3rd
• Rating: pain rating 0-10
• Likert scale

Likert scales
• Very dissatisfied, somewhat dissatisfied, neither satisfied nor dissatisfied, somewhat satisfied, very satisfied
  – No numerical data to quantify
  – Answers run on a continuum

Data levels of measurement: Interval
• Permissible statistics: mean, SD, correlation, regression, ANOVA
  – Rank ordering of objects
  – Equivalent distance between each measurement
  – The Fahrenheit scale is a clear example of the interval scale of measurement
  – The zero point is arbitrary; it does not represent the lowest possible value

Data levels of measurement: Ratio
• Highest level of measurement
• Permissible statistics: same as interval plus more
• The ratio scale of measurement is similar to the interval scale in that it also represents quantity and has equality of units.
• Has an absolute zero (no numbers exist below zero). Very often, physical measures will represent ratio data (for example, height and weight). Example: when measuring the length of a piece of wood in centimeters, you have quantity, equal units, and a measure that can’t go below zero centimeters.

Examples of data levels of measurement
Subject    Ratio level   Interval level   Ordinal level   Nominal level
Cookie     180           70               6               2
Bunny      110           0                1               1
Frosty     165           55               4               2
Tootsie    130           20               3               1
Candy      175           65               5               2
Fluffy     115           5                2               1

Question 1
• The colors of M&M candies would be which type of measurement?
  A. Interval
  B. Nominal
  C. Ordinal
  D. Ratio

Question 2
• Height, weight, lab test results, and age are examples of which type of data measurement?
  A. Ratio
  B. Nominal
  C. Interval
  D. Ordinal

Rankin Scale
• The Rankin scale is used to assess functional status after stroke.
Measurements are:
• 0 = no symptoms at all
• 1 = symptoms with no significant disability
• 2 = slight disability; unable to carry out previous activities
• 3 = moderate disability; needs some assistance, can walk alone
• 4 = moderately severe disability; unable to walk or attend to bodily functions without assistance
• 5 = severe disability; bedridden, incontinent, needs constant nursing care
• 6 = dead

Question 3
• The Rankin scale is which type of measurement?
  A. Ratio
  B. Nominal
  C. Interval
  D. Ordinal

Section II. Descriptive Statistics and Intro to the Normal Distribution

Descriptive Statistics = Describing the Data
• For any study, consider what parts would be useful to describe in numbers
  – Sample
  – Variables of interest
• In any study where the data are numerical, data analysis should begin with descriptive statistics.
• The appropriate choice of descriptive statistics depends on the level of data that was collected!

Types of Summary Statistics
• Frequency distributions
  – Ungrouped
  – Grouped
  – Percentages
• Measures of central tendency
• Measures of dispersion

Ungrouped Frequency Distributions
• The number of times something happened.
• Used with categorical data (ordinal, nominal)
• As simple as a tally or count
http://www.gigawiz.com/histograms.html

Example
• Using ungrouped frequency distributions to describe research variables
• How often newborns fit each demographic criterion or a birth attendant reported a particular behavior (e.g. using CHG vs. not)
From Rhee et al. (2008). Maternal and birth attendant hand washing and neonatal mortality in Southern Nepal. Archives of Pediatrics and Adolescent Medicine, 162(7), 603-608

Grouped Frequency Distributions
• The number of times something happened.
• Used to break continuous data (often things like age, weight, income) into groups.
  – You will always lose some information by doing this
  – There are conventions for groupings
    • Groups ideally have equal ranges, but the groups at either end of the data spectrum may be open-ended
    • All data points must fit into a group
    • Not too many, not too few (you don’t want to lose patterns in the data)

Percentage Distributions
• What percentage of the time something happened.
  – Useful when comparing to studies with different numbers of participants
  – Often presented with other frequency distributions in the following format: No. (%)
  – Often graphically represented using pie charts or bar charts

Example
• Questionnaires given to parents of underimmunized children.
• The tables indicate the number and percentage of participants selecting each response.
Luthy, K., Beckstrand, R., & Peterson, N. (2009). Parental hesitation as a factor in delayed Childhood Immunization

Question
• Which measure of central tendency is being used here to summarize participants’ age?
  – A. Mode
  – B. Median
  – C. Mean
  – D. Standard deviation

Measure of Central Tendency
• Used to describe a “typical” result or the middle of the dataset
• Most common measures:
  – Median
  – Mode
  – Mean

Median
• Literally the number in the middle of the ordered dataset (for an odd number of scores; with an even number, the average of the two middle scores)
  – 50% of scores fall above and 50% below this point (known as the 50th percentile)
• Most appropriately used for ordinal data
• Because the focus is on the middle score, the median is less affected by outliers

Mode
• The most common score(s)
  – May or may not be in the “middle” but is always a value in the dataset
  – Most appropriate for nominal data (e.g. most answers are “yes”).

Mean
• = Sum of Scores / Total # of Scores
  – Also known as an average
• Data must be continuous to generate a mean (interval and ratio level data only!)
• Most affected by outliers
• May be denoted in a number of ways (M, X̄, mean)

Measures of Variance
• How spread out is the data? Or how different are the scores from one another?
  – Range
    • Subtract the lowest number from the highest number in the set. Tells the total distance between the ends of the data set.
  – Variance (interval or ratio levels only!)
    • The average of the squared deviations from the mean; provides data on dispersion or spread
  – Standard deviation (interval or ratio levels only!)
    • Relates dispersion of values to the mean
    • Is the square root of the variance
    • Usually reported as SD

Normal Distribution
• In a true normal distribution, the mean, median, and mode are equal
• No real distribution exactly fits
• However, in many sets of data, the distribution is similar to the normal curve

Normal Distribution
• Unique properties
  – All possible values fall under the curve
  – Probability of any score occurring is related to its location under the curve
• Important SDs:
  – 68.3% of all values within 1 SD of the mean
  – 95.5% within 2 SD of the mean
  – 99.7% within 3 SD of the mean

Section III. Stat Theory
• Hypotheses
• Type I and II Errors
• Level of Significance
• Power

Probability Theory (p values)
• Deductive
• Used to explain:
  – Extent of a relationship
  – Probability of an event occurring
  – Probability that an event can be accurately predicted
• Expressed as lowercase p, with values given as decimals or percents

Probability
• If probability is 0.23, then p = 0.23.
• There is a 23% probability that a particular event will occur.
• In research, a result is usually expected to reach p < 0.05 to be considered statistically significant.
• Example?
• Patients who have a cardiac arrest in the operating room have a 5% chance of death.

Decision Theory
• Inductive reasoning
• Assumes all groups in a study are the same
• Up to the researcher to provide evidence (NEVER use the word PROVE!) that there really is a difference
• To test the assumption of no difference, a cutoff point is selected before analysis.

Hypothesis
• Statement of the expected outcome
• Example?
• Nursing students who study in the library have higher GPAs than nursing students who study in their dorm rooms/apartments.
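The normal-curve percentages and the cutoff-point idea from the slides above can be checked with a few lines of Python. This is only a minimal sketch using the standard library; the function names and the z score of 2.1 are made up for illustration and are not part of the original slides.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, SD 1


def coverage_within(k_sd: float) -> float:
    """Proportion of a normal distribution lying within k_sd SDs of the mean."""
    return STANDARD_NORMAL.cdf(k_sd) - STANDARD_NORMAL.cdf(-k_sd)


def two_tailed_p(z_score: float) -> float:
    """Probability of a value at least this far from the mean, in either tail."""
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(z_score)))


def is_significant(p: float, alpha: float = 0.05) -> bool:
    """Decision rule: reject the null hypothesis only when p is below alpha."""
    return p < alpha


for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {coverage_within(k):.2%}")

p = two_tailed_p(2.1)  # hypothetical z score, for illustration only
print(f"p = {p:.3f}; significant at alpha = 0.05? {is_significant(p)}")
```

The three coverage values come out close to the rounded figures on the slide (68.3%, 95.5%, 99.7%). The strict `p < alpha` comparison reflects the "no close enough" rule for the cutoff point.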
Characteristics of a Hypothesis
• Testable
• Logical
• Directly related to the research problem
• Theoretically or factually based
• States relationship between variables
• Stated so that it can be accepted or rejected

Research Hypothesis
• Directional
  – explains and predicts the direction and existence of a specific relationship
  – relationship will be either positive or negative
  – more specific than the non-directional hypothesis
  – cause-and-effect hypothesis
• Non-directional
  – states that a relationship exists but does not predict its direction

Null hypothesis
• Statistical statement that there is no difference between the groups under study

Cutoff Point
• Level of significance or alpha (α)
• Point at which the results of statistical analysis are judged to indicate a statistically significant difference between groups
• For most nursing studies, level of significance is 0.05.

Cutoff Point (cont’d)
• Absolute: NO “CLOSE ENOUGH”
• If the value is only a fraction above the cutoff point, the groups are treated as being from the same population.
• Results that reveal a significant difference of 0.001 are not considered more significant than the cutoff point.

Inference
• A conclusion/judgment based on evidence
• Judgments are made based on statistical results
• Statistical inferences must be made cautiously and with great care

Generalization
• A generalization is the application of information that has been acquired from a specific instance to a general situation.
• Example?

Normal Curve
• A theoretical frequency distribution of all possible values in a population.
• Levels of significance and probability are based on the logic of the normal curve.

Normal Curve (cont’d)
[Figures: One-Tailed Test; Two-Tailed Test]

Type I and Type II Errors
• Type I error occurs when the researcher rejects the null hypothesis when it is true. The results indicate that there is a significant difference, when in reality there is not.
• Type II error occurs when the researcher fails to reject the null hypothesis when it is false.
The results indicate there is no significant difference, when in reality there is a difference.

Reasons for Errors
• Type I
  – Risk is greater at the .05 level than at the .01 level
• Type II
  – Risk is greater at the .01 level than at the .05 level
  – Flaws in research methods
    • Multiple variables interact
    • Precision of instruments
    • Small samples

Statistical Power (AKA Power Analysis)
• DEF: the probability of rejecting the null hypothesis when it should be rejected, OR
• the probability that a statistical test will detect a significant difference that exists

Power
• Maneuver to increase control over:
  – Types of errors
  – CORRECT DECISIONS

Power and Risk for Type II Error
• Minimum acceptable power = 0.80
• Influenced by sample size
  – As sample size increases, so does power
• Influenced by effect size: the degree to which a phenomenon is present in a population
  – The larger the true difference between the two groups, the greater the power

Question #1
The level of significance usually set in nursing studies is at either:
  a. .5 or .1
  b. .05 or .01
  c. .005 or .001

Question #2
Which of the following is TRUE about the level of significance?
  a. ensures that findings will be correct 95% of the time if an alpha value of .05 was used
  b. refers to a statistic calculated during computer analysis
  c. represents the risk the researcher is willing to take in making a type I error and is established before data is analyzed

Question #3
There is a greater risk of a Type I error with a 0.05 level of significance than with a 0.01 level of significance.
  A. True
  B. False

Section IV.
• Statistical Significance
• Clinical Significance
• Reliability
• Validity
• Generalizability & Inference

Statistical Significance
• The level of significance is known as alpha (α)
• The threshold at which statistical significance is reached.

Cut Off Point
• Referred to as level of significance or alpha (α)
• Point at which the results of statistical analysis are judged to indicate a statistically significant difference between groups
• For many nursing studies, level of significance is 0.05.
• Typically written as α = 0.05

Cutoff Point (cont’d)
• The cutoff point is absolute.
• If the value obtained is only a fraction above the cutoff point, no meaning can be attributed to the differences between the groups.

Levels of Acceptable Significance
• 0.05
• 0.01
• 0.005
• 0.001

Clinical Significance
• Findings can have statistical significance but not clinical significance.
• Related to practical importance of the findings
• No common agreement in nursing about how to judge clinical significance
  – Is the difference sufficiently important to warrant changing the patient’s care?

Clinical Significance (cont’d)
• Who should judge clinical significance?
  – Patients and their families?
  – Clinician/researcher?
  – Society at large?
• Clinical significance is ultimately a value judgment.

Simpson & James (2005) Effects of Immediate Vs. Delayed Pushing During Second-Stage Labor….
Significant differences between groups:
• Fetal oxygen desaturation during second-stage labor (immediate: M=12.5; delayed: M=4.6), p = .001
• Variable decelerations in fetal heart rate (immediate: M=22.4; delayed: M=15.6), p = .02
There were no differences in length of labor, method of birth, Apgar scores, or umbilical cord gases.

Question: A statistically significant finding means that:
  a. Findings are clinically important and valuable.
  b. Interventions should be used in clinical practice.
  c. Obtained results are not likely to have been due to chance.
  d. Results will be the same if the study is repeated with another sample.

Question: A researcher reports that the results of a study were not statistically significant. How is this to be interpreted?
  a. Intervention was not strong enough to make a difference.
  b. Researcher does not have enough evidence to reject Ho.
  c. Researcher’s logic or conceptualization in setting up the study was faulty.
  d. Topic is of no further interest to nurse researchers or clinicians.

Testing Reliability of Measurement
• Examine reliability of study scales before using them.
• Reliability: the degree of consistency with which an instrument measures a construct.

Reliability Coefficient
• A quantitative index
• Usually ranges from .00 to 1.00
• Provides an estimate of how reliable an instrument is
• Should be at least 0.70
• Most common one is Cronbach’s alpha

Hollen et al. (1994) Measurement of QOL in patients with.…Psychometric assessment of the LCSS.
LCSS has good reliability
• Internal consistency of α = 0.82
• High reproducibility/stability: test-retest reliability (n=52, r>0.75)
• High repeated inter-rater agreement/equivalence among experts (95-100% agreement)

Validity
1. The degree to which inferences made in a study are accurate = Internal Validity
2. The degree to which results can be generalized = External Validity
3. The degree to which an instrument measures what it is intended to measure = Validity

Hollen et al. (1994) Measurement of QOL in patients with.…Psychometric assessment of the LCSS.
Validity has been established for the LCSS
• Content validity ~ expert panel
• Convergent validity ~ similar QOL tool
• Construct validity ~ unrelated tools
• Criterion-related validity ~ correlation with a “gold” standard (e.g. Sickness Impact Profile)

Inference
• A conclusion or judgment based on evidence
• Judgments are made based on statistical results
• Statistical inferences must be made cautiously and with great care

Generalization
• A generalization is the application of information that has been acquired from a specific instance to a general situation.
• Generalizing requires making an inference.
• Both inference and generalization require the use of inductive reasoning.

Generalization (cont’d)
• An inference is made from a specific case and extended to a general truth, from a part to a whole, from the known to the unknown.
• In research, an inference is made from the study findings to a more general population.

Simpson & James (2005) Effects of Immediate Vs. Delayed Pushing During Second-Stage Labor….
“Results from this study suggest that delayed second-stage pushing until the urge to push and pushing with the open-glottis technique in nulliparous women with epidural anesthesia is more favorable for physiologic fetal well-being as measured by FSpO2 (p. 155).”
“The benefits of less fetal oxygen desaturation ….appear to outweigh any disadvantages of a longer second stage (p. 155).”

Question: Which of the following questions relates to generalization?
  a. Are the findings generally significant to people in the study?
  b. Can these findings be applied to other groups or settings?
  c. Does the degree of control in the study allow for statistical significance?
  d. How many alternative explanations can be proposed?

Section V. Common Statistical Tests
• Independent T-Test
• One-Way ANOVA
• Chi-Square
• Correlation
• Regression

Independent T-Test
• To compare means between two groups
• The continuous variable is measured once.
For example:
Research Question
Is there a difference in self-efficacy for pain management in week 10 between participants with Fibromyalgia (FM) in the guided imagery group and those in the standard care group?
Hypotheses
Ho: µGI - µSC = 0
Ha: µGI - µSC ≠ 0
α = 0.05

Independent T-Test (Cont’d)
Tests of assumptions with the sample
• Independent groups (no overlap).
• Dependent variable is continuous (interval or ratio level).
• Normal distribution.
• Homogeneity of Variance is met.

Group Statistics: Self-efficacy for pain management in week 10
Group                  N    Mean      Std. Deviation   Std. Error Mean
Guided Imagery (GI)    24   64.5833   22.69249         4.63209
Standard Care (SC)     24   49.8333   20.30992         4.14574

Independent T-Test (Cont’d)
Ho: µGI - µSC = 0
Ha: µGI - µSC ≠ 0
α = 0.05
t = 2.373
p = 0.011 = 1.1%
Conclusion: There is a difference in self-efficacy in week 10 between participants with Fibromyalgia (FM) in the guided imagery group and those in the standard care group. In our sample, in week 10, participants in the guided imagery group had greater self-efficacy than those in the standard care group.
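The t value reported above can be reproduced from the Group Statistics table alone. The sketch below assumes the pooled-variance (Student's) form of the independent t-test, which fits the homogeneity-of-variance assumption listed on the slide. The function name is hypothetical, and no p value is computed because the Python standard library has no t distribution.

```python
import math


def independent_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance t statistic and degrees of freedom from summary statistics."""
    df = n1 + n2 - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    # Standard error of the difference between the two means.
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_diff, df


# Values copied from the Group Statistics table (GI first, then SC).
t, df = independent_t_from_summary(64.5833, 22.69249, 24, 49.8333, 20.30992, 24)
print(f"t({df}) = {t:.3f}")  # prints: t(46) = 2.373
```

The sign of t depends only on which group is entered first; here GI minus SC is positive because the guided imagery group has the higher mean.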
One-Way Analysis of Variance (ANOVA)
• Tests for differences between means.
• More flexible than other analyses in that it can examine data from two or more groups.
For example:
Research Question
Is there a difference in depression scores depending on types of elderly housing and care (independent living, assisted living, and nursing care)?
Hypotheses
Ho: µIL = µAL = µNC
Ha: At least 2 groups differ
α = 0.05

ANOVA (cont’d)
Tests of assumptions
  – Independent groups
  – Normal distribution
  – Continuous dependent variable
  – Homogeneity of Variance is met

Variables                      Independent Living (n=16)   Assisted Living (n=19)   Nursing Care (n=17)   p
Depression scores, Mean (SD)   12.25 (7.594)               12.84 (7.274)            16.44 (8.043)         0.234 (> 0.05)

If significant, Post Hoc tests are used to determine the location of differences.
Conclusion: There is no statistically significant difference in depression scores depending on types of elderly housing and care (independent living, assisted living, and nursing care).

Chi-Square Test of Independence
• Used with nominal or ordinal data
• Hypothesis:
  – Ho: There is no difference in Y depending on X
  – Ha: There is a difference in Y depending on X
• Assumptions:
  – Frequency data
  – Adequate n: an expected count of at least 5 per cell; this can be violated in up to 20% of cells.

Example of Chi-Square Test
Research Question
Is there a difference in depression at week 12 depending on the helplessness category (low or high)?
Hypotheses
• Ho: There is no difference in depression at week 12 depending on the helplessness category (low or high).
• Ha: There is a difference in depression at week 12 depending on the helplessness category (low or high).

Crosstabulation: Depression (cat.) at week 12 by AHI
                                 AHI Low   AHI High   Total
Not Depressed   Count            26        14         40
                Expected Count   22.3      17.7       40.0
                % within AHI     89.7%     60.9%      76.9%
Depressed       Count            3         9          12
                Expected Count   6.7       5.3        12.0
                % within AHI     10.3%     39.1%      23.1%
Total           Count            29        23         52
                Expected Count   29.0      23.0       52.0
                % within AHI     100.0%    100.0%     100.0%
AHI = Arthritis Helplessness Index
χ² = 5.99, df = 1, p = 0.014 or 1.4%
Conclusion: There is a difference in depression at week 12 depending on the helplessness category (low or high). People in the high helplessness group were more likely to be depressed than those in the low helplessness group (39.1% vs. 10.3%).

Pearson Product-Moment Correlation
• Tests for the presence of a relationship between two variables
  – Called bivariate correlation
• Types of correlation are available for all levels of data. Best results are obtained using interval data.
• Results
  – Nature of the relationship (positive or negative)
  – Magnitude of the relationship (–1 to +1)
  – Strength of r: High ≥ 0.70; Moderate = 0.30-0.69; Low < 0.30
  – Testing the significance of a correlation coefficient
  – r² is the proportion of variance shared by the two variables, expressed as a percentage.

Scatterplots and Correlation Coefficients
[Figure panels: Maximum positive correlation (r = 1.0); Maximum negative correlation (r = -1.0); Strong correlation & outlier (r = 0.71)]

Correlation Results
QUESTION
Which one is significant if the level of significance used in this test is 0.01?
  A. r = 0.56 (p = 0.03)
  B. r = –0.13 (p = 0.2)
  C. r = 0.65 (p = 0.002)
  D. r = 0.33 (p = 0.04)

Regression Analysis
• Used when one wishes to predict the value of one variable based on the value of one or more other variables
• For example:
  – One might wish to predict the possibility of passing the credentialing exam based on grade point average (GPA) from a graduate program.
  – Or to predict the length of stay in a neonatal unit based on the combined effect of multiple variables such as gestational age, birth weight, number of complications, and sucking strength.
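To make the prediction idea concrete, here is a minimal sketch of simple (one-predictor) linear regression using ordinary least squares. The GPA and exam-score values are invented purely for illustration and do not come from any study; the function names are likewise hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(x, y, slope, intercept):
    """Share of the variance in y accounted for by the fitted line."""
    my = mean(y)
    ss_total = sum((yi - my) ** 2 for yi in y)
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - ss_resid / ss_total


# Hypothetical data: graduate GPA and credentialing-exam score for six students.
gpa = [3.0, 3.2, 3.4, 3.6, 3.8, 4.0]
exam = [70, 74, 73, 80, 84, 88]

slope, intercept = fit_line(gpa, exam)
print(f"predicted score at GPA 3.5: {intercept + slope * 3.5:.1f}")
print(f"R squared: {r_squared(gpa, exam, slope, intercept):.2f}")
```

With several predictors, as in the length-of-stay example, the same idea extends to multiple regression, which in practice is run in a statistics package rather than by hand.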
Regression Analysis (cont’d)
• Assumptions:
  – Must have an Independent Variable & a Dependent Variable
  – Both variables must be continuous
  – Normally distributed data
  – Linear relationship (scatter plot)
• The outcome of analysis is the multiple correlation coefficient R.
• When R is squared, it indicates the amount of variance in the data that is explained by the equation.
• The R² is also called the coefficient of multiple determination.

Regression Results
• R² = 0.63
• This result indicates that 63% of the variance in length of stay can be explained by the combined effect of gestational age, birth weight, number of complications, and sucking strength.
[Figure: Overlay of Scatterplot and Best-Fit Line]

Conclusion
• The choice of statistical test depends on the research question.
• Some research questions can be answered by using basic statistical tests, while others require advanced statistical tests.
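Worked check: chi-square example
As a closing exercise, the chi-square example from this section can be recomputed from the observed counts in the crosstabulation. The sketch below assumes the plain Pearson chi-square with no continuity correction. The function name is hypothetical, and the p value uses a closed form that holds only for df = 1.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square and p value for a 2x2 table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # For df = 1 only: upper-tail probability of the chi-square distribution.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: Not Depressed, Depressed. Columns: AHI Low, AHI High.
observed = [[26, 14], [3, 9]]
chi2, p = chi_square_2x2(observed)
print(f"chi-square = {chi2:.2f}, df = 1, p = {p:.3f}")
# prints: chi-square = 5.99, df = 1, p = 0.014
```

The expected counts computed inside the loop (22.3, 17.7, 6.7, 5.3) match the Expected Count rows of the crosstabulation.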