Matthew Perri, B.S. Pharm., Ph.D., R.Ph.
Professor of Pharmacy
Director, Pharmacy Care Administration Graduate Program
January 2014

British Prime Minister Benjamin Disraeli; popularized by Mark Twain

PHRM 4700 Basic Statistical Concepts

H.G. Wells

Understanding statistics will enable you to draw your own conclusions and make decisions:
• Will you recommend this drug to patients or physicians?
• Is the drug likely to work for your patients?
• Is it better or safer than existing therapies?
• Should the drug be listed on the formulary or preferred drug list (PDL)?
• Should there be dispensing limits, refill limits, prior authorization, or limits on prescribing authority?

You are a recent Pharm.D. graduate from the University of GA. Two of your professors (Drs. May and Perri) have served over the last two decades on the GA Department of Community Health Drug Utilization Review Board (DURB). The DURB is the governing body of physicians, pharmacists and others that studies and selects appropriate drug therapy for the lives covered by all GA state-funded health plans (e.g., Medicaid, State Merit, Board of Regents, PeachCare). Upon their departure from the Board, the Commissioner sought input from Drs. May and Perri about who might replace them on this body of decision makers. Dr. Perri recommended including you on the list of possible candidates, and you were eventually selected by the Commissioner. The DURB meets quarterly, and before each meeting a binder is sent to all members reviewing the disease states and recent literature about the drugs to be reviewed at the next meeting.

Statistics let you make general conclusions from limited data.
• Statistics is not intuitive (not easy to understand or use).
• Statistical conclusions are always presented in terms of probability.
• All statistical tests are based on assumptions.
• Decisions about how to analyze data should be made in advance.
• A confidence interval quantifies precision and is easy to interpret.
• A P value tests a null hypothesis and is hard to understand at first.
• "Statistically significant" does not mean the effect or phenomenon is large or scientifically or clinically important.
• "Not statistically different" does not mean the effect or phenomenon is absent, small, or scientifically or clinically irrelevant.
• Multiple comparisons make statistical results hard to interpret, which is why there are statistical methods designed to handle them (ANOVA, range tests).
• Correlation does not mean causation.
• Published statistics tend to be optimistic.

What we hope to do here is teach you the basics needed to navigate the evaluation of research.

Things you might need to know in a Spanish-speaking country:
• "Where is the bathroom, please?" (¿Dónde está el cuarto de baño, por favor?)
• "Give me a beer, please." (Déme una cerveza, por favor.)
• "Where is the library?" (¿Dónde está la biblioteca?)
In statistics:
• Were the data normally distributed?
• What was the mean? The standard deviation?

GENERAL QUESTION: HOW DO PEOPLE LET YOU KNOW THEY ARE AT YOUR DOOR AND WANT TO COME IN?
ANSWERS
• They ring the doorbell.
• They knock.
• They stand outside, studying kinetics, until you open the door for your own reasons.

POSSIBLE RESEARCH QUESTIONS
• How do people knock on someone's door?
• How many times do they knock?
• Do people speak when they knock?
DATA SOURCES?
• Search the literature and review/compile the results of previous studies on this subject.
• Survey people and ask them how they knock.
• Observe people as they knock and record data.

Questions/Propositions
• People generally approach a residence and knock when they wish to enter.
• Describe how people knock when at someone's door.
Method:
• Review available data.
• Design a survey, experiment, interviews or some combination.
Database:
• Sample: http://www.youtube.com/watch?v=tKV4XYD3xK4

• Descriptive Statistics
Number of events observed (also known as "n" or sample size) was 35.
Sheldon knocked between 0 and 30,000 (self-reported) times when approaching Penny's door.
He used 1, 2, 6 and 30,000 knocks one time each. (The "1" was the robot.)
He knocked for Leonard, then Penny, 5 times, with one instance where he knocked for Penny first.
On one occasion Penny knocked on Sheldon's door; in this case she knocked three times.
In one instance, he knocked, then approached an interior door where he knocked a second time.
• Parametric Statistics
The average number of knocks was 860.06 (mean).
The most common number of knocks was 3 (mode).
The median number of knocks was 3 (1, 2, 3, 6, 30000).
The standard deviation of the number of knocks was 4997.46.

• Without any other information, which of the following can we infer?
– In this sample, three knocks were used to alert the resident that someone was at the door.
– People in general knock three times.
– Knocking three times is always effective in getting someone to answer the door.
– Tony Orlando and Dawn (http://www.youtube.com/watch?v=k7Jvsbcxunc) were wrong in the 1970s when they concluded that: You should knock three times on the ceiling… You should knock twice on the pipe if the answer is no…
– In our data, knocks were always associated with the calling out of a name, and this process was repeated. If someone is at your door and they knock three times, followed by your name three times, and this is repeated three times, it is likely to be Sheldon.
– Sheldon has issues.

People in general knock 3 times.
• How would our results have changed if we had seen only a subset of the data (a smaller sample size)? For example, what if we had missed the "flash"? How would the results have changed?
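One way to explore that question is to recompute the summary measures with Python's standard `statistics` module. The list of 35 knock counts below is a hypothetical reconstruction, not data taken from the slides: it assumes that, apart from the single 1-, 2-, 6- and 30,000-knock events, the remaining 31 events each involved three knocks. Under that assumption the sketch reproduces the reported mean (860.06) and standard deviation (4997.46), and the standard deviation matches only when it is computed with n in the denominator (a population standard deviation, `statistics.pstdev`).

```python
import statistics

# Hypothetical reconstruction of the 35 observed knock events (an assumption):
# one event each of 1, 2, 6 and 30,000 knocks, plus 31 events of 3 knocks.
knocks = [1, 2, 6, 30_000] + [3] * 31


def summarize(data):
    """Return the central tendency and spread measures quoted in the slides."""
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        # pstdev divides by n; statistics.stdev would divide by n - 1
        "sd": statistics.pstdev(data),
    }


full = summarize(knocks)
# Drop the single 30,000-knock "flash" event to see how one extreme value moves each measure
no_flash = summarize([k for k in knocks if k != 30_000])

for label, result in (("all events", full), ("without the flash", no_flash)):
    print(f"{label}: n={result['n']}, mean={result['mean']:.2f}, "
          f"median={result['median']:g}, mode={result['mode']}, sd={result['sd']:.2f}")
```

Removing one extreme event changes the mean from about 860 to 3 and the standard deviation from about 4997 to about 0.64, while the median and mode stay at 3.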
The average number of knocks was 3 (mean).
The most common number of knocks was 3 (mode).
The median number of knocks was 3 (1, 2, 3, 6).
The standard deviation of the number of knocks was 0.641689.

Good research always poses new questions. Additional research questions for this example:
• Is there a time when two knocks are sufficient?
• Are mechanical/technological means of knocking just as effective as in-person knocking?
• How hard would it be to find a new apartment?

Statistics
• Techniques and procedures regarding the collection, organization, analysis, interpretation and presentation of information that can be stated numerically (Kuzma and Bohnenblust, 2004)
Biostatistics
• Application of statistics to the biomedical sciences

Descriptive Statistics
• Sometimes formal statistical analyses are not needed or desired, depending on the research questions. Descriptive statistics tell us something about a phenomenon or population:
– Number of drug overdose fatalities in 2013
– Pharmacy student acceptance rate at the UGA College of Pharmacy
– Demographics of a study population (2)
– Numbers of patients experiencing an adverse reaction to a medication
– Consumer awareness of advertising

Inferential Statistics
• Observed information is incomplete and uncertain, so we can't know for sure; instead, we infer.
– Drawing conclusions based on observed information.
– Generalizing from the specifics (as is done in most clinical research).
• Example: Once-daily aminoglycoside (ODA) regimens have been studied. If such a study is done in one location, e.g., Athens Regional, what, if anything, can or should we infer or generalize to other patient groups? What about a different dose? Would these results still apply?

• Variables vs. Data
• Survey vs. Experiment
• Population vs. Samples
• Response Rate
• Sampling Techniques

• When making a gentamicin dosing recommendation, you need to understand the patient's characteristics, such as age, weight and height. In statistics, patient characteristics are referred to as variables (e.g., systolic blood pressure) because their observed values vary from patient to patient. The actual values of the characteristics (variables) recorded are referred to as data (e.g., 115 mmHg).

Surveys
• Observations of events or phenomena over which few, if any, controls are imposed.
• Teaching evaluations, political opinion polls and satisfaction studies are all examples of survey research.
Experiments
• Design a research plan that manipulates, for example, dosage (e.g., 50 mg of drug A vs. 100 mg or placebo).
• Studying the effects on health outcomes before and after limiting formulary access to antipsychotic agents in GA Medicaid.
• Studying two doses of a new drug for toxicity.

Both surveys and experiments are important research designs.
• FDA requires all drugs submitted for approval to be evaluated by experimental research to substantiate their safety and efficacy.
• However, survey designs are often used in postmarketing surveillance to monitor safety.

A population is a set of persons (or objects) having a common observable characteristic.
A sample is a subset of a population.
• The goal is for this subset to be as representative of the population as possible.
Example:
• The US population was 317,330,434 as of 8:30 AM, January 8, 2014 (1).
• The CBS News Poll surveyed a sample of 808 adults to assess preferences for presidential candidates.
(1) http://www.census.gov/main/www/popclock.html

If you wanted to study all insulin-dependent diabetics, is there any way you could create a list of all insulin-dependent diabetics from which to draw a sample?
You can create/collect a random sample of patients who generally represent the population in question, then draw inferences from this group and generalize your results to all insulin-dependent diabetics based on how well your sample mimics the entire population. (Note: what assumption does this require you to make?)

Second-year Rx students are a sample (but probably not a random one, which we will talk about in a minute) of many populations, such as all pharmacy students at UGA, all pharmacy students in the US, or all students at UGA, or even a sample of the US population. However, they are also the total population of second-year pharmacy students at the UGA COP. Answering questions about a sample requires you to know the perspective you are taking.

Sampling nomenclature is important to understanding research design and to evaluating studies. The goal in evaluating sampling methods is to make sure the right population was sampled for the study and that the sample was created properly. We don't want to accidentally observe the "Sheldons" of the world.

Sampling frame:
• A complete, non-overlapping list of the persons or objects in the population.
– e.g., to draw a sample of GA pharmacists, we could use the database of all registered GA pharmacists as a sampling frame.
– It is hard to develop a sampling frame for studying patients with asthma, or any condition for that matter. This makes finding a representative sample very important.
• Random sampling is the primary method of obtaining a sample that is representative of a larger population, and how the sample is drawn can have a huge impact on study results.

Random Sample
• Sample units are chosen in an unpredictable way, e.g., using a random number table or putting all the names in a hat.
• Types:
– Simple random sample: all members have an equal chance of selection.
– Cluster: the population is divided into groups (clusters), such as geographic areas (Northeast, Southeast, Central, West); a random sample of clusters is chosen, and all, or a random sample, of the units within the chosen clusters are studied.
– Stratified: choosing subgroups or "strata" (e.g., race, gender, age group, education) within a population and sampling from within each of these groups.

Simple random sampling
• Same as putting ALL the names of a population in a hat, mixing them up, and selecting however many names you want.
• Note: it must be all the names, and each must have the same chance of being selected.
• Advantages:
– Avoids known and unknown biases on average
– Helps convince others that the study was conducted properly
– It is the basis for the statistical theory that underlies hypothesis testing and confidence intervals

You may see other techniques used in biomedical research:
• Convenience sampling, e.g., intercepting patients after they have a prescription filled at a local community pharmacy or shopping mall.
• Systematic sampling, e.g., take a phone book, pick a random place to start, then take every 9th name in the book.
• Stratified sampling
• Cluster sampling
• Others, e.g., snowball sampling (which is kind of cool)

Convenience sampling
• Often used when it is virtually impossible to select a random sample.
• The underlying assumption is that the sample will accurately represent the population.
Example: To estimate the average PCAT score of pharmacy students in the US, would you:
– Use UGA Class of 2014 pharmacy students as a study sample and survey some number of students? We might do this, but we have to ask how representative it would actually be.
– Use multiple pharmacy schools? In a clinical trial, we might recruit patients from multiple doctors' offices to get a better picture.

Stratified sampling
• Grouping members of the population into homogeneous groups (strata). Strata should be mutually exclusive (each subject can be in only one stratum), and no group should be excluded.
• Then, use random or systematic sampling to identify subjects in each stratum.
• Can be proportional or not. Proportional: if the population consists of 60% in the male stratum and 40% in the female stratum, then the relative sizes of the two samples (e.g., three males, two females) should reflect this proportion.
• Sometimes used in medical research, e.g., when you want to study patients with certain characteristics: obesity, gender, pregnancy, past history of disease, etc.

Why is random sampling less prone to bias than convenience sampling?
• Think about how we selected our "convenience" sample of events from YouTube.
• Does using a random sample guarantee a representative sample?

Response rate
• Similar meanings clinically and statistically. Clinically, it is how many patients responded in a certain manner.
• Consider a random sample of college students in the US. You send out a questionnaire to these students to assess how frequently college students skip classes. The response rate is how many of the students (usually expressed as a percentage) completed and returned the questionnaire.
• Is a 50% response rate good enough? Generally, the higher the response rate, the more representative the sample, but extremely high response rates may not always be required.
• Is there any potential for bias in a study like this?

Sampling bias exists if the data you collect are not representative of the population, e.g., you studied only a certain age group when all age groups were of concern. In our previous example, bias may occur if the students who returned the questionnaire are somehow inherently different from those who did not.
• e.g., one might expect more diligent students to be more likely to respond than less studious ones.

Clinical trials often employ a nonrandom sample; they do, however, use random assignment of patients to groups (arms) within the study.
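The sampling designs described above (simple random, systematic and proportional stratified sampling), plus random assignment to trial arms, can be sketched with Python's `random` module. This is an illustration under stated assumptions, not a study protocol: the sampling frame is a hypothetical list of 100 pharmacist IDs with a made-up 60/40 split between two strata ("M" and "F"), chosen only to mirror the 60% male / 40% female example.

```python
import random

# Hypothetical sampling frame: 100 pharmacist IDs, 60 in stratum "M" and 40 in stratum "F".
frame = [(f"RPh{i:03d}", "M" if i < 60 else "F") for i in range(100)]
rng = random.Random(2014)  # fixed seed so the sketch is reproducible


def simple_random_sample(units, n):
    """Every unit has the same chance of selection (all names drawn from a hat)."""
    return rng.sample(units, n)


def systematic_sample(units, k):
    """Random starting point, then every k-th unit (e.g., every 9th name in a phone book)."""
    start = rng.randrange(k)
    return units[start::k]


def proportional_stratified_sample(units, n, stratum_of):
    """Simple random sample within each stratum, sized in proportion to that stratum."""
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        share = round(n * len(members) / len(units))  # e.g., 60% of 5 -> 3
        sample.extend(rng.sample(members, share))
    return sample


def randomly_assign(subjects, arms=("drug", "placebo")):
    """Random assignment: shuffle the enrolled subjects, then alternate arms."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    return {s: arms[i % len(arms)] for i, s in enumerate(shuffled)}


srs = simple_random_sample(frame, 5)
sys_sample = systematic_sample(frame, 9)
strat = proportional_stratified_sample(frame, 5, stratum_of=lambda u: u[1])
assignment = randomly_assign([unit_id for unit_id, _ in strat])
print(sorted(stratum for _, stratum in strat))  # three "M" and two "F"
print(assignment)
```

For splits that do not divide evenly, rounding each stratum's share can make the strata sizes add up to slightly more or less than n; a real design would handle that explicitly.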
Assess "how" subjects were identified and used in research.
• Researchers often have to make hard choices in their investigations regarding how to "find" subjects for research.
• Sampling procedures must be appropriate for the study population.
• Studies are rarely perfect, and most have their own biases; random sampling/assignment can help.
• We seldom get definitive answers, so we make inferences from the data and analyses we do have.
• Learning statistics will allow you to understand the assumptions researchers make so that you can make your best professional judgment.
Thought question: Is a sample of healthy volunteers ever a good "sample" to study a drug?

Descriptive statistics are used to describe the main features of a collection of data in quantitative terms. Descriptive statistics are distinguished from inferential statistics (we talked about these last time) in that descriptive statistics quantitatively summarize a data set, rather than being used to support inferential statements about the population in question. Even when a data analysis draws its main conclusions using inferential statistics, descriptive statistics are generally presented along with the more formal analyses to give the audience an overall sense of the data being analyzed.
• Pharmacy Manpower Trends: http://www.pharmacymanpower.com/trends.jsp
• Research Article: Gabapentin for RLS
• HCV Treatment Study

Recall that data are the observations, i.e., the recorded values of the variables.
4 Basic Levels of Measurement Scales: Nominal, Ordinal, Interval and Ratio

Qualitative scales (Nominal and Ordinal)
• Nominal scale
– Eye color: blue, green, or brown
– No rank or order to the categories
– Presence or absence of a disease
– Gender
• Ordinal scale
– All the characteristics of a nominal scale, plus there is a ranking among the categories, e.g., Mild, Moderate, Severe; First place, Second place, Third place
– Strongly Agree to Strongly Disagree
– Wong-Baker Faces Scale

Quantitative scales
• Interval scale
– Designates an equal-interval ordering
– No true zero point
– The distance between 1 and 2 is the same as the distance between 49 and 50
– Fahrenheit temperature scale: 0 degrees F does not mean an absence of temperature, and 60 degrees F is not twice as warm as 30 degrees F
• Ratio scale
– All the above, plus a true zero point
– Wealth: $0 means no money, and $100 is twice as much as $50

Defining levels of measurement facilitates the choice of appropriate statistical techniques for data analysis. From nominal to ratio: increasing ability to use higher-level statistical analyses.
• Nonparametric testing is generally performed with nominal and ordinal data.
• Parametric testing is generally performed with interval and ratio data.
www.statsoft.com/textbook/stnonpar.htm

Interval and ratio data can further be classified as:
• Discrete data
– Data take only whole-number (countable) values: number of children, number of times you have been married, date of birth, etc.
• Continuous data
– Data can take any value within a range, including fractional values (although they are not required to): temperature (37.5 degrees), age, body mass index (BMI)

The type of data you have dictates the statistics you will use.
• Generally, nominal and ordinal data call for nonparametric statistics, and interval and ratio data call for parametric statistics.
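A quick numeric check of the interval-versus-ratio distinction above: a ratio between two values is only meaningful if it survives a change of units. The sketch uses the standard Fahrenheit/Celsius/kelvin conversions and 100 cents per dollar; kelvin is used here only as an example of a temperature scale with a true zero.

```python
def fahrenheit_to_celsius(f):
    return (f - 32) * 5 / 9


def fahrenheit_to_kelvin(f):
    # Kelvin starts at absolute zero (a true zero point), so it behaves as a ratio scale
    return fahrenheit_to_celsius(f) + 273.15


# Interval scale: the ratio 60/30 = 2 depends on where the Fahrenheit scale puts its zero
print(60 / 30)                                                # ratio in degrees F
print(fahrenheit_to_celsius(60) / fahrenheit_to_celsius(30))  # same temperatures in degrees C
print(fahrenheit_to_kelvin(60) / fahrenheit_to_kelvin(30))    # relative to absolute zero

# Ratio scale: $100 vs. $50 keeps the same ratio in any unit, because $0 means no money
print(100 / 50, (100 * 100) / (50 * 100))                     # dollars, then cents
```

The Fahrenheit ratio of 2 becomes about −14 in Celsius and about 1.06 in kelvin, so "twice as warm" has no fixed meaning on an interval scale, while the money ratio stays at 2 in dollars or cents.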
Incorporating the Web into your communication mix yields strategic benefits
[Figure: Rx information search after DTC advertising (n=482), by source (Internet, MD/RPh, print, toll-free number), for consumers who have vs. have not sought Rx information]
• Searching the Web for more information will encourage consumers to talk to their MDs about advertised Rxs.
• DTC encourages consumers to look for more information by going to the Web.
From recent research on DTC ads by Menon, Deshpande and Perri

[Figure: normal (symmetrical, bell-shaped) distribution; nonsymmetrical distribution; bimodal distribution]

Descriptive Statistics
• For normally distributed data measured on interval and ratio scales, the appropriate measure of central tendency is the mean.
• The median is most appropriate for data measured on ordinal scales (but can also be used for continuous data, especially skewed data).
• The mode is the appropriate measure of central tendency for nominal data.

Mean: calculated by summing all the observations and dividing the sum by the number of observations.
Median: the observation that divides the distribution of data into two equal halves.
Mode: the value that occurs most frequently.

Data: Monthly income of 10 college students: $300, $375, $485, $500, $600, $625, $1000, $2000, $3000, $3500
• Mean: (300 + 375 + 485 + 500 + 600 + 625 + 1000 + 2000 + 3000 + 3500) / 10 = $1238.5
• Median: average of $600 and $625 = $612.5 (half the data above, half below)
• Mode: there is no mode (no value occurs more than once)

• Range: largest value minus smallest value
– Quartiles are sometimes reported as well (the 75th vs. 25th percentiles, i.e., the third and first quartiles, with the median at the 50th percentile)
• Mean deviation: the sum of the absolute deviations of each observation from the mean, divided by the sample size; it is the average distance of all observations from the mean (the signed deviations always sum to zero)
• Variance: computed by squaring each deviation from the mean, adding them up, and dividing their sum by one less than "n"
• Standard deviation: the square root of the variance
• Note: The closer the data are around the mean, the smaller the standard deviation.
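The worked income example and the measures of spread listed after it can be checked with Python's standard `statistics` module. This sketch uses the sample variance (dividing by n − 1), as defined above; `statistics.pvariance` and `statistics.pstdev` would divide by n instead. The mean deviation is computed from absolute deviations, because the signed deviations always sum to zero.

```python
import statistics
from collections import Counter

# Monthly incomes of 10 college students, from the example above
incomes = [300, 375, 485, 500, 600, 625, 1000, 2000, 3000, 3500]

mean = statistics.mean(incomes)          # sum / n
median = statistics.median(incomes)      # average of the two middle values when n is even
has_mode = max(Counter(incomes).values()) > 1  # a mode exists only if some value repeats

data_range = max(incomes) - min(incomes)
deviations = [x - mean for x in incomes]
signed_sum = sum(deviations)             # always 0: positive and negative deviations cancel
mean_abs_dev = sum(abs(d) for d in deviations) / len(incomes)
variance = statistics.variance(incomes)  # sum of squared deviations / (n - 1)
sd = statistics.stdev(incomes)           # square root of the sample variance

print(f"mean={mean}, median={median}, has_mode={has_mode}")
print(f"range={data_range}, signed_sum={signed_sum}, mean deviation={mean_abs_dev}")
print(f"variance={variance:.2f}, sd={sd:.2f}")
```

It reproduces the mean of $1238.5 and the median of $612.5, and the mean ($1238.5) sitting well above the median ($612.5) reflects the few high incomes pulling the mean upward.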
Coefficient of variation:
• Not as common as the mean, s.d., variance, or range.
• Expressed as a percentage, with higher percentages indicating greater variation.
• Calculated by dividing the s.d. by the mean and multiplying by 100.
• Useful in comparing the amount of variability between data sets: e.g., there is not much point in comparing the standard deviation of HbA1c values with the standard deviation of blood glucose values, because they are measured on different scales. You could instead compare their coefficients of variation (percentages) to see which has the greater variability.

Example of Range: LIPITOR
Benefit #1: Lower Cholesterol
Along with diet and exercise, LIPITOR is proven to help you:
• Lower your LDL ("bad" cholesterol) by 39% to 60%. (The average effect depends on dose)
• Lower your triglycerides (a type of fat found in your blood) by 19% to 37%. (The average effect depends on dose)
• Raise your HDL ("good" cholesterol) by up to 9%. (The average effect depends on dose)
http://www.lipitor.com/learn-about-lipitor/lipitor-benefits.jsp?setShowOn=../learn-aboutlipitor/home.jsp&setShowHighlightOn=../learn-about-lipitor/lipitorbenefits.jsp&source=google&HBX_PK=c_lipitor&HBX_OU=50&o=23127370|166376222|0 accessed 1/8/08

The type of data dictates the measure of central tendency that most accurately represents the data. Sometimes data are best described by summarizing in a descriptive fashion. Otherwise, data are described by a measure of central tendency and a measure of variation, such as the mean and standard deviation. Sometimes a combination of both is used. More information about your sample is better when it comes to informing those who may want to draw conclusions from your work.
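A minimal sketch of the coefficient of variation described above (s.d. ÷ mean × 100), assuming the sample standard deviation is used. Rescaling the income data from dollars to cents leaves the CV unchanged, which is what makes it usable for comparing variability across different scales. The HbA1c and glucose readings are invented for illustration only and do not come from any study.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, times 100 (a unit-free percentage)."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# Income data from the central-tendency example, in dollars and in cents
incomes = [300, 375, 485, 500, 600, 625, 1000, 2000, 3000, 3500]
cv_dollars = coefficient_of_variation(incomes)
cv_cents = coefficient_of_variation([x * 100 for x in incomes])

# Hypothetical HbA1c (%) and blood glucose (mg/dL) readings, for illustration only
hba1c = [6.8, 7.2, 7.5, 8.1, 9.0]
glucose = [130, 150, 165, 185, 220]

print(f"income CV: {cv_dollars:.1f}% (dollars) vs {cv_cents:.1f}% (cents)")
print(f"HbA1c CV: {coefficient_of_variation(hba1c):.1f}%")
print(f"glucose CV: {coefficient_of_variation(glucose):.1f}%")
```

The income data give a CV of roughly 95% in either unit, so the comparison between the two hypothetical lab measures is made on the same unit-free footing.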