1. DATA ATTRIBUTES; SUMMARY
1.1 Introduction to biostatistics
1.2 The Mean
1.3 Measures of Variability
1.4 The Normal Distribution
1.5 Distribution; Data components

1.1 Introduction to biostatistics
• Statistics is the science of data; it involves collecting, classifying, summarizing, organizing, analyzing, and interpreting numerical information
  - Biostatistics deals with biological data

Definitions
• Inferential statistics uses sample data to make estimates, decisions, predictions, or other generalizations about a larger set of data
• A population is a set of units (usually people, animals, objects, transactions, or events) in a study or a survey
• A sample is a subset of the units of a population
• A variable is a characteristic or property of an individual population unit
• A statistical inference is an estimate, prediction, or some other generalization about a population based on information contained in a sample
• A measure of reliability is a statement (usually quantified) about the degree of uncertainty associated with a statistical inference
• Quantitative data are measurements that are recorded on a naturally occurring numerical scale
• Qualitative data are measurements that cannot be measured on a natural numerical scale; they can only be classified into one of a group of categories

Data sources
• Publication
• Experiment
• Survey
• Observations
• Tacit knowledge

Definitions
• A representative sample exhibits characteristics typical of those possessed by the target population
• A random sample is one obtained through a sampling procedure that ensures every subset of fixed size in the population has the same chance of being included in the sample
• Statistical thinking involves applying rational thought to critically assess data and the inferences made from them
  - it involves the need to measure, analyze, evaluate, and draw inferences from data sets intelligently
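
The definitions of population, sample, random sample, and statistical inference above can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration only: the population of 100 simulated heights, the helper name `draw_random_sample`, the sample size of 10, and the choice of the average as the quantity being estimated. None of it comes from the slides.

```python
import random
import statistics


def draw_random_sample(population, size, seed=None):
    """Draw a simple random sample without replacement.

    random.sample gives every subset of `size` units the same chance
    of being selected, which matches the definition of a random sample.
    (Hypothetical helper, not part of the course material.)
    """
    rng = random.Random(seed)
    return rng.sample(population, size)


# Hypothetical population: one variable (height in cm) for 100 units.
_rng = random.Random(0)
population = [round(_rng.gauss(170, 10), 1) for _ in range(100)]

# The sample is a subset of the population units.
sample = draw_random_sample(population, size=10, seed=1)

# Statistical inference: the sample average is used as an estimate
# of the corresponding population average.
population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)
print(f"population mean = {population_mean:.1f}, sample estimate = {sample_mean:.1f}")
```

Rerunning with a different `seed` gives a different sample and a different estimate; that sample-to-sample variation is what a measure of reliability is meant to quantify.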
Abuse of statistics
• UNTRUE: 150,000 women a year die from anorexia
• TRUTH: 150,000 women a year die from problems that were likely caused by anorexia
• UNTRUE: Only 29% of school girls are happy with themselves
• TRUTH: Of 3,000 school girls, 29% responded “Always true” to the statement “I am happy the way I am”. Most answered “Sort of true” or “Sometimes true”

Opportunities
• Research environment
• Teaching profession
• Consultancy expertise
• Advisory role
• Management system
• Decision support system
• Information/Knowledge Specialist

1.2 The Mean
Population means are often denoted by $\mu$.

Population mean = (sum of values for each member of population) / (number of population members)

The equivalent mathematical statement is
$$\mu = \frac{\sum X}{N}$$

1.3 Measures of Variability
Population variance = (sum of (value associated with member of population - mean of population)^2) / (number of population members)

The equivalent mathematical statement is
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

Population standard deviation = square root of population variance

Or mathematically
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$

1.4 The Normal Distribution
Its height at any given value of $X$ is
$$\frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{X - \mu}{\sigma}\right)^2\right]$$

• When the population values are not distributed symmetrically about the mean, reporting the mean and standard deviation can give the reader an inaccurate impression of the distribution of the values in the population.

Figure 1a. Panel A shows the true distribution of the heights of the 100 Jovians (note that it is skewed toward taller heights).
Figure 1b.
Panel B shows a normally distributed population with 100 members and the same mean and standard deviation as in panel A (Fig 1a).

1.5 Distribution; Data components
Qualitative & Quantitative Data
• A dependent variable takes its value from one or more independent variables
• An independent variable contributes value to the dependent variable
  - the independent variable is frequently controlled by the investigator
• A class is a category of data classification
• Class frequency is the number of observations in a class
• Class relative frequency is the class frequency divided by the total number of observations in the data set
• Dependent variable: hierarchy of components
  $y_i = F_i + \text{residual}$, or $y_i = mx + c$, or $y_i = bx + \text{residual}$

1.5.1 Quantitative data
• The attributes of class, class frequency, and class relative frequency also apply to quantitative data
• Both data types can be used to
  - DESCRIBE sets of data
  - PREDICT values of other measurements

1.5.2 Forecasting techniques
• Qualitative techniques
  - human judgment & rating system
  - turn qualitative information into quantitative estimates
• Quantitative techniques
  - statistical (stochastic, probabilistic)
  - deterministic (causal)

1.5.3 MODELS – LINEAR & NONLINEAR
• Straight line: $Y = \beta_0 + \beta_1 X_i + \varepsilon_i$, where $\varepsilon_i$ is the residual
• Parabola or quadratic: $Y = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \varepsilon_i$
• Cubic: $Y = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \beta_3 X_i^3 + \varepsilon_i$
• Quartic: $Y = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \beta_3 X_i^3 + \beta_4 X_i^4 + \varepsilon_i$
• Nth-degree: $Y = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \dots + \varepsilon_i$
• Exponential: $Y = ab^{X} + \varepsilon_i$, or $\log Y = \log a + (\log b)X + \varepsilon_i = a_0 + a_1 X + \varepsilon_i$
• Geometric: $Y = aX^{b} + \varepsilon_i$, or $\log Y = \log a + b(\log X) + \varepsilon_i = a_0$