Statistics March 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility Nemours Biomedical Research Overview • Class goals – Master basic statistical concepts – Learn analytic techniques & when to apply them – Learn how to interpret analysis results – Develop familiarity with R and related tools – Gain understanding that will transfer to a broad range of other statistics tool Nemours Biomedical Research Overview • Class structure – 8 sessions – 1.5 hours per session – Several homework assignments • Class website – http://www.medsci.udel.edu/open/StatClass/March2009 Nemours Biomedical Research R • Installing – Download from Class website • Pick right (Mac versus Windows) version – Run installer program • Go with all the defaults • Running R – Live Demonstration Nemours Biomedical Research R Concepts • Command line similar to – Windows Command Shell – Mac Terminal – Unix/Linux Shell • Uses ‘>’ as prompt • Accepts – – – – Constants (e.g., 2 3 156 -99.0, etc.) Variables (e.g., Height, Weight, SubjID,…) Operations (e.g., ‘+’ ‘-’ ‘*’ ‘/’ ‘^’ ‘>’ ‘<‘ ‘==‘ ‘<-’) Functions (e.g., sum(c(1,2,3)), mean(c(1,2,3))…) Nemours Biomedical Research R Concepts • Variable types – – – – Scalar (a = 128) Vector (a = c(3, 2, 9, 5)) Matrix (dim(a) = c(2,2)) Data Frame - collection of vectors. • • • • Similar to spread sheet ‘rows’ are indexed ‘cols’ named and indexed E.g., df$a is the column of data frame df named ‘a’ and df$a[5] is the 5th element of ‘a’. Nemours Biomedical Research Statistics • Science of data collection, summarization, analysis and interpretation. • Descriptive versus Inferential Statistics: – Descriptive Statistic: Data description (summarization) such as center, variability and shape for quantitative variable (e.g. age) and number (frequency) and percentage for categorical variable (e.g. gender, race etc). – Inferential Statistic : Drawing conclusion beyond the sample studied, allowing for prediction. Nemours Biomedical Research Statistical Description of Data • Statistics describes a numeric set of data by its • Center (mean, median, mode etc) • Variability (standard deviation, range etc) • Shape (skewness, kurtosis etc) • Statistics describes a categorical set of data by • Frequency, percentage or proportion of each category Nemours Biomedical Research Statistical Inference sample population •Statistical inference is the process by which we acquire information about populations from samples. •Two types of estimates for making inferences: –Point estimation. –Interval estimate. Nemours Biomedical Research Population and Sample • Population: The entire collection of individuals or measurements about which information is desired. • Sample: A subset of the population selected for study. – Primary objective is to create a subset of population whose center, spread and shape are as close as that of population. Nemours Biomedical Research Parameter v.s. Statistic • Parameter: – Any statistical characteristic of a population. – Population mean, population median, population standard deviation are examples of parameters. – Parameter describes the distribution of a population – Parameters are fixed and usually unknown Nemours Biomedical Research Parameter v.s. Statistic • Statistic: – Any statistical characteristic of a sample. – Sample mean, sample median, sample standard deviation are some examples of statistics. – Statistic describes the distribution of population – Value of a statistic is known and is varies for different samples – Are used for making inference on parameter Nemours Biomedical Research Parameter v.s. Statistic • Statistical Issue – Estimate a population parameter using a sample statistic. – E.g., the sample mean is an estimate of the population mean. Nemours Biomedical Research