Statistics

March 2009
Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D.
Nemours Bioinformatics Core Facility
Nemours Biomedical Research
Overview
• Class goals
– Master basic statistical concepts
– Learn analytic techniques & when to apply them
– Learn how to interpret analysis results
– Develop familiarity with R and related tools
– Gain understanding that will transfer to a broad range of other statistics tools
• Class structure
– 8 sessions
– 1.5 hours per session
– Several homework assignments
• Class website
– http://www.medsci.udel.edu/open/StatClass/March2009
R
• Installing
– Download from Class website
• Pick the right version (Mac versus Windows)
– Run installer program
• Go with all the defaults
• Running R
– Live Demonstration
R Concepts
• Command line similar to
– Windows Command Shell
– Mac Terminal
– Unix/Linux Shell
• Uses ‘>’ as prompt
• Accepts
– Constants (e.g., 2, 3, 156, -99.0, etc.)
– Variables (e.g., Height, Weight, SubjID, …)
– Operations (e.g., ‘+’ ‘-’ ‘*’ ‘/’ ‘^’ ‘>’ ‘<’ ‘==’ ‘<-’)
– Functions (e.g., sum(c(1,2,3)), mean(c(1,2,3)), …)
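As a minimal sketch of these four kinds of input, the lines below can be typed one at a time at the ‘>’ prompt. The variable names (height, weight, bmi) and their values are made up for illustration and do not come from the class materials.

```r
# Constants: typing a value simply echoes it back
156
-99.0

# Variables: '<-' assigns a value to a name (illustrative names and values)
height <- 1.75   # metres
weight <- 70     # kilograms

# Operations: arithmetic and comparison
bmi <- weight / height^2
bmi > 25         # a comparison yields TRUE or FALSE
weight == 70

# Functions: c() combines values into a vector; sum() and mean() summarize it
total <- sum(c(1, 2, 3))
avg <- mean(c(1, 2, 3))
total
avg
```

Note that ‘==’ tests equality while ‘<-’ assigns, so the two are not interchangeable.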
• Variable types
– Scalar (a = 128)
– Vector (a = c(3, 2, 9, 5))
– Matrix (dim(a) = c(2,2))
– Data Frame - collection of vectors.
• Similar to a spreadsheet
• ‘rows’ are indexed
• ‘cols’ are named and indexed
• E.g., df$a is the column of data frame df named ‘a’ and df$a[5] is the 5th element of ‘a’.
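The variable types above could be tried out as in the following sketch. The data frame contents are hypothetical, and the sketch uses ‘<-’ for assignment where the slide uses ‘=’; both work at the top level in R.

```r
# Scalar: a single value (in R, a vector of length 1)
a <- 128

# Vector: an ordered set of values of the same type
v <- c(3, 2, 9, 5)
v[3]                 # third element

# Matrix: setting dim() reshapes the vector into 2 rows x 2 columns,
# filled column by column
m <- v
dim(m) <- c(2, 2)
m[1, 2]              # row 1, column 2

# Data frame: named columns of equal length (hypothetical example data)
df <- data.frame(a = c(10, 20, 30, 40, 50),
                 b = c("u", "v", "w", "x", "y"))
df$a                 # the column named 'a'
df$a[5]              # 5th element of column 'a'
```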
Statistics
• Science of data collection, summarization, analysis, and interpretation.
• Descriptive versus Inferential Statistics:
– Descriptive statistics: describing (summarizing) data, such as the center, variability, and shape of a quantitative variable (e.g., age) and the number (frequency) and percentage for a categorical variable (e.g., gender, race).
– Inferential statistics: drawing conclusions beyond the sample studied, allowing for prediction.
Statistical Description of Data
• Statistics describes a numeric set of data by its
– Center (mean, median, mode, etc.)
– Variability (standard deviation, range, etc.)
– Shape (skewness, kurtosis, etc.)
• Statistics describes a categorical set of data by
– Frequency, percentage, or proportion of each category
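A short sketch of how these summaries might be computed in R follows. The age and gender values are invented for illustration. Base R has no built-in skewness function, so the sketch computes one common moment-based version by hand; add-on packages provide other variants.

```r
# Hypothetical data for illustration only
age <- c(23, 35, 31, 44, 29, 52, 38, 31)
gender <- c("F", "M", "F", "F", "M", "M", "F", "F")

# Center
mean(age)
median(age)

# Variability
sd(age)
range(age)            # minimum and maximum

# Shape: moment-based skewness computed by hand (one common definition)
dev <- age - mean(age)
skew <- mean(dev^3) / mean(dev^2)^1.5
skew

# Categorical data: frequency and proportion of each category
freq <- table(gender)
prop <- prop.table(freq)
freq
round(100 * prop, 1)  # percentages
```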
Statistical Inference
[Figure: diagram of a sample and a population]
• Statistical inference is the process by which we acquire information about populations from samples.
• Two types of estimates for making inferences:
– Point estimate.
– Interval estimate.
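To make the two kinds of estimate concrete, here is a minimal sketch using a small made-up sample. It takes the sample mean as the point estimate and a 95% t-based confidence interval as the interval estimate, which assumes the data are roughly normally distributed; other interval methods exist.

```r
# Hypothetical sample of 10 measurements (illustration only)
x <- c(5.1, 4.9, 5.6, 5.8, 5.0, 5.4, 5.2, 5.7, 4.8, 5.5)

# Point estimate of the population mean: the sample mean
point_est <- mean(x)

# Interval estimate: 95% confidence interval for the mean,
# assuming the data are roughly normally distributed
ci <- t.test(x, conf.level = 0.95)$conf.int

point_est
ci
```

The point estimate is a single number, while the interval gives a range of plausible values for the population mean.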
Population and Sample
• Population: The entire collection of individuals or measurements about which information is desired.
• Sample: A subset of the population selected for study.
– The primary objective is to create a subset of the population whose center, spread, and shape are as close as possible to those of the population.
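The idea of a sample that resembles its population can be sketched with simulated data. The "population" below is randomly generated, not real, and a simple random sample drawn with sample() is only one of several possible sampling schemes.

```r
set.seed(1)  # make the random draws reproducible

# A made-up "population" of 10,000 heights in cm (simulated, not real data)
population <- rnorm(10000, mean = 170, sd = 10)

# Simple random sample of 100 individuals, drawn without replacement
smp <- sample(population, size = 100)

# Compare center and spread of the sample with those of the population
c(pop_mean = mean(population), smp_mean = mean(smp))
c(pop_sd = sd(population), smp_sd = sd(smp))
```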
Parameter vs. Statistic
• Parameter:
– Any statistical characteristic of a population.
– Population mean, population median, and population standard deviation are examples of parameters.
– A parameter describes the distribution of a population.
– Parameters are fixed and usually unknown.
• Statistic:
– Any statistical characteristic of a sample.
– Sample mean, sample median, and sample standard deviation are some examples of statistics.
– A statistic describes the distribution of a sample.
– The value of a statistic is known, and it varies from sample to sample.
– Statistics are used to make inferences about parameters.
• Statistical Issue
– Estimate a population parameter using a sample statistic.
– E.g., the sample mean is an estimate of the population mean.
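The contrast between a fixed parameter and a varying statistic can be illustrated with a small simulation. Everything below is an assumption made for the sketch: the population is simulated, and the sample size (30) and number of repeated samples (200) are arbitrary choices.

```r
set.seed(42)  # reproducible draws

# Simulated population (illustration only); its mean is the parameter
population <- rexp(5000, rate = 1 / 20)
mu <- mean(population)       # fixed; in practice usually unknown

# Draw 200 samples of size 30 and record each sample mean (a statistic)
sample_means <- replicate(200, mean(sample(population, size = 30)))

mu                           # one fixed value
head(sample_means)           # differs from sample to sample
mean(sample_means)           # typically close to mu
```

Each sample mean is a different estimate of the same parameter, which is why estimates are usually reported with some measure of their uncertainty.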