1. DATA ATTRIBUTES ; SUMMARY

1.1 Introduction to biostatistics
1.2 The Mean
1.3 Measures of Variability
1.4 The Normal Distribution
1.5 Distribution; Data components
1.1 Introduction to biostatistics
• Statistics is the science of data. It involves
collecting, classifying, summarizing,
organizing, analyzing, and interpreting
numerical information
- Biostatistics deals with biological data
definition
• Inferential statistics uses sample data
to make estimates, decisions, predictions,
or other generalizations about a larger
set of data
definition
• A population is a set of units (usually of
people, animals, objects, transactions,
or events) in a study or a survey
definition
• A sample is a subset of the units of a
population
definition
• A variable is a characteristic or property
of an individual population unit
definition
• A statistical inference is an estimate or
prediction or some other generalization
about a population based on information
contained in a sample
definition
• A measure of reliability is a statement
(usually quantified) about the degree of
uncertainty associated with a statistical
inference
definition
• Quantitative data are measurements
that are recorded on a naturally
occurring numerical scale
definition
• Qualitative data are measurements that
cannot be measured on a natural
numerical scale; they can only be
classified into one of a group of
categories
Data sources
• Publication
• Experiment
• Survey
• Observations
• Tacit knowledge
definition
• A representative sample exhibits
characteristics typical of those
possessed by the target population
definition
• A random sample is one obtained
through a sampling procedure which
ensures that every subset of fixed size
in the population has the same chance
of being included in the sample
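The definition above can be sketched with Python's standard library. This is a minimal illustration, not a prescribed sampling procedure: the population of 100 numbered units, the sample size of 10, the seed, and the helper name `draw_random_sample` are all hypothetical choices.

```python
import random

# Hypothetical population: 100 units labelled 1..100 (illustrative only).
population = list(range(1, 101))


def draw_random_sample(units, size, seed=None):
    """Draw a simple random sample without replacement.

    random.sample gives every subset of the requested size the same
    chance of being selected, which matches the definition above.
    """
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(units, size)


sample = draw_random_sample(population, 10, seed=42)
print(sample)
```

Sampling without replacement means no unit can appear twice, so the sample is always a true subset of the population.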
definition
• Statistical thinking involves applying
rational thought to assess data, and the
inferences made from them, critically
- it involves the need to measure, analyze,
evaluate, and draw inferences from data sets
intelligently
Abuse of statistics
• UNTRUE: 150,000 women a year die
from anorexia
• TRUTH: 150,000 women a year die
from problems that were likely caused
by anorexia
Abuse of statistics
• UNTRUE: Only 29% of school girls are
happy with themselves
• TRUTH: Of 3,000 school girls, 29%
responded “Always true” to the
statement “I am happy the way I am”.
Most answered “Sort of true” or
“Sometimes true”
Opportunities
• Research environment
• Teaching profession
• Consultancy expertise
• Advisory role
• Management system
• Decision support system
• Information/Knowledge Specialist
1.2 The Mean
Population means are often denoted by μ.
Population mean = (sum of values for each member of population) / (number of population members)
The equivalent mathematical statement is
$$\mu = \frac{\sum X}{N}$$
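As a small worked example, the population mean can be computed directly from its definition. The five height values and the function name `population_mean` are hypothetical and serve only to illustrate the formula.

```python
def population_mean(values):
    """Return mu = (sum of X) / N for a complete population."""
    if not values:
        raise ValueError("population must contain at least one member")
    return sum(values) / len(values)


# Hypothetical population of five heights in cm (illustrative values).
heights = [30, 35, 40, 45, 50]

mu = population_mean(heights)
print(mu)  # 200 / 5 = 40.0
```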
1.3 Measures of Variability
Population variance = (sum of (value associated with member of population - mean of population)^2) / (number of population members)
The equivalent mathematical statement is
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
Population standard deviation = square root of the population variance
= square root of [(sum of (value associated with member of population - mean of population)^2) / (number of population members)]
Or mathematically
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$
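The two definitions translate directly into code. The sketch below uses the same hypothetical five-member population of heights as an illustration. Note that it divides by N because the values are treated as the whole population, not as a sample.

```python
import math


def population_variance(values):
    """Return sigma^2 = sum((X - mu)^2) / N for a complete population."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def population_sd(values):
    """Return sigma, the square root of the population variance."""
    return math.sqrt(population_variance(values))


# Hypothetical population of five heights in cm (illustrative values).
heights = [30, 35, 40, 45, 50]

variance = population_variance(heights)
sd = population_sd(heights)
print(variance)  # (100 + 25 + 0 + 25 + 100) / 5 = 50.0
print(round(sd, 4))  # sqrt(50) = 7.0711
```

Python's standard `statistics.pvariance` and `statistics.pstdev` implement the same population formulas and can be used as a cross-check.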
1.4 The Normal Distribution
The height of the normal distribution curve at any given value of X is
$$\frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{X - \mu}{\sigma}\right)^2\right]$$
• When the population values are not distributed
symmetrically about the mean, reporting the mean
and standard deviation can give the reader an
inaccurate impression of the distribution of the values
in the population.
Figure 1a.
Panel A shows the true distribution of the height of the 100
Jovians (note that it is skewed toward taller heights).
Figure 1b.
Panel B shows a normally distributed population with
100 members and the same mean and standard
deviation as in panel A (Fig 1a).
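The effect of skewness on the mean and standard deviation can be checked numerically. The ten values below are hypothetical and are not the Jovian heights from the figure. They are simply a small right-skewed population used as a sketch.

```python
import statistics

# Hypothetical right-skewed population (illustrative values only).
values = [1, 1, 1, 2, 2, 3, 4, 6, 10, 20]

mu = sum(values) / len(values)     # population mean
sigma = statistics.pstdev(values)  # population standard deviation (divides by N)
median = statistics.median(values)

# In a symmetric population about half the members lie below the mean.
below_mean = sum(1 for v in values if v < mu)

print(f"mean = {mu}, median = {median}, SD = {sigma:.2f}")
print(f"members below the mean: {below_mean} of {len(values)}")
print(f"mean - SD = {mu - sigma:.2f}, smallest value = {min(values)}")
```

Here seven of the ten members lie below the mean, and one standard deviation below the mean is already smaller than any value in the population, so the two summary numbers alone suggest a shape the data do not have.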
1.5 Distribution; Data components
Qualitative & Quantitative Data
• A dependent variable takes its value
from one or more independent variables
• An independent variable contributes
value to the dependent variable
- an independent variable is frequently
controlled by the investigator
• A class is one of the categories into
which data are classified
• Class frequency is the number of
observations in a class
• Class relative frequency is the class
frequency divided by the total number of
observations in the data set
• Dependent variable – hierarchy of
components
$y_i = F_i + \text{residual}$
or
$y_i = mx + c$
or
$y_i = bx + \text{residual}$
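The decomposition of a dependent variable into a fitted part and a residual can be sketched with an ordinary least-squares straight line. Reading F_i as the fitted value is an assumption here, as are the data points and the helper name `fit_straight_line`.

```python
def fit_straight_line(xs, ys):
    """Least-squares intercept c and slope m for y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    c = mean_y - m * mean_x
    return c, m


# Hypothetical observations (illustrative values only).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

c, m = fit_straight_line(xs, ys)
fitted = [m * x + c for x in xs]                  # F_i, the part explained by x
residuals = [y - f for y, f in zip(ys, fitted)]   # what the line leaves unexplained

for y, f, r in zip(ys, fitted, residuals):
    print(f"y = {y:.2f} = fitted {f:.2f} + residual {r:+.2f}")
```

Each observation is recovered exactly as fitted value plus residual, and with an intercept in the model the residuals sum to zero up to rounding.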
1.5.1 Quantitative data
• Attributes of class, class frequency, and
class relative frequency also apply to
quantitative data
• Both data types can be used to
- DESCRIBE sets of data
- PREDICT values of other
measurements
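Class frequency and class relative frequency can be tabulated for quantitative data once the values are grouped into classes. In this sketch the ten heights and the class width of 10 are hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical quantitative data: 10 heights in cm (illustrative values).
heights = [31, 34, 38, 41, 42, 45, 47, 52, 55, 63]
CLASS_WIDTH = 10  # assumed class width; each class covers [lower, lower + 10)

# Class frequency: number of observations falling in each class,
# keyed by the lower bound of the class.
class_frequency = Counter((h // CLASS_WIDTH) * CLASS_WIDTH for h in heights)

# Class relative frequency: class frequency / total number of observations.
total = len(heights)
class_relative_frequency = {
    lower: count / total for lower, count in class_frequency.items()
}

for lower in sorted(class_frequency):
    upper = lower + CLASS_WIDTH - 1
    print(
        f"{lower}-{upper}: frequency = {class_frequency[lower]}, "
        f"relative frequency = {class_relative_frequency[lower]:.2f}"
    )
```

The relative frequencies always sum to 1, which makes classes from data sets of different sizes directly comparable.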
1.5.2 Forecasting techniques
• Qualitative techniques
- human judgment & rating systems
- turn qualitative information into
quantitative estimates
• Quantitative techniques
- statistical (stochastic, probabilistic)
- deterministic (causal)
1.5.3 MODELS – LINEAR & NONLINEAR
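• Straight line
$Y = \beta_0 + \beta_1 X_i + \varepsilon_i$, where $\varepsilon_i$ is the residual
• Parabola or quadratic
$Y = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \varepsilon_i$
• Cubic
$Y = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \beta_3 X_i^3 + \varepsilon_i$
• Quartic
$Y = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \beta_3 X_i^3 + \beta_4 X_i^4 + \varepsilon_i$
• Nth-degree
$Y = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \dots + \varepsilon_i$
• Exponential
$Y = ab^X + \varepsilon_i$ or
$\log Y = \log a + (\log b)X + \varepsilon_i = a_0 + a_1 X + \varepsilon_i$
• Geometric
$Y = aX^b + \varepsilon_i$ or
$\log Y = \log a + b(\log X) + \varepsilon_i = a_0 + b(\log X) + \varepsilon_i$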
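The log transformation turns the exponential model into a straight line, so it can be fitted with the same least-squares arithmetic as a linear model. The sketch below assumes noise-free hypothetical data generated with a = 2 and b = 1.5, so the fit should recover those values. The helper name `fit_straight_line` is also an assumption.

```python
import math


def fit_straight_line(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical noise-free data from Y = a * b**X with a = 2, b = 1.5.
xs = [0, 1, 2, 3, 4]
ys = [2.0 * 1.5 ** x for x in xs]

# Fit log Y = a0 + a1 * X, where a0 = log a and a1 = log b.
a0, a1 = fit_straight_line(xs, [math.log10(y) for y in ys])

# Back-transform the coefficients to the original scale.
a = 10 ** a0
b = 10 ** a1
print(round(a, 6), round(b, 6))
```

The geometric model is handled the same way, except that both Y and X are log-transformed before fitting the line.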