DESCRIPTIVE STATISTICS

The purpose of statistics is
to condense raw data to
make it easier to answer
specific questions and test
hypotheses.
DESCRIPTIVE VS.
INFERENTIAL STATISTICS
• Descriptive
– To organize, summarize &
describe the data
• Inferential
– To determine how reliable conclusions drawn from the data are
(generalizing from a sample to a population)
RELATIONSHIPS – SCALES OF
MEASUREMENT
• Nominal Scale
– Only use those statistical procedures
that rely on counting -- the number (N) in
the sample.
• Ordinal Scale
– Any procedure allowed for the nominal scale.
– Can also use statistics that indicate points
below which certain percentages of the
cases fall (percentiles, such as the median).
RELATIONSHIPS – SCALES OF
MEASUREMENT
• Interval Scale
– Any of the above plus procedures
that include adding.
• Ratio Scale
– Any statistical procedure is
acceptable.
MEASUREMENT SUMMARY
• Measurement: Nominal
• Characteristics
– Lowest level -- used to classify variables into two or more categories.
– Cases placed in the same category must be equivalent.
– The categories must be exhaustive -- all persons or items must fit into one of the categories.
– Must also be mutually exclusive -- one person or item can't fit more than one category.
• Scoring
– Counting: "N" in sample.
– Labels or #'s.
– No relation between #'s.
• Types
– N of sample
– Mode
– Range
• Examples
– Football player jerseys (48 is not better than 36)
– Race
– Gender
MEASUREMENT SUMMARY
• Measurement: Ordinal
• Characteristics
– Numbers only used to indicate the rank order of cases of a variable.
– Cannot measure or evaluate the difference in value between each case.
– No mathematical or statistical operations (you can't add label 1 to label 2, etc.).
• Scoring
– Points below which certain % falls.
– Size of distance between intervals unknown.
– Order of objects with respect to an attribute.
• Types
– Frequency distribution
– Median
– Quartile deviation
– Spearman rho coefficient of correlation
• Examples
– Hardness of metal
– Personnel evaluations of performance
MEASUREMENT SUMMARY
• Measurement: Interval
• Characteristics
– Has all of the above characteristics.
– Added requirement of equal distances or intervals between labels -- represent equal distances in the variables of your study.
• Scoring
– Equal intervals w/ arbitrary origin
– No true zero
– Adding
• Types
– Mean
– Standard deviation
– Variance
– Pearson product moment coefficient of correlation
• Examples
– Temperature difference
– Footcandle levels in lighting
– IQ's
MEASUREMENT SUMMARY
• Measurement: Ratio
• Characteristics
– Has all of above features plus an absolute zero point.
– Enables you to multiply and divide scale numbers to create ratios between labels.
• Scoring
– Equal intervals
– Multiply
– Divide
• Types
– All types
• Examples
– Income ranges.
– Number of years of school.
– Age in years.
– Yardstick or architect's scale.
FREQUENCY DISTRIBUTIONS
• The arrangement
of the scores
from lowest to
highest, with the
number of times
each score occurs.
• The shape of the
distribution shows
the general shape
of the data.
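As a small illustration of a frequency distribution, the sketch below tallies a set of scores and lists them from lowest to highest. The scores are invented for the example, and the starred printout is only a rough stand-in for a drawn histogram.

```python
from collections import Counter

# Hypothetical quiz scores, invented for illustration.
scores = [7, 9, 8, 7, 10, 8, 7, 6, 9, 8, 7]

# Count how often each score occurs, then order the scores low to high.
freq = Counter(scores)
distribution = sorted(freq.items())

for score, count in distribution:
    # One "*" per case, so the printout doubles as a sideways histogram.
    print(f"{score:>3} | {'*' * count} ({count})")
```

Reading down the printout shows where the scores pile up, which is the "general shape" of the data.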
FREQUENCY DISTRIBUTIONS
• The easiest way for you to do
summary statistics is with a
dedicated statistical package.
• With small data sets, you can do
most data manipulation for
summary statistics with a
spreadsheet.
HISTOGRAMS & POLYGONS:
GENERAL RULES
• On horizontal
axis, lay out
lowest scores to
highest -- left to
right.
• Lay out
frequencies on
vertical axis --
from 0 up to
highest
frequency.
HISTOGRAMS & POLYGONS:
GENERAL RULES
• Place a point at center of
score/frequency intersection.
• Construct either a histogram or
polygon.
HISTOGRAMS & POLYGONS:
GENERAL RULES
• Histogram or
polygon.
MEASURES OF CENTRAL
TENDENCY
• Used to summarize data
through a single number that
can represent the whole set of
scores.
• Types: mode, median,
mean
MEASURES OF CENTRAL
TENDENCY
• Mode
– The value or number that occurs
most frequently in the distribution.
Two modes are bi-modal; three or
more are tri-modal or multi-modal.
– Least stable measure of central tendency,
and there can be more than one mode.
– Only appropriate measure for nominal
scales.
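A minimal sketch of finding the mode, using Python's standard `statistics.multimode`, which returns every value tied for most frequent. The data are invented; the first list shows that the mode works on nominal labels as well as on numbers.

```python
from statistics import multimode

# Hypothetical data, invented for illustration.
colors = ["red", "blue", "red", "green", "blue"]  # nominal labels
scores = [2, 3, 3, 4, 5, 5, 6]                    # numeric scores

# multimode returns all values tied for the highest frequency,
# in the order they first appear in the data.
color_modes = multimode(colors)
score_modes = multimode(scores)

print(color_modes, score_modes)
```

Both lists here are bi-modal, so each result contains two values.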
MEASURES OF CENTRAL
TENDENCY
• Median
– The point in the distribution below which
50% of the scores lie.
– Scores must be placed in rank order
from lowest to highest first.
– The median can fall between the upper
limit and lower limit of a score.
– Can fall on the border line between
scores.
MEASURES OF CENTRAL
TENDENCY
• Median (continued)
– The median is an ordinal statistic
because it is based on rank.
– Can be used on interval and ratio
data but the interval characteristic
of the data is not used.
– The median is most useful
when there are extreme
scores in the distribution.
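To illustrate why extreme scores favor the median, the sketch below compares the median and the mean before and after one very large value is added. The salary figures are invented for the example.

```python
from statistics import mean, median

# Hypothetical salaries (in thousands), invented for illustration.
salaries = [30, 32, 35, 38, 40, 41]
with_outlier = salaries + [400]

# Even N: the median falls on the border between the two middle scores.
median_even = median(salaries)
# Odd N: the median is the middle score itself.
median_out = median(with_outlier)
# The mean is pulled strongly toward the extreme score.
mean_out = mean(with_outlier)

print(median_even, median_out, mean_out)
```

The single extreme score barely moves the median but more than doubles the mean.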
MEASURES OF CENTRAL
TENDENCY
• Mean
– The arithmetic average --
the sum of all the scores
divided by N.
– Most stable measure of
central tendency and is
more precise than the
median or mode.
– Can be used with interval
and ratio scales.
MEASURES OF CENTRAL
TENDENCY
• Mean (continued)
– Can calculate the Mean for a
distribution of scores or for a
frequency distribution.
– Best indicator of combined
performance whereas the median
is the best indicator of typical
performance.
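The two ways of calculating the mean can be sketched as follows: once from a frequency distribution (each score weighted by its frequency) and once from the raw scores. The distribution is invented for the example.

```python
# Hypothetical frequency distribution: score -> number of cases.
freq = {6: 1, 7: 4, 8: 3, 9: 2, 10: 1}

# Mean from the frequency distribution: sum of (score x frequency) over N.
n = sum(freq.values())
total = sum(score * count for score, count in freq.items())
mean_from_freq = total / n

# Expanding back to raw scores gives the same mean.
raw = [score for score, count in freq.items() for _ in range(count)]
mean_from_raw = sum(raw) / len(raw)

print(n, round(mean_from_freq, 2), round(mean_from_raw, 2))
```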
DISTRIBUTION SHAPES SYMMETRICAL
• The mean and
median are the
same.
• If a single mode,
it falls at the
same location
as the mean and
median.
DISTRIBUTION SHAPES SKEWED
• When distributions
are skewed the
values of central
tendency differ.
• Determine
skewness by
comparing the
mean & median
without drawing a
histogram or
polygon.
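A rough sketch of that comparison is shown below. The function name `skew_direction` and the data sets are invented for the example, and the rule is only a rule of thumb: it can mislead with small or unusual data sets.

```python
from statistics import mean, median

def skew_direction(scores):
    """Rule of thumb: compare the mean with the median."""
    m, md = mean(scores), median(scores)
    if m > md:
        return "positive"
    if m < md:
        return "negative"
    return "symmetrical (by this rule)"

# Hypothetical data sets, invented for illustration.
high_tail = [1, 2, 2, 3, 3, 3, 4, 20]       # one very high score
low_tail = [1, 17, 18, 18, 18, 19, 19, 20]  # one very low score
balanced = [1, 2, 3, 4, 5]

for data in (high_tail, low_tail, balanced):
    print(skew_direction(data))
```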
DISTRIBUTION SHAPES POSITIVE SKEW
• The mean is
usually greater
than the median
& the median is
usually greater
than the mode.
• Skew is to the
right (the long
tail points toward
the high scores).
DISTRIBUTION SHAPES NEGATIVE SKEW
• The mean is
usually smaller
than the median
& the median is
usually smaller
than the mode.
• Skew is to the
left (the long
tail points toward
the low scores).
DISTRIBUTION SHAPES NORMAL CURVE
• A symmetrical curve
with the same number
of scores above &
below the mean.
• A special case of a
symmetrical distribution.
• Most scores are
concentrated around
the mean.
• Approximately 68% of
the cases are within
+/- 1 SD unit from the
mean.
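The 68% figure can be checked with Python's standard `statistics.NormalDist`. This is a minimal sketch assuming a standard normal curve (mean 0, SD 1); the same proportion holds for any normal curve.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, SD 1.
curve = NormalDist(mu=0, sigma=1)

# Proportion of cases between -1 SD and +1 SD of the mean.
within_one_sd = curve.cdf(1) - curve.cdf(-1)

print(round(within_one_sd, 4))
```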
VARIABILITY MEASURES
• Range
– Difference between the highest
and lowest scores.
– Determine by subtraction.
– Is an unreliable index of variability
because it is derived from only two
scores.
VARIABILITY MEASURES
• Quartile deviation
– Half the difference between the upper
and lower quartiles in a distribution.
– The 75th percentile & the 25th
percentile.
– Provides a measure of one-half of the
range of scores within which lie the
middle 50% of the scores.
– It is an ordinal scale statistic and is
used with the median (which means that
it is not often used unless there are
extreme scores).
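A short sketch of the range and the quartile deviation on the same invented scores. It uses `statistics.quantiles` with its default method; other quartile conventions can give slightly different cut points.

```python
from statistics import quantiles

# Hypothetical scores with one extreme value, invented for illustration.
scores = [2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 40]

# Range: highest score minus lowest score (depends on two scores only).
score_range = max(scores) - min(scores)

# Quartiles: 25th, 50th and 75th percentiles (default "exclusive" method).
q1, q2, q3 = quantiles(scores, n=4)

# Quartile deviation: half the distance between the upper and lower quartiles.
quartile_deviation = (q3 - q1) / 2

print(score_range, q1, q3, quartile_deviation)
```

The extreme score of 40 inflates the range but leaves the quartile deviation untouched, because it lies outside the middle 50% of the scores.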
VARIABILITY MEASURES
• Variance
– Based on the mean.
– Considers the size and location of
individual scores.
– Variance & standard deviation are based
on the deviation score which is the
difference between a raw score & the
mean.
– The sum of the deviation scores of a
distribution is always zero because the
positive deviations of scores above the
mean exactly cancel the negative
deviations of scores below the mean.
This is why the deviations are squared
before they are averaged.
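The deviation scores and the variance can be sketched as below. The scores are invented, and the variance is taken as the mean of the squared deviations (dividing by N); a sample estimate would divide by N - 1 instead.

```python
from statistics import mean, pvariance

# Hypothetical scores, invented for illustration.
scores = [2, 4, 4, 5, 10]

m = mean(scores)

# Deviation score: raw score minus the mean.
deviations = [x - m for x in scores]
sum_of_deviations = sum(deviations)

# Variance: mean of the squared deviation scores (dividing by N).
variance = sum(d ** 2 for d in deviations) / len(scores)

print(deviations, sum_of_deviations, variance)
```

The result agrees with `statistics.pvariance`, which uses the same divide-by-N definition.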
VARIABILITY MEASURES
• Standard Deviation
– SD is the square root of variance
– Is used to summarize data in the
same units as the original data.
– Most commonly used statistic for
variability.
– It is the square root of the mean of
the squared deviation scores.
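Following the definition above step by step gives a short sketch of the standard deviation. The scores are invented, and the calculation divides by N, matching "the mean of the squared deviation scores".

```python
import math
from statistics import mean, pstdev

# Hypothetical scores, invented for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

m = mean(scores)
squared_deviations = [(x - m) ** 2 for x in scores]

# Variance is the mean of the squared deviations; SD is its square root.
variance = sum(squared_deviations) / len(scores)
sd = math.sqrt(variance)

print(m, variance, sd)
```

Because of the square root, the SD is back in the same units as the original scores, unlike the variance.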
STANDARD SCORES
• z-scores
– The distance of a score from the mean
in standard deviation units.
– Scores with the same numerical value
as the mean will have a z-score of zero.
– Used to compare one set of scores to
another -- for example, a subject's (S's)
performance on two different exams.
– z-scores involve negative values and
fractions. This is overcome by converting
them to Z-scores (capital Z).
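A minimal sketch of the z-score comparison across two exams. The helper name `z_score` and all exam figures are invented for the example.

```python
def z_score(raw, mean, sd):
    """Distance of a raw score from the mean, in standard deviation units."""
    return (raw - mean) / sd

# Hypothetical exams: the same student scores 80 on one and 60 on the other.
z_exam1 = z_score(80, mean=70, sd=10)
z_exam2 = z_score(60, mean=50, sd=5)

# A score equal to the mean has a z-score of zero.
z_at_mean = z_score(70, mean=70, sd=10)

print(z_exam1, z_exam2, z_at_mean)
```

Although the raw score on the second exam is lower, its z-score is higher, so it is the stronger performance relative to the group.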
STANDARD SCORES
• Z-scores (capital Z; often called T-scores in other texts)
– Obtained by multiplying the z-score by
10 and adding 50 to the result.
– Used to compare scores in different
distributions.
– Allows descriptions in whole numbers.
– A type of standard score.
– Does not alter the shape of the original
distribution.
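The conversion described above can be sketched in one line. The function name `to_big_z` is invented for the example.

```python
def to_big_z(z):
    """Convert a z-score to a Z-score: multiply by 10, then add 50."""
    return 10 * z + 50

# Hypothetical z-scores, invented for illustration.
z_scores = [-1.5, 0.0, 0.5, 2.0]
big_z_scores = [to_big_z(z) for z in z_scores]

print(big_z_scores)
```

The conversion is a straight-line rescaling, so the order and relative spacing of the scores stay the same; only the mean (now 50) and the SD (now 10) change.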
CORRELATION
• Used to describe the
relationship between pairs of
scores.
• Shows the extent to which a
change in one variable is
associated with change in
another variable.
CORRELATION
• Scattergrams
– Used to show correlation.
– One variable on each axis (horizontal
and vertical).
– Plot scattergrams to see both direction
& strength of a relationship.
– Direction shows positive or negative
relationship.
– Scores for independent variable on
horizontal axis & dependent variable on
vertical axis.
CORRELATION
• Lower left to
upper right
– Positive
relationship
– Low scores on
one variable
associated with
low scores on
other
– High on one high
on other.
CORRELATION
• Upper left to
lower right
– Negative
relationship.
– High on one, low
on the other
variable.
CORRELATION
• Narrow dot band
– High strength.
– Straight line
shows strong
relationship
between
variables.
CORRELATION
• Scattered dot
band
– Low strength.
– Relatively weak
relationship
between
variables.
CORRELATION
• Prediction of one variable from
another can occur with strong
relationships
• Positive and negative equally
important.
• The higher the correlation between
variables in either a positive or
negative direction, the more
accurate the prediction.
CORRELATION
COEFFICIENTS
• Range from -1.00 to +1.00.
• -1.00 = perfect negative
relationship.
• +1.00 = perfect positive
relationship.
• 0.00 (midpoint) = no relationship
at all.
CORRELATION
COEFFICIENTS
• Correlation coefficients near unity
indicate high degree of relationship.
• Make accurate prediction about one
variable from info about another
variable.
• For accurate prediction, desirable to have
coefficients of +/- 0.90 or stronger.
• Again, negative & positive both
equally good for prediction.
PEARSON'S R
(PRODUCT MOMENT
CORRELATION)
• Used with either interval or
ratio scales.
• Defined as the mean of z-score
products of two variables.
• Most common method for
correlation.
• Same statistical family as
mean.
PEARSON'S R
(PRODUCT MOMENT
CORRELATION)
• Assumes a linear relationship
between the two variables.
(Straight line fit between scores
of the two variables).
• If curvilinear, must use other
methods.
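The definition of r as the mean of z-score products can be sketched directly. The function name `pearson_r` and the data are invented, and the z-scores are assumed to use the divide-by-N standard deviation, which is what makes the mean of the products equal r.

```python
from statistics import mean, pstdev

def pearson_r(xs, ys):
    """Pearson r as the mean of the products of paired z-scores."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)  # divide-by-N standard deviations
    zx = [(x - mx) / sx for x in xs]
    zy = [(y - my) / sy for y in ys]
    return mean([a * b for a, b in zip(zx, zy)])

# Hypothetical paired scores, invented for illustration.
x = [1, 2, 3, 4, 5]
r_perfect_pos = pearson_r(x, [2, 4, 6, 8, 10])  # exact straight line, rising
r_perfect_neg = pearson_r(x, [10, 8, 6, 4, 2])  # exact straight line, falling
r_strong = pearson_r(x, [2, 1, 4, 3, 5])        # rising, with some scatter

print(round(r_perfect_pos, 2), round(r_perfect_neg, 2), round(r_strong, 2))
```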
SPEARMAN RHO
• Used with rank order data;
ordinal scales.
• Part of the same statistical
family as median.
• Ranges from -1.00 to +1.00
(same as Pearson's r).
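A sketch of Spearman rho for rank order data, using the common shortcut formula rho = 1 - 6 * (sum of d^2) / (N * (N^2 - 1)), where d is the difference between paired ranks. The function names and ratings are invented, and the shortcut assumes there are no tied scores.

```python
def ranks(values):
    """Rank 1 = lowest value. Assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """Spearman rho from rank differences (shortcut formula, no ties)."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))

# Hypothetical ratings of five employees by two supervisors.
supervisor_a = [55, 62, 70, 81, 90]
supervisor_b = [60, 58, 75, 72, 95]

rho = spearman_rho(supervisor_a, supervisor_b)
print(round(rho, 2))
```

Only the rank order of the ratings enters the calculation, so the size of the gaps between raw ratings has no effect on rho.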
SOURCES OF INFO
• See your bibliography for the
class!