Statistics midterm

Day One
Term/concept: Statistics (MC)
o Techniques used to summarize data to answer questions.
o Techniques were developed because humans are limited information processors.
Term/concept: 3 Purposes of Statistics (MC)
Term/concept: 3 Stats skills learned in this class (MC)
o Identify when to use a statistic.
o Conduct basic statistical analysis.
o Properly interpret data.
Day Two
Term: Variable
Definition (MC): Characteristic or condition of interest that varies within your sample.
- Changes/has different values depending on the person.
- Measured by 1+ questions on a survey or interview.
Identification (SA):
1. Explanatory (independent) variable: causes, influences, or comes before the other.
2. Outcome (dependent) variable: caused by, influenced by, or follows the independent variable.
Example: Research showed that women are more likely to use Facebook for a number of reasons. Identify explanatory/outcome:
- Explanatory = gender.
- Outcome = reasons people use Facebook.

Term: Values
Definition (MC): Possible categories that the variable can take on.
- Possible survey answers.
Identification (SA):
1. Look at the choices from which the respondent must answer.
2. Could be in numbers, words, etc.
Example: Survey asking the likelihood of 1st years returning to DPU in the fall. Choices = very unlikely, somewhat unlikely, not sure, somewhat likely, very likely.
- Values = choices.

Term: Population
Definition (MC): The larger group of people you want your study to generalize to.
- Group you want to draw conclusions about.
Identification (SA):
1. What overall group are you looking at?
2. What group are you trying to draw conclusions about?
Example: Researchers conducted a survey of 735 Chicago residents who rent their housing about their spending habits. They want to use the data to better understand how Chicago renters spend their money.
- Population = Chicago renters.

Term: Sample
Definition (MC): Subgroup of people from the population that were studied.
Example: Same Chicago renters example.
- Sample = 735 Chicago renters who participated.

Term: Sample Size
Definition (MC): The number of people that participated in the study. The sample should represent the whole population, so its size must be chosen wisely.
Identification (SA):
1. The number of people that participated in the study.
Ways to choose a sample size:
1. Conduct a census.
- If the population is small; depends on time constraints/budget.
2. Use the sample size from a similar study.
3. Use a table to find the sample size.
4. Sample size calculator.
5. Formula.
Example: Same Chicago renters example, but give the sample size (735).

Term: Sampling Error
Definition (MC): When you survey a small group of people, uncertainty creeps into the statistics.
- Discrepancies due to random factors between a sample statistic and a population parameter.
Identification (SA): Usually expressed with a confidence level.
o E.g. you have a certain % of confidence.

Term: Statistic
Definition (MC): Value that summarizes data from a sample.
Identification (SA): X
Example: X

Term: Parameter
Definition (MC): Value that summarizes a population.
Identification (SA): X
Example: X
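To make the statistic, parameter, and sampling error definitions above concrete, here is a minimal Python sketch. The population of spending values is simulated and entirely hypothetical (the notes give no data); only the sample size of 735 comes from the Chicago renters example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: monthly spending (in dollars) for 10,000 renters.
# These numbers are made up for illustration only.
population = [random.gauss(1500, 400) for _ in range(10_000)]
parameter = statistics.mean(population)  # value that summarizes the population

# Draw a sample of 735 people and compute the matching sample statistic.
sample = random.sample(population, 735)
statistic = statistics.mean(sample)      # value that summarizes the sample

# Sampling error: discrepancy between the sample statistic and the parameter.
sampling_error = statistic - parameter

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
print(f"sampling error:              {sampling_error:.2f}")
```

Rerunning with a different seed gives a different sample and a different sampling error, which is the "random factors" part of the definition.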
Measurement: Nominal (=, ≠)
Description:
o Separates cases into categories.
o Only provides information to distinguish one thing from another.
o Values = differences in type or quality BUT NOT AMOUNT.
o When numbers are used, they're just placeholders.
Examples:
o Zip codes
o Employee ID numbers
o Eye color
o Gender
o Nationality
Application:
o Mode
o Frequency distribution
o Entropy
o Contingency correlation
o Chi-square (χ²) test

Measurement: Ordinal (>, <)
Description:
o Values provide enough information to order objects.
- Whether more or less of a characteristic is possessed.
o You cannot tell the amount by which values differ.
Examples:
o Grades
o Street numbers
o Place in race
o Employee rank
o Class ranking
Application:
o Median
o Frequency distribution
o Percentiles
o Rank correlation
o Run tests
o Sign tests

Measurement: Interval (-, +)
Description:
o Differences between values are meaningful.
- i.e. a unit of measurement exists.
o Tells how much of the characteristic a case possesses.
o You can tell differences in amount.
o Values have the property of units: 1 always means the same.
Examples:
o Calendar dates
o Temperature (C & F)
o Dress size
Application:
o Mean
o Standard deviation
o Frequency distribution
o Pearson's correlation

Measurement: Ratio (*, /)
Description:
o Differences & ratios are meaningful.
o Same as interval, except it has an absolute 0:
- 0 truly represents the absence of the characteristic.
Examples:
o Temperature (in Kelvin)
o Monetary quantities
o Counts
o Age
o Mass
o Length
o Income
o Weight
o Electrical current
Application:
o Mean
o Standard deviation
o Frequency distribution
o Standard error of the mean
o Median & percentiles
o Ratio or coefficient of variation
Day Three
Term (MC): Ungrouped Frequency Distribution
Definition: A count of how often each variable value occurs in a data set.
Application: Used when the values a variable can take are limited.
Example: Survey asking # of siblings students have; the majority would probably say 1, 2, or 3.

Term (MC): Grouped Frequency Distribution
Definition: The frequency counts are for adjacent groupings of values, or intervals, of the variable.
Application: Used when the variable has a large number of values and it's acceptable to lose info by collapsing values into intervals.
Example: Survey asking the # of kids in a graduating high school class; could range from 1500.

Term (MC): Discrete Variable
Definition:
- Answers the question: "How many?"
- Whole number values ONLY!
Application:
- Always: nominal/ordinal-level variables.
- SOMETIMES interval/ratio variables are discrete.
Example:
- Answer to someone asking how many jeans you have.
- How many siblings do you have?
- How many neurons are in a spinal cord?

Term (MC): Continuous Variable
Definition:
- Answers the question: "How much?"
- Can take on values between whole numbers; they have fractional values.
- Fractional values = distance.
Application:
- Always interval/ratio level of measurement.
Example:
- How much aggression a person has.
- How much intelligence a person has.
- Person's weight.

Term (MC): Histogram
Definition:
- Graphic display of a frequency distribution.
- Height = frequency in intervals.
- Touching bars = representative of continuous variables.
Application:
- Continuous variables.
Example: X
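The grouped frequency distribution described above can be sketched in a few lines of Python. The class sizes and the interval width of 100 are hypothetical choices for illustration, and `grouped_frequencies` is a helper name invented here, not something from the course.

```python
from collections import Counter

def grouped_frequencies(values, width):
    """Count how many values fall in each adjacent interval of the given width."""
    # Map each value to the lower bound of its interval, e.g. 245 -> 200.
    counts = Counter((v // width) * width for v in values)
    # Report each interval as (low, high) with its frequency.
    return {(low, low + width - 1): counts[low] for low in sorted(counts)}

# Hypothetical graduating class sizes reported by ten students.
class_sizes = [12, 48, 95, 110, 230, 245, 310, 320, 333, 480]

table = grouped_frequencies(class_sizes, 100)
for (low, high), f in table.items():
    print(f"{low:>3}-{high:<3}  f = {f}")
```

Collapsing into intervals loses information: the table shows that three classes fall in 300-399, but not that they were 310, 320, and 333.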
Term: Frequency
Definition:
- Gives a raw number.
- Number of individual cases located in a specific category or value of a variable.
- All other statistics are basically built on this.
- How often a specific value occurred in a sample.

Term: Percentage
Definition:
- Proportion of cases in a specific category or value of a variable out of all cases, multiplied by 100.
- (f / sample size) × 100

Term: Frequency distributions
Definition:
- Summarize a set of data.
- Tally of how often the values, or ranges of values, occur.
- For everything except nominal-level data, they can display info about cumulative frequency, percentage, and cumulative percentage.

Term: Type of Graph
Definition:
- Displays statistical information.

Type of Graph: Bar Graph
Definition:
- Bars don't touch each other.
- Used to demonstrate the frequency with which the different values of discrete variables occur.
Application: Discrete

Type of Graph: Histogram
Definition:
- Clearly labeled axes; height = frequency.
- Graphic display of a frequency distribution for continuous data.
- Bars touch.
Application: Continuous

Type of Graph: Frequency Polygon
Definition:
- Frequency marked with dots on the midpoint of each interval.
- Dots connected by lines.
- Frequencies go to zero at the far left and far right of the graph.
Application: Continuous
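As a worked illustration of frequency, percentage, cumulative frequency, and cumulative percentage, here is a small Python sketch of an ungrouped frequency distribution. The sibling counts are hypothetical data made up for the example.

```python
from collections import Counter

# Hypothetical answers to "How many siblings do you have?" from ten students.
siblings = [1, 2, 0, 1, 3, 2, 1, 0, 2, 1]
n = len(siblings)

freq = Counter(siblings)  # frequency: how often each value occurred

rows = []
cum_f = 0
for value in sorted(freq):
    f = freq[value]
    cum_f += f                 # cumulative frequency so far
    pct = 100 * f / n          # percentage = (f / sample size) x 100
    cum_pct = 100 * cum_f / n  # cumulative percentage
    rows.append((value, f, pct, cum_f, cum_pct))

print("value  f  %     cum f  cum %")
for value, f, pct, cf, cpct in rows:
    print(f"{value:<6} {f:<2} {pct:<5.1f} {cf:<6} {cpct:.1f}")
```

The last row's cumulative percentage is always 100, which is a quick check that no cases were dropped from the tally.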
Day Four
Shape of f Distributions: Modality
Definition:
- How many peaks exist in the curve of the frequency distribution.
- Peak: high point (or mode); it represents the score/interval with the largest frequency.
Application:
- Unimodal: one peak (the normal curve is unimodal).
- Bimodal: two peaks.
- Multimodal: 3+ peaks.

Shape of f Distributions: Skewness
Definition:
- Measure of symmetry.
- Asymmetric = skewed.
Application:
- Positive skew: tail goes to the right.
- Negative skew: tail goes to the left.

Shape of f Distributions: Kurtosis
Definition:
- Fancy term for how peaked or flat a distribution is.
Application:
- Leptokurtic: pointy peak.
- Mesokurtic: medium peak.
- Platykurtic: flat peak.

Shape of f Distributions: Outlier
Definition:
- Extreme score that falls away from the others in the data set.
Application: X

Shape of f Distributions: Normal Curve
Definition:
- Perfectly symmetric.
- Not too peaked or too flat.
Interpretation:
- Unimodal: one peak at the center of the distribution.
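The notes describe skewness and kurtosis only in words. As a hedged sketch, the Python below uses the common moment-based formulas (average cubed z-score for skewness, average fourth-power z-score minus 3 for excess kurtosis). These formulas and the data are assumptions added for illustration, not part of the course material.

```python
import statistics

def skewness(scores):
    """Average cubed z-score: > 0 means tail to the right, < 0 tail to the left."""
    m = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    return sum(((x - m) / sd) ** 3 for x in scores) / len(scores)

def excess_kurtosis(scores):
    """Average fourth-power z-score minus 3: > 0 pointier, < 0 flatter than normal."""
    m = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    return sum(((x - m) / sd) ** 4 for x in scores) / len(scores) - 3

# Hypothetical score sets chosen to show each direction of skew.
symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 1, 2, 2, 3, 10]   # one extreme high score (outlier)
left_tail = [1, 8, 9, 9, 10, 10]   # one extreme low score (outlier)

print("symmetric :", round(skewness(symmetric), 3))
print("right tail:", round(skewness(right_tail), 3))
print("left tail :", round(skewness(left_tail), 3))
print("kurtosis of symmetric set:", round(excess_kurtosis(symmetric), 3))
```

Note how a single outlier is enough to pull the tail, and the sign of the skew, in its direction.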
Day Five
Central Tendency: Value used to summarize a set of scores; also known as the average.
- Tells the typical or average score in the data set.
- Often, but not always, located in the center of the data set.
- There are three measures of CT.

Measure of CT: Mean
Description:
o Mathematical average of the scores.
o Sum of scores / sample size.
Interpretation:
o "The average of X is #."
o Don't over-interpret, because not all scores will fall at the mean.
Application:
o Interval & Ratio
- Without skew & outliers ONLY.

Measure of CT: Median
Description:
o The middle score that separates the bottom half of scores from the rest.
o Odd # of scores = order scores low to high, middle = median.
o Even # of scores = order low to high, median = average of the two middle scores.
Interpretation:
o "The middle or central score is #."
Application:
o Ordinal
o Interval & Ratio
- With skew & outliers ONLY.

Measure of CT: Mode
Description:
o Score with the highest frequency, i.e. the one that occurs most often.
o NOT what most people have: it's what is most common, which doesn't mean a majority.
Interpretation:
o "The most common score is #."
o "The score occurring most often is #."
Application:
o Only one that can be used with nominal/categorical data.
o Any level of measurement.
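The three measures of central tendency above can be checked with Python's standard `statistics` module. The scores are hypothetical and include one outlier (40) to show why the median is preferred when the data are skewed.

```python
import statistics

# Hypothetical scores; 40 is an outlier that skews the distribution.
scores = [2, 3, 3, 4, 5, 6, 40]

mean = statistics.mean(scores)      # sum of scores / sample size
median = statistics.median(scores)  # middle score once ordered (odd # of scores)
mode = statistics.mode(scores)      # score that occurs most often

print(f"The average of X is {mean}.")
print(f"The middle or central score is {median}.")
print(f"The most common score is {mode}.")

# With an even number of scores, the median is the average of the two middle scores.
even_median = statistics.median([1, 2, 3, 4])
print(f"Median of an even-sized set: {even_median}")
```

Here the outlier pulls the mean up to 9, above six of the seven scores, while the median stays at 4.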




Day Six
Variability: Amount of spread in a set of scores.
- Low: extent to which scores cluster together (less spread out).
- High: extent to which scores stretch out widely.

Measurement of Variability: Range
Definition:
o Simplest to calculate.
o The distance between the highest and lowest score.
Interpretation:
o "The distance between the highest and lowest score is ___."
Application:
o Interval & Ratio ONLY
Equation:
o Highest - lowest = range
$X_{max} - X_{min} = \text{range}$

Measurement of Variability: Variance
Definition:
o How close the scores in the distribution are to the middle of the distribution.
o Average squared difference of the scores from the mean.
o Can be calculated for either a population or a sample; the formula changes with each.
Interpretation:
o "The average squared distance from the mean is ___."
Application:
o Interval & Ratio ONLY
Equation:
o Population variance ($\mu$ = population mean, $N$ = population size):
$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
o Sample variance ($\bar{x}$ = sample mean, $n$ = sample size):
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

Measurement of Variability: Standard Deviation
Definition:
o The square root of the variance.
o The average distance from the average.
o Can be calculated for either a population or a sample; the formula changes with each.
Interpretation:
o "The average distance from the average X was ___."
Application:
o Interval & Ratio ONLY.
Equation:
o Population standard deviation:
$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
o Sample standard deviation:
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
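To see how the range, variance, and standard deviation formulas work on actual numbers, here is a short Python sketch. The scores are hypothetical, and the same list is treated first as a whole population (divide by the number of scores) and then as a sample (divide by one less) purely to show how the divisor changes the result.

```python
import math
import statistics

# Hypothetical set of eight scores.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(scores)
mean = sum(scores) / n

# Range: highest score minus lowest score.
score_range = max(scores) - min(scores)

# Sum of squared differences of the scores from the mean.
sum_sq = sum((x - mean) ** 2 for x in scores)

pop_variance = sum_sq / n            # population formula: divide by N
sample_variance = sum_sq / (n - 1)   # sample formula: divide by n - 1

pop_sd = math.sqrt(pop_variance)     # standard deviation = square root of variance
sample_sd = math.sqrt(sample_variance)

print(f"range = {score_range}")
print(f"population variance = {pop_variance}, population SD = {pop_sd}")
print(f"sample variance = {sample_variance:.3f}, sample SD = {sample_sd:.3f}")

# Cross-check the hand calculations against the standard library.
print(math.isclose(sample_variance, statistics.variance(scores)))
print(math.isclose(pop_sd, statistics.pstdev(scores)))
```

Dividing by n - 1 always gives a slightly larger value than dividing by n, and the gap shrinks as the sample gets bigger.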
Day Seven
Z-score
Definition
- The raw score expressed in terms of how many standard deviations it is away from the mean.
- Deviation score / SD.
Purpose
1. Tells how extreme/typical a raw score is.
- Typical scores fall between z = -1 and z = +1.
2. Helps us compare how extreme a score is across different samples and different characteristics & scales.
- Depression scale 1-30 with a score of 25; z=1 (less extreme)
- Depression scale 1-10 with a score of 5; z=30 (more extreme)
3. Shows how extreme a score is relative to others in the population or sample (NOT overall good/bad).
- Test to show preparedness for college:
- Z English = -2.
- Z Math = .75
- More prepared for math.
4. In psychology, there are a variety of tests/assessments that have a known mean/SD for some groups (e.g. adults/kids).
- You can obtain a client's raw score and, using the known mean + SD for the scale, calculate their z-score.
- Allows you to evaluate how extreme they are relative to a reference group.
Interpretation
- You need the direction, the amount, and the unit (SD).
- "X was # SD above/below the mean."
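The z-score definition above (deviation score divided by SD) can be sketched in Python as follows. The function names `z_score` and `interpret`, and the mean and SD values, are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def z_score(raw, mean, sd):
    """Deviation score (raw - mean) divided by the standard deviation."""
    return (raw - mean) / sd

def interpret(label, z):
    """State the direction, the amount, and the unit (SD)."""
    direction = "above" if z >= 0 else "below"
    return f"{label} was {abs(z):g} SD {direction} the mean."

# Hypothetical scale with a known mean of 20 and SD of 5.
z_high = z_score(25, mean=20, sd=5)
z_low = z_score(10, mean=20, sd=5)

print(interpret("The client's score", z_high))
print(interpret("The client's score", z_low))
```

Because z-scores are in SD units, the same two lines of arithmetic let you compare scores from scales with very different ranges.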