Z score

S519: Evaluation of
Information Systems
Social Statistics
Chapter 7: Are your curves
normal?
This week



Why is understanding probability important?
What is a normal curve?
How to compute and interpret z scores.
What is probability?



The chance of winning a lottery
The chance of getting a head on one flip of a
coin
Determining the degree of confidence with which
to state a finding
Normal distribution

Figure 7.4 – P157


Almost 100% of the scores fall between -3SD and
+3SD
Around 34% of the scores fall between the mean and +1SD
Are all distributions
normal?
Normal distribution
The distance between    Contains               Range (if mean=100, SD=10)
Mean and 1SD            34.13% of all cases    100-110
1SD and 2SD             13.59% of all cases    110-120
2SD and 3SD             2.15% of all cases     120-130
>3SD                    0.13% of all cases     >130
Mean and -1SD           34.13% of all cases    90-100
-1SD and -2SD           13.59% of all cases    80-90
-2SD and -3SD           2.15% of all cases     70-80
< -3SD                  0.13% of all cases     <70
Z score – standard score


If you want to compare individuals in different
distributions
Z scores are comparable because they are
standardized in units of standard deviations.
Z score

Standard score
$z = \frac{X - \mu}{\sigma}$

X: the individual score
$\mu$: the mean
$\sigma$: the standard deviation
Sample or
population?
Z score
Mean and SD for Z
distribution?
Mean=25, SD=2, what is
the z score for 23, 27, 30?
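The z score formula can be checked with a few lines of Python. This is only a sketch: the function name z_score is made up for illustration, and the numbers are the ones from the question above.

```python
def z_score(x, mean, sd):
    """Return how many standard deviations x lies from the mean."""
    return (x - mean) / sd


# Raw scores from the question (mean=25, SD=2)
for raw in (23, 27, 30):
    print(raw, z_score(raw, mean=25, sd=2))
# prints:
# 23 -1.0
# 27 1.0
# 30 2.5
```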
Z score




Z scores across different distributions are
comparable
A z score represents a distance of z
standard deviations from the mean
Raw score 12.8 (mean=12, SD=2) → z=+0.4
Raw score 64 (mean=58, SD=15) → z=+0.4
Equal distances from the mean
Comparing apples and
oranges:




Eric competes in two track events: standing
long jump and javelin. His long jump was 49
inches, and his javelin throw was 92 ft. He
then measures all the other competitors in
both events and calculates the mean and
standard deviation:
Javelin: M = 86 ft, s = 10 ft
Long Jump: M = 44 in, s = 4 in
Which event did Eric do better in?
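One way to compare the two events is to convert each result to a z score and see which is further above its own mean. The sketch below uses the figures from the slide; the helper z_score and the variable names are illustrative only.

```python
def z_score(x, mean, sd):
    """Return how many standard deviations x lies from the mean."""
    return (x - mean) / sd


javelin_z = z_score(92, mean=86, sd=10)     # 0.6 SD above the javelin mean
long_jump_z = z_score(49, mean=44, sd=4)    # 1.25 SD above the long jump mean

# The larger z score marks the better performance relative to the field
best_event = "long jump" if long_jump_z > javelin_z else "javelin"
print(javelin_z, long_jump_z, best_event)   # prints: 0.6 1.25 long jump
```

Because both results are expressed in standard deviation units, feet and inches no longer matter for the comparison.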
Excel for z score


STANDARDIZE(x, mean, standard_dev)
(x-AVERAGE(array))/STDEV(array)
What z scores represent?




Raw scores below the mean have negative z
scores
Raw scores above the mean have positive z
scores
A z score represents the number of standard
deviations from the mean
The more extreme the z score, the further it
is from the mean
What z scores represent?




84% of all the scores fall below a z score of
+1 (why?)
16% of all the scores fall above a z score of
+1 (why?)
This percentage represents the probability of
a certain score occurring, or an event
happening
If the probability is less than 5%, the event is
considered unlikely to happen
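The 84% figure comes from adding the 50% of scores below the mean to the 34.13% between the mean and +1SD; the remaining 16% lie above. A quick check, sketched with NormalDist from Python's standard statistics module (the variable names are illustrative):

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)

below = standard_normal.cdf(1)   # share of scores below z = +1
above = 1 - below                # share of scores above z = +1

print(f"{below:.2%} below, {above:.2%} above")
# prints: 84.13% below, 15.87% above
```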
Lab
Exercise

In a normal distribution with a mean of 100
and a standard deviation of 10, what is the
probability that any one score will be 110 or
above?
What about 6σ?
http://en.wikipedia.org/wiki/Six_Sigma
If z is not an integer


Table B.1 (S-P357-358)
NORMSDIST(z)

To compute the probability associated with a
particular z score
Lab
Exercise

The probability associated with z=1.38



41.62% of all the cases in the distribution fall
between the mean and 1.38 standard deviations above it
About 92% fall below 1.38 standard deviations above the mean
How and why?
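The two figures differ by the 50% of cases that lie below the mean: 41.62% + 50% = 91.62%, or about 92%. The sketch below reproduces both numbers in Python. The function names normsdist and mean_to_z are illustrative: the first mirrors what Excel's NORMSDIST returns, and the second assumes, based on the 41.62% quoted above, that the table lists the area between the mean and z.

```python
import math


def normsdist(z):
    """Cumulative share of cases below z under the standard normal curve."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def mean_to_z(z):
    """Share of cases between the mean and z (assumed to be the table value)."""
    return normsdist(abs(z)) - 0.5


print(round(mean_to_z(1.38), 4))   # prints: 0.4162
print(round(normsdist(1.38), 4))   # prints: 0.9162
```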
Between two z scores

What is the probability of falling between z scores
of 1.5 and 2.5?



Z=1.5, 43.32%
Z=2.5, 49.38%
So around 6% of all the cases in the
distribution fall between 1.5 and 2.5 standard
deviations above the mean.
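The subtraction works because each table value is an area measured from the mean, so the difference is the area between the two z scores: 49.38% - 43.32% = 6.06%. A minimal sketch of the same calculation with cumulative areas, using Python's statistics.NormalDist (the function name area_between is illustrative):

```python
from statistics import NormalDist


def area_between(z_low, z_high):
    """Share of cases between two z scores on the standard normal curve."""
    cdf = NormalDist().cdf   # default NormalDist has mean 0 and SD 1
    return cdf(z_high) - cdf(z_low)


share = area_between(1.5, 2.5)
print(f"{share:.2%}")   # prints: 6.06%
```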
Lab
Exercise

What percentage of the data falls
between 110 and 125 in a distribution with
mean=100 and SD=10?
Lab
Exercise

The probability of a particular score occurring
between a z score of +1 and a z score of
+2.5
Lab
Exercise

Compute the z scores for the following raw scores,
where mean=50 and standard deviation=5





55
50
60
57.5
46
Lab
Exercise

The math section of the SAT has a μ = 500
and σ = 100. If you selected a person at
random:



a) What is the probability he would have a score
greater than 650?
b) What is the probability he would have a score
between 400 and 500?
c) What is the probability he would have a score
between 630 and 700?
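Answers to questions like these can be checked after working them by hand with the table. The sketch below assumes SAT math scores follow a normal curve with the stated μ and σ and uses Python's statistics.NormalDist; the variable names are illustrative.

```python
from statistics import NormalDist

# Assumed model: SAT math scores are normal with mu=500 and sigma=100
sat_math = NormalDist(mu=500, sigma=100)

# a) score greater than 650
p_above_650 = 1 - sat_math.cdf(650)

# b) score between 400 and 500
p_400_to_500 = sat_math.cdf(500) - sat_math.cdf(400)

# c) score between 630 and 700
p_630_to_700 = sat_math.cdf(700) - sat_math.cdf(630)

for label, p in (("a", p_above_650), ("b", p_400_to_500), ("c", p_630_to_700)):
    print(f"{label}) {p:.2%}")
```

Working directly in raw scores is equivalent to converting 650, 400, 630 and 700 to z scores first and then looking up the standard normal areas.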
Determine sample size
$\text{Sample size} = \frac{\text{Number of responses needed}}{\text{Expected response rate}}$

Expected response rate: obtain based on
historical data
Number of responses needed: use formula to
calculate
Number of responses needed
$n = \frac{Z^2 \sigma_x^2}{e^2}$

n=number of responses needed (sample
size)
Z=the number of standard deviations that
describe the precision of the results
e=accuracy or the error of the results
$\sigma_x^2$=variance of the data
for large population size
Deciding $\sigma_x^2$

from previous surveys
intentionally use a large number
conservative estimation

e.g. a 10-point scale; assume that responses will
be found across the entire 10-point scale
$3\sigma_x$ to the left/right of the mean describe virtually
the entire area of the normal distribution curve
$\sigma_x = 10/6 = 1.67$; $\sigma_x^2 = 2.78$
Example
$n = \frac{Z^2 \sigma_x^2}{e^2}$

Z=1.96 (usually rounded as 2)
$\sigma_x^2$=2.78
e=0.2
n=278 (responses needed)
assume response rate is 0.4
Sample size=278/0.4=695
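The arithmetic of this example can be reproduced in a few lines. This is a sketch, not a general survey-planning tool: the function names are illustrative, and Z is taken as the rounded value 2, which is what yields the 278 shown above.

```python
def responses_needed(z, variance, error):
    """n = Z^2 * variance / e^2, the formula for a large population."""
    return (z ** 2) * variance / (error ** 2)


def sample_size(responses, response_rate):
    """Number of people to survey so that enough of them respond."""
    return responses / response_rate


n = responses_needed(z=2, variance=2.78, error=0.2)
total = sample_size(n, response_rate=0.4)

# round() removes tiny floating-point noise in the intermediate results
print(round(n), round(total))   # prints: 278 695
```

With the unrounded Z=1.96 the number of responses needed comes out somewhat smaller (about 267), so rounding Z up to 2 is the more conservative choice.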
Exercise
$n = \frac{Z^2 \sigma_x^2}{e^2}$

Z=1.96 (usually rounded as 2)
5-point scale (suppose most of the responses
are distributed from 1-4)
error tolerance=0.4
assume response rate is 0.6
What is sample size?