Chapter1 Looking at Data - Distributions

advertisement
Chapter1
Looking at Data - Distributions
Introduction
• Goal: Using Data to Gain Knowledge
• Terms/Definitions:
– Individiduals: Units described by or used to obtain
data, such as humans, animals, objects (aka
experimental or sampling units)
– Variables: Characteristics corresponding to
individuals that can take on different values among
individuals
• Categorical Variable: Levels correspond to one of
several groups or categories
• Quantitaive Variable: Take on numeric values such that
arithmetic operations make sense
Introduction
• Spreadsheets for Statistical Analyses
– Rows: Represent Individuals
– Columns: Represent Variables
– SPSS, Minitab, EXCEL are examples
• Measuring Variables
– Instrument: Tool used to make quantitative measurement on
subjects (e.g. psychological test or physical fitness
measurement)
• Independent and Dependent Variables
– Independent Variable: Describes a group an individal comes
from (categorical) or its level (quantitative) prior to observation
– Dependent Variable: Random outcome of interest
Independent and Dependent Variables
• Dependent variables are also called response variables
• Independent Variables are also called explanatory variables
• Marketing: Does amount of exposure effect attitudes?
– I.V.: Exposure (in time or number), different subjects receive
different levels
– D.V.: Measurement of liking of a product or brand
• Medicine: Does a new drug reduce heart disease?
– I.V.: Treatment (Active Drug vs Placebo)
– D.V.: Presence/Absence of heart disease in a time period
• Psychology/Finance: Risk Perceptions
– I.V.: Framing of Choice (Loss vs Gain)
– D.V.: Choice Taken (Risky vs Certain)
Rates and Proportions
• Categorical Variables: Typically we count the
number with some characteristic in a group of
individuals.
• The actual count is not a useful summary. More
useful summaries include:
– Proportion: The number with the characteristic
divided by the group size (will lie between 0 and 1)
– Percent: # with characteristic per 100 individuals
(proportion*100)
– Rate per 100,000: proportion*100,000
Graphical Displays of Distributions
• Graphs of Categorical Variables
– Bar Graph: Horizontal axis defines the various
categories, heights of bars represent numbers of
individuals
– Pie Chart: Breaks down a circle (pie) such that the
size of the slices represent the numbers of
individuals in the categories or percentage of
individuals.
Example - AAA Ratings of FL Hotels
(Bar Chart)
AAA Rating
60
50
Percent
40
30
20
10
0
Percent
1
2
3
4
5
7.589599438
36.47224174
52.28390724
3.302881237
0.351370344
Number of Stars
Example - AAA Ratings of FL Hotels
(Pie Chart)
AAA Rating
5
4
3% 0%
1
8%
2
36%
3
53%
Graphical Displays of Distributions
• Graphs of Numeric Variables
– Stemplot: Crude, but quick method of displaying the
entire set of data and observing shape of distribution
1 Stem: All but rightmost digit, Leaf: Rightmost Digit
2 Put stems in vertical column (small at top), draw vertical line
3 Put leaves in appropriate row in increasing order from stem
– Histogram: Breaks data into equally spaced ranges on
horizontal axis, heights of bars represent frequencies or
percentages
Example: Time (Hours/Year) Lost to Traffic
LOS ANGELES
NEW YORK
CHICAGO
SAN FRANCISCO
DETROIT
WASHINGTON,DC
HOUSTON
ATLANTA
BOSTON
PHILADELPHIA
DALLAS
SEATTLE
SAN DIEGO
MINNEAPOLIS/ST PAUL
MIAMI
ST LOUIS
DENVER
PHOENIX
SAN JOSE
BALTIMORE
PORTLAND
ORLANDO
FORT LAUDERDALE
CINCINNATI
INDIANAPOLIS
CLEVELAND
KANSAS CITY
LOUISVILLE
TAMPA
COLUMBUS
SAN ANTONIO
AUSTIN
NASHVILLE
LAS VEGAS
JACKSONVILLE
PITTSBURGH
MEMPHIS
CHARLOTTE
NEW ORLEANS
56
34
34
42
41
46
50
53
42
26
46
53
37
38
42
44
45
31
42
31
34
42
29
32
37
20
24
37
35
29
24
45
42
21
30
14
22
32
18
Stems: 10s of hours
Step 1:
Step 2:
Step 3:
Leaves: Hours
Stems:
1
2
3
4
5
Stems and Leaves
1 48
2 01244699
3 0112244457778
4 122222245566
5 0336
. Source: Texas Transportation Institute (5/7/2001).
Example: Time (Hours/Year) Lost to Traffic
EXCEL Output
Histogram
Stem & Leaf
Display
12
Frequency
10
8
6
4
2
0
14
21
28
35
42
49
More
Stems
1
2
3
4
5
Leaves
->48
->01244699
->0112244457778
->122222245566
->0336
Bin
Note in histogram, the bins represent the number up to and including that
number (e.g. T14, 14<T21, …, 42<T49, T>49)
Comparing 2 Groups - Back-to-back
Stemplots
• Places Stems in Middle, group 1 to left,
group 2 to right
• Example: Maze Learning:
– Groups (I.V.): Adults vs Children
– Measured Response (D.V.): Average number of
Errors in series of Trials
Example - Maze Learning (Average Errors)
Adult
Adult
Child
17.8
13.3
13.7
17.0
8.2
11.5
9.0
22.2
18.2
10.1
15.2
7.8
7.9
13.9
6.0
17.5
28.7
12.7
12.2
13.0
16.6
12.9
40.2
22.9
Stems: Integer parts
Leaves: Decimal Parts
98
2
0
1
5
973
2
80
2
2
Stem
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
Child
0
279
0
6
5
9
7
2
Examinining Distributions
• Overall Pattern and Deviations
• Shape: symmetric, stretched to one direction,
multiple humps
• Center: Typical values
• Spread: Wide or narrow
• Outlier: Individual whose value is far from others
(see bottom right corner of previous slide)
– May be due to data entry error, instrument
malfunction, or individual being unusual wrt others
Time Plots -Variable Measured Over Time
c o mp o s i t e
6000
5000
4000
3000
2000
1000
0
0
1000
2000
3000
4000
day
5000
6000
7000
Time Plot with Trend/Seasonality
US Monthly Oil Imports
400000
350000
300000
200000
150000
100000
50000
Month (1/1971-12/2004)
375
364
353
342
331
320
309
298
287
276
265
254
243
232
221
210
199
188
177
166
155
144
133
122
111
100
89
78
67
56
45
34
23
12
0
1
1000s of barrels
250000
Numeric Descriptions of Distributions
• Measures of Central Tendency
– Arithmetic Mean: Total equally divided among individual cases
– Median: Midpoint of the distribution (M)
x

• Measures of Spread (Dispersion)
– Quartiles (first/third): Points that break out the smallest and
largest 25% of distribution (Q1 , Q3)
– 5 Number Summary: (Minimum,Q1,M,Q3,Maximum)
– Interquartile Range: IQR = Q3-Q1
– Boxplot: Graphical summary of 5 Number Summary
– Variance: “Average” squared deviation from mean (s2)
– Standard Deviation: Square root of variance (s)
Measures of Central Tendency
• Arithmetic Mean: Obtain the total by summing all
values and divide by sample size (“equal allotment”
among individuals)
x1  x2    xn 1
x
  xi
n
n
• Median: Midpoint of Distribution
• Sort values from smallest to largest
• If n odd, take the (n+1)/2 ordered value
• If n even, take average of n/2 and (n/2)+1 ordered values
2005 Oscar Nominees (Best Picture)
• Movie: Domestic Gross/Worldwide Gross
–
–
–
–
–
The Aviator: $103M / $214M
Finding Neverland: $52M / $116M
Million Dollar Baby: $100M / $216M
Ray: $75M / $97M
Sideways: $72M / $108M
• Mean & Median Domestic Gross among nominees ($M):
103  52  100  75  72 402
Mean : x 

 80.4
5
5
n 1 5 1 6
Median : n  5

 3
2
2
2
52  72  75  100  103  M  75
Delta Flight Times - ATL/MCO Oct,2004
•
•
•
•
N=372 Flights 10/1/2004-10/31/2004
Total actual time: 30536 Minutes
Mean Time: 30536/372 = 82.1 Minutes
Median: 372/2=186, (372/2)+1=187
– 186th and 187th ordered times are 81 minutes: M=81
100
80
60
40
20
Std. Dev = 8.81
Mean = 82.1
N = 372.00
0
65.0
75.0
70.0
ACTUAL
85.0
80.0
95.0
90.0
105.0
100.0
115.0
110.0
120.0
Measures of Spread
• Quartiles: First (Q1 aka Lower) and Third (Q3 aka
Upper)
– Q1 is the median of the values below the median
position
– Q3 is the median of the values below the median
position
– Notes(See examples on next page):
• If n is odd, median position is (n+1)/2, and finding quartiles
does not include this value.
• If n is even, median position is treated (most commonly) as
(n+1)/2 and the two values (positions) used to compute
median are used for quartiles.
order
1
2
3
4
5
• Oscar Nominations:
–
–
–
–
–
–
# of Individuals: n=5
Median Position: (5+1)/2=3
Positions Below Median Position: 1-2
Positions Above Median Position: 4-5
Median of Lower Positions: 1.5
Median of Lower Positions: 4.5
52  72
 60
2
100  103
Q3 
 101.5
2
IQR  101.5  60  41.5
Q1 
• ATL/MCO Flights:
–
–
–
–
–
–
revenue
52
72
75
100
103
# of Individuals: n=372
Median Position: (372+1)/2=186.5
Positions Below Median Position: 1-186
Positions Above Median Position: 187-372
Median of Lower Positions: 93.5
Median of Upper Positions: 279.5
order
93
94
279
280
acttime
76
76
86
86
76  76
 76
2
86  86
Q3 
 86
2
IQR  86  76  10
Q1 
Outliers - 1.5xIQR Rule
• Outlier: Value that falls a long way from other values in
the distribution
• 1.5xIQR Rule: An observation may be considered an
outlier if it falls either 1.5 times the interquartile range
above the third (upper) quartile or the same distance
below the first (lower) quartile.
• ATL/MCO Data: Q1=76 Q3=86 IQR=10 1.5xIQR=15
– “High” Outliers: Above 86+15=101 minutes
– “Low” Outliers: Below 76-15=61 minutes
– 12 Flights are at 102 minutes or more (Highest is 122). See
(modified) boxplot below
Measures of Spread - Variance and S.D.
• Deviation: Difference between an observed value and
the overall mean (sign is important):
xi  x
• Variance: “Average” squared deviation (divides the sum
of squared deviations by n-1 (as opposed to n) for
reasons we see later:

x  x   x

2
s
2
1


2
2
 x    xn  x
n 1

2

• Standard Deviation: Positive square root of s2


2
1
s
xi  x

n 1

2
1

xi  x

n 1
Example - 2005 Oscar Movie Revenues
•
•
•
•
•
•
Mean: x=80.4
The Aviator: i=1 x1=103 Deviation: 103-80.4=22.6
Finding Neverland: i=2 x2=52 Dev: 52-80.4= -28.4
Million Dollar Baby: i=3 x3=100 Dev: 100-80.4=19.6
Ray: i=4 x4=75 Dev: 75-80.4 = -5.4
Sideways: i=5 x5=72 Dev: 72-80.4 = -8.4


1
22.62   28.42  19.62   5.42   8.42
5 1
1
1801.20
 510.76  806.56  384.16  29.16  70.56 
 450.30
4
4
s   450.30  21.22
s2 
Computer Output of Summary Measures and
Boxplot (SPSS) - ATL/MCO Data
e
N
e
u
e
m
a
m
A
2
5
2
6
9
2
8
V
2
130
372
120
371
370
110
100
369
367
368
366
365
364
363
361
362
359
360
90
80
70
60
N=
372
ACTUAL
Linear Transformations
• Often work with transformed data
• Linear Transformation: xnew = a + bx for
constants a and b (e.g. transforming from metric
system to U.S., celsius to fahrenheit, etc)
• Effects:
– Multiplying by b causes both mean and standard
deviation to be multiplied by b
– Addition by a shifts mean and all percentiles by a
but does not effect the standard deviation or spread
– Note that for locations, multiplication of b precedes
addition of a
Density Curves/Normal Distributions
• Continuous (or practically continuous) variables that
can lie along a continuous (practically) range of values
• Obtain a histogram of data (will be irregular with rigid
blocks as bars over ranges)
• Density curves are smooth approximations (models) to
the coarse histogram
– Curve lies above the horizontal axis
– Total area under curve is 1
– Area of curve over a range of values represents its probability
• Normal Distributions - Family of density curves with
very specific properties
Mean and Median of a Density Curve
• Mean is the balance point of a distribution of
measurements. If the height of the curve represented
weight, its where the density curve would balance
• Median is the point where half the area is below and
half the area is above the point
– Symmetric Densities: Mean = Median
– Right Skew Densities: Mean > Median
– Left Skew Densities: Mean < Median
• We will mainly work with means. Notation:
Population : Mean  
Sample :Mean  x
Symmetric (Normal) Distribution
Right Skewed Density Curve
Mean is the Balance Point
Normal Distribution
• Bell-shaped, symmetric family of distributions
• Classified by 2 parameters: Mean () and standard
deviation (s). These represent location and spread
• Random variables that are approximately normal have
the following properties wrt individual measurements:
–
–
–
–
Approximately half (50%) fall above (and below) mean
Approximately 68% fall within 1 standard deviation of mean
Approximately 95% fall within 2 standard deviations of mean
Virtually all fall within 3 standard deviations of mean
• Notation when X is normally distributed with mean 
and standard deviation s :
X ~ N ( ,s )
Two Normal Distributions
Normal Distribution
P( X   )  0.50 P(   s  X    s )  0.68 P(   2s  X    2s )  0.95
Example - Heights of U.S. Adults
• Female and Male adult heights are well approximated by
normal distributions: XF~N(63.7,2.5) XM~N(69.1,2.6)
20
20
18
16
14
12
10
10
8
6
4
Std. Dev = 2.48
Std. Dev = 2.61
2
Mean = 63.7
Mean = 69.1
0
N = 99.68
55.5
57.5
56.5
59.5
58.5
61.5
60.5
63.5
62.5
65.5
64.5
67.5
66.5
INCHESF
69.5
68.5
70.5
N = 99.23
0
59.5 61.5 63.5 65.5 67.5 69.5 71.5 73.5 75.5
60.5 62.5 64.5 66.5 68.5 70.5 72.5 74.5 76.5
INCHESM
Cases weighted by PCTM
Cases weighted by PCTF
Source: Statistical Abstract of the U.S. (1992)
Standard Normal (Z) Distribution
• Problem: Unlimited number of possible normal
distributions (- <  <  , s > 0)
• Solution: Standardize the random variable to have
mean 0 and standard deviation 1
X ~ N ( ,s )  Z 
X 
s
~ N (0,1)
• Probabilities of certain ranges of values and specific
percentiles of interest can be obtained through the
standard normal (Z) distribution
Standard Normal (Z) Distribution
Standard Normal (Z) Distribution
0.45
0.4
0.35
Table Area
0.3
1-Table Area
f(z)
0.25
0.2
0.15
0.1
0.05
0
-4
-3
-2
-1
0
z
1
z
2
3
4
2nd Decimal Place
I
n
t
g
e
r
p
a
r
t
&
1st
D
e
c
i
m
a
l
z
-3.0
-2.9
-2.8
-2.7
-2.6
-2.5
-2.4
-2.3
-2.2
-2.1
-2.0
-1.9
-1.8
-1.7
-1.6
-1.5
-1.4
-1.3
-1.2
-1.1
-1.0
-0.9
-0.8
-0.7
-0.6
-0.5
-0.4
-0.3
-0.2
-0.1
-0.0
0.00
0.0013
0.0019
0.0026
0.0035
0.0047
0.0062
0.0082
0.0107
0.0139
0.0179
0.0228
0.0287
0.0359
0.0446
0.0548
0.0668
0.0808
0.0968
0.1151
0.1357
0.1587
0.1841
0.2119
0.2420
0.2743
0.3085
0.3446
0.3821
0.4207
0.4602
0.5000
0.01
0.0013
0.0018
0.0025
0.0034
0.0045
0.0060
0.0080
0.0104
0.0136
0.0174
0.0222
0.0281
0.0351
0.0436
0.0537
0.0655
0.0793
0.0951
0.1131
0.1335
0.1562
0.1814
0.2090
0.2389
0.2709
0.3050
0.3409
0.3783
0.4168
0.4562
0.4960
0.02
0.0013
0.0018
0.0024
0.0033
0.0044
0.0059
0.0078
0.0102
0.0132
0.0170
0.0217
0.0274
0.0344
0.0427
0.0526
0.0643
0.0778
0.0934
0.1112
0.1314
0.1539
0.1788
0.2061
0.2358
0.2676
0.3015
0.3372
0.3745
0.4129
0.4522
0.4920
0.03
0.0012
0.0017
0.0023
0.0032
0.0043
0.0057
0.0075
0.0099
0.0129
0.0166
0.0212
0.0268
0.0336
0.0418
0.0516
0.0630
0.0764
0.0918
0.1093
0.1292
0.1515
0.1762
0.2033
0.2327
0.2643
0.2981
0.3336
0.3707
0.4090
0.4483
0.4880
0.04
0.0012
0.0016
0.0023
0.0031
0.0041
0.0055
0.0073
0.0096
0.0125
0.0162
0.0207
0.0262
0.0329
0.0409
0.0505
0.0618
0.0749
0.0901
0.1075
0.1271
0.1492
0.1736
0.2005
0.2296
0.2611
0.2946
0.3300
0.3669
0.4052
0.4443
0.4840
0.05
0.0011
0.0016
0.0022
0.0030
0.0040
0.0054
0.0071
0.0094
0.0122
0.0158
0.0202
0.0256
0.0322
0.0401
0.0495
0.0606
0.0735
0.0885
0.1056
0.1251
0.1469
0.1711
0.1977
0.2266
0.2578
0.2912
0.3264
0.3632
0.4013
0.4404
0.4801
0.06
0.0011
0.0015
0.0021
0.0029
0.0039
0.0052
0.0069
0.0091
0.0119
0.0154
0.0197
0.0250
0.0314
0.0392
0.0485
0.0594
0.0721
0.0869
0.1038
0.1230
0.1446
0.1685
0.1949
0.2236
0.2546
0.2877
0.3228
0.3594
0.3974
0.4364
0.4761
0.07
0.0011
0.0015
0.0021
0.0028
0.0038
0.0051
0.0068
0.0089
0.0116
0.0150
0.0192
0.0244
0.0307
0.0384
0.0475
0.0582
0.0708
0.0853
0.1020
0.1210
0.1423
0.1660
0.1922
0.2206
0.2514
0.2843
0.3192
0.3557
0.3936
0.4325
0.4721
0.08
0.0010
0.0014
0.0020
0.0027
0.0037
0.0049
0.0066
0.0087
0.0113
0.0146
0.0188
0.0239
0.0301
0.0375
0.0465
0.0571
0.0694
0.0838
0.1003
0.1190
0.1401
0.1635
0.1894
0.2177
0.2483
0.2810
0.3156
0.3520
0.3897
0.4286
0.4681
0.09
0.0010
0.0014
0.0019
0.0026
0.0036
0.0048
0.0064
0.0084
0.0110
0.0143
0.0183
0.0233
0.0294
0.0367
0.0455
0.0559
0.0681
0.0823
0.0985
0.1170
0.1379
0.1611
0.1867
0.2148
0.2451
0.2776
0.3121
0.3483
0.3859
0.4247
0.4641
2nd Decimal Place
z
I
n
t
g
e
r
p
a
r
t
&
1st
D
e
c
i
m
a
l
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.8
1.9
2.0
2.1
2.2
2.3
2.4
2.5
2.6
2.7
2.8
2.9
3.0
0
0.5000
0.5398
0.5793
0.6179
0.6554
0.6915
0.7257
0.7580
0.7881
0.8159
0.8413
0.8643
0.8849
0.9032
0.9192
0.9332
0.9452
0.9554
0.9641
0.9713
0.9772
0.9821
0.9861
0.9893
0.9918
0.9938
0.9953
0.9965
0.9974
0.9981
0.9987
0.01
0.5040
0.5438
0.5832
0.6217
0.6591
0.6950
0.7291
0.7611
0.7910
0.8186
0.8438
0.8665
0.8869
0.9049
0.9207
0.9345
0.9463
0.9564
0.9649
0.9719
0.9778
0.9826
0.9864
0.9896
0.9920
0.9940
0.9955
0.9966
0.9975
0.9982
0.9987
0.02
0.5080
0.5478
0.5871
0.6255
0.6628
0.6985
0.7324
0.7642
0.7939
0.8212
0.8461
0.8686
0.8888
0.9066
0.9222
0.9357
0.9474
0.9573
0.9656
0.9726
0.9783
0.9830
0.9868
0.9898
0.9922
0.9941
0.9956
0.9967
0.9976
0.9982
0.9987
0.03
0.5120
0.5517
0.5910
0.6293
0.6664
0.7019
0.7357
0.7673
0.7967
0.8238
0.8485
0.8708
0.8907
0.9082
0.9236
0.9370
0.9484
0.9582
0.9664
0.9732
0.9788
0.9834
0.9871
0.9901
0.9925
0.9943
0.9957
0.9968
0.9977
0.9983
0.9988
0.04
0.5160
0.5557
0.5948
0.6331
0.6700
0.7054
0.7389
0.7704
0.7995
0.8264
0.8508
0.8729
0.8925
0.9099
0.9251
0.9382
0.9495
0.9591
0.9671
0.9738
0.9793
0.9838
0.9875
0.9904
0.9927
0.9945
0.9959
0.9969
0.9977
0.9984
0.9988
0.05
0.5199
0.5596
0.5987
0.6368
0.6736
0.7088
0.7422
0.7734
0.8023
0.8289
0.8531
0.8749
0.8944
0.9115
0.9265
0.9394
0.9505
0.9599
0.9678
0.9744
0.9798
0.9842
0.9878
0.9906
0.9929
0.9946
0.9960
0.9970
0.9978
0.9984
0.9989
0.06
0.5239
0.5636
0.6026
0.6406
0.6772
0.7123
0.7454
0.7764
0.8051
0.8315
0.8554
0.8770
0.8962
0.9131
0.9279
0.9406
0.9515
0.9608
0.9686
0.9750
0.9803
0.9846
0.9881
0.9909
0.9931
0.9948
0.9961
0.9971
0.9979
0.9985
0.9989
0.07
0.5279
0.5675
0.6064
0.6443
0.6808
0.7157
0.7486
0.7794
0.8078
0.8340
0.8577
0.8790
0.8980
0.9147
0.9292
0.9418
0.9525
0.9616
0.9693
0.9756
0.9808
0.9850
0.9884
0.9911
0.9932
0.9949
0.9962
0.9972
0.9979
0.9985
0.9989
0.08
0.5319
0.5714
0.6103
0.6480
0.6844
0.7190
0.7517
0.7823
0.8106
0.8365
0.8599
0.8810
0.8997
0.9162
0.9306
0.9429
0.9535
0.9625
0.9699
0.9761
0.9812
0.9854
0.9887
0.9913
0.9934
0.9951
0.9963
0.9973
0.9980
0.9986
0.9990
0.09
0.5359
0.5753
0.6141
0.6517
0.6879
0.7224
0.7549
0.7852
0.8133
0.8389
0.8621
0.8830
0.9015
0.9177
0.9319
0.9441
0.9545
0.9633
0.9706
0.9767
0.9817
0.9857
0.9890
0.9916
0.9936
0.9952
0.9964
0.9974
0.9981
0.9986
0.9990
Finding Probabilities of Specific Ranges
• Step 1 - Identify the normal distribution of interest (e.g.
its mean () and standard deviation (s) )
• Step 2 - Identify the range of values that you wish to
determine the probability of observing (XL , XU), where
often the upper or lower bounds are  or -
• Step 3 - Transform XL and XU into Z-values:
ZL 
XL  
s
ZU 
XU  
s
• Step 4 - Obtain P(ZL Z  ZU) from Z-table
Example - Adult Female Heights
• What is the probability a randomly selected female is
5’10” or taller (70 inches)?
• Step 1 - X ~ N(63.7 , 2.5)
• Step 2 - XL = 70.0 XU = 
• Step 3 70.0  63.7
ZL 
 2.52
ZU  
2.5
• Step 4 - P(X  70) = P(Z  2.52) = 1-P(Z2.52)=1.9941=.0059 (  1/170)
z
2.4
2.5
2.6
.00
.9918
.9938
.9953
.01
.9920
.9940
.9995
.02
.9922
.9941
.9956
.03
.9925
.9943
.9957
Finding Percentiles of a Distribution
• Step 1 - Identify the normal distribution of interest (e.g.
its mean () and standard deviation (s) )
• Step 2 - Determine the percentile of interest 100p% (e.g.
the 90th percentile is the cut-off where only 90% of scores are
below and 10% are above).
• Step 3 - Find p in the body of the z-table and
itscorresponding z-value (zp) on the outer edge:
– If 100p < 50 then use left-hand page of table
– If 100p 50 then use right-hand page of table
• Step 4 - Transform zp back to original units:
x p    z ps
Example - Adult Male Heights
•
•
•
•
•
Above what height do the tallest 5% of males lie above?
Step 1 - X ~ N(69.1 , 2.6)
Step 2 - Want to determine 95th percentile (p = .95)
Step 3 - P(z1.645) = .95
Step 4 - X.95 = 69.1 + (1.645)(2.6) = 73.4 (6’,1.4”)
z
1.5
1.6
1.7
.03
.9370
.9484
.9582
.04
.9382
.9495
.9591
.05
.9394
.9505
.9599
.06
.9406
.9515
.9608
Statistical Models
• When making statistical inference it is useful to
write random variables in terms of model
parameters and random errors
X    ( X  )    
  X 
• Here  is a fixed constant and  is a random variable
• In practice  will be unknown, and we will use sample data to
estimate or make statements regarding its value
Download