FREQUENCY DISTRIBUTION

BASIC STATISTICS
FOR SYSTEM STUDIES
TARIQ AL-AHDAL, PhD
Basic Statistics
OBJECTIVES / CONTENTS
STATISTICAL DISTRIBUTIONS
GENERATION OF RANDOM NUMBERS
CONFIDENCE INTERVALS
REQUIRED DEVELOPMENT
Tariq Alahdal
Statistical Distributions
FREQUENCY DISTRIBUTION: 1/3
INPUT: raw data, “a set of (N) values”
ORGANIZE: values in ascending / descending order
DETERMINE: the range of raw data / values
DIVIDE: the range into sub-ranges
FREQUENCY DISTRIBUTION: 2/3
FIND: number of values per sub-range “frequency”
RESULT: frequency distribution
DIVIDE: frequency of each sub-range by (N)
RESULT: relative frequency distribution “probability density”
FREQUENCY DISTRIBUTION: 3/3
ADD: frequencies sub-range by sub-range
RESULT: cumulative frequency distribution
ADD: relative frequencies sub-range by sub-range
RESULT: relative cumulative frequency distribution “cumulative probability”
FREQUENCY DISTRIBUTION: Problem
Given: “N values” raw data
(N is very large for probability considerations)
Find (with graphs to illustrate):
• Frequency Distribution
• Relative Frequency Distribution (Probability Density)
• Cumulative Frequency Distribution
• Relative Cumulative Frequency Distribution (Cumulative Probability)
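A minimal Python sketch of the four distributions listed above. It assumes equal-width sub-ranges; the function name `frequency_distribution` and its arguments are illustrative choices, not part of the original slides.

```python
def frequency_distribution(values, n_ranges):
    """Return (edges, freq, rel_freq, cum_freq, rel_cum_freq).

    Assumption: the range of the data is divided into n_ranges
    sub-ranges of equal width.
    """
    n_total = len(values)
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_ranges or 1.0  # guard against all-equal data
    edges = [lo + k * width for k in range(n_ranges + 1)]

    # Frequency: number of values per sub-range.
    freq = [0] * n_ranges
    for x in values:
        k = min(int((x - lo) / width), n_ranges - 1)  # maximum goes in last sub-range
        freq[k] += 1

    # Relative frequency: each frequency divided by N.
    rel_freq = [f / n_total for f in freq]

    # Cumulative frequency: running sum, sub-range by sub-range.
    cum_freq, running = [], 0
    for f in freq:
        running += f
        cum_freq.append(running)

    rel_cum_freq = [c / n_total for c in cum_freq]
    return edges, freq, rel_freq, cum_freq, rel_cum_freq
```

The returned lists can be passed directly to any plotting tool to produce the illustrating graphs.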
MEAN : AVERAGE : EXPECTATION
Definition: arithmetic mean
Mean of raw data
Given values: x[1], x[2], …, x[N]
$m = \frac{1}{N} \sum_{i=1}^{N} x[i]$
Weighted mean of raw data
Given ranges: y[1], y[2], …, y[n]
Frequencies: f[1], f[2], …, f[n]
$m = \frac{1}{N} \sum_{j=1}^{n} f[j] \, y[j]$, where $\sum_{j=1}^{n} f[j] = N$
Problem:
Write and test computer functions /
Give Illustrations
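A short Python sketch of the two mean formulas. The function names `mean` and `weighted_mean` are illustrative; the weighted version assumes each range is represented by one value y[j].

```python
def mean(values):
    """Arithmetic mean: m = (1/N) * sum of x[i]."""
    return sum(values) / len(values)


def weighted_mean(range_values, frequencies):
    """Weighted mean: m = (1/N) * sum of f[j] * y[j], with N = sum of f[j]."""
    n_total = sum(frequencies)
    return sum(f * y for f, y in zip(frequencies, range_values)) / n_total
```

For example, `weighted_mean([10, 20, 30], [1, 2, 1])` treats the data as one value of 10, two of 20 and one of 30.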
MODE / MEDIAN
Median: middle value
Mode: value with highest frequency
Median of raw data
Given values (sorted in ascending order): x[1], x[2], …, x[N]
For odd N: m = x[(N+1)/2]
For even N: m = (1/2) {x[N/2] + x[(N/2)+1]}
Mode of raw data
Find the frequency distribution:
m = x[k], where f[k] is the highest frequency
Problem:
Write and test computer functions /
Give Illustrations
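A possible Python sketch of both measures. The names `median` and `mode` are illustrative. The mode function assumes that, when several values share the highest frequency, the smallest of them is returned; the slides do not specify a tie rule.

```python
from collections import Counter


def median(values):
    """Middle value of the sorted data (average of the two middle values for even N)."""
    xs = sorted(values)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]  # x[(N+1)/2] in the 1-based notation of the slide
    return (xs[mid - 1] + xs[mid]) / 2  # (1/2) {x[N/2] + x[(N/2)+1]}


def mode(values):
    """Value with the highest frequency (smallest such value on ties: an assumption)."""
    counts = Counter(values)
    highest = max(counts.values())
    return min(v for v, c in counts.items() if c == highest)
```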
DEVIATION / VARIANCE
Deviation: deviation from the “mean”
d[i] = |x[i] – m|
Mean deviation:
$\bar{d} = \frac{1}{N} \sum_{i=1}^{N} d[i]$
For ranges with frequencies f[j]:
$\bar{d} = \frac{1}{N} \sum_{j=1}^{n} f[j] \, d[j]$, where $\sum_{j=1}^{n} f[j] = N$
Variance:
$v = \frac{1}{N} \sum_{i=1}^{N} d[i]^{2}$
Standard deviation:
$s = \sqrt{v}$
Standard score (standardized variables):
z[i] = d[i] / s
Problem:
Write and test computer functions /
Give Illustrations
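The dispersion measures above can be sketched in Python as follows. Function names are illustrative. The variance divides by N, as on this slide, and the standard score uses the absolute deviation d[i] exactly as defined here.

```python
import math


def deviations(values):
    """d[i] = |x[i] - m| for every value."""
    m = sum(values) / len(values)
    return [abs(x - m) for x in values]


def mean_deviation(values):
    d = deviations(values)
    return sum(d) / len(d)


def variance(values):
    """v = (1/N) * sum of d[i]^2 (division by N, as on the slide)."""
    d = deviations(values)
    return sum(di * di for di in d) / len(d)


def standard_deviation(values):
    return math.sqrt(variance(values))


def standard_scores(values):
    """z[i] = d[i] / s."""
    s = standard_deviation(values)
    return [di / s for di in deviations(values)]
```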
UNIFORM DISTRIBUTION
Features (range):
“min”: Minimum number
“max”: Maximum number
Principle: all numbers “x[i] : x” are equally likely
Probability density: p(x, min, max) = 1 / (max - min)
Mean: m = (max + min) / 2
Variance: $v = (max - min)^{2} / 12$
Problem:
Write and test computer functions /
Give Illustrations
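A small Python sketch of these three formulas for the continuous uniform distribution. The function names and the parameter names `lo` and `hi` (standing for “min” and “max”) are illustrative.

```python
def uniform_pdf(x, lo, hi):
    """p(x, min, max) = 1 / (max - min) inside the range, 0 outside."""
    return 1.0 / (hi - lo) if lo <= x <= hi else 0.0


def uniform_mean(lo, hi):
    return (hi + lo) / 2


def uniform_variance(lo, hi):
    return (hi - lo) ** 2 / 12
```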
BINOMIAL DISTRIBUTION
Features:
N: Number of trials
p: Probability of success
q: Probability of failure: q = 1 - p
x: Number of successful trials in N
Probability density: $p(x, N, p) = C^{N}_{x} \, p^{x} \, q^{N-x}$
Mean: m = N.p
Variance: v = N.p.q
Problem:
Write and test computer functions /
Give Illustrations
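One way to sketch the binomial functions in Python, using the standard-library `math.comb` for the combination term. The function names are illustrative, and the cumulative function simply sums the density over an inclusive range X1..X2.

```python
import math


def binomial_pmf(x, n, p):
    """p(x, N, p) = C(N, x) * p^x * q^(N - x), with q = 1 - p."""
    q = 1 - p
    return math.comb(n, x) * p**x * q ** (n - x)


def binomial_cumulative(x1, x2, n, p):
    """Sum of probabilities for x1 <= x <= x2."""
    return sum(binomial_pmf(x, n, p) for x in range(x1, x2 + 1))
```

Summing `x * binomial_pmf(x, n, p)` over all x gives a quick numerical check of the mean m = N.p.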
POISSON DISTRIBUTION
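Features:
r: Rate of arrivals (mean number of arrivals per time unit)
t: Time interval, may be “t = 1 time unit”
x: Possible number of arrivals during “t”
Probability density: $p(x, r, t) = \frac{(rt)^{x}}{x!} \, e^{-rt}$
Mean: m = r.t (m = r for t = 1)
Variance: v = r.t (v = r for t = 1)
Problem: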
Write and test computer functions /
Give Illustrations
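A brief Python sketch of the Poisson density and a range sum. The function names are illustrative, and `t` defaults to one time unit as the slide suggests.

```python
import math


def poisson_pmf(x, r, t=1.0):
    """p(x, r, t) = (r t)^x * e^(-r t) / x!"""
    rt = r * t
    return rt**x * math.exp(-rt) / math.factorial(x)


def poisson_cumulative(x1, x2, r, t=1.0):
    """Sum of probabilities for x1 <= x <= x2."""
    return sum(poisson_pmf(x, r, t) for x in range(x1, x2 + 1))
```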
EXPONENTIAL DISTRIBUTION
Features:
r: Rate of arrivals
w: Waiting time for next arrival (inter-arrival time)
Principle: distribution for the value of “w”
Probability density: $p(w, r) = r \, e^{-rw}$
Mean: m = 1 / r (Poisson inter-arrival mean)
Variance: $v = 1 / r^{2}$
Problem:
Write and test computer functions /
Give Illustrations
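A minimal Python sketch of the exponential density. The cumulative form 1 - e^(-rw) is not on the slide; it is the standard integral of the density from 0 to w and is included here as a convenience. Function names are illustrative.

```python
import math


def exponential_pdf(w, r):
    """p(w, r) = r * e^(-r w) for w >= 0."""
    return r * math.exp(-r * w) if w >= 0 else 0.0


def exponential_cdf(w, r):
    """Probability that the waiting time is at most w (integral of the density)."""
    return 1.0 - math.exp(-r * w) if w >= 0 else 0.0
```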
NORMAL DISTRIBUTION
Range: - (infinity) < x < + (infinity)
Features:
m: Average value: mean / median / mode
v: Variance: $v = s^{2}$
x: Possible value
Principle: usually a measurement process
Probability density: $p(x, m, s) = \frac{1}{s \sqrt{2\pi}} \, e^{-\frac{(x - m)^{2}}{2 s^{2}}}$
Standard form: $p(z, 0, 1) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{1}{2} z^{2}}$
Problem:
Write and test computer functions /
Give Illustrations
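A Python sketch of the normal density, plus a range probability. The range function uses the standard-library error function `math.erf` rather than numerical integration; that is a substitution made here for brevity, not a method stated on the slides. Function names are illustrative.

```python
import math


def normal_pdf(x, m=0.0, s=1.0):
    """p(x, m, s); with the defaults this is the standard form p(z, 0, 1)."""
    z = (x - m) / s
    return math.exp(-0.5 * z * z) / (s * math.sqrt(2 * math.pi))


def normal_cumulative(x1, x2, m=0.0, s=1.0):
    """Probability that x1 <= x <= x2, via the error function."""

    def cdf(x):
        return 0.5 * (1 + math.erf((x - m) / (s * math.sqrt(2))))

    return cdf(x2) - cdf(x1)
```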
t DISTRIBUTION
Range: - (infinity) < t < + (infinity)
Features:
m: mean / median / mode at “zero”
f: Degree of freedom: 0 < f < + (infinity)
t: Distribution variable: variance unknown
Principle: used for estimation: small sample
Probability density: $p(t, f) = \frac{\Gamma[(f+1)/2]}{\Gamma(f/2) \sqrt{\pi f}} \left(1 + \frac{t^{2}}{f}\right)^{-(f+1)/2}$
Gamma function: $\Gamma(f) = \int_{0}^{\infty} x^{f-1} e^{-x} \, dx = (f - 1)!$
Problem:
Write and test computer functions /
Give Illustrations
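A short Python sketch of the t density using the standard-library `math.gamma`. The function name `t_pdf` is illustrative. Note that `math.gamma` overflows for arguments above roughly 171, so this direct form is only suitable for moderate degrees of freedom.

```python
import math


def t_pdf(t, f):
    """p(t, f) for f degrees of freedom (f may be an integer or a real)."""
    coeff = math.gamma((f + 1) / 2) / (math.gamma(f / 2) * math.sqrt(math.pi * f))
    return coeff * (1 + t * t / f) ** (-(f + 1) / 2)
```

As f grows, `t_pdf(0, f)` approaches the standard normal value of about 0.3989, which is a convenient sanity check.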
COMPUTATION TIPS: 1/2
Integer factorial: 0! = 1; (i+1)! = i! (i+1)
Real factorial (Stirling formula): $i! \approx \sqrt{2 \pi i} \; i^{i} \, e^{-i}$
Gamma function: the Stirling formula is useful for its computation
Problem:
Write and test computer functions /
Give Illustrations
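Both tips can be sketched in Python as below. The function names are illustrative. The Stirling value is an approximation; for i = 10 it falls a little under 1 % below the exact factorial.

```python
import math


def factorial(i):
    """Integer factorial built from 0! = 1 and (i+1)! = i! * (i+1)."""
    result = 1
    for k in range(1, i + 1):
        result *= k
    return result


def stirling(i):
    """Stirling approximation: sqrt(2 pi i) * i^i * e^(-i)."""
    return math.sqrt(2 * math.pi * i) * i**i * math.exp(-i)
```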
COMPUTATION TIPS: 2/2
Combination: $C^{N}_{0} = 1$; $C^{N}_{i} = C^{N}_{i-1} \cdot \frac{N - i + 1}{i}$
Poisson: $\frac{A^{0}}{0!} = 1$; $\frac{A^{i+1}}{(i+1)!} = \frac{A^{i}}{i!} \cdot \frac{A}{i+1}$
Integration: Trapezoid Rule: summation in small steps
Problem:
Write and test computer functions /
Give Illustrations
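The three tips can be sketched in Python as follows. Function names and the default of 1000 steps are illustrative assumptions. The recurrences avoid computing large factorials directly.

```python
def combination(n, i):
    """C(N, i) from C(N, 0) = 1 and C(N, k) = C(N, k-1) * (N - k + 1) / k."""
    c = 1
    for k in range(1, i + 1):
        c = c * (n - k + 1) // k  # the division is always exact
    return c


def poisson_terms(a, count):
    """The terms A^i / i! for i = 0 .. count-1, each built from the previous one."""
    terms = [1.0]
    for i in range(count - 1):
        terms.append(terms[-1] * a / (i + 1))
    return terms


def trapezoid(func, lower, upper, steps=1000):
    """Trapezoid rule: sum the area of `steps` narrow trapezoids."""
    h = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for k in range(1, steps):
        total += func(lower + k * h)
    return total * h
```

For instance, `trapezoid(lambda x: x * x, 0.0, 1.0)` approximates the integral of x² over [0, 1], whose exact value is 1/3.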
Generation of Random Numbers
RNGS: WHY
• RANDOM PROCESSES
• SAMPLING
• TESTING COMPUTER ALGORITHMS
• NUMERICAL ANALYSIS
• SYSTEM MODELING / SIMULATION
• DECISION MAKING
• OTHER REASONS
UNIFORM RNG: U (0,1)
Features:
m: Modulus factor: largest prime number within memory cell size (for wide repeated sequence cycle)
$m = 2^{31} - 1$ = 2,147,483,647 (for 32 bit cell)
a: Multiplier: a = 314,159,269
b: Increment: b = 453,806,245
X[0]: Starting value: X[0] = 577,215,665 (the seed)
X[i-1]: “(i-1)th” value: seed for X[i]
Uniform: X (0,m)
X[i] = {a . X[i-1] + b} MOD m
Uniform: U (0,1)
U[i] = X[i] / m
Problem:
Write and test computer functions /
Give Illustrations
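A minimal Python sketch of the recurrence above, using the constants given on this slide. The function name `uniform_01_sequence` and the list-returning interface are illustrative choices. Python integers do not overflow, so the product a . X[i-1] needs no special handling here; in a fixed-width language it would.

```python
M = 2**31 - 1          # modulus m
A = 314_159_269        # multiplier a
B = 453_806_245        # increment b
SEED = 577_215_665     # default starting value X[0]


def uniform_01_sequence(count, seed=SEED):
    """Return `count` values U[i] = X[i] / m, with X[i] = (a X[i-1] + b) mod m."""
    values = []
    x = seed
    for _ in range(count):
        x = (A * x + B) % M
        values.append(x / M)
    return values
```

Calling the function twice with the same seed reproduces the same sequence, which is why a new set of runs should be given a new seed.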
UNIFORM RNG: U (min , max)
Features:
min: Required minimum integer value
max: Required maximum integer value
Integer range: U (min, max) = min + TRUNC [(max – min + 1) . U (0, 1)]
Problem: Write and test computer functions / Give Illustrations
Test: 100,000 “runs”
• Frequency Distribution
• Mean
• Variance & Standard Deviation
(Relative to theoretical expectations)
Note:
Every new set of runs should start with
a different seed: X[0]
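A hedged Python sketch of the integer-range formula. To keep the example self-contained it takes its U(0,1) values from the standard-library `random.random`, which is a stand-in assumption; any function returning values in [0, 1) can be passed as `u01` instead. The name `uniform_int` is illustrative.

```python
import random


def uniform_int(lo, hi, u01=random.random):
    """min + TRUNC[(max - min + 1) * U(0, 1)], giving an integer in lo..hi."""
    return lo + int((hi - lo + 1) * u01())
```

A 100,000-run test of a six-sided die, `uniform_int(1, 6)`, should give each face a relative frequency near 1/6 and a mean near 3.5.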
EXPONENTIAL RNG: E (m)
Features:
m: Required “mean”
E (mean) = - (m) . Ln [U(0, 1)] (“inter-event” Poisson)
Problem: Write and test computer functions / Give Illustrations
Test: 100,000 “runs”
• Frequency Distribution
• Mean
• Variance & Standard Deviation
(Relative to theoretical
expectations)
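A small Python sketch of the exponential generator. As a stand-in assumption it draws U(0,1) from the standard-library `random.random`, which can return exactly 0; the code therefore uses 1 - U, which is equally uniform and never zero, so the logarithm is always defined. The name `exponential_rng` is illustrative.

```python
import math
import random


def exponential_rng(mean, u01=random.random):
    """E(m) = -m * ln(U), with U taken in (0, 1] to avoid ln(0)."""
    u = 1.0 - u01()
    return -mean * math.log(u)
```

For a test run, the sample mean should be close to m and the sample variance close to m², matching the exponential distribution with rate r = 1 / m.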
NORMAL RNG: N (m, s)
Features:
m: Required “mean” of the normal RNG
s: Required “standard deviation”
STEPS:
1. V[1] = 2 . { U(0,1)[1] } – 1
   V[2] = 2 . { U(0,1)[2] } – 1
2. $SUM = V^{2}[1] + V^{2}[2]$
3. IF SUM >= 1 GO TO STEP 1
4. $Y = V[2] \cdot \sqrt{-2 \ln(SUM) / SUM}$
5. N (m,s) = m + s . Y
Standard Normal: N (0,1) = Y
Problem:
Write and test computer functions /
Give Illustrations
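The five steps above (commonly known as the polar method) can be sketched in Python as below. Two assumptions are made: U(0,1) values come from the standard-library `random.random` as a stand-in, and a SUM of exactly zero is also rejected, since the logarithm in step 4 would be undefined. The name `normal_rng` is illustrative.

```python
import math
import random


def normal_rng(m=0.0, s=1.0, u01=random.random):
    """Return one normal random value with mean m and standard deviation s."""
    while True:
        v1 = 2 * u01() - 1                 # step 1
        v2 = 2 * u01() - 1
        total = v1 * v1 + v2 * v2          # step 2
        if 0.0 < total < 1.0:              # step 3 (also rejects total == 0)
            break
    y = v2 * math.sqrt(-2 * math.log(total) / total)   # step 4
    return m + s * y                       # step 5
```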
Confidence Intervals
MEASUREMENTS & ESTIMATIONS
MEASUREMENTS: Experiments on real systems / models / Simulation
SET OF VALUES: Sample of results (N)
Large Sample: N >= 30
Small Sample: N < 30
ESTIMATION THEORY:
Confidence for “Sample”
Confidence for “Mean”
EVALUATION ALGORITHM: General
INPUT:
Measurements: N & x[1], x[2], …, x[N]
Required “confidence level”: L (%)
MEAN: $m = \frac{1}{N} \sum_{i=1}^{N} x[i]$
STANDARD DEVIATION: $s = \sqrt{\frac{1}{N - 1} \sum_{i=1}^{N} \{x[i] - m\}^{2}}$
AREA UNDER CURVE: a = 1 – (L/100)
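A minimal Python sketch of this general step. The function name `sample_statistics` and its return order are illustrative. The standard deviation divides by N - 1, as in the formula for measured samples.

```python
import math


def sample_statistics(values, level_percent):
    """Return (m, s, a): sample mean, sample standard deviation, and a = 1 - L/100."""
    n = len(values)
    m = sum(values) / n
    s = math.sqrt(sum((x - m) ** 2 for x in values) / (n - 1))
    a = 1 - level_percent / 100
    return m, s, a
```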
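NORMAL DISTRIBUTION
[Figure: probability density of the normal distribution plotted against standard deviations from the mean (-3.9 to +3.9). 68.27 % of the area lies within 1 standard deviation of the mean and 95.45 % within 2. The limits -z (a/2) and +z (a/2) are marked on the horizontal axis.]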
EVALUATION ALGORITHM: N >=30
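CONFIDENCE COEFFICIENT: Z (a/2)
$\frac{1 - a}{2} = \frac{1}{\sqrt{2\pi}} \int_{0}^{Z(a/2)} e^{-\frac{z^{2}}{2}} \, dz$
CONFIDENCE FOR “SAMPLE”: m – (Z (a/2) . s) < sample < m + (Z (a/2) . s)
CONFIDENCE FOR “MEAN”: $m - Z(a/2) \frac{s}{\sqrt{N}} < \text{mean} < m + Z(a/2) \frac{s}{\sqrt{N}}$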
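A hedged Python sketch for the large-sample case. Instead of solving the integral numerically, it obtains Z(a/2) from the standard-library `statistics.NormalDist().inv_cdf`; the upper limit that leaves area (1 - a)/2 between 0 and Z is the point where the cumulative probability is 1 - a/2. This library call is a substitution made here, not the method of the slides, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def z_coefficient(level_percent):
    """Z(a/2) for a confidence level L (%), with a = 1 - L/100."""
    a = 1 - level_percent / 100
    return NormalDist().inv_cdf(1 - a / 2)


def large_sample_intervals(m, s, n, level_percent):
    """Return ((sample_low, sample_high), (mean_low, mean_high))."""
    z = z_coefficient(level_percent)
    half_mean = z * s / math.sqrt(n)
    return (m - z * s, m + z * s), (m - half_mean, m + half_mean)
```

For L = 95 the coefficient is about 1.96, the familiar value for a 95 % interval.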
EVALUATION ALGORITHM: N < 30
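CONFIDENCE COEFFICIENT: T (a/2)
$\frac{1 - a}{2} = \frac{\Gamma(N/2)}{\Gamma[(N-1)/2] \sqrt{\pi (N - 1)}} \int_{0}^{T(a/2)} \left(1 + \frac{t^{2}}{N - 1}\right)^{-N/2} dt$
Gamma function: $\Gamma(f) = \int_{0}^{\infty} x^{f-1} e^{-x} \, dx = (f - 1)!$
CONFIDENCE FOR “SAMPLE”: m – (T (a/2) . s) < sample < m + (T (a/2) . s)
CONFIDENCE FOR “MEAN”: $m - T(a/2) \frac{s}{\sqrt{N}} < \text{mean} < m + T(a/2) \frac{s}{\sqrt{N}}$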
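A possible Python sketch for the small-sample case. It integrates the t density with f = N - 1 by the trapezoid rule and searches for the upper limit T(a/2) by bisection. The function names, the 2000 integration steps, and the 50 bisection iterations are illustrative assumptions; the confidence level must be below 100 %.

```python
import math


def t_pdf(t, f):
    """t density for f degrees of freedom, computed via log-gamma for stability."""
    log_c = math.lgamma((f + 1) / 2) - math.lgamma(f / 2) - 0.5 * math.log(math.pi * f)
    return math.exp(log_c - (f + 1) / 2 * math.log1p(t * t / f))


def t_area(upper, f, steps=2000):
    """Trapezoid-rule area under the t density from 0 to `upper`."""
    h = upper / steps
    total = 0.5 * (t_pdf(0.0, f) + t_pdf(upper, f))
    for k in range(1, steps):
        total += t_pdf(k * h, f)
    return total * h


def t_coefficient(level_percent, n):
    """T(a/2) for confidence level L (%) and sample size N (f = N - 1)."""
    f = n - 1
    target = level_percent / 200        # (1 - a) / 2 with a = 1 - L/100
    lo, hi = 0.0, 1.0
    while t_area(hi, f) < target:       # widen the bracket until it contains T
        hi *= 2
    for _ in range(50):                 # bisection on the upper limit
        mid = (lo + hi) / 2
        if t_area(mid, f) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

With the coefficient in hand, the two intervals follow exactly as in the large-sample case, with T(a/2) in place of Z(a/2).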
Required Development
SOFTWARE FUNCTIONS: 1/4
SUBJECT: Frequency Distribution (Raw Data: Empirical Distributions)
INPUT: “N: Integer”: Number of values
       “Array of N values: Real Array”
       “n: Integer”: Number of ranges
       “Array of n ranges: Real Array”
OUTPUT: “Array of n frequencies: Integer Array”

SUBJECT: Central Measures
FUNCTIONS: Mean, Median, Mode
INPUT: “N: Integer”: Number of values
       “Array of N values: Real Array”
       (Can also be considered with frequency ranges)
OUTPUT: “Mean value: Real”
        “Median: Real”
        “Mode: Real”

SUBJECT: Dispersion Measures
FUNCTIONS: Mean Deviation, Variance, Standard Deviation
INPUT: As above
       (May be with Mean: Real)
       (It can also be: Variance for Standard Deviation and vice versa)
OUTPUT: “Average Deviation: Real”
        “Variance: Real”
        “Standard Deviation: Real”
SOFTWARE FUNCTIONS: 2/4
SUBJECT: Binomial Distribution
FUNCTION: Combination
INPUT: “N: Integer”: Number of values (objects / independent trials)
       “X: Integer”: Number of selected values (success)
OUTPUT: “Value of Combination: Real”
FUNCTION: Probability Density
INPUT: As above plus: “p: Real”: probability of success
OUTPUT: “Probability of X successes: Real”
FUNCTION: Cumulative Probability
INPUT: “N, p”: As above.
       “X1-X2: Integers”: Range values
OUTPUT: “Sum of probabilities (range): Real”

SUBJECT: Poisson Distribution
FUNCTION: Probability Density
INPUT: “X: Integer”: Number of (arrivals)
       “m: Real”: Mean number of (arrivals)
OUTPUT: “Probability of X arrivals: Real”
FUNCTION: Cumulative Probability
INPUT: “m”: As above.
       “X1-X2: Integers”: Range values
OUTPUT: “Sum of probabilities (range): Real”
SOFTWARE FUNCTIONS: 3/4
SUBJECT: Standard Normal Distribution
FUNCTION: Probability Density
INPUT: “z: Real”: Random variable (measurement)
OUTPUT: “Probability of value z: Real”
FUNCTION: Cumulative Probability
INPUT: “z1-z2: Real”: Range of values
OUTPUT: “Sum of probabilities (range): Real”

SUBJECT: Gamma Function
INPUT: “f: Integer or Real”: Degree of freedom
OUTPUT: “Value of gamma function: Real”

SUBJECT: T-Distribution
FUNCTION: Probability Density
INPUT: “f: Integer or Real”: Degree of freedom
       “t: Real”: Random variable
OUTPUT: “Probability of value t: Real”
FUNCTION: Cumulative Probability
INPUT: “t1-t2: Real”: Range of values
OUTPUT: “Sum of probabilities (range): Real”
SOFTWARE FUNCTIONS: 4/4
SUBJECT: Confidence Co-efficient
FUNCTION: For Large Sample
INPUT: “L: Real”: Level of Confidence (%)
OUTPUT: Z(a/2): Real
FUNCTION: For Small Sample
INPUT: “L”: As above.
       “N: Integer”: Sample size
OUTPUT: T(a/2): Real

SUBJECT: Random Number Generators
FUNCTION: Uniform
INPUT: “Seed: Real / Integer”: According to requirements
OUTPUT: “Uniform random value (0-1): Real”
FUNCTION: U (min, max)
INPUT: “Min, Max: Integers”: Range
OUTPUT: “Uniform random value in the range: Integer”
FUNCTION: E (m)
INPUT: “m: Real”: Mean value (duration)
OUTPUT: “Exponential random value of mean m: Real”
FUNCTION: N (m, s)
INPUT: “m: Real”: Mean value (measure)
       “s: Real”: Standard deviation
OUTPUT: “Normal random value of mean m, and standard deviation s: Real”