BASIC STATISTICS FOR SYSTEM STUDIES
TARIQ AL-AHDAL, PhD

OBJECTIVES / CONTENTS
- Statistical distributions
- Generation of random numbers
- Confidence intervals
- Required development

STATISTICAL DISTRIBUTIONS

FREQUENCY DISTRIBUTION (1/3)
- Input: raw data, a set of N values.
- Organize: sort the values in ascending or descending order.
- Determine: the range of the raw data.
- Divide: the range into sub-ranges.

FREQUENCY DISTRIBUTION (2/3)
- Find: the number of values in each sub-range (its "frequency"). Result: the frequency distribution.
- Divide: the frequency of each sub-range by N. Result: the relative frequency distribution ("probability density").

FREQUENCY DISTRIBUTION (3/3)
- Add: the frequencies, sub-range by sub-range. Result: the cumulative frequency distribution.
- Add: the relative frequencies, sub-range by sub-range. Result: the relative cumulative frequency distribution ("cumulative probability").

FREQUENCY DISTRIBUTION: PROBLEM
Given N values of raw data (N very large, for probability considerations), find the following and illustrate each with a graph:
- the frequency distribution;
- the relative frequency distribution (probability density);
- the cumulative frequency distribution;
- the relative cumulative frequency distribution (cumulative probability).

MEAN (AVERAGE, EXPECTATION)
Raw data (arithmetic mean), given the values x[1], x[2], ..., x[N]:

$$m = \frac{1}{N}\sum_{i=1}^{N} x[i]$$

Weighted mean, given the range values y[1], y[2], ..., y[n] with frequencies f[1], f[2], ..., f[n]:

$$m = \frac{1}{N}\sum_{j=1}^{n} f[j]\,y[j], \qquad N = \sum_{j=1}^{n} f[j]$$

Problem: write and test computer functions and give illustrations.

MEDIAN AND MODE
- Median: the middle value of the sorted values x[1], x[2], ..., x[N]. For odd N, $m = x[(N+1)/2]$; for even N, $m = \frac{1}{2}\{x[N/2] + x[(N/2)+1]\}$.
- Mode: the value with the highest frequency. Find the frequency distribution; then $m = x[k]$, where f[k] is the highest frequency.

Problem: write and test computer functions and give illustrations.
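A minimal Python sketch of the frequency-distribution and central-measure functions asked for above. The function names are hypothetical, and treating each sub-range as half-open [lower, upper), with the last one also including its upper edge, is an assumed convention that the slides do not specify.

```python
from collections import Counter
from itertools import accumulate

def frequency_distribution(values, edges):
    """Count values per sub-range [edges[j], edges[j+1]); the last one also includes its upper edge."""
    n = len(edges) - 1
    freq = [0] * n
    for x in values:
        for j in range(n):
            in_last = j == n - 1 and x == edges[n]
            if edges[j] <= x < edges[j + 1] or in_last:
                freq[j] += 1
                break
    return freq

def relative(freq, N):
    """Relative frequency distribution: each frequency divided by N."""
    return [f / N for f in freq]

def cumulative(freq):
    """Running totals, sub-range by sub-range (for frequencies or relative frequencies)."""
    return list(accumulate(freq))

def mean(values):
    return sum(values) / len(values)

def weighted_mean(y, f):
    """Mean from range values y[j] with frequencies f[j]; N is the sum of f."""
    return sum(fj * yj for fj, yj in zip(f, y)) / sum(f)

def median(values):
    x = sorted(values)
    N = len(x)
    if N % 2 == 1:
        return x[(N + 1) // 2 - 1]               # x[(N+1)/2] in 1-based indexing
    return 0.5 * (x[N // 2 - 1] + x[N // 2])      # (x[N/2] + x[N/2 + 1]) / 2

def mode(values):
    """Value with the highest frequency; ties go to the value seen first."""
    counts = Counter(values)
    return max(counts, key=counts.get)
```

For example, with data = [2, 3, 3, 5, 7, 8, 8, 8, 9, 10] and edges [0, 4, 8, 12], the frequencies are [3, 2, 5], the cumulative frequencies [3, 5, 10], the mean 6.3, the median 7.5 and the mode 8.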
DEVIATION AND VARIANCE
- Deviation from the mean: $d[i] = |x[i] - m|$
- Mean deviation, for raw data and for grouped data (with $d[j] = |y[j] - m|$):

$$\bar{d} = \frac{1}{N}\sum_{i=1}^{N} d[i], \qquad \bar{d} = \frac{1}{N}\sum_{j=1}^{n} f[j]\,d[j], \quad N = \sum_{j=1}^{n} f[j]$$

- Variance and standard deviation:

$$v = \frac{1}{N}\sum_{i=1}^{N} d[i]^2, \qquad s = \sqrt{v}$$

- Standard score (standardized variable): $z[i] = (x[i] - m)/s$, so that $|z[i]| = d[i]/s$.

Problem: write and test computer functions and give illustrations.

UNIFORM DISTRIBUTION
- Range: from min (minimum number) to max (maximum number).
- Principle: all numbers x in the range are equally likely.
- Probability density: $p(x, \mathrm{min}, \mathrm{max}) = 1/(\mathrm{max} - \mathrm{min})$
- Mean: $m = (\mathrm{max} + \mathrm{min})/2$
- Variance: $v = (\mathrm{max} - \mathrm{min})^2/12$

Problem: write and test computer functions and give illustrations.

BINOMIAL DISTRIBUTION
- N: number of trials
- p: probability of success
- q: probability of failure, $q = 1 - p$
- x: number of successful trials out of N
- Probability density: $p(x, N, p) = C^{N}_{x}\,p^{x}q^{N-x}$
- Mean: $m = N p$; variance: $v = N p q$

Problem: write and test computer functions and give illustrations.

POISSON DISTRIBUTION
- r: rate of arrivals (mean number of arrivals per time unit)
- t: time interval, which may be t = 1 time unit
- x: possible number of arrivals during t
- Probability density:

$$p(x, r, t) = \frac{(rt)^{x}}{x!}\,e^{-rt}$$

- Mean: $m = rt$; variance: $v = rt$ (both equal r when t = 1).

Problem: write and test computer functions and give illustrations.
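The dispersion measures and the three densities above translate almost line for line into code. This is a minimal Python sketch with hypothetical function names; `variance` divides by N exactly as the slide's definition does, and `math.comb` (Python 3.8+) stands in for the combination $C^N_x$.

```python
import math

def mean_deviation(values):
    """Average of d[i] = |x[i] - m|."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)

def variance(values):
    """v = (1/N) * sum of d[i]^2 (divides by N, as defined on the slide)."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / len(values)

def standard_deviation(values):
    return math.sqrt(variance(values))

def standard_scores(values):
    """z[i] = (x[i] - m) / s."""
    m = sum(values) / len(values)
    s = standard_deviation(values)
    return [(x - m) / s for x in values]

def uniform_pdf(x, lo, hi):
    return 1.0 / (hi - lo) if lo <= x <= hi else 0.0

def uniform_mean_variance(lo, hi):
    return (hi + lo) / 2, (hi - lo) ** 2 / 12

def binomial_pdf(x, N, p):
    """C(N, x) p^x q^(N - x), with q = 1 - p."""
    return math.comb(N, x) * p ** x * (1 - p) ** (N - x)

def poisson_pdf(x, r, t=1.0):
    """(rt)^x e^(-rt) / x!; t defaults to one time unit."""
    lam = r * t
    return lam ** x * math.exp(-lam) / math.factorial(x)
```

A quick illustration: for the data [2, 4, 4, 4, 5, 5, 7, 9] the mean is 5, the mean deviation 1.5, the variance 4 and the standard deviation 2.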
EXPONENTIAL DISTRIBUTION
- r: rate of arrivals
- w: waiting time for the next arrival (inter-arrival time)
- Principle: the distribution of the value of w.
- Probability density: $p(w, r) = r\,e^{-rw}$
- Mean: $m = 1/r$ (the Poisson mean inter-arrival time); variance: $v = 1/r^2$

Problem: write and test computer functions and give illustrations.

NORMAL DISTRIBUTION
- Range: $-\infty < x < +\infty$
- m: average value (mean = median = mode)
- v: variance, $v = s^2$
- x: possible value
- Principle: usually a measurement process.
- Probability density:

$$p(x, m, s) = \frac{1}{s\sqrt{2\pi}}\,e^{-\frac{(x-m)^2}{2s^2}}$$

- Standard form, with $z = (x - m)/s$:

$$p(z, 0, 1) = \frac{1}{\sqrt{2\pi}}\,e^{-z^2/2}$$

Problem: write and test computer functions and give illustrations.

t DISTRIBUTION
- Range: $-\infty < t < +\infty$; mean = median = mode = 0.
- f: degrees of freedom, $0 < f < +\infty$
- Principle: used for estimation from a small sample when the variance is unknown; t is the distribution variable.
- Probability density:

$$p(t, f) = \frac{\Gamma\left(\frac{f+1}{2}\right)}{\sqrt{f\pi}\;\Gamma\left(\frac{f}{2}\right)}\left(1 + \frac{t^2}{f}\right)^{-(f+1)/2}$$

- Gamma function:

$$\Gamma(f) = \int_0^{\infty} x^{f-1}e^{-x}\,dx = (f-1)! \quad \text{(for integer } f\text{)}$$

Problem: write and test computer functions and give illustrations.

COMPUTATION TIPS (1/2)
- Integer factorial: $0! = 1$, $(i+1)! = i!\,(i+1)$
- Real factorial (gamma function), by Stirling's formula, which is useful for its computation: $i! \approx \sqrt{2\pi i}\;i^{i}e^{-i}$

Problem: write and test computer functions and give illustrations.

COMPUTATION TIPS (2/2)
- Combination: $C^{N}_{0} = 1$, $C^{N}_{i} = C^{N}_{i-1}\,\frac{N-i+1}{i}$
- Poisson integration: $A_0 = \frac{1}{0!} = 1$, $A_{i+1} = \frac{1}{(i+1)!} = \frac{A_i}{i+1}$
- Trapezoid rule: integration by summation in small steps.

Problem: write and test computer functions and give illustrations.
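A minimal Python sketch of the continuous densities and the computation tips. The names are hypothetical; `t_pdf` uses `math.lgamma` for the gamma ratio (a swapped-in convenience, to avoid overflow for large f), while `gamma_numeric` shows the trapezoid rule applied to the gamma integral, truncated at an assumed upper limit of 60 and valid only for f >= 1 (for f < 1 the integrand is infinite at x = 0).

```python
import math

def exponential_pdf(w, r):
    """r e^(-r w) for w >= 0."""
    return r * math.exp(-r * w) if w >= 0 else 0.0

def normal_pdf(x, m=0.0, s=1.0):
    return math.exp(-((x - m) ** 2) / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

def t_pdf(t, f):
    # Gamma((f+1)/2) / Gamma(f/2) computed via log-gamma to avoid overflow
    c = math.exp(math.lgamma((f + 1) / 2) - math.lgamma(f / 2)) / math.sqrt(f * math.pi)
    return c * (1 + t * t / f) ** (-(f + 1) / 2)

def factorial(i):
    """Integer factorial: 0! = 1, (i+1)! = i! (i+1)."""
    result = 1
    for k in range(1, i + 1):
        result *= k
    return result

def stirling(i):
    """Stirling approximation: i! ~ sqrt(2 pi i) i^i e^(-i)."""
    return math.sqrt(2 * math.pi * i) * i ** i * math.exp(-i)

def combination(N, i):
    """C(N, 0) = 1; C(N, k) = C(N, k-1) (N - k + 1) / k."""
    c = 1.0
    for k in range(1, i + 1):
        c = c * (N - k + 1) / k
    return c

def trapezoid(fn, a, b, steps):
    """Trapezoid rule: sum of `steps` small trapezoids of width h."""
    h = (b - a) / steps
    total = 0.5 * (fn(a) + fn(b))
    for k in range(1, steps):
        total += fn(a + k * h)
    return total * h

def gamma_numeric(f, upper=60.0, steps=100_000):
    """Gamma(f) by the trapezoid rule, truncating the integral at `upper` (assumes f >= 1)."""
    return trapezoid(lambda x: x ** (f - 1) * math.exp(-x), 0.0, upper, steps)
```

As a check, `gamma_numeric(5)` should come out close to 4! = 24, and `stirling(10)` is within about 1 % of 10!.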
GENERATION OF RANDOM NUMBERS

RANDOM NUMBER GENERATORS (RNGs): WHY
- Random processes
- Sampling
- Testing computer algorithms
- Numerical analysis
- System modeling and simulation
- Decision making
- Other reasons

UNIFORM RNG: U(0, 1)
- m (modulus): a large prime number, the largest within the memory cell size, so that the sequence has a long cycle before it repeats. For a 32-bit cell, $m = 2^{31} - 1 = 2{,}147{,}483{,}647$.
- a (multiplier): 314,159,269
- b (increment): 453,806,245
- X[0] (starting value, the seed): 577,215,665
- X[i-1]: the previous value, which serves as the seed for X[i].
- Uniform integers X(0, m): $X[i] = (a \cdot X[i-1] + b) \bmod m$
- Uniform U(0, 1): $U[i] = X[i]/m$

Problem: write and test computer functions and give illustrations.

UNIFORM RNG: U(min, max), INTEGER RANGE
- min: required minimum integer value; max: required maximum integer value.
- $U(\mathrm{min}, \mathrm{max}) = \mathrm{min} + \mathrm{TRUNC}[(\mathrm{max} - \mathrm{min} + 1)\cdot U(0, 1)]$
- Test with 100,000 runs: frequency distribution, mean, variance and standard deviation, each compared with its theoretical expectation.
- Note: every new set of runs should start with a different seed X[0].

Problem: write and test computer functions and give illustrations.

EXPONENTIAL RNG: E(m)
- m: required mean.
- $E(m) = -m \cdot \ln[U(0, 1)]$
- E(m) generates "inter-event" times, so it also serves as a Poisson test.
- Test with 100,000 runs: frequency distribution, mean, variance and standard deviation, each compared with its theoretical expectation.

Problem: write and test computer functions and give illustrations.

NORMAL RNG: N(m, s)
- m: required mean of the normal RNG; s: required standard deviation.
- Steps:
  1. $V[1] = 2\,U(0,1)[1] - 1$ and $V[2] = 2\,U(0,1)[2] - 1$
  2. $SUM = V[1]^2 + V[2]^2$
  3. If $SUM \ge 1$, go to step 1.
  4. $Y = V[2]\sqrt{-2\ln(SUM)/SUM}$; this is the standard normal value, N(0, 1) = Y.
  5. $N(m, s) = m + s \cdot Y$

Problem: write and test computer functions and give illustrations.
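A minimal Python sketch of the four generators, using the modulus, multiplier, increment and seed listed above. The class and method names are hypothetical. Python's unbounded integers avoid the overflow that $a \cdot X[i-1]$ would cause in a 32-bit cell. Two guards are additions that the slides do not show: U = 0 is redrawn before taking the logarithm in E(m), and SUM = 0 is rejected in the normal steps.

```python
import math

class LCG:
    """Linear congruential generator X[i] = (a*X[i-1] + b) mod m, with U[i] = X[i]/m."""

    M = 2**31 - 1          # modulus: the prime 2,147,483,647
    A = 314_159_269        # multiplier
    B = 453_806_245        # increment

    def __init__(self, seed=577_215_665):
        self.x = seed % self.M           # X[0], the seed

    def next_int(self):
        """Uniform integer X in [0, m)."""
        self.x = (self.A * self.x + self.B) % self.M
        return self.x

    def uniform(self):
        """U(0, 1) = X / m, so 0 <= U < 1."""
        return self.next_int() / self.M

    def uniform_int(self, lo, hi):
        """U(min, max) = min + TRUNC[(max - min + 1) * U(0, 1)]."""
        return lo + math.trunc((hi - lo + 1) * self.uniform())

    def exponential(self, mean):
        """E(m) = -m * Ln[U(0, 1)]; U = 0 is redrawn (guard added here)."""
        u = self.uniform()
        while u == 0.0:
            u = self.uniform()
        return -mean * math.log(u)

    def normal(self, m, s):
        """N(m, s) by steps 1-5; SUM = 0 is also rejected (guard added here)."""
        while True:
            v1 = 2 * self.uniform() - 1          # step 1
            v2 = 2 * self.uniform() - 1
            total = v1 * v1 + v2 * v2            # step 2: SUM
            if 0.0 < total < 1.0:                # step 3: otherwise repeat
                break
        y = v2 * math.sqrt(-2 * math.log(total) / total)   # step 4: N(0, 1)
        return m + s * y                         # step 5
```

For the 100,000-run tests, draw the values from one `LCG` object, build their frequency distribution, and compare the sample mean and variance with 1/2 and 1/12 for U(0, 1), with m and m² for E(m), and with m and s² for N(m, s). Following the note on the slides, pass a different `seed` for each new set of runs.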
CONFIDENCE INTERVALS

MEASUREMENTS AND ESTIMATIONS
- Measurements: experiments on real systems, models, or simulations.
- Set of values: a sample of N results. Large sample: N ≥ 30; small sample: N < 30.
- Estimation theory: confidence for the "sample" and confidence for the "mean".

EVALUATION ALGORITHM: GENERAL
- Input: the measurements N and x[1], x[2], ..., x[N], and the required confidence level L (%).
- Mean:

$$m = \frac{1}{N}\sum_{i=1}^{N} x[i]$$

- Standard deviation:

$$S = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\{x[i] - m\}^2}$$

- Area under the curve outside the confidence interval: $a = 1 - L/100$

[Figure: standard normal probability density plotted against standard deviations from the mean, marking the central 68.27 % area (±1 standard deviation), the central 95.45 % area (±2 standard deviations), and the critical values ±Z(a/2).]

EVALUATION ALGORITHM: N ≥ 30
- Confidence coefficient Z(a/2):

$$\frac{1-a}{2} = \frac{1}{\sqrt{2\pi}}\int_0^{Z(a/2)} e^{-z^2/2}\,dz$$

- Sample: $m - Z(a/2)\,S < \text{sample} < m + Z(a/2)\,S$
- Mean: $m - Z(a/2)\,\frac{S}{\sqrt{N}} < \text{mean} < m + Z(a/2)\,\frac{S}{\sqrt{N}}$

EVALUATION ALGORITHM: N < 30
- Confidence coefficient T(a/2), from the t distribution with f = N - 1 degrees of freedom:

$$\frac{1-a}{2} = \frac{\Gamma(N/2)}{\sqrt{(N-1)\pi}\;\Gamma\left(\frac{N-1}{2}\right)}\int_0^{T(a/2)}\left(1 + \frac{t^2}{N-1}\right)^{-N/2}dt, \qquad \Gamma(f) = \int_0^{\infty} x^{f-1}e^{-x}\,dx = (f-1)!$$

- Sample: $m - T(a/2)\,S < \text{sample} < m + T(a/2)\,S$
- Mean: $m - T(a/2)\,\frac{S}{\sqrt{N}} < \text{mean} < m + T(a/2)\,\frac{S}{\sqrt{N}}$
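A minimal Python sketch of both evaluation algorithms. The names are hypothetical. Z(a/2) and T(a/2) are found by bisection on the defining area equations. For the normal area it uses `math.erf`, since $\frac{1}{\sqrt{2\pi}}\int_0^z e^{-u^2/2}du = \tfrac{1}{2}\mathrm{erf}(z/\sqrt{2})$. For the t area it uses the trapezoid rule with an assumed step count. Bisection is a swapped-in root finder, not necessarily the author's method.

```python
import math

def sample_statistics(x):
    """N, mean m, and standard deviation S with the 1/(N - 1) divisor."""
    N = len(x)
    m = sum(x) / N
    S = math.sqrt(sum((xi - m) ** 2 for xi in x) / (N - 1))
    return N, m, S

def solve_increasing(g, target, lo=0.0, hi=100.0, tol=1e-9):
    """Bisection: find c in [lo, hi] with g(c) = target, for an increasing g."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def z_coefficient(L):
    """Z(a/2): standard normal area from 0 to Z equals (1 - a)/2."""
    a = 1 - L / 100
    return solve_increasing(lambda z: 0.5 * math.erf(z / math.sqrt(2)), (1 - a) / 2)

def t_area(T, f):
    """Area under the t density (f degrees of freedom) from 0 to T, by the trapezoid rule."""
    c = math.exp(math.lgamma((f + 1) / 2) - math.lgamma(f / 2)) / math.sqrt(f * math.pi)
    steps = max(2000, int(200 * T))           # assumed resolution
    h = T / steps
    dens = lambda t: c * (1 + t * t / f) ** (-(f + 1) / 2)
    return h * (0.5 * (dens(0.0) + dens(T)) + sum(dens(k * h) for k in range(1, steps)))

def t_coefficient(L, N):
    """T(a/2) with f = N - 1 degrees of freedom."""
    a = 1 - L / 100
    return solve_increasing(lambda T: t_area(T, N - 1), (1 - a) / 2, tol=1e-7)

def confidence_limits(x, L):
    """Return ((mean_low, mean_high), (sample_low, sample_high)) at level L (%)."""
    N, m, S = sample_statistics(x)
    coef = z_coefficient(L) if N >= 30 else t_coefficient(L, N)
    half = coef * S / math.sqrt(N)
    return (m - half, m + half), (m - coef * S, m + coef * S)
```

At L = 95 this gives Z(a/2) ≈ 1.960, and for N = 10 it gives T(a/2) ≈ 2.262, matching standard tables.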
REQUIRED DEVELOPMENT

SOFTWARE FUNCTIONS (1/4)
Frequency distribution (raw data: empirical distributions)
- Input: N (Integer): number of values; array of N values (Real array); n (Integer): number of ranges; array of n ranges (Real array).
- Output: array of n frequencies (Integer array).

Central measures: mean, median, mode
- Input: N (Integer): number of values; array of N values (Real array). These can also be computed from frequency ranges.
- Output: mean value (Real); median (Real); mode (Real).

Dispersion measures: mean deviation, variance, standard deviation
- Input: as above, possibly together with the mean (Real). The variance can also serve as the input for the standard deviation, and vice versa.
- Output: average deviation (Real); variance (Real); standard deviation (Real).

SOFTWARE FUNCTIONS (2/4)
Binomial distribution
- Combination. Input: N (Integer): number of values (objects or independent trials); X (Integer): number of selected values (successes). Output: value of the combination (Real).
- Probability density. Input: as above, plus p (Real): probability of success. Output: probability of X successes (Real).
- Cumulative probability. Input: N, p as above; X1-X2 (Integers): range of values. Output: sum of probabilities over the range (Real).

Poisson distribution
- Probability density. Input: X (Integer): number of arrivals; m (Real): mean number of arrivals. Output: probability of X arrivals (Real).
- Cumulative probability. Input: m as above; X1-X2 (Integers): range of values. Output: sum of probabilities over the range (Real).
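The two "cumulative probability" functions in this table can be sketched by summing the densities term by term, obtaining each term from the previous one in the same spirit as the combination and Poisson recurrences in the computation tips. This is a minimal Python sketch: the names `binomial_range` and `poisson_range` are hypothetical, it assumes 0 < p < 1, and for very large N the starting term $q^N$ can underflow to zero.

```python
import math

def binomial_range(N, p, x1, x2):
    """Sum of binomial probabilities P(x1 <= X <= x2); assumes 0 < p < 1."""
    q = 1 - p
    term = q ** N                       # P(X = 0)
    total = 0.0
    for x in range(0, min(x2, N) + 1):
        if x >= x1:
            total += term
        # P(X = x + 1) = P(X = x) * (N - x) / (x + 1) * p / q
        term *= (N - x) / (x + 1) * p / q
    return total

def poisson_range(m, x1, x2):
    """Sum of Poisson probabilities P(x1 <= X <= x2) for mean m."""
    term = math.exp(-m)                 # P(X = 0)
    total = 0.0
    for x in range(0, x2 + 1):
        if x >= x1:
            total += term
        term *= m / (x + 1)             # P(X = x + 1) = P(X = x) * m / (x + 1)
    return total
```

The recurrence avoids computing large factorials and powers separately, which is the same motivation as the computation tips.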
SOFTWARE FUNCTIONS (3/4)
Standard normal distribution
- Probability density. Input: z (Real): random variable (measurement). Output: probability density at z (Real).
- Cumulative probability. Input: z1-z2 (Real): range of values. Output: sum of probabilities over the range (Real).

Gamma function
- Input: f (Integer or Real): degrees of freedom. Output: value of the gamma function (Real).

t distribution
- Probability density. Input: f (Integer or Real): degrees of freedom; t (Real): random variable. Output: probability density at t (Real).
- Cumulative probability. Input: f as above; t1-t2 (Real): range of values. Output: sum of probabilities over the range (Real).

SOFTWARE FUNCTIONS (4/4)
Confidence coefficient
- For a large sample. Input: L (Real): level of confidence (%). Output: Z(a/2) (Real).
- For a small sample. Input: L as above; N (Integer): sample size. Output: T(a/2) (Real).

Random number generators
- U(0, 1). Input: seed (Real or Integer), according to requirements. Output: uniform random value in (0, 1) (Real).
- U(min, max). Input: min, max (Integers): range. Output: uniform random value in the range (Integer).
- E(m). Input: m (Real): mean value (duration). Output: exponential random value with mean m (Real).
- N(m, s). Input: m (Real): mean value (measurement); s (Real): standard deviation. Output: normal random value with mean m and standard deviation s (Real).

REFERENCES
1. Murray R. Spiegel, Statistics, Schaum's Outline Series, McGraw-Hill, 1972.
2. Ronald E. Walpole and Raymond H. Myers, Probability and Statistics for Engineers & Scientists, Collier Macmillan, 1972.
3. Donald E. Knuth, The Art of Computer Programming, Vol. 2, Addison-Wesley, 1969.
4. Saad Haj Bakry and Mustafa Shatila, "Pascal Functions for the Generation of Random Numbers", Journal of Computer, Mathematics & Applications, Vol. 15, No. 11, pp. 969-973, Pergamon Press, UK, 1988.
5. Saad Haj Bakry and Mustafa Shatila, "A Computer Algorithm for Comp the Confidence Limits of Measured Factors", Journal of Engineering Science, KSU, Vol. 2, pp. 195-200, 1990.
6. Averill M. Law and W. David Kelton, Simulation Modeling and Analysis, McGraw-Hill, 2000.