Probability Distributions with SCILAB
By
Gilberto E. Urroz, Ph.D., P.E.
Distributed by
InfoClearinghouse.com
©2001 Gilberto E. Urroz
All Rights Reserved
A "zip" file containing all of the programs in this document (and other
SCILAB documents at InfoClearinghouse.com) can be downloaded at the
following site:
http://www.engineering.usu.edu/cee/faculty/gurro/Software_Calculators/Scilab_Docs/ScilabBookFunctions.zip
The author's SCILAB web page can be accessed at:
http://www.engineering.usu.edu/cee/faculty/gurro/Scilab.html
Please report any errors in this document to: gurro@cc.usu.edu
PROBABILITY DISTRIBUTIONS
Discrete probability distributions
Bernoulli probability distribution
Binomial probability distribution
Poisson probability distribution
Geometric probability distribution
Hypergeometric probability mass function
Cumulative distribution functions for discrete probability distributions
SCILAB functions for discrete cumulative distribution functions
SCILAB function cdfbin
Discrete probability calculations through user-defined functions
Combinations
Binomial distribution
Poisson distribution
Geometric distribution
Hypergeometric distribution
Continuous probability functions
Factorials and the Gamma function
The gamma distribution
The exponential distribution
The beta distribution
The Weibull distribution
The uniform distribution
User-defined functions for continuous probability distributions
Continuous probability distributions used in statistical inference
The Normal distribution
The Student-t distribution
The Chi-squared (χ2) distribution
The F distribution
Applications of the normal distribution in data analysis
Plotting a histogram and its corresponding normal curve
Plotting data against their normal scores
The lognormal distribution
Generating synthetic data
Generating normally-distributed synthetic data
Additional applications of function rand
SCILAB function for generating synthetic data
Examples of synthetic data generation using function grand
Additional notes on function grand
Pseudo-random generators
Generating log-normally-distributed data
Generating data that follows the Weibull distribution
Generating data that follows the Student’s t distribution
Generating data that follows a discrete distribution
Statistical simulation
Simulating traffic through a service station
A user-defined function to simulate traffic through a service station
Modeling traffic through a service station with random input
STIXBOX: a rudimentary statistics toolbox
Exercises
Probability Distributions
There are a number of mathematical functions that possess the properties of a probability mass
function for discrete random variables or the properties of a probability density function for
continuous random variables. In this section we introduce a number of those functions for the
calculation of probabilities. Because these probability distributions depend on a finite number
of parameters they are typically referred to as parametric distributions.
Discrete probability distributions
Some of the most useful discrete probability distributions are the Bernoulli, Binomial, Poisson,
geometric, and hypergeometric distributions. The definitions of the corresponding probability
mass and distribution functions are shown below. We also present expressions for the mean,
variance, and standard deviation of these distributions.
Bernoulli probability distribution
The Bernoulli probability distribution applies to a discrete random variable that can only have
values of 0 or 1, i.e., X = 0, 1. Let the probability of X = 1 be p, i.e., fX(1) = p, then fX(0) = 1-p.
This can be summarized as
$$f_X(x) = p^x (1-p)^{1-x}, \quad x = 0, 1.$$
The mean value of the distribution is
$$\mu_X = 0 \cdot (1-p) + 1 \cdot p = p.$$
The expectation of $X^2$, $E(X^2)$, is needed to calculate the variance $\mathrm{Var}(X) = E(X^2) - \mu_X^2$. For the Bernoulli distribution,
$$E(X^2) = 0^2 \cdot (1-p) + 1^2 \cdot p = p,$$
and
$$\mathrm{Var}(X) = E(X^2) - \mu_X^2 = p - p^2 = p(1-p).$$
Thus, the standard deviation is
$$\sigma_X = [p(1-p)]^{1/2}.$$
These results can be obtained using SCILAB as follows:
-->p=poly(0,'p')
p =
    p
-->X = [0,1]
X =
!   0.    1. !
-->Prob = [1-p p]
Prob =
!   1 - p    p !
-->muX = X*Prob'
muX = p
-->EX2 = X.^2*Prob'
EX2 = p
-->VarX = EX2 - muX^2
VarX =
         2
    p - p
The Bernoulli distribution applies to simple binary experiments in which only two possible
outcomes exist: 1 or 0, yes or no, success or failure. The value of the probability of success,
p, can be obtained, for example, from the classical or from the frequency definitions of
probability. Bernoulli processes form the basis of the binomial and geometric
distributions presented below.
Binomial probability distribution
If a Bernoulli experiment with success probability p is repeated n times, the probability of
having x successes out of the n trials is given by
$$f_X(x) = \binom{n}{x} p^x (1-p)^{n-x} = \frac{\Gamma(n+1)}{\Gamma(x+1)\,\Gamma(n-x+1)}\, p^x (1-p)^{n-x}, \quad x = 0, 1, 2, \ldots, n, \quad 0 < p < 1,$$
with
$$\mu_X = np, \quad \mathrm{Var}(X) = np(1-p), \quad \sigma_X = [np(1-p)]^{1/2}.$$
In SCILAB, we can define the probability mass function for the Binomial distribution as
-->deff('[f]=fX(x,n,p)',...
-->'f=gamma(n+1).*p.^x.*(1-p).^(n-x)./(gamma(x+1).*gamma(n-x+1))')
Next, we use this function to produce a plot of the probability mass function for n = 10, p =
0.10:
-->n=10; p=0.10; xx=[0:1:10]; yy = fX(xx,n,p);
-->xset('window',1);xset('mark',-9,2); plot2d(xx',yy',-9)
-->xtitle('Binomial pmf','x','fX(x)')
The following commands produce a plot of the cumulative distribution function:
-->yyy = [];for j = 1:n+1, yyy = [yyy sum(yy(1:j))]; end;
-->xset('window',2); xset('mark',-9,2); plot2d(xx',yyy',-9)
-->xtitle('Binomial cdf','x','FX(x)')
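As a cross-check of the binomial plots above, the following short script is a minimal sketch (it is not part of the original listings, and its variable names are illustrative). It recomputes the pmf with the same gamma-function expression, builds the CDF with SCILAB's cumsum function instead of an explicit loop, and compares the numerical mean and variance with np and np(1-p).

```scilab
// Binomial pmf and CDF for n = 10, p = 0.10 (same values as in the text)
n = 10; p = 0.10;
x = 0:n;
f = gamma(n+1) .* p.^x .* (1-p).^(n-x) ./ (gamma(x+1) .* gamma(n-x+1));
F = cumsum(f);                  // running sum of the pmf gives the CDF at x = 0..n
mu = sum(x .* f);               // numerical mean, to compare with n*p
v  = sum((x - mu).^2 .* f);     // numerical variance, to compare with n*p*(1-p)
mprintf("sum(f) = %f  mean = %f (np = %f)  var = %f (np(1-p) = %f)\n", ..
        sum(f), mu, n*p, v, n*p*(1-p));
```

The vector F can be plotted with plot2d(x',F',-9) in place of the vector yyy produced by the loop.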
Poisson probability distribution
If X is a Binomial variable with n → ∞ and p → 0, in such a way that the product λ = n⋅p remains finite, the distribution of X approaches the Poisson probability mass function
$$f_X(x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \ldots; \quad \lambda > 0.$$
The Poisson pmf can be used to model the number of occurrences of a certain event in a given
time period or per unit length, area or volume, if λ represents the mean number of occurrences of the
event per unit time, length, area or volume, respectively.
The Poisson distribution has the parameters
$$\mu_X = \lambda, \quad \mathrm{Var}(X) = \lambda, \quad \sigma_X = \lambda^{1/2}.$$
In SCILAB we can define the Poisson distribution pmf as:
-->deff('[p]=fX(x,lambda)','p=exp(-lambda).*lambda.^x./gamma(x+1)')
A plot of the pmf for λ = 2.5 for values of x between 0 and 20:
-->lambda = 2.5; xx = [0:1:20]; yy =fX(xx,lambda);
-->xset('window',1);xset('mark',-9,2);plot2d(xx',yy',-9)
-->xtitle('Poisson pmf','x','fX(x)')
A plot of the corresponding cumulative distribution function follows:
-->yyy = []; for j = 1:21, yyy = [yyy sum(yy(1:j))]; end;
-->xset('window',2); xset('mark',-9,2); plot2d(xx',yyy',-9)
-->xset('window',2); xset('mark',-9,2); plot2d(xx',yyy',9)
-->xtitle('Poisson cdf','x','FX(x)')
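The mean and variance of the Poisson distribution are both equal to λ. The script below is a minimal numerical check of that property for λ = 2.5; the truncation of the infinite sum at x = 60 is an assumption made here for illustration (the neglected tail is far below machine precision for this λ).

```scilab
// Numerical mean and variance of the Poisson pmf for lambda = 2.5
lambda = 2.5;
x = 0:60;                                       // truncated support (assumption)
f = exp(-lambda) .* lambda.^x ./ gamma(x+1);    // Poisson pmf
mu = sum(x .* f);                               // numerical mean
v  = sum((x - mu).^2 .* f);                     // numerical variance
mprintf("sum(f) = %f  mean = %f  variance = %f  lambda = %f\n", sum(f), mu, v, lambda);
```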
Geometric probability distribution
Suppose that we have a Bernoulli experiment with probability of success p being repeated until
a successful outcome occurs. Let X represent the number of trials needed to obtain the first success (counting the successful trial); then
X can be modeled with the geometric pmf:
$$f_X(x) = p\,(1-p)^{x-1}, \quad x = 1, 2, \ldots; \quad 0 < p < 1.$$
The geometric distribution has the parameters
$$\mu_X = \frac{1}{p}, \quad \mathrm{Var}(X) = \frac{1-p}{p^2}, \quad \sigma_X = \frac{(1-p)^{1/2}}{p}.$$
The pmf for the geometric distribution and a plot of it are obtained in SCILAB by using the following commands. Since the pmf is defined for x = 1, 2, …, the vector xx starts at 1:
-->deff('[f]=fX(p,x)','f=p.*(1-p).^(x-1)')
-->p = 0.25; xx = [1:1:20]; yy = fX(p,xx);
-->xset('window',1);xset('mark',-9,2);plot2d(xx',yy',-9)
-->xtitle('geometric pmf','x','fX(x)')
A plot of the geometric distribution CDF is shown next:
-->yyy = [];for j = 1:20, yyy = [yyy sum(yy(1:j))]; end;
-->xset('window',2); xset('mark',-9,2); plot2d(xx',yyy',-9)
-->xtitle('geometric cdf','x','FX(x)')
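For the geometric distribution the sum that defines the CDF is a finite geometric series, so it has the closed form FX(x) = 1 - (1-p)^x for x = 1, 2, …. The following sketch, with illustrative variable names, compares the running sum of the pmf with that closed form for p = 0.25.

```scilab
// Geometric CDF: running sum of the pmf versus the closed form 1-(1-p)^x
p = 0.25;
x = 1:20;
f = p .* (1-p).^(x-1);          // geometric pmf on x = 1..20
F = cumsum(f);                  // CDF as a running sum
Fclosed = 1 - (1-p).^x;         // closed-form CDF
err = max(abs(F - Fclosed));    // largest discrepancy between the two
mprintf("largest difference between the two CDF vectors: %e\n", err);
```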
Hypergeometric probability mass function
Suppose that we have a finite population of N elements, out of which a < N elements are
defective. Suppose also that we take a sample of size n < N out of the population, and let X
represent the number of defective elements in the sample of size n. The probability of X is
given by the following pmf:
$$f_X(x; n, a, N) = \frac{\binom{a}{x}\binom{N-a}{n-x}}{\binom{N}{n}}, \quad 0 < n < N, \quad 0 < a < N, \quad x = 0, 1, \ldots, n.$$
Parameters of the distribution are:
$$\mu_X = \frac{n a}{N}, \quad \mathrm{Var}(X) = \frac{n\,a\,(N-a)(N-n)}{N^2 (N-1)}.$$
To produce plots of the hypergeometric probability mass function and cumulative distribution
function, we first define a function accounting for the binomial coefficient:
-->deff('[CC]=C(n,r)','CC=gamma(n+1)./(gamma(r+1).*gamma(n-r+1))')
This function is incorporated in the definition of the hypergeometric function:
-->deff('[p]=fX(x)','p=C(a,x).*C(N-a,n-x)./C(N,n)')
Next, we produce plots of the hypergeometric pmf and CDF for N = 100, a = 25, and n = 20:
-->N=100;a=25;n=20;
-->xx=[0:1:20];yy=fX(xx);
-->xset('window',1);xset('mark',-9,2);
-->plot2d(xx',yy',-9);xtitle('Hypergeometric distribution','x','fX(x)');
-->yyy=[];for j=1:21, yyy=[yyy sum(yy(1:j))]; end;
-->xset('window',2);xset('mark',-9,2);
-->plot2d(xx',yyy',-9);xtitle('Hypergeometric distribution','x','FX(x)');
-->plot2d(xx',yyy',9)
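The binomial coefficient C(n,r) defined above evaluates gamma(n+1) directly, which overflows in double precision once n exceeds about 170. A possible workaround, sketched below under the assumption that SCILAB's gammaln function (logarithm of the gamma function) is available, is to work with logarithms and exponentiate only the final ratio. The names lnC and hyperpmf are hypothetical helpers introduced here for illustration.

```scilab
// Hypergeometric pmf computed through logarithms of binomial coefficients
function c = lnC(n, r)
    c = gammaln(n+1) - gammaln(r+1) - gammaln(n-r+1);   // ln of C(n,r)
endfunction

function f = hyperpmf(x, N, n, a)
    f = exp(lnC(a, x) + lnC(N-a, n-x) - lnC(N, n));     // pmf from the log terms
endfunction

N = 1000; a = 100; n = 50;       // population too large for gamma(N+1)
x = 0:n;
f = hyperpmf(x, N, n, a);
mprintf("sum(f) = %f  mean = %f  n*a/N = %f\n", sum(f), sum(x .* f), n*a/N);
```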
Cumulative distribution functions for discrete probability distributions
Of the five probability distributions presented above, three take a finite set of discrete values
(Bernoulli, Binomial, hypergeometric) and two take an infinite set of values (Poisson and
geometric). For the Binomial, Poisson, and hypergeometric distributions, the
cumulative distribution function is calculated using
$$F_X(x) = \sum_{k=0}^{x} f_X(k),$$
where fX(k) represents the corresponding probability mass function. (This is the definition
used to produce the CDF graphics shown in the previous examples.) The cumulative
distribution function FX(x) is defined over the same range of values as the discrete random
variable X.
For the geometric distribution, whose domain starts at x = 1, the corresponding expression is
$$F_X(x) = \sum_{k=1}^{x} f_X(k) = \sum_{k=1}^{x} p\,(1-p)^{k-1}, \quad x = 1, 2, 3, \ldots$$
SCILAB functions for discrete cumulative distribution functions
SCILAB provides a number of functions for operations with cumulative distribution functions.
For discrete distributions the following functions are provided:
• cdfbin - Binomial distribution
• cdfnbn - Negative binomial distribution
• cdfpoi - Poisson distribution (described in detail in Chapter …)
Information on these functions can be obtained by using the help function. Next, we describe
the use of function cdfbin.
SCILAB function cdfbin
There are four different forms of the call to function cdfbin:
[P,Q]=cdfbin("PQ",S,Xn,Pr,Ompr)
[S]=cdfbin("S",Xn,Pr,Ompr,P,Q)
[Xn]=cdfbin("Xn",Pr,Ompr,P,Q,S)
[Pr,Ompr]=cdfbin("PrOmpr",P,Q,S,Xn)
The variable Pr in these calls represents the probability of success on any given trial, which we
refer to as p in the definition of the Bernoulli pmf shown earlier. The variable Ompr
represents 1-Pr (in some references this is referred to as q = 1 - p), i.e., the probability of
failure in a given trial. The variable P represents the probability P(X≤S), where X ~
Binomial(Xn,Pr), while Q = 1 - P.
The first argument in the calls to function cdfbin is a string that determines which variable is
being sought, according to:
“PQ”      - calculate probabilities, P = P(X≤S) and Q = 1 - P
“S”       - calculate the inverse CDF, i.e., calculate S from P = P(X≤S)
“Xn”      - calculate the number of trials (n in the definition of the pmf)
“PrOmpr”  - calculate the probability of success in any given trial (p in the pmf definition)
Care should be exercised in keeping the proper order of the variables in the calls to the
function. Some examples follow:
-->n = 10; x = 6; p = 0.35; q = 1-p;
-->[P,Q] = cdfbin('PQ',x,n,p,q)            //Calculating probabilities
Q =
    .0260243
P =
    .9739757
-->n=20;p=0.35;q=1-p;P=0.75;Q=1-P;
-->x = cdfbin("S",n,p,q,P,Q)               //Calculating the inverse CDF
x =
    7.9132062
-->[p,q] = cdfbin("PrOmpr",P,Q,x,n)        //Calculating p and q = 1-p
q =
    .7391494
p =
    .2608506
Notes: Use help cdfnbn to learn more about the function that implements the negative
Binomial distribution. The function cdfpoi was described in detail in Chapter 13.
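A simple way to gain confidence in the argument order of cdfbin is to compare its result with a direct sum of the binomial pmf. The sketch below does this for the first example above (n = 10, p = 0.35, S = 6); the variable names are illustrative.

```scilab
// Compare cdfbin with a direct sum of the binomial pmf
n = 10; p = 0.35; s = 6;
[P, Q] = cdfbin("PQ", s, n, p, 1-p);    // P = P(X<=s), Q = 1-P
k = 0:s;
Psum = sum(gamma(n+1) .* p.^k .* (1-p).^(n-k) ./ (gamma(k+1) .* gamma(n-k+1)));
mprintf("cdfbin: P = %f   direct sum: P = %f   P+Q = %f\n", P, Psum, P+Q);
```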
Discrete probability calculations through user-defined functions
Besides the few pre-programmed cumulative distribution functions provided by SCILAB,
probabilities can be calculated by defining probability mass and cumulative distribution
functions for the different distributions presented earlier.
The basic definitions of
probabilities in terms of probability mass and cumulative distribution functions are:
$$P(X = x) = f_X(x) \quad \text{(pmf)},$$
$$P(X \le x) = \sum_{k=0}^{x} f_X(k) \quad \text{(cdf for Binomial, Poisson, and hypergeometric distributions)},$$
$$P(X \le x) = \sum_{k=1}^{x} f_X(k) \quad \text{(cdf for geometric distribution)}.$$
We will define the following functions for the distributions shown earlier:
Distribution      pmf            CDF
Binomial          b(x,n,p)       B(x,n,p)
Poisson           p(x,lambda)    P(x,lambda)
geometric         g(x,p)         G(x,p)
hypergeometric    h(x,N,n,a)     H(x,N,n,a)
The following is a SCILAB script, called DiscreteProbabilityFunctions, which includes the
definitions for the eight functions listed in the table immediately above, together with the binomial coefficient C(n,r):
//Defining discrete probability distributions
//Binomial coefficient
deff('[CC]=C(n,r)','CC=gamma(n+1)./(gamma(r+1).*gamma(n-r+1))')
//Binomial pmf
deff('[bb]=b(x,n,p)','bb=C(n,x).*p.^x.*(1-p).^(n-x)')
//Binomial CDF
deff('[BB]=B(x,n,p)','BB=sum(b([0:1:x],n,p))')
//Poisson pmf
deff('[pp]=p(x,lambda)','pp=exp(-lambda).*lambda.^x./gamma(x+1)')
//Poisson CDF
deff('[PP]=P(x,lambda)','PP=sum(p([0:1:x],lambda))')
//Geometric pmf
deff('[gg]=g(x,p)','gg=p.*(1-p).^(x-1)')
//Geometric CDF
deff('[GG]=G(x,p)','GG=sum(g([1:x],p))')
//Hypergeometric pmf
deff('[hh]=h(x,N,n,a)','hh=C(a,x).*C(N-a,n-x)./C(N,n)')
//Hypergeometric CDF
deff('[HH]=H(x,N,n,a)','HH=sum(h([0:1:x],N,n,a))')
To execute the script that defines the discrete probability functions use:
-->exec('DiscreteProbabilityFunctions')
Combinations
The function C(n,r) represents the number of combinations of n elements taken r at a time, i.e., the binomial
coefficient:
-->C(10,5)
ans =
252.
This is a vector of values of C(n,r) for n = 10, and r = 0,1, …, 10:
-->C10=[];for j=0:10,C10=[C10 C(10,j)]; end; C10
C10 =
!   1.    10.    45.    120.    210.    252.    210.    120.    45.    10.    1. !
Binomial distribution
For the binomial distribution with n = 10 and p = 0.25, the following call to function b(x,n,p)
calculates the probability P(X=2) = b(2,10,0.25):
-->b(2,10,0.25)
ans =
.2815676
The following is a list of values of the binomial pmf for n = 10, p = 0.25, for all possible values
of x = 0,1, …, 10:
-->b10=[];for j=0:10,b10=[b10 b(j,10,0.25)]; end; b10
b10 =
         column 1 to 7
!   .0563135    .1877117    .2815676    .2502823    .145998    .0583992    .016222 !
         column 8 to 11
!   .0030899    .0003862    .0000286    9.537E-07 !
The binomial CDF for x = 2, n = 10, p = 0.25 is calculated with the following call to function
B(x,n,p). This value represents P(X≤2):
-->B(2,10,0.25)
ans =
.5255928
This value represents P(X>2) = 1 - P(X≤2):
-->1-B(2,10,0.25)
ans =
.4744072
The following is a list of values of the binomial CDF for n = 10, p = 0.25, for all values of x =
0,1, …, 10:
-->B10=[];for j=0:10,B10=[B10 B(j,10,0.25)]; end; B10
B10 =
         column 1 to 7
!   .0563135    .2440252    .5255928    .7758751    .9218731    .9802723    .9964943 !
         column 8 to 11
!   .9995842    .9999704    .9999990    1. !
Poisson distribution
The pmf of the Poisson distribution can be used to calculate probabilities such as P(X=2) for λ =
5.2:
-->p(2,5.2)
ans =
.0745840
For P(X=6), the Poisson distribution with λ = 5.2 produces:
-->p(6,5.2)
ans =
.1514803
The cumulative distribution function for the Poisson distribution, with λ = 5.2, provides the
probability P(X≤6):
-->P(6,5.2)
ans =
.7323933
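The same probability can be obtained from SCILAB's built-in function cdfpoi. The sketch below assumes the calling sequence [P,Q]=cdfpoi("PQ",S,Xlam), where Xlam is the Poisson mean λ, and compares the result with a direct sum of the pmf; the variable names are illustrative.

```scilab
// Compare cdfpoi with a direct sum of the Poisson pmf for lambda = 5.2
lambda = 5.2; s = 6;
k = 0:s;
Psum = sum(exp(-lambda) .* lambda.^k ./ gamma(k+1));   // P(X<=6) by direct sum
[Pcdf, Qcdf] = cdfpoi("PQ", s, lambda);                // built-in CDF
mprintf("direct sum: %f   cdfpoi: %f\n", Psum, Pcdf);
```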
The following SCILAB commands produce a vector of values of the Poisson cdf for x = 1, 2, …,
10, and λ = 5.2:
-->P10=[];for j=1:10, P10=[P10 P(j,5.2)]; end; P10
P10 =
         column 1 to 7
!   .0342027    .1087867    .2380655    .406128    .580913    .7323933    .8449216 !
         column 8 to 10
!   .9180650    .9603256    .9823011 !
Geometric distribution
The probabilities P(X=3) and P(X=5) using the geometric distribution with p = 0.50 are
calculated as:
-->g(3,0.50)
ans =
.125
-->g(5,0.50)
ans =
.03125
The following example shows a way to calculate a vector of values of the geometric
distribution pmf for x = 1, 2, …, 10:
-->g([1:10],0.5)
ans =
         column 1 to 9
!   .5    .25    .125    .0625    .03125    .015625    .0078125    .0039063    .0019531 !
         column 10
!   .0009766 !
The following evaluations of the geometric distribution cdf are used to calculate the
probabilities P(X≤6), P(X≤3), and P(X≤1), respectively:
-->G(6,0.5)
ans =
.984375
-->G(3,0.5)
ans =
.875
-->G(1,0.5)
ans =
.5
A vector of values of the geometric distribution CDF, with p = 0.5, is produced by using the
following commands:
-->G10=[];for j=1:10, G10=[G10 G(j,0.5)]; end; G10
G10 =
         column 1 to 9
!   .5    .75    .875    .9375    .96875    .984375    .9921875    .9960938    .9980469 !
         column 10
!   .9990234 !
Hypergeometric distribution
The next line assigns values to the parameters N, n, and a in the hypergeometric distribution:
-->N=100;n=20;a=35;
The probability P(X=12) for the hypergeometric distribution with the parameters N, n, and a defined
above is calculated as:
-->h(12,N,n,a)
ans =
.0078581
The cumulative distribution function for the hypergeometric distribution for x = 12 is
calculated as follows:
-->H(12,N,n,a)
ans =
.9976693
The value just calculated represents the probability P(X≤12). The next statement generates a
vector of values of the hypergeometric pmf for x = 0, 1, 2, …, 20:
-->h([0:20],N,n,a)
ans =
         column 1 to 7
!   .0000529    .0008046    .0055295    .0228093    .0633073    .1256018    .1847085 !
         column 8 to 14
!   .2060210    .1768671    .1179114    .0613139    .0248839    .0078581    .0019176 !
         column 15 to 21
!   .0003575    .0000501    .0000051    3.698E-07    1.761E-08    4.924E-10    6.060E-12 !
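The next line collects in a vector the values of the hypergeometric pmf for x = 1, 2, …, 10. Note that the loop calls the pmf h; to obtain values of the CDF instead, replace h with H in the loop:
-->H10=[];for j=1:10,H10=[H10 h(j,N,n,a)]; end; H10
H10 =
         column 1 to 7
!   .0008046    .0055295    .0228093    .0633073    .1256018    .1847085    .2060210 !
         column 8 to 10
!   .1768671    .1179114    .0613139 !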
Continuous probability functions
In this section we describe several continuous probability distributions including the gamma,
exponential, beta, and Weibull distributions. Some of these distributions make use of the
Gamma function, Γ(x), which is defined next.
__________________________________________________________________________________
Factorials and the Gamma function (see also Chapter 13)
The Gamma function is defined by
$$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\, dx.$$
This function has the property that
$$\Gamma(\alpha) = (\alpha-1)\,\Gamma(\alpha-1), \quad \text{for } \alpha > 1;$$
therefore, it can be related to the factorial of a number, i.e.,
$$\Gamma(\alpha) = (\alpha-1)!,$$
when α is a positive integer.
Factorials have applications in combinatorics (calculation of combinations and permutations,
etc.), and in some discrete probability distributions (e.g., binomial probability distribution),
while the gamma function has applications in continuous probability distributions (e.g., the
gamma probability distribution.)
__________________________________________________________________________________
The gamma distribution
The probability density function (pdf) for the gamma distribution is given by
$$f(x) = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha-1} \exp\left(-\frac{x}{\beta}\right), \quad x > 0, \quad \alpha > 0, \quad \beta > 0.$$
The parameters α and β are referred to, respectively, as the shape and scale parameters of the
gamma distribution. Other parameters of this distribution are:
$$\mu_X = \alpha\beta, \quad \sigma_X^2 = \alpha\beta^2.$$
SCILAB provides function cdfgam for operations with the gamma distribution CDF. The calls to
this function take the form
[P,Q]=cdfgam("PQ",X,Shape,Scale)
[X]=cdfgam("X",Shape,Scale,P,Q)
[Shape]=cdfgam("Shape",Scale,P,Q,X)
[Scale]=cdfgam("Scale",P,Q,X,Shape)
where P = P(XX<X), Q = 1 - P, Shape = α, and Scale = β, with XX ~ gamma(α,β). Note, however, that the numerical results shown below are reproduced only if the argument called Scale is taken as the rate 1/β of the pdf given above (for instance, with Shape = 2 the first call returns Q = e^(-30)⋅(1+30) ≈ 2.901E-12). It is therefore advisable to check the parameter convention of the SCILAB version in use, for example with help cdfgam.
The following are examples of applications of function cdfgam. The following three calls
determine, respectively, the probabilities P = P(X<10), P = P(X<3), and P = P(X<0.5), as well as
the probabilities of the complement, Q = 1 - P, with Shape = 2 and Scale = 3:
-->[P,Q]=cdfgam("PQ",10,2,3)
Q =
    2.901E-12
P =
    1.
-->[P,Q]=cdfgam("PQ",3,2,3)
Q =
    .0012341
P =
    .9987659
-->[P,Q]=cdfgam("PQ",0.5,2,3)
Q =
    .5578254
P =
    .4421746
The next call to function cdfgam calculates the inverse gamma function, i.e., the value of x for
P = P(X<x) where X follows the gamma distribution with α = 2, β = 3:
-->x=cdfgam('X',2,3,0.4,0.6)
x =
.4588071
The next call to the function is used to calculate the shape parameter, α, given a probability P
= P(X<0.3) = 0.6, Q = 1-P = 0.4, with X following the gamma distribution with a scale parameter
β = 2:
-->alpha = cdfgam('Shape',2,0.6,0.4,0.3)
alpha =
.7190660
The next call to function cdfgam calculates the scale parameter, β, given a probability P =
P(X<1.2) = 0.2, Q = 1-P = 0.8, with X following the gamma distribution with α = 3:
-->beta = cdfgam('Scale',0.2,0.8,1.2,3)
beta =
1.2792035
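Because the meaning of the last argument of cdfgam (scale β or rate 1/β) may depend on the SCILAB version, it is worth testing it against a case with a known answer. For α = 2 the gamma CDF has the closed form F(x) = 1 - exp(-x/β)⋅(1 + x/β). The sketch below, with illustrative variable names, evaluates that expression under both interpretations and reports which one cdfgam follows.

```scilab
// Determine whether the last argument of cdfgam acts as a scale or as a rate
x = 3; al = 2; c = 3;                            // c is the value passed as "Scale"
[P, Q] = cdfgam("PQ", x, al, c);
Pscale = 1 - exp(-x/c) * (1 + x/c);              // closed form if c is the scale beta
Prate  = 1 - exp(-x*c) * (1 + x*c);              // closed form if c is the rate 1/beta
mprintf("cdfgam: %f   scale form: %f   rate form: %f\n", P, Pscale, Prate);
if abs(P - Pscale) < 1e-6 then
    disp("cdfgam treats its last argument as the scale parameter");
elseif abs(P - Prate) < 1e-6 then
    disp("cdfgam treats its last argument as the rate (1/scale)");
end
```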
The exponential distribution
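The exponential distribution is the gamma distribution with α = 1. Its pdf is given by
$$f_X(x) = \frac{1}{\beta} \exp\left(-\frac{x}{\beta}\right), \quad x > 0, \quad \beta > 0,$$
while its cdf is given by
$$F_X(x) = 1 - \exp(-x/\beta), \quad x > 0, \quad \beta > 0.$$
With this parameterization (β is the scale parameter, as in the gamma distribution with α = 1), the parameters of the exponential distribution are:
$$\mu_X = \beta, \quad \sigma_X = \beta.$$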
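The mean and standard deviation of the exponential distribution can be checked by numerical integration of the pdf. The sketch below uses SCILAB's intg function with β = 2.5; the function names xf and x2f and the finite upper limit of integration are assumptions made for this illustration.

```scilab
// Numerical mean and standard deviation of the exponential pdf, beta = 2.5
b = 2.5;                                   // scale parameter beta
function y = xf(x)
    y = x .* exp(-x ./ b) ./ b;            // x * f_X(x); b is read from the calling workspace
endfunction
function y = x2f(x)
    y = x.^2 .* exp(-x ./ b) ./ b;         // x^2 * f_X(x)
endfunction
upper = 40 * b;                            // the tail beyond this point is negligible
m  = intg(0, upper, xf);                   // numerical mean
m2 = intg(0, upper, x2f);                  // numerical second moment
s  = sqrt(m2 - m^2);                       // numerical standard deviation
mprintf("mean = %f   std = %f   beta = %f\n", m, s, b);
```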
The beta distribution
The pdf for the beta distribution is given by
$$f_X(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, x^{\alpha-1} (1-x)^{\beta-1}, \quad 0 < x < 1, \quad \alpha > 0, \quad \beta > 0.$$
As in the case of the gamma distribution, the corresponding cdf for the beta distribution is
also given by an integral with no closed-form solution.
The parameters of the beta distribution include
$$\mu_X = \frac{\alpha}{\alpha+\beta}, \quad \mathrm{Var}(X) = \frac{\alpha\beta}{(\alpha+\beta+1)(\alpha+\beta)^2}.$$
SCILAB provides function cdfbet for operations with the cumulative distribution function of the
beta distribution. Calls to the function are the following:
[P,Q]=cdfbet("PQ",X,Y,A,B)
[X,Y]=cdfbet("XY",A,B,P,Q)
[A]=cdfbet("A",B,P,Q,X,Y)
[B]=cdfbet("B",P,Q,X,Y,A)
In these calls P = P(XX<X), Y = 1 - X, Q = 1 - P, A, B are the parameters α and β of the beta
distribution.
Next, we present some applications of function cdfbet. The first example calculates the
probability P(X<0.35) for the beta distribution with α = 2, β = 3:
-->[P,Q]=cdfbet('PQ',0.35,1-0.35,2,3)
Q =
    .5629813
P =
    .4370187
An example that calculates the inverse function of the beta cdf, i.e., the value of x for which P
= P(X<x) = 0.75, for the beta distribution with α = 3, β = 5 is presented next:
-->[X,Y] = cdfbet("XY",3,5,0.75,1-0.75)
Y =
    .5139030
X =
    .4860970
The next two examples show how to obtain the parameters α and β of the beta distribution.
In the first application, α is calculated given β = 3.5, X = 0.3, Y = 1-X = 0.7, P = P(X<0.3) = 0.4, and Q = 1-P = 0.6.
In the second application, β is calculated given α = 1.5, X = 0.8, Y = 1-X = 0.2, P = P(X<0.8) = 0.6, and Q = 1-P = 0.4:
-->alpha = cdfbet("A",3.5,0.4,0.6,0.3,0.7)
alpha =
    2.0459494
-->beta = cdfbet("B",0.6,0.4,0.8,0.2,1.5)
beta =
    .7453948
The Weibull distribution
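The pdf for the Weibull distribution is given by
$$f(x) = \alpha \beta\, x^{\beta-1} \exp(-\alpha x^{\beta}), \quad x > 0, \quad \alpha > 0, \quad \beta > 0,$$
while the corresponding cdf is given by
$$F(x) = 1 - \exp(-\alpha x^{\beta}), \quad x > 0, \quad \alpha > 0, \quad \beta > 0.$$
Parameters of this distribution are:
$$\mu_X = \alpha^{-1/\beta}\, \Gamma\left(1 + \frac{1}{\beta}\right), \quad \mathrm{Var}(X) = \alpha^{-2/\beta} \left[ \Gamma\left(1 + \frac{2}{\beta}\right) - \Gamma^2\left(1 + \frac{1}{\beta}\right) \right].$$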
The uniform distribution
The uniform distribution for a continuous random variable is defined for values of X such that a
< x < b. The corresponding probability density function is given by
$$f_X(x) = \frac{1}{b-a}, \quad a < x < b.$$
The cumulative distribution function is
$$F_X(x) = \frac{x-a}{b-a}, \quad a < x < b.$$
The parameters of the uniform distribution are:
$$\mu_X = \frac{a+b}{2}, \quad \mathrm{Var}(X) = \frac{(b-a)^2}{12}.$$
The following function definition implements the cumulative distribution function for the
uniform distribution in SCILAB:
-->deff('[FF]=FX(x)','FF=(x-a)/(b-a)')
For values of a = 2.5 and b = 3.2, we proceed to calculate some probabilities:
--> a = 2.5; b = 3.2;
First, we calculate P(X<2.7) = FX(2.7):
-->FX(2.7)
ans =
.2857143
Next, we calculate P(X>3) = 1 - P(X<3) = 1 - FX(3):
-->1-FX(3)
ans =
.2857143
The following example calculates P(2.8<X<3) = P(X<3) - P(X<2.8) = FX(3) - FX(2.8):
-->FX(3)-FX(2.8)
ans =
.2857143
User-defined functions for continuous probability distributions
The following SCILAB script defines the probability density function and the cumulative distribution
function for four selected continuous distributions: gamma, exponential, beta, and Weibull.
The script is called ContinuousProbabilityFunctions, and is invoked by using:
-->exec('ContinuousProbabilityFunctions')
The listing of the script is the following:
//Define selected continuous probability functions
deff('[gg]=gam(x,a,b)','gg=x.^(a-1).*exp(-x./b)./(b.^a.*gamma(a))')
deff('[GG]=GAM(x,a,b)','GG=intg(0,x,gam)')
deff('[ee]=eex(x,b)','ee=exp(-x./b)./b')
deff('[EE]=EEX(x,b)','EE=1-exp(-x./b)')
//beta pdf: the exponent of (1-x) is b-1, as in the definition of the pdf
deff('[bb]=bet(x,a,b)',...
'bb=gamma(a+b).*x.^(a-1).*(1-x).^(b-1)./(gamma(a).*gamma(b))')
deff('[BB]=BET(x,a,b)','BB=intg(0,x,bet)')
deff('[ww]=w(x,a,b)','ww=a.*b.*x.^(b-1).*exp(-a.*x.^b)')
deff('[WW]=W(x,a,b)','WW=1-exp(-a.*x.^b)')
The functions defined through the script are summarized in the following table:
Distribution    pdf           CDF
gamma           gam(x,α,β)    GAM(x,α,β)
exponential     eex(x,β)      EEX(x,β)
beta            bet(x,α,β)    BET(x,α,β)
Weibull         w(x,α,β)      W(x,α,β)
Applications of these functions follow, starting with the gamma distribution.
The gamma distribution
First, we plot the pdf of the distribution using α = 2 and β = 3:
-->xx=(0:0.1:20);yy=gam(xx,2,3);
-->plot(xx,yy,'x','fX(x)','gamma distribution')
A plot of the gamma distribution CDF for α = 2 and β = 3 is obtained by using:
-->yyy=[];for x=0:0.1:20, yyy=[yyy GAM(x,2,3)]; end;
-->plot(xx,yyy,'x','FX(x)','gamma distribution')
The CDF can be used to calculate probabilities. The next three lines calculate the following
probabilities: P(X<5) = FX(5), P(6<X<11) = FX(11) - FX(6), and P(X>7.5) = 1 - P(X<7.5) = 1 - FX(7.5):
-->GAM(5,2,3)
ans = .4963317
-->GAM(11,2,3)-GAM(6,2,3)
ans = .2867187
-->1-GAM(7.5,2,3)
ans = .2872975
The exponential distribution
The following commands generate plots of the pdf and CDF for the exponential distribution
using β = 2.5:
-->xx=(0:0.1:20);yy=eex(xx,2.5);
-->plot(xx,yy,'x','fX(x)','exponential distribution')
-->yyy=[];for x=0:0.1:20, yyy=[yyy EEX(x,2.5)]; end;
-->plot(xx,yyy,'x','FX(x)','exponential distribution')
The following probabilities are calculated next for the exponential distribution with β = 2.5:
P(X<6) = FX(6), P(X>4) = 1 - P(X<4) = 1 - FX(4), and P(4<X<6) = FX(6)-FX(4):
-->EEX(6,2.5)
ans =
.9092820
-->1-EEX(4,2.5)
ans =
.2018965
-->EEX(6,2.5)-EEX(4,2.5)
ans =
.1111786
The beta distribution
To plot the pdf and CDF of the beta distribution with α = 2.5, β = 3.5, we use:
-->xx=(0:0.05:1);yy=bet(xx,2.5,3.5);
-->plot(xx,yy,'x','fX(x)','beta distribution')
-->yyy=[];for x=0:0.05:1, yyy=[yyy BET(x,2.5,3.5)]; end;
-->plot(xx,yyy,'x','FX(x)','beta distribution')
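The following probabilities are calculated next for the beta distribution with α = 2.5 and β = 3.5:
P(X<0.25) = FX(0.25), P(X>0.75) = 1 - P(X<0.75) = 1 - FX(0.75), and P(0.3<X<0.8) = FX(0.8)-FX(0.3).
Note that the values printed below correspond to a definition of bet in which the factor (1-x) is raised to the power b instead of b-1; that function integrates to β/(α+β) ≈ 0.583 over 0 < x < 1 rather than to 1. With the exponent β-1 of the pdf given earlier the results differ, and they can be checked against function cdfbet: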
-->BET(0.25,2.5,3.5)
ans =
.1737696
-->1-BET(0.75,2.5,3.5)
ans =
.4250376
-->BET(0.8,2.5,3.5)-BET(0.3,2.5,3.5)
ans =
.3428804
The Weibull distribution
Plots of the pdf and CDF for the Weibull distribution with α = 2 and β = 3 are obtained as
follows:
-->xx=(0:0.01:2);yy=w(xx,2,3);
-->plot(xx,yy,'x','fX(x)','Weibull distribution')
-->yyy=[];for x=0:0.01:2, yyy=[yyy W(x,2,3)]; end;
-->plot(xx,yyy,'x','FX(x)','Weibull distribution')
The following probabilities are calculated next for the Weibull distribution with α = 2 and β = 3:
P(X<1.5) = FX(1.5), P(X>0.6) = 1 - P(X<0.6) = 1 - FX(0.6), and P(0.5<X<1.2) =
FX(1.2)-FX(0.5):
-->W(1.5,2,3)
ans =
.9988291
-->1-W(0.6,2,3)
ans =
.6492094
-->W(1.2,2,3)-W(0.5,2,3)
ans =
.7472451
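Two further checks are possible for the Weibull distribution with α = 2 and β = 3. First, the formulas for the mean and variance given earlier can be compared with a numerical integration of the pdf. Second, because the CDF has a closed form, it can be inverted directly: solving P = 1 - exp(-α x^β) for x gives x = (-ln(1-P)/α)^(1/β). The following is a minimal sketch; the function names xw and x2w, the upper integration limit, and the choice P = 0.5 are assumptions made for illustration.

```scilab
// Weibull distribution, alpha = 2, beta = 3: moments and inverse CDF
a = 2; b = 3;
function y = xw(x)
    y = x .* a .* b .* x.^(b-1) .* exp(-a .* x.^b);      // x * f(x)
endfunction
function y = x2w(x)
    y = x.^2 .* a .* b .* x.^(b-1) .* exp(-a .* x.^b);   // x^2 * f(x)
endfunction
mFormula = a^(-1/b) * gamma(1 + 1/b);                     // mean from the formula
vFormula = a^(-2/b) * (gamma(1 + 2/b) - gamma(1 + 1/b)^2);// variance from the formula
mNum = intg(0, 5, xw);                                    // pdf is negligible beyond x = 5
vNum = intg(0, 5, x2w) - mNum^2;
mprintf("mean: %f (formula) %f (numerical)\n", mFormula, mNum);
mprintf("variance: %f (formula) %f (numerical)\n", vFormula, vNum);

P = 0.5;                                   // target probability (the median)
xq = (-log(1 - P) / a)^(1/b);              // inverse of F(x) = 1 - exp(-a*x^b)
mprintf("x such that P(X<x) = %f is %f\n", P, xq);
```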
Continuous probability distributions used in statistical inference
Statistical inference is the process by which sample data is used to provide information about
the population. Some of the products of statistical inference are the generation of confidence
intervals and the test of hypotheses for population parameters. There are a number of
continuous probability distributions of great utility in statistical inference. These are:
• the standard normal distribution
• the Student’s t distribution
• the Chi-square (χ2) distribution
• the F distribution
The probability density functions (pdf) for these distributions are presented below:
The Normal distribution
The expression for the normal distribution pdf is:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right],$$
where µ is the mean, and σ² the variance of the distribution.
SCILAB provides function cdfnor for operations with the cumulative distribution function for the
normal distribution. Function cdfnor was presented in detail in Chapter …. To find on-line
information on this function use the command:
-->help cdfnor
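As a brief illustration, the sketch below uses cdfnor, assuming the calling sequences [P,Q]=cdfnor("PQ",X,Mean,Std) and X=cdfnor("X",Mean,Std,P,Q), to calculate the probability that a standard normal variable falls within one standard deviation of the mean and the value below which 97.5% of the distribution lies. The variable names are illustrative.

```scilab
// Two standard calculations with cdfnor for the standard normal distribution
mu = 0; sigma = 1;
[P1, Q1] = cdfnor("PQ",  1, mu, sigma);    // P(X < mu + sigma)
[P2, Q2] = cdfnor("PQ", -1, mu, sigma);    // P(X < mu - sigma)
prob = P1 - P2;                            // P(mu - sigma < X < mu + sigma)
x975 = cdfnor("X", mu, sigma, 0.975, 0.025); // inverse CDF for P = 0.975
mprintf("P(-1 < Z < 1) = %f   z(0.975) = %f\n", prob, x975);
```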
The Student-t distribution
The Student-t, or simply, the t-, distribution has one parameter ν, known as the degrees of
freedom. The probability density function (pdf) is given by
$$f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)\sqrt{\pi\nu}} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}, \quad -\infty < t < \infty.$$
The following SCILAB commands can be used to plot the pdf for the Student t distribution with ν = 6:
-->deff('[f]=fT(t,nu)',...
-->'f=gamma((nu+1)./2).*(1+t.^2./nu).^(-(nu+1)/2)/(sqrt(%pi*nu)*gamma(nu/2))')
-->tt=[-4:0.1:4];ff=fT(tt,6);
-->plot(tt,ff,'t','fT(t)','Student t - nu = 6')
SCILAB provides function cdft for operations with the cumulative distribution function of the
Student’s t distribution. The calls to the function are as follows:
[P,Q]=cdft("PQ",T,Df)
[T]=cdft("T",Df,P,Q)
[Df]=cdft("Df",P,Q,T)
In these function calls, P = P(TT<T), Q = 1 - P, Df = degrees of freedom = ν, with TT ~ Student
t(Df).
-->[P,Q] = cdft("PQ",0.4,6)                //Probability calculation
Q =
    .3515041
P =
    .6484959
-->t = cdft("T",8,0.45,1-0.45)             //Inverse CDF calculation
t =
  - .1297073
-->nu = cdft("Df",0.7,0.3,0.8)             //Obtaining degrees of freedom
nu =
    .7716700
A plot of the CDF for the Student t distribution can be produced using the following commands:
-->xx=[-4:0.1:4];
-->yy=[];for x=-4:0.1:4, yy=[yy cdft('PQ',x,6)]; end;
-->plot(xx,yy,'t','fX(t)','Student t - nu = 6')
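The pdf expression and function cdft can be checked against each other by integrating the pdf numerically. The sketch below does so for ν = 6; the function name fT6 and the finite integration limits are assumptions made for illustration (the neglected tails contribute on the order of 1E-9).

```scilab
// Student t, nu = 6: integrate the pdf numerically and compare with cdft
nu = 6;
function f = fT6(t)
    // Student t pdf; nu is read from the calling workspace
    f = gamma((nu+1)/2) .* (1 + t.^2 ./ nu).^(-(nu+1)/2) ./ (sqrt(%pi*nu) .* gamma(nu/2));
endfunction
area = intg(-60, 60, fT6);                 // should be close to 1
Pnum = intg(-60, 0.4, fT6);                // P(T < 0.4) from the pdf
[Pcdf, Qcdf] = cdft("PQ", 0.4, nu);        // P(T < 0.4) from cdft
mprintf("area = %f   P from pdf = %f   P from cdft = %f\n", area, Pnum, Pcdf);
```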
The Chi-squared (χ2) distribution
The Chi-squared (χ2) distribution has one parameter ν, known as the degrees of freedom. The
probability density function (pdf) is given by
$$f(x) = \frac{1}{2^{\nu/2}\,\Gamma\left(\frac{\nu}{2}\right)}\, x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}, \quad \nu > 0, \quad x > 0.$$
A plot of the CDF for the Chi-square distribution with ν = 4 can be obtained by using function cdfchi, which is described below:
-->xx = [0:0.1:10];
-->yy=[];for x=0:0.1:10, yy=[yy cdfchi('PQ',x,4)]; end;
-->plot(xx,yy,'x','FX(x)','Chi-square - nu = 4')
SCILAB provides function cdfchi for operations with the cumulative distribution function of the
χ2 (chi-square) distribution. The calls to this function include:
[P,Q]=cdfchi("PQ",X,Df)
[X]=cdfchi("X",Df,P,Q);
[Df]=cdfchi("Df",P,Q,X)
In these calls to function cdfchi P = P(XX<X), Q = 1 - P, Df = degrees of freedom, with XX ~ χ2
(Df).
-->[P,Q] = cdfchi("PQ",1,10)            //Probability calculation
 Q  =
    .9998279
 P  =
    .0001721
-->[P,Q] = cdfchi("PQ",0.2,10)          //Probability calculation
 Q  =
    .9999999
 P  =
    7.668E-08
-->chi2 = cdfchi("X",4,0.4,0.6)         //Inverse CDF calculation
 chi2  =
    2.7528427
-->nu = cdfchi("Df",0.4,0.6,2.7)        //Calculating degrees of freedom
 nu  =
    3.9409085
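Because the CDF is the integral of the pdf, the value returned by cdfchi can be cross-checked by integrating the Chi-square pdf numerically. The following is a minimal sketch under stated assumptions: the function name chi2pdf, the upper limit xq = 12, and the 1201-point grid are choices made here for illustration, and the integration uses the trapezoidal rule through function inttrap.

```scilab
// Chi-square pdf, written directly from the formula for f(x)
function f = chi2pdf(x, nu)
    f = x.^(nu/2-1).*exp(-x./2)./(2.^(nu/2).*gamma(nu/2));
endfunction

nu = 10;                               // degrees of freedom
xq = 12;                               // upper limit of integration
xg = linspace(0, xq, 1201);            // integration grid, step 0.01
area = inttrap(xg, chi2pdf(xg, nu));   // trapezoidal rule
[P,Q] = cdfchi("PQ", xq, nu);          // reference value from cdfchi
disp([area P])                         // the two numbers should nearly agree
```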
A plot of the pdf for the Chi-square distribution with ν = 10 is obtained by using:
-->deff('[f]=fC(x,nu)',...
-->'f=x.^(nu/2-1).*exp(-x./2)/(2.^(nu/2).*gamma(nu./2))')
-->cc=[0:0.1:30];ff=fC(cc,10);
-->plot(cc,ff,'chi^2','fC(chi^2)','Chi-square - nu = 10')
The F distribution
The F distribution has two parameters νN = numerator degrees of freedom, and νD =
denominator degrees of freedom. The probability density function (pdf) is given by
$$f(F) = \frac{\Gamma\left(\frac{\nu_N+\nu_D}{2}\right)\left(\frac{\nu_N}{\nu_D}\right)^{\nu_N/2} F^{\nu_N/2-1}}{\Gamma\left(\frac{\nu_N}{2}\right)\Gamma\left(\frac{\nu_D}{2}\right)\left(1+\frac{\nu_N F}{\nu_D}\right)^{(\nu_N+\nu_D)/2}}, \quad \nu_N > 0,\ \nu_D > 0,\ F > 0.$$
A plot of the F-distribution pdf for νN = 4, νD = 6, is obtained by using:
-->deff('[f]=fF(F,nuN,nuD)',...
-->'f=gamma((nuN+nuD)./2).*(nuN./nuD).^(nuN./2).*F.^(nuN./2-1)./...
-->(gamma(nuN./2).*gamma(nuD./2).*(1+nuN.*F./nuD).^((nuN+nuD)./2))')
-->xx=[0:0.1:10];ff=fF(xx,4,6);
-->plot(xx,ff,'F','fF(F)','F distribution - nuNum = 4 - nuDen = 6')
SCILAB provides the function cdff for operations with the cumulative distribution function of
the F distribution.
[P,Q]=cdff("PQ",F,Dfn,Dfd)
[F]=cdff("F",Dfn,Dfd,P,Q);
[Dfn]=cdff("Dfn",Dfd,P,Q,F);
[Dfd]=cdff("Dfd",P,Q,F,Dfn)
In these calls of the function cdff, P = P(FF<F), Q = 1 - P, Dfn and Dfd = degrees of freedom in
the numerator and denominator of F.
-->[P,Q] = cdff("PQ",1.2,6,12)          //Probability calculation
 Q  =
    .3697351
 P  =
    .6302649
-->F = cdff("F",10,2,0.4,0.6)           //Inverse CDF calculation
 F  =
    .9944093
-->nuNum= cdff('Dfn',5,0.4,0.6,0.8)     //calculating degrees of freedom
 nuNum  =
    5.3847039
A plot of the F-distribution CDF is produced through the following SCILAB commands:
-->xx = [0:0.1:10];
-->yy=[];for x=0:0.1:10, yy=[yy cdff('PQ',x,4,6)]; end;
-->plot(xx,yy,'t','fX(t)','F - nuNum = 4 - nuDen = 6')
Applications of the normal distribution in data analysis
The normal distribution, also known as the bell curve, appears commonly when determining
the frequency distribution of different types of physical measurements. We first introduced
the normal distribution in Chapter 14 as an example of a continuous probability distribution. In
this section we present some applications of this probability distribution in data analysis.
The probability density function, pdf, for a general normal distribution, X, with a mean value,
µ, and a standard deviation, σ, is given by
$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right], \quad \sigma > 0,\ -\infty < x < \infty.$$
The standard normal distribution has mean value µ = 0 and standard deviation σ = 1.
SCILAB provides function cdfnor for operations with the normal cumulative distribution
function. The different forms of the call to the function were presented in detail in Chapter …,
and are repeated here:
[p,q] = cdfnor("PQ",x,mu,sigma)
[x] = cdfnor("X",mu,sigma,p,q)
[mu] = cdfnor("Mean",sigma,p,q,x)
[sigma] = cdfnor("Std",p,q,x,mu)
where mu is the mean value (µ), sigma is the standard deviation (σ), p = P(X<x), and q = 1 - p =
P(X>x). The first argument in the different calls to cdfnor is a string that indicates the type of
result expected:
"PQ"    -  to request probabilities p and q
"X"     -  to request a value of the normal variable
"Mean"  -  to request the mean of the distribution
"Std"   -  to request the standard deviation of the distribution
Because the normal distribution is commonly found in the analysis of physical measurements, it
is often recommended that you check whether your data set (your sample) follows the normal
distribution. In this section we present two graphical approaches for checking whether your data
follow the normal distribution. The first consists of superimposing a normal distribution pdf,
based on the mean value and standard deviation of the sample, on top of the sample
histogram. The second approach consists of plotting the data against what are commonly known
as their normal scores. The resulting graph is equivalent to plotting the data on normal
probability paper, i.e., paper with one scale representing the normal probability
corresponding to the data set. These two approaches are described next.
Plotting a histogram and its corresponding normal curve
The purpose of this plot is to visually check if the histogram of a sample, with a suitable
number of classes, matches a superimposed normal curve. For that purpose we propose the
following SCILAB user-defined function, histnorm:
function [chi2,cmark,fcount]=histnorm(x, xclass)
//This function calculates the frequency distribution
//for the data in (row) vector x according to the
//class boundaries contained in the (row) vector
//xclass. It also produces a histogram of the
//data and the normal curve that best fit the data.
//
//Typical call: [chi2,cm,f] = histnorm(x,xclass)
//where cm = class marks, f = frequency count,
//      chi2 = chi-square parameter for the fitting
[m n] = size(x);            //Sample size
[m nB] = size(xclass);      //Number of class boundaries
k = nB - 1;                 //Number of classes
//Calculate class marks
cmark = zeros(1,k);
for ii = 1:k
cmark(ii) = 0.5*(xclass(ii)+xclass(ii+1));
end
//Initialize frequency counts to zero
fcount=zeros(1,k);
fbelow=0; fabove=0;
//Accumulate frequency counts
for ii = 1:n
if x(ii) < xclass(1)
fbelow = fbelow + 1;
elseif x(ii) > xclass(nB)
fabove = fabove + 1;
else
for jj = 1:k
if x(ii)>= xclass(jj) & x(ii)< xclass(jj+1)
fcount(jj) = fcount(jj) +1;
end
end
end
end
//define normal CDF, calculate xbar, sx, chi-square parameter
nn = sum(fcount);
xbar = mean(x);
sx = st_deviation(x);
xmin = min(xclass); xmax = max(xclass);
pk = [];
for j = 1:k+1
pk = [pk cdfnor("PQ",xclass(j),xbar,sx)];
end;
p_in_classes = pk(k+1)-pk(1);
pxclass = pk(2:k+1) - pk(1:k);
fc = pxclass*nn*p_in_classes;
//Chi square parameter
chi2=0;
for j = 1:length(fc)
chi2 = chi2 + (fcount(j)-fc(j))^2/fc(j);
end;
//Produce normal distribution for data
Dx = (xmax-xmin)/100;
xx = [xmin:Dx:xmax];
xxx = xx(1:100) + Dx/2;
pkk = [];
for j = 1:101
pkk = [pkk cdfnor("PQ",xx(j),xbar,sx)];
end;
pp = pkk(2:101) - pkk(1:100);
fcc = pp*p_in_classes*nn*100/k;
//Determine plot rectangle
ymin = 0;
ymaxf = max(fcount); ymaxy = max(fcc);
ymax = max(ymaxf,ymaxy);
ymax = int(1.1*ymax);
plotrectangle = [xmin ymin xmax ymax];
//plot the histogram and normal curve
xp = xclass(1:k);
xset('window',1);xbasc(1);
plot2d2('onn',xclass',[fcount fcount(k)]',[1],'011','y',[xmin ymin xmax ymax]);
plot2d3('onn',xp',fcount',[1],'000');
plot2d(xxx',fcc',[2],'000');
xtitle('Histogram with normal curve','x','frequency');
//end function histnorm
Notice that this function uses SCILAB function cdfnor to calculate values of the cumulative
distribution function for the normal distribution where needed.
The general call to the function is:
[chi2,cm,f] = histnorm(x,xclass)
which returns, in general, the class marks, cm, the frequency count, f, and a chi-square
parameter defined as
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - fc_i)^2}{fc_i},$$
where fi is the actual frequency count for the ith class, fci is the estimated frequency count
obtained from the normal distribution for the ith class, and k is the number of classes in the
frequency distribution.
The χ2 parameter follows the chi-square distribution with ν = k-1 degrees of freedom, and it is
used to check the hypothesis that the frequency distribution under consideration follows
indeed the normal distribution. The subject of hypothesis testing is developed in Chapter …;
we therefore delay until then the use of the parameter returned from function histnorm.
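The summation that defines the chi-square parameter can also be written as a single vectorized expression instead of a loop. The lines below are a minimal sketch with hypothetical numbers: the observed counts fobs are an illustrative set for 10 classes, and the expected counts fexp are assumed equal (20 per class) purely to keep the arithmetic easy to follow; in function histnorm the expected counts come from the fitted normal distribution instead.

```scilab
// Observed counts for 10 classes (illustrative values, 200 data points)
fobs = [20 18 27 18 23 22 16 18 14 24];
// Expected counts: ASSUMED equal (20 per class) for illustration only
fexp = 20*ones(fobs);
// Chi-square parameter: sum of (observed - expected)^2 / expected
chi2 = sum((fobs - fexp).^2 ./ fexp);
disp(chi2)     // 142/20 = 7.1
```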
Application of the function histnorm
In this example we apply function histnorm to a set of 200 data values between 0 and 100
generated using function rand as follows:
-->x = int(100*rand(1,200));
First, we check the minimum and maximum value of the data:
-->min(x), max(x)
ans =
0.
ans =
99.
A set of class boundaries of 0, 10, 20, …, 100, will produce 10 classes for this sample:
-->xclass = [0:10:100];
Next, we load the function histnorm and apply the function to the data stored in x using the
class boundaries stored in xclass:
-->getf('histnorm')
-->histnorm(x,xclass)
ans =
1.9583514
The value returned is the chi-square parameter for the normal curve fitting. The plot of the
histogram with the super-imposed normal curve is:
A second example for the same data sample is presented next in which we use 20 classes, with
class boundaries 0, 5, 10, …, 95, 100, to classify the data:
-->xclass=[0:5:100];
The results from function histnorm are the chi-square parameter and the following plot:
-->histnorm(x,xclass)
ans =
2.0146916
The function can be invoked with a vector of three values in the left-hand side to produce not
only the chi-square parameter and the plot, but also the class marks and the frequency count
of the sample:
-->[X2,cm,f] = histnorm(x,[0:10:100])
 f  =
         column 1 to 9
!   20.    18.    27.    18.    23.    22.    16.    18.    14. !
         column 10
!   24. !
 cm  =
!   5.    15.    25.    35.    45.    55.    65.    75.    85.    95. !
 X2  =
    1.9583514
Notice that in the two graphs shown above, the normal curve does not fit the histograms very
well. The main reason is that the data were generated from a uniform distribution (i.e., using
the default settings of SCILAB’s function rand) and not from a normal distribution. Later in
this chapter we deal with the generation of data other than from a uniform distribution, and
will be using function histnorm to check how well those data fit the normal distribution.
Plotting data against their normal scores
Assume that the continuous random variable X follows the normal distribution with mean µ and
standard deviation σ. Given a probability p (0<p<1) such that P(X<x)=p with X ~ N(µ,σ), then
the value of x is referred to as the normal score for p. [Note: In some references in the
statistical literature the normal scores are related to a probability α = 1 - p, so that if P(X>xα) =
α, with X ~ N(µ,σ), xα is the normal score for α.]
Suppose that we have an ordered data set, xp = {xp1<xp2< …<xpn}, that follows the normal
distribution with mean and standard deviation equal to the sample’s mean (x̄) and standard
deviation (sx). Also, assume that the probability of the interval [xpi, xpi+1] is the same for all
values of i = 1, 2, …, n-1, say P(xpi<X<xpi+1) = q. Also, assume that P(X<xp1) = P(X>xpn) = q.
Thus, the entire area under the normal curve is split into n+1 sub-regions of the same area q as
illustrated in the figure below.
The value of q is, therefore, q = 1/(n+1), and we can write:
P(X<xp1) = q, P(X<xp2) = 2q, …, P(X<xpi) = iq, …, P(X<xpn) = nq.
In general,
P(X<xpi) = i/(n+1) = pi,
for i = 1, 2, …, n. The values pi are referred to as plotting positions, for they are used to
obtain the corresponding normal scores xpi.
Given an ordered data set, x = {x1 < x2 < … < xn}, of size n, we can generate a vector of
plotting positions, pi = i/(n+1), and obtain a set of normal scores xpi by using the function call
cdfnor("X",x̄,sx,pi,1-pi), where x̄ and sx are the mean and standard deviation of the data set.
If the given data set, x, does indeed follow the normal distribution with mean µ = x̄ and
standard deviation σ = sx, a plot of normal scores xp versus the original data x should produce a
straight line.
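The calculation of plotting positions and normal scores just described can be sketched in a few vectorized lines. This is only an illustration under stated assumptions: the sample values are hypothetical, and the sketch uses functions gsort and stdev from recent SCILAB versions (older versions provide st_deviation instead of stdev); the arguments of cdfnor are expanded to equal-size vectors with ones(p).

```scilab
// Hypothetical sample (illustrative values only)
x  = [12.1 9.8 11.4 10.3 13.0 10.9 11.8];
xs = gsort(x, "g", "i");            // order sample in increasing order
n  = length(xs);                    // sample size
p  = (1:n)/(n+1);                   // plotting positions p_i = i/(n+1)
xm = mean(xs);                      // sample mean
sx = stdev(xs);                     // sample standard deviation
// Normal scores: inverse normal CDF evaluated at the plotting positions
xp = cdfnor("X", xm*ones(p), sx*ones(p), p, 1-p);
disp([xs; xp])                      // ordered data (row 1), normal scores (row 2)
```

For an odd sample size the middle plotting position is 0.5, so the middle normal score coincides with the sample mean.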
A function to produce a plot of data versus normal scores
The following function, normplot, takes as input a data set, or sample, x = {x1, x2, …, xn},
orders it in increasing order, obtains the plotting positions pi, calculates the normal scores xpi,
and plots the normal scores versus the ordered data. It also plots a straight line representing
y=x, or the exact fitting for a normal distribution. The closer the plot of normal scores vs.
data is to the straight line representing the exact fitting for a normal distribution, the closer
the data set follows the normal distribution.
function normplot(x)
//This function produces a normal probability
//paper plot for the data in (row) vector x
xx = sortup(x);            //order sample in increasing order
xm = mean(xx);             //mean of sample
sx = st_deviation(xx);     //standard deviation of sample
nn = length(x);            //sample size
//Calculating plotting positions and normal scores
pp = []; xp = [];
for j = 1:nn
pp = [pp j/(nn+1)];
xp = [xp cdfnor("X",xm,sx,pp(j),1-pp(j))];
end;
//Determine the plotting rectangle
xmin1 = min(xx); xmin2 = min(xp); xmin = min(xmin1,xmin2);
xmax1 = max(xx); xmax2 = max(xp); xmax = max(xmax1,xmax2);
ymin = min(xp); ymax = max(xp);
//Produce a graduated scale
[xminp, xmaxp, nxp] = graduate(xmin,xmax);
[yminp, ymaxp, nyp] = graduate(ymin,ymax);
//Plot scores vs. data and exact normal distribution fitting
plot2d(xp',xp',[ 1],'011','y',[xminp yminp xmaxp ymaxp])
xset('mark',-9,2);
plot2d(xx',xp',[-9],'011','y',[xminp yminp xmaxp ymaxp])
xtitle('Normal probability plot','x','z');
//end function normplot
An application of this function is shown next. First, we produce a sample of 200 data points
using a uniform distribution. Next, we load function normplot and produce the normal
probability plot.
-->x =int(100*rand(1,200));
-->getf('normplot')
-->normplot(x)
The resulting graph shows that the data do not follow the normal distribution, particularly
near the lowest and highest values of the data set.
The lognormal distribution
If the random variable Y = ln(X) follows the normal distribution with mean µY = µln(X) and
standard deviation σY = σln(X), then we say that the random variable X follows the lognormal
distribution. The probability density function of the lognormal distribution is given by
$$f_X(x) = \frac{1}{\sigma_{\ln(X)}\, x \sqrt{2\pi}} \exp\left[-\frac{(\ln x - \mu_{\ln(X)})^2}{2\sigma_{\ln(X)}^2}\right], \quad x > 0,$$
with
$$\mu_X = \exp\left(\mu_{\ln(X)} + \frac{1}{2}\sigma_{\ln(X)}^2\right), \quad Var(X) = \exp(\sigma_{\ln(X)}^2)\left(\exp(\sigma_{\ln(X)}^2) - 1\right)\exp(2\mu_{\ln(X)}).$$
To calculate probabilities we can use the normal distribution cdf after taking the
natural log of the variable. For example, if X ~ lognormal(µln(X)=1.2, σln(X)=0.5), the
probability P(X<2) is P(X<2) = P(ln(X)<ln(2)) = P(Y<0.6931), where Y ~ N(1.2, 0.5). We can
use function cdfnor to calculate this probability in SCILAB as follows:
-->cdfnor("PQ",log(2),1.2,0.5)
ans =
.1553616
To evaluate the inverse cumulative distribution function, i.e., to find the value x
for which P(X<x) = 0.35, given µln(X)=1.2, σln(X)=0.5, we can use:
-->cdfnor("X",1.2,0.5,0.35,0.65)
ans =
1.0073398
The previous result actually gives a value of Y = ln(X) with Y ~ N(1.2, 0.5). The corresponding
value of X is calculated as X = exp(Y), i.e.,
-->exp(ans)
ans =
2.7383068
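The two calculations above (take the log before calling cdfnor, take the exponential after inverting it) can be wrapped into small helper functions. The following is a minimal sketch; the names lognorcdf and lognorinv are hypothetical and are not SCILAB built-in functions.

```scilab
// Lognormal CDF: P(X<x) = P(Y<ln(x)) with Y ~ N(mu,sigma)
function p = lognorcdf(x, mu, sigma)
    [p, q] = cdfnor("PQ", log(x), mu*ones(x), sigma*ones(x));
endfunction

// Lognormal inverse CDF: x = exp(y), with y the normal inverse CDF at p
function x = lognorinv(p, mu, sigma)
    x = exp(cdfnor("X", mu*ones(p), sigma*ones(p), p, 1-p));
endfunction

disp(lognorcdf(2, 1.2, 0.5))       // compare with .1553616 obtained above
disp(lognorinv(0.35, 1.2, 0.5))    // compare with 2.7383068 obtained above
```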
A graph of the lognormal probability density function for µln(X)=1.2, σln(X)=0.5 is produced by
using:
-->deff('[ff]=fX(x,mu,sigma)',...
-->'ff=exp(-(log(x)-mu).^2./(2.*sigma.^2))./(sigma.*x.*sqrt(2.*%pi))')
-->mu=1.2;sigma=0.5;xx=[0.01:0.1:10];yy=fX(xx,mu,sigma);
-->plot(xx,yy,'x','fX(x)','Log-normal pdf')
Generating synthetic data
In this section we present pre-defined and user-defined functions that allow us to generate
data that follow a particular probability distribution. We refer to such data as synthetic
data.
Generating normally-distributed synthetic data
In the examples presented in the previous section on applications of the normal distribution we
generated data by using the function rand, which, by default, produces random data uniformly
distributed in the interval [0,1]. The function rand can also be used to produce normally
distributed data, z, that follows the standard normal distribution, i.e., Z ~ N(0,1), by, first,
using the function call
rand('normal')
and next using the function call
rand(n,m)
where n and m are integers. The last call to function rand will produce a matrix of n rows and
m columns whose elements are random numbers following the standard normal distribution.
Recalling that the standardized normal variate is defined as
Z = (X-µ)/σ,
values of x can be obtained from values of z by using
x = µ + σz.
The following example illustrates how to use function rand, after the call rand('normal'), to
produce 200 data points that follow the normal distribution with mean µ = 150, and standard
deviation σ = 50:
-->x = 150 + 50.*rand(1,200);
To verify that the data do indeed follow the normal distribution, we use functions histnorm and
normplot applied to this data set. To use function histnorm, we first determine the minimum
and maximum values of the data set to determine which class boundaries to use in the histogram:
-->xmin = min(x), xmax = max(x)
 xmin  =
    34.558873
 xmax  =
    317.59609
We select for class boundaries the values 25, 50, 75, …, 300, 325:
-->xclass = [25:25:325];
The resulting histogram and superimposed normal curve are shown next:
-->histnorm(x,xclass);
The fitting of the histogram to the corresponding normal curve is relatively good, in spite of
the apparent discrepancy towards the center of the data. We can also use function normplot
to check the normality of the data as follows:
-->normplot(x)
The resulting normal probability plot is:
The plot suggests that the data follows the normal distribution for most of the range except for
values larger than about 220.
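As a complement to the graphical checks, the transformation x = µ + σz can be checked numerically by comparing the sample mean and standard deviation with the target values. The lines below are a minimal sketch; the seed value 0 and the sample size of 5000 are arbitrary choices made here, and stdev is the name of the standard deviation function in recent SCILAB versions (st_deviation in older ones).

```scilab
rand("seed", 0);            // fix the seed so the run can be repeated
rand("normal");             // switch rand to the standard normal generator
z = rand(1, 5000);          // z ~ N(0,1)
x = 150 + 50*z;             // x = mu + sigma*z with mu = 150, sigma = 50
disp([mean(x) stdev(x)])    // should be close to 150 and 50
```

A larger sample is used here than in the example above only so that the sample statistics settle closer to the target values.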
Additional applications of function rand
SCILAB’s function rand, like most numerical random number generators, uses a number, known
as the seed, to produce random numbers. To find out the current value of the seed in
function rand use:
-->rand('seed')
 ans  =
    8.096E+08
To find out which type of random number generator is active in function rand (i.e., normal or
uniform) use:
-->rand('info')
 ans  =
 normal
To change the function rand back to uniform use:
-->rand('uniform')
To change the seed to the number 15, for example, use:
-->rand('seed',15)
The first 10 random numbers generated by rand after seeding it with a value of 15 are:
-->rand(1,10)
 ans  =
         column 1 to 5
!   .1018111    .5348560    .9628528    .1235873    .6667947 !
         column 6 to 10
!   .4106913    .6578733    .6756193    .1201851    .0268646 !
After generating those 10 random numbers the value of seed has changed to:
-->rand('seed')
 ans  =
    57691269.
If, for some reason, you need to re-start the previous sequence of random numbers, you can
simply re-seed function rand with the value of 15:
-->rand('seed',15)
Check that you get the same sequence of random numbers by comparing the following 5
random numbers with the first 5 random numbers generated earlier after using seed = 15:
-->rand(1,5)
 ans  =
!   .1018111    .5348560    .9628528    .1235873    .6667947 !
SCILAB function for generating synthetic data
SCILAB provides function grand (generating random numbers) to generate a vector or matrix
with data that follow, among others, the following distributions: binomial, Poisson, gamma,
beta, exponential, uniform integer, uniform real, normal, chi-squared, and F. Two
general calls to the function are:
[x] = grand(m,n,dist_type,dist_parameters)
[x] = grand(A,dist_type,dist_parameters)
where dist_type is a string identifying the type of distribution, and dist_parameters is a list of
the parameters defining the distribution. In the first form of the call the values m and n
represent the number of rows and columns of a matrix to be generated containing random
numbers that follow the desired distribution. In the second form of the function call an
existing matrix A is provided so that the function generates a new matrix with the same
dimensions as A containing the random numbers that follow the desired distribution.
The following strings identify the type of distribution requested. We also identify the
parameters required for each distribution:
String    Distribution       Parameters
'bin'     Binomial           N, P
'poi'     Poisson            λ
'bet'     Beta               α, β
'gam'     Gamma              α = shape, β = scale
'exp'     exponential        µ=1/β
'nor'     normal             µ, σ
'chi'     chi-square         ν
'f'       F                  νN, νD
'uin'     uniform integer    a, b
'unf'     uniform real       a, b
The specific function calls for each probability distribution are shown next:
Binomial:         x=grand(m,n,'bin',N,P), x=grand(A,'bin',N,P)
Poisson:          x=grand(m,n,'poi',λ), x=grand(A,'poi',λ)
Beta:             x=grand(m,n,'bet',α,β), x=grand(A,'bet',α,β)
Gamma:            x=grand(m,n,'gam',α,β), x=grand(A,'gam',α,β)
Exponential:      x=grand(m,n,'exp',µ), x=grand(A,'exp',µ)
Normal:           x=grand(m,n,'nor',µ,σ), x=grand(A,'nor',µ,σ)
Chi-square:       x=grand(m,n,'chi',ν), x=grand(A,'chi',ν)
F-distribution:   x=grand(m,n,'f',νN,νD), x=grand(A,'f',νN,νD)
Uniform integer:  x=grand(m,n,'uin',a,b), x=grand(A,'uin',a,b)
Uniform real:     x=grand(m,n,'unf',a,b), x=grand(A,'unf',a,b)
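A simple way to confirm that a call to grand uses the intended parameter is to compare sample statistics with their theoretical values. The lines below are a minimal sketch for the Poisson case, for which both the mean and the variance equal λ; the variable name lambda and the sample size of 5000 are choices made here for illustration.

```scilab
lambda = 12.5;                        // Poisson parameter
x = grand(1, 5000, "poi", lambda);    // 5000 Poisson-distributed values
disp([mean(x) variance(x)])           // both should be close to lambda
```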
Examples of synthetic data generation using function grand
The following examples demonstrate how to use function grand to generate sets of 200 data
points that follow specific probability distributions. After the data are generated we
determine their maximum and minimum values, select class boundaries for histograms of the
data, and use functions histnorm and normplot to check how close the data are to normality.
We start the exercises by loading these two functions:
-->getf('histnorm');getf('normplot');
Binomial data
-->x=grand(1,200,'bin',20,0.35);xmin=min(x),xmax=max(x)
xmin = 2.
xmax = 14.
-->xclass=[2:2:14];xset('window',1);histnorm(x,xclass);
-->xset('window',2);normplot(x);
Poisson data
-->x=grand(1,200,'poi',12.5);xmin=min(x),xmax=max(x)
xmin = 4.
xmax = 23.
-->xclass=[4:2:24];xset('window',1);histnorm(x,xclass);
-->xset('window',2);normplot(x);
Beta data
-->x=grand(1,200,'bet',2,3);xmin=min(x),xmax=max(x)
xmin = .0480813
xmax = .9132797
-->xclass=[0:0.1:1];xset('window',1);histnorm(x,xclass);
-->xset('window',2);normplot(x);
Gamma data
-->x=grand(1,200,'gam',2,3);xmin=min(x),xmax=max(x)
xmin = .0042184
xmax = 2.6455776
-->xclass=[0:0.4:2.8];xset('window',1);histnorm(x,xclass);
-->xset('window',2);normplot(x);
Normal data
-->x=grand(1,200,'nor',2500,1250);xmin=min(x),xmax=max(x)
xmin = 1294.6718
xmax = 6467.2541
-->xclass=[-1000:1000:7000];xset('window',1);histnorm(x,xclass);
-->xset('window',2);normplot(x);
Chi-square data
-->x=grand(1,200,'chi',12);xmin=min(x),xmax=max(x)
xmin = 3.8312405
xmax = 28.583772
-->xclass=[0:3:30];xset('window',1);histnorm(x,xclass);
-->xset('window',2);normplot(x);
F distribution data
-->x=grand(1,200,'f',10,5);xmin=min(x),xmax=max(x)
xmin = .110966
xmax = 53.694396
-->xclass=[0:10:60];xset('window',1);histnorm(x,xclass);
-->xset('window',2);normplot(x);
-->xclass=[0:2:12];histnorm(x,xclass);
-->xclass=[0:0.5:6];histnorm(x,xclass);
Uniform integer data
-->x=grand(1,200,'uin',-5,5);xmin=min(x),xmax=max(x)
xmin = -5.
xmax = 5.
-->xclass=[-5:1:5];xset('window',1);histnorm(x,xclass);
-->xset('window',2);normplot(x);
Uniform real data
-->x=grand(1,200,'unf',-5,5);xmin=min(x),xmax=max(x)
xmin = -4.9677424
xmax = 4.9660118
-->xclass=[-5:1:5];xset('window',1);histnorm(x,xclass);
-->xset('window',2);normplot(x);
Additional notes on function grand
The previous examples were used to illustrate applications of function grand to the generation
of data that follow the binomial, Poisson, gamma, beta, exponential, normal, chi-square, F,
uniform integer, and uniform real distributions. Function grand allows the user to obtain data
that follow other distributions that are not presented in this book, such as the negative
binomial distribution, the multinomial distribution, the non-central F distribution, and the
non-central chi-square distribution. (To find information about these and other distributions
consult a statistics and probability textbook such as Spanos, A., 1999, “Probability Theory and
Statistical Inference - Econometric Modeling with Observational Data,” Cambridge University
Press, Cambridge, U.K.)
To obtain additional details on the use of function grand use:
-->help grand
Function grand has access to 32 different random number generators that constitute the basis
upon which random numbers that follow a particular probability distribution are generated. By
default, functions rand and grand use generator number 1. To check out which is the current
active random number generator use:
-->grand('getcgn')
 ans  =
    1.
This result indicates that you are currently using SCILAB’s default random number generator.
The random number generators provided by SCILAB for use with function grand require two
seed numbers. To see the current seed numbers you can use the statement:
-->seeds = grand('getsd')
 seeds  =
   1.0E+08 *
!   20.45933    9.2172801 !
You can re-initialize those seeds to the original seeds by using:
-->grand('initgn',-1)
 ans  =
    1.
We can check the initial seeds after re-initialization by using:
-->seeds = grand('getsd')
 seeds  =
   1.0E+08 *
!   12.345679    1.2345679 !
You can also re-seed the generator (i.e., provide new seeds) by using the following call to
function grand:
-->grand('setall',10,20)
 ans  =
 setall
To check that the new seeds are active use:
-->seeds=grand('getsd')
 seeds  =
!   10.    20. !
To change the random number generator from generator number 1 to generator number 5, for
example, use:
-->grand('setcgn',5)
 ans  =
    5.
The following call to function grand can be used to verify that the change of generator has
been made:
-->grand('getcgn')
 ans  =
    5.
To check the values of the seeds for the current generator use:
-->seeds=grand('getsd')
 seeds  =
!   3.795E+08    77757764. !
Pseudo-random generators
The random number generators used in SCILAB and other computer applications are known as
pseudo-random generators because, after generating a sufficiently long sequence of numbers,
the numbers start repeating. Therefore, they are not strictly random generators, but only
quasi-random or pseudo-random.
The random number generator provided with SCILAB is able to produce 2.3×10^18 numbers
before repetition of numbers occurs. This collection of numbers is partitioned into 32
pseudo-random generators, each containing 2^20 = 1,048,576 blocks of non-overlapping random
numbers. Each block is 2^30 = 1,073,741,824 in length.
Given the size of the sequences of random numbers that can be generated with each of
SCILAB’s 32 pseudo-random number generators, we are confident that the numbers thus
generated are random enough for most practical applications. Furthermore, use of the default
generator should be enough for most applications.
Another application of function grand is in the generation of permutations of a column vector.
For example, the following application produces 10 permutations of the vector M containing
the first five positive integers. The permutations are shown as columns of a matrix.
-->M = [1 2 3 4 5]';
-->grand(10,'prm',M)
 ans  =
!   1.    2.    4.    1.    4.    4.    5.    4.    1.    3. !
!   3.    1.    2.    4.    2.    2.    1.    3.    4.    2. !
!   2.    3.    5.    5.    5.    3.    2.    2.    2.    5. !
!   5.    4.    3.    3.    3.    5.    3.    1.    3.    4. !
!   4.    5.    1.    2.    1.    1.    4.    5.    5.    1. !
Generating log-normally-distributed data
To generate log-normally distributed data we first generate a set of normally distributed data
and then apply the exponential function to that data set. For example, if X follows the
lognormal distribution with µln(X)=1.2, σln(X)=0.5, we can use the following SCILAB commands to
generate a set of 200 data points. We apply functions histnorm and normplot to this data set
to check how close the data are to normality.
-->y=grand(1,200,'nor',1.2,0.5);        //Generate normal data N(1.2,0.5)
-->x=exp(y);                            //Generate log-normal data by using exp
-->xmin=min(x),xmax=max(x)              //Determine min and max values
 xmin  =
    1.1210567
 xmax  =
    11.161347
-->xclass=[0:2:12];histnorm(x,xclass);  //Histogram
-->normplot(x);                         //Normal probability plot
Generating data that follows the Weibull distribution
SCILAB does not provide a function to generate data that follow the Weibull distribution.
However, using the uniformly distributed random numbers from function rand we can generate
numbers p between 0 and 1 that represent probabilities p = FX(x) = P(X<x). Next, we use the
cumulative distribution function for the Weibull distribution, namely,
$$F(x) = 1 - \exp(-\alpha x^{\beta}), \quad x > 0,\ \alpha > 0,\ \beta > 0,$$
and solve for x given values of p, i.e.,
$$x = \left[-\frac{\ln(1-p)}{\alpha}\right]^{1/\beta}.$$
The following SCILAB commands are used to generate 200 data points that follow the Weibull
distribution with α = 2, β = 3. We also use functions histnorm and normplot to check how close
these data are to normality.
-->getf('histnorm');getf('normplot')    //Load functions
-->p=rand(1,200);                       //Generate probabilities
-->a=2; b=3;                            //parameters of Weibull distribution
-->x = (-log(1-p)/a).^(1/b);            //generate Weibull data
-->xmin=min(x), xmax = max(x)           //check data range
 xmin  =
    .1230276
 xmax  =
    1.3553315
-->xclass = [0:0.1:1.4];                //select classes for histogram
-->histnorm(x,xclass);                  //plot histogram and normal curve
-->normplot(x)                          //create normal probability plot
It is interesting to notice that these Weibull data are very close to normality.
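The inverse-transform procedure used above can be packaged as a function and checked against the Weibull CDF. The following is a minimal sketch, assuming rand is called with the "uniform" key so that the result does not depend on the generator currently active; the function name weibrand, the test point x0 = 0.8, and the sample size of 2000 are hypothetical choices made for illustration.

```scilab
// Inverse-transform sampling for F(x) = 1 - exp(-a*x^b)
function x = weibrand(m, n, a, b)
    p = rand(m, n, "uniform");        // random probabilities in [0,1)
    x = (-log(1-p)./a).^(1/b);        // solve p = F(x) for x
endfunction

x = weibrand(1, 2000, 2, 3);          // Weibull data with a = 2, b = 3
// Compare the empirical fraction of data below x0 with F(x0)
x0 = 0.8;
Femp = length(find(x < x0))/length(x);
Fth = 1 - exp(-2*x0^3);
disp([Femp Fth])                      // the two values should be close
```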
Generating data that follows the Student’s t distribution
Function grand does not allow for the generation of data following the Student’s t distribution.
However, SCILAB provides function cdft, which lets you obtain the inverse of the cumulative
distribution function. Using an approach similar to that shown above for the Weibull
distribution, we can generate random probability values through function rand, and then use
function cdft to generate the data required. The following example illustrates the procedure:
-->getf('histnorm');getf('normplot');   //Load functions histnorm & normplot
-->pp = rand(1,200);                    //Generate random probabilities
-->x = [];                              //This line and the for … end
-->for j =1:200                         //construct calculate values of x
-->   x = [x cdft("T",6,pp(j),1-pp(j))];
-->end;
-->xmin=min(x), xmax=max(x)             //Determine min & max values
xmin = -6.9441809
xmax = 3.4425429
-->xclass=[-7:1:4];xset('window',1);histnorm(x,xclass);   //Histogram
-->xset('window',2);normplot(x);        //Normal probability plot
Generating data that follows a discrete distribution
Using function grand we were able to generate discrete data that follows the binomial,
Poisson, and uniform integer distributions. In this section we present a general method for the
generation of data given a discrete distribution in the form of a table. For example, the
following table shows the probability mass function, fX(x) = P(X=x), and cumulative distribution
function, FX(x) = P(X≤x), of a discrete random variable X:
                                 Random numbers
   X       fX(x)     FX(x)       From       to
  0.5      0.10      0.10        0.00      0.10
  1.5      0.25      0.35        0.10      0.35
  2.5      0.20      0.55        0.35      0.55
  3.5      0.15      0.70        0.55      0.70
  4.5      0.15      0.85        0.70      0.85
  5.5      0.15      1.00        0.85      1.00
The last two columns of the table represent the range of probabilities corresponding to the
cumulative distribution function for each value of X. The procedure for generating data
consists of obtaining a random probability value p = P(X≤x) from a uniform distribution, e.g.,
using function rand, and then assigning a value of X according to the range in which the
random number falls. Thus, if function rand produces the random number 0.25, which lies
between 0.10 and 0.35, we assign to x the corresponding value X = 1.5.
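The assignment for a single random number can be sketched as follows. This is a minimal illustration, not the author's function; it assumes that find accepts a second argument limiting the number of indices returned, and the variable names are illustrative.

```scilab
// Sketch: assign a value of X to one random probability p
xx = [0.5 1.5 2.5 3.5 4.5 5.5];           // values of X from the table
FX = [0.10 0.35 0.55 0.70 0.85 1.00];     // cumulative probabilities FX(x)
p  = 0.25;                                // a random number, e.g., from rand()
k  = find(p <= FX, 1);                    // first class whose upper limit is >= p
x  = xx(k);                               // 0.10 < 0.25 <= 0.35, so x = 1.5
disp(x);
```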
The following function, discrand, generates an n×m matrix of random numbers
given vectors of values of X and FX, representing the values of a discrete random variable and
its corresponding cumulative distribution function.
function [x] = discrand(n,m,xx,FX)
//A function to generate a matrix nxm
//following a discrete probability distribution
//represented by vectors xx and FX = P(X<=xx)
nx = length(xx);
pp = rand(n,m);
x = zeros(n,m);
FXX = [0.00 FX];
for i = 1:n
   for j = 1:m
      for k = 1:nx
         if pp(i,j)>FXX(k) & pp(i,j)<=FXX(k+1) then
            x(i,j) = xx(k);
         end;
      end;
   end;
end;
//end function discrand
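For large matrices the triple loop can be slow. A possible variant, shown here only as a sketch, loops over the classes instead of over the matrix elements. The name discrand_vec is hypothetical, the code uses the function … endfunction syntax of recent SCILAB versions, and it assumes that FX is non-decreasing and ends at 1. Unlike discrand, it assigns the first value of X when rand returns exactly 0.

```scilab
// Sketch: class-by-class version of discrand (hypothetical name)
function x = discrand_vec(n, m, xx, FX)
    pp  = rand(n, m);                     // uniform random probabilities
    idx = ones(n, m);                     // class index, starting at class 1
    for k = 1:length(FX)-1
        idx = idx + bool2s(pp > FX(k));   // move up one class for each limit exceeded
    end
    x = matrix(xx(idx), n, m);            // look up the values of X
endfunction

rand("uniform"); rand("seed", 0);
X  = 0.5:1.0:5.5;
FX = [0.10 0.35 0.55 0.70 0.85 1.00];
x  = discrand_vec(1, 20000, X, FX);
frac = length(find(x == 0.5))/20000;      // should be close to fX(0.5) = 0.10
disp(frac);
```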
An application of the function to generate 200 data points that follow the probability
distribution shown in the table above is presented next. We first load function discrand, then
enter the values of X and FX(x), and generate a row vector of 200 points. Next, we load
functions histnorm and normplot to check how well the data follows a normal distribution.
-->getf('discrand')
-->X = [0.5:1.0:5.5]; FX = [0.10,0.35,0.55,0.70,0.85,1.00];
-->x=discrand(1,200,X,FX);
-->getf('histnorm');getf('normplot');
-->xmin=min(x), xmax=max(x)
 xmin = .5
 xmax = 5.5
-->xclass=[0.5:0.5:5.5];
-->histnorm(x,xclass)
 ans = 24.643214
-->normplot(x)
Statistical simulation
Many physical and other types of systems are described by one or more mathematical
relationships (e.g., algebraic, difference, or differential equations) of diverse degrees of
complexity. We will refer to the set of mathematical relationships that describe a physical
system as a model. A model typically depends on certain constant values known as the
parameters of the model. In the simplest of cases, a model can be represented by a black box
into which a set of input data is provided, and from which a set of output results is obtained.
This is illustrated in the following figure:
If the model is such that for a given set of input data it always produces a predictable result, it
is referred to as a deterministic model. An example of a deterministic model is the equation
that describes the electric current, I, through a resistor of resistance R when a voltage, V, is
applied across the terminals of the resistor. The equation is
I = V/R.
If we apply a constant voltage V0 to the resistor, we get back a constant electric current, I0 =
V0/R. If we instead apply a variable voltage V(t) = V0⋅sin(ωt), we obtain an electric current,
I(t) = (V0/R)⋅sin(ωt). Thus, knowing the value of the resistance R and the input to the system,
i.e., the voltage, V0 or V(t), we can always find the value of the electric current. We cannot
get more deterministic than this example.
If the input to the model is of a random nature, or if there is a random component to the
model itself, the model is said to be probabilistic or stochastic. For example, the black-box
model described above can be used to describe a hydrological basin. The input data is the
amount and duration of the precipitation falling on the basin in a certain period of time. (A
graphical representation of precipitation vs. time is referred to as a hyetograph). This input
is, by its own nature, random or stochastic. This means that we cannot know exactly the
amount of precipitation that will occur, say, in the next 24 hours.
Although a hydrological basin is far more complicated than an electric resistor, the
model used to predict the runoff (output) from the system can be a simple relationship involving
one or two parameters. (A graphical representation of the runoff coming out of the basin as a
function of time is known as a hydrograph). If the input hyetograph is known, then the output
hydrograph can be obtained in a deterministic way. However, because we do not know
exactly the input hyetograph for a particular period of time, except in a statistical manner, the
model is indeed a stochastic one.
By keeping historical records of precipitation in the basin we can get a good idea
of the stochastic nature of precipitation to use as input for our stochastic model. We can then
generate synthetic data representing the precipitation and use it as input to the model. This
approach to modeling physical (or economic, or other types of) systems is known as a Monte
Carlo method. (The name derives from Monte Carlo, a district of the European principality of
Monaco famous for its casinos, where the laws of probability are seen in action
night and day.)
Monte Carlo methods find applicability in all types of models where there is a random
component to the input or parameters of the model. Statistical modeling can be used to
model, for example, economic responses from human populations, the distribution of soil
permeabilities in an aquifer, the distribution of animal or plant populations, traffic patterns in
highways or airports, weather phenomena, etc. A simple application of a Monte Carlo method
to simulate the patterns of traffic through a service station is shown below.
Simulating traffic through a service station
Suppose we want to simulate the traffic through a service station in which only one customer
can be serviced at a time. We also assume that once a customer arrives at the service station,
he or she will not leave until service is provided. This is a simplistic model, but it could be
used to simulate a vehicle service station in a city or highway, a medical emergency room, a
highway service station for state or privately owned trucks, a store, etc.
The first customer arrives at a certain arrival time, AT1 (Arrival Time). He or she is taken care
of right away so that the starting time of service for customer 1, ST1 (Starting Time), coincides
with his or her arrival time, thus, ST1 = AT1. The waiting time for customer 1 is, therefore,
zero, i.e., WT1 = 0. The number of customers awaiting service at this point is also zero, i.e.,
NW1 = 0. The time required to service this first customer is referred to as TS1 (Time of
Service). The first customer leaves the service station at time ET1 = ST1 + TS1 (Ending Time).
The second customer arrives at the service station at a time AT2. If AT2 < ET1 (i.e., the second
customer arrives before service for the first one has finished), the second customer must wait
until the first customer leaves, so that ST2 becomes ET1 (ST2 = ET1). In this case, we can
calculate a waiting time for the second customer equal to WT2 = ET1 - AT2. Also, the number of
customers waiting for service at this point is NW2 = 1. If, instead, the second customer arrives
at a time AT2 ≥ ET1, then ST2 = AT2, and WT2 = 0. In any event, the ending time for the second
customer is calculated as ET2 = ST2 + TS2.
We define the inter-arrival time between customers 1 and 2 as IAT1 = AT2 - AT1. In general, the
inter-arrival time between customers i and i+1 is IAT(i) = AT(i+1) - AT(i). The inter-arrival time,
IAT(i), and the time of service, TS(i), are considered random variables of discrete nature. Thus,
IAT(i) and TS(i) constitute random input to the model.
To simulate the operation of the service center for n customers, we first
generate n-1 values of inter-arrival time {IAT(1), IAT(2), …, IAT(n-1)}, as well as n values of the
service time {TS(1), TS(2), …, TS(n)}. Then, we proceed to calculate the arrival times as
AT(i+1) = AT(i) + IAT(i), i = 1, 2, …, n-1.
As indicated earlier, the starting and ending times for the first customer are ST1 = AT1, ET1 =
ST1 + TS1. Also, the waiting time and number of customers waiting at this stage are both zero,
i.e., WT1 = 0, and NW1 = 0. The starting time for customer 2 is obtained as follows:
If AT2 ≥ ET1, then ST2 = AT2, WT2 = 0, and NW2 = 0.
If AT2 < ET1, then ST2 = ET1, WT2 = ET1 - AT2, and NW2 = 1.
For the third customer, we need to check the arrival time, AT3, against the ending times of
both the first and second customers so we can determine the starting time, the waiting time,
and the number of customers waiting at that point. The following piece of pseudo-code can
be used to determine such values:
for j = 2:n
   NWj = 0
   WTj = 0
   for k = 1:j-1
      if ATj < ETk then
         NWj = NWj + 1
         WTj = ETk - ATj
         STj = ETk
      else
         STj = ATj
      end
   end
   ETj = STj + TSj
end
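Because customers are served one at a time in order of arrival, the ending times never decrease from one customer to the next, so the starting time reduces to ST(j) = max(AT(j), ET(j-1)) and the waiting time to WT(j) = ST(j) - AT(j). The following is a minimal sketch of that shortcut, not the author's function: the name fifo_times is hypothetical, the first arrival is placed at time 0, and the count NW is omitted.

```scilab
// Sketch: starting, ending, and waiting times for a single-server station
function [AT, ST, ET, WT] = fifo_times(IAT, TS)
    n  = length(TS);
    AT = [0, cumsum(matrix(IAT, 1, -1))];  // arrival times from inter-arrival times
    ST = zeros(1, n);
    ET = zeros(1, n);
    ST(1) = AT(1);
    ET(1) = ST(1) + TS(1);
    for j = 2:n
        ST(j) = max(AT(j), ET(j-1));       // wait only if the server is still busy
        ET(j) = ST(j) + TS(j);
    end
    WT = ST - AT;                          // waiting times
endfunction

[AT, ST, ET, WT] = fifo_times([0.5 0.75 0.5 0.25 0.5], [1 2 1 1 2 1]);
disp([AT' ST' ET' WT']);
```

The count of customers in the station at each arrival still requires a comparison against all previous ending times, as in the pseudo-code.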
A user-defined function to simulate traffic through a service station
The steps outlined above are put together in the following function, service:
function [MR] = service(IAT,TS)
//Simulation of traffic in a service station
//Given n-1 values of inter-arrival time IAT
//and n values of time of service TS.
//Results:
//Arrival time = AT, Starting time = ST
//Ending time = ET, Waiting time = WT
//Number of waiting customers = NW
//
n = length(TS);
AT = zeros(1,n);
ST = zeros(1,n);
ET = zeros(1,n);
NW = zeros(1,n);
WT = zeros(1,n);
IATT = [IAT 0];
ST(1) = AT(1);
ET(1) = ST(1) + TS(1);
for j = 2:n
   AT(j) = AT(j-1) + IAT(j-1);
end;
for j = 2:n
   NW(j) = 0;
   WT(j) = 0;
   for k = 1:j-1
      if AT(j) < ET(k) then
         NW(j) = NW(j) + 1;
         WT(j) = ET(k) - AT(j);
         ST(j) = ET(k);
      else
         ST(j) = AT(j);
      end;
   end;
   ET(j) = ST(j)+TS(j);
end;
disp(' ');
printf('===============================================================\n');
printf('  j       AT      IAT       ST       TS       ET       WT  NW \n');
printf('===============================================================\n');
for j = 1:n
   printf('%3.0f %8.2f %8.2f %8.2f %8.2f %8.2f %8.2f %3.0f\n',...
          j,AT(j),IATT(j),ST(j),TS(j),ET(j),WT(j),NW(j));
end;
printf('===============================================================\n');
MR = [AT' IATT' ST' TS' ET' WT' NW']; //Matrix of Results
printf('AT = arrival times   IAT = inter-arrival times \n');
printf('ST = starting times  TS = time of service \n');
printf('ET = ending times    WT = waiting times \n');
printf('NW = number of customers waiting \n');
disp('     AT     IAT     ST     TS     ET     WT     NW');
//end function service
As an example, suppose that we have the following inter-arrival times (IAT) and times of
service (TS):
-->IAT = [ 0.5 0.75 0.5 0.25 0.5];
-->TS = [ 1 2 1 1 2 1];
We can load function service and run it with the values of IAT and TS defined earlier to obtain
the following results:
-->Matrix_of_results = service(IAT,TS)
===============================================================
  j       AT      IAT       ST       TS       ET       WT  NW
===============================================================
  1     0.00      .50     0.00     1.00     1.00     0.00   0
  2      .50      .75     1.00     2.00     3.00      .50   1
  3     1.25      .50     3.00     1.00     4.00     1.75   1
  4     1.75      .25     4.00     1.00     5.00     2.25   2
  5     2.00      .50     5.00     2.00     7.00     3.00   3
  6     2.50     0.00     7.00     1.00     8.00     4.50   4
===============================================================
AT = arrival times   IAT = inter-arrival times
ST = starting times  TS = time of service
ET = ending times    WT = waiting times
NW = number of customers waiting
     AT      IAT     ST    TS    ET    WT      NW
 Matrix_of_results =
!   0.      .5      0.    1.    1.    0.      0. !
!   .5      .75     1.    2.    3.    .5      1. !
!   1.25    .5      3.    1.    4.    1.75    1. !
!   1.75    .25     4.    1.    5.    2.25    2. !
!   2.      .5      5.    2.    7.    3.      3. !
!   2.5     0.      7.    1.    8.    4.5     4. !
The function is designed to provide a table of results, as well as a matrix summarizing the
results in case additional operations on those results are required within SCILAB. The
function, as applied in this case, is purely deterministic in the sense that for the given input we
get a unique result. To work out a stochastic modeling of traffic through a service station we
need to provide random input. The following example shows how to obtain that random input.
Modeling traffic through a service station with random input
Suppose that the inter-arrival times and times of service for the service station model follow
the probability distributions shown in the following table:
 x = IAT     FX(x)          x = TS     FX(x)
   0.1       0.05            0.25      0.10
   0.2       0.10            0.50      0.20
   0.3       0.20            0.75      0.40
   0.4       0.35            1.00      0.70
   0.5       0.45            1.25      0.80
   0.6       0.50            1.50      0.90
   0.7       0.70            1.75      0.95
   0.8       0.75            2.00      1.00
   0.9       0.95
   1.0       1.00
We want to analyze the traffic through the service station for 10 customers by generating 9
inter-arrival times and 10 service times from these distributions. The inter-arrival times and
times of service can be generated using function discrand as follows:
-->getf('discrand')
-->xIAT = [0.1:0.1:1.0]; FIAT = [0.05,0.1,0.2,0.35,0.45,0.5,0.7,0.75,0.95,1.0];
-->xTS = [0.25:0.25:2]; FTS = [0.1,0.2,0.4,0.7,0.8,0.9,0.95,1];
-->IAT = discrand(1,9,xIAT,FIAT)           //generate IAT data
 IAT =
!   .4    .7    .7    .5    .4    .7    .5    .9    .1 !
-->TS = discrand(1,10,xTS,FTS)             //generate TS data
 TS =
!   1.    .75    1.    .75    .5    1.25    .75    .5    1.    .5 !
With these values of IAT and TS we now call function service:
-->M = service(IAT,TS)
===============================================================
  j       AT      IAT       ST       TS       ET       WT  NW
===============================================================
  1     0.00      .40     0.00     1.00     1.00     0.00   0
  2      .40      .70     1.00      .75     1.75      .60   1
  3     1.10      .70     1.75     1.00     2.75      .65   1
  4     1.80      .50     2.75      .75     3.50      .95   1
  5     2.30      .40     3.50      .50     4.00     1.20   2
  6     2.70      .70     4.00     1.25     5.25     1.30   3
  7     3.40      .50     5.25      .75     6.00     1.85   3
  8     3.90      .90     6.00      .50     6.50     2.10   3
  9     4.80      .10     6.50     1.00     7.50     1.70   3
 10     4.90     0.00     7.50      .50     8.00     2.60   4
===============================================================
AT = arrival times   IAT = inter-arrival times
ST = starting times  TS = time of service
ET = ending times    WT = waiting times
NW = number of customers waiting
     AT     IAT    ST      TS      ET      WT      NW
 M =
!   0.     .4     0.      1.      1.      0.      0. !
!   .4     .7     1.      .75     1.75    .6      1. !
!   1.1    .7     1.75    1.      2.75    .65     1. !
!   1.8    .5     2.75    .75     3.5     .95     1. !
!   2.3    .4     3.5     .5      4.      1.2     2. !
!   2.7    .7     4.      1.25    5.25    1.3     3. !
!   3.4    .5     5.25    .75     6.      1.85    3. !
!   3.9    .9     6.      .5      6.5     2.1     3. !
!   4.8    .1     6.5     1.      7.5     1.7     3. !
!   4.9    0.     7.5     .5      8.      2.6     4. !
From the matrix of results, M, we can extract individual columns of data. For example, the
waiting time data correspond to the sixth column of M:
-->WT = M(:,6)
 WT =
!   0.   !
!   .6   !
!   .65  !
!   .95  !
!   1.2  !
!   1.3  !
!   1.85 !
!   2.1  !
!   1.7  !
!   2.6  !
The number of waiting customers is extracted from the seventh column of matrix M:
-->NW = M(:,7)
 NW =
!   0. !
!   1. !
!   1. !
!   1. !
!   2. !
!   3. !
!   3. !
!   3. !
!   3. !
!   4. !
The columns of data extracted from the matrix of results, M, can be used to obtain statistics
such as the mean and standard deviation:
-->WT_mean = mean(WT), WT_sdev = st_deviation(WT)
WT_mean = 1.295
WT_sdev = .7836701
-->NW_mean = mean(NW), NW_sdev = st_deviation(NW)
NW_mean = 2.1
NW_sdev = 1.2866839
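If function st_deviation is not available in your SCILAB version (more recent releases appear to provide stdev instead, which is an assumption to check against your installation), the same statistics can be computed directly from their definitions. The following sketch uses the waiting times listed above and the sample (n-1) form of the standard deviation, which reproduces the value reported by st_deviation.

```scilab
// Sketch: mean and sample standard deviation from their definitions
WT = [0 0.6 0.65 0.95 1.2 1.3 1.85 2.1 1.7 2.6];   // waiting times from matrix M
n  = length(WT);
WT_mean = sum(WT)/n;                               // arithmetic mean
WT_sdev = sqrt(sum((WT - WT_mean).^2)/(n-1));      // sample standard deviation
disp([WT_mean WT_sdev]);
```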
We can also use function normplot to check how close the data is to normality:
-->getf('normplot')
-->normplot(NW')
-->normplot(WT')
STIXBOX: a rudimentary statistics toolbox
STIXBOX (an abbreviation of statistical toolbox) is a collection of functions that perform
selected statistical and probability calculations. STIXBOX is available for download from the
SCILAB main web page (http://www-rocq.inria.fr/SCILAB/). Instructions for its installation are
provided with the downloaded functions. The package includes a set of help manual pages
that briefly describe the operation of the functions. Once loaded, the manual pages are
available through the main SCILAB Help window.
Probability mass and probability density functions
Probability mass functions or pmf (for discrete random variables) and probability density
functions or pdf (for continuous random variables) start with the letter d, e.g., dbeta, dbinom,
etc. Probability mass functions are referred to by pX(k) = P[X=k], and probability density
functions by fX(x). Thus, if X ~ Binomial(n,p) with n = 10, p = 0.5, P[X=2] = pX(2) =
dbinom(2,10,0.5). And, if X ~ Normal(µ,σ²) with µ = 1.5, σ = 0.2, then fX(1.75) =
dnorm(1.75,1.5,0.2). The following probability mass and density functions are defined:
dbeta      the beta density function
dbinom     the binomial probability function
dchisq     the chisquare density function
df         the F density function [modified by the author, 2/1/2001]
dgamma     the gamma density function
dhypgeo    the hypergeometric probability function
dnorm      the normal density function [modified by the author, 2/1/2001]
dt         the student t density function
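As a cross-check that does not require STIXBOX, the two values quoted in the text above the list can be computed from the defining formulas. This is only a sketch: it assumes, as the text states, that the third argument of dnorm is the standard deviation σ, and the variable names are illustrative.

```scilab
// Sketch: binomial pmf and normal pdf from their definitions
n = 10; p = 0.5; k = 2;
C  = gamma(n+1)/(gamma(k+1)*gamma(n-k+1));     // binomial coefficient n!/(k!(n-k)!)
pX = C * p^k * (1-p)^(n-k);                    // P[X = 2] for X ~ Binomial(10, 0.5)
mu = 1.5; sigma = 0.2; x = 1.75;
fX = exp(-(x-mu)^2/(2*sigma^2))/(sigma*sqrt(2*%pi));   // normal density at x = 1.75
disp([pX fX]);
```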
Cumulative distribution functions
Cumulative distribution functions (cdf) are referred to as distribution functions when dealing with
continuous variables, or as cumulative probability functions when dealing with discrete variables.
All cdfs in this package start with a p: pbeta, pbinom, etc. Both discrete and continuous cdfs
are referred to by FX(x) = P[X≤x]. Thus, if X ~ Binomial(n,p) with n = 10, p = 0.5, P[X≤2] =
FX(2) = pbinom(2,10,0.5). And, if X ~ Normal(µ,σ²) with µ = 1.5, σ = 0.2, then FX(1.75) =
pnorm(1.75,1.5,0.2). The following cumulative distribution functions are defined:
pbeta      the beta distribution function
pbinom     the binomial cumulative probability function
pchisq     the chisquare distribution function
pf         the F distribution function
pgamma     the gamma distribution function
phypge     the hypergeometric cumulative probability function
pnorm      the normal distribution function
pt         the student t cdf (modified by the author, 2/1/2001)
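The two cumulative probabilities used as examples above can also be obtained with SCILAB's own functions cdfbin and cdfnor, without STIXBOX. The sketch below assumes the argument order described in the SCILAB help pages for these functions ("PQ" option); the output names are illustrative.

```scilab
// Sketch: the same cumulative probabilities with built-in cdf functions
[P_bin, Q_bin] = cdfbin("PQ", 2, 10, 0.5, 0.5);   // P[X <= 2], X ~ Binomial(10, 0.5)
[P_nor, Q_nor] = cdfnor("PQ", 1.75, 1.5, 0.2);    // FX(1.75), mean 1.5, std. dev. 0.2
disp([P_bin P_nor]);
```

In both calls the second output is the complement, Q = 1 - P.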
Inverse cumulative distribution functions
Inverse cumulative distribution functions start with q: qbeta, qbinom, etc. If FX(q) = P[X≤q] =
p, then q = FX^(-1)(p). The value q is also referred to as a quantile of the distribution. The
following inverse cumulative distribution functions are defined:
qbeta      the beta inverse distribution function
qbinom     the binomial inverse cdf
qchisq     the chisquare inverse distribution function
qf         the F inverse distribution function
qgamma     the gamma inverse distribution function
qhypg      the hypergeometric inverse cdf
qnorm      the normal inverse distribution function
qt         the student t inverse distribution function
quantile   empirical quantile (percentile)
Generating synthetic data
The generation of synthetic data that follows a particular distribution can be accomplished
with the following random number generators. The names of the random generator functions
begin with r: rbeta, rbinom, etc. SCILAB already provides function rand, which produces
uniformly distributed random numbers (use help rand for more information). The functions
provided by STIXBOX generate random numbers that follow the distributions suggested by the
names of the functions. Thus, if you want to generate n = 10 data values x that follow the
normal distribution, with µ = 0.5 and σ = 0.1, use rnorm(10,0.5,0.1).
rbeta      random numbers from the beta distribution
rbinom     random numbers from the binomial distribution
rchisq     random numbers from the chisquare distribution
rexpweib   random numbers from the exponential or weibull distributions
rf         random numbers from the F distribution
rgamma     random numbers from the gamma distribution
rgeom      random numbers from the geometric distribution
rhypg      random numbers from the hypergeometric distribution
rjbinom    random numbers from the binomial distribution (reject method)
rjgamma    generates gamma random deviates (reject method)
rjpoiss    random numbers from the poisson distribution (reject method)
rnorm      normal random numbers
rpoiss     random numbers from the poisson distribution (renewal method)
rt         random numbers from the student t distribution
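For comparison, normally distributed data can also be generated without STIXBOX through SCILAB's own function grand. The following sketch assumes the "nor" option of grand (mean followed by standard deviation) and the "setsd" option for fixing the seed; the variable names are illustrative.

```scilab
// Sketch: normal random numbers with the built-in generator grand
grand("setsd", 1);                       // fix the seed so that the run is repeatable
x = grand(1, 10, "nor", 0.5, 0.1);       // 10 values, mean 0.5, standard deviation 0.1
y = grand(1, 10000, "nor", 0.5, 0.1);    // a larger sample to check the mean
disp(mean(y));                           // should be close to 0.5
```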
Logistic regression
These functions involve the logistic population growth model (see, for example, Example 8.3,
page 504, in Kottegoda, N.T. and R. Rosso, 1997, Probability, Statistics, and Reliability for
Civil and Environmental Engineers, The McGraw-Hill Companies, Inc., New York).
lodds      log odds function
loddsinv   compute the inverse of log odds
logitfit   fit a logistic regression model
Statistical graphics
Functions to produce a variety of statistical graphics. A normal probability paper plot is
obtained by using qqnorm. Probability paper plots are also referred to as Q-Q plots. For that
reason the corresponding function names start with qq, e.g., qqgamma, qqnorm, etc. Also of
interest are functions histo, plotsym.
histo      plot a histogram
identify   identify points on a plot by clicking with the mouse
pairs      pairwise scatter plots (does not work)
plotdens   draw a nonparametric density estimate
plotsym    plot with symbols
qqnorm     normal probability paper
qqplot     plot empirical quantile vs empirical quantile
Binomial coefficients
bincoef    calculates binomial coefficients: C(n,r) = n!/(r!(n-r)!)
Resampling methods
These methods apply to the process of resampling, by which an attempt is made to remove any
existing bias in the sample. For a quick introduction to the jackknife (named so because the
jackknife, like this method, is a useful tool) and the bootstrap (named so from the expression
"lifting oneself by one's bootstraps"), see pp. 116-117 in Kottegoda, N.T. and R. Rosso, 1997,
Probability, Statistics, and Reliability for Civil and Environmental Engineers, The McGraw-Hill
Companies, Inc., New York.
covboot    bootstrap estimate of the variance of a parameter estimate
covjack    jackknife estimate of the variance of a parameter estimate
stdboot    bootstrap estimate of the parameter standard deviation
stdjack    jackknife estimate of the standard deviation of a parameter
rboot      simulate a bootstrap resample from a sample
ciboot     various bootstrap confidence intervals
test1b     bootstrap t test and confidence interval for the mean
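To illustrate the idea behind the bootstrap, the sketch below estimates the standard deviation of a sample mean by resampling the data with replacement. It is not the STIXBOX function stdboot, whose argument list is not given in this document: the name boot_std_mean and the number of resamples B are assumptions made for illustration, and the resampling uses the "uin" (uniform integer) option of SCILAB's grand.

```scilab
// Sketch: bootstrap estimate of the standard deviation of the mean
function s = boot_std_mean(x, B)
    n = length(x);
    m = zeros(1, B);
    for b = 1:B
        idx  = grand(1, n, "uin", 1, n);   // n indices drawn with replacement
        m(b) = mean(x(idx));               // mean of the resample
    end
    s = sqrt(sum((m - mean(m)).^2)/(B-1)); // spread of the resampled means
endfunction

grand("setsd", 1);                         // fix the seed so that the run is repeatable
WT = [0 0.6 0.65 0.95 1.2 1.3 1.85 2.1 1.7 2.6];
s  = boot_std_mean(WT, 2000);
disp(s);
```

For these ten values the result should be of the same order as the sample standard deviation divided by the square root of n.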
Tests, confidence intervals, and model estimation
These are functions related to statistical inference. Of interest for this class are the functions
lsfit, test1n, and test2r. Use the help function to obtain additional information on the
functions.
cmpmod     compare small linear model versus large one
ciquant    nonparametric confidence interval for quantile
kstwo      Kolmogorov-Smirnov statistic from two samples (needs function pks)
linreg     linear or polynomial regression
lsfit      fit a multiple regression model
lsselect   select a predictor subset for regression
test1n     tests and confidence intervals based on a normal sample
test1r     test for median equals 0 using rank test
test2n     tests and confidence intervals based on two normal samples
test2r     test for equal location of two samples using rank test
Stixbox demonstrations
These are SCILAB functions that demonstrate some of the functions contained in STIXBOX:
stixdemo   demonstrate various stixbox routines
stixtest   a second demo for stixbox
Famous datasets
Function getdata is used to load well-known datasets into the SCILAB environment. The data
sets included are:
1 Phosphorus Data
2 Scottish Hill Race Data
3 Salary Survey Data
4 Health Club Data
5 Brain and Body Weight Data
6 Cement Data
7 Colon Cancer Data
8 Growth Data
9 Consumption Function
10 Cost-of-Living Data
11 Demographic Data
To activate function getdata and load data into variable x use:
--> x = getdata()
This function produces a dialog box displaying the list of data sets. The user can type in the
number of the data set and get back some information about the data set before the set is
loaded. The dialog box produced by getdata() is shown below.
The dialog box shows that we have selected data set number 5. Pressing [OK] will load the
data as well as provide information as shown below.
Examples on probability distributions using STIXBOX
• Plot of the standard normal distribution:
-->z=-4:0.1:4;phi=dnorm(z,0,1);plot(z,phi,'z','phi(z)','standard normal')
• Plot of the Student-t distribution for ν = 2, 5, 10, 15, 20:
-->t=-4.0:0.1:4;nu=[2,5,10,15,20];
-->for k=1:5,f=dt(t,nu(k));plot2d(t,f,k,'011',' ',[-4 0 4 0.4]), end
-->xtitle('Student t distribution','t','f(t)')
• Plot of the chi-square distribution for nu=5:
-->x=0:0.1:20;nu=5;f=dchisq(x,nu);
-->plot(x,f,'x','f(x)','Chi-square distribution, nu=5')
• Plot of the F distribution for nu1=5 and nu2=10:
-->x=0:0.1:5;nu1=5;nu2=10;f=df(x,nu1,nu2);
-->plot(x,f,'F','f(F)','F distribution, nu1=5, nu2=10')
• Determining z_α such that P(Z > z_α) = α, or P(Z < z_α) = 1 - α. Also, z_(α/2) is such that
P(Z > z_(α/2)) = α/2, or P(Z < z_(α/2)) = 1 - α/2:
-->alpha = 0.05; z_alpha=qnorm(1-alpha), z_alpha2=qnorm(1-alpha/2)
 z_alpha  = 1.6448536
 z_alpha2 = 1.959964
• Determining t_(ν,α) such that P(T > t_(ν,α)) = α, or P(T < t_(ν,α)) = 1 - α. Also, t_(ν,α/2) is
such that P(T > t_(ν,α/2)) = α/2, or P(T < t_(ν,α/2)) = 1 - α/2:
-->nu=10;alpha=0.01;t_alpha=qt(1-alpha,nu),t_alpha2=qt(1-alpha/2,nu)
 t_alpha  = 2.7637695
 t_alpha2 = 3.1692727
• Determining χ²_(ν,α) such that P(Χ² > χ²_(ν,α)) = α, or P(Χ² < χ²_(ν,α)) = 1 - α. Similar
definitions are used to calculate the values χ²_(ν,1−α), χ²_(ν,α/2), χ²_(ν,1−α/2):
-->nu=6;alpha=0.10;X_alpha=qchisq(1-alpha,nu)
X_alpha =
10.644641
-->X_alpha2=qchisq(1-alpha/2,nu)
X_alpha2 =
12.591587
-->nu=6;alpha=0.10;X_alpha=qchisq(alpha,nu)
X_alpha =
2.2041307
-->X_alpha2=qchisq(alpha/2,nu)
X_alpha2 =
1.6353829
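The first quantile of each group above can be reproduced without STIXBOX by using the inverse options of SCILAB's own functions cdfnor, cdft, and cdfchi. The sketch below assumes the argument order given in the SCILAB help pages for these functions, with both P and Q = 1 - P supplied; the variable names mirror those used above.

```scilab
// Sketch: quantiles with the built-in inverse cdf options
alpha = 0.05;
z_alpha = cdfnor("X", 0, 1, 1-alpha, alpha);    // standard normal, P(Z < z) = 0.95
nu = 10; alpha = 0.01;
t_alpha = cdft("T", nu, 1-alpha, alpha);        // Student t, 10 degrees of freedom
nu = 6; alpha = 0.10;
X_alpha = cdfchi("X", nu, 1-alpha, alpha);      // chi-square, 6 degrees of freedom
disp([z_alpha t_alpha X_alpha]);
```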
• Generating 20 data points that follow the Weibull distribution, and producing a normal
probability plot for such data:
-->x = rexpweib(20,3,5); qqnorm(x,'o')
• Generating 200 data points that follow the binomial distribution. A histogram of the data is
then produced.
-->x = rbinom(200,10,0.35); histo(x);
Other options for function histo( ), using 8 suggested classes (or bins) and parameter odd = 0. The
function histo( ) chooses 6 classes:
-->histo(x,8,0)
In the next call, we suggest 15 classes, and the odd parameter takes a value odd = 1:
-->histo(x,15,1)
The next call scales area in the histogram bars so that the total area is equal to 1:
-->histo(x,8,0,1)
Exercises
[1]. The probability of a flood occurring in a particular section of a river in a given month is
estimated, from existing records, to be 0.15. (a) What is the probability that there will be
three months of flood in the next year? (b) What is the probability that there will be fewer than
6 months of flood in the next year?
[2]. Data kept at an airport show an average of five cars per minute stopping to leave or pick
up passengers at the terminal curb. (a) What is the probability that in the next minute there
will be 10 or more cars stopping at the curb? (b) What is the probability that there will be no
cars at the curb in a given minute?
[3]. It is known that 25 out of a batch of 200 concrete cylinders were prepared using a defective
type of cement. If a laboratory receives a sample of 15 of those cylinders, what is the
probability that the sample will contain 5 of the defective cylinders?
[4]. If a factory is known to produce 5% defective truck tires, what is the probability that in a
given assembly line the first defective tire is detected after 20 tires have come out of the
assembly line? What is the probability that the first defective tire is detected after 10 tires
have come out of the assembly line?
[5]. The time required to finish the construction of a mile of a particular highway is known to
have a normal distribution with a mean value of 3.5 days and a standard deviation of 0.5 days.
What is the probability that the next mile in the road will be completed between 3 and 5 days?
What is the probability that the construction of the next mile of the road will take more than 7
days?
[6]. Let X represent the intensity of an earthquake on a particular scale. If X is modeled using
the exponential distribution with parameter β = 6.5, determine the probability that the
intensity of the next earthquake will be 3.5 or less. Also, determine the probability that the
intensity of the earthquake will be between 2.5 and 4.5.
[7]. The gamma distribution, with parameters α =1.2, and β = 0.5, is used to model the time
of failure (in hours) of an electronic component. Determine the probability that a particular
component will last 100 hours or more. Determine the probability that the component will last
less than 2 hours.
[8]. If the wind velocity in miles per hour near a harbor is assumed to follow a Weibull
distribution with parameters α = 2 and β = 3, determine the probability of the wind velocity
being between 15 and 75 mph. Also, determine the probability of the wind velocity being
larger than 10 mph.
[9]. For a large value of n, the Binomial distribution can be approximated by the normal
distribution with parameters µ = np, σ = √(np(1-p)). Suppose that you receive a shipment of
1000 resistors produced by a machine that is known to produce 0.5% defective resistors. Determine
the probability that there will be more than 200 defective resistors in the shipment by using:
(a) the normal distribution approximation to the Binomial distribution, and (b) the Poisson
distribution approximation to the Binomial distribution.
[10]. Plot the probability mass function, fX(x), and the cumulative distribution function, FX(x),
for the following discrete distributions:
(a) Binomial with n = 20, p = 0.25
(b) Binomial with n = 20, p = 0.50
(c) Binomial with n = 20, p = 0.75
(d) Poisson with λ = 5.0, plot for x = 0,1,2,…,10
(e) Geometric with p = 0.25, for x = 1,2,…,10
(f) Geometric with p = 0.50, for x = 1,2,…,10
(g) Geometric with p = 0.75, for x = 1,2,…,10
(h) Hypergeometric with N = 100, n = 20, a = 40
(i) Hypergeometric with N = 40, n = 10, a = 20
(j) Hypergeometric with N = 120, n = 80, a = 10
[11]. Let X be a discrete random variable that follows the binomial distribution with
parameters n and p. Let P0 = P(X ≤ x). Calculate:
(a) P0 given n = 20, p = 0.35, x = 5
(b) n given p = 0.25, x = 8, P0 = 0.80
(c) p given n = 25, x = 20, P0 = 0.75
(d) x given n = 10, p = 0.80, P0 = 0.30
[12]. Plot the probability density function, fX(x), and the cumulative distribution function,
FX(x), for the following continuous distributions:
(a) Gamma with α = 0.5, β = 1.5
(b) Gamma with α = 2, β = 3
(c) Beta with α = 0.5, β = 1.5
(d) Beta with α = 3, β = 2
(e) Weibull with α = 0.5, β = 1.5
(f) Weibull with α = 2, β = 2
(g) Uniform with a = 2, b = 6
(h) Uniform with a = -3, b = 3
(i) Exponential with β = 12.5
(j) Exponential with β = 4.8
(k) Normal with µ = 5, σ = 5
(l) Normal with µ = 150, σ = 25
(m) Student t with ν = 4
(n) Student t with ν = 12
(o) Chi-square with ν = 4
(p) Chi-square with ν = 12
(q) F distribution with νN = 4, νD = 10
(r) F distribution with νN = 4, νD = 10
[13]. Let X be a continuous random variable that follows the Gamma probability distribution
with parameters α and β. Let P0 = P(X ≤ x). Calculate:
(a) P0 given α = 2, β = 3, x = 3.5
(b) α given P0 = 0.40, β = 1.5, x = 1.2
(c) β given P0 = 0.60, α = 5, x = 10.5
(d) x given P0 = 0.20, α = 10.5, β = 0.3
[14]. Let X be a continuous random variable that follows the Beta probability distribution with
parameters α and β. Let P0 = P(X ≤ x). Calculate:
(a) P0 given α = 2, β = 3.5, x = 0.35
(b) α given P0 = 0.40, β = 2.3, x = 0.76
(c) β given P0 = 0.60, α = 2.5, x = 0.45
(d) x given P0 = 0.20, α = 10.5, β = 0.3
[15]. Let T be a continuous random variable that follows Student t distribution with ν degrees
of freedom. Let P0 = P(T ≤ t). Calculate:
(a) P0 given ν = 10, t = 1.5
(b) ν given P0 = 0.40, t = -0.8
(c) t given P0 = 0.20, ν = 8
[16]. Let χ2 be a continuous random variable that follows the chi-square distribution with ν
degrees of freedom. Let P0 = P(Χ2 ≤ χ2). Calculate:
(a) P0 given ν = 6, χ2 = 2.25
(b) ν given P0 = 0.40, χ2 = -0.8
(c) χ2 given P0 = 0.20, ν = 12
[17]. Let F be a continuous random variable that follows the F distribution with νN degrees of
freedom in the numerator and νD degrees of freedom in the denominator. Let P0 = P(F ≤ F).
Calculate:
(a) P0 given νN = 4, νD = 10, F = 2.5
(b) νN given P0 = 0.40, νD = 15, F = 3.2
(c) νD given P0 = 0.60, νN = 3, F = 0.45
(d) F given P0 = 0.20, νN = 8, νD = 12
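Problem [17] can be approached with SCILAB function cdff. The sketch below uses made-up values (νN = νD = 2, F = 3), for which the cumulative distribution function has the closed form F/(1+F) and the answer can be verified by hand. The variable names are illustrative. The cdff help page warns that the cumulative F distribution is not necessarily monotone in the degrees of freedom, so a value returned by the "Dfn" or "Dfd" modes is best checked by substituting it back into the "PQ" mode.

```scilab
// Illustrative values only (not those of problem [17])
nuN = 2; nuD = 2; F = 3;

// Mode "PQ": P = P(F <= F); for nuN = nuD = 2, P = F/(1+F)
[P, Q] = cdff("PQ", F, nuN, nuD);

// Mode "F": recover F from nuN, nuD, P, Q
Fs = cdff("F", nuN, nuD, P, Q);

// Modes "Dfn" and "Dfd": solve for the degrees of freedom
nN = cdff("Dfn", nuD, P, Q, F);
nD = cdff("Dfd", P, Q, F, nuN);

// Substitute back to verify the degrees of freedom found
[Pcheck, Qcheck] = cdff("PQ", F, nN, nD);
disp([P Fs nN nD Pcheck])
```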
[18]. The following data represent measurements of the diameter of a cylinder produced for a
precision mechanism:
232.  246.  260.  244.  267.  248.  308.  247.  264.  243.
242.  221.  228.  243.  270.  250.  275.  274.  255.  275.
239.  261.  205.  261.  260.  244.  217.  254.  236.  281.
265.  260.  230.  226.  240.  262.  273.  252.  264.  257.
259.  228.  263.  260.  268.  236.  269.  255.  265.  231.
(a) Use function histnorm with a suitable number of classes to plot a histogram of the data as
well as the corresponding normal curve. (b) Use function normplot to produce a normal
probability plot of the data. (c) Based on these two plots, how well do the data follow the
normal distribution?
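Functions histnorm and normplot belong to this document's function collection and their calling sequences are not repeated here. For readers who want to see what the histogram-plus-normal-curve plot involves, the lines below are a rough stand-in built only from SCILAB's own histplot and plot functions, applied to the diameter data of problem [18]. The choice of 7 classes is arbitrary, and the sketch is not claimed to reproduce histnorm.

```scilab
// Diameter data of problem [18]
d = [232 246 260 244 267 248 308 247 264 243 ...
     242 221 228 243 270 250 275 274 255 275 ...
     239 261 205 261 260 244 217 254 236 281 ...
     265 260 230 226 240 262 273 252 264 257 ...
     259 228 263 260 268 236 269 255 265 231];

n  = length(d);
mu = mean(d);                              // sample mean
s  = sqrt(sum((d - mu).^2)/(n - 1));       // sample standard deviation

histplot(7, d)                             // normalized histogram, 7 classes (arbitrary)
xx = linspace(min(d), max(d), 200);
plot(xx, exp(-(xx - mu).^2/(2*s^2))/(s*sqrt(2*%pi)))   // normal density with mean mu, std s
xtitle("Cylinder diameters", "diameter", "relative frequency density")
```

Because histplot normalizes the histogram to unit area by default, the normal density can be drawn on the same axes without rescaling.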
[19]. The following data set represents the time to failure, in years, of light bulbs.
1.39  .97   1.33  3.05  3.21  1.07  1.01  .82   .42   .74
3.22  .44   2.04  1.17  3.04  3.67  1.97  1.02  1.72  2.74
.55   1.9   .53   2.68  .83   .81   .89   .13   .56   .79
1.22  3.25  2.06  2.13  1.56  1.26  .85   2.96  1.56  1.55
.05   1.04  1.96  2.09  .96   1.54  .43   1.5   1.26  1.23
(a) Use function histnorm with a suitable number of classes to plot a histogram of the data as
well as the corresponding normal curve. (b) Use function normplot to produce a normal
probability plot of the data. (c) Based on these two plots, how well do the data follow the
normal distribution?
[20]. The following data set represents the yearly rainfall depth, in mm, recorded at a certain
location:
126.  408.  277.  135.  82.9  189.  13.7  215.  41.5  646.
52.3  106.  4.35  7.82  171.  201.  346.  313.  314.  51.
102.  17.4  60.6  43.   830.  165.  29.1  335.  12.8  24.5
468.  59.4  366.  32.6  887.  174.  471.  39.3  44.5  870.
(a) Use function histnorm with a suitable number of classes to plot a histogram of the data as
well as the corresponding normal curve. (b) Use function normplot to produce a normal
probability plot of the data. (c) Based on these two plots, how well do the data follow the
normal distribution?
[21]. The following data set represents the number of vehicles stopping at a service station in a
given hour:
3.   4.   6.   6.   4.   8.   5.   8.   9.   5.
5.   3.   6.   5.   10.  9.   9.   11.  4.   4.
7.   12.  8.   4.   5.   4.   4.   11.  7.   5.
9.   6.   3.   5.   5.   9.   4.   7.   5.   13.
3.   5.   4.   4.   9.   8.   6.   1.   11.  7.
9.   10.  5.   8.   4.   8.   11.  6.   5.   6.
(a) Use function histnorm with a suitable number of classes to plot a histogram of the data as
well as the corresponding normal curve. (b) Use function normplot to produce a normal
probability plot of the data. (c) Based on these two plots, how well do the data follow the
normal distribution?
[22]. Generate data sets consisting of k values that follow the indicated distribution with the
parameters listed below. Use functions histnorm and normplot to produce a histogram and a
normal probability plot of the data. How well do the data thus generated follow the normal
distribution based on the histogram and probability plot?
(a) Binomial, k = 200, n = 30, p = 0.7
(b) Poisson, k = 300, λ = 14.5
(c) Beta, k = 150, α = 3.5, β = 5.2
(d) Gamma, k = 100, α = 3.5, β = 5.2
(e) Exponential, k = 500, µ = 5.75
(f) Normal, k = 180, µ = 5.75, σ = 1.2
(g) Chi-square, k = 230, ν = 5
(h) F-distribution, k = 350, νN = 5, νD = 5
(i) Uniform integer, k = 125, a = -50, b = 50
(j) Uniform real, k = 200, a = 5.5, b = 17.5
(k) Weibull, k = 200, α = 7.2, β = 2.1
(l) Student’s t, k = 150, ν = 12
(m) Log-normal, k = 200, µln(X) = 1.2, σln(X) = 0.5
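As a hedged illustration of the data-generation step in problem [22], the lines below use SCILAB function grand for items (a), (f) and (j), and obtain the log-normal sample of item (m) by exponentiating a normal sample, which follows from the definition of the log-normal distribution. The variable names are illustrative, and the option strings ("bin", "nor", "unf") are those listed in the grand help page. The distributions whose parameterization depends on the definitions adopted in this document (Gamma, Weibull) are left out on purpose.

```scilab
k = 200;                                   // sample size used for all four samples here

xb = grand(k, 1, "bin", 30, 0.7);          // (a) binomial, n = 30, p = 0.7
xn = grand(k, 1, "nor", 5.75, 1.2);        // (f) normal, mu = 5.75, sigma = 1.2
xu = grand(k, 1, "unf", 5.5, 17.5);        // (j) uniform real on (5.5, 17.5)

// (m) log-normal: if ln(X) is normal with mean 1.2 and std 0.5, then X = exp(Y)
y  = grand(k, 1, "nor", 1.2, 0.5);
xl = exp(y);

disp([mean(xb) mean(xn) mean(xu) mean(xl)])
```

Each column vector can then be passed to histnorm and normplot as the exercise requests. The sample sizes listed in the exercise differ from item to item, so k should be reset accordingly.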
[23]. Generate data sets consisting of 250 values that follow the discrete distribution
described by the following probability mass function:
x       1.2   2.3   4.1   5.2   6.1   7.2   8.4   9.3   11.1
fX(x)   0.04  0.08  0.12  0.16  0.08  0.04  0.20  0.24  0.04
Use functions histnorm and normplot to produce a histogram and a normal probability plot of
the data. How well do the data thus generated follow the normal distribution based on the
histogram and probability plot?
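One common way to draw values from a tabulated probability mass function such as the one in problem [23] is the inverse-transform method: accumulate the probabilities into a cumulative distribution, draw a uniform random number u, and pick the first x whose cumulative probability is at least u. The sketch below applies that idea with SCILAB functions cumsum, rand and find. It is offered as one possible approach under these assumptions, not necessarily the procedure intended by the exercise, and the variable names are illustrative.

```scilab
xv = [1.2 2.3 4.1 5.2 6.1 7.2 8.4 9.3 11.1];           // values of X
fx = [0.04 0.08 0.12 0.16 0.08 0.04 0.20 0.24 0.04];   // probability mass function
Fx = cumsum(fx);                                       // cumulative distribution
Fx($) = 1;                                             // guard against round-off in the last entry

m = 250;                       // number of values to generate
rand("uniform");               // make sure rand produces uniform numbers
u  = rand(m, 1);               // uniform random numbers between 0 and 1
xs = zeros(m, 1);
for i = 1:m
    j = find(u(i) <= Fx, 1);   // first index whose cumulative probability is >= u(i)
    xs(i) = xv(j);
end
```

The relative frequency of each value in xs should approach the tabulated fX(x) as m grows, which gives a quick sanity check on the generated sample.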
[24]. Function service was developed to simulate the traffic through a service station. Use
function service to produce a simulation of traffic through a service station that takes as input
50 values of the inter-arrival time (IAT) and 50 values of the time of service (TS) generated out
of the following probability mass functions:

x = IAT   0.2   0.4   0.6   0.8   1.0   1.2   1.4   1.6   1.8   2.0
fX(x)     0.03  0.14  0.08  0.12  0.23  0.10  0.05  0.05  0.10  0.10

x = TS    0.4   0.8   1.2   1.6   2.0   2.4
fX(x)     0.05  0.15  0.35  0.25  0.15  0.05

Use functions histnorm and normplot to produce a histogram and a normal probability plot of
the waiting time (WT) and number of customers waiting (NW). How well do the WT and NW
data follow the normal distribution?
[25]. One-dimensional random walk. Consider a particle that moves along a straight line
subject to a random motion. The particle starts at x_1 = 0 and moves to position x_2 = x_1 + ∆x_1,
where ∆x_1 is a random number. The next position of the particle is x_3 = x_2 + ∆x_2, where ∆x_2 is a
second random number. Subsequent positions of the particle are given by x_{k+1} = x_k + ∆x_k. The
random numbers used must include both positive and negative values so that the particle can
move forward and backward.
(a) Plot the position x_k vs. k for a one-dimensional random walk that involves 300
displacements ∆x_k generated from a normal distribution with µ = 0 and σ = 1.
(b) Plot the position x_k vs. k for a one-dimensional random walk that involves 300
displacements ∆x_k generated from a uniform distribution between -1 and 1.
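Since each position is the previous position plus a displacement, the whole walk is the cumulative sum of the displacements. The following lines are a minimal sketch of part (a) built on that observation, using SCILAB functions grand and cumsum; the variable names are illustrative.

```scilab
m  = 300;                            // number of displacements
dx = grand(m, 1, "nor", 0, 1);       // normal displacements, mu = 0, sigma = 1
x  = [0; cumsum(dx)];                // positions: x(1) = 0, x(k+1) = x(k) + dx(k)
k  = (1:m+1)';                       // position index

plot(k, x)
xtitle("One-dimensional random walk", "k", "x_k")
```

For part (b), the displacements could instead be drawn with grand(m, 1, "unf", -1, 1), leaving the rest unchanged.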
[26]. Two-dimensional random walk. A two-dimensional random walk involves the displacement
of a particle from a point (x_k, y_k) to a point (x_{k+1}, y_{k+1}) so that
x_{k+1} = x_k + r_k cos(θ_k), and y_{k+1} = y_k + r_k sin(θ_k),
where the values r_k and θ_k are random numbers.
(a) Plot the two-dimensional random walk that results from 200 values of r_k with a normal
distribution with mean µ = 1 and standard deviation σ = 0.2, and 200 values of θ_k
uniformly distributed between 0 and 2π.
(b) Plot the two-dimensional random walk that results from 100 values of r_k with a Weibull
distribution with parameters α = 2 and β = 3, and 200 values of θ_k uniformly distributed
between 0 and 2π.
(c) Plot the two-dimensional random walk that results from 150 values of r_k with a Gamma
distribution with parameters α = 0.2 and β = 1.3, and 200 values of θ_k normally
distributed with mean µ = π and standard deviation σ = π/2.
(d) Plot the two-dimensional random walk that results from 250 values of r_k with a Beta
distribution with parameters α = 2 and β = 3, and 200 values of θ_k uniformly distributed
between 0 and 2π.
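The two-dimensional walk can be assembled in the same way as the one-dimensional one, by accumulating the x- and y-components of each step. The sketch below covers part (a) only, takes the y-update as y_{k+1} = y_k + r_k sin(θ_k), and assumes that the particle starts at the origin, which the problem statement does not specify. The variable names are illustrative.

```scilab
m     = 200;                                 // number of steps
r     = grand(m, 1, "nor", 1, 0.2);          // step lengths, mu = 1, sigma = 0.2
theta = grand(m, 1, "unf", 0, 2*%pi);        // step directions, uniform on (0, 2*pi)

// ASSUMPTION: the walk starts at (0,0)
x = [0; cumsum(r .* cos(theta))];            // x(k+1) = x(k) + r(k)*cos(theta(k))
y = [0; cumsum(r .* sin(theta))];            // y(k+1) = y(k) + r(k)*sin(theta(k))

plot(x, y)
xtitle("Two-dimensional random walk", "x", "y")
```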
[27]. The following table shows the annual maximum flow for the Ganga River in India
measured at a specific station.
Year   Q (m³/s)    Year   Q (m³/s)    Year   Q (m³/s)    Year   Q (m³/s)
1885     7241      1907     7546      1929     4545      1951     4458
1886     9164      1908    11504      1930     5998      1952     3919
1887     7407      1909     8335      1931     3470      1953     5470
1888     6870      1910    15077      1932     6155      1954     5978
1889     9855      1911     6493      1933     5267      1955     4644
1890    11887      1912     8335      1934     6193      1956     6381
1891     8827      1913     3579      1935     5289      1957     4548
1892     7546      1914     9299      1936     3320      1958     4056
1893     8498      1915     7407      1937     3232      1959     4493
1894    16757      1916     4726      1938     3525      1960     3884
1895     9680      1917     8416      1939     2341      1961     4855
1896    14336      1918     4668      1940     2429      1962     5760
1897     8174      1919     6296      1941     3154      1963     9192
1898     8953      1920     8174      1942     6650      1964     3024
1899     7546      1921     9079      1943     4442      1965     2509
1900     6652      1922     7407      1944     4229      1966     4741
1901    11409      1923     5482      1945     5101      1967     5919
1902     9164      1924    19136      1946     4629      1968     3789
1903     7404      1925     9680      1947     4345      1969     4546
1904     8579      1926     3698      1948     4890      1970     3842
1905     9362      1927     7241      1949     3619      1971     4542
1906     7092      1928     3698      1950     5899
(a) Use function histnorm with a suitable number of classes to plot a histogram of the data
as well as the corresponding normal curve.
(b) Use function normplot to produce a normal probability plot of the data.
(c) Based on these two plots, how well do the data follow the normal distribution?
The following problems require that you load the functions from the Stixbox SCILAB
toolbox.
[28]. Using function getdata() load data set number 1, described as:
__________________________________________________________________________________
************************ Phosphorus Data **********************************
Source:      Snedecor, G. W. and Cochran, W. G. (1967), Statistical Methods,
             (6th Edition), Iowa State University, Ames, Iowa, p. 384.
Taken From:  Chatterjee and Hadi (1988), p. 82.
Dimension:   18 observations on 3 variables
Description: An investigation of the source from which corn plants obtain
             their phosphorus was carried out. Concentrations of phosphorus
             in parts per million in each of 18 soils were measured.

Column  Description
1       Concentrations of inorganic phosphorus in the soil
2       Concentrations of organic phosphorus in the soil
3       Phosphorus content of corn grown in the soil at 20 degrees C
__________________________________________________________________________________
(a) Separate the three columns of data into vectors x, y, and z, and use the user-defined
function describe to obtain statistics of each of the columns of data.
(b) Use Stixbox function histo to obtain histograms of each of the columns of data.
(c) Use Stixbox function qqnorm to obtain a normal probability plot of each of the data
columns.
[29]. Using function getdata() load data set number 1, described as:
*********************** Scottish Hill Race Data *************************
(...lines removed...)
Column  Definition
1       Distance (miles)
2       Climb (ft)
3       Time (seconds)
__________________________________________________________________________________
(a) Separate the three columns of data into vectors x, y, and z, and use the user-defined
function describe to obtain statistics of each of the columns of data.
(b) Use Stixbox function histo to obtain histograms of each of the columns of data.
(c) Use Stixbox function qqnorm to obtain a normal probability plot of each of the data
columns.
REFERENCES (for all SCILAB documents at InfoClearinghouse.com)
Abramowitz, M. and I.A. Stegun (editors), 1965,"Handbook of Mathematical Functions with Formulas, Graphs, and
Mathematical Tables," Dover Publications, Inc., New York.
Arora, J.S., 1985, "Introduction to Optimum Design," Class notes, The University of Iowa, Iowa City, Iowa.
Asian Institute of Technology, 1969, "Hydraulic Laboratory Manual," AIT - Bangkok, Thailand.
Berge, P., Y. Pomeau, and C. Vidal, 1984,"Order within chaos - Towards a deterministic approach to turbulence," John
Wiley & Sons, New York.
Bras, R.L. and I. Rodriguez-Iturbe, 1985,"Random Functions and Hydrology," Addison-Wesley Publishing Company,
Reading, Massachussetts.
Brogan, W.L., 1974,"Modern Control Theory," QPI series, Quantum Publisher Incorporated, New York.
Browne, M., 1999, "Schaum's Outline of Theory and Problems of Physics for Engineering and Science," Schaum's
outlines, McGraw-Hill, New York.
Farlow, Stanley J., 1982, "Partial Differential Equations for Scientists and Engineers," Dover Publications Inc., New
York.
Friedman, B., 1956 (reissued 1990), "Principles and Techniques of Applied Mathematics," Dover Publications Inc., New
York.
Gomez, C. (editor), 1999, “Engineering and Scientific Computing with Scilab,” Birkhäuser, Boston.
Gullberg, J., 1997, "Mathematics - From the Birth of Numbers," W. W. Norton & Company, New York.
Harman, T.L., J. Dabney, and N. Richert, 2000, "Advanced Engineering Mathematics with MATLAB® - Second edition,"
Brooks/Cole - Thompson Learning, Australia.
Harris, J.W., and H. Stocker, 1998, "Handbook of Mathematics and Computational Science," Springer, New York.
Hsu, H.P., 1984, "Applied Fourier Analysis," Harcourt Brace Jovanovich College Outline Series, Harcourt Brace
Jovanovich, Publishers, San Diego.
Journel, A.G., 1989, "Fundamentals of Geostatistics in Five Lessons," Short Course Presented at the 28th International
Geological Congress, Washington, D.C., American Geophysical Union, Washington, D.C.
Julien, P.Y., 1998,”Erosion and Sedimentation,” Cambridge University Press, Cambridge CB2 2RU, U.K.
Keener, J.P., 1988, "Principles of Applied Mathematics - Transformation and Approximation," Addison-Wesley
Publishing Company, Redwood City, California.
Kitanidis, P.K., 1997,”Introduction to Geostatistics - Applications in Hydogeology,” Cambridge University Press,
Cambridge CB2 2RU, U.K.
Koch, G.S., Jr., and R. F. Link, 1971, "Statistical Analysis of Geological Data - Volumes I and II," Dover Publications,
Inc., New York.
Korn, G.A. and T.M. Korn, 1968, "Mathematical Handbook for Scientists and Engineers," Dover Publications, Inc., New
York.
Kottegoda, N. T., and R. Rosso, 1997, "Probability, Statistics, and Reliability for Civil and Environmental Engineers,"
The Mc-Graw Hill Companies, Inc., New York.
Kreysig, E., 1983, "Advanced Engineering Mathematics - Fifth Edition," John Wiley & Sons, New York.
Download at InfoClearinghouse.com
78
© 2001 Gilberto E. Urroz
Lindfield, G. and J. Penny, 2000, "Numerical Methods Using Matlab®," Prentice Hall, Upper Saddle River, New Jersey.
Magrab, E.B., S. Azarm, B. Balachandran, J. Duncan, K. Herold, and G. Walsh, 2000, "An Engineer's Guide to
MATLAB®", Prentice Hall, Upper Saddle River, N.J., U.S.A.
McCuen, R.H., 1989,”Hydrologic Analysis and Design - second edition,” Prentice Hall, Upper Saddle River, New Jersey.
Middleton, G.V., 2000, "Data Analysis in the Earth Sciences Using Matlab®," Prentice Hall, Upper Saddle River, New
Jersey.
Montgomery, D.C., G.C. Runger, and N.F. Hubele, 1998, "Engineering Statistics," John Wiley & Sons, Inc.
Newland, D.E., 1993, "An Introduction to Random Vibrations, Spectral & Wavelet Analysis - Third Edition," Longman
Scientific and Technical, New York.
Nicols, G., 1995, “Introduction to Nonlinear Science,” Cambridge University Press, Cambridge CB2 2RU, U.K.
Parker, T.S. and L.O. Chua, , "Practical Numerical Algorithms for Chaotic Systems,” 1989, Springer-Verlag, New York.
Peitgen, H-O. and D. Saupe (editors), 1988, "The Science of Fractal Images," Springer-Verlag, New York.
Peitgen, H-O., H. Jürgens, and D. Saupe, 1992, "Chaos and Fractals - New Frontiers of Science," Springer-Verlag, New
York.
Press, W.H., B.P. Flannery, S.A. Teukolsky, and W.T. Vetterling, 1989, “Numerical Recipes - The Art of Scientific
Computing (FORTRAN version),” Cambridge University Press, Cambridge CB2 2RU, U.K.
Raghunath, H.M., 1985, "Hydrology - Principles, Analysis and Design," Wiley Eastern Limited, New Delhi, India.
Recktenwald, G., 2000, "Numerical Methods with Matlab - Implementation and Application," Prentice Hall, Upper
Saddle River, N.J., U.S.A.
Rothenberg, R.I., 1991, "Probability and Statistics," Harcourt Brace Jovanovich College Outline Series, Harcourt Brace
Jovanovich, Publishers, San Diego, CA.
Sagan, H., 1961,"Boundary and Eigenvalue Problems in Mathematical Physics," Dover Publications, Inc., New York.
Spanos, A., 1999,"Probability Theory and Statistical Inference - Econometric Modeling with Observational Data,"
Cambridge University Press, Cambridge CB2 2RU, U.K.
Spiegel, M. R., 1971 (second printing, 1999), "Schaum's Outline of Theory and Problems of Advanced Mathematics for
Engineers and Scientists," Schaum's Outline Series, McGraw-Hill, New York.
Tanis, E.A., 1987, "Statistics II - Estimation and Tests of Hypotheses," Harcourt Brace Jovanovich College Outline
Series, Harcourt Brace Jovanovich, Publishers, Fort Worth, TX.
Tinker, M. and R. Lambourne, 2000, "Further Mathematics for the Physical Sciences," John Wiley & Sons, LTD.,
Chichester, U.K.
Tolstov, G.P., 1962, "Fourier Series," (Translated from the Russian by R. A. Silverman), Dover Publications, New York.
Tveito, A. and R. Winther, 1998, "Introduction to Partial Differential Equations - A Computational Approach," Texts in
Applied Mathematics 29, Springer, New York.
Urroz, G., 2000, "Science and Engineering Mathematics with the HP 49 G - Volumes I & II", www.greatunpublished.com,
Charleston, S.C.
Urroz, G., 2001, "Applied Engineering Mathematics with Maple", www.greatunpublished.com, Charleston, S.C.
Winnick, J., , "Chemical Engineering Thermodynamics - An Introduction to Thermodynamics for Undergraduate
Engineering Students," John Wiley & Sons, Inc., New York.
Download at InfoClearinghouse.com
79
© 2001 Gilberto E. Urroz