mathStatica: Symbolic Computational Statistics

Colin ROSE¹ and Murray D. SMITH²
¹ Theoretical Research Institute, Sydney, AUSTRALIA
Email: colin@tri.org.au
² Econometrics and Business Statistics, The University of Sydney, AUSTRALIA
Email: murray.smith@econ.usyd.edu.au
Summary. mathStatica is a general toolset for doing exact (symbolic) mathematical statistics with a computer algebra system. It provides automated statistical operators for taking expectations, finding probabilities and deriving transformations of random variables. It also finds moments, cumulative distribution functions, characteristic functions and other generating functions, all for arbitrary user-defined distributions. mathStatica v1 accompanies the book by Rose and Smith (2002). This paper illustrates some of the latest algorithms and functions forthcoming in mathStatica v2 (2007). In particular, it highlights new algorithms for:
- automated transformations of random variables;
- many-to-one transformations, such as deriving the pdf of min(X, Y, Z, …);
- piecewise functions;
- order statistics with non-identical parent distributions.
Keywords: computational statistics, order statistics, transformations, mathStatica
1. Order Statistics
Let random variable X have, say, a Logistic distribution with pdf f(x), which we enter into mathStatica as:
In[1]:= f = Exp[-x]/(1 + Exp[-x])^2;  domain[f] = {x, -∞, ∞};
Let (X1, X2, …, Xn) denote a random sample of size n drawn on X, and let (X(1), X(2), …, X(n)) denote the ordered sample, so that X(1) < X(2) < … < X(n). The pdf of the rth order statistic, X(r), is given immediately by the mathStatica function:
In[2]:= OrderStat[r, f]
Out[2]=
$$\frac{n!}{(n-r)!\,(r-1)!}\,(1+e^{-x})^{-r}\,(1+e^{x})^{-1-n+r}$$
The line above labelled In[2] is the mathStatica input we enter into the computer, namely OrderStat[r, f]. The line labelled Out[2] is the output returned by mathStatica. Figure 1 below illustrates the solution.
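The short derivation below shows by hand where Out[2] comes from. It is a check only. It says nothing about how OrderStat works internally. It starts from the textbook pdf of the rth order statistic of an iid sample and uses the Logistic identity f(x) = F(x)(1 − F(x)), where F(x) = (1 + e^{−x})^{−1} and 1 − F(x) = (1 + e^{x})^{−1}:

$$f_{(r)}(x)=\frac{n!}{(r-1)!\,(n-r)!}\,F(x)^{r-1}\bigl(1-F(x)\bigr)^{n-r}f(x)=\frac{n!}{(r-1)!\,(n-r)!}\,F(x)^{r}\bigl(1-F(x)\bigr)^{n-r+1}=\frac{n!}{(n-r)!\,(r-1)!}\,(1+e^{-x})^{-r}(1+e^{x})^{-1-n+r}$$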
[Plot: pdf against x, with curves labelled r = 1 through r = 10]
Fig. 1. PDF of the rth order statistic (Logistic), as r increases from 1 to 10, given a sample size of n = 10
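Out[2] can also be cross-checked numerically, as sketched below in Python using only the standard library. The function names are hypothetical. The sketch does two things:
- it compares the closed form with the textbook iid formula at a few points;
- it checks that, for n = 10, each pdf integrates to about 1, using a composite Simpson rule on [−30, 30] as a stand-in for the whole real line.

```python
import math

def logistic_cdf(x: float) -> float:
    """F(x) for the standard Logistic distribution."""
    return 1.0 / (1.0 + math.exp(-x))

def logistic_sf(x: float) -> float:
    """1 - F(x), written directly to avoid cancellation for large x."""
    return 1.0 / (1.0 + math.exp(x))

def logistic_pdf(x: float) -> float:
    """f(x) = e^-x / (1 + e^-x)^2, which equals F(x) * (1 - F(x))."""
    return logistic_cdf(x) * logistic_sf(x)

def order_stat_pdf_textbook(x: float, r: int, n: int) -> float:
    """iid formula: n!/((r-1)!(n-r)!) * F^(r-1) * (1-F)^(n-r) * f."""
    const = math.factorial(n) / (math.factorial(r - 1) * math.factorial(n - r))
    return const * logistic_cdf(x) ** (r - 1) * logistic_sf(x) ** (n - r) * logistic_pdf(x)

def order_stat_pdf_out2(x: float, r: int, n: int) -> float:
    """Closed form transcribed from Out[2]."""
    const = math.factorial(n) / (math.factorial(n - r) * math.factorial(r - 1))
    return const * (1.0 + math.exp(-x)) ** (-r) * (1.0 + math.exp(x)) ** (-1 - n + r)

def simpson(func, a: float, b: float, m: int = 6000) -> float:
    """Composite Simpson rule on [a, b] with m (even) subintervals."""
    h = (b - a) / m
    total = func(a) + func(b)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3.0

if __name__ == "__main__":
    n = 10
    for r in range(1, n + 1):
        mass = simpson(lambda t: order_stat_pdf_out2(t, r, n), -30.0, 30.0)
        print(f"r = {r:2d}: total probability ~ {mass:.8f}")
```

The survival function is coded as 1/(1 + e^x) rather than 1 − F(x), so the textbook formula stays accurate in the right tail.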
In similar fashion, we can derive the joint pdf of two or more order statistics. Here is the joint pdf of X(r) and X(s), for r < s:
In[3]:= OrderStat[{r, s}, f]
Out[3]=
$$\frac{e^{x_s}\,(1+e^{-x_r})^{-r}\,(1+e^{x_s})^{-1-n+s}\,\Gamma(1+n)}{(e^{x_s}-e^{x_r})\,\Gamma(r)\,\Gamma(1+n-s)\,\Gamma(s-r)}\left(\frac{1}{1+e^{x_r}}-\frac{1}{1+e^{x_s}}\right)^{s-r}$$
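Out[3] can be checked the same way against the textbook joint pdf of two order statistics:

n!/((r−1)!(s−r−1)!(n−s)!) · F(x_r)^(r−1) · (F(x_s) − F(x_r))^(s−r−1) · (1 − F(x_s))^(n−s) · f(x_r) f(x_s), for x_r < x_s.

The Python sketch below (hypothetical helper names) evaluates both expressions. Agreement at a handful of points is evidence, not proof, that the transcription of Out[3] is right.

```python
import math

def F(x: float) -> float:
    """Logistic cdf."""
    return 1.0 / (1.0 + math.exp(-x))

def S(x: float) -> float:
    """Logistic survival function, 1 - F(x)."""
    return 1.0 / (1.0 + math.exp(x))

def f(x: float) -> float:
    """Logistic pdf, f = F * (1 - F)."""
    return F(x) * S(x)

def joint_textbook(xr: float, xs: float, r: int, s: int, n: int) -> float:
    """Textbook joint pdf of (X(r), X(s)), r < s, for an iid sample of size n."""
    if not xr < xs:
        return 0.0
    const = math.factorial(n) / (math.factorial(r - 1) * math.factorial(s - r - 1) * math.factorial(n - s))
    return (const * F(xr) ** (r - 1) * (F(xs) - F(xr)) ** (s - r - 1)
            * S(xs) ** (n - s) * f(xr) * f(xs))

def joint_out3(xr: float, xs: float, r: int, s: int, n: int) -> float:
    """Closed form transcribed from Out[3]."""
    if not xr < xs:
        return 0.0
    bracket = 1.0 / (1.0 + math.exp(xr)) - 1.0 / (1.0 + math.exp(xs))
    num = (math.exp(xs) * (1.0 + math.exp(-xr)) ** (-r)
           * (1.0 + math.exp(xs)) ** (-1 - n + s) * math.gamma(1 + n))
    den = (math.exp(xs) - math.exp(xr)) * math.gamma(r) * math.gamma(1 + n - s) * math.gamma(s - r)
    return num / den * bracket ** (s - r)

if __name__ == "__main__":
    print(joint_textbook(-1.0, 0.5, 2, 5, 10), joint_out3(-1.0, 0.5, 2, 5, 10))
```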
2. Non-identical distributions: pdf of min(X, Y, Z)
mathStatica v2 generalises the above order statistic functionality to non-identical distributions, which provides enormous flexibility. To illustrate, consider three completely different distributions defined over three different domains of support. In the following:
- f(x) is the pdf of an Exponential(λ) random variable with mean λ;
- g(x) is the pdf of a standard Normal;
- h(x) is the pdf of a Uniform(-1,1) random variable.
In[1]:= f = Exp[-x/λ]/λ;  domain[f] = {x, 0, ∞} && {λ > 0};
In[2]:= g = Exp[-x^2/2]/Sqrt[2 π];  domain[g] = {x, -∞, ∞};
In[3]:= h = 1/2;  domain[h] = {x, -1, 1};
mathStatica v2 provides a general methodology for working with such problems. For example, the input:
OrderStat[2, {f, g, h}, {5, 6, 4}]
will return the exact pdf of the second order statistic given a random sample in which 5 values are drawn from the Exponential, 6 from the Normal and 4 from the Uniform.
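The idea behind OrderStat[2, {f, g, h}, {5, 6, 4}] can be made concrete with the purely numerical sketch below. It is not mathStatica's symbolic algorithm. It assumes independent observations and uses the standard argument that the rth smallest value lies at x when one observation sits at x and exactly r − 1 of the others fall below it. The probability of exactly r − 1 below is a Poisson-binomial probability, computed here by a simple dynamic programme. All names (order_stat_pdf_mixed, exponential, NORMAL, UNIFORM) are hypothetical, and the Exponential is parameterised by its mean λ, as in In[1].

```python
import math
from typing import Callable, List, Sequence, Tuple

# A distribution is represented here (hypothetically) as a (pdf, cdf) pair of functions.
Dist = Tuple[Callable[[float], float], Callable[[float], float]]

def poisson_binomial(probs: Sequence[float]) -> List[float]:
    """out[k] = P(exactly k of the independent events occur); event i has probability probs[i]."""
    out = [1.0]
    for p in probs:
        nxt = [0.0] * (len(out) + 1)
        for k, q in enumerate(out):
            nxt[k] += q * (1.0 - p)   # this observation is above x
            nxt[k + 1] += q * p       # this observation is below x
        out = nxt
    return out

def order_stat_pdf_mixed(x: float, r: int, dists: Sequence[Dist], counts: Sequence[int]) -> float:
    """pdf at x of the r-th smallest of sum(counts) independent draws, counts[k] from dists[k]."""
    total = 0.0
    for k, (pdf_k, _) in enumerate(dists):
        if counts[k] == 0:
            continue
        # cdf values at x of the remaining n - 1 observations (one draw from group k sits at x)
        below: List[float] = []
        for j, (_, cdf_j) in enumerate(dists):
            below += [cdf_j(x)] * (counts[j] - (1 if j == k else 0))
        total += counts[k] * pdf_k(x) * poisson_binomial(below)[r - 1]
    return total

def exponential(lam: float) -> Dist:
    """Exponential distribution with mean lam."""
    return (lambda x: math.exp(-x / lam) / lam if x >= 0 else 0.0,
            lambda x: 1.0 - math.exp(-x / lam) if x >= 0 else 0.0)

NORMAL: Dist = (lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi),
                lambda x: 0.5 * math.erfc(-x / math.sqrt(2)))
UNIFORM: Dist = (lambda x: 0.5 if -1 <= x <= 1 else 0.0,
                 lambda x: min(max((x + 1) / 2, 0.0), 1.0))

if __name__ == "__main__":
    dists, counts = [exponential(1.0), NORMAL, UNIFORM], [5, 6, 4]
    for x in (-1.5, -0.5, 0.0, 0.5):
        print(x, order_stat_pdf_mixed(x, 2, dists, counts))
```

With a single parent distribution, order_stat_pdf_mixed reduces to the usual iid formula. The dynamic programme costs O(n²) per evaluation, which is adequate for small samples like this one.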
The same algorithm provides a neat way to solve problems such as finding the pdf of min(X, Y, Z) when X, Y and Z have completely different distributions and different domains of support. Thus, if X ~ Exponential(λ), Y ~ Normal(0,1) and Z ~ Uniform(-1,1), the pdf of min(X, Y, Z) is obtained simply as the pdf of the first order statistic:
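In[4]:= OrderStat[1, {f, g, h}]
Out[4]=
$$\begin{cases}
\dfrac{e^{-x^{2}/2}}{\sqrt{2\pi}} & x \le -1\\[8pt]
\dfrac{e^{-x^{2}/2}\,(1-x)}{2\sqrt{2\pi}}+\dfrac{1}{4}\,\operatorname{Erfc}\!\left(\dfrac{x}{\sqrt{2}}\right) & -1 < x \le 0\\[8pt]
\dfrac{e^{-\frac{1}{2}x\left(x+\frac{2}{\lambda}\right)}\sqrt{\frac{2}{\pi}}\,(1-x)\,\lambda+e^{-x/\lambda}\,(1-x+\lambda)\,\operatorname{Erfc}\!\left(\dfrac{x}{\sqrt{2}}\right)}{4\lambda} & 0 < x < 1
\end{cases}$$
Figure 2 plots the solution (here with λ = 1):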
[Plot: sol against x]
Fig. 2. PDF of min(X, Y, Z) when X ~ Exponential(1), Y ~ Normal(0,1) and Z ~ Uniform(-1,1)
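The piecewise result in Out[4] can be cross-checked numerically. For independent X, Y and Z, the survival function of the minimum is S_X(x) S_Y(x) S_Z(x). Its pdf is therefore f_X S_Y S_Z + f_Y S_X S_Z + f_Z S_X S_Y. The Python sketch below (hypothetical function names) codes Out[4] and this direct formula side by side. It assumes the pdf is zero for x ≥ 1, since Z < 1 with probability one.

```python
import math

def min_pdf_out4(x: float, lam: float) -> float:
    """Piecewise pdf of min(X, Y, Z), transcribed from Out[4]."""
    erfc = math.erfc(x / math.sqrt(2))
    if x <= -1:
        return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    if x <= 0:
        return math.exp(-x * x / 2) * (1 - x) / (2 * math.sqrt(2 * math.pi)) + erfc / 4
    if x < 1:
        return (math.exp(-x * (x + 2 / lam) / 2) * math.sqrt(2 / math.pi) * (1 - x) * lam
                + math.exp(-x / lam) * (1 - x + lam) * erfc) / (4 * lam)
    return 0.0  # assumption: zero beyond the listed pieces, since min <= Z < 1

def min_pdf_generic(x: float, lam: float) -> float:
    """pdf of the minimum of independent X, Y, Z: sum_i f_i(x) * prod_{j != i} S_j(x)."""
    fx = math.exp(-x / lam) / lam if x >= 0 else 0.0   # Exponential with mean lam
    sx = math.exp(-x / lam) if x >= 0 else 1.0
    fy = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)  # standard Normal
    sy = 0.5 * math.erfc(x / math.sqrt(2))
    fz = 0.5 if -1 <= x <= 1 else 0.0                   # Uniform(-1, 1)
    sz = min(max((1 - x) / 2, 0.0), 1.0)
    return fx * sy * sz + fy * sx * sz + fz * sx * sy

if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.5):
        print(x, min_pdf_out4(x, 1.0), min_pdf_generic(x, 1.0))
```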
3. Piecewise products of random variables
mathStatica v2 provides new functionality for deriving the distribution of products and ratios of two or more independent continuous random variables. This is part of more general support for piecewise continuous functions. It extends the excellent work of Glen, Leemis and Drew (2004) to distributions with arbitrary symbolic parameters, including domains of support that depend on those parameters.
To illustrate, let random variable X ~ Pareto(c, b) with pdf f(x), and let random variable Y have a piecewise continuous pdf g(y). This pdf consists of a rectangular component defined over (α, β), for α < β < 0, and a ‘triangular’ component over (½, 1). We enter the pdfs f(x) and g(y) into the computer as:
In[1]:= f = c b^c x^-(c + 1);  domain[f] = {x, b, ∞} && {c > 0, b > 0};
In[2]:= g = Piecewise[{{1/(2 (β - α)), α < y < β}, {4 (1 - y), 1/2 < y < 1}}];  domain[g] = {y, -∞, ∞} && {α < β < 0};
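A quick hand check confirms that g(y) is a proper pdf, with each component carrying probability one half:

$$\int_{\alpha}^{\beta}\frac{1}{2(\beta-\alpha)}\,dy=\frac{1}{2},\qquad \int_{1/2}^{1}4(1-y)\,dy=\Bigl[-2(1-y)^{2}\Bigr]_{1/2}^{1}=0-\Bigl(-\frac{1}{2}\Bigr)=\frac{1}{2}$$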
We seek the pdf of V = X Y, and mathStatica’s new TransformProduct function provides the exact solution:
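In[3]:= TransformProduct[v, {f, g}]
Out[3]=
$$\begin{cases}
\dfrac{b^{c}\,c\left(\alpha\left(\frac{\alpha}{v}\right)^{c}-\beta\left(\frac{\beta}{v}\right)^{c}\right)}{-2(c+1)\,v\,(\alpha-\beta)} & v < b\alpha\\[10pt]
\dfrac{c\left(b\beta\left(\frac{b\beta}{v}\right)^{c}-v\right)}{2b(c+1)\,v\,(\alpha-\beta)} & b\alpha < v < b\beta\\[10pt]
\dfrac{2^{-c}\,v^{-c-1}\left(-2^{c+2}c\,\bigl(-b(c+2)+cv+v\bigr)\,v^{c+1}-b^{c+2}c\,(c+3)\right)}{b^{2}(c+1)(c+2)} & \dfrac{b}{2} < v < b\\[10pt]
\dfrac{2^{-c}\,b^{c}\left(-c+2^{c+2}-3\right)c\,v^{-c-1}}{c^{2}+3c+2} & v > b
\end{cases}$$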
Figure 3 plots the solution, here with c = 2, b = 3, α = −3/2 and β = −½.
[Plot: h against v]
Fig. 3. PDF of V = X Y
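Out[3] can be checked numerically with the standard product formula h(v) = ∫ f(x) g(v/x) / |x| dx, valid for independent X and Y (here x > b > 0). The Python sketch below is an illustration only, not the algorithm TransformProduct uses. It works as follows:
- it fixes the Fig. 3 values c = 2, b = 3, α = −3/2 and β = −1/2;
- it splits the x-range at the points where g(v/x) can jump;
- it applies a midpoint rule on each piece, so the discontinuities of g are never evaluated directly.

Function names are hypothetical.

```python
import math

# Parameter values used for Fig. 3 (c, b for the Pareto; alpha, beta for g).
C, B, ALPHA, BETA = 2.0, 3.0, -1.5, -0.5

def pareto_pdf(x: float) -> float:
    """Pareto(c, b) pdf: c b^c x^-(c+1) for x > b."""
    return C * B ** C * x ** (-(C + 1)) if x > B else 0.0

def g_pdf(y: float) -> float:
    """Rectangular piece on (alpha, beta) plus triangular piece 4(1 - y) on (1/2, 1)."""
    if ALPHA < y < BETA:
        return 1.0 / (2.0 * (BETA - ALPHA))
    if 0.5 < y < 1.0:
        return 4.0 * (1.0 - y)
    return 0.0

def product_pdf_numeric(v: float, m: int = 4000) -> float:
    """h(v) = integral over x > b of f(x) g(v/x) / x, by the midpoint rule on each smooth piece."""
    # x-values where the integrand may jump: x = b, and x = v / y at each breakpoint y of g.
    cuts = {B} | {v / y for y in (ALPHA, BETA, 0.5, 1.0)}
    pts = sorted(p for p in cuts if p >= B)
    total = 0.0
    for lo, hi in zip(pts, pts[1:]):
        step = (hi - lo) / m
        for k in range(m):
            x = lo + (k + 0.5) * step
            total += step * pareto_pdf(x) * g_pdf(v / x) / x
    return total

def product_pdf_closed(v: float) -> float:
    """Piecewise pdf transcribed from Out[3]; zero between b*beta and b/2."""
    c, b, a, bt = C, B, ALPHA, BETA
    if v < b * a:
        return b ** c * c * (a * (a / v) ** c - bt * (bt / v) ** c) / (-2 * (c + 1) * v * (a - bt))
    if b * a < v < b * bt:
        return c * (b * bt * (b * bt / v) ** c - v) / (2 * b * (c + 1) * v * (a - bt))
    if b / 2 < v < b:
        return (2 ** (-c) * v ** (-c - 1)
                * (-2 ** (c + 2) * c * (-b * (c + 2) + c * v + v) * v ** (c + 1) - b ** (c + 2) * c * (c + 3))
                / (b ** 2 * (c + 1) * (c + 2)))
    if v > b:
        return 2 ** (-c) * b ** c * (-c + 2 ** (c + 2) - 3) * c * v ** (-c - 1) / (c ** 2 + 3 * c + 2)
    return 0.0

if __name__ == "__main__":
    for v in (-10.0, -3.0, 2.0, 5.0):
        print(v, product_pdf_numeric(v), product_pdf_closed(v))
```

Because the integrand is smooth within each piece, a simple midpoint rule is enough here. A general-purpose implementation would need adaptive quadrature and symbolic handling of the breakpoints.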
The implementation of this algorithm is completely general, exact and
symbolic, so that it can be easily applied to essentially any such problem.
References
Glen, A. G., Leemis, L. M. and Drew, J. H. (2004), Computing the distribution of the product of two continuous random variables, Computational Statistics & Data Analysis, 44, 451–464.
mathStatica (2002–2006), http://www.mathStatica.com
Rose, C. and Smith, M. D. (2002), Mathematical Statistics with Mathematica, Springer-Verlag: New York.
Rose, C. and Smith, M. D. (2005), Computational Order Statistics, The Mathematica Journal, 9(4), 790–802.