mathStatica: Symbolic Computational Statistics

Colin ROSE (1) and Murray D. SMITH (2)

(1) Theoretical Research Institute, Sydney, AUSTRALIA. Email: colin@tri.org.au
(2) Econometrics and Business Statistics, The University of Sydney, AUSTRALIA. Email: murray.smith@econ.usyd.edu.au

Summary. mathStatica is a general toolset for doing exact (symbolic) mathematical statistics with a computer algebra system. It provides automated statistical operators for taking expectations, finding probabilities, deriving transformations of random variables, finding moments, cumulative distribution functions, characteristic functions, and other generating functions, all for arbitrary user-defined distributions. mathStatica v1 accompanies the book: Rose and Smith (2002). This paper illustrates some of the latest algorithms/functions that are forthcoming in mathStatica v2 (2007). In particular, the paper highlights new algorithms in automated transformations of random variables, many-to-one transformations such as deriving the pdf of min(X, Y, Z, …), piecewise functions, and order statistics with non-identical parent distributions.

Keywords: computational statistics, order statistics, transformations, mathStatica

1. Order Statistics

Let random variable X have, say, a Logistic distribution with pdf f(x) which we enter into mathStatica as:

In[1]:= f = e^(-x) / (1 + e^(-x))^2;   domain[f] = {x, -∞, ∞};

Let (X1, X2, …, Xn) denote a random sample of size n drawn on X, and let (X(1), X(2), …, X(n)) denote the ordered sample so that X(1) < X(2) < … < X(n). The pdf of the rth order statistic, X(r), is given immediately by the mathStatica function:

In[2]:= OrderStat[r, f]

Out[2]= $$\frac{n!}{(n-r)!\,(r-1)!}\,(1+e^{-x})^{-r}\,(1+e^{x})^{-1-n+r}$$

The line above labelled In[2] is the mathStatica input we enter into the computer, namely OrderStat[r, f]. The line labelled Out[2] is the output returned by mathStatica. Figure 1 below illustrates the solution.

[Figure 1: plot with curves labelled from r = 1 to r = 10]

Fig. 1.
PDF of the rth order statistic (Logistic), as r increases from 1 to 10, given a sample size of n = 10

In similar fashion, we can derive the joint pdf of two or more order statistics. Here is the joint pdf of X(r) and X(s), for r < s:

In[3]:= OrderStat[{r, s}, f]

Out[3]= $$\frac{e^{x_s}\,(1+e^{-x_r})^{-r}\,(1+e^{x_s})^{-1-n+s}\left(\dfrac{1}{1+e^{x_r}}-\dfrac{1}{1+e^{x_s}}\right)^{s-r}\Gamma(1+n)}{(e^{x_s}-e^{x_r})\,\Gamma(r)\,\Gamma(1+n-s)\,\Gamma(s-r)}$$

2. Non-identical distributions: pdf of min(X, Y, Z)

mathStatica v2 generalises the above order statistic functionality to non-identical distributions. This provides enormous flexibility. To illustrate, consider three completely different distributions defined over three different domains of support. In the following, f(x) is the pdf of an Exponential(λ) random variable, g(x) is the pdf of a standard Normal, and h(x) is the pdf of a Uniform(-1,1) random variable:

In[1]:= f = Exp[-x/λ]/λ;   domain[f] = {x, 0, ∞} && {λ > 0};

In[2]:= g = Exp[-x^2/2]/Sqrt[2π];   domain[g] = {x, -∞, ∞};

In[3]:= h = 1/2;   domain[h] = {x, -1, 1};

mathStatica v2 provides a general methodology for working with such problems. For example, the input:

OrderStat[2, {f, g, h}, {5, 6, 4}]

will return the exact pdf of the second order statistic given a random sample in which: {5 values are drawn from the Exponential, 6 values are drawn from the Normal, and 4 from the Uniform}. The same algorithm provides a neat way to solve problems such as finding the pdf of min(X, Y, Z), when X, Y and Z have completely different distributions and different domains of support.
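The link between min(X, Y, Z) and the first order statistic can be checked numerically. The sketch below is not mathStatica's symbolic algorithm; it assumes X, Y and Z are independent and uses the standard identity that the pdf of the minimum is the sum over i of f_i(x) multiplied by the survival functions 1 - F_j(x) of the other variables. The value λ = 1 and all helper names (`min_pdf`, `midpoint_integral`, `PARENTS`) are assumptions made for illustration only.

```python
import math

LAM = 1.0  # assumed value of the Exponential parameter λ (illustration only)

# Each parent distribution is described by its pdf and its survival function 1 - F(x).
def exp_pdf(x):
    return math.exp(-x / LAM) / LAM if x > 0 else 0.0

def exp_sf(x):
    return math.exp(-x / LAM) if x > 0 else 1.0

def norm_pdf(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def norm_sf(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def unif_pdf(x):
    return 0.5 if -1.0 < x < 1.0 else 0.0

def unif_sf(x):
    if x <= -1.0:
        return 1.0
    if x >= 1.0:
        return 0.0
    return (1.0 - x) / 2.0

PARENTS = [(exp_pdf, exp_sf), (norm_pdf, norm_sf), (unif_pdf, unif_sf)]

def min_pdf(x):
    """pdf of min(X, Y, Z): sum_i f_i(x) * prod_{j != i} (1 - F_j(x))."""
    total = 0.0
    for i, (pdf_i, _) in enumerate(PARENTS):
        term = pdf_i(x)
        for j, (_, sf_j) in enumerate(PARENTS):
            if j != i:
                term *= sf_j(x)
        total += term
    return total

def midpoint_integral(fn, a, b, steps=20000):
    """Composite midpoint rule on [a, b]."""
    width = (b - a) / steps
    return width * sum(fn(a + (k + 0.5) * width) for k in range(steps))

# Integrate piece by piece so the kinks at -1 and 0 sit on panel boundaries.
# The minimum can never exceed 1, and the mass below -10 is negligible.
pieces = [(-10.0, -1.0), (-1.0, 0.0), (0.0, 1.0)]
total_mass = sum(midpoint_integral(min_pdf, a, b) for a, b in pieces)
print(total_mass)
```

The printed total should be very close to 1, and for x ≤ -1 `min_pdf` reduces to the standard Normal pdf because the other two variables cannot fall below that point.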
Thus, if X ~ Exponential(λ), Y ~ Normal(0,1) and Z ~ Uniform(-1,1), the pdf of min(X, Y, Z) can be simply obtained as the pdf of the first order statistic:

In[4]:= OrderStat[1, {f, g, h}]

Out[4]= $$\begin{cases}
\dfrac{e^{-x^{2}/2}}{\sqrt{2\pi}} & x \le -1 \\[2ex]
\dfrac{e^{-x^{2}/2}\,(1-x)}{2\sqrt{2\pi}} + \dfrac{1}{4}\operatorname{Erfc}\!\left[\dfrac{x}{\sqrt{2}}\right] & -1 < x \le 0 \\[2ex]
\dfrac{e^{-\frac{1}{2}x\left(x+\frac{2}{\lambda}\right)}\sqrt{\dfrac{2}{\pi}}\,(1-x)\,\lambda + e^{-x/\lambda}\,(1-x+\lambda)\operatorname{Erfc}\!\left[\dfrac{x}{\sqrt{2}}\right]}{4\lambda} & 0 < x < 1
\end{cases}$$

Figure 2 plots the solution (here with λ = 1):

[Figure 2: plot of the solution against x]

Fig. 2. PDF of min(X, Y, Z) when X ~ Exponential(1), Y ~ Normal(0,1) and Z ~ Uniform(-1,1)

3. Piecewise products of random variables

mathStatica v2 provides new functionality for deriving products and ratios of two or more independent continuous random variables. This is part of a more general support for piecewise continuous functions. It extends the excellent work of Glen, Leemis and Drew (2004) to provide support for distributions with arbitrary symbolic parameters and to domains of support that depend on arbitrary symbolic parameters. To illustrate, let random variable X ~ Pareto(c, b) with pdf f(x), and let random variable Y have a piecewise continuous pdf g(y) consisting of a rectangular component defined over (α, β) for α < β < 0, and a ‘triangular’ component over (½, 1).
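As background, and without implying that this is how mathStatica proceeds internally, the textbook formula for the pdf of a product V = X Y of independent continuous random variables, with X supported on x > b > 0, is given below. The symbol h(v) for the pdf of V is notation introduced here for illustration.

$$h(v) \;=\; \int_{b}^{\infty} f(x)\, g\!\left(\frac{v}{x}\right) \frac{1}{x}\, dx$$

Since g is non-zero only when its argument lies in (α, β) or (½, 1), the integrand is non-zero only for x between v/α and v/β when v < 0, and for x between v and 2v when v > 0, in each case intersected with x > b. The effective limits of integration therefore change with v, which is why the pdf of V comes out as a piecewise function of v.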
We enter the pdfs f(x) and g(y) into the computer as:

In[1]:= f = c b^c x^(-(c+1));   domain[f] = {x, b, ∞} && {c > 0, b > 0};

In[2]:= g = $$\begin{cases}
\dfrac{1}{2(\beta-\alpha)} & \alpha < y < \beta \\[2ex]
4(1-y) & \tfrac{1}{2} < y < 1
\end{cases}$$ ;   domain[g] = {y, -∞, ∞} && {α < β < 0};

We seek the pdf of V = X Y, and mathStatica’s new TransformProduct function provides the exact solution:

In[3]:= TransformProduct[v, {f, g}]

Out[3]= $$\begin{cases}
-\dfrac{c\left(\dfrac{b\alpha}{v}\right)^{c}\left(\alpha-\beta\left(\dfrac{\beta}{\alpha}\right)^{c}\right)}{2(c+1)\,v\,(\alpha-\beta)} & v < b\alpha \\[3ex]
\dfrac{c\left(b\beta\left(\dfrac{b\beta}{v}\right)^{c}-v\right)}{2b(c+1)\,v\,(\alpha-\beta)} & b\alpha < v < b\beta \\[3ex]
\dfrac{2^{-c}\,v^{-c-1}\left(-2^{c+2}c\,\bigl(-b(c+2)+cv+v\bigr)\,v^{c+1}-b^{c+2}c\,(c+3)\right)}{b^{2}(c+1)(c+2)} & \tfrac{b}{2} < v < b \\[3ex]
\dfrac{2^{-c}\,b^{c}\left(-c+2^{c+2}-3\right)c\,v^{-c-1}}{c^{2}+3c+2} & v > b
\end{cases}$$

Figure 3 plots the solution, here with c = 2, b = 3, α = -3/2, and β = -½.

[Figure 3: plot of the pdf h against v]

Fig. 3. PDF of V = X Y

The implementation of this algorithm is completely general, exact and symbolic, so that it can be easily applied to essentially any such problem.

References

Glen, A. G., Leemis, L. M. and Drew, J. H. (2004), Computing the distribution of the product of two continuous random variables, Computational Statistics & Data Analysis, 44, 451–464.

mathStatica (2002-2006), http://www.mathStatica.com

Rose, C. and Smith, M.D. (2002), Mathematical Statistics with Mathematica, Springer-Verlag: New York.

Rose, C. and Smith, M.D. (2005), Computational Order Statistics, The Mathematica Journal, 9(4), 790–802.