Likert Scales and Multinomial Random Variables
Updated 4/23/14 (See p.7)

In this lecture we consider a generic random variable, X, with sample space S_X = {1, 2, ..., k}. Denote its pdf as

  f_X(x) = Pr[X = x] = p_X(x).   (1)

Then, for the iid associated data collection variables X = (X_1, X_2, ..., X_n), we have

  f_X(x) = ∏_{j=1}^{n} f_{X_j}(x_j) = ∏_{j=1}^{n} p_X(x_j).   (2)

We now define the random variables Y = (Y_1, Y_2, ..., Y_k), where

  Y_x = ∑_{j=1}^{n} I_x(X_j),

and where, here, the generic random variable I_x(X) is the indicator function associated with the value x. In other words, it entails two possible events: (i) [I_x(X) = 1] = [X = x], and (ii) [I_x(X) = 0] = [X ≠ x]. Clearly, then, I_x(X) ~ Bernoulli[p_X(x)]. From this observation it follows that Y_x ~ Binomial[n, p_X(x)].

We now address the joint pdf of Y = [Y_1, Y_2, ..., Y_k]^tr. To begin, notice that Y_x = "The act of recording how many of the elements of X = (X_1, X_2, ..., X_n) take on the value x." Clearly, we have the condition ∑_{x=1}^{k} Y_x = n. It should be clear from this condition that the elements of Y = (Y_1, Y_2, ..., Y_k) are not mutually independent. With a nontrivial amount of work [c.f. http://en.wikipedia.org/wiki/Multinomial_theorem ], it can be shown that the pdf of Y is

  f_Y(y) = [n! / (y_1! y_2! ... y_k!)] p_X(x_1)^{y_1} p_X(x_2)^{y_2} ... p_X(x_k)^{y_k}.   (3)

The sample space for Y = (Y_1, Y_2, ..., Y_k)^tr is

  S_Y = { (y_1, y_2, ..., y_k) | y_x ∈ {0, 1, ..., n} and ∑_{x=1}^{k} y_x = n }.

The distribution (3) is called the multinomial distribution [c.f. http://en.wikipedia.org/wiki/Multinomial_distribution ].

The mean for Y: Since each Y_x ~ Binomial[n, p_X(x)], we immediately have

  μ_Y = E(Y) = n [p_1, p_2, ..., p_{k-1}, p_k]^tr.   (4a)

The covariance for Y: Again, since each Y_x ~ Binomial[n, p_X(x)], we immediately have

  Var(Y_x) = n p_x (1 − p_x).   (4b)

The quantity Cov(Y_{x1}, Y_{x2}) is a bit 'trickier' since, as noted above, these random variables are correlated. With some work, it can be shown that

  Cov(Y_{x1}, Y_{x2}) = −n p_{x1} p_{x2}.   (4c)
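As a numerical sanity check on (3), the following sketch sums the multinomial pmf over the sample space S_Y for a small case. Python is used here purely for illustration (the lecture's own code, in the Appendix, is Matlab), and the values k = 3, n = 4, p = (0.4, 0.2, 0.4) are illustrative choices, with p borrowed from the simulation later in these notes.

```python
from itertools import product
from math import factorial, prod

def multinomial_pmf(y, p):
    """Joint pmf (3): f_Y(y) = n!/(y1!...yk!) * p1^y1 * ... * pk^yk."""
    n = sum(y)
    coef = factorial(n)
    for yx in y:
        coef //= factorial(yx)   # multinomial coefficient is an integer
    return coef * prod(px ** yx for px, yx in zip(p, y))

# Illustrative case: k = 3 categories, n = 4 respondents.
p = (0.4, 0.2, 0.4)
n = 4
# Sample space S_Y: all count vectors (y1, y2, y3) summing to n.
S_Y = [y for y in product(range(n + 1), repeat=3) if sum(y) == n]
total = sum(multinomial_pmf(y, p) for y in S_Y)
print(len(S_Y), abs(total - 1.0) < 1e-9)   # the pmf sums to 1 over S_Y
```

The check that the probabilities sum to one over S_Y is exactly the multinomial theorem applied to (p_1 + p_2 + p_3)^n = 1.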
To obtain the expression for Σ_Y, note that the off-diagonal elements of this k×k matrix are exactly the off-diagonal elements of the matrix −(1/n) μ_Y μ_Y^tr. The xth diagonal element of this latter matrix is −(1/n)(n p_x)^2 = −n p_x^2 = −(1/n) μ_{Yx}^2, whereas the xth diagonal element of the matrix Σ_Y is (4b). We can write this element as

  Var(Y_x) = n p_x (1 − p_x) = n p_x − n p_x^2 = μ_{Yx} − (1/n) μ_{Yx}^2.

Hence, if we define the k×k matrix M_Y = diag{μ_Y}, we arrive at

  Σ_Y = M_Y − (1/n) μ_Y μ_Y^tr.   (5)

Equation (5) is a concise expression. Furthermore, it highlights the fact that if any of the probabilities {p_x}_{x=1}^{k} is equal to zero, then the inverse of (5) does not exist. This is important in the context of the central limit theorem (CLT), which states that, for sufficiently large sample size n, μ̂_Y ~ N(μ_Y, (1/n) Σ_Y). Since the pdf of this estimator involves the inverse of (5), one must then acknowledge the need to take a pseudo-inverse of (5).

From (4b) and (4c) we can obtain the correlation coefficient

  ρ_{12} = Corr(Y_{x1}, Y_{x2}) = −√[ p_{x1} p_{x2} / ((1 − p_{x1})(1 − p_{x2})) ].   (6)

Connection to the Central Limit Theorem (CLT): Once again, recall that, marginally, for each x ∈ {1, 2, ..., k}, we have Y_x ~ Binomial(n, p_x). Hence,

  p̂_x = Y_x / n = (1/n) ∑_{m=1}^{n} I_x(X_m) ~ (1/n) Binomial[n, p_x].   (7)

Hence, from the CLT, we know that for sufficiently large n:

  Z_x = (p̂_x − p_x) / σ_{p̂_x} ~ N(0, 1), where σ²_{p̂_x} = p_x(1 − p_x) / n.   (8)

Example 1. Oftentimes a survey question answer will include the three options {disagree, no opinion, agree}. In a survey of n persons, we will let Y_1 = "The act of recording the number of persons who disagree," Y_2 = "The act of recording the number of persons who agree," and Y_3 = "The act of recording the number of persons who have no opinion."

The special case where p_1 = p_2: We chose this case to address a situation that often occurs, namely, one where it appears that the population is evenly split on a given issue. For notational convenience, let p_1 = p_2 = p. This will be our null hypothesis, H_0.
It follows that

  μ_Y = n [p, p, p_3]^tr   (9a)

and

  Σ_Y = M_Y − (1/n) μ_Y μ_Y^tr
      = n [  p − p²     −p²       −p p_3
             −p²        p − p²    −p p_3
             −p p_3     −p p_3    p_3 − p_3² ],  or

  Σ_Y = n p [  1 − p    −p      −p_3
               −p       1 − p   −p_3
               −p_3     −p_3    (p_3 − p_3²)/p ].   (9b)

Now recall that the 3-D random variable Y = [Y_1, Y_2, Y_3]^tr can be written as Y = [Y_1, Y_2, n − Y_1 − Y_2]^tr. And so, for convenience in the following, we will redefine Y = [Y_1, Y_2]^tr. We then have

  μ_Y = n p [1, 1]^tr  and  Σ_Y = n p [ 1 − p    −p
                                        −p       1 − p ].   (10)

Letting p̂ = (1/n) Y, it follows that, under H_0,

  E(p̂) = p [1, 1]^tr  and  Cov(p̂) = (p/n) [ 1 − p    −p
                                              −p       1 − p ].   (11)

Now consider the test statistic φ̂ = p̂_1 − p̂_2. Clearly, E(φ̂) = p_1 − p_2, which under H_0 is zero. To compute the variance of φ̂, recall that Var(φ̂) = Var(p̂_1 − p̂_2) = Var(p̂_1) + Var(p̂_2) − 2 Cov(p̂_1, p̂_2). Specifically, under H_0 we have

  σ²_φ̂ = p(1 − p)/n + p(1 − p)/n + 2p²/n = 2p/n.   (12)

Finally, for sufficiently large n we obtain, under H_0,

  φ̂ ~ N[0, 2p/n].   (13)

Hence, for this situation hypothesis testing becomes very straightforward.

Numerical Values: Suppose now that our null hypothesis is H_0: p_1 = p_2 = 0.40, and that our alternative hypothesis is H_1: p_1 ≠ p_2. Let the false alarm probability be α = 0.05, and the sample size be n = 1000. Then under H_0, φ̂ ~ N(0, 0.0008). Equivalently, Z = φ̂ / 0.0283 ~ N(0, 1). For α = 0.05 our threshold is z_th = 1.96. Then we will announce H_1 if |φ̂| > (1.96)(0.0283) = 0.0555.

Remark: Recall that when conducting the above test in the situation where p̂_1 and p̂_2 are independent, the variance of φ̂ is p(1 − p)/n + p(1 − p)/n = 2p(1 − p)/n. In the present case, however, we have

  ρ_{13} = Corr(Y_{x1}, Y_{x3}) = Corr(p̂_{x1}, p̂_{x3}) = −√[ p_X(x_1) p_X(x_3) / ((1 − p_X(x_1))(1 − p_X(x_3))) ] = −p/(1 − p) = −0.4/0.6 ≈ −0.67.   (14)

This relatively high level of correlation will increase the standard deviation of φ̂ by a factor of 1/√(1 − p) = 1/√0.6 ≈ 1.29.

Simulations: For related ways (e.g. using Excel) to simulate a multinomial random variable, see http://en.wikipedia.org/wiki/Multinomial_distribution .
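The numerical test above can also be reproduced in a few lines. The sketch below (Python, for illustration only; the helper name announce_H1 is made up for this example) recomputes σ_φ̂ and the decision threshold from (12)-(13).

```python
from math import sqrt

# Two-sided test of H0: p1 = p2 = p, using phi_hat = p1_hat - p2_hat.
p, n, z_th = 0.40, 1000, 1.96    # z_th = z_{0.975} for alpha = 0.05

sigma_phi = sqrt(2 * p / n)      # (12): Var = 2p/n under H0 (NOT 2p(1-p)/n)
threshold = z_th * sigma_phi     # announce H1 when |phi_hat| exceeds this
print(round(sigma_phi, 4), round(threshold, 4))

def announce_H1(phi_hat, thr=threshold):
    """Reject H0 (announce H1) when |phi_hat| exceeds the threshold."""
    return abs(phi_hat) > thr

print(announce_H1(0.07), announce_H1(0.02))   # True False
```

Note that using the independence formula 2p(1 − p)/n here would understate the threshold, which is the point of the Remark above.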
Here, we will use the Matlab code 'Likert1D_3.m' given in the Appendix. For 10^3 simulations of the code, we obtain:

mu_phat =
    0.3980    0.2013    0.4008

cov_phat =
    0.0048   -0.0015   -0.0034
   -0.0015    0.0032   -0.0017
   -0.0034   -0.0017    0.0051

corrcoef(phat) =
    1.0000   -0.3721   -0.6810
   -0.3721    1.0000   -0.4263
   -0.6810   -0.4263    1.0000

Notice that the estimated correlation coefficient value -0.6810 is close to the theoretical value -0.67 in (14). □

Two-Dimensional Multinomial Random Variables

Example 2. When negotiating with potential clients, negotiations often focus on cost. It is relatively easy to measure a potential client's view of the cost of a project. In the case of building architecture, a major factor in cost is the longevity of the design. Suppose that you work for an architecture firm, and that you are considering carrying out a survey that would allow you to predict a potential client's perceived longevity of a design, based on its cost. Prior to conducting the survey, you decide to run some simulations under various scenarios, in order to determine the survey sample size that would be needed to obtain trustworthy predictions.

Define the random variables:

"X = The act of recording a client's perceived cost of a design," with events [X=1] = "cheap," [X=2] = "moderate," and [X=3] = "expensive,"

and

"Y = The act of recording the longevity of a design," with events [Y=1] = "short," [Y=2] = "medium," and [Y=3] = "long."

The sample space for (X, Y) is S_(X,Y) = { (x, y) | x, y ∈ {1,2,3} }. This sample space is discrete. It includes 9 elements, with joint probabilities { p_xy = Pr[X = x ∩ Y = y] | x, y ∈ {1,2,3} }. Now, in order to construct a given simulation, you could specify the numerical values of 8 of these 9 joint probabilities. However, it is probably more natural to specify 8 other probabilities that are related to these. In this example, we will specify 6 conditional probabilities and 2 marginal probabilities.
Consider the 3×3 matrix of conditional probabilities

  Pc = [ p_{1|1}   p_{2|1}   p_{3|1}
         p_{1|2}   p_{2|2}   p_{3|2}
         p_{1|3}   p_{2|3}   p_{3|3} ],  where p_{y|x} = Pr[Y = y | X = x].   (15)

Even though (15) includes 9 parameters, we only need to specify 6 of them. This is because

  Pr[Y ∈ {1,2,3} | X = x] = p_{1|x} + p_{2|x} + p_{3|x} = 1.   (16)

Recall that

  p_{y|x} = Pr[Y = y | X = x] = Pr[X = x ∩ Y = y] / Pr[X = x] = p_{xy} / p_X(x).   (17)

Define the joint probability matrix

  Pj = [ p_{11}   p_{12}   p_{13}
         p_{21}   p_{22}   p_{23}
         p_{31}   p_{32}   p_{33} ].   (18)

To see the relationship between (15) and (18), define the probability matrix

  PX = [ p_X(1)   0        0
         0        p_X(2)   0
         0        0        p_X(3) ].   (19)

Note that even though (19) includes 3 parameters, there are only two that need to be specified. And so, there are a total of 8 parameters that need to be specified in relation to (15) and (19). Having specified the 8 parameters, one could then program the simulations to use them directly. Alternatively, one could find the corresponding joint probabilities via the relation

  Pj = PX Pc ;  equivalently, Pc = PX^{-1} Pj,   (20)

and write a program that generates simulations of (X, Y) using these joint probabilities. In this example we will use this approach. The Matlab code written for this purpose is 'xyLikert.m', given in the Appendix.

For 10^4 simulations this code gave the following estimate of Pc:

mu_Pygxhat =
    0.4993    0.4011    0.0996
    0.2008    0.5004    0.2988
    0.0994    0.3006    0.6000

It also gave the true value of Pc:

Pygx =
    0.5000    0.4000    0.1000
    0.2000    0.5000    0.3000
    0.1000    0.3000    0.6000

The closeness of the above matrices suggests that the code is correct. The scatter plot below shows the relation between the estimators p̂_{1|1} = Pr[Y = 1 | X = 1] and p̂_{2|1} = Pr[Y = 2 | X = 1].

[Figure: Scatter plot of p̂_{1|1} (x-axis) vs. p̂_{2|1} (y-axis).]

Converting a Bivariate Likert Random Variable to a Univariate Random Variable

The manner in which the code in the Appendix computed the conditional probabilities entailed a mapping of the 9 elements in S_(X,Y) to the numbers 1 through 9.
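Relation (20) is easy to verify numerically. The following Python sketch (illustrative only; the lecture's own implementation is the Matlab code 'xyLikert.m') builds Pj from the same Pc and p_X values used in that code, and then recovers Pc back.

```python
import numpy as np

# Relation (20): Pj = PX Pc scales row x of Pc by p_X(x); Pc = PX^{-1} Pj.
# Pygx and px below are the values used in 'xyLikert.m'.
Pc = np.array([[.5, .4, .1],
               [.2, .5, .3],
               [.1, .3, .6]])    # row x holds p_{y|x}; rows sum to 1
px = np.array([.2, .5, .3])
PX = np.diag(px)

Pj = PX @ Pc                     # joint probabilities p_{xy}
print(Pj)
print(np.allclose(np.linalg.inv(PX) @ Pj, Pc))   # recover Pc: True
```

The 9 entries of Pj sum to 1, as a joint pmf must, since each row of Pc sums to 1 and the p_X(x) sum to 1.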
This suggests defining the random variable W, with Likert scale {1, 2, ..., 9} and count vector W = (W_1, W_2, ..., W_9)^tr, having the array of singleton probabilities

  { p_1 = p_{11}, p_2 = p_{12}, ..., p_8 = p_{32}, p_9 = p_{33} },

and such that ∑_{k=1}^{9} W_k = n. Marginally, W_{k(i,j)} ~ Binomial(n, p_{k(i,j)}), where k(i, j) = 3(i − 1) + j maps cell (i, j) of the joint probability matrix to the 1-9 scale. Clearly, W is a univariate Likert random variable. And so we can take advantage of the properties of the multinomial pdf for the 1-9 scale variable. In particular, we now have

  μ_W = E(W) = n [p_1, p_2, ..., p_8, p_9]^tr  and  Σ_W = M_W − (1/n) μ_W μ_W^tr, where M_W = diag{μ_W}.   (21)

Hence, we are now in a position to carry out a variety of analyses, such as those addressed in Example 1. It should be noted, however, that we no longer have the natural 2-D format needed to construct a linear model. But we can still proceed as in the bivariate case by using equivalent events. For example, consider the equivalent events:

  (i)  [X_1 = x_1] = [W_1 + W_2 + W_3 = x_1] = {(w_1, w_2, w_3) | w_1 + w_2 + w_3 = x_1}
  (ii) [Y_2 = y_2] = [W_2 + W_5 + W_8 = y_2] = {(w_2, w_5, w_8) | w_2 + w_5 + w_8 = y_2}

Define

  A = [ a_1 ] = [ 1 1 1 0 0 0 0 0 0 ]
      [ a_2 ]   [ 0 1 0 0 1 0 0 1 0 ].

Then

  [ X_1 ] = [ a_1 W ] = A W.
  [ Y_2 ]   [ a_2 W ]

Hence, we have

  [X_1, Y_2]^tr = A W, with mean A μ_W and Cov([X_1, Y_2]^tr) = A Σ_W A^tr.   (22)

Clearly, the A matrix in (22) is unique to [X_1, Y_2]^tr. For this reason, it is better denoted as A_12. Because we now have closed-form expressions, (22), for the mean and covariance of [X_1, Y_2]^tr, simulations are no longer needed.

Example 2 continued.
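Because (22) is in closed form, the mean and covariance of [X_1, Y_2]^tr can be computed directly. The Python sketch below (illustrative, not part of the lecture's Matlab code) uses the Example 2 probabilities with N = 50 and reproduces the theoretical values quoted in the continuation of the example.

```python
import numpy as np

# Closed-form mean and covariance of [X1, Y2]^tr via (21)-(22),
# using the 9-cell probabilities of Example 2 (rows of Pxy, concatenated).
n = 50
pW = np.array([.10, .08, .02, .10, .25, .15, .03, .09, .18])

mu_W = n * pW                                    # (21)
Sigma_W = np.diag(mu_W) - np.outer(mu_W, mu_W) / n

A = np.array([[1, 1, 1, 0, 0, 0, 0, 0, 0],       # a1: X1 = W1+W2+W3
              [0, 1, 0, 0, 1, 0, 0, 1, 0]])      # a2: Y2 = W2+W5+W8

mu_X1Y2 = A @ mu_W                               # (22)
Cov_X1Y2 = A @ Sigma_W @ A.T
print(mu_X1Y2)    # [10. 21.]
print(Cov_X1Y2)   # [[ 8.   -0.2 ] [-0.2  12.18]]
```

The small off-diagonal term, Cov(X_1, Y_2) = n(p_{12} − p_X(1) p_Y(2)) = 50(0.08 − 0.2·0.42) = −0.2, is exactly what the simulation later struggles to estimate.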
Theoretical Quantities (for sample size N = 50):

Pxy =
    0.1000    0.0800    0.0200
    0.1000    0.2500    0.1500
    0.0300    0.0900    0.1800

fxy' =
    0.1000    0.0800    0.0200    0.1000    0.2500    0.1500    0.0300    0.0900    0.1800

muW' =
    5.0000    4.0000    1.0000    5.0000   12.5000    7.5000    1.5000    4.5000    9.0000

CovW =
    4.5000   -0.4000   -0.1000   -0.5000   -1.2500   -0.7500   -0.1500   -0.4500   -0.9000
   -0.4000    3.6800   -0.0800   -0.4000   -1.0000   -0.6000   -0.1200   -0.3600   -0.7200
   -0.1000   -0.0800    0.9800   -0.1000   -0.2500   -0.1500   -0.0300   -0.0900   -0.1800
   -0.5000   -0.4000   -0.1000    4.5000   -1.2500   -0.7500   -0.1500   -0.4500   -0.9000
   -1.2500   -1.0000   -0.2500   -1.2500    9.3750   -1.8750   -0.3750   -1.1250   -2.2500
   -0.7500   -0.6000   -0.1500   -0.7500   -1.8750    6.3750   -0.2250   -0.6750   -1.3500
   -0.1500   -0.1200   -0.0300   -0.1500   -0.3750   -0.2250    1.4550   -0.1350   -0.2700
   -0.4500   -0.3600   -0.0900   -0.4500   -1.1250   -0.6750   -0.1350    4.0950   -0.8100
   -0.9000   -0.7200   -0.1800   -0.9000   -2.2500   -1.3500   -0.2700   -0.8100    7.3800

muX1Y2' =
   10.0000   21.0000

CovX1Y2 =
    8.0000   -0.2000
   -0.2000   12.1800

Running 30,000 simulations for N = 50 sample size gives ('likertsim.m'):

muX1Y2 =
    10    21
muhatX1Y2 =
    9.9960   21.0194
CovX1Y2 =
    8.0000   -0.2000
   -0.2000   12.1800
CovhatX1Y2 =
    7.9915   -0.2241
   -0.2241   12.1827

Remark. Even for this simulation size, the covariance between X_1 and Y_2 is notably variable!

Appendix: Matlab Codes

% PROGRAM NAME: Likert1D_3.m
% Run NSIM simulations of a 1-D multinomial random variable
% with S = {1,2,3} for sample size N.
%--------------------------------------------
k = 3;   % This is so the code can be more easily modified for k > 3

% SPECIFY PROBABILITIES:
p(1) = .4; p(2) = .2;   % p(3) = 1 - p(1) - p(2) is implicit

% Compute CDF:
F = zeros(1,k-1);
F(1) = p(1);
for j = 2:k-1
    F(j) = F(j-1) + p(j);
end

% SPECIFY SAMPLE SIZE:
N = 50;
%--------------------------------------------
% GENERATE NSIM SIMULATIONS, each of size N:
NSIM = 10^3;
phat = zeros(NSIM,k);   % Initialize estimated proportions
Y = k*ones(N,NSIM);     % Initialize all responses to the value k
% NOTE: The array Y is only needed to illustrate the raw survey results.
%       If only phat is of concern, then it is better not to include Y, as it
%       can take a significant amount of memory.
for m = 1:NSIM
    bc = zeros(1,k-1);   % Initialize the first k-1 bin counts
    for n = 1:N
        u = rand(1,1);
        if u < F(1)
            Y(n,m) = 1;
            bc(1) = bc(1) + 1;
        elseif u < F(2)
            Y(n,m) = 2;
            bc(2) = bc(2) + 1;
        end
    end
    phat(m,:) = [bc(1)/N , bc(2)/N , 1-(bc(1)+bc(2))/N];
end
%============================================
% COMPUTE DISTRIBUTIONAL INFORMATION FOR phat:
mu_phat = mean(phat)
cov_phat = cov(phat)

% PROGRAM NAME: xyLikert.m
% This code simulates N measurements of (X,Y),
% where Sx = Sy = {1,2,3} = {low, medium, high}
% X = cost of a design
% Y = longevity of the design
%============================
clear all
% Specify matrix of conditional probabilities of Y=y given X=x:
Pygx = [ .5 .4 .1 ; .2 .5 .3 ; .1 .3 .6];   % Rows must sum to 1.0
% Specify marginal probabilities for X:
px = [.2 .5 .3];
Px = [px(1)*ones(1,3); px(2)*ones(1,3); px(3)*ones(1,3)];
% Compute joint probability matrix for (X,Y):
Pxy = Pygx.*Px;   % This is the 3x3 matrix of joint probabilities
%================================================
% SIMULATIONS:
% In order to simulate, the Uniform random number generator will be used.
% The following segment of code generates the breakpoints of the 9 intervals
% into which the interval [0,1] will be divided:
fxy = reshape(Pxy',9,1);   % Concatenate the rows of Pxy
Fxy = zeros(9,1);
Fxy(1) = fxy(1);
for k = 2:9
    Fxy(k) = Fxy(k-1) + fxy(k);   % Fxy(k) is the RIGHT end of the kth interval
end
%----------------------
nsim = 10^4;   % Number of simulations
N = 10^2;      % Sample size for each simulation
pygxhat_array = zeros(nsim,9);   % Array of pygx estimates
for nn = 1:nsim
    Bc = zeros(1,9);   % Initialize the counts of the 9 bins
    for n = 1:N
        % Inverse-CDF draw: kbin is the first k with Fxy(k) >= u
        [q,kbin] = max(ceil(Fxy - rand(1,1)));
        Bc(kbin) = Bc(kbin) + 1;
    end
    fxyhat = Bc/N;
    pxhat = [sum(fxyhat(1:3)) , sum(fxyhat(4:6)) , sum(fxyhat(7:9))];
    pxhatvec = [pxhat(1)*ones(1,3), pxhat(2)*ones(1,3), pxhat(3)*ones(1,3)];
    pygxhat_array(nn,:) = fxyhat./pxhatvec;
end
mu_pygxhat = mean(pygxhat_array);
mu_Pygxhat = reshape(mu_pygxhat,3,3)'
Pygx
pause
%======================================
% Investigation of the joint properties of 2 estimators:
p1g1 = pygxhat_array(:,1);   % estimates of p1|1
p2g1 = pygxhat_array(:,2);   % estimates of p2|1
figure(1)
plot(p1g1,p2g1,'*')
grid
xlabel('p1|1_hat')
ylabel('p2|1_hat')
title('Scatter Plot of p1|1_hat (x) vs. p2|1_hat (y)')
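A note for readers working outside Matlab: the one-line draw [q,kbin] = max(ceil(Fxy - rand(1,1))) selects the first bin whose CDF right endpoint exceeds the uniform draw. A Python sketch of the same inverse-CDF trick follows (the function name draw_bin is made up for this illustration; the fxy values are those of 'xyLikert.m').

```python
import random

def draw_bin(F, u=None):
    """Return the 1-based index of the first bin k with u < F[k-1]."""
    if u is None:
        u = random.random()
    for k, Fk in enumerate(F, start=1):
        if u < Fk:
            return k
    return len(F)   # guard against floating-point shortfall at the top end

# Right endpoints of the 9 intervals, cumulated from the joint probabilities.
fxy = [.10, .08, .02, .10, .25, .15, .03, .09, .18]
F = []
s = 0.0
for prob in fxy:
    s += prob
    F.append(s)

print(draw_bin(F, u=0.05), draw_bin(F, u=0.95))   # 1 9
```

A bisection search (e.g. Python's bisect module) would do the same job in O(log k), but for k = 9 the linear scan matches the Matlab one-liner most directly.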