Likert Scales and Multinomial Random Variables
Updated 4/23/14 (See p.7)
In this lecture we consider a generic random variable, $X$, with sample space $S_X = \{1, 2, \dots, k\}$. Denote its pdf as:

$$f_X(x) = \Pr[X = x] = p_X(x). \qquad (1)$$
Then for iid associated data collection variables $\mathbf{X} = (X_1, X_2, \dots, X_n)$, we have

$$f_{\mathbf{X}}(\mathbf{x}) = \prod_{j=1}^{n} f_{X_j}(x_j) = \prod_{j=1}^{n} p_X(x_j). \qquad (2)$$

We now define the random variables $\mathbf{Y} = (Y_1, Y_2, \dots, Y_k)$, where $Y_x = \sum_{j=1}^{n} I_x(X_j)$, and where, here, the generic random variable $I_x(X)$ is the indicator function associated with the value $x$. In other words, it entails two possible events: (i) $[I_x(X) = 1] = [X = x]$, and (ii) $[I_x(X) = 0] = [X \neq x]$. Clearly then, $I_x(X) \sim \mathrm{Ber}[p_X(x)]$. From this observation it follows that $Y_x \sim \mathrm{Binomial}[n, p_X(x)]$.
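For instance, the following Matlab fragment (a minimal sketch; randsample requires the Statistics Toolbox) builds one such count directly from the indicator events:

% Sketch: the count Y_x as a sum of indicators I_x(X_j).
X = randsample(1:3, 50, true, [.4 .2 .4]); % n = 50 iid draws of X (Statistics Toolbox)
Y2 = sum(X == 2)                           % Y_2 = number of X_j equal to 2; ~ Binomial(50, 0.2)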
We now address the joint pdf of $\mathbf{Y} = [Y_1, Y_2, \dots, Y_k]^{tr}$. To begin, notice that $Y_x =$ "The act of recording how many of the elements of $\mathbf{X} = (X_1, X_2, \dots, X_n)$ take on the value $x$". Clearly, we have the condition $\sum_{x=1}^{k} Y_x = n$. It should be clear from this condition that the elements of $\mathbf{Y} = (Y_1, Y_2, \dots, Y_k)$ are not mutually independent. With a nontrivial amount of work [c.f. http://en.wikipedia.org/wiki/Multinomial_theorem ], it can be shown that the pdf of $\mathbf{Y}$ is:
$$f_{\mathbf{Y}}(\mathbf{y}) = \frac{n!}{y_1!\, y_2! \cdots y_k!}\; p_X(x_1)^{y_1}\, p_X(x_2)^{y_2} \cdots p_X(x_k)^{y_k}. \qquad (3)$$

The sample space for $\mathbf{Y} = (Y_1, Y_2, \dots, Y_k)^{tr}$ is $S_{\mathbf{Y}} = \{(y_1, y_2, \dots, y_k) \mid y_j \in \{0, \dots, n\} \ \& \ \sum_{j=1}^{k} y_j = n\}$.
The distribution (3) is called the multinomial distribution. [c.f. http://en.wikipedia.org/wiki/Multinomial_distribution ]
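For numerical work it helps to evaluate (3) on the log scale, since the factorials overflow even for moderate $n$. The following is a minimal sketch (the function name is ours; gammaln is a built-in Matlab function):

function f = multinom_pdf(y, p)
% MULTINOM_PDF Evaluate the multinomial pdf (3) at count vector y
% for cell probabilities p. (Illustrative helper, not one of the lecture codes.)
% Uses gammaln to avoid overflow in the factorials.
nz = y > 0;  % skip empty cells so that 0*log(0) never occurs
logf = gammaln(sum(y)+1) - sum(gammaln(y+1)) + sum(y(nz).*log(p(nz)));
f = exp(logf);
% Example: multinom_pdf([20 10 20], [.4 .2 .4])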
The mean for Y:
Since each $Y_x \sim \mathrm{Binomial}[n, p_X(x)]$, we immediately have

$$\boldsymbol{\mu}_{\mathbf{Y}} = E(\mathbf{Y}) = n\,[\,p_1 \ \ p_2 \ \cdots \ p_{k-1} \ \ p_k\,]^{tr}. \qquad (4a)$$
The covariance for Y:
Again, since each $Y_x \sim \mathrm{Binomial}[n, p_X(x)]$, we immediately have

$$\mathrm{Var}(Y_x) = n\,p_x(1 - p_x). \qquad (4b)$$
The $\mathrm{Cov}(Y_{x_1}, Y_{x_2})$ quantity is a bit 'trickier', since, as noted above, these random variables are correlated. With some work, it can be shown that

$$\mathrm{Cov}(Y_{x_1}, Y_{x_2}) = -\,n\,p_{x_1} p_{x_2}. \qquad (4c)$$
To obtain the expression for $\boldsymbol{\Sigma}_{\mathbf{Y}}$, note that the off-diagonal elements of this $k \times k$ matrix are exactly the off-diagonal elements of the matrix $-(1/n)\,\boldsymbol{\mu}_{\mathbf{Y}}\boldsymbol{\mu}_{\mathbf{Y}}^{tr}$. The $x$th diagonal element of this latter matrix is $-(1/n)(np_x)^2 = -(1/n)\mu_{Y_x}^2$; whereas the $x$th diagonal element of the matrix $\boldsymbol{\Sigma}_{\mathbf{Y}}$ is (4b). We can write this element as: $\mathrm{Var}(Y_x) = n p_x(1-p_x) = np_x - np_x^2 = \mu_{Y_x} - (1/n)\mu_{Y_x}^2$. Hence, if we define the $k \times k$ matrix $\mathbf{M}_{\mathbf{Y}} = \mathrm{diag}\{\boldsymbol{\mu}_{\mathbf{Y}}\}$, we arrive at:

$$\boldsymbol{\Sigma}_{\mathbf{Y}} = \mathbf{M}_{\mathbf{Y}} - (1/n)\,\boldsymbol{\mu}_{\mathbf{Y}}\boldsymbol{\mu}_{\mathbf{Y}}^{tr}. \qquad (5)$$
Equation (5) is a concise expression. Furthermore, it highlights the fact that if any of the probabilities $\{p_x\}_{x=1}^{k}$ are equal to zero, then the inverse of (5) does not exist. (In fact, (5) is always singular, since its rows sum to zero: $\boldsymbol{\Sigma}_{\mathbf{Y}}\mathbf{1} = \boldsymbol{\mu}_{\mathbf{Y}} - (1/n)\boldsymbol{\mu}_{\mathbf{Y}}(\boldsymbol{\mu}_{\mathbf{Y}}^{tr}\mathbf{1}) = \boldsymbol{\mu}_{\mathbf{Y}} - \boldsymbol{\mu}_{\mathbf{Y}} = \mathbf{0}$.) This is important in the context of the central limit theorem (CLT), which states that for sufficiently large sample size, $n$, $\hat{\boldsymbol{\mu}}_{\mathbf{Y}} \sim N(\boldsymbol{\mu}_{\mathbf{Y}}, (1/n)\boldsymbol{\Sigma}_{\mathbf{Y}})$. Since the pdf of this estimator involves the inverse of (5), one must then acknowledge the need to take a pseudo-inverse of (5).
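As a concrete illustration of (4a), (5), and the singularity just noted, consider the following Matlab sketch (the probabilities are illustrative):

p = [0.4 0.2 0.4]; n = 50;            % example cell probabilities and sample size
muY  = n*p(:);                        % mean vector (4a)
SigY = diag(muY) - (1/n)*(muY*muY');  % covariance matrix (5)
rank(SigY)                            % returns k-1 = 2: (5) is singular, as noted
Spinv = pinv(SigY);                   % pseudo-inverse, as discussed above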
From (4b) and (4c), we can obtain the correlation coefficient

$$\rho_{12} = \mathrm{Corr}(Y_{x_1}, Y_{x_2}) = -\sqrt{\frac{p_{x_1} p_{x_2}}{(1 - p_{x_1})(1 - p_{x_2})}}\,. \qquad (6)$$
Connection to the Central Limit Theorem (CLT):
Once again, recall that, marginally, for each $x \in \{1, 2, \dots, k\}$, we have $Y_x \sim \mathrm{Binomial}(n, p_x)$. Hence,

$$\hat{p}_x = Y_x/n = \frac{1}{n}\sum_{m=1}^{n} X_x^{(m)} = \bar{X}_x \sim (1/n)\,\mathrm{Binomial}[n, p_x]. \qquad (7)$$
Hence, from the CLT, we know that for sufficiently large $n$:

$$Z_x = \frac{\hat{p}_x - p_x}{\sigma_{\hat{p}_x}} \sim N(0,1), \quad \text{where } \sigma_{\hat{p}_x}^2 = \frac{p_x(1 - p_x)}{n}. \qquad (8)$$
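One can check (8) quickly by simulation. A minimal sketch (binornd requires the Statistics Toolbox):

p = 0.4; n = 1000;
phat = binornd(n, p, 1e4, 1)/n;       % 10^4 simulated sample proportions
z = (phat - p)/sqrt(p*(1-p)/n);       % standardized as in (8)
[mean(z) var(z)]                      % should be close to [0 1]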
Example 1. Oftentimes a survey question will include the three answer options: {disagree, no opinion, agree}. In a survey of $n$ persons, we will let $Y_1$ = "The act of recording the number of persons who disagree", $Y_2$ = "The act of recording the number of persons who agree", and $Y_3$ = "The act of recording the number of persons who have no opinion".
The special case where $p_1 = p_2$:
We chose this case to address a situation that often occurs; namely, where it appears that the population is evenly split on a given issue. For notational convenience, let $p_1 = p_2 = p$. This will be our null hypothesis, $H_0$. It follows that:
$$\boldsymbol{\mu}_{\mathbf{Y}} = n\,[\,p \ \ p \ \ p_3\,]^{tr} \qquad (9a)$$

$$\boldsymbol{\Sigma}_{\mathbf{Y}} = \mathbf{M}_{\mathbf{Y}} - (1/n)\,\boldsymbol{\mu}_{\mathbf{Y}}\boldsymbol{\mu}_{\mathbf{Y}}^{tr} = n \begin{bmatrix} p & 0 & 0 \\ 0 & p & 0 \\ 0 & 0 & p_3 \end{bmatrix} - n \begin{bmatrix} p^2 & p^2 & p_3 p \\ p^2 & p^2 & p_3 p \\ p_3 p & p_3 p & p_3^2 \end{bmatrix} = n \begin{bmatrix} p - p^2 & -p^2 & -p_3 p \\ -p^2 & p - p^2 & -p_3 p \\ -p_3 p & -p_3 p & p_3 - p_3^2 \end{bmatrix}, \text{ or}$$

$$\boldsymbol{\Sigma}_{\mathbf{Y}} = n p \begin{bmatrix} 1-p & -p & -p_3 \\ -p & 1-p & -p_3 \\ -p_3 & -p_3 & (p_3 - p_3^2)/p \end{bmatrix}. \qquad (9b)$$
Now recall that the 3-D random variable $\mathbf{Y} = [Y_1, Y_2, Y_3]^{tr}$ can be written as $\mathbf{Y} = [Y_1, Y_2, n - Y_1 - Y_2]^{tr}$. And so for convenience in the following, we will redefine $\mathbf{Y} = [Y_1, Y_2]^{tr}$. We then have

$$\boldsymbol{\mu}_{\mathbf{Y}} = n p\,[\,1 \ \ 1\,]^{tr} \quad \text{and} \quad \boldsymbol{\Sigma}_{\mathbf{Y}} = n p \begin{bmatrix} 1-p & -p \\ -p & 1-p \end{bmatrix}. \qquad (10)$$
Letting $\hat{\mathbf{p}} = (1/n)\mathbf{Y}$, it follows that, under $H_0$:

$$E(\hat{\mathbf{p}}) = p\,[\,1 \ \ 1\,]^{tr} \quad \text{and} \quad \mathrm{Cov}(\hat{\mathbf{p}}) = \left(\frac{p}{n}\right) \begin{bmatrix} 1-p & -p \\ -p & 1-p \end{bmatrix}. \qquad (11)$$
Now, consider the test statistic $\hat{\theta} = \hat{p}_1 - \hat{p}_2$. Clearly, $\theta = E(\hat{\theta}) = p_1 - p_2$, which, under $H_0$, is zero. To compute the variance of $\hat{\theta}$, recall that

$$\mathrm{Var}(\hat{\theta}) = \mathrm{Var}(\hat{p}_1 - \hat{p}_2) = \mathrm{Var}(\hat{p}_1) + \mathrm{Var}(\hat{p}_2) - 2\,\mathrm{Cov}(\hat{p}_1, \hat{p}_2).$$
Specifically, under $H_0$ we have:

$$\sigma_{\hat{\theta}}^2 = \frac{p(1-p)}{n} + \frac{p(1-p)}{n} + \frac{2p^2}{n} = \frac{2p}{n}. \qquad (12)$$
Finally, for sufficiently large $n$, we obtain, under $H_0$:

$$\hat{\theta} \sim N[\,0\,,\,(2p/n)\,]. \qquad (13)$$
Hence, for this situation hypothesis testing becomes very straightforward. □
Numerical Values: Suppose now that our null hypothesis is $H_0: p_1 = p_2 = 0.40$, and that our alternative hypothesis is $H_1: p_1 \neq p_2$. Let the false alarm probability be $\alpha = 0.05$, and the sample size be $n = 1000$. Then under $H_0$: $\hat{\theta} \sim N(0\,,\,0.0008)$. Equivalently, $Z = \hat{\theta}/0.0283 \sim N(0,1)$. For $\alpha = 0.05$ our threshold is $z_{th} = 1.96$. Then we will announce $H_1$ if $|\hat{\theta}| \geq (1.96)(0.0283) = 0.0555$.
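The numbers above are reproduced by a few lines of Matlab. A minimal sketch (the observed proportions phat1 and phat2 are illustrative placeholders; norminv requires the Statistics Toolbox, or one can simply hard-code 1.96):

p0 = 0.40; n = 1000; alpha = 0.05;           % null value, sample size, false alarm probability
sd_theta = sqrt(2*p0/n);                     % = 0.0283, from (12)
zth = norminv(1 - alpha/2);                  % = 1.96
phat1 = 0.43; phat2 = 0.37;                  % illustrative survey proportions (assumed)
thetahat = phat1 - phat2;
announceH1 = abs(thetahat) >= zth*sd_theta   % threshold = 0.0555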
Remark. Recall that when conducting the above test in the situation where $\hat{p}_1$ and $\hat{p}_2$ are independent, the variance of $\hat{\theta}$ is:

$$\sigma_{\hat{\theta}}^2 = \frac{p(1-p)}{n} + \frac{p(1-p)}{n} = \frac{2p(1-p)}{n}.$$

In the present case, however, we have

$$\rho_{13} = \mathrm{Corr}(Y_{x_1}, Y_{x_3}) = \mathrm{Corr}(\hat{p}_{x_1}, \hat{p}_{x_3}) = -\sqrt{\frac{p_X(x_1)\,p_X(x_3)}{[1 - p_X(x_1)][1 - p_X(x_3)]}} = -\frac{p}{1-p} = -\frac{0.4}{0.6} \cong -0.67. \qquad (14)$$

This relatively high level of correlation will increase the standard deviation of $\hat{\theta}$ by a factor of $1/\sqrt{1-p} = 1/\sqrt{0.6} \cong 1.29$.
Simulations: For related ways (e.g. using Excel) to simulate a multinomial random variable, see
http://en.wikipedia.org/wiki/Multinomial_distribution . Here, we will use the Matlab code ‘Likert1D_3.m’ given in the
Appendix.
For $10^3$ simulations of the code, we obtain:

mu_phat =
    0.3980    0.2013    0.4008

cov_phat =
    0.0048   -0.0015   -0.0034
   -0.0015    0.0032   -0.0017
   -0.0034   -0.0017    0.0051

and

corrcoef(phat) =
    1.0000   -0.3721   -0.6810
   -0.3721    1.0000   -0.4263
   -0.6810   -0.4263    1.0000

Notice that the estimated correlation coefficient value -0.6810 is close to the theoretical value -0.67 in (14). □
Two-Dimensional Multinomial Random Variables
Example 2. Negotiations with potential clients often focus on cost, and it is relatively easy to measure a potential client's view of the cost of a project. In the case of building architecture, a major factor in cost is the longevity of the design.
Suppose that you work for an architecture firm, and that you are considering carrying out a survey that would allow you to
predict a potential client’s perceived longevity of a design, based on the cost. Prior to conducting the survey, you decide to
run some simulations under various scenarios, in order to determine the survey sample size that would be needed to obtain
trustworthy predictions.
Define the random variables:

"$X$ = The act of recording a client's perceived cost of a design", with events $[X=1]$ = "cheap", $[X=2]$ = "moderate", and $[X=3]$ = "expensive",

and

"$Y$ = The act of recording the longevity of a design", with events $[Y=1]$ = "short", $[Y=2]$ = "medium", and $[Y=3]$ = "long".
The sample space for $(X, Y)$ is $S_{(X,Y)} = \{(x, y) \mid x, y \in \{1,2,3\}\}$. This sample space is discrete. It includes 9 elements, with joint probabilities $\{p_{xy} = \Pr[X = x \cap Y = y] \mid x, y \in \{1,2,3\}\}$.
Now, in order to construct a given simulation you could specify the numerical values of 8 of these 9 joint probabilities.
However, it is probably more natural to specify 8 other probabilities that are related to these. In this example, we will
specify 6 conditional probabilities and 2 marginal probabilities.
Consider the $3 \times 3$ matrix of conditional probabilities:

$$P_c = \begin{bmatrix} p_{1|1} & p_{2|1} & p_{3|1} \\ p_{1|2} & p_{2|2} & p_{3|2} \\ p_{1|3} & p_{2|3} & p_{3|3} \end{bmatrix}, \quad \text{where } p_{y|x} = \Pr[Y = y \mid X = x]. \qquad (15)$$
Even though (15) includes 9 parameters, we only need to specify 6 of them. This is because

$$\Pr[Y \in \{1,2,3\} \mid X = x] = p_{1|x} + p_{2|x} + p_{3|x} = 1. \qquad (16)$$
Recall that

$$p_{y|x} = \Pr[Y = y \mid X = x] = \frac{\Pr[X = x \cap Y = y]}{\Pr[X = x]} = \frac{p_{xy}}{p_X(x)}. \qquad (17)$$

Define the joint probability matrix
$$P_j = \begin{bmatrix} p_{11} & p_{12} & p_{13} \\ p_{21} & p_{22} & p_{23} \\ p_{31} & p_{32} & p_{33} \end{bmatrix}. \qquad (18)$$
To see the relationship between (15) and (18), define the probability matrix
$$P_X = \begin{bmatrix} p_X(1) & 0 & 0 \\ 0 & p_X(2) & 0 \\ 0 & 0 & p_X(3) \end{bmatrix}. \qquad (19)$$
Note that even though (19) includes 3 parameters, there are only two that need to be specified. And so, there are a total of 8 parameters that need to be specified in relation to (15) and (19).
Having specified the 8 parameters, one could then program the simulations to use them directly. Alternatively, one could find the corresponding joint probabilities via the relation

$$P_j = P_X P_c\ ; \qquad P_c = P_X^{-1} P_j \qquad (20)$$
and write a program that generates simulations of $(X, Y)$ using these joint probabilities. In this example we will use this approach. The Matlab code written for this purpose is 'xyLikert.m', given in the Appendix.
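Since $P_X$ is diagonal, (20) amounts to scaling the rows of $P_c$. A minimal sketch, using the same numbers as the appendix code 'xyLikert.m':

Pc = [.5 .4 .1; .2 .5 .3; .1 .3 .6]; % conditional probabilities (15); rows sum to 1
PX = diag([.2 .5 .3]);               % diagonal marginal matrix (19)
Pj = PX*Pc;                          % joint probability matrix (18), per (20)
Pc_check = PX\Pj;                    % P_X^{-1} * P_j recovers Pc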
For $10^4$ simulations this code gave the following estimate of $P_c$:
mu_Pygxhat =
0.4993 0.4011 0.0996
0.2008 0.5004 0.2988
0.0994 0.3006 0.6000
It also gave the true value of $P_c$:
Pygx =
0.5000 0.4000 0.1000
0.2000 0.5000 0.3000
0.1000 0.3000 0.6000
The closeness of the above matrices suggests that the code is correct.
The scatter plot below shows the relation between the estimators $\hat{p}_{1|1} = \widehat{\Pr}[Y = 1 \mid X = 1]$ and $\hat{p}_{2|1} = \widehat{\Pr}[Y = 2 \mid X = 1]$.
[Figure: Scatter plot of $\hat{p}_{1|1}$ (horizontal axis) vs. $\hat{p}_{2|1}$ (vertical axis); both axes run from 0 to 1.]
Converting a Bivariate Likert Random Variable to a Univariate Random Variable
The manner in which the code in the Appendix computed the conditional probabilities entailed a mapping of the 9 elements in $S_{(X,Y)}$ to the numbers 1 through 9. This suggests defining the random variable, call it $\mathbf{W} = (W_1, W_2, \dots, W_9)^{tr}$, associated with a Likert scale $\{1, 2, \dots, 9\}$, having the array of singleton probabilities $\{p_1 = p_{11},\ p_2 = p_{12},\ \dots,\ p_8 = p_{32},\ p_9 = p_{33}\}$, and such that $\sum_{k=1}^{9} W_k = n$. Marginally, $W_{k(i,j)} \sim \mathrm{Binomial}(n, p_{k(i,j)})$. Clearly, $\mathbf{W}$ is a univariate Likert random variable.
And so we can take advantage of the properties of the multinomial pdf for the 1-9 scale variable. In particular, we now have

$$\boldsymbol{\mu}_{\mathbf{W}} = E(\mathbf{W}) = n\,[\,p_1 \ \ p_2 \ \cdots \ p_8 \ \ p_9\,]^{tr} \quad \text{and} \quad \boldsymbol{\Sigma}_{\mathbf{W}} = \mathbf{M}_{\mathbf{W}} - (1/n)\,\boldsymbol{\mu}_{\mathbf{W}}\boldsymbol{\mu}_{\mathbf{W}}^{tr}, \quad \text{where } \mathbf{M}_{\mathbf{W}} = \mathrm{diag}\{\boldsymbol{\mu}_{\mathbf{W}}\}. \qquad (21)$$
Hence, we are now in a position to carry out a variety of analyses, such as those addressed in Example 1. It should be noted, however, that we no longer have the natural 2-D format needed to construct a linear model. But we can still proceed as in the bivariate case by using equivalent events.
For example, consider the equivalent events:

(i) $[X_1 = x_1] = [W_1 + W_2 + W_3 = x_1] = \{(w_1, w_2, w_3) \mid w_1 + w_2 + w_3 = x_1\}$

(ii) $[Y_2 = y_2] = [W_2 + W_5 + W_8 = y_2] = \{(w_2, w_5, w_8) \mid w_2 + w_5 + w_8 = y_2\}$
1 1 1 0 0 0 0 0 0  a1 
    . Then
0 1 0 0 1 0 0 1 0 a2 
Define A  
 X1 
 X 
 X 1  x1   a1W  x1  
 x1 
Y  y   a W  y   AW   y  .
2
2
 2
 2
 2 

 X1
Hence, we have:    AW with  1   Aμ W and Cov    AΣ
 W A tr

Y
Y
 2
 2
 Y2 
Clearly, the $A$ matrix in (22) is unique to $[X_1 \ \ Y_2]^{tr}$. For this reason, it is better denoted as $A_{12}$. Because we now have closed form expressions, (22), for the mean and covariance of $[X_1 \ \ Y_2]^{tr}$, simulations are no longer needed.
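The theoretical quantities listed below follow directly from (21) and (22). A minimal Matlab sketch (variable names ours) that reproduces them:

fxy = [.10 .08 .02 .10 .25 .15 .03 .09 .18]'; % cell probabilities p_1,...,p_9 (row-wise)
N = 50;                                       % sample size
muW  = N*fxy;                                 % mean vector, per (21)
SigW = diag(muW) - (1/N)*(muW*muW');          % covariance matrix, per (21)
A = [1 1 1 0 0 0 0 0 0; 0 1 0 0 1 0 0 1 0];   % A_12
muX1Y2  = A*muW                               % -> [10; 21]
CovX1Y2 = A*SigW*A'                           % -> [8 -0.2; -0.2 12.18]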
Example 2 continued.
Theoretical Quantities:
Pxy =
    0.1000    0.0800    0.0200
    0.1000    0.2500    0.1500
    0.0300    0.0900    0.1800

fxy' =
    0.1000    0.0800    0.0200    0.1000    0.2500    0.1500    0.0300    0.0900    0.1800

muW' =
    5.0000    4.0000    1.0000    5.0000   12.5000    7.5000    1.5000    4.5000    9.0000

CovW =
    4.5000   -0.4000   -0.1000   -0.5000   -1.2500   -0.7500   -0.1500   -0.4500   -0.9000
   -0.4000    3.6800   -0.0800   -0.4000   -1.0000   -0.6000   -0.1200   -0.3600   -0.7200
   -0.1000   -0.0800    0.9800   -0.1000   -0.2500   -0.1500   -0.0300   -0.0900   -0.1800
   -0.5000   -0.4000   -0.1000    4.5000   -1.2500   -0.7500   -0.1500   -0.4500   -0.9000
   -1.2500   -1.0000   -0.2500   -1.2500    9.3750   -1.8750   -0.3750   -1.1250   -2.2500
   -0.7500   -0.6000   -0.1500   -0.7500   -1.8750    6.3750   -0.2250   -0.6750   -1.3500
   -0.1500   -0.1200   -0.0300   -0.1500   -0.3750   -0.2250    1.4550   -0.1350   -0.2700
   -0.4500   -0.3600   -0.0900   -0.4500   -1.1250   -0.6750   -0.1350    4.0950   -0.8100
   -0.9000   -0.7200   -0.1800   -0.9000   -2.2500   -1.3500   -0.2700   -0.8100    7.3800
muX1Y2' =
   10.0000   21.0000

CovX1Y2 =
    8.0000   -0.2000
   -0.2000   12.1800
Running 30,000 simulations for $N = 50$ sample size gives ('likertsim.m'):

muX1Y2 =
    10    21

muhatX1Y2 =
    9.9960   21.0194

CovX1Y2 =
    8.0000   -0.2000
   -0.2000   12.1800

CovhatX1Y2 =
    7.9915   -0.2241
   -0.2241   12.1827
Remark. Even for this simulation size, the estimated covariance between $X_1$ and $Y_2$ is notably variable!
Appendix
Matlab Codes
% PROGRAM NAME: Likert1D_3.m
% Run NSIM simulations of a 1-D multinomial random variable
% with S = {1,2,3} for sample size N.
%--------------------------------------------
k = 3; % This is so the code can be more easily modified for k>3
% SPECIFY PROBABILITIES:
p(1) = .4;
p(2) = .2;
% Compute CDF:
F = zeros(1,k-1);
F(1) = p(1);
for j = 2:k-1
    F(j) = F(j-1) + p(j);
end
% SPECIFY SAMPLE SIZE:
N = 50;
%--------------------------------------------
% GENERATE NSIM SIMULATIONS, each of size N:
NSIM = 10^3;
phat = zeros(NSIM,k); % Initialize estimated proportions
Y = k*ones(N,NSIM);   % Initialize all responses to the value k
% NOTE: The array Y is only needed to illustrate the raw survey results.
%       If only phat is of concern, then it is better to not include Y,
%       as it can take a significant amount of memory.
for m = 1:NSIM
    bc = zeros(1,k-1); % Initialize the first k-1 bin counts
    for n = 1:N
        u = rand(1,1);
        if u < F(1)
            Y(n,m) = 1; bc(1) = bc(1) + 1;
        elseif u < F(2)
            Y(n,m) = 2; bc(2) = bc(2) + 1;
        end
    end
    phat(m,:) = [bc(1)/N , bc(2)/N , 1-(bc(1)+bc(2))/N];
end
%============================================
% COMPUTE DISTRIBUTIONAL INFORMATION FOR phat:
mu_phat = mean(phat)
cov_phat = cov(phat)
% PROGRAM NAME: xyLikert.m
% This code simulates N measurements of (X,Y)
% where Sx = Sy = {1,2,3} = {low, medium, high}
% X = cost of a design
% Y = longevity of the design
%============================
clear all
% Specify matrix of conditional probabilities of Y=y|X=x:
Pygx = [ .5 .4 .1; .2 .5 .3; .1 .3 .6]; % Rows must sum to 1.0
% Specify marginal probabilities for X:
px = [.2 .5 .3];
Px = [px(1)*ones(1,3); px(2)*ones(1,3); px(3)*ones(1,3)];
% Compute joint probability matrix for (X,Y):
Pxy = Pygx.*Px; % This is the 3x3 matrix of joint probabilities
%================================================
% SIMULATIONS:
% In order to simulate, the Uniform random number generator will be used.
% The following segment of code generates the breakpoints of the 9
% subintervals into which the interval [0,1] will be partitioned:
fxy = reshape(Pxy',9,1); % concatenate the rows of Pxy
Fxy = zeros(9,1);
Fxy(1) = fxy(1);
for k = 2:9
    Fxy(k) = Fxy(k-1) + fxy(k); % Fxy(k) is the RIGHT end of the kth interval
end
%----------------------
nsim = 10^4; % Number of simulations
N = 10^2;    % Sample size for each simulation
pygxhat_array = zeros(nsim,9); % Array of pygx estimates
for nn = 1:nsim
    Bc = zeros(1,9); % Initialize the counts of the 9 bins
    for n = 1:N
        [q,kbin] = max(ceil(Fxy - rand(1,1))); % first k with Fxy(k) > u
        Bc(kbin) = Bc(kbin) + 1;
    end
    fxyhat = Bc/N;
    pxhat = [sum(fxyhat(1:3)) , sum(fxyhat(4:6)) , sum(fxyhat(7:9))];
    pxhatvec = [pxhat(1)*ones(1,3), pxhat(2)*ones(1,3), pxhat(3)*ones(1,3)];
    pygxhat_array(nn,:) = fxyhat./pxhatvec;
end
mu_pygxhat = mean(pygxhat_array);
mu_Pygxhat = reshape(mu_pygxhat,3,3)'
Pygx
pause
%======================================
% Investigation of the joint properties of 2 estimators
p1g1 = pygxhat_array(:,1); % estimates of p1|1
p2g1 = pygxhat_array(:,2); % estimates of p2|1
figure(1)
plot(p1g1,p2g1,'*')
grid
xlabel('p1|1_hat')
ylabel('p2|1_hat')
title('Scatter Plot of p1|1_hat (x) vs. p2|1_hat (y)')