Testing for Marginal Independence Between Two Categorical

Testing for Marginal Independence Between Two Categorical Variables with Multiple Responses
Robert Jeutong

Outline
• Introduction
  – Kansas Farmer Data
  – Notation
• Modified Pearson Based Statistic
  – Nonparametric Bootstrap
  – Bootstrap p-Value Methods
• Simulation Study
• Conclusion

Introduction
• Multiple-response categorical variables are also called “pick any” (or pick any/c) variables
• Survey data arising from multiple-response categorical questions present a unique challenge for analysis because of the dependence among the responses provided by individual subjects
• Testing for independence between two categorical variables is often of interest
• When at least one of the categorical variables can have multiple responses, traditional Pearson chi-square tests for independence should not be used because of the within-subject dependence among responses

Intro cont’d
• A special kind of independence, called marginal independence, becomes of interest in the presence of multiple-response categorical variables
• The purpose of this article is to develop new approaches to the testing of marginal independence between two multiple-response categorical variables
• Agresti and Liu (1999) call this a test for simultaneous pairwise marginal independence (SPMI)
• The proposed tests are extensions of the traditional Pearson chi-square tests for independence between single-response categorical variables

Kansas Farmer Data
• Comes from Loughin (1998) and Agresti and Liu (1999)
• The survey was conducted by the Department of Animal Sciences at Kansas State University
• Two questions in the survey asked Kansas farmers about their sources of veterinary information and their swine waste storage methods
• Farmers were permitted to select as many responses as applied from a list of items

Data cont’d
• Interest lies in determining whether sources of veterinary information are independent of waste storage methods, in a similar manner as would be done in a traditional Pearson chi-square test applied to a
contingency table with single-response categorical variables
• A test for SPMI can be performed to determine whether each source of veterinary information is simultaneously independent of each swine waste storage method

Data cont’d
• 4 × 5 = 20 different 2 × 2 tables can be formed to marginally summarize all possible responses to item pairs

                 Professional consultant
                     1        0
  Lagoon    1       34      109
            0       10      126

• Independence is tested in each of the 20 2 × 2 tables simultaneously for a test of SPMI

Data cont’d
• The test is marginal because responses are summed over the other item choices for each of the multiple-response categorical variables
• If SPMI is rejected, examination of the individual 2 × 2 tables can follow to determine why the rejection occurs

Notation
• Let $W$ and $Y$ denote the multiple-response categorical variables for an r × c table’s row and column variables, respectively
• Sources of veterinary information are denoted by $Y$ and waste storage methods are denoted by $W$
• The categories for each multiple-response categorical variable are called items (Agresti and Liu, 1999); for example, lagoon is one of the items for waste storage method
• Suppose $W$ has $r$ items and $Y$ has $c$ items. Also, suppose $n$ subjects are sampled at random

Notation cont’d
• Let $W_{si} = 1$ if a positive response is given for item $i$ by subject $s$, for $i = 1, \ldots, r$ and $s = 1, \ldots, n$; $W_{si} = 0$ for a negative response
• Let $Y_{sj}$, for $j = 1, \ldots, c$ and $s = 1, \ldots, n$, be similarly defined
• The abbreviated notation, $W_i$ and $Y_j$, refers generally to the binary response random variables for items $i$ and $j$, respectively
• The sets of correlated binary item responses for subject $s$ are $Y_s = (Y_{s1}, Y_{s2}, \ldots, Y_{sc})$ and $W_s = (W_{s1}, W_{s2}, \ldots, W_{sr})$

Notation cont’d
• Cell counts in the joint table are denoted by $n_{gh}$ for the $g$th possible $(W_1, \ldots, W_r)$ and the $h$th possible $(Y_1, \ldots, Y_c)$
• The corresponding probability is denoted by $\tau_{gh}$.
  Multinomial sampling is assumed to occur within the entire joint table; thus, $\sum_{g,h} \tau_{gh} = 1$
• Let $m_{ij}$ denote the number of observed positive responses to both $W_i$ and $Y_j$
• The marginal probability of a positive response to both $W_i$ and $Y_j$ is denoted by $\pi_{ij}$, and its maximum likelihood estimate (MLE) is $m_{ij}/n$

Joint Table

SPMI Defined in Hypothesis
• $H_0$: $\pi_{ij} = \pi_{i\bullet}\pi_{\bullet j}$ for $i = 1, \ldots, r$ and $j = 1, \ldots, c$
• $H_a$: At least one equality does not hold
• Here $\pi_{ij} = P(W_i = 1, Y_j = 1)$, $\pi_{i\bullet} = P(W_i = 1)$, and $\pi_{\bullet j} = P(Y_j = 1)$. This specifies marginal independence between each $W_i$ and $Y_j$ pair
• $P(W_i = 1, Y_j = 1) = \pi_{ij}$
• $P(W_i = 1, Y_j = 0) = \pi_{i\bullet} - \pi_{ij}$
• $P(W_i = 0, Y_j = 1) = \pi_{\bullet j} - \pi_{ij}$
• $P(W_i = 0, Y_j = 0) = 1 - \pi_{i\bullet} - \pi_{\bullet j} + \pi_{ij}$

Hypothesis
• SPMI can be written as $OR_{WY,ij} = 1$ for $i = 1, \ldots, r$ and $j = 1, \ldots, c$, where OR is the abbreviation for odds ratio and
  – $OR_{WY,ij} = \pi_{ij}(1 - \pi_{i\bullet} - \pi_{\bullet j} + \pi_{ij}) / [(\pi_{i\bullet} - \pi_{ij})(\pi_{\bullet j} - \pi_{ij})]$
• Therefore, SPMI represents simultaneous independence in the $rc$ 2 × 2 pairwise item response tables formed for each $W_i$ and $Y_j$ pair
• Joint independence implies SPMI, but the reverse is not true

Modified Pearson Statistic
• Under the null

               $Y_j = 1$                        $Y_j = 0$                                             Total
  $W_i = 1$    $\pi_{ij}$                       $\pi_{i\bullet} - \pi_{ij}$                           $\pi_{i\bullet}$
  $W_i = 0$    $\pi_{\bullet j} - \pi_{ij}$     $1 - \pi_{i\bullet} - \pi_{\bullet j} + \pi_{ij}$     $1 - \pi_{i\bullet}$
  Total        $\pi_{\bullet j}$                $1 - \pi_{\bullet j}$

• The cells are indexed by $(W_i, Y_j)$ = (1,1), (1,0), (0,1), (0,0)

The Statistic

Nonparametric Bootstrap
• To resample under independence of $W$ and $Y$, $W_s$ and $Y_s$ are independently resampled with replacement from the data set
• The test statistic calculated for the $b$th resample of size $n$ is denoted by $X^{2*}_{S,b}$
• The p-value is calculated as
  – $B^{-1} \sum_{b} I(X^{2*}_{S,b} \ge X^2_S)$
• where $B$ is the number of resamples taken and $I(\cdot)$ is the indicator function

Bootstrap p-Value Combination Methods
• Each $X^2_{S,i,j}$ gives a test for independence between each $W_i$ and $Y_j$ pair for $i = 1, \ldots, r$ and $j = 1, \ldots, c$.
  The p-values from each of these tests (using a $\chi^2_1$ approximation) can be combined to form a new statistic $\tilde{p}$
• The product of the r × c p-values or the minimum of the r × c p-values could be used as $\tilde{p}$
• The p-value is calculated as
  – $B^{-1} \sum_{b} I(\tilde{p}^{*}_{b} \le \tilde{p})$

Results from the Farmer Data

  Method                           My p-value    Authors’ p-value
  Bootstrap $X^2_S$                0.0001        <0.0001
  Bootstrap product of p-values    0.0001        0.0001
  Bootstrap minimum p-value        0.0047        0.0034

Interpretation and Follow-Up
• The p-values show strong evidence against SPMI
• Since $X^2_S$ is the sum of $rc$ different Pearson chi-square test statistics, each $X^2_{S,i,j}$ can be used to investigate why SPMI is rejected
• The individual tests can be done using an asymptotic $\chi^2_1$ approximation or the estimated sampling distribution of the individual statistics calculated in the proposed bootstrap procedures
• When this is done, the significant combinations are:
  – (Lagoon, Professional consultant)
  – (Lagoon, Veterinarian)
  – (Pit, Veterinarian)
  – (Pit, Feed companies & representatives)
  – (Natural drainage, Professional consultant)
  – (Natural drainage, Magazines)

Simulation Study
• Goal: determine which testing procedures hold the correct size under a range of different situations and have power to detect various alternative hypotheses
• 500 data sets are generated for each simulation setting investigated
• The SPMI testing methods are applied (B = 1000), and for each method the proportion of data sets for which SPMI is rejected at the 0.05 nominal level is recorded

My Results
• n = 100
• 2 × 2 marginal table
• OR = 25

  Method                           My p-value    Authors’ p-value
  Bootstrap $X^2_S$                0.04          0.056
  Bootstrap product of p-values    0.042         0.056
  Bootstrap minimum p-value        0.036         0.044

Conclusion
• The bootstrap methods generally hold the correct size
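
Sketch of the bootstrap procedures
The resampling steps from the Nonparametric Bootstrap and Bootstrap p-Value Combination Methods slides can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes that each $X^2_{S,i,j}$ is the usual Pearson statistic for the 2 × 2 table of $W_i$ by $Y_j$ (the slides state only that $X^2_S$ is the sum of $rc$ Pearson chi-square statistics), that responses are stored as lists of 0/1 rows, and that a degenerate table contributes 0 to the sum. The function names `pairwise_pearson`, `summaries`, and `bootstrap_spmi` and the result keys are hypothetical.

```python
import math
import random


def pairwise_pearson(W, Y):
    """Return the r x c Pearson statistics, one per (W_i, Y_j) 2 x 2 table.

    W and Y are lists of n rows of 0/1 item responses (assumed layout).
    """
    n, r, c = len(W), len(W[0]), len(Y[0])
    pw = [sum(row[i] for row in W) / n for i in range(r)]  # estimates of P(W_i = 1)
    py = [sum(row[j] for row in Y) / n for j in range(c)]  # estimates of P(Y_j = 1)
    stats = []
    for i in range(r):
        stats.append([])
        for j in range(c):
            pij = sum(w[i] * y[j] for w, y in zip(W, Y)) / n  # m_ij / n
            den = pw[i] * (1 - pw[i]) * py[j] * (1 - py[j])
            # Assumption: a degenerate table (item all 0s or all 1s) contributes 0.
            x2 = n * (pij - pw[i] * py[j]) ** 2 / den if den > 0 else 0.0
            stats[i].append(x2)
    return stats


def summaries(W, Y):
    """Return (X2_S, product of p-values, minimum p-value) for one data set."""
    flat = [x for row in pairwise_pearson(W, Y) for x in row]
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    pvals = [math.erfc(math.sqrt(x / 2)) for x in flat]
    return sum(flat), math.prod(pvals), min(pvals)


def bootstrap_spmi(W, Y, B=1000, seed=0):
    """Bootstrap p-values for SPMI; W rows and Y rows are resampled independently."""
    rng = random.Random(seed)
    n = len(W)
    x2_obs, prod_obs, min_obs = summaries(W, Y)
    hits = {"X2_S": 0, "product": 0, "minimum": 0}
    for _ in range(B):
        Wb = [W[rng.randrange(n)] for _ in range(n)]  # resample W_s with replacement
        Yb = [Y[rng.randrange(n)] for _ in range(n)]  # resample Y_s separately
        x2, prod, pmin = summaries(Wb, Yb)
        hits["X2_S"] += x2 >= x2_obs        # large statistics are extreme
        hits["product"] += prod <= prod_obs  # small combined p-values are extreme
        hits["minimum"] += pmin <= min_obs
    return {name: count / B for name, count in hits.items()}
```

Calling `bootstrap_spmi(W, Y, B=1000)` with an n × r list `W` and an n × c list `Y` returns one bootstrap p-value per method. Drawing the row indices for `W` and `Y` separately is what imposes independence between the two variables while keeping the within-subject dependence among the items of each variable.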
