Genome-wide association studies
BNFO 602
Roshan

Application of SNPs: association with disease
• Experimental design to detect cancer-associated SNPs:
  – Pick random humans with and without cancer (say breast cancer)
  – Perform SNP genotyping
  – Look for associated SNPs
  – Also called a genome-wide association study

Case-control example
• Study of 100 people:
  – Case: 50 subjects with cancer
  – Control: 50 subjects without cancer
• Count the number of alleles (two per subject, so 100 per group) and form a contingency table

             #Allele1   #Allele2
  Case          10         90
  Control        2         98

Odds ratio
• Odds of allele 1 in cancer = a/b = e
• Odds of allele 1 in healthy = c/d = f
• Odds ratio of allele 1 in cancer vs healthy = e/f

             #Allele1   #Allele2
  Cancer         a          b
  Healthy        c          d

Example
• Odds of allele 1 in case = 15/35
• Odds of allele 1 in control = 2/48
• Odds ratio of allele 1 in case vs control = (15/35)/(2/48) ≈ 10.3

             #Allele1   #Allele2
  Case          15         35
  Control        2         48

Statistical test of association (P-values)
• P-value = probability, under the null hypothesis, of the observed data or of data at least as extreme
• Example:
  – Suppose we are given a series of coin tosses
  – We suspect that a biased coin produced the tosses
  – We can ask the following question: what is the probability that a fair coin would produce tosses at least this extreme?
  – If this probability is very small, the observed tosses would be unlikely under a fair coin, which is evidence against it
  – In this example the null hypothesis is the fair coin and the alternative hypothesis is the biased coin

Binomial distribution
• Bernoulli random variable:
  – Two outcomes: success or failure
  – Example: coin toss
• Binomial random variable:
  – Number of successes in a series of independent Bernoulli trials
• Example:
  – Probability of heads = 0.5
  – Given four coin tosses, what is the probability of three heads?
  – Possible outcomes: HHHT, HHTH, HTHH, THHH
  – Each outcome has probability = 0.5^4
  – Total probability = 4 * 0.5^4 = 0.25

Binomial distribution
• Bernoulli trial: probability of success = p, probability of failure = 1-p
• Given n independent Bernoulli trials, what is the probability of k successes?

  $$\binom{n}{k} p^k (1-p)^{n-k}$$

• Binomial applet: http://www.stat.tamu.edu/~west/applets/binomialdemo.html

Hypothesis testing under binomial hypothesis
• Null hypothesis: fair coin (probability of heads = probability of tails = 0.5)
• Data: HHHHTHTHHHHHHHTHTHTH (20 tosses, 15 heads)
• P-value under null hypothesis = probability that #heads >= 15
• This probability is 0.021
• Since it is below 0.05 we can reject the null hypothesis at the 0.05 significance level

Null hypothesis for case control contingency table
• We have two random variables:
  – X: disease status
  – A: allele type

             #allele1   #allele2
  case           a          b
  control        c          d

• Null hypothesis: the two variables are independent of each other (unrelated)
• Under independence
  – P(X=case and A=1) = P(X=case)P(A=1)
• Expected number of cases with allele 1 is
  – P(X=case)P(A=1)N
  – where N is total observations
• P(X=case) = (a+b)/N
• P(A=1) = (a+c)/N
• What is the expected number of controls with allele 2?
• Do the probabilities sum to 1?

Chi-square statistic

  $$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$

  O_i = observed frequency for ith outcome
  E_i = expected frequency for ith outcome
  n = total outcomes

The probability distribution of this statistic is given (approximately, for large counts) by the chi-square distribution with n-1 degrees of freedom. Proof can be found at http://ocw.mit.edu/NR/rdonlyres/Mathematics/18-443Fall2003/4226DF27-A1D0-4BB8-939A-B2A4167B5480/0/lec23.pdf
For a 2x2 contingency table whose expected counts are estimated from the row and column totals, the degrees of freedom are (2-1)(2-1) = 1.

Chi-square
• Using chi-square we can test how well the observed values fit the expected values computed under the independence hypothesis
• We could also test the data directly under a multinomial or multivariate normal distribution with probabilities given by the independence assumption. This would require the cumulative distribution functions of the multinomial and multivariate normal, which are hard to compute.
• Chi-square p-values are easier to compute

Case control
  E1: expected cases with allele 1
  E2: expected cases with allele 2
  E3: expected controls with allele 1
  E4: expected controls with allele 2

  N = a+b+c+d
  E1 = ((a+b)/N)((a+c)/N) N = (a+b)(a+c)/N
  E2 = (a+b)(b+d)/N
  E3 = (c+d)(a+c)/N
  E4 = (c+d)(b+d)/N

Now compute chi-square statistic

             #allele1   #allele2
  case           a          b
  control        c          d

Chi-square statistic
• Compute expected values and chi-square statistic
• Compute chi-square p-value by referring to chi-square distribution

             #Allele1   #Allele2
  Case          15         35
  Control        2         48
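
The exercise above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original slides: the function names are hypothetical, only the standard library is used, and the p-value assumes 1 degree of freedom, which is the usual choice for a 2x2 table with expected counts taken from the margins. The last function re-checks the coin-toss p-value from the binomial slides.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio of allele 1 in cases vs controls: (a/b) / (c/d)."""
    return (a / b) / (c / d)


def expected_counts(a, b, c, d):
    """Expected counts E1..E4 under independence of disease status and allele."""
    n = a + b + c + d
    return [
        (a + b) * (a + c) / n,  # E1: cases with allele 1
        (a + b) * (b + d) / n,  # E2: cases with allele 2
        (c + d) * (a + c) / n,  # E3: controls with allele 1
        (c + d) * (b + d) / n,  # E4: controls with allele 2
    ]


def chi_square_statistic(a, b, c, d):
    """Sum over the four cells of (observed - expected)^2 / expected."""
    observed = [a, b, c, d]
    expected = expected_counts(a, b, c, d)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_p_value_1df(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    Assumption: 1 df (2x2 table). With 1 df the variable is the square of a
    standard normal, so P(X >= stat) = erfc(sqrt(stat / 2)).
    """
    return math.erfc(math.sqrt(stat / 2))


def binomial_upper_tail(n, k, p):
    """P(#successes >= k) in n independent Bernoulli trials with success prob p."""
    return sum(
        math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1)
    )


# Case/control table from the slides: 15, 35 / 2, 48
stat = chi_square_statistic(15, 35, 2, 48)
print("odds ratio:", round(odds_ratio(15, 35, 2, 48), 1))
print("expected counts:", expected_counts(15, 35, 2, 48))
print("chi-square statistic:", round(stat, 2))
print("p-value (1 df):", chi_square_p_value_1df(stat))

# Coin-toss example: 15 or more heads in 20 tosses of a fair coin
print("binomial p-value:", round(binomial_upper_tail(20, 15, 0.5), 3))
```

For this table the expected counts are 8.5 and 41.5 in each row, the odds ratio is about 10.3 as on the slide, and the chi-square statistic is about 11.98, giving a p-value well below 0.05. The binomial tail reproduces the 0.021 quoted for the coin-toss data.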