Practical for GWAS course

UCSC Exercise:
Software requirements: Internet access.
The learning objective is to provide knowledge of:
i) Annotation categories.
ii) Visual data presentation possibilities.
iii) Data output possibilities.

Guided tour of UCSC Browser:
Step 1: Go to the browser. http://genome.ucsc.edu/cgi-bin/hgGateway.
Step 2: Pick your favourite region (note the genome and assembly that you are using).
Step 3: Zoom a little!
Step 4: Pick some annotations, tailor your annotations (squishy and multiples).
Competition: Pick your favourite gene/locus and make a pretty picture!

Guided tour of UCSC Tables via Galaxy:
Step 1: Go to Galaxy. https://main.g2.bx.psu.edu/.
Step 2: Get data: UCSC main table (select region, group, track, table).
Step 3: Call it something else?
Step 4: Operate with genomic intervals, intersect.
Competition: First person to tell me how many rows they get when, in assembly hg19 and region chr6:31233138-31243301, they join Group: Genes and Gene Prediction track, Track: UCSC genes (Exons +/- 5), with Group: Variations and repeats, Track: Common SNPs(135).

NHGRI GWAS catalogue Quiz
Software requirements: Internet access.
The learning objective is to provide knowledge of:
i) Catalogue contents.
ii) Possible searches.
Step 1: Go to website: http://www.genome.gov/26525384
Step 2: Find your own GWAS or one you know about and check that the data are correct.
Competition: How many GWAS studies pick up HLA-C with a p-value < 10^-100?

Locus Zoom
Software requirements: Internet access.
Files required: Bad_rs1419074.txt, Good_rs1419074.txt.
The learning objective is to use Locus Zoom to consider the evidence for association.
Step 1: Go to website: http://csg.sph.umich.edu/locuszoom/
Step 2: Upload each file and plot the data.
i) Follow the link "Plot using your data".
ii) Choose File, specify column names and delimiter, specify SNP rs1419074 and "plot data".

Upweighting of GWAS hits:
Software requirements: plink, R.
Files required: Knight_data.ped, Knight_data.map, annotSmall.txt, PLINK2wakefieldBF.R.
The learning objective is to demonstrate how to apply upweighting and visualize the difference that it makes.

# Run GWAS to get OR.
# Open terminal window
# cd to folder with Knight_data.ped and Knight_data.map
path_to_plink/plink --file Knight_data --logistic --out Knight_results

# Use R script to get BF
setwd("Your/Path")
source("PLINK2wakefieldBF.R")
PLINK2wakefieldBF("Knight_results.assoc.logistic", test="ADD", ORcrit=1.5, epsilon=0.05)

# FYI
# PLINK2wakefieldBF.R
# Function to read in a PLINK logistic output file, filtered to leave a 1df SNP component only, calculate BFassoc and save new file with new BF column added. Output file = input filename with ".bf" added
# Function written by Mike Weale, King's College London. Version = 20 Dec 2010.
# Please refer to Knight J, Barnes MR, Breen G, Weale ME, "Using Functional Annotation for the Empirical Determination of Bayes Factors for Genome-wide Association Study Analysis", PLoS ONE, submitted.
# Method is the Approximate Bayes Factor method of Wakefield J (2007) AJHG 81:681-690 and Wakefield J (2009) Gen Epi 33:79-8
# file = string containing name of PLINK logistic output file
# test = string containing string used in "TEST" column to denote 1df SNP coefficient. If specified, dataframe will first be filtered to contain only rows of this type.
# ORcrit = prior upper limit of 100(1-epsilon)% range of OR values when SNP is causal (default = 1.5)
# epsilon = proportion of OR values lying above ORcrit or below 1/ORcrit when SNP is causal (default = 0.05)
PLINK2wakefieldBF = function( file, test="", ORcrit=1.5, epsilon=0.05 ) {
  data = read.table( file=file, header=TRUE, as.is=TRUE, comment.char="" )
  # Filter on the TEST column only when a test string is specified
  if (test != "") data = data[data$TEST==test,]
  W = (log(ORcrit)/qnorm(1-epsilon/2))^2   # prior variance of log(OR)
  theta = log(data$OR)                     # estimated log(OR)
  V = (theta/data$STAT)^2                  # squared standard error of theta
  BF = sqrt(V/(V+W))*exp(W*theta^2/2/V/(V+W))
  write.table( cbind(data,BF), file=paste(file,".bf",sep=""), quote=FALSE, row.names=FALSE, col.names=TRUE )
}

# Read p-values and BFs into R and merge.
BF <- read.table("Knight_results.assoc.logistic.bf", header=TRUE)
Annot <- read.table("annotSmall.txt", header=TRUE)
All <- merge(Annot, BF, by.x="rs", by.y="SNP")
# Treat rs as a factor so plot() accepts it on the x axis (R >= 4.0 reads strings as character)
All$rs <- factor(All$rs)

# Plot Annot BFs - Great they work….
plot(All$rs, All$BFAnnot, ylim=c(0,8))

# Upweight BFs according to annot
All$BFcomb <- All$BFAnnot*All$BF

# Put BFs on log10 scale
All$logBFcomb <- log10(All$BFcomb)
All$logBFassoc <- log10(All$BF)

# Plot Upweighted BFs….not very radically though…
plot(All$rs, All$logBFassoc, ylim=c(1,13), ylab="log10 BF")
par(new=TRUE)
plot(All$rs, All$logBFcomb, ylim=c(1,13), ylab="log10 BF", col="red")

# Plot change…but at least they are going in the right direction!
All$BFchange <- All$logBFcomb - All$logBFassoc
plot(All$rs, All$BFchange)
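
Worked example: to see what the Bayes factor calculation and the upweighting step do for a single SNP, the sketch below applies the same expressions as PLINK2wakefieldBF to made-up numbers. The values of OR, STAT and BFAnnot are hypothetical and are not taken from Knight_data or annotSmall.txt; only ORcrit and epsilon match the call used in the practical.

```r
# Hypothetical inputs for one SNP (illustration only, not from Knight_data)
OR      <- 1.3   # odds ratio, as reported in the PLINK "OR" column
STAT    <- 4.0   # test statistic, as reported in the PLINK "STAT" column
ORcrit  <- 1.5   # same prior setting as in the practical
epsilon <- 0.05  # same prior setting as in the practical

W     <- (log(ORcrit) / qnorm(1 - epsilon / 2))^2  # prior variance of log(OR)
theta <- log(OR)                                   # estimated log(OR)
V     <- (theta / STAT)^2                          # squared standard error of theta
z2    <- theta^2 / V                               # squared z statistic (equals STAT^2)

# Bayes factor in favour of association, written as in PLINK2wakefieldBF
BF <- sqrt(V / (V + W)) * exp(W * theta^2 / 2 / V / (V + W))

# The same quantity written in terms of the squared z statistic
BF_alt <- sqrt(V / (V + W)) * exp(z2 / 2 * W / (V + W))

# Upweighting: multiply by a hypothetical annotation Bayes factor
BFAnnot <- 2
BFcomb  <- BFAnnot * BF

print(c(BF = BF, BFcomb = BFcomb))
# Change on the log10 scale; this is just log10(BFAnnot)
print(log10(BFcomb) - log10(BF))
```

Because the upweighting is a multiplication, the change on the log10 scale depends only on the annotation Bayes factor, which is why the final plot in the practical shows the shift contributed by the annotation alone.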