Practical for GWAS course
UCSC Exercise:
Software requirements: Internet access.
The learning objective is to provide knowledge of: i) Annotation categories. ii)
Visual data presentation possibilities. iii) Data output possibilities.
Guided tour of UCSC Browser:
Step 1: Go to the browser. http://genome.ucsc.edu/cgi-bin/hgGateway.
Step 2: Pick your favourite region (note the genome and assembly that you are
using).
Step 3: Zoom a little!
Step 4: Pick some annotations, tailor your annotations (squishy and multiples).
Competition: Pick your favourite gene/locus and make a pretty picture!
Guided tour of UCSC Tables via Galaxy:
Step 1: Go to Galaxy. https://main.g2.bx.psu.edu/.
Step 2: Get data: UCSC main table (select region, group, track, table).
Step 3: Rename the dataset (call it something more meaningful).
Step 4: Use the "Operate on Genomic Intervals" tools, e.g. Intersect.
Competition: First person to tell me how many rows they get when they join
chr6:31233138-31243301, Group: Genes and Gene Prediction Tracks, Track: UCSC
Genes (Exons +/- 5), with Group: Variation and Repeats, Track: Common SNPs(135),
in assembly hg19.
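To make the join in the competition concrete, here is a minimal R sketch of the same idea on toy data: pad each exon by 5 bp, then keep every SNP/exon pair where the SNP falls inside the padded exon. The coordinates, SNP names and column names are made up for illustration, and the sketch treats positions as simple closed intervals, which is an assumption and not necessarily the coordinate convention Galaxy or the UCSC tables use.

```r
# Toy intervals (made-up coordinates, not real hg19 annotations)
exons <- data.frame(chrom = "chr6",
                    start = c(100, 200, 300),
                    end   = c(150, 250, 350))
snps <- data.frame(chrom = "chr6",
                   name  = c("snpA", "snpB", "snpC", "snpD"),
                   pos   = c(97, 151, 260, 320))

pad <- 5  # extend each exon by 5 bp on both sides ("Exons +/- 5")
exons$start_pad <- exons$start - pad
exons$end_pad   <- exons$end + pad

# Pair every SNP with every exon on the same chromosome,
# then keep the pairs where the SNP lies inside the padded exon.
pairs  <- merge(exons, snps, by = "chrom")
joined <- pairs[pairs$pos >= pairs$start_pad & pairs$pos <= pairs$end_pad, ]

n_rows <- nrow(joined)  # one row per overlapping SNP/exon pair
print(joined[, c("name", "pos", "start", "end")])
```

In this toy set, snpA and snpB are only picked up because of the 5 bp padding, which is the point of choosing "Exons +/- 5" in the exercise.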
NHGRI GWAS catalogue Quiz
Software requirements: Internet access.
The learning objective is to provide knowledge of: i) Catalogue contents. ii) Possible
searches.
Step 1: Go to website: http://www.genome.gov/26525384
Step 2: Find your own GWAS or one you know about and check the data is correct.
Competition: How many GWAS studies pick up HLA-C with a p-value < 10^-100?
Locus Zoom
Software requirements: Internet access.
Files required: Bad_rs1419074.txt, Good_rs1419074.txt.
The learning objective is to use Locus Zoom to consider the evidence for association.
Step 1: Go to website: http://csg.sph.umich.edu/locuszoom/
Step 2: Upload each file and plot the data.
i) Links: Plot using your data
ii) Choose File, specify column names and delimiter, specify SNP rs1419074 and “plot data”.
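Because the upload form asks for the column names and the delimiter, it helps to know exactly what is in a file before uploading it. The layout of the two supplied files is not shown in this handout, so the following is only a sketch under assumptions: it writes a toy tab-delimited file with a marker column and a p-value column (here named SNP and P, as in PLINK output) and reads it back to confirm the header. The rows and the output file name are made up.

```r
# Made-up association results with a marker column and a p-value column
res <- data.frame(SNP = c("rsToy1", "rsToy2"),
                  P   = c(0.0124, 0.6892))

# Write a tab-delimited file with a header row and no quotes or row names
out_file <- file.path(tempdir(), "toy_locuszoom.txt")
write.table(res[, c("SNP", "P")], file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read it back to check the column names you would type into the form
check <- read.table(out_file, header = TRUE, sep = "\t", as.is = TRUE)
print(names(check))
```

With a file like this you would enter SNP as the marker column, P as the p-value column and tab as the delimiter.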
Upweighting of GWAS hits:
Software requirements: plink, R.
Files required: Knight_data.ped, Knight_data.map, annotSmall.txt,
PLINK2wakefieldBF.R.
The learning objective is to demonstrate how to apply upweighting and visualize the
difference that it makes.
# Run GWAS to get OR.
# Open terminal window
# cd to folder with Knight_data.ped and Knight_data.map
path_to_plink/plink --file Knight_data --logistic --out Knight_results
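Before computing Bayes factors it is worth looking at what `--logistic` wrote. The sketch below reads two made-up rows laid out like a PLINK 1.x `.assoc.logistic` file (columns CHR, SNP, BP, A1, TEST, NMISS, OR, STAT, P) and keeps the additive-model rows. The SNP names and numbers are invented for illustration; with your own run you would read Knight_results.assoc.logistic instead.

```r
# Two made-up rows in the layout of a PLINK .assoc.logistic file
toy <- "CHR SNP BP A1 TEST NMISS OR STAT P
6 rsToy1 31234000 A ADD 1000 1.30 2.50 0.0124
6 rsToy2 31235000 G ADD 1000 0.95 -0.40 0.6892"

res <- read.table(text = toy, header = TRUE, as.is = TRUE)

# Keep only the 1df additive SNP term
add <- res[res$TEST == "ADD", ]

# An OR above 1 goes with a positive STAT, an OR below 1 with a negative one
print(add[, c("SNP", "OR", "STAT", "P")])
```

The TEST filter matters mainly when covariates are included, since PLINK then reports extra rows per SNP with other TEST labels.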
# Use R script to get BF
setwd("Your/Path")
source("PLINK2wakefieldBF.R")
PLINK2wakefieldBF("Knight_results.assoc.logistic", test="ADD",
ORcrit=1.5, epsilon=0.05 )
# FYI
# PLINK2wakefieldBF.R
# Function to read in a PLINK logistic output file, filtered to leave a 1df SNP
# component only, calculate BFassoc and save new file with new BF column added.
# Output file = input filename with ".bf" added
# Function written by Mike Weale, King's College London. Version = 20 Dec 2010.
# Please refer to Knight J, Barnes MR, Breen G, Weale ME, "Using Functional
# Annotation for the Empirical Determination of Bayes Factors for Genome-wide
# Association Study Analysis", PLoS ONE, submitted.
# Method is the Approximate Bayes Factor method of Wakefield J (2007) AJHG 81:681-690
# and Wakefield J (2009) Gen Epi 33:79-8
# file = string containing name of PLINK logistic output file
# test = string containing string used in "TEST" column to denote 1df SNP
# coefficient. If specified, dataframe will first be filtered to contain only rows of
# this type.
# ORcrit = prior upper limit of 100(1-epsilon)% range of OR values when SNP is causal
# (default = 1.5)
# epsilon = proportion of OR values lying above ORcrit or below 1/ORcrit when SNP is
# causal (default = 0.05)
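PLINK2wakefieldBF = function( file, test="", ORcrit=1.5, epsilon=0.05 ) {
  data = read.table( file=file, header=TRUE, as.is=TRUE, comment.char="" )
  # Filter on the TEST column only when a test label is given
  if (test != "") data = data[data$TEST==test,]
  # Prior variance of log(OR) implied by ORcrit and epsilon
  W = (log(ORcrit)/qnorm(1-epsilon/2))^2
  theta = log(data$OR)
  # Squared standard error of log(OR), recovered from the Wald statistic
  V = (theta/data$STAT)^2
  # Approximate Bayes factor in favour of association
  BF = sqrt(V/(V+W))*exp(W*theta^2/2/V/(V+W))
  write.table( cbind(data,BF), file=paste(file,".bf",sep=""), quote=FALSE,
               row.names=FALSE, col.names=TRUE )
}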
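To see what the function does for a single SNP, the arithmetic can be run by hand. This is a minimal sketch with a made-up odds ratio and Wald statistic. It uses the same formulas as the function above, plus an equivalent form written in terms of the squared z statistic as a cross-check.

```r
ORcrit  <- 1.5
epsilon <- 0.05
OR   <- 1.30  # made-up odds ratio
STAT <- 2.50  # made-up Wald statistic (log(OR) divided by its standard error)

# Prior variance of log(OR): a normal prior centred on 0 that puts
# 100*(1 - epsilon)% of its mass between log(1/ORcrit) and log(ORcrit)
W <- (log(ORcrit) / qnorm(1 - epsilon / 2))^2

theta <- log(OR)           # estimated log odds ratio
V     <- (theta / STAT)^2  # squared standard error of theta

# Bayes factor in favour of association, as computed in the function
BF <- sqrt(V / (V + W)) * exp(W * theta^2 / (2 * V * (V + W)))

# Same quantity written with z^2 = theta^2 / V
z2     <- STAT^2
BF_alt <- sqrt(V / (V + W)) * exp((z2 / 2) * W / (V + W))

print(c(W = W, V = V, BF = BF, BF_alt = BF_alt))
```

Trying a larger ORcrit widens the prior (larger W), which changes how strongly a given OR and STAT are rewarded.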
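# Read p-values and BFs into R and merge.
# stringsAsFactors=TRUE reads the SNP IDs as factors (the default before R 4.0),
# which the plot() calls below rely on.
BF <-read.table("Knight_results.assoc.logistic.bf", header=TRUE, stringsAsFactors=TRUE)
Annot<-read.table("annotSmall.txt", header=TRUE, stringsAsFactors=TRUE)
All<-merge(Annot, BF, by.x="rs", by.y= "SNP")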
# Plot Annot BFs - Great they work….
plot(All$rs,All$BFAnnot,ylim=c(0,8))
# Upweight BFs according to annot
All$BFcomb<-All$BFAnnot*All$BF
# Put BFs on log10 scale
All$logBFcomb<-log10(All$BFcomb)
All$logBFassoc<-log10(All$BF)
# Plot Upweighted BFs….not very radically though…
plot(All$rs,All$logBFassoc,ylim=c(1,13), ylab="log10 BF")
par(new=T)
plot(All$rs,All$logBFcomb,ylim=c(1,13), ylab="log10 BF", col="red")
# Plot change…but at least they are going in the right direction!
All$BFchange<- All$logBFcomb - All$logBFassoc
plot(All$rs, All$BFchange)
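Because the upweighting is a multiplication, the change on the log10 scale is simply log10 of the annotation Bayes factor. The following sketch shows this on three made-up SNPs. The column names BF and BFAnnot follow the script above, and the values are invented.

```r
# Made-up association and annotation Bayes factors for three SNPs
toy <- data.frame(rs      = c("rsToy1", "rsToy2", "rsToy3"),
                  BF      = c(50, 2000, 0.5),
                  BFAnnot = c(4, 1, 0.5))

# Upweight: combined BF is the product of the two
toy$BFcomb <- toy$BFAnnot * toy$BF

# Change on the log10 scale
toy$BFchange <- log10(toy$BFcomb) - log10(toy$BF)

# The change equals log10(BFAnnot): positive when BFAnnot > 1,
# zero when BFAnnot = 1, negative when BFAnnot < 1
toy$logBFAnnot <- log10(toy$BFAnnot)
print(toy)
```

So a SNP only moves up if its annotation Bayes factor is above 1, and the size of the move does not depend on how strong the association evidence already was.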