PLINK tutorial
1/20/2011 Erin Smith with John Kelsoe
Goals:
1. Run a GWAS on cleaned data for multiple phenotypes in PLINK.
2. Visualize results with Manhattan and Q-Q plots.
3. Look at LD structure of regions of interest with Haploview.
4. Make a regional plot of the P-values using SNAP plot.
Websites of interest:
PLINK: http://pngu.mgh.harvard.edu/~purcell/plink/
A Catalog of Published Genome-Wide Association Studies:
http://www.genome.gov/gwastudies/
UCSC Genome browser: http://genome.ucsc.edu/
dbSNP: http://www.ncbi.nlm.nih.gov/projects/SNP/
dbGaP: http://www.ncbi.nlm.nih.gov/sites/entrez?Db=gap
Haploview: http://www.broadinstitute.org/mpg/haploview
SNAP plot: http://www.broadinstitute.org/mpg/snap/doc.php
Setting the PATH environment variable
After placing PLINK in a convenient place, put the location in your environment
path to make it easier to call.
This change is temporary and applies only to the current command prompt or terminal window.
“PLINK_location” is the folder where PLINK is located.
Windows in a command prompt:
echo %PATH% (see what is now in the path)
path = C:\PLINK_location;%PATH%
Mac in a terminal window:
echo $PATH
export PATH=/PLINK_location:$PATH
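To confirm that the shell now finds PLINK, a quick check along the following lines may help on Mac or Linux. This is only a sketch: the temporary directory and the stand-in "plink" script are hypothetical placeholders created for the demonstration, not the real program.

```shell
# Create a throwaway directory holding a stand-in executable named "plink".
# (Hypothetical placeholder: in practice this is the folder where PLINK lives.)
demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho "stand-in plink"\n' > "$demo_dir/plink"
chmod +x "$demo_dir/plink"

# Prepend the directory to PATH for the current shell session only.
export PATH="$demo_dir:$PATH"

# command -v prints the full path of the program the shell would run.
found=$(command -v plink)
echo "plink resolves to: $found"
```

If the printed path is not the folder you expected, another copy of PLINK earlier in the path is being picked up instead.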
Introduction to data formats in PLINK:
PED & MAP
BED, BIM, & FAM
Additional phenotype files
Exercise: Look at example.ped, example.map, example.bim,
example.fam, and phenotypes.txt and figure out what they are
Exercise: Reading in files: use --bfile to read in the example and bipolar
BED/BIM/FAM filesets – how many individuals are in the datasets? How
many SNPs? What is genotyping rate?
On Windows, use plink.exe; on Mac, use plink.
plink --bfile example
plink --bfile bipolar
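PLINK reports these counts in its log, but they can also be cross-checked from the text files, since a .fam file has one line per individual and a .bim file has one line per SNP. The sketch below uses a tiny made-up fileset (demo.fam and demo.bim are hypothetical, not the tutorial data) and assumes a Mac or Linux shell.

```shell
# Work in a throwaway directory with a tiny, made-up fileset (hypothetical data).
cd "$(mktemp -d)"

# .fam columns: family ID, individual ID, father, mother, sex, phenotype
cat > demo.fam <<'EOF'
FAM1 IND1 0 0 1 2
FAM2 IND2 0 0 2 1
FAM3 IND3 0 0 1 2
EOF

# .bim columns: chromosome, SNP ID, genetic distance, bp position, allele 1, allele 2
cat > demo.bim <<'EOF'
13 snpA 0 1000 A G
13 snpB 0 2000 C T
EOF

# One line per individual / per SNP; tr strips the padding some wc versions add.
n_ind=$(wc -l < demo.fam | tr -d ' ')
n_snp=$(wc -l < demo.bim | tr -d ' ')
echo "individuals: $n_ind  SNPs: $n_snp"
```

The same two wc commands can be pointed at example.fam and example.bim to compare against what PLINK prints.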
Manipulating the data files
Get only the genotypes for a single chromosome or a region around a
SNP
--chr 13
Exercise: Get data from chromosome 13 and write it to a new BED fileset. If
you are having trouble running the full dataset, you can use the resulting
bipolar_chr13 fileset instead of bipolar in the later steps.
plink --bfile bipolar --chr 13 --make-bed --out bipolar_chr13
Performing association tests
--assoc
allelic association (chi-squared test of allele frequencies)
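To make the allelic test concrete, here is a worked example of a 1-degree-of-freedom Pearson chi-squared statistic on a 2x2 table of allele counts. The counts are invented for illustration, and this is a sketch of the general test rather than a reproduction of PLINK's exact output.

```shell
# Hypothetical allele counts (not from the tutorial data):
#            A1   A2
# cases      60   40
# controls   40   60
chisq=$(awk -v a=60 -v b=40 -v c=40 -v d=60 'BEGIN {
    n = a + b + c + d
    # Pearson chi-squared for a 2x2 table, 1 degree of freedom
    x = n * (a*d - b*c)^2 / ((a+b) * (c+d) * (a+c) * (b+d))
    printf "%.4f", x
}')
echo "chi-squared = $chisq"
```

A larger statistic means the allele frequencies differ more between cases and controls than chance alone would suggest.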
Other examples of association-related commands
--linear
linear regression for a quantitative phenotype
--logistic
logistic regression for a qualitative (case/control) phenotype
--pheno
pick a new phenotype file
--pheno-name
choose a column from the phenotype file
Run a GWAS on the irritable mania phenotype
Use the options --pheno and --pheno-name to select an alternate
phenotype. For later analyses, also add the --adjust and --qq-plot options.
plink --bfile bipolar --assoc --out bipolar_irritable --pheno phenotypes.txt --pheno-name irritable_elated --adjust --qq-plot
Interpret genome-wide output: Manhattan & Q-Q plots
Exercise: generate a Manhattan plot in Haploview
Load Haploview
Choose PLINK format and read in .assoc file
Note: these assoc files have integrated map information.
Plot chromosomes on the x-axis and -log10(p) on the y-axis.
Exercise: generate a Quantile-Quantile (Q-Q) plot in R
Use the results from --adjust and --qq-plot, which wrote the
expected null distribution of p-values to the file
bipolar_irritable.assoc.adjusted.
Start R
Change working directory to where the plink output is located
(setwd(dir) or Mac: Misc -> Change working directory or Windows:
File -> Change dir…)
Read in data
data <- read.table(file = "bipolar_irritable.assoc.adjusted",
header = TRUE)
look at first 10 lines of table:
data[1:10,]
plot the expected -log10 P-values on the x-axis and the observed -log10
P-values on the y-axis:
plot(-log(data$QQ, 10), -log(data$UNADJ, 10), xlab =
"expected -log10 P-values", ylab = "observed -log10 P-values")
add a line corresponding to y = x
abline(a = 0, b = 1)
Points that rise well above the line indicate more significant
associations than you would expect by chance.
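For intuition about where the expected values in a Q-Q plot come from, the sketch below builds them by hand for a handful of invented p-values. It assumes one common convention, pairing the i-th smallest of n p-values with (i - 0.5) / n; the formula PLINK uses for its QQ column may differ slightly.

```shell
# Hypothetical observed p-values, one per line (made up for illustration).
pvals="0.20
0.003
0.75
0.04
0.51"

# Sort ascending (-g also handles scientific notation such as 1e-05), then pair
# the i-th smallest of n values with the expected p-value (i - 0.5) / n and
# print both on the -log10 scale.
qq=$(printf '%s\n' "$pvals" | sort -g | awk '
    { p[NR] = $1 }
    END {
        for (i = 1; i <= NR; i++) {
            expected = (i - 0.5) / NR
            printf "%.3f %.3f\n", -log(expected)/log(10), -log(p[i])/log(10)
        }
    }')
echo "expected observed"
echo "$qq"
```

Each output row is one point on the plot: under the null hypothesis the two columns track each other, so the points fall along y = x.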
Interpreting regional associations
Exercise: Look at LD relationships near potential hits
Get the region +/- 250 kb around the peak SNP from PLINK and output it as a
PED file using the --snp and --window options
plink --bfile bipolar --snp rs17079247 --window 250 --recodeHV --out rs17079247_250kb
Load into Haploview using the linkage format
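To see which SNPs a +/- 250 kb window would capture before loading anything into Haploview, the selection can be approximated from a .bim file. The sketch below uses a made-up demo.bim with a hypothetical peak SNP called snpC; it illustrates the idea of a window around a SNP and is not PLINK's own implementation.

```shell
cd "$(mktemp -d)"

# Tiny made-up .bim file: chromosome, SNP ID, genetic distance, bp position, alleles.
cat > demo.bim <<'EOF'
13 snpA 0 600000 A G
13 snpB 0 800000 C T
13 snpC 0 1000000 A C
13 snpD 0 1240000 G T
13 snpE 0 1500000 A G
14 snpF 0 1000000 C T
EOF

target=snpC      # hypothetical peak SNP
window_kb=250    # half-width of the window in kilobases

# First pass over the file records the target's chromosome and position;
# second pass prints SNPs on that chromosome that fall inside the window.
in_window=$(awk -v snp="$target" -v kb="$window_kb" '
    NR == FNR { if ($2 == snp) { chr = $1; pos = $4 } ; next }
    $1 == chr && $4 >= pos - kb*1000 && $4 <= pos + kb*1000 { print $2 }
' demo.bim demo.bim)
echo "$in_window"
```

Here snpB, snpC, and snpD are kept; snpF sits at the same position but on another chromosome, so it is excluded.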
Exercise: Look at zoomed-in P-values for the region with LD values
(SNAP plot)
plink --bfile bipolar --chr 13 --from-kb 84500 --to-kb 84800 --pheno phenotypes.txt --pheno-name irritable_elated --assoc --out bipolar_irritable_rs17079247_zoom
Edit output file in a text editor: change the header P to PValue
Go to SNAP plot website, choose “Plots” from upper right menu and plot a
“Regional Association Plot”
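The header edit described above (P to PValue) can also be scripted rather than done by hand. The sketch below works on a made-up stand-in file, demo.assoc, with only a few columns; whether SNAP is sensitive to the whitespace change on the header line is not something this example checks.

```shell
cd "$(mktemp -d)"

# Made-up stand-in for a PLINK .assoc file (only a few columns shown).
cat > demo.assoc <<'EOF'
 CHR SNP BP P OR
 13 snpA 1000 0.5 1.0
 13 snpB 2000 0.01 1.4
EOF

# On the header line only, replace any field that is exactly "P".
# Reassigning a field makes awk rebuild that line with single-space separators;
# data lines are printed unchanged.
awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "P") $i = "PValue" } { print }' \
    demo.assoc > demo_snap.assoc

head -n 1 demo_snap.assoc
```

Writing to a new file keeps the original PLINK output intact in case it is needed again for Haploview.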
Get more info on genes in the region
UCSC Genome browser: http://genome.ucsc.edu/
Enter top SNP to find region and zoom out to find nearest genes