Tools for data analysis and management in complex traits genetic studies
Faculty:
- Yurii Aulchenko
- Maria Krestyaninova
- Ilkka Lappalainen
- Inga Prokopenko
- Samuli Ripatti
- Johan Rung
Tutors (specific aspects):
- Juris Viksna
- Maksim Struchalin
Technical setup (locally at FIMM):
Teemu Perheentuppa
Jani Heikkinen
Place: Barcelona
Dates: 8-10 February
Duration: 2.5 days, 5 sessions
Workshop hours: Morning session 9.00-13.00 (15 min break at 10.45), Lunch 13.00-14.00,
Afternoon session 14.00-18.00 (15 min break at 15.45)
Participant prerequisites:
Knowledge of analytical tools
Some programming skills
Knowledge of association analysis as a method for genetic studies, and of basic
statistics, study designs (case-control) and quantitative trait analysis.
Ideally, participants are involved in GWA studies and need to perform QC of GWA data,
association analysis, imputation and meta-analyses of GWAs.
Location prerequisites:
Classroom with 20-30 workplaces and plugs
Ideally WiFi, otherwise wired internet access (note: this would require about 25 network
cables)
Whiteboard, projector
Course materials:
Prepare a session-specific dataflow protocol, from the very beginning to the discussion of the
meta-analysis results
IT:
Set up a dedicated server, 30 user accounts, upload data examples, collect the scripts.
DAY 1: Computational methods for genome-wide association studies
Session 1: Introduction to the course
- Workflow and sessions content (Maria, Samuli – 20 min)
- GWAs in general (Yurii – 30 min)
- Exercise analysis to be performed during the workshop, a real-life example: “Variants in MTNR1B
locus and fasting glucose levels” (Nat Genet. 2009 Jan;41(1):77-81. PMID: 19060907) (Inga –
20 min)
- Quality Control of GWA data & PLINK (Inga – 30 min)
- GWAs from wet lab to interpretation of the results: workflow of data analysis from GWAs
QC to representation of the meta-analysis results and follow-up studies (20 min)
Break 15 min
2h Practical session
1. Where is the data for today. Ilkka Lappalainen (40-60 min)
- File formats and descriptions at EGA
- Access rights and accounts
2. Example: Quantitative traits (Fasting Glucose and Fasting Insulin). Samuli Ripatti / Inga
Prokopenko. Exercise
Data structure: 8 datasets of 1000 individuals each, with 2 Mb of raw data for chr11: 91-93 Mb, build 35 (from
NFBC66 and Rotterdam samples, totalling up to 8k individuals); phenotype files (Fasting
Glucose, gender, BMI, Fasting Insulin) formatted and split into 8 samples according to
genotypes
- Download test dataset from EGA
- Perform standard QC steps
- Save clean files, inclusion and exclusion lists
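The QC and "inclusion and exclusion lists" steps above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not the PLINK workflow used in the practical: genotypes are assumed to be coded as 0/1/2 copies of one allele with None for a missing call, only two per-SNP filters (call rate and minor allele frequency) are shown, and the thresholds, the function name snp_qc and the toy SNP ids are hypothetical.

```python
# Minimal per-SNP QC sketch (illustrative only; thresholds are assumptions).
# Genotypes are coded 0/1/2 (copies of the counted allele), None = missing call.

def snp_qc(genotypes, min_call_rate=0.95, min_maf=0.01):
    """Split SNPs into inclusion and exclusion lists.

    genotypes: dict mapping SNP id -> list of 0/1/2/None, one entry per individual.
    Returns (include, exclude) as lists of SNP ids, in input order.
    """
    include, exclude = [], []
    for snp, calls in genotypes.items():
        called = [g for g in calls if g is not None]
        call_rate = len(called) / len(calls) if calls else 0.0
        if called:
            freq = sum(called) / (2 * len(called))  # frequency of the counted allele
            maf = min(freq, 1 - freq)               # minor allele frequency
        else:
            maf = 0.0
        if call_rate >= min_call_rate and maf >= min_maf:
            include.append(snp)
        else:
            exclude.append(snp)
    return include, exclude


# Hypothetical toy data: three SNPs typed in ten individuals.
toy = {
    "snp_a": [0, 1, 2, 1, 0, 1, 2, 0, 1, 1],            # complete and polymorphic
    "snp_b": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],            # monomorphic
    "snp_c": [1, None, None, 2, 0, 1, None, 0, 1, 1],   # three missing calls
}
include, exclude = snp_qc(toy)
print("include:", include)
print("exclude:", exclude)
```

In a real analysis further filters (for example Hardy-Weinberg equilibrium and per-sample checks) would normally be applied as well; they are left out here to keep the sketch short.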
Lunch
Session 2: Samuli (1 hour): IMPUTATION overview
- Statistical methods for imputation
- Comparison of the reference datasets (HapMap 3, Finnish data, 1000 Genomes)
- Imputation software (MACH, IMPUTE)
- Data format transformation from ped/map files to IMPUTE and MACH appropriate formats
(GTOOL, PLINK)
Break 15 min
2h Practical session. Yurii, Maksim, Inga, Ilkka
1. Imputation. Yurii, Maksim, Inga
The room is divided into MACH and IMPUTE users (either randomly or by participants' own choice,
as long as roughly half of the room follows each approach).
Yurii/Maksim run half of the room using MACH, Inga runs the other half using IMPUTE (60-100
min)
- Transform files into appropriate format with GTOOL for IMPUTE / scripts for MACH
- Run MACH / IMPUTE on training sets
- Check output
2. Upload of imputed data to EGA. Ilkka (20 min)
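For the "Check output" step of the imputation practical, it can help to see how imputed genotype probabilities relate to the "genotype dosage" and "best guess" codings that come up later in the course. The Python sketch below assumes a line layout of five leading columns (SNP id, rs id, position, allele A, allele B) followed by three probabilities per individual, P(AA), P(AB), P(BB); this layout should be checked against the documentation of the imputation software actually used. The function names, the 0.9 calling threshold and the example line are hypothetical.

```python
# Sketch: dosage and best-guess genotypes from genotype-probability triples.
# Assumed layout per line: snp_id rs_id position alleleA alleleB, then
# P(AA) P(AB) P(BB) for each individual. Names and threshold are illustrative.

def parse_gen_line(line):
    """Return ((snp_id, rs_id, position, allele_a, allele_b), probabilities)."""
    fields = line.split()
    snp_id, rs_id, position, allele_a, allele_b = fields[:5]
    probs = [float(x) for x in fields[5:]]
    if len(probs) % 3 != 0:
        raise ValueError("expected three probabilities per individual")
    return (snp_id, rs_id, int(position), allele_a, allele_b), probs


def dosage_and_best_guess(probs, call_threshold=0.9):
    """Compute expected allele-B counts and thresholded genotype calls.

    Dosage is P(AB) + 2 * P(BB). The best guess is the most probable genotype
    (0, 1 or 2 copies of allele B), or None when its probability is below
    call_threshold.
    """
    dosages, best = [], []
    for i in range(0, len(probs), 3):
        triple = tuple(probs[i:i + 3])
        dosages.append(triple[1] + 2 * triple[2])
        top = max(triple)
        best.append(triple.index(top) if top >= call_threshold else None)
    return dosages, best


# Hypothetical line with three individuals.
example = "snp_1 rs_example 91500000 A G 1 0 0 0.05 0.9 0.05 0.2 0.5 0.3"
info, probs = parse_gen_line(example)
dosages, best = dosage_and_best_guess(probs)
print(info)
print("dosages:", dosages)
print("best guess:", best)
```

The third individual shows why the two codings can differ: the dosage keeps the uncertainty, while the thresholded call is simply set to missing.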
DAY 2. Imputation and Meta-analysis
Session 3: Association analysis of imputed data
1. Overview of the session: programmes, genetic models, GWA data and genomic control,
phenotypes (binary and quantitative) and adjustments. Yurii/Maksim (40 min)
Same Example: association analysis of quantitative traits (Fasting Glucose and Fasting Insulin)
2. Summary on efficient data structuring and exchange within diverse research partnerships
(SIMBioMS) Maria (1h)
- Solutions for cost-efficient data organisation
- Standard operating procedures and formats
- Public resources vs project-specific application: areas of application
- Keeping data management light-weight and effective: focus on analysis protocols
15 min break
2 h Practical session.
1. Association analysis of imputed data. Inga/Samuli/Ida/Maksim (100-120 min)
- files available after imputation
- phenotypes (input formats), trait transformations, adjustments
- commands and scripts to run association analysis programmes
- files needed (for SNPTEST/ProbABEL)
- Run programmes, check output
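As a rough illustration of what "association analysis with adjustments" means for a quantitative trait, the Python sketch below fits an ordinary least-squares regression of the trait on the imputed allele dosage plus covariates and reports the dosage effect and its standard error. It is a teaching sketch only and does not reproduce how SNPTEST or ProbABEL work internally; the function name ols, the choice of BMI as the only covariate and all data values are hypothetical.

```python
# Sketch: linear regression of a quantitative trait on allele dosage,
# adjusted for covariates (ordinary least squares via the normal equations).
import math


def ols(y, columns):
    """Fit y on an intercept plus the given predictor columns.

    Returns (coefficients, standard_errors), intercept first.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(X[r][a] * X[r][b] for r in range(n)) for b in range(k)]
           for a in range(k)]
    xty = [sum(X[r][a] * y[r] for r in range(n)) for a in range(k)]
    # Invert X'X by Gauss-Jordan elimination on the augmented matrix [X'X | I].
    aug = [row + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(xtx)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("predictors are collinear")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        scale = aug[c][c]
        aug[c] = [v / scale for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    inv = [row[k:] for row in aug]
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [y[r] - sum(X[r][a] * beta[a] for a in range(k)) for r in range(n)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance
    se = [math.sqrt(sigma2 * inv[a][a]) for a in range(k)]
    return beta, se


# Hypothetical data for eight individuals.
dosage = [0.0, 0.1, 1.0, 1.9, 2.0, 0.9, 1.1, 0.0]
bmi = [22.0, 27.5, 24.1, 30.2, 21.3, 26.0, 23.4, 28.8]
glucose = [5.0, 5.3, 5.2, 5.9, 5.4, 5.3, 5.25, 5.35]

beta, se = ols(glucose, [dosage, bmi])
print("dosage effect:", beta[1], "SE:", se[1])
print("z statistic:", beta[1] / se[1])
```

The per-SNP effect estimate and standard error are also the quantities that a later meta-analysis step would typically need from each study.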
2. Organising the data during analysis stage. Maria (40-50 min)
- Storing your data in a local or web-based, project-specific database
- Handling raw, processed and sample data files
- Describing the files within AIMS/SIMS
- Concept of efficient exchange and minimum information about the data
- Upload to AIMS/SIMS
  o ftp upload (types, size of files, rights)
  o html upload (type of files, rights, structuring)
Session 4: IT and tools for meta-analysis
Meta-analysis. Inga (1h)
1. Meta-analysis methods for GWAs
2. AIMS structure with files/results pages for various analyses, a brief rehearsal from the
session before (requires that results from previous sessions were uploaded on AIMS/SIMS)
3. Information and file formats needed for meta-analysis
4. Programmes to use
15 min break
2h Practical session (Yurii/Maksim/Samuli/Ida)
1. What one needs to know about data organisation in order to set up a meta study across
several institutions. Juris (40-50 min)
- Differentiated access rights
- Capturing study context
2. Running meta-analysis (Yurii/Maksim/Samuli/Ida) 70-80 min
- Run meta-analyses with GWAMA, metaMapper (Inga) and METAL, MetABEL (Yurii), format
output
- Store the results in AIMS for meta-analysis results sharing
- Summary of the results (reporting and plots)
- Secondary signals: conditional analyses within associated regions (genotype dosage vs
best guess)
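To make the meta-analysis step more concrete, the Python sketch below combines per-study effect estimates for one SNP using fixed-effects inverse-variance weighting, one common approach for GWA meta-analysis. It is a sketch under stated assumptions, not a description of how GWAMA, metaMapper, METAL or MetABEL are implemented: the function name and the per-study numbers are hypothetical, and all studies are assumed to report effects for the same allele on the same trait scale.

```python
# Sketch: fixed-effects inverse-variance meta-analysis for a single SNP.
import math


def fixed_effects_meta(betas, ses):
    """Combine per-study effect estimates and standard errors.

    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    weights = [1.0 / (se * se) for se in ses]   # weight = 1 / variance
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))      # two-sided normal p-value
    return beta, se, z, p


# Hypothetical per-study results for one SNP (effects aligned to one allele).
study_betas = [0.07, 0.05, 0.09, 0.06]
study_ses = [0.02, 0.03, 0.04, 0.025]

beta, se, z, p = fixed_effects_meta(study_betas, study_ses)
print("pooled beta:", beta)
print("pooled SE:", se)
print("z:", z, "p:", p)
```

Before pooling, the input files from each study need harmonising (same effect allele, strand and units); that bookkeeping is usually where most of the practical effort goes.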
DAY 3
Session 5. Current trends in genome-wide analysis:
1. Using other datatypes (OMICS): from polymorphisms to genes and functions. Johan (1h30 with
short break):
- inferring SNP function
- expression data, eQTLs
- gene networks and pleiotropy
- disease examples, Mendelian vs polygenic
- GWAS findings vs “what would have been expected”
15 min break
Maria: More complex phenotypes, SAIL (Samuli/Maria)
Ilkka: Archiving data and results