Databasing, neuroimaging and genetics
Jean-Baptiste Poline
Thanks: A. Barbot, B. Thyreau, Y. Schwartz, A. Moreno, B. Thirion, V. Frouin, E. Duchesnay, P. Pinel and many others
JB Poline 06/11/09

Outline
• Motivation
• Databasing and neuroimaging: a quick review and taxonomy
• Genetic databases: a very brief word
• Neuroimaging and genetics: new needs
• Two imaging genetics examples:
  – Saguenay study
  – Imagen study
• Conclusion and perspective

Motivation
• Imaging genetics studies can be divided into:
  – small groups selected for a specific polymorphism
  – large studies involving hundreds or thousands of subjects, for sensitivity or exploratory approaches (cf. GWAS)
• Sharing:
  – several partners are involved, or the NIH requires it
• Data protection
• Updating / data versioning:
  – increasing the number of subjects, or the information per subject
• Queries made simpler, quicker, or possible at all, through the web or on disk
• Cost: databasing reduces cost
  – acquisition / maintenance

You cannot handle large heterogeneous data without serious tools
• Excel files will kill you

Databasing and neuroimaging: a quick review and taxonomy
• The bibliographic database
  – BrainMap
• Networks / databases
  – BIRN, ADNI, Brainscape, fMRIDC…
• Knowledge-based
  – Ontology projects / Xceed
• Processing…
  – LONI pipelines, NAMIC, BrainVISA, Neurogrid, Fistwidget, …
• Each large project has its own DB
  – Imagen, Saguenay, many others…
See also…

BrainMap (http://brainmap.org/)
Three BrainMap applications:
1. Database searches and Talairach coordinate plotting (Sleuth)
2. Meta-analyses via the activation likelihood estimation (ALE) method (GingerALE)
3. Entry of published functional neuroimaging papers with coordinate results (Scribe)
Not a resource for raw data, but may contain contrast maps.

The BIRN

BIRN Data Repository (BDR)
• Sharing data through the BDR to capture, curate, store, query, view, and download imaging and related data
• Enables the sharing of existing, published data
• BDR as a mechanism to facilitate collaborations
• Appropriate timeline for public release; versioning
• A rich curatorial environment, built on the BIRN portal foundation, supporting data submission and subsequent sharing
• XNAT remains a possibility

NAMIC
• Creating a medical image computing platform
• Research on novel image analysis algorithms
• Deploying these capabilities

ADNI
• http://www.adni-info.org
• Access on request
• ADNI methods available for non-ADNI studies …
  – imaging protocol
  – image corrections
  – ADNI phantom and analysis software

ADNI uses the LONI Image Data Archive

LONI
• The LONI Image Data Archive: an environment for safely archiving, querying, visualizing and sharing data
• The archive facilitates:
  – de-identification and pooling of data from multiple institutions
  – protection from unauthorized access
  – the ability to share data among collaborating investigators

Extensible Neuroimaging Archive Toolkit (XNAT)

Neurogrid (http://www.neurogrid.ac.uk/)
• A Grid-based network of neuroimaging centres and a neuroimaging tool-kit
• Sharing data and expertise to facilitate the archiving, curation, retrieval and analysis of imaging data
• Enables large-scale, multi-site clinical studies
• Practicalities:
  – set up a secure account
  – upload your brain images (T1, DTI)
  – download results

Outstanding questions
• Databases are still tied to large projects, but local organisation is also needed
  – How can local needs be reconciled with a real DB?
• Most of the tools from large projects require IT support (a system manager with knowledge of neuroimaging), often even when they claim otherwise…
• Results are too rarely entered into the DB after analysis: ontology issues
• Publications of large projects: are these projects the most efficient with respect to the current success criteria?
  – BIRN: about 80 publications in 5 years
  – ADNI: about 15? (PubMed)

Some thoughts on neuroimaging and databasing
• Sharing data is not yet common but should become so
  – NIH trend, cost, recruitment of specific populations
• Remote computing is getting more common (cloud computing), but the tools are still too difficult for the average lab
• Reproducibility and provenance tracking of results may eventually require databasing solutions
• Databasing could be a cost-effective solution…

NCBI (National Center for Biotechnology Information) Genome Resource guides
http://www.ncbi.nlm.nih.gov/genome/guide/
• Gene: a new database of genes and associated information, available for searching in Entrez
• RefSeq: reference sequences of chromosomes, genomic contigs, mRNAs, and proteins for human and major model organisms
• OMIM: a guide to human genes and inherited disorders, maintained by Johns Hopkins University and collaborators
• dbSNP: a database of single nucleotide polymorphisms (SNPs) and other nucleotide variations

… resources
See also the …
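The NCBI resources listed above can also be queried from scripts rather than through the web pages. The following is a minimal sketch that assumes NCBI's public E-utilities service. The slides do not say how dbSNP was accessed in these projects. The code builds an `esummary` request for a reference SNP id such as rs2396753. The helper names `rs_to_uid`, `esummary_url` and `fetch_snp_summary` are hypothetical.

```python
import json
import re
import urllib.parse
import urllib.request

# Public E-utilities endpoint (assumption: the standard NCBI service is used).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def rs_to_uid(rs_id: str) -> str:
    """Hypothetical helper: turn a dbSNP id like 'rs2396753' into the numeric
    UID that the Entrez 'snp' database expects ('2396753')."""
    match = re.fullmatch(r"rs(\d+)", rs_id.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a dbSNP rs identifier: {rs_id!r}")
    return match.group(1)


def esummary_url(rs_id: str) -> str:
    """Build an esummary request URL for one SNP, asking for JSON output."""
    params = {"db": "snp", "id": rs_to_uid(rs_id), "retmode": "json"}
    return EUTILS_BASE + "esummary.fcgi?" + urllib.parse.urlencode(params)


def fetch_snp_summary(rs_id: str, timeout: float = 30.0) -> dict:
    """Download and decode the JSON summary (this call needs network access)."""
    with urllib.request.urlopen(esummary_url(rs_id), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    print(esummary_url("rs2396753"))
```

Only `fetch_snp_summary` touches the network. NCBI asks clients without an API key to stay at or below three requests per second, so batch lookups of many candidate SNPs should be throttled. The layout of the returned JSON is defined by NCBI, so inspect it before relying on particular fields.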
dbSNP
• SNP rs2396753: variations can be used for gene mapping, definition of population structure, and performance of functional studies (source: dbSNP)

Mapview

HapMap / Haploview

Summarizing the needs
• Data protection / backup / archiving
• Data (pseudo-)anonymisation
  – de-identification
  – the story of the "pseudocode 2" and how it can be broken
• Data entry and download
  – login/password-based user access
  – user-specific views of the data
• Data versioning
• Quality check
  – data curation
• Querying the data (genes / images / behaviour): interface + scripting
  – at which level: voxel (x, y, z)? whole image? run?
• Sharing the results (results re-entered)
• Visualization

Example 1: Saguenay Youth Study
Funded by CIHR (PIs: T. Paus and Z. Pausova)
A genetic study of long-term effects of prenatal exposure to maternal cigarette smoking
On:
  – brain structure
  – brain function
  – cardiovascular function
  – body fat / metabolism
In:
  – human subjects (500 sib-pairs)
  – recombinant inbred strains of rats

Saguenay-Lac-Saint-Jean region

Saguenay Youth Study
• 500 sib-pairs (+ parental DNA)
• Age: 12-18 years
• French-Canadian origin
• 250 exposed, 250 non-exposed, matched by:
  – maternal education
  – school attended
• Genome-wide scan with sib-pair linkage analysis
• Fine mapping with family-based association analyses
Pausova et al.
Human Brain Mapping 28:502-518, 2007

Saguenay Youth Study: data collection
III Telephone interview (30 min)
• Life habits of the mother during pregnancy and now
• Medical history of children, mother and father
IV Home visit (2 h)
• School performance, activities at school, feelings at school, life at home (ECOBES, students)
• Your children and school, your education, your family life (ECOBES, parents)
• Screen for psychiatric disorders (DISC Predictive Scale for adolescents)
• Puberty development; risky behaviours (cigarettes, drugs, alcohol); hyperactivity, conduct disorder, aggression, anxiety, and depression; delinquency (GRIP, adolescents)
• Cigarette, drug, and alcohol abuse; anxiety, depression, and anti-social behaviour (GRIP, parents)
• Drawing a blood sample (parents)
V Laboratory (6 h): neuropsychological assessment
• IQ assessment (WISC-III)
• Academic achievement (Woodcock-Johnson)
• Memory (Children’s Memory Scale)
• Motor skills (pegboard, tapping, bimanual coordination)
• Executive functions (interference, word fluency, working memory)
• Emotion/motivation (faces, voices, gambling, RFT)
• Language (FM threshold, phonological awareness, DAF, phonetic learning)
VI Hospital session (4 h)
• Body composition: anthropometry, bioimpedance, MRI (fat)
• Blood pressure, cardiovascular reactivity, and salivary cortisol (Finometer: beat-to-beat, respiration)
  – at rest
  – in response to postural change
  – in response to mental stress
• MRI scan: brain; abdomen (fat and kidneys)
• Diet and physical activity: twenty-four-hour food recall, food frequency questionnaire, physical activity questionnaire
VII School session (1 h)
• Fasting blood sample: glucose and lipid metabolism; low-grade inflammation, endothelial and fibrinolytic dysfunctions, HPA activity
• Sexual maturation
• Smoking habits
• Nutrition
VIII Genotyping: candidate genes and total genome scan

Structural magnetic resonance imaging
• T1-weighted: 15 min
• T2-weighted / proton density: 15 min
• Magnetization transfer ratio: 15 min
MR pipeline:
Quality control

One result
[Figure: white matter volume; magnetization transfer ratio]
• Testosterone influenced WM volume to a greater extent in males with the more “efficient” androgen receptor (AR) (short AR gene) than in those with a less efficient AR (long AR gene)

Lessons from the Saguenay study
• Home-made database (PHP, 1py)
• Contains all variables (phone interview, etc.) but not the imaging data
• No mechanism to share data
• Home-designed web pages for specific datasets (~versioning)
• Semi-automatic analysis pipeline, results re-entered in the DB
• The use of a specific population
• Very large amount of behavioural or biological data
• No tool that is easy to re-use

Example 2: Imagen project and database: a brief review
• Genetically influenced individual differences in brain responses to reward, punishment and emotional cues in adolescents mediate risk for mental disorders
• Neuroimaging: measure specific brain functions implicated in the etiology of mental disorders and link them to genetic and behavioural variations
• The goal of the present study is to identify the neurobiological and genetic basis of these traits and to assess their relevance for mental disorders
Means: a multicentre functional and structural genetic-neuroimaging study of a cohort of 2000+ 14-year-old adolescents. Intermediate phenotypes of risk for adolescent mental illness will be explored.

European partners
1. Berlin: A. Heinz
2. Cambridge: replaced by Dresden, M. Smolka
3. Dublin: H. Garavan
4. Hamburg: C. Buechel
5. London: G. Schumann, L. Reed
6. Mannheim: H. Flor
7. Nottingham: T. Paus
8. Orsay: JL Martinot
Also: T. Robbins (Cambridge), P. Conrod (IOP), …

IMAGEN work packages (timeline)
• WP 01: Behavioural analysis of animal models: preparation (Year 1), implementation Years 2-3 (3 years)
• WP 02: Behavioural tasks in humans: preparation (Year 1), implementation Years 2-4 (4 years)
• WP 03: Gene identification: Month 19 to Year 4 (2.5 years)
• WP 04: Recruitment and characterisation: Years 2-4 (3 years); preparation (6 months)
• WP 05: Neuroimaging standardisation: Year 1 (1 year)
• WP 06: Neuroimaging: Years 2-4 (3 years)
• WP 07: Bioinformatics and biostatistics: Years 1-5 (5 years)
• WP 08: DNA bank, SNP detection and genotyping: Years 2-5 (4 years)
• WP 09: Ethics: Years 1-5 (5 years)
• WP 10: Training and dissemination: Years 1-5 (5 years)
• WP 11: Project management: Years 1-5 (5 years)

Step 1: on-site collection and transfer (Scito, NNL)

Step 2: data anonymisation and package handling

Step 3: including data

Work-Package 07: Central Database
XNAT: a database tool [Marcus et al., 2007] (also used in BIRN)
• XML schemas define the database structure (easy database modification)
• Auto-generated tools:
  – web portal
  – command line

Data included; XML schemas used for the DB ontology

Web-based quality check

(Pre-)processing
• T1
  – SPM8 New Segment
  – BrainVISA pipeline
  – DARTEL / FreeSurfer have been tried out
• T2*
  – SPM8 preprocessing of all available EPI data
  – strategy: movement correction; reslicing; fMRI -> MPRAGE long, MPRAGE long -> MNI template, for each session
  – homogenizing the log files to recover the fMRI protocols (dealing with varying numbers of runs, …)
  – intra-subject model fitting (SPM)
  – inter-subject: in-house (mixed effects + permutation)
• DTI
  – FSL
  – in-house

Queries
• Give me T1 images normalized to MNI space for subjects with score X above 5
• Give me behavioural scores of instruments X and Y for subjects with T2* image quality above Z
• Give me the genotypes of subjects with both behavioural score X and good-quality DTI images
• Download results
• API for scripts

Automatic quality check

Neuroimaging scores for QC
• Intra-volume variance variation
• T1: mask and template overlap
• fMRI: estimated movement

A few words on data analysis

Neuroimaging and WGA
[Figure: SNPs (genotypes G/T); imaging (aMRI, dMRI, fMRI); clinical / behaviour]
• Find statistical links, or predict

Finding good analysis strategies
• Data per subject: SNPs ~1M (+ CNV); images 200k-50k; transcriptome 50k; behaviour <200
• Issues: data dimension reduction; multiple comparison problem; inhomogeneous data

Candidate SNPs vs. all image
• For each voxel: f(GG, CG, CC, CG, GG) = voxel, giving a statistical map
• Methods: VBM, group fMRI, etc.
• Complexity / multiple comparison issue: ~10^6 tests or estimated parameters

One image region vs. all SNPs
• Plink?
• For one voxel: f( ) = SNP
• Method known as WGAS
• Multiple comparison: ~10^6 tests

Feature selection approach
• Selection (genes), selection (images), then gene-image analysis on the reduced data
• Or multivariate approaches
• Consider LD / spatial covariance / covariance between behavioural tests

Sentence-reading circuit (sentences vs. checkerboards) and sentence-listening circuit
• Correlation between lateralisation and pseudoword reading speed
• Score = pseudoword reading speed; whole-brain analysis, p = 0.01, 40 voxels
• Outliers lying at least two standard deviations from the mean excluded

Lateralisation difference, sentence-reading circuit (sentences vs. checkerboards)
• Gene KIAA / SNP rs155089 (6>8)
  – Type C/C, C/T: subjects (n) 2, 21; age/men 44, 36; education (y) 3.1; dyslexic (%) 6; subtraction (%) 80; pseudowords (ms/word) -, 855
  – Type T/T: subjects (n) 34; age 25; men 24; education (y) 3.4; dyslexic (%) 10; subtraction (%) 73; pseudowords (ms/word) 924
• Gene KIAA / SNP rs7761100 (7>6)
  – Type: G/G, G/T, T/T
  – subjects (n): 32, 28, 7
  – age: 23, 23
  – men: 39, 40, 50
  – education (y): 3.4, 3.1, 3.5
  – dyslexic (%): 10, 7, 0
  – subtraction (%): 70, 79, 81
  – pseudowords (ms/word): 930, 895, 850
[Figure: % dyslexic (22, 57); pseudoword reading speed; subtraction score at peak]

Conclusion
• A lot remains to be done: combining two complex and powerful types of data for
  – better understanding of brain mechanisms
  – better understanding of the impact of genetic variations
  – better risk-factor prediction…
• Visualisation and interaction: see Abstract
• Analysis strategies must be multiple to face the huge multiple comparison problem

L. Shen, S. Kim, J. D. West, A. J. Saykin
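The "candidate SNPs vs. all image" strategy runs one test per voxel (~10^6 tests), and the slides leave the correction strategy open. The following is a minimal sketch under several assumptions: additive genotype coding (0, 1, 2 copies of an allele), a Pearson correlation per voxel, and simulated data. It uses the max-statistic permutation method, a standard swapped-in technique that controls the family-wise error rate. It is not the in-house mixed-effects pipeline mentioned for IMAGEN, and the function names are hypothetical.

```python
import numpy as np


def voxelwise_correlation(genotype: np.ndarray, images: np.ndarray) -> np.ndarray:
    """Pearson correlation between one genotype vector (n_subjects,) and
    every voxel column of `images` (n_subjects, n_voxels)."""
    g = genotype - genotype.mean()
    y = images - images.mean(axis=0)
    numerator = g @ y
    denominator = np.sqrt((g @ g) * (y * y).sum(axis=0))
    return numerator / denominator


def max_stat_permutation(genotype, images, n_perm=1000, seed=0):
    """FWE-corrected p-values for every voxel (max-statistic permutation).

    Shuffling genotypes across subjects breaks any gene-image link; the
    largest |r| over all voxels in each shuffle gives the null distribution
    of the maximum statistic, which handles the multiple-comparison burden."""
    rng = np.random.default_rng(seed)
    observed = np.abs(voxelwise_correlation(genotype, images))
    max_null = np.empty(n_perm)
    for k in range(n_perm):
        shuffled = rng.permutation(genotype)
        max_null[k] = np.abs(voxelwise_correlation(shuffled, images)).max()
    # Number of permutation maxima >= each observed |r|.
    sorted_null = np.sort(max_null)
    n_exceed = n_perm - np.searchsorted(sorted_null, observed, side="left")
    p_fwe = (n_exceed + 1) / (n_perm + 1)
    return observed, p_fwe


if __name__ == "__main__":
    # Hypothetical simulated study: 200 subjects, 500 voxels, one candidate SNP.
    rng = np.random.default_rng(1)
    n_subjects, n_voxels = 200, 500
    genotype = rng.binomial(2, 0.3, size=n_subjects).astype(float)
    images = rng.standard_normal((n_subjects, n_voxels))
    # Plant an additive genetic effect in the first 10 voxels only.
    images[:, :10] += 1.0 * (genotype - genotype.mean())[:, None]
    r, p = max_stat_permutation(genotype, images, n_perm=500, seed=2)
    print("voxels with p_FWE <= 0.05:", np.flatnonzero(p <= 0.05))
```

Because all voxels share one null distribution of the maximum, the correction adapts to spatial correlation between voxels. A Bonferroni threshold ignores that correlation and becomes very conservative at ~10^6 tests. The same loop applies to the "one image region vs. all SNPs" case with the roles of genotypes and voxels swapped.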