Databasing, neuroimaging and genetics Jean-Baptiste Poline

Thanks: A. Barbot, B. Thyreau, Y. Schwartz , A. Moreno,
B. Thirion, V. Frouin, E. Duchesnay, P Pinel and many
others
JB Poline
06/11/09
1
Outline
• Motivation
• Databasing and neuroimaging: a quick
review and taxonomy
• Genetic databases: a very brief word
• Neuroimaging and genetics: new needs
• Two imaging genetics examples:
– Saguenay study
– Imagen study
• Conclusion and perspective
Motivation
• Imaging genetic studies can be divided into
– Small groups selected for a specific polymorphism
– Large studies involving hundreds / thousands of subjects for
sensitivity / exploratory approaches, (cf GWAS)
• Sharing:
– several partners are involved, or the NIH requires it
• Data protection
• Updating / data versioning
– Increasing the number of subjects, or information
• Queries made simpler / quicker / possible through
the web or on disk
• Cost: Databasing reduces cost
– Acquisition / maintenance
You cannot handle large heterogeneous
data without serious tools
• Excel files will kill you
Databasing and neuroimaging: a quick
review and taxonomy
• The bibliography database
– Brainmap
• Networks / databases
– BIRN, ADNI, Brainscape, fMRIDC…
• Knowledge based
– Ontology projects / Xceed
• Processings…
– LONI Pipeline, NAMIC, BrainVISA, Neurogrid, Fiswidgets, …
• Each large project has its DB
– Imagen, Saguenay, many others …
See also…
BrainMap http://brainmap.org/
Three BrainMap applications :
1. Database searches and
Talairach coordinate plotting
(Sleuth)
2. Meta-analyses via the
activation likelihood estimation
(ALE) method; (GingerALE)
3. Entry of published functional
neuroimaging papers with
coordinate results (Scribe)
Not a resource for raw data, but may contain contrast maps
The BIRN
BIRN Data Repository
• Sharing data through the BDR to capture, curate,
store, query, view, and download imaging and
related data.
• Enable the sharing of existing, published data,
• BDR as a mechanism to facilitate collaborations
• Appropriate timeline for public release
– versioning
• A rich curatorial environment, built on the BIRN
portal foundation, data submission process and
subsequent sharing.
• XNAT remains a possibility
NAMIC
•creating a medical image computing platform
•research on novel image analysis algorithms
•deploying these capabilities
• http://www.adni-info.org
• Access through request
• ADNI methods available for non-ADNI studies:
– imaging protocol,
– image corrections,
– ADNI phantom and analysis software.
…
ADNI uses LONI Image Data Archive
LONI
The LONI Image Data Archive: an environment for
• safely archiving,
• querying,
• visualizing,
• sharing.
The archive facilitates
• de-identification and pooling of
data from multiple institutions
• protection from unauthorized
access
• the ability to share data among
collaborating investigators
Extensible Neuroimaging Archive
Toolkit
Neurogrid http://www.neurogrid.ac.uk/
• A Grid-based network of neuroimaging centres and a
neuroimaging tool-kit. Sharing data and expertise to facilitate
the archiving, curation, retrieval and analysis of imaging data
• Enable large-scale, multi-site clinical studies
• Practicalities:
– Set up a secured account
– Upload your brain image (T1, DTI)
– Download results
Outstanding questions
• Databases are still mostly about large projects, but local
organisation is needed
– How to reconcile local needs with a real DB?
• Most of the tools from large projects require IT
support (system manager + knowledge of
neuroimaging), often even if they pretend otherwise…
• Results are too rarely input in DB after analyses:
ontology issues
• Large projects publications: are those the most
efficient with respect to the current success criteria?
– BIRN: about 80 publications in 5 years
– ADNI: about 15? (pubmed)
Some thoughts on neuroimaging and
databasing
• Sharing data is not yet common but should
be in the future
– NIH trend, cost, specific population recruitment
• Remote computing is getting more common
(cloud computing) but tools are still too
difficult for the average lab
• Reproducibility / provenance tracking of
results may eventually impose a databasing
solution
• Could be a cost effective solution…
• Gene Database
• A new database of genes and associated information is
available for searching in Entrez.
• RefSeq
Reference sequences of chromosomes, genomic contigs,
mRNAs, and proteins for human and major model organisms.
• OMIM
A guide to human genes and inherited disorders maintained by
Johns Hopkins University and collaborators.
• dbSNP
A database of single nucleotide polymorphisms (SNPs) and
other nucleotide variations.
NCBI (National Center for Biotechnology Information) Genome Resource
guides http://www.ncbi.nlm.nih.gov/genome/guide/
… resources
See also the ….
dbSNP:
• SNP rs2396753: Variations can be used for gene
mapping, definition of population structure, and
performance of functional studies.
– dbSNP
Mapview
Hapmap / Haploview
Summarizing the needs
• Data protection / Backup / Archiving
• Data (pseudo) anonymisation – deidentification
– The story of the pseudocode 2 and how it can be broken
• Data entering and download
– User login/password based access
– User specific view of the data
• Data versioning
• Quality check – Data curation
• Querying the data (Gene/Img/Behav): Interface +
scripting
– Different levels: x,y,z? Whole image / run?
• Sharing the results; (results re-entered)
• Visualization
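The pseudo-anonymisation need listed above can be illustrated with a minimal sketch. It assumes subject identifiers are short, guessable strings (the `SUBJ-0001` format is hypothetical) and contrasts an unkeyed hash, which can be broken by enumerating candidate identifiers, with a keyed hash (HMAC). This is not the scheme used by either study described later; all function names are illustrative.

```python
import hashlib
import hmac
from typing import Iterable, Optional


def naive_code(subject_id: str, length: int = 12) -> str:
    """Unkeyed hash of the identifier: shown only because it is weak."""
    return hashlib.sha256(subject_id.encode("utf-8")).hexdigest()[:length]


def pseudonymise(subject_id: str, secret_key: bytes, length: int = 12) -> str:
    """Keyed hash (HMAC-SHA256): stable per subject, useless without the key."""
    digest = hmac.new(secret_key, subject_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


def break_naive(code: str, candidate_ids: Iterable[str]) -> Optional[str]:
    """Recover the identifier behind an unkeyed code by trying every candidate."""
    for candidate in candidate_ids:
        if naive_code(candidate) == code:
            return candidate
    return None


if __name__ == "__main__":
    ids = [f"SUBJ-{i:04d}" for i in range(50)]  # hypothetical identifier format
    leaked = naive_code("SUBJ-0007")
    print(break_naive(leaked, ids))             # the unkeyed code is reversed
    protected = pseudonymise("SUBJ-0007", b"site-secret")
    print(break_naive(protected, ids))          # the keyed code is not
```

The key then has to be stored separately from the database, which is an organisational choice rather than a coding one.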
Example 1: Saguenay Youth Study
Funded by CIHR (PIs: T. Paus and Z. Pausova)
A genetic study of long-term effects of prenatal
exposure to maternal cigarette smoking:
On: * Brain Structure
* Brain Function
* Cardiovascular Function
* Body Fat/Metabolism
In: * Human Subjects (500 sibpairs)
* Recombinant Inbred Strains of Rats
Saguenay-Lac-Saint-Jean region
Saguenay Youth Study
•500 sib-pairs (+parental DNA)
•Age: 12-18 years
•French-Canadian origin
250 exposed
250 non-exposed
Matched by:
• Maternal education
• School attended
•Genome-wide scan with sib-pair linkage analysis
•Fine mapping with family-based association analyses
Pausova et al. Human Brain Mapping 28:502-518, 2007
Saguenay Youth Study
Data Collection
III Telephone Interview (30 min)
•Life habits of mother during pregnancy and now
•Medical history of children, mother and father
IV Home Visit (2h)
•School performance, activities at school, feelings at school, life at home (ECOBES, students)
•Your children and school, your education, your family life (ECOBES, parents)
•Screen for psychiatric disorders (DISC Predictive Scale for adolescents)
•Puberty development; risky behaviors (cigarettes, drugs, alcohol); hyperactivity, conduct disorder,
aggression, anxiety, and depression; delinquency (GRIP, adolescents)
•Cigarettes, drugs, and alcohol abuse; anxiety, depression, and anti-social behavior (GRIP, parents)
•Drawing a blood sample (parents)
V Laboratory (6h)
Neuro-psychological Assessment
•IQ assessment (WISC-III)
•Academic achievement (Woodcock-Johnson)
•Memory (Children’s Memory Scale)
•Motor skills (pegboard, tapping, bi-manual coordination)
•Executive functions (interference, word fluency, working memory)
•Emotion/Motivation (faces, voices, gambling, RFT)
•Language (FM threshold, phonological awareness, DAF, phonetic learning)
VI Hospital Session (4h)
Body composition
•Anthropometry
•Bioimpedance
•MRI (fat)
Blood pressure, cardiovascular reactivity, and salivary cortisol
(Finometer: beat-to-beat, respiration)
•Resting
•In response to postural change
•In response to mental stress
MRI scan
•Brain
•Abdomen: fat and kidneys
Diet and Physical Activity
•Twenty-four-hour food recall,
•Food frequency questionnaire
•Physical activity questionnaire
VII School Session (1h)
Fasting Blood Sample
Glucose and lipid metabolism
Low-grade inflammation, endothelial and fibrinolytic dysfunctions, HPA activity
Sexual maturation
Smoking habits
Nutrition
VIII Genotyping: Candidate Genes and Total Genome Scan
Structural Magnetic Resonance Imaging:
• T1-weighted: 15 min
• T2-weighted / Proton Density: 15 min
• Magnetization Transfer Ratio: 15 min
MR Pipeline: Quality Control
One result
• White Matter volume
• Magnetization Transfer Ratio
Testosterone influenced WM volume to a greater extent in males with the more “efficient”
AR (short AR gene), compared with those with a less efficient AR (long AR gene)
Lessons from the Saguenay study
• Home-made database (PHP, 1py)
• Contains all variables (phone interview, etc.) but not
the imaging data
• No mechanism to share data
• Home-made design of web pages for specific datasets
(~versioning)
• Semi-automatic analysis pipeline, results re-entered
in the DB
• The use of a specific population
• Very large amount of behavioural or biological data
• No tool that is easy to re-use
Example 2: Imagen project and
database: a brief review
• Genetically influenced individual differences in brain responses
to reward, punishment and emotional cues in adolescents
mediate risk for mental disorders
• Neuroimaging: measure specific brain functions
implicated in the etiology of mental disorders and link them to
genetic and behavioural variations
• The goal of the present study is to identify the neurobiological
and genetic basis of these traits and to assess their relevance
for mental disorders. Means: a multicentre functional and
structural genetic-neuroimaging study of a cohort of 2000+
14-year-old adolescents. Intermediate phenotypes of risk for
adolescent mental illness will be explored.
European partners
1. Berlin: A. Heinz
2. Cambridge: replaced by
Dresden, M. Smolka
3. Dublin: H. Garavan
4. Hamburg: C. Buechel
5. London: G. Schumann, L. Reed
6. Mannheim: H. Flor
7. Nottingham: T. Paus
8. Orsay: JL Martinot
Also: T. Robbins (Cam.), P. Conrod (IOP), …
IMAGEN work packages:
WP 01: Behavioural analysis of animal models (3 years); Preparation Year 1, Implementation Year 2-3
WP 02: Behavioural tasks in humans (4 years); Preparation Year 1, Implementation Year 2-4
WP 03: Gene identification; Month 19 to Year 4 (2.5 years)
WP 04: Recruitment and characterisation; Preparation (6 months), Year 2-4 (3 years)
WP 05: Neuroimaging standardisation; Year 1 (1 year)
WP 06: Neuroimaging; Year 2-4 (3 years)
WP 07: Bioinformatics and Biostatistics; Year 1-5 (5 years)
WP 08: DNA bank, SNP detection and genotyping; Year 2-5 (4 years)
WP 09: Ethics; Year 1-5 (5 years)
WP 10: Training and dissemination; Year 1-5 (5 years)
WP 11: Project Management; Year 1-5 (5 years)
Step 1: One-site collection and
transfer (Scito, NNL)
Step 2: data anonymisation and
package handling
Step 3: including data
Work-Package 07 – Central Database
XNAT: a database tool [Marcus et al. 2007]
(also used in BIRN)
• XML schemas define database structure
(easy database modification)
• Auto-generated tools:
– Web portal
– Command line
Data included, use XML schema for
DB Ontology
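To make the idea of a schema-defined database structure concrete, here is a small sketch that reads an XML Schema fragment and lists the fields each data type would carry. The schema below is invented for illustration (the `behaviouralScore` type and its fields are hypothetical, not taken from XNAT or from the Imagen database), and only the Python standard library is used.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema fragment: one data type with three fields.
SCHEMA = """<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="behaviouralScore">
    <xs:sequence>
      <xs:element name="subject_id" type="xs:string"/>
      <xs:element name="instrument" type="xs:string"/>
      <xs:element name="score" type="xs:float"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
"""


def fields_by_type(schema_text):
    """Map each complexType name to its list of (field name, field type)."""
    root = ET.fromstring(schema_text)
    result = {}
    for ctype in root.iter(f"{{{XSD_NS}}}complexType"):
        result[ctype.get("name")] = [
            (el.get("name"), el.get("type"))
            for el in ctype.iter(f"{{{XSD_NS}}}element")
        ]
    return result


if __name__ == "__main__":
    for type_name, fields in fields_by_type(SCHEMA).items():
        print(type_name, fields)
```

Adding a field to the schema changes the listing without touching the code, which is the sense in which a schema-driven design makes database modification easy.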
Web based Quality check
(Pre-)Processing
• T1
– SPM8 new segment
– BrainVISA pipeline
– DARTEL / FreeSurfer have been tried out
• T2*
– SPM8 preprocessing of all available EPI data
– Strategy: movement correction; reslicing; fMRI -> MPRAGE long,
MPRAGE long -> MNI template, for each session
– Homogenizing the log files to get fMRI protocols (dealing with
various numbers of runs, …)
– Fitting the model intra-subject (SPM)
– Inter-subject: in house (mixed effect + permutation)
• DTI
– FSL
– In house
Queries
• Give me the T1 images normalized to MNI space for
subjects whose score X is above 5
• Give me the behavioural scores of instruments X
and Y for subjects with T2* image quality
above Z
• Give me the genotypes of subjects with both
behavioural score X and DTI images of good
quality
• Download results
• API for scripts
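The first query above can be sketched with a scripted interface. The sketch uses an in-memory SQLite database with two invented tables (`behaviour` and `image`); the table layout, column names and file names are assumptions for illustration only and do not describe the actual Imagen database or its API.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with hypothetical tables and rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE behaviour (subject TEXT, instrument TEXT, score REAL);
        CREATE TABLE image (subject TEXT, modality TEXT, space TEXT,
                            quality REAL, path TEXT);
        """
    )
    con.executemany(
        "INSERT INTO behaviour VALUES (?, ?, ?)",
        [("s01", "X", 7.0), ("s02", "X", 3.0), ("s03", "X", 9.0)],
    )
    con.executemany(
        "INSERT INTO image VALUES (?, ?, ?, ?, ?)",
        [
            ("s01", "T1", "MNI", 0.9, "s01_T1_mni.nii"),
            ("s02", "T1", "MNI", 0.8, "s02_T1_mni.nii"),
            ("s03", "T1", "native", 0.7, "s03_T1.nii"),
        ],
    )
    return con


def t1_mni_for_score_above(con, instrument, threshold):
    """T1 images in MNI space for subjects whose score exceeds the threshold."""
    query = """
        SELECT i.subject, i.path
        FROM image AS i
        JOIN behaviour AS b ON b.subject = i.subject
        WHERE i.modality = 'T1' AND i.space = 'MNI'
          AND b.instrument = ? AND b.score > ?
        ORDER BY i.subject
    """
    return con.execute(query, (instrument, threshold)).fetchall()


if __name__ == "__main__":
    con = build_demo_db()
    print(t1_mni_for_score_above(con, "X", 5))
```

The other two queries follow the same pattern: a join across imaging, behavioural and genetic tables with a filter on a quality or score column.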
Automatic Quality check
Neuroimaging scores for QC
• Intra volume variance variation
• T1 mask and template overlap
• Estimated fMRI movement
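The three kinds of QC score above could be computed along the following lines. This is only a sketch under simplifying assumptions: volumes and masks are flat Python lists rather than 3D images, overlap is measured with the Dice coefficient, and movement is summarised as the largest translation between consecutive volumes. None of these choices is stated on the slide.

```python
import math


def variance_per_volume(volumes):
    """Variance of the voxel intensities inside each volume of a run."""
    variances = []
    for voxels in volumes:
        mean = sum(voxels) / len(voxels)
        variances.append(sum((v - mean) ** 2 for v in voxels) / len(voxels))
    return variances


def dice_overlap(mask_a, mask_b):
    """Dice coefficient between two binary masks of equal length."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    return 2.0 * intersection / total if total else 1.0


def max_framewise_translation(translations):
    """Largest Euclidean shift (same unit as input) between consecutive volumes."""
    steps = (math.dist(p, q) for p, q in zip(translations, translations[1:]))
    return max(steps, default=0.0)


if __name__ == "__main__":
    print(variance_per_volume([[1, 1, 1], [0, 2, 4]]))
    print(dice_overlap([1, 1, 0, 0], [1, 0, 0, 0]))
    print(max_framewise_translation([(0, 0, 0), (3, 4, 0), (3, 4, 1)]))
```

Scores like these can be stored next to each image so that queries can filter on image quality.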
A few words on data analysis
Neuroimaging and WGA
[Figure: SNPs (G/T genotypes), clinical / behaviour data, and imaging data (aMRI, dMRI, fMRI): find statistical links, or predict]
Finding good analysis
strategies
Data dimensions (subjects × features):
• SNP: 1M, + CNV
• Images: 50k-200k
• Transcriptome: 50k
• Behaviour: <200
Issues:
• Data dimension reduction
• Multiple comparison problem
• Inhomogeneous data
Candidate SNPs vs. all image
For each voxel:
f(GG, CG, CC, CG, GG) = voxel -> Stat. map
Methods:
- VBM, group fMRI, etc.
Complexity / multiple comparison issue:
- ~10^6 tests or estimated parameters
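A minimal sketch of the voxel-wise approach: for one candidate SNP, regress each voxel's value on the genotype and keep the t statistic as a map. The additive coding (number of copies of one allele) and the simple linear regression are assumptions chosen for illustration; the slide does not specify the model f, and real analyses would use dedicated VBM or fMRI group software.

```python
import math


def allele_count(genotype, allele):
    """Additive coding: number of copies of `allele` in a genotype such as 'CG'."""
    return genotype.count(allele)


def slope_and_t(x, y):
    """Least-squares slope of y on x and its t statistic (n - 2 degrees of freedom)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("genotype does not vary across subjects")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope, (slope / std_err if std_err > 0 else math.inf)


def stat_map(genotypes, images, allele="C"):
    """One t value per voxel; images[s][v] is the value of voxel v for subject s."""
    x = [allele_count(g, allele) for g in genotypes]
    n_voxels = len(images[0])
    return [slope_and_t(x, [img[v] for img in images])[1] for v in range(n_voxels)]


if __name__ == "__main__":
    genotypes = ["GG", "CG", "CC", "CG", "GG"]
    # Toy data: voxel 0 follows the genotype, voxel 1 does not.
    images = [[1.0, 5.0], [3.1, 4.0], [4.9, 5.5], [3.0, 4.2], [1.1, 5.1]]
    print(stat_map(genotypes, images))
```

With one test per voxel, the number of tests grows with image size, which is the multiple comparison issue noted above.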
One image region vs. all SNPs
Plink?
For one voxel:
f(voxel) = SNP
Method known as WGAS
Multiple comparison: ~10^6 tests
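To give a sense of what ~10^6 tests imply, the sketch below applies a Bonferroni correction, one common (and conservative) way to control the family-wise error rate. The slide does not say which correction is used, so Bonferroni is an assumption here, as are the function names.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_after_bonferroni(p_values, alpha=0.05):
    """Indices of the tests whose p-value survives the Bonferroni threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [i for i, p in enumerate(p_values) if p < threshold]


if __name__ == "__main__":
    # With a million SNPs, each test must reach roughly p < 5e-8.
    print(bonferroni_threshold(0.05, 10**6))
    print(significant_after_bonferroni([0.001, 0.04, 0.2]))
```

Because neighbouring SNPs are correlated (LD), the effective number of independent tests is smaller than the raw count, so this threshold is an upper bound on strictness.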
Feature selection approach
• Selection on the gene side and on the image side
• Gene-Image analysis on the reduced data
Or multivariate approaches
• Consider LD / spatial covariance / covariance of
behavioural tests
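The benefit of selecting features before the gene-image analysis can be sketched as follows. The selection criterion used here (keep the highest-variance features) is only a stand-in chosen for simplicity, since the slide does not name one, and the feature names are invented.

```python
def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def top_k_by_variance(columns, k):
    """Names of the k features with the largest variance across subjects."""
    return sorted(columns, key=lambda name: variance(columns[name]), reverse=True)[:k]


def n_pairwise_tests(n_gene_features, n_image_features):
    """Number of gene-image tests if every pair of features is tested."""
    return n_gene_features * n_image_features


if __name__ == "__main__":
    voxels = {"v1": [0, 0, 0], "v2": [0, 1, 2], "v3": [0, 5, 10]}  # toy values
    print(top_k_by_variance(voxels, 2))
    print(n_pairwise_tests(10**6, 50_000))  # all SNPs against all voxels
    print(n_pairwise_tests(100, 50))        # after selection on both sides
```

Selection that uses the outcome being tested would bias the later statistics, so in practice the criterion has to be independent of the gene-image link or be validated on held-out data.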
Network: sentence reading – checkerboards
Correlation: lateralisation / pseudoword reading speed
Score = pseudoword reading speed / whole-brain analysis, p=0.01, 40 voxels
Excluding outliers at least two standard deviations from the mean
Network: sentence listening
Correlation: lateralisation / pseudoword reading speed
Score = pseudoword reading speed / whole-brain analysis, p=0.01, 40 voxels
Excluding outliers at least two standard deviations from the mean
Network: sentence reading – checkerboards
Diff. lateralisation – gene KIAA / SNP: rs155089 6>8
Type
C/C C/T
nb. sbj
2
21
age
men
44
36
educ (y)
3.1
Dysl.(%)
6
Substr.(%)
80
Pseudow(ms/w) - 855
T/T
34
25
24
3.4
10
73
924
Network: sentence reading – checkerboards
Diff. lateralisation – gene KIAA / SNP: rs7761100 7>6
Type             G/G    G/T    T/T
nb. sbj          32     28     7
age              23     23
men              39     40     50
educ (y)         3.4    3.1    3.5
Dysl. (%)        10     7      0
Substr. (%)      70     79     81
Pseudow (ms/w)   930    895    850
% dyslexic
22
Pseudoword reading speed
Subtraction score
peak
Conclusion:
• A lot to be done: combining two complex and
powerful types of data for
– Better understanding of brain mechanisms
– Better understanding of the impact of genetic
variations
– Better risk factor prediction…
• Visualisation and interaction: see Abstract
• Several analysis strategies are needed to face the huge
multiple comparison problem
L Shen, S Kim, J D West, A J Saykin