My first 100 TB of data
STATISTICAL METHODS FOR NEW TECHNOLOGY WORKING GROUP
Ciprian M. Crainiceanu
Johns Hopkins University
http://www.biostat.jhsph.edu/smnt

Members of the group
• Key personnel
  • C.M. Crainiceanu, B.S. Caffo, A.-M. Staicu, S. Greven, D. Ruppert, C.-Z. Di
• Senior students
  • V. Zipunnikov, J.-A. Goldsmith
• Other statisticians (>20)
• Scientific collaborators
  • Direct collaboration
  • Solving important scientific problems
  • Diverse scientific applications

Scientific Collaborators
• Susan Bassett – fMRI, Alzheimer’s
• Danny Reich – DTI, DCE-MRI, MS
• Brian Schwartz – lead exposure, VBM, DTI, white matter imaging
• Stewart Mostofsky – fMRI, rsfcMRI, autism, ADHD, Tourette’s
• Naresh Punjabi – EEG, sleep, sleep diseases
• Dzung Pham / Pilou Bazin – cortical shape, thickness, lesion detection, MS
• Dean Wong – PET, fMRI, substance abuse
• Susan Resnick – BLSA
• Jerry Prince – BLSA, ADNI
• Jim Pekar, Peter Van Zijl – 7T MRI, fMRI, rsfcMRI preprocessing, scanner physics
• Christos Davatzikos – RAVENS
• Susumu Mori – DTI, tractography
• Dana Boatman – ECoG, EEG, epilepsy
• Graham Redgrave – fMRI, DTI, Huntington’s, anorexia/bulimia
• Tudor Badea, Bruno Jedynak – neuron classification, morphometry, 3D structure and shape
• Tom Glass – Gizmos
• Merck – EEG, neuroimaging
• Pfizer – imaging biomarkers?

Observational Studies 2.0

Longitudinal Functional Principal Component Analysis (LFPCA)
• I=1000, J=4, D=100: 15’
• I=1000, J=8, D=200: 70’
Greven, Crainiceanu, Caffo, Reich, 2010. LFPCA, EJS, to appear

A simple regression formula
• Data compression via longitudinal PCA
• MoM estimators of covariance matrices, smoothing
• Need: all covariance operators
• Solution: regress $Y_{ij}(d)\,Y_{ik}(d')$ on $1$, $T_{ik}$, $T_{ij}$, $T_{ik}T_{ij}$, $\delta_{jk}$

Variance explained (FA, 3 years of longitudinal data)

Longitudinal Penalized Functional Regression

LPFR: recipe and ingredients

PASAT/MD (corpus callosum), PD (corticospinal tract)

Functional regression
• No paper on longitudinal functional regression
• No paper published with this data structure
• Longitudinal extensions are not “simple”
• Technical details are hard without the correct “recipe” for known and published “ingredients”
• No available method that scales up
Goldsmith, Feder, Crainiceanu, Caffo, Reich, 2010. PFR, JCGS, to appear
Goldsmith, Crainiceanu, Caffo, Reich, 2010. LPFR, to appear?

Population Value Decomposition (PVD)

PVD
$Y_i = P V_i D + E_i$
• $P$ is $T \times A$
• $D$ is $B \times F$
• $V_i$ is $A \times B$
• $A \ll T$, $B \ll F$

Singular Value Decomposition (SVD) summarizes variance
[Figure: SVD of one subject's time × frequency data (subject-specific data) into eigenvariates, a diagonal matrix, and eigenfrequencies]

Default PVD (Start here)
[Figure: pipeline. SVD of subject-specific data into eigenvariates and eigenfrequencies; low-rank approximation; stacking across subjects; SVD for the population decomposition; projecting original data onto population bases]

Caffo BS, Crainiceanu CM, Verduzco G, Joel SE, Mostofsky SH, Bassett SS, Pekar JJ. Two-stage decompositions for the analysis of functional connectivity for fMRI with application to Alzheimer’s disease risk. NeuroImage (in press).

Population eigenimages
Currently:
• Deploying PVD to the 1000 Functional Connectomes Project http://www.nitrc.org/projects/fcon_1000/
• Comparing rsfcMRI in stroke versus normal subjects

HD-MFPCA/RAVENS Images

Multilevel Functional Principal Component Analysis (MFPCA)

MFPCA

HD-MFPCA

HD-MFPCA, Step 1

HD-MFPCA, Step 2

Main message, backed by 100 TB of data
• Eventually, good tech makes it into observational studies and clinical trials
• Longitudinal/Multilevel FDA is the natural next step in FDA
• Data is changing the way we do business: availability, size, complexity
• Likely: funding will be based much more on relevance than on technical ability
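
The "Default PVD" slide lists a sequence of steps: subject-level SVD, low-rank approximation, stacking across subjects, a population-level SVD, and projection onto the population bases. The sketch below is one possible reading of that sequence in NumPy, not the group's implementation. The function name `default_pvd`, the per-subject rank `L`, and the choice to stack the unweighted singular vectors are assumptions made for illustration; only the model $Y_i = P V_i D + E_i$ and the dimensions of $P$, $V_i$, and $D$ come from the slides.

```python
import numpy as np

def default_pvd(Y, L, A, B):
    """Sketch of a default PVD under stated assumptions.

    Y : list of (T x F) arrays, one per subject (e.g. time x frequency).
    L : number of singular vectors kept per subject (hypothetical parameter).
    A, B : population ranks, with A << T and B << F.
    Returns P (T x A), a list of V_i (A x B), and D (B x F).
    """
    left, right = [], []
    for Yi in Y:
        # Subject-level SVD: columns of U are eigenvariates,
        # rows of Vt are eigenfrequencies.
        U, _, Vt = np.linalg.svd(Yi, full_matrices=False)
        left.append(U[:, :L])        # low-rank approximation, T x L
        right.append(Vt[:L, :].T)    # F x L

    # Stack the retained subject-level vectors across subjects.
    U_stack = np.hstack(left)        # T x (n * L)
    V_stack = np.hstack(right)       # F x (n * L)

    # Population-level SVD gives the shared bases.
    P = np.linalg.svd(U_stack, full_matrices=False)[0][:, :A]     # T x A
    D = np.linalg.svd(V_stack, full_matrices=False)[0][:, :B].T   # B x F

    # Project each subject's original data onto the population bases.
    V = [P.T @ Yi @ D.T for Yi in Y]                              # each A x B
    return P, V, D

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Y = [rng.standard_normal((50, 60)) for _ in range(8)]  # simulated data
    P, V, D = default_pvd(Y, L=5, A=4, B=4)
    print(P.shape, V[0].shape, D.shape)
```

Because `P` has orthonormal columns and `D` has orthonormal rows, `P @ V[i] @ D` is the projection of subject `i`'s data onto the population bases, and each subject is summarized by the small `A x B` matrix `V[i]`.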
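
The "simple regression formula" slide proposes regressing the products $Y_{ij}(d)Y_{ik}(d')$ on $1$, $T_{ik}$, $T_{ij}$, $T_{ik}T_{ij}$, and $\delta_{jk}$ to obtain all covariance operators at once. The sketch below shows one way such a method-of-moments regression could be set up with ordinary least squares. It assumes a balanced design (every subject has `J` visits), data `Y` that have already been mean-centered, and a small grid size `D` so that all `D x D` products fit in memory. The function name `lfpca_covariances` and the labels `K0`, `K01`, `K10`, `K1`, `KU` for the five coefficient surfaces are illustrative, and the smoothing step mentioned on the slide is omitted.

```python
import numpy as np

def lfpca_covariances(Y, T):
    """Method-of-moments covariance regression (illustrative sketch).

    Y : array (I, J, D), centered data for I subjects, J visits, D grid points.
    T : array (I, J), visit times T_ij.
    Returns five (D x D) coefficient surfaces, one per regressor:
    intercept, T_ik, T_ij, T_ij * T_ik, and the indicator delta_jk.
    """
    I, J, D = Y.shape
    Tij = np.repeat(T[:, :, None], J, axis=2)       # entry [i, j, k] = T_ij
    Tik = np.repeat(T[:, None, :], J, axis=1)       # entry [i, j, k] = T_ik
    delta = np.broadcast_to(np.eye(J), (I, J, J))   # entry [i, j, k] = 1 if j == k

    # Design matrix: one row per (subject, visit j, visit k) triple.
    X = np.stack([np.ones_like(Tij), Tik, Tij, Tij * Tik, delta],
                 axis=-1).reshape(-1, 5)

    # Responses: all products Y_ij(d) * Y_ik(d'), one column per (d, d') pair.
    R = np.einsum("ijd,ike->ijkde", Y, Y).reshape(-1, D * D)

    # One least-squares fit per (d, d') pair, solved jointly.
    beta, *_ = np.linalg.lstsq(X, R, rcond=None)
    K0, K01, K10, K1, KU = beta.reshape(5, D, D)
    return K0, K01, K10, K1, KU

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Y = rng.standard_normal((100, 4, 6))      # simulated, roughly centered data
    T = rng.uniform(-1, 1, size=(100, 4))
    K0, K01, K10, K1, KU = lfpca_covariances(Y, T)
    print(K0.shape, KU.shape)
```

For the data sizes quoted on the LFPCA slide, forming every product explicitly would be too large, so this layout is only meant to make the regression itself concrete on small examples.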