New Tools for Tracking the Dynamics of Mental Representations
Sam Gershman, Princeton University
June 14, 2012 / OHBM

Resist the tyranny of the voxel!
unit of measurement ≠ unit of analysis

Problems with voxel-based approaches
- Voxels impose a discretization of the anatomical space; voxel-based models conflate this measurement discretization with a parametric one. Why should doubling the number of measurements necessitate doubling the number of parameters?
- Explosion of parameters to estimate.
- Hard to align results across subjects.
- Parameters should reflect meaningful theoretical quantities.

A new approach
- Assume, as in latent factor models, that the observed neural data arise from a superposition of basis images. This allows us to parsimoniously capture the underlying modes of the system.
- Assume, as in supervised models (e.g., GLMs), that the superposition is covariate-dependent: which basis images are active depends on which covariates are active.
- The main departure from these models is that the basis images are topographic and voxel-free.

Basis images
- Each basis image is specified by a function over space that is discretely sampled to give rise to voxel activations.
- The function is parameterized by a center $\mu_k$ (in image space coordinates) and a width $\lambda_k$:
\[ f_{kv} = \exp\left\{ -\lambda_k^{-1} \sum_d (\mu_{kd} - r_{vd})^2 \right\}, \]
  where $f_k$ is the $k$th basis image and $r_{vd}$ is the coordinate of voxel $v$ in the $d$th dimension.
- We call each of these functions a latent source to emphasize their similarity to physically extended regions. Hence, the model is called Topographic Latent Source Analysis (TLSA).

The generative process
[Figure: (A) the neural data Y (N × V) expressed as the product of the design matrix X (N × C), the weights W (C × K), and the basis images F (K × V); (B) example basis images.]

Modeling multi-subject data
[Figure: image space, subject-level sources, and group template.]

How many sources?
- How do we decide how many sources to use?
- We can approach this probabilistically by defining a nonparametric prior on the weight matrix.
- This allows an unbounded number of sources, but with a bias towards a small number.
- Thus, the effective number of sources can be determined automatically.
- Specifically, each weight is given a “spike and slab” prior (the beta process).

Construction of the spike and slab prior
\[
\begin{aligned}
w_{ck} &= u_{ck} z_k && \text{(loading of source $k$ on covariate $c$)} \\
u_{ck} &\sim \mathcal{N}(0, \sigma_u^2) && \text{(continuous “slab”)} \\
z_k &\sim \mathrm{Bernoulli}(\pi_k) && \text{(binary “spike”)} \\
\pi_k &\sim \mathrm{Beta}(\alpha, 1) && \text{(probability of spike)}
\end{aligned}
\]
- As $K \to \infty$, we obtain a nonparametric prior, the beta process.
- Even though there are theoretically infinitely many sources, the number of active sources follows a Poisson($\alpha$) distribution.

Spike and slab prior
[Figure: density P(r) as a function of r for the slab-only prior and the spike & slab prior.]

Inference
- Our goal is to compute the posterior over parameters $\theta$ given the neural data $Y$ and covariates $X$:
\[ P(\theta \mid X, Y) \propto P(Y \mid \theta, X)\, P(\theta). \]
  However, computing this exactly is intractable. Therefore we approximate it with a distribution $Q(\theta)$ taken from a family of distributions $\mathcal{Q}$ for which inference is tractable. The optimization problem is to find the distribution $Q^*$ that minimizes the KL-divergence between $Q(\theta)$ and $P(\theta \mid X, Y)$:
\[ Q^* = \operatorname*{argmin}_{Q \in \mathcal{Q}} \mathrm{KL}[Q \,\|\, P]. \]
- This technique is known as variational Bayes. It is computationally efficient and involves closed-form updates that are guaranteed to converge to a local optimum.

Illustration: reconstructed maps
[Figure: reconstructed maps for Class 1, Class 2, and Class 3, comparing TLSA with OLS.]

Extension to spatiotemporal data
- The high temporal and low spatial resolution of EEG presents a special challenge: how can we capture spatiotemporal patterns?
- We can capture these patterns by replacing the spatial basis functions with spatiotemporal basis functions, consisting of spatial and temporal receptive fields.
- The inference algorithm works in exactly the same way.
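The generative model built up in the preceding slides (radial basis images, spike-and-slab weights, covariate-dependent superposition) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the grid size, the Gaussian observation noise, the parameter values, and the helper names `rbf_basis`, `sample_weights`, and `matmul` are all assumptions made for the example. In particular, the sketch uses the finite approximation $\pi_k \sim \mathrm{Beta}(\alpha/K, 1)$, a common way to approach the beta process limit, which differs from the $\mathrm{Beta}(\alpha, 1)$ written on the slide.

```python
import math
import random


def rbf_basis(center, width, coords):
    """Evaluate f_kv = exp(-(1/width) * sum_d (center_d - r_vd)^2) at every voxel."""
    return [
        math.exp(-sum((c - r) ** 2 for c, r in zip(center, rv)) / width)
        for rv in coords
    ]


def sample_weights(n_cov, n_src, alpha, sigma_u, rng):
    """Draw spike-and-slab weights w_ck = u_ck * z_k.

    Assumption: pi_k ~ Beta(alpha / K, 1), a finite-K approximation.
    """
    pis = [rng.betavariate(alpha / n_src, 1.0) for _ in range(n_src)]
    z = [1 if rng.random() < p else 0 for p in pis]  # binary "spike"
    W = [
        [rng.gauss(0.0, sigma_u) * z[k] for k in range(n_src)]  # "slab" * spike
        for _ in range(n_cov)
    ]
    return W, z


def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


rng = random.Random(0)

# Hypothetical 5 x 5 grid of 2-D voxel coordinates (V = 25).
coords = [(float(i), float(j)) for i in range(5) for j in range(5)]
K, C, N = 4, 2, 6  # sources, covariates, observations

# Basis images F (K x V): one radial basis function per latent source.
centers = [(rng.uniform(0, 4), rng.uniform(0, 4)) for _ in range(K)]
widths = [rng.uniform(1.0, 3.0) for _ in range(K)]
F = [rbf_basis(mu, lam, coords) for mu, lam in zip(centers, widths)]

# Weights W (C x K) and a toy design matrix X (N x C) with alternating covariates.
W, z = sample_weights(C, K, alpha=2.0, sigma_u=1.0, rng=rng)
X = [[1.0 if n % C == c else 0.0 for c in range(C)] for n in range(N)]

# Neural data Y (N x V): superposition X W F plus assumed Gaussian noise.
noise_sd = 0.1
Y = [[m + rng.gauss(0.0, noise_sd) for m in row] for row in matmul(matmul(X, W), F)]
```

Note that the number of free parameters depends on K (centers, widths, weights) rather than on the number of voxels, so sampling the same sources on a finer grid changes only `coords`.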
Illustration: Extracting the P300
Oddball contrast for one subject:
[Figure: scalp topography of the extracted source at 400 ms, and activation (×10⁻³) as a function of time (0 to 600 ms) for electrodes P3, Pz, C3, FC4, and CP2.]

Nonparametric surface priors
- So far we’ve been using radial basis functions to represent sources.
- Can we construct a more flexible surface representation?
- Bayesian nonparametric approach: place a prior over surfaces that allows a wide range of shapes, but prefers smooth ones.
- Specifically, we use Gaussian processes (GPs) as a surface prior.

Gaussian processes
- Definition: a collection of random variables, any finite subset of which is jointly Gaussian-distributed.
- Draws from a GP are functions (e.g., over space or time).
- Parameters of the GP specify the smoothness properties of random functions.
[Figure from Rasmussen and Williams (2006).]

Gaussian process TLSA
- We use the GP prior to model spatial smoothness in the residuals, i.e., what remains after subtracting the parametric TLSA predictions from the image. This allows us to represent sources that are both anatomically localized and irregularly shaped.
- We’ve applied this to retinotopic orientation maps in V1 (data courtesy of Jeremy Freeman at NYU).

GP-TLSA: example source
[Figure: Slices 1 to 3. Top: radial basis function source. Bottom: Gaussian process source.]

GP-TLSA: predicted retinotopy map
[Figure: each circle represents a datapoint. Gaussian process predictions are plotted at a finer resolution.]

Quantitative assessment
- Reconstruction: Given a set of covariates, how well can we predict the neural data?
- Decoding: Given neural data, how well can we predict the covariates?
- All analyses used separate training and test sets (i.e., cross-validation).
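Returning to the Gaussian process slides above: the claims that draws from a GP are functions, and that the GP parameters control their smoothness, can be illustrated by sampling from a GP prior on a one-dimensional grid. This is a sketch under assumptions, not the GP-TLSA model itself: the squared-exponential kernel, the length scales, the jitter term, and the helper names `sq_exp_kernel`, `cholesky`, and `sample_gp_prior` are illustrative choices that the slides do not specify.

```python
import math
import random


def sq_exp_kernel(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance (an assumed kernel choice)."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def cholesky(A):
    """Lower-triangular L with L L^T = A, for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def sample_gp_prior(xs, length_scale, rng, jitter=1e-6):
    """Draw one function from a zero-mean GP prior, evaluated at the points xs."""
    n = len(xs)
    # Any finite set of function values is jointly Gaussian with covariance K.
    K = [
        [sq_exp_kernel(a, b, length_scale) + (jitter if i == j else 0.0)
         for j, b in enumerate(xs)]
        for i, a in enumerate(xs)
    ]
    L = cholesky(K)
    eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # f = L @ eps has covariance L L^T = K.
    return [sum(L[i][k] * eps[k] for k in range(i + 1)) for i in range(n)]


rng = random.Random(1)
xs = [0.5 * i for i in range(20)]
wiggly = sample_gp_prior(xs, length_scale=0.3, rng=rng)  # short length scale
smooth = sample_gp_prior(xs, length_scale=3.0, rng=rng)  # long length scale
```

With a short length scale neighbouring values are nearly independent, while a long one ties them together. This is the sense in which the prior allows a wide range of shapes but can be tuned to prefer smooth ones.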
GP-TLSA: reconstruction results on orientation maps
[Figure: bar plot of reconstruction error (×10⁵) for MAP, GP, and GNB.]
MAP: maximum a posteriori fitting of TLSA with RBF sources
GP-TLSA: TLSA with GP sources
GNB: Gaussian naive Bayes

GP-TLSA: classification results on orientation maps
[Figure: bar plot of cross-entropy error for MAP, GP, and GNB.]

Conclusions
- TLSA was designed to capture several ideas about the statistical structure of fMRI data. Along the way, we can perform many tasks: classification, regression, reconstruction, hypothesis testing.
- Beta process (spike and slab) prior allows us to automatically infer how many sources to use.
- Extension to EEG: spatiotemporal sources.
- Gaussian process priors for flexible surface models.

Thanks!
- Ken Norman (Princeton)
- David Blei (Princeton)
- Minqi Jiang (Princeton)
- Abu Saparov (Princeton)
- Per Sederberg (Ohio State)
- Jeremy Freeman (NYU)

Funding: NSF CRCNS grant IIS-1009542