Spike Sorting I: Bijan Pesaran New York University Acknowledgements • Ken Harris and Samar Mehta at Neuroinformatics course Woods Hole. Aims We would like to … • Monitor the activity of large numbers of neurons simultaneously • Know which neuron fired when • Know which neuron is of which type • Estimate our errors Voltage (A/D Levels) THE PROBLEM: Multiple Neural Signals 200 0 -200 -400 0 1 2 3 4 5 6 7 8 9 10 Time (sec) Voltage (A/D Levels) 300 200 100 0 -100 -200 -300 -400 4.64 4.66 4.68 4.7 4.72 4.74 4.76 Time (sec) 200 0 -200 -400 3 msec Primate retinal ganglion cells, courtesy of the lab of Dr. E.J. Chichilnisky THE GOAL: Spike Times of Single Neurons Region from previous slide 200 0 Raw Data -200 -400 Spike Detector Neuron #1 Spikes Neuron #2 Spikes 4.5 4.55 4.6 4.65 4.7 4.75 Time (sec) 4.8 4.85 4.9 4.95 5 THE ‘GRADUATE STUDENT’ ALGORITHM Voltage (A/D Levels) Raw Data 300 200 100 0 -100 -200 -300 Threshold detector at 32 -400 4.5 4.55 4.6 4.65 4.7 4.75 4.8 4.85 4.9 4.95 5 Time (sec) Interspike Interval Histogram 250 Candidate Waveforms 200 0.6 100 0 -100 -200 -300 -400 -500 # of Intervals Spike Height vs. Width Plot 200 Width (msec) Voltage (A/D Levels) 300 200 150 100 150 50 0 100 1 2 3 0.4 50 0 0 0.2 10 20 30 Time (msec) -600 0.2 0.4 0.6 0.8 1.0 1.2 1.4 4 1.6 Time (msec) 0 -700 -600 -500 -400 -300 Height (A/D Levels) -200 -100 40 50 A GENERAL FRAMEWORK Locate Spikes Preprocess Waveforms Density Estimation Spike Classification Quality Measures Extracellular Recording Hardware • You can buy two types of hardware, allowing • Wide-band continuous recordings • Filtered, spike-triggered recordings The Tetrode • Four microwires twisted into a bundle • Different neurons will have different amplitudes on the four wires Raw Data Spikes High Pass Filtering • Local field potential is primarily at low frequencies. • Spikes are at higher frequencies. • So use a high pass filter. 800hz cutoff is good. Filtered Data Cell 1 Cell 2 Spike Detection • Locate spikes at times of maximum extracellular negativity • Exact alignment is important: is it on peak of largest channel or summed channels? Data Reduction • We now have a waveform for each spike, for each channel. • Still too much information! • Before assigning individual spikes to cells, we must reduce further. Principal Component Analysis • Create “feature vector” for each spike. • Typically takes first 3 PCs for each channel. • Do you use canonical principal components, or new ones for each file? “Feature Space” Cluster Cutting • Which spikes belong to which neuron? • Assume a single cluster of spikes in feature space corresponds to a single cell • Automatic or manual clustering? Cluster Cutting Methods • Purely manual – time consuming, leads to high error rates. • Purely automatic – untrustworthy. • Hybrid – less time consuming, lowest error rates. Semi-automatic Clustering How Do You Know It Works? • We can split waveforms into clusters, but are we sure they correspond to single cells? • Simultaneous intra- and extra-cellular recordings allow us to estimate errors. • Quality measures allow us to guess errors even without simultaneous intracellular recording. Intra-extra Recording • Simultaneous recording with a wire tetrode and glass micropipette. Intra-extra Recording Extracellular waveform is almost minus derivative of intracellular Model Bizarre Extracellular Waveshapes Experiment Two Types of Error • Type I error (false positive) – Incorrect inclusion of noise, or spikes of other cells • Type II error (false negative) – Omission of true spikes from cluster • Which is worse? Depends on application… Manual Clustering Contest Best Ellipsoid Error Rates Find ellipsoid that minimizes weighted sum of Type I and Type II errors. Must evaluate using cross-validation! Humans vs. B.E.E.R. Waveshape Helps Separation Why were human errors higher? • To understand this, try to understand why clusters have the shape they do • Simplest possibility: spike waveform is constant, cluster spread comes from background noise • Are clusters multivariate normal? Problem: Overlapping Spikes Problem: Cellular Synchrony Problem: Bursting Problem: Misalignment • When you have a spike whose peak occurs at different times on different channels, it can align on either. • This causes the cluster to be split in two. Problem: Dimensionality Manual clustering only uses 2 dimensions at a time BEER measure can use all of them “Semi-Automatic” Clustering •Uses all dimensions at once •Errors should be lower •Still requires human input Semi-automatic Performance klustakwik.sourceforge.net Software: KlustaKwik • Mixture of Gaussians, unconstrained covariance matrices • Speed is crucial • CEM Algorithm – faster than EM • Most probabilities not calculated • Local maxima result in over- and underclustering • Split and merge features to tunnel out of local maxima • Still requires supercomputer resources. klusters.sourceforge.net Software: Klusters Waveforms Timecourse Auto/Cross correlograms Grouping Assistant Recluster Feature Ergonomic Design Cluster Quality Measures • Would like to automatically detect which cells are well isolated. • BEER measure needs intracellular data, which we don’t have in general. • Will define two measures that only use extracellular data. Isolation Distance Size of ellipsoid within which as many spikes belong to our cluster as not L_ratio Lratio 1 cdf noise 2 Ncluster False Positives and Negatives Which Measure to Use? • Isolation distance correlates with false positive error rates – Measures distance to other clusters • L_ratio correlates with false negative error rates – Measures number of spikes near cluster boundary Conclusions • Automatic clustering will save time and reduce errors. • Errors can be as low as ~5%. • Quality measures give you a feeling of how bad your errors are. Room for Improvement • Make it faster Easy • Improved spike detection and alignment • Quality measures that estimate % error • Fully automatic sorting • Resolve overlapping spikes Hard