Trial to trial variability of monkey spiking data
Project in Introduction to Computational Neuroscience
Final Report
Marek Oja and Andre Tättar
Supervisors: Kristjan Korjus and Raul Vicente

Introduction

In this small project we study the trial-to-trial variability of neurons. When the same stimulus is presented repeatedly, a neuron's response differs from trial to trial. According to recent studies, the trial-to-trial variability of spiking activity that is characteristic of cortical neurons could be a source of information about the state of neurons and their participation in behavioural tasks [1]. The variability drops rapidly at stimulus onset and then declines more slowly [1,2]. These experiments were carried out with monkeys performing motor preparation and motion discrimination tasks [1,2]. In our project we have data recorded from a monkey's brain during a task that requires short-term memory. Variability can be described with different measures; we use the coefficient of variation [3]. The aim of this project is to calculate the coefficient of variation, to cluster the neurons by a similarity measure, to interpret the results, and to see whether we observe a change in variability around the stimuli similar to that reported in other experiments.

Methods

Coefficient of variation

The coefficient of variation (CV), also known as “relative variability”, equals the standard deviation divided by the mean [5]. The CV for a single variable describes the dispersion of the variable in a way that does not depend on the variable's measurement unit. The higher the CV, the greater the dispersion in the variable. The CV for a model describes the model fit in terms of the relative sizes of the squared residuals and outcome values. The lower the CV, the smaller the residuals relative to the predicted value, which suggests a good model fit. We use the CV because it is well suited to comparing variation between datasets whose means differ considerably from each other.
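As a minimal illustration of the definition above, the following Python sketch computes the CV for two sets of spike counts whose means differ by a factor of ten. It is not the project's Matlab code: the function name `coefficient_of_variation` and the spike counts are invented for illustration, and the use of the sample standard deviation (N - 1 normalization) is an assumption, since the report does not state which normalization was used.

```python
from statistics import mean, stdev


def coefficient_of_variation(counts):
    """Return std / mean for a sequence of non-negative spike counts.

    Returns NaN when the mean is zero, because the CV is undefined there.
    Assumption: sample standard deviation (N - 1 in the denominator).
    """
    mu = mean(counts)
    if mu == 0:
        return float("nan")
    return stdev(counts) / mu


# Hypothetical spike counts of one neuron in one time window, one value per trial.
low_rate = [2, 4, 3, 5, 1]
# The same pattern of counts at ten times the firing rate.
high_rate = [20, 40, 30, 50, 10]

cv_low = coefficient_of_variation(low_rate)
cv_high = coefficient_of_variation(high_rate)

# Both values are about 0.53: scaling the counts changes the mean and the
# standard deviation by the same factor, so their ratio stays the same.
print(cv_low, cv_high)
```

Because the two CVs agree although the means differ tenfold, the measure allows neurons with very different firing rates to be compared on the same scale.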
We can use and interpret the CV here because spike counts are never negative; the CV is not meaningful for data that can take negative values. A disadvantage of the CV is that when the mean is close to zero the CV approaches infinity, so it becomes very sensitive to small changes in the mean.

The monkey memory test explanation

The experiment was performed on a female rhesus monkey (Macaca mulatta), and microelectrodes were placed in the monkey’s prefrontal cortex [4]. The experiment is illustrated in Figure 1. First the monkey is shown a sample picture for one second, followed by a three-second delay before a test picture is shown. The test picture can be the same as or different from the sample picture. The monkey then has to answer whether the test picture is the same as the sample picture or not. The times of picture presentation, the monkey's response and the reward are also recorded.

Figure 1. The experiment to study monkey’s short-term memory. [4 and references therein]

The dataset description

The dataset was collected experimentally by (and belongs to) professor Matthias Munk at the Max Planck Institute for Brain Research in Germany [4 and references therein]. The signals were recorded from 58 neurons, and the experiment was repeated 871 times during one day. One trial lasted 6.5 seconds, during which the electrodes recorded when the neurons fired. The recordings have a resolution of 1 ms, so each trial has 6500 data points. The numbers of the trials in which the monkey answered correctly and incorrectly were also recorded. In this project we use only the trials in which the monkey answered correctly, which are 615 trials. In the end we have a 3D matrix with dimensions 58 × 871 × 6500. Sample data are displayed in Figure 2.

[Figure: raster plot titled "Rasterplot of spiking in trial 1"; x-axis: Time (ms), 0 to 6000; y-axis: Neuron, 1 to 58]

Figure 2. Raster plot of spiking for trial one.
The x-axis shows time in milliseconds and the y-axis shows the neurons. Each neuron corresponds to one row, and each blue tick marks a time point at which that neuron fired. Between about 1000 ms and 2000 ms (the first two red lines) the sample picture was shown. At about 5000 ms (third red line) the test picture was shown. The fourth red line shows when the monkey answered the question, and the last red line shows when the monkey got the treat.

Description of the code

To run the code, the file TrialToTrialVariabilityProject.m has to be run in Matlab. The data file TrialToTrialVariabilityProject.mat has to be in the same folder. First we have to transform the data: we have the time points at which spikes occur, but we want data that contain 1 where a spike occurs and 0 where there is no spike. Then we divide the data into 100 ms or 500 ms segments (time windows) and sum the number of spikes in each window. After that we find the mean and standard deviation over all trials and calculate the coefficient of variation. The final part of the code does the plotting and clustering. We also normalize the coefficient of variation using the data points between 0 and 1000 ms. Depending on the time window, we take one or more values from the region 0 to 1000 ms, find their average, and divide the other data points by this value. In this way we can see by what factor the coefficient of variation changes in different parts of the experiment.

Clustering

In order to see which neurons behave similarly, we tried to cluster the neurons according to their coefficient of variation. Clustering was done using Matlab’s hierarchical clustering functionality [6]. In hierarchical clustering the first step is to find the similarity or dissimilarity between every pair of objects in the data set. The distance between pairs in the data set can be calculated using different measures, e.g.
Euclidean distance, cosine distance (one minus the cosine of the angle between points, with the data points treated as vectors), correlation distance (one minus the correlation between points, with the data points treated as sequences of values) and others. An important parameter is the linkage, which describes how the distance between clusters is calculated. In this project we use “average” for this parameter. It means that the distance between two clusters is taken as the average of the distances between all pairs of points from the two clusters. In Matlab the pairwise distance is calculated using the function pdist. In the next step the objects are grouped into a binary, hierarchically clustered tree. This is done in Matlab using the command linkage. The last step of the algorithm is to determine where to cut the hierarchical tree into clusters. This is done using the cluster function in Matlab. In Matlab all these steps are combined in one function called clusterdata. For clustering we used Euclidean distance, and the maximum number of clusters was 15.

Results and Discussion

Figures 3 and 4 show the coefficient of variation for two different time windows, 100 ms and 500 ms. It is difficult to interpret what the blue (values around 1) and red (values around 10) colours in these plots mean. To better understand the coefficient of variation values, we also plotted histograms (Figure 5) for neuron 56 in the region 5000 to 6000 ms (shown in red in the plots) and for neuron 15 in the region 2000 to 3000 ms (shown in blue) to see how many spikes these regions contain in different trials. From these histograms we can see that the blue values correspond to neurons whose spike counts have an approximately normal distribution with almost no zeros in this time window (Figure 5, right). Going from smaller coefficient of variation values to larger ones, the histogram peak shifts to the left, towards zero (Figure 5, left).
When the coefficient of variation is large, there are many trials in which no spikes were present in this region (Figure 5, left). Many zero values make the mean small and the standard deviation large relative to it, and therefore the coefficient of variation is also large. We also clustered these data using the hierarchical clustering method. The results are displayed in Figures 6 and 7. From these plots we can see that there is one big cluster and several smaller clusters, but we cannot tell whether there is any drop or rise in the coefficient of variation before the stimulus.

Figure 3. Using trials where monkey answered correctly. The coefficient of variation is plotted using time window of 100 ms.

Figure 4. Using trials where monkey answered correctly. The coefficient of variation is plotted using time window of 500 ms.

[Figure: two histograms; x-axis: Number of spikes in one trial in certain time window; y-axis: Number of spikes]

Figure 5. Histograms for neuron 56 in region 5000 – 6000 ms (on left) and for neuron 15 in region 2000 – 3000 ms (on right).

[Figure: colour map with neurons reordered by cluster; x-axis: Time (ms), 0 to 6000; y-axis: Neuron]

Figure 6. Using trials where monkey answered correctly. The clustered results are plotted using time window of 100 ms.

[Figure: colour map with neurons reordered by cluster; x-axis: Time (ms), 0 to 6000; y-axis: Neuron]

Figure 7. Using trials where monkey answered correctly. The clustered results are plotted using time window of 500 ms.

To see the change in the neurons' variability, we normalized the data using the data points in the region 0 to 1000 ms.
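The windowing, CV and baseline-normalization steps described in the Methods can be sketched as follows. This is a Python illustration under stated assumptions, not the authors' Matlab code: the function names, the toy spike times and the shortened 2000 ms trial are invented, the sample standard deviation (N - 1) is assumed, and spikes are counted directly from spike times instead of going through the intermediate 0/1 array, which gives the same window counts.

```python
from statistics import mean, stdev


def bin_spike_counts(spike_times_ms, trial_ms=6500, window_ms=500):
    """Count the spikes of one trial in consecutive windows of window_ms."""
    n_bins = trial_ms // window_ms
    counts = [0] * n_bins
    for t in spike_times_ms:
        b = int(t // window_ms)
        if 0 <= b < n_bins:  # ignore spikes outside the trial
            counts[b] += 1
    return counts


def cv_per_window(trials, trial_ms=6500, window_ms=500):
    """CV (std / mean across trials) for each time window of one neuron."""
    binned = [bin_spike_counts(tr, trial_ms, window_ms) for tr in trials]
    cvs = []
    for window_counts in zip(*binned):  # one tuple of counts per window
        mu = mean(window_counts)
        cvs.append(stdev(window_counts) / mu if mu > 0 else float("nan"))
    return cvs


def normalize_by_baseline(cvs, window_ms=500, baseline_ms=1000):
    """Divide every CV by the mean CV of the windows within 0..baseline_ms."""
    n_base = max(1, baseline_ms // window_ms)
    baseline = mean(cvs[:n_base])
    if baseline == 0 or baseline != baseline:  # zero or NaN baseline
        raise ValueError("baseline CV is zero or undefined")
    return [cv / baseline for cv in cvs]


# Hypothetical spike times (ms) of one neuron in three trials of 2000 ms.
trials = [
    [100, 300, 700, 1200, 1900],
    [50, 600, 800, 1100, 1300, 1600],
    [200, 400, 450, 900, 1700],
]
cvs = cv_per_window(trials, trial_ms=2000, window_ms=500)
normalized = normalize_by_baseline(cvs, window_ms=500, baseline_ms=1000)
print(normalized)
```

After normalization the baseline windows average to 1, so a value of 2 in a later window means the CV there is twice the baseline level.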
The normalized results are displayed in Figure 8 (100 ms time window) and Figure 9 (500 ms time window). From these data we see that the variability of the neurons rises just before the test picture and stays high until the reward has been given (Figures 8 and 9). In previous studies, by contrast, the variability was shown to drop at stimulus onset [1,2]. We also clustered these results (Figures 10, 11 and 12). Figure 10 presents the dendrogram for the 100 ms time window. From this dendrogram we can estimate the number of clusters present in our data. From these clusters we can see that a large number of neurons behave similarly and that there are a number of outliers. There are two large sets of neurons. In one set the coefficient of variation rises before the test picture and stays high until the monkey receives the reward. In the second set the coefficient of variation does not change much when the test picture is shown. One similarity shared by almost all neurons is that the coefficient of variation drops after the sample picture is shown. We did not identify neurons behaving as reported in previous publications [1,2].

[Figure: colour map of normalized values; x-axis: Time (ms), 0 to 6000; y-axis: Neuron, 1 to 58]

Figure 8. Using trials where monkey answered correctly (normalized data). The normalized results are plotted using time window of 100 ms.

[Figure: colour map of normalized values; x-axis: Time (ms), 0 to 6000; y-axis: Neuron, 1 to 58]

Figure 9. Using trials where monkey answered correctly (normalized data). The normalized results are plotted using time window of 500 ms.

[Figure: dendrogram; x-axis: Neuron]

Figure 10.
Dendrogram for normalized data (100 ms time window). The maximum number of clusters should be about 8.

[Figure: colour map of normalized values with neurons reordered by cluster; x-axis: Time (ms), 0 to 6000; y-axis: Neuron]

Figure 11. Using trials where monkey answered correctly. The normalized and clustered results are plotted using time window of 100 ms.

[Figure: colour map of normalized values with neurons reordered by cluster; x-axis: Time (ms), 0 to 6000; y-axis: Neuron]

Figure 12. Using trials where monkey answered correctly. The normalized and clustered results are plotted using time window of 500 ms.

Conclusion

We worked with data measured from a monkey’s brain during a task that tested short-term memory. We did data pre-processing, normalization and clustering. We found that the neurons behave differently from one another. For some neurons the coefficient of variation rises when the sample picture and the test picture are shown. For other neurons the coefficient of variation does not change much during the experiment. We did not identify neurons behaving as reported in previous publications [1,2].

References

[1] C. Hussar, T. Pasternak, PNAS 107(50), 2010, 21842-21847. http://www.pnas.org/lookup/doi/10.1073/pnas.1009956107
[2] M. M. Churchland, B. M. Yu, S. I. Ryu, G. Santhanam, K. V. Shenoy, The Journal of Neuroscience 26(14), 2006, 3697-3712. DOI:10.1523/JNEUROSCI.3762-05.2006
[3] http://en.wikipedia.org/wiki/Coefficient_of_variation
[4] K. Martšenko, “Using Machine Learning to Analyze Brain Activity During a Short-Term Memory Task”, Bachelor’s Thesis, Tartu, 2014.
http://comserv.cs.ut.ee/forms/ati_report/datasheet.php?id=41081&year=2014
[5] http://apacgemba7.wikidot.com/statistics:variance-standard-deviation-and-coefficient
[6] http://www.mathworks.se/help/stats/hierarchical-clustering.html