Moving in time: Bayesian causal inference explains movement coordination to auditory beats

Mark T. Elliott*, Alan M. Wing, Andrew E. Welchman
School of Psychology, University of Birmingham, Edgbaston, UK, B15 2TT
*Corresponding Author: Email: m.t.elliott@bham.ac.uk; Tel: 0121 414 7260; Fax: 0121 414 4897

Supplementary Information A: Bayesian Inference Model Derivation and Simulation Description

In the experiment, participants were asked to synchronise to auditory metronomic cues. Cues were formed of two independent metronomes (A and B) with equal underlying tempo, but with B offset in phase (ϕ) relative to A. The metronomes also varied in temporal reliability: each onset was perturbed by a random value sampled from a distribution with standard deviation σjittA and σjittB, respectively.

We model this task using an observer who must synchronise their movements to the rhythmic auditory cues presented to them. From the observer’s point of view, each set of cues consists of two discrete tones of different pitch (sA and sB). The observer must estimate the onset of the underlying beat formed by the auditory cues, in the context of the previous cue onsets, so that they can make movements in synchrony with those beats. They do this using a causal inference process based on (i) the likelihood of the onsets of the two auditory cues, whose true onset times are corrupted by sensory noise, and (ii) the prior expectation of where the beat will occur, based on the previous beat onset estimate. The causal inference process allows the observer to determine whether the two auditory cues form a single common beat, in which case the likelihoods of the two cues are combined with the prior to obtain the estimated onset time of that beat (ŝ). Alternatively, if the causal inference process indicates that the two auditory cues are independent, two beat onset times are estimated (ŝA, ŝB), each based on the prior and the likelihood of the corresponding cue onset.
Here, we formally derive the causal inference model (CI) that we fitted to the experimental data. Subsequently, we show the alterations made to this model to derive the alternative models we tested: Causal Inference with phase-offset adaptation (CIPA), Mandatory Integration (MI) and Mandatory Separation (MS).

Generative model

We assume there are two scenarios: the observer registers either a single common beat (C=1) or two independent beats (C=2). C is determined by drawing from a binomial distribution, with p(C=1) = psingle, where psingle is the prior probability of the auditory cues forming a single beat [1]. For a single common beat we sample the onset of the beat, s, from a prior distribution, N(µp, σp), where µp is based on the estimated onset time of the previous beat. We then set sA=s and sB=s. For independent beats, we sample two beat onsets, sA and sB, from the same prior distribution. We assume the observer’s estimated onset times of the two auditory signals are corrupted by Gaussian noise. The observer’s estimated onset times, tA and tB, are therefore sampled from the Gaussian distributions N(sA, σA) and N(sB, σB), respectively.

Time Referencing

With a stream of isochronous beats, the mth beat would occur at mT seconds, where T is the interval between beats. For the model, we are only concerned with the relative timing of each beat estimate, so we define a relative time of 0 s at the point where each beat of a single isochronous metronome would occur. Onsets occurring before this point are deemed negative; later onsets are positive. All estimates of sA and sB, and the quantities derived from them, are centred on this relative time frame. The final asynchrony of the simulated observer’s response is also measured relative to this reference. We further use this reference to define the initial condition of the prior on where the next beat will occur, by starting with µp = 0.
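The generative model described above can be sketched in Python as follows. This is only an illustrative sketch, not the code used for the reported simulations: the function name `sample_trial`, its argument names and the use of NumPy are assumptions introduced here, and all times are taken to be in milliseconds on the relative time frame.

```python
import numpy as np


def sample_trial(rng, p_single, mu_p, sigma_p, sigma_a, sigma_b):
    """Draw one trial from the generative model.

    Returns (c, s_a, s_b, t_a, t_b), where c is the causal structure,
    s_a and s_b are the underlying beat onsets and t_a and t_b are the
    observer's noisy registrations of them.
    """
    # C = 1 (single common beat) with probability p_single, otherwise C = 2.
    c = 1 if rng.random() < p_single else 2
    if c == 1:
        # One onset drawn from the prior is shared by both cues.
        s = rng.normal(mu_p, sigma_p)
        s_a = s_b = s
    else:
        # Two onsets drawn independently from the same prior.
        s_a, s_b = rng.normal(mu_p, sigma_p, size=2)
    # Registrations are the onsets corrupted by Gaussian noise.
    t_a = rng.normal(s_a, sigma_a)
    t_b = rng.normal(s_b, sigma_b)
    return c, s_a, s_b, t_a, t_b
```

For example, `sample_trial(np.random.default_rng(0), 0.5, 0.0, 100.0, 17.0, 17.0)` draws a single trial with µp = 0; the numerical values in this call are placeholders rather than fitted parameters.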
Inference Calculations

First we determine the probability of a single common beat based on the estimated onset times tA, tB of the signals:

$$p(C=1 \mid t_A, t_B) = \frac{p(t_A, t_B \mid C=1)\, p(C=1)}{p(t_A, t_B)} \tag{S1}$$

As there are only two possibilities, a single beat (C=1) or two independent beats (C=2), and the overall probability must sum to one, the denominator can be expanded to give:

$$p(C=1 \mid t_A, t_B) = \frac{p(t_A, t_B \mid C=1)\, p_{single}}{p(t_A, t_B \mid C=1)\, p_{single} + p(t_A, t_B \mid C=2)\,(1 - p_{single})} \tag{S2}$$

Therefore, we need to calculate the likelihood function, p(tA,tB|C=1), which is given by:

$$p(t_A, t_B \mid C=1) = \int p(t_A, t_B \mid s)\, p(s)\, ds \tag{S3}$$

$$= \int p(t_A \mid s)\, p(t_B \mid s)\, p(s)\, ds \tag{S4}$$

All three terms in the integral are Gaussians and hence this can be solved analytically:

$$p(t_A, t_B \mid C=1) = \frac{1}{2\pi\sqrt{\sigma_A^2\sigma_B^2 + \sigma_B^2\sigma_p^2 + \sigma_A^2\sigma_p^2}} \exp\left[-\frac{1}{2}\, \frac{(t_B - t_A)^2\sigma_p^2 + (t_B - \mu_p)^2\sigma_A^2 + (t_A - \mu_p)^2\sigma_B^2}{\sigma_A^2\sigma_B^2 + \sigma_B^2\sigma_p^2 + \sigma_A^2\sigma_p^2}\right] \tag{S5}$$

For the condition where the signals are treated independently (C=2), we get:

$$p(t_A, t_B \mid C=2) = \iint p(t_A \mid s_A)\, p(t_B \mid s_B)\, p(s_A, s_B)\, ds_A\, ds_B \tag{S6}$$

$$= \int p(t_A \mid s_A)\, p(s_A)\, ds_A \int p(t_B \mid s_B)\, p(s_B)\, ds_B \tag{S7}$$

Similarly to S5, this can be solved analytically to get:

$$p(t_A, t_B \mid C=2) = \frac{1}{2\pi\sqrt{(\sigma_A^2 + \sigma_p^2)(\sigma_B^2 + \sigma_p^2)}} \exp\left[-\frac{1}{2}\left(\frac{(t_A - \mu_p)^2}{\sigma_A^2 + \sigma_p^2} + \frac{(t_B - \mu_p)^2}{\sigma_B^2 + \sigma_p^2}\right)\right] \tag{S8}$$

From equation S2, we consider the cues to form a single beat when p(C=1|tA,tB) > 0.5.

Optimal estimate of signal onset

Once the observer has inferred whether or not the signals form a single beat, they must estimate the onset of the beat or beats, based on the inference made. The optimal Bayesian estimate is defined generally as:

$$\hat{s}(m) = \arg\min_{\hat{s}} \int C(\hat{s} - s)\, p(s \mid m)\, ds \tag{S9}$$

where C(ŝ − s) is the cost function of the error between the estimate of the signal and the signal itself, m are the parameters describing the signal and p(s|m) is the posterior probability of the signal given m. Here, we aim to minimise the cost function that is the squared error between the signal and the estimate.
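The inference step in equations S2, S5 and S8 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names and argument names are hypothetical, all quantities are assumed to be in milliseconds on the relative time frame, and the sigma arguments are standard deviations.

```python
import math


def likelihood_common(t_a, t_b, mu_p, sigma_a, sigma_b, sigma_p):
    """p(tA, tB | C=1), following equation S5."""
    va, vb, vp = sigma_a ** 2, sigma_b ** 2, sigma_p ** 2
    denom = va * vb + vb * vp + va * vp
    quad = ((t_b - t_a) ** 2 * vp
            + (t_b - mu_p) ** 2 * va
            + (t_a - mu_p) ** 2 * vb) / denom
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(denom))


def likelihood_independent(t_a, t_b, mu_p, sigma_a, sigma_b, sigma_p):
    """p(tA, tB | C=2), following equation S8."""
    va_p = sigma_a ** 2 + sigma_p ** 2
    vb_p = sigma_b ** 2 + sigma_p ** 2
    quad = (t_a - mu_p) ** 2 / va_p + (t_b - mu_p) ** 2 / vb_p
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(va_p * vb_p))


def posterior_common(t_a, t_b, p_single, mu_p, sigma_a, sigma_b, sigma_p):
    """p(C=1 | tA, tB), following equation S2."""
    l1 = likelihood_common(t_a, t_b, mu_p, sigma_a, sigma_b, sigma_p)
    l2 = likelihood_independent(t_a, t_b, mu_p, sigma_a, sigma_b, sigma_p)
    return l1 * p_single / (l1 * p_single + l2 * (1.0 - p_single))


def is_single_beat(t_a, t_b, p_single, mu_p, sigma_a, sigma_b, sigma_p):
    """Decision rule: treat the cues as a single beat when the posterior exceeds 0.5."""
    posterior = posterior_common(t_a, t_b, p_single, mu_p,
                                 sigma_a, sigma_b, sigma_p)
    return posterior > 0.5
```

With illustrative values (psingle = 0.5, µp = 0, σA = σB = 17, σp = 100), two coincident registrations are judged a single beat, whereas registrations 150 ms apart are not. Setting `p_single` to 0 or 1 pins the posterior to 0 or 1 regardless of the registrations.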
Hence, the estimate of the signals is:

$$\hat{s}_{j,C=1} = \arg\min_{\hat{s}_j} \left[\int (\hat{s}_j - s_j)^2\, p(s_j \mid t_A, t_B, C=1)\, ds_j\right] \tag{S10}$$

when C=1 and where j = A or B. Similarly,

$$\hat{s}_{j,C=2} = \arg\min_{\hat{s}_j} \left[\int (\hat{s}_j - s_j)^2\, p(s_j \mid t_A, t_B, C=2)\, ds_j\right] \tag{S11}$$

when C=2. This results in the equivalent of calculating the mean of the posterior and, in this case where the posterior is Gaussian, is also equivalent to calculating the maximum a-posteriori (MAP) estimate.

For the condition when C=1, the MAP estimate can be calculated analytically, linearly weighting the cues and the prior according to their variances:

$$\hat{s}_{j,C=1} = \hat{s}_{A,C=1} = \hat{s}_{B,C=1} = \frac{\dfrac{t_A}{\sigma_A^2} + \dfrac{t_B}{\sigma_B^2} + \dfrac{\mu_p}{\sigma_p^2}}{\dfrac{1}{\sigma_A^2} + \dfrac{1}{\sigma_B^2} + \dfrac{1}{\sigma_p^2}}, \quad j = A \text{ or } B \tag{S12}$$

For independent beats (C=2), the registrations tA and tB are treated independently, but each is combined with the prior expectation of where the current beat should occur:

$$\hat{s}_{j,C=2} = \frac{\dfrac{t_j}{\sigma_j^2} + \dfrac{\mu_p}{\sigma_p^2}}{\dfrac{1}{\sigma_j^2} + \dfrac{1}{\sigma_p^2}}, \quad j = A \text{ or } B \tag{S13}$$

Hence, our overall estimate of the current beat(s) ŝj can be described by:

$$\hat{s}_j = \begin{cases} \hat{s}_{j,C=1}, & p(C=1 \mid t_A, t_B) > 0.5 \\ \hat{s}_{j,C=2}, & p(C=1 \mid t_A, t_B) \le 0.5 \end{cases} \quad j = A \text{ or } B \tag{S14}$$

Alternative model definitions

Adapting to the phase offset (CIPA)

In the experiment, we used a fixed phase offset, ϕ, to separate the onset times of the metronome cues, in addition to any deviations created by adding jitter. We therefore questioned whether participants would adapt to this phase offset and hence recalibrate their judgement of how far apart the cues needed to be before they were considered separate beats. To test this we modified the model, such that the observer has knowledge of the consistent phase offset between the cues and subsequently ignores it when inferring whether the cues form a single beat or independent beats.
This results in subtracting ϕ from tB in equation S5 to get:

$$p(t_A, t_B \mid C=1) = \frac{1}{2\pi\sqrt{\sigma_A^2\sigma_B^2 + \sigma_B^2\sigma_p^2 + \sigma_A^2\sigma_p^2}} \exp\left[-\frac{1}{2}\, \frac{(t_B - \phi - t_A)^2\sigma_p^2 + (t_B - \phi - \mu_p)^2\sigma_A^2 + (t_A - \mu_p)^2\sigma_B^2}{\sigma_A^2\sigma_B^2 + \sigma_B^2\sigma_p^2 + \sigma_A^2\sigma_p^2}\right] \tag{S15}$$

Similarly, we do the same to equation S8:

$$p(t_A, t_B \mid C=2) = \frac{1}{2\pi\sqrt{(\sigma_A^2 + \sigma_p^2)(\sigma_B^2 + \sigma_p^2)}} \exp\left[-\frac{1}{2}\left(\frac{(t_A - \mu_p)^2}{\sigma_A^2 + \sigma_p^2} + \frac{(t_B - \phi - \mu_p)^2}{\sigma_B^2 + \sigma_p^2}\right)\right] \tag{S16}$$

The causal inference is now based on the deviation of the signals after taking into account any constant phase offset between them. The estimated beat onset calculations (S9-S14) do not change, as they remain based on the actual sensory registrations tA and tB and the prior, µp.

Mandatory Integration (MI) / Mandatory Separation (MS)

We further tested the causal inference model against the simpler models of mandatory integration (MI), where all cues are integrated into a single beat estimate regardless of their statistics, and mandatory separation (MS), where all cues are treated as independent beats. To test these models, we fixed the previously free parameter, psingle, to zero for MS and to one for MI. This forces p(C=1|tA,tB) (see S2) to zero or one, respectively, such that for MS all estimates are calculated using equation S13 only and for MI all estimates are calculated using equation S12 only.

Simulation of experimental conditions

It was not possible to calculate the distributions of ŝ analytically, owing to the non-linearity created by the model selection process shown in S14 [1]. Instead, we simulated the experimental conditions we tested, with an observer using each of the models described above to estimate the signal onsets. Additionally, we added motor noise and an anticipation effect (a negative motor delay) to calculate the observer’s asynchrony to the actual beat, which allowed a direct comparison of the simulated results with the empirical asynchrony distributions.

We generated 2,000 simulated signal pairs (sA and sB) for each of the experimental conditions.
The signals were separated by phase offsets of 0, 50, 100 or 150 ms, with sA always occurring at 0 and sB at the offset time. The observer estimated the onset times of the underlying signals sA and sB as tA and tB, respectively. The uncertainties in these estimates, σA and σB, were defined by the temporal jitter applied (as in the experiment) and the observer’s sensory noise:

$$\sigma_A^2 = \sigma_{sens}^2 + \sigma_{jittA}^2, \qquad \sigma_B^2 = \sigma_{sens}^2 + \sigma_{jittB}^2 \tag{S17}$$

We used a fixed sensory registration noise value of σsens = 17 ms, estimated from the data of a previous study in which participants completed a similar task, synchronising movements to multisensory metronomes [2]. The values of temporal jitter were set to match the experimental conditions: {0,0 ms}, {10,50 ms} and {50,10 ms} for σjittA and σjittB, respectively. Hence, there were 12 simulations per participant for each model (four phase offsets and three jitter conditions).

Each simulation step produced an estimate of either a single common beat onset (C=1) or two independent beat onsets (C=2). In the latter scenario, the observer must choose which beat to target with their movement, either ŝA or ŝB. From the experimental data, we found that when the signals were equally reliable (jitter {0,0 ms}), participants did not target A and B equally (as would be expected statistically), but tended to be biased towards A over B, or vice versa. We therefore added a free parameter, β, fitted only to the condition with jitter {0,0 ms} and phase offset 150 ms. β determined the bias towards metronome A and was subsequently used to split the simulation sample, such that a proportion β of the 2,000 samples selected ŝA, and the remainder ŝB, when the signals were deemed independent.

In practice, participants plan their finger tap to coincide with the next beat onset, based on their knowledge of the current beat. This results in a commonly observed ‘anticipation effect’ of a negative asynchrony between the tap and the beat onset [3].
In the simulation we added motor noise, sampled from a Gaussian distribution with mean zero and standard deviation derived from the participant's estimated motor variance (σM²; see Supplementary Information B), and fitted a negative asynchrony offset (d; free parameter) to the current beat estimate, ŝ. This simplification did not affect the validity of the model output relative to the empirical data, as we were interested in the overall distribution of asynchronies. Hence, it does not matter whether we align the finger movement to the current or the next beat, as long as the underlying cue estimates are valid.

Finally, the estimated onset of the beat, ŝ, was carried over to the next simulation step and used to update the prior, such that µp(m) = ŝ(m-1), where m is the mth simulation step. This represents the observer updating their ‘timekeeper’ on each finger tap [4,5], to maintain an expectation of when the next beat will occur. We defined the initial condition of the prior to be µp = sA for the portion of the simulation in which the observer is set to target sA in the C=2 scenarios, and µp = sB for the remaining portion, in which the observer targets sB in the C=2 scenarios.

Note: a dataset of simulated asynchronies is available, along with the corresponding experimentally measured asynchronies, for each condition. See doi:10.5061/dryad.m5k62.

Parameters

Below we provide a summary of the parameters used in the model and subsequent simulation.

| parameter | description | type |
|---|---|---|
| C | Denotes the two possible scenarios: C=1, the signals define a single common beat; C=2, the signals define two independent beats. | Model outcome |
| psingle | Prior probability of the signals defining a common single beat (p(C=1)). | Free parameter |
| sA, sB | Underlying beat onset times of signals A and B. | Experimentally defined |
| ŝ | The observer’s final estimate of the underlying beat. For C=2, the observer estimates ŝA and ŝB, but targets only ŝA or ŝB to get ŝ. | Model outcome |
| µp | Mean of the prior expectation of the current beat onset time. µp(m) = ŝ(m-1), where m is the mth simulation step. | Model outcome |
| σp | Uncertainty in the prior expectation of the current beat onset time. | Free parameter |
| tA, tB | Observer’s sensory registration of signals A and B. | Model outcome |
| σA, σB | Uncertainty in the sensory estimates of signals A and B. Made up of σjittA/B and σsens. | Experimentally defined |
| σjittA, σjittB | Standard deviations of the distributions from which the experimentally manipulated temporal jitter is sampled. | Experimentally defined |
| σsens | The observer’s sensory noise. | Experimentally defined |
| ϕ | The consistent phase offset between signals, which the observer adapts to in the CIPA model. | Experimentally defined |
| β | The proportion of simulation samples for which the observer chooses ŝA as their estimate of ŝ when the signals are deemed independent (C=2). | Free parameter |
| σM² | Motor variance due to producing the finger tap movement. | Experimentally defined (participant specific) |
| d | Negative delay time. Represents the anticipation effect resulting in a negative mean asynchrony in finger tapping tasks. | Free parameter |

Table S1: Summary table of all parameters used in the simulation of an observer using Bayesian causal inference to synchronise their movements to rhythmic timing cues.

Free parameter values

We fitted three parameters (psingle, σp and d) to each participant's data for each condition. In addition, a fourth parameter, β, was fitted to each participant’s experimental data from the condition in which the phase offset was 150 ms and the jitter was {0,0 ms}. Based on the fitted values, we generated the distributions of timing errors (after adding motor noise) and were then able to compare the simulated results with the experimental results. Below we show the mean fitted values, across participants, for the CIPA model (which showed the best fit to participants’ data).
We checked that each of the free parameters contributed independently to the model by testing for correlations between their fitted values. We found no significant correlation between the parameters across participants and conditions (Table S3), consistent with each parameter making its own contribution to the model fit.

| free parameter | offset (ms) | jitter 0,0 | jitter 10,50 | jitter 50,10 |
|---|---|---|---|---|
| d (ms) | 0 | 40 | 49 | 56 |
| d (ms) | 50 | 55 | 40 | 64 |
| d (ms) | 100 | 59 | 31 | 54 |
| d (ms) | 150 | 35 | 40 | 40 |
| psingle | 0 | 0.43 | 0.59 | 0.53 |
| psingle | 50 | 0.46 | 0.51 | 0.59 |
| psingle | 100 | 0.31 | 0.37 | 0.25 |
| psingle | 150 | 0.11 | 0.18 | 0.20 |
| σp (ms) | 0 | 188 | 233 | 85 |
| σp (ms) | 50 | 232 | 217 | 101 |
| σp (ms) | 100 | 147 | 138 | 123 |
| σp (ms) | 150 | 109 | 153 | 68 |
| β | 150 | 0.60 | | |

Table S2: Free parameter values (means across participants, fitted to the CIPA model). Jitter columns give the jitter applied to metronomes A and B (A, B; ms).

| free parameter | d | psingle | σp |
|---|---|---|---|
| psingle | .15 | | |
| σp | -.04 | .14 | |
| β* | .18 | .56 | -.26 |

Table S3: Correlation coefficients between free parameters (from the CIPA model), collapsed across participants and conditions. No correlation reached significance (all p > .05). *The bias towards metronome A (β) was fitted only in the condition offset = 150 ms, jitter = {0,0 ms}, so its correlations are with the parameter values from that condition only.

Supplementary Information B: Procedure for estimating motor variance

The causal inference model we have developed is primarily concerned with the processes of sensory estimation. However, the production of synchronised finger taps involves additional stages of processing that each introduce variability, in the form of 'timekeeper' and motor noise [4,5]. We took measurements from each participant in order to estimate the contributions of these additional sources of variability. To this end, participants completed a synchronisation-continuation tapping task at the beginning and end of the experiment. Participants tapped in synchrony with a metronome (isochronous, period 500 ms) presented for five beats and then continued to tap their finger at the same tempo for another 60 seconds without the metronome. To analyse these data, the inter-response intervals (IRIs) between successive tap onsets were calculated.
The lag-0 and lag-1 autocovariances of the IRIs were measured along the sequence using a sliding window technique (window size = 40 intervals, sliding in steps of 5 intervals). Based on the Wing-Kristofferson model [6], we thereby estimated the motor variance (σM²) and the timekeeper variance (σT²):

$$\gamma_I(0) = \sigma_T^2 + 2\sigma_M^2 \tag{S18}$$

$$\gamma_I(1) = -\sigma_M^2 \tag{S19}$$

where γI(k) is the lag-k autocovariance of the IRIs. We took the median of the timekeeper and motor variances calculated within each window, and then took the mean of these values across repetitions of these trials for each participant. Given that timekeeper and motor variance are independent of the variability associated with sensory processing of the stimuli, we simplify our treatment by combining these two (statistically independent) terms into a single term that we refer to as the motor variance.

References

1. Körding, K. P., Beierholm, U., Ma, W. J., Quartz, S., Tenenbaum, J. B. & Shams, L. 2007 Causal inference in multisensory perception. PLoS ONE 2, e943. (PMCID: PMC1978520)
2. Elliott, M. T., Wing, A. M. & Welchman, A. E. 2010 Multisensory cues improve sensorimotor synchronisation. Eur. J. Neurosci. 31, 1828–1835. (doi:10.1111/j.1460-9568.2010.07205.x)
3. Aschersleben, G. & Prinz, W. 1995 Synchronizing actions with events: the role of sensory information. Percept. Psychophys. 57, 305–317.
4. Vorberg, D. & Wing, A. M. 1996 Modeling variability and dependence in timing. In Handbook of perception and action, pp. 181–262. London: Academic Press.
5. Vorberg, D. & Schulze, H. H. 2002 Linear phase-correction in synchronization: predictions, parameter estimation, and simulations. J. Math. Psychol. 46, 56–87.
6. Wing, A. M. & Kristofferson, A. B. 1973 Response delays and the timing of discrete motor responses. Percept. Psychophys. 14, 5–12.
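Illustrative sketch for Supplementary Information B. The variance estimates in equations S18 and S19 could be computed along the following lines. This is a sketch under stated assumptions rather than the analysis code used for the reported values: the function names are hypothetical, the autocovariance uses the biased (divide by n) estimator, which the text does not specify, and the input sequence is assumed to contain at least `window` intervals.

```python
import numpy as np


def lag_autocovariance(x, k):
    """Biased lag-k autocovariance of a 1-D sequence (assumed estimator)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    n = len(x)
    return float(np.dot(d[:n - k], d[k:]) / n)


def wing_kristofferson(iris):
    """Return (timekeeper_variance, motor_variance) from inter-response intervals."""
    g0 = lag_autocovariance(iris, 0)
    g1 = lag_autocovariance(iris, 1)
    motor_var = -g1                     # S19: gamma_I(1) = -sigma_M^2
    timekeeper_var = g0 - 2.0 * motor_var   # S18 rearranged for sigma_T^2
    return timekeeper_var, motor_var


def windowed_estimates(iris, window=40, step=5):
    """Median of the per-window estimates from a sliding window over the IRIs."""
    iris = np.asarray(iris, dtype=float)
    starts = range(0, len(iris) - window + 1, step)
    estimates = [wing_kristofferson(iris[i:i + window]) for i in starts]
    timekeeper, motor = zip(*estimates)
    return float(np.median(timekeeper)), float(np.median(motor))
```

Note that with short windows the lag-1 autocovariance can come out positive by chance, which would give a negative motor variance estimate for that window; how such windows were handled is not stated in the text and is not addressed in this sketch.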