Bayesian binning beats approximate alternatives: estimating peristimulus time histograms

Dominik Endres, Mike Oram, Johannes Schindelin and Peter Földiák
School of Psychology, University of St. Andrews, KY16 9JP, UK
{dme2,mwo,js108,pf2}@st-andrews.ac.uk

Abstract

The peristimulus time histogram (PSTH) and its more continuous cousin, the spike density function (SDF), are staples in the analytic toolkit of neurophysiologists. The former is usually obtained by binning spike trains, whereas the standard method for the latter is smoothing with a Gaussian kernel. Selection of a bin width or a kernel size is often done in a relatively arbitrary fashion, even though there have been recent attempts to remedy this situation [1, 2]. We develop an exact Bayesian, generative model approach to estimating PSTHs and demonstrate its superiority to competing methods. Further advantages of our scheme include automatic complexity control and error bars on its predictions.

1 Introduction

Plotting a peristimulus time histogram (PSTH), or a spike density function (SDF), from spike trains evoked by and aligned to a stimulus onset is often one of the first steps in the analysis of neurophysiological data. It is an easy way of visualizing certain characteristics of the neural response, such as instantaneous firing rates (or firing probabilities), latencies and response offsets. These measures also implicitly represent a model of the neuron's response as a function of time and are important parts of their functional description. Yet PSTHs are frequently constructed in an unsystematic manner, e.g. the choice of time bin size is driven by result expectations as much as by the data. Recently, there have been more principled approaches to the problem of determining the appropriate temporal resolution [1, 2]. We develop an exact Bayesian solution, apply it to real neural data and demonstrate its superiority to competing methods. Note that we in no way claim that a PSTH is a complete generative description of spiking neurons. We are merely concerned with inferring, in a Bayes-optimal way, that part of the generative process which can be described by a PSTH.

2 The model

Suppose we wanted to model a PSTH on $[t_{min}, t_{max}]$, which we discretize into $T$ contiguous intervals of duration $\Delta t = (t_{max} - t_{min})/T$ (see fig. 1, left). We select a discretization fine enough that we will not observe more than one spike in a $\Delta t$ interval for any given spike train. This can be achieved easily by choosing a $\Delta t$ shorter than the absolute refractory period of the neuron under investigation. Spike train $i$ can then be represented by a binary vector $z^i$ of dimensionality $T$. We model the PSTH by $M+1$ contiguous, non-overlapping bins having inclusive upper boundaries $k_m$, within which the firing probability $P(spike|t \in (t_{min} + \Delta t(k_{m-1}+1), t_{min} + \Delta t(k_m+1)]) = f_m$ is constant. $M$ is the number of bin boundaries inside $[t_{min}, t_{max}]$.

[Figure 1 appears here. Its left panel shows an example spike train $z^i = [1,0,0,1,1,1,0,1,0,1,0,0,1,0]$ above the discretized time axis, and a piecewise-constant $P(spike|t)$ with rates $f_0,\ldots,f_3$ and boundaries $k_0, k_1, k_2, k_3 = T-1$; its right panel diagrams the core iteration in terms of subE$_{m-1}[\cdot]$, subE$_m[\cdot]$ and getIEC terms.]

Figure 1: Left: Top: A spike train, recorded between times $t_{min}$ and $t_{max}$, is represented by a binary vector $z^i$. Bottom: The time span between $t_{min}$ and $t_{max}$ is discretized into $T$ intervals of duration $\Delta t = (t_{max} - t_{min})/T$, such that interval $k$ lasts from $k\Delta t + t_{min}$ to $(k+1)\Delta t + t_{min}$. $\Delta t$ is chosen such that at most one spike is observed per $\Delta t$ interval for any given spike train. Then, we model the firing probabilities $P(spike|t)$ by $M+1 = 4$ contiguous, non-overlapping bins ($M$ is the number of bin boundaries inside the time span $[t_{min}, t_{max}]$), having inclusive upper boundaries $k_m$ and $P(spike|t \in (t_{min} + \Delta t(k_{m-1}+1), t_{min} + \Delta t(k_m+1)]) = f_m$. Right: The core iteration. To compute the evidence contribution subE$_m[T-1]$ of a model with a bin boundary at $T-1$ and $m$ bin boundaries prior to $T-1$, we sum over all evidence contributions of models with a bin boundary at $k$ and $m-1$ bin boundaries prior to $k$, where $k \ge m-1$, because $m$ bin boundaries must occupy at least time intervals $0,\ldots,m-1$. This takes $O(T)$ operations. Repeat the procedure to obtain subE$_m[T-2],\ldots,$ subE$_m[m]$. Since we expect $T \gg m$, computing all subE$_m[k]$ given subE$_{m-1}[k]$ requires $O(T^2)$ operations. For details, see text.
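The discretization just described is straightforward to implement. The following is a minimal sketch (our own illustration in Python/NumPy; function and variable names are hypothetical) of how a list of spike times is turned into the binary vector $z^i$ of fig. 1, left.

    import numpy as np

    def discretize(spike_times_ms, t_min, t_max, T):
        # Turn a list of spike times (ms) into a binary vector z of length T.
        # Each of the T contiguous intervals of duration dt = (t_max - t_min)/T
        # is set to 1 if it contains a spike; dt is assumed shorter than the
        # absolute refractory period, so no interval holds more than one spike.
        dt = (t_max - t_min) / T
        z = np.zeros(T, dtype=np.uint8)
        for t in spike_times_ms:
            if t_min <= t < t_max:
                z[int((t - t_min) / dt)] = 1
        return z

    # e.g. one spike train, 900 intervals of 1 ms covering [-300, 600) ms
    z = discretize([12.3, 45.1, 46.9], t_min=-300.0, t_max=600.0, T=900)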
The probability of a spike train $z^i$ of independent spikes/gaps is then
$$P(z^i|\{f_m\},\{k_m\},M) = \prod_{m=0}^{M} f_m^{s(z^i,m)} (1-f_m)^{g(z^i,m)} \quad (1)$$
where $s(z^i,m)$ is the number of spikes and $g(z^i,m)$ is the number of non-spikes, or gaps, in spike train $z^i$ in bin $m$, i.e. between intervals $k_{m-1}+1$ and $k_m$ (both inclusive). In other words, we model the spike trains by an inhomogeneous Bernoulli process with piecewise constant probabilities. We also define $k_{-1} = -1$ and $k_M = T-1$. Note that there is no binomial factor associated with the contribution of each bin, because we do not want to ignore the spike timing information within the bins; rather, we try to build a simplified generative model of the spike train. Therefore, the probability of a (multi)set of spike trains $\{z^i\} = \{z^1,\ldots,z^N\}$, assuming independent generation, is
$$P(\{z^i\}|\{f_m\},\{k_m\},M) = \prod_{i=1}^{N}\prod_{m=0}^{M} f_m^{s(z^i,m)}(1-f_m)^{g(z^i,m)} = \prod_{m=0}^{M} f_m^{s(\{z^i\},m)}(1-f_m)^{g(\{z^i\},m)} \quad (2)$$
where $s(\{z^i\},m) = \sum_{i=1}^{N} s(z^i,m)$ and $g(\{z^i\},m) = \sum_{i=1}^{N} g(z^i,m)$.

2.1 The priors

We will make a non-informative prior assumption for $p(\{f_m\},\{k_m\})$, namely
$$p(\{f_m\},\{k_m\}|M) = p(\{f_m\}|M)\,P(\{k_m\}|M) \quad (3)$$
i.e. we have no a priori preferences for the firing rates based on the bin boundary positions. Note that the prior of the $f_m$, being continuous model parameters, is a density. Given the form of eqn. (1) and the constraint $f_m \in [0,1]$, it is natural to choose a conjugate prior
$$p(\{f_m\}|M) = \prod_{m=0}^{M} \mathcal{B}(f_m;\alpha_m,\beta_m). \quad (4)$$
The Beta density is defined in the usual way [3]:
$$\mathcal{B}(p;\alpha,\beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, p^{\alpha-1}(1-p)^{\beta-1}. \quad (5)$$
There are only finitely many configurations of the $k_m$. Assuming we have no preferences for any of them, the prior for the bin boundaries becomes
$$P(\{k_m\}|M) = \binom{T-1}{M}^{-1} \quad (6)$$
where the denominator is just the number of ways in which $M$ ordered bin boundaries can be distributed across $T-1$ places (bin boundary $M$ always occupies position $T-1$, see fig. 1, left, hence there are only $T-1$ positions left).
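To make the generative model concrete, here is a short sketch (ours; names are hypothetical) that draws $\{k_m\}$ and $\{f_m\}$ from the priors of eqns (4)-(6) and then samples spike trains according to eqn. (2).

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_psth_and_trains(T, M, alpha, beta, n_trains):
        # Sample {k_m}, {f_m} from the priors, then spike trains from eqn (2).
        # M ordered interior boundaries among positions 0..T-2; k_M = T-1 is fixed.
        k = np.sort(rng.choice(T - 1, size=M, replace=False))
        k = np.append(k, T - 1)
        f = rng.beta(alpha, beta, size=M + 1)      # one firing rate per bin
        # expand the piecewise-constant rates to one probability per interval
        p = np.empty(T)
        lo = 0
        for m in range(M + 1):
            p[lo:k[m] + 1] = f[m]
            lo = k[m] + 1
        trains = rng.random((n_trains, T)) < p     # independent Bernoulli draws
        return k, f, trains.astype(np.uint8)

    k, f, trains = sample_psth_and_trains(T=900, M=5, alpha=1.0, beta=32.0, n_trains=32)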
3 Computing the evidence $P(\{z^i\}|M)$

To calculate quantities of interest for a given $M$, e.g. predicted firing probabilities and their variances or expected bin boundary positions, we need to compute averages over the posterior
$$p(\{f_m\},\{k_m\}|M,\{z^i\}) = \frac{p(\{z^i\},\{f_m\},\{k_m\}|M)}{P(\{z^i\}|M)} \quad (7)$$
which requires the evaluation of the evidence, or marginal likelihood, of a model with $M$ bin boundaries:
$$P(\{z^i\}|M) = \sum_{k_{M-1}=M-1}^{T-2}\; \sum_{k_{M-2}=M-2}^{k_{M-1}-1} \cdots \sum_{k_0=0}^{k_1-1} P(\{z^i\}|\{k_m\},M)\,P(\{k_m\}|M) \quad (8)$$
where the summation boundaries are chosen such that the bins are non-overlapping and contiguous, and
$$P(\{z^i\}|\{k_m\},M) = \int_0^1 df_0 \int_0^1 df_1 \cdots \int_0^1 df_M\; P(\{z^i\}|\{f_m\},\{k_m\},M)\,p(\{f_m\}|M). \quad (9)$$
By virtue of eqn. (2) and eqn. (4), the integrals can be evaluated:
$$P(\{z^i\}|\{k_m\},M) = \prod_{m=0}^{M} \frac{\Gamma(\alpha_m+\beta_m)}{\Gamma(\alpha_m)\Gamma(\beta_m)} \prod_{m=0}^{M} \frac{\Gamma(s(\{z^i\},m)+\alpha_m)\,\Gamma(g(\{z^i\},m)+\beta_m)}{\Gamma(s(\{z^i\},m)+\alpha_m+g(\{z^i\},m)+\beta_m)}. \quad (10)$$
Computing the sums in eqn. (8) quickly is a little tricky. A naïve approach would suggest that a computational effort of $O(T^M)$ is required. However, because eqn. (10) is a product with one factor per bin, and because each factor depends only on spike/gap counts and prior parameters in that bin, the process can be expedited. We will use an approach very similar to that described in [4, 5] in the context of density estimation and in [6] for Bayesian function approximation: define the function
$$\mathrm{getIEC}(k_s,k_e,m) := \frac{\Gamma(s(\{z^i\},k_s,k_e)+\alpha_m)\,\Gamma(g(\{z^i\},k_s,k_e)+\beta_m)}{\Gamma(s(\{z^i\},k_s,k_e)+\alpha_m+g(\{z^i\},k_s,k_e)+\beta_m)} \quad (11)$$
where $s(\{z^i\},k_s,k_e)$ is the number of spikes and $g(\{z^i\},k_s,k_e)$ is the number of gaps in $\{z^i\}$ between the start interval $k_s$ and the end interval $k_e$ (both included). Furthermore, collect all contributions to eqn. (8) that do not depend on the data (i.e. $\{z^i\}$) and store them in the array prior$[M]$:
$$\mathrm{prior}[M] := \frac{\prod_{m=0}^{M}\frac{\Gamma(\alpha_m+\beta_m)}{\Gamma(\alpha_m)\Gamma(\beta_m)}}{\binom{T-1}{M}}. \quad (12)$$
Substituting eqn. (10) into eqn. (8) and using the definitions (11) and (12), we obtain
$$P(\{z^i\}|M) = \mathrm{prior}[M] \times \sum_{k_{M-1}=M-1}^{T-2} \mathrm{getIEC}(k_{M-1}+1,T-1,M) \times \underbrace{\sum_{k_{M-2}=M-2}^{k_{M-1}-1} \cdots \sum_{k_0=0}^{k_1-1} \prod_{m=0}^{M-2} \mathrm{getIEC}(k_m+1,k_{m+1},m+1)\;\mathrm{getIEC}(0,k_0,0)}_{=:\ \mathrm{subE}_{M-1}[k_{M-1}]}$$
$$= \mathrm{prior}[M] \times \sum_{k_{M-1}=M-1}^{T-2} \mathrm{getIEC}(k_{M-1}+1,T-1,M)\; \mathrm{subE}_{M-1}[k_{M-1}] \quad (13)$$
i.e. subE$_m[k]$ contains (up to a factor prior$[m]$) the evidence for a model with a bin boundary at $k$ and $m$ bin boundaries prior to $k$. Thus, if we had already computed the contents of the array subE$_{M-1}[k]$, we could evaluate $P(\{z^i\}|M)$ in $O(T)$ operations (assuming that $T \gg M$, which is likely to be the case in real-world applications). But calculating one of the $O(T)$ entries in this array also requires $O(T)$ operations if subE$_{M-2}[k]$ were available, i.e. going from $M-1$ to $M$ involves $O(T^2)$ operations (see fig. 1, right). After initializing
$$\mathrm{subE}_0[k] := \mathrm{getIEC}(0,k,0) \quad (14)$$
$P(\{z^i\}|M)$ can thus be computed in $O(MT^2)$ operations, rather than $O(T^M)$. A close look at eqn. (13) reveals that while we iterate from level $M-1$ to level $M$, we need subE$_{M-1}[k]$ for $k = M-1,\ldots,T-2$ to compute the evidence of a model with its latest boundary at $T-1$. We can, however, compute subE$_{M-1}[T-1]$ with little extra effort; this is, up to a factor prior$[M-1]$, equal to $P(\{z^i\}|M-1)$, i.e. the evidence for a model with $M-1$ bin boundaries. Moreover, having computed subE$_m[k]$, we no longer need subE$_{m-1}[k-1]$. Hence, the array subE$_{m-1}[k]$ can be reused to store subE$_m[k]$, if overwritten in reverse order. In pseudo-code (E$[m]$ contains the evidence of a model with $m$ bin boundaries inside $[t_{min},t_{max}]$ after termination):

Table 1: Computing the evidences of models with up to M bin boundaries
1. for k := 0 ... T-1: subE[k] := getIEC(0, k, 0)
2. E[0] := subE[T-1] × prior[0]
3. for m := 1 ... M:
   (a) if m = M then l := T-1 else l := m
   (b) for k := T-1 ... l: subE[k] := Σ_{r=m-1}^{k-1} subE[r] × getIEC(r+1, k, m)
   (c) E[m] := subE[T-1] × prior[m]
4. return E[]
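A direct Python transcription of Table 1 might look as follows. This is a sketch under simplifying assumptions (equal $\alpha_m = \alpha$, $\beta_m = \beta$ for all bins, evidences kept in linear space); the `extra_spike_at` argument anticipates eqn. (16) of section 4. Note that for realistic $T$ and many spike trains the products of Gamma-function ratios underflow double precision, so a practical implementation would rescale the subE array or work with extended-precision numbers.

    import numpy as np
    from scipy.special import betaln, gammaln

    def evidences(trains, M_max, alpha, beta, extra_spike_at=None):
        # trains: (N, T) binary array of spike trains z^i.
        # Returns E[0..M_max], where E[m] is the evidence of a model with m
        # bin boundaries inside [t_min, t_max] (Table 1).
        N, T = trains.shape
        # cum[k] = number of spikes in intervals 0..k-1, summed over trains
        cum = np.concatenate(([0], np.cumsum(trains.sum(axis=0))))

        def getIEC(ks, ke, m):                            # eqn (11)
            s = cum[ke + 1] - cum[ks]                     # spikes in ks..ke
            g = N * (ke - ks + 1) - s                     # gaps in ks..ke
            if extra_spike_at is not None and ks <= extra_spike_at <= ke:
                s += 1                                    # phantom spike, eqn (16)
            return np.exp(gammaln(s + alpha) + gammaln(g + beta)
                          - gammaln(s + alpha + g + beta))

        def log_binom(n, r):
            return gammaln(n + 1) - gammaln(r + 1) - gammaln(n - r + 1)

        def prior(m):                                     # eqn (12), shared alpha, beta
            return np.exp(-(m + 1) * betaln(alpha, beta) - log_binom(T - 1, m))

        subE = np.array([getIEC(0, k, 0) for k in range(T)])   # step 1, eqn (14)
        E = [subE[T - 1] * prior(0)]                           # step 2
        for m in range(1, M_max + 1):                          # step 3
            lo = T - 1 if m == M_max else m                    # step 3a
            for k in range(T - 1, lo - 1, -1):                 # reverse order, step 3b
                subE[k] = sum(subE[r] * getIEC(r + 1, k, m) for r in range(m - 1, k))
            E.append(subE[T - 1] * prior(m))                   # step 3c
        return np.array(E)                                     # step 4

For the data sizes used in this paper ($T$ of a few hundred, a few dozen spike trains) the nested loops realize the $O(MT^2)$ cost derived above.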
4 Predictive firing rates and variances

We will now calculate the predictive firing rate $P(spike|\tilde{k},\{z^i\},M)$. For a given configuration of $\{f_m\}$ and $\{k_m\}$, we can write
$$P(spike|\tilde{k},\{f_m\},\{k_m\},M) = \sum_{m=0}^{M} f_m \mathbf{1}(\tilde{k} \in \{k_{m-1}+1,\ldots,k_m\}) \quad (15)$$
where the indicator function $\mathbf{1}(x) = 1$ iff $x$ is true and 0 otherwise. Note that the probability of a spike given $\{k_m\}$ and $\{f_m\}$ does not depend on any observed data. Since the bins are non-overlapping, $\tilde{k} \in \{k_{m-1}+1,\ldots,k_m\}$ is true for exactly one summand, and $P(spike|\tilde{k},\{f_m\},\{k_m\},M)$ evaluates to the corresponding firing rate. To finish, we average eqn. (15) over the posterior, eqn. (7). The denominator of eqn. (7) is independent of $\{f_m\},\{k_m\}$ and is obtained by integrating/summing the numerator via the algorithm in table 1. Thus, we only need to multiply the integrand of eqn. (9) (i.e. the numerator of the posterior) by $P(spike|\tilde{k},\{f_m\},\{k_m\},M)$, thereby replacing eqn. (11) with
$$\mathrm{getIEC}(k_s,k_e,m) := \frac{\Gamma(s(\{z^i\},k_s,k_e)+\mathbf{1}(\tilde{k}\in\{k_s,\ldots,k_e\})+\alpha_m)\,\Gamma(g(\{z^i\},k_s,k_e)+\beta_m)}{\Gamma(s(\{z^i\},k_s,k_e)+\mathbf{1}(\tilde{k}\in\{k_s,\ldots,k_e\})+\alpha_m+g(\{z^i\},k_s,k_e)+\beta_m)} \quad (16)$$
i.e. we are adding an additional spike to the data at $\tilde{k}$. Call the array returned by this modified algorithm $\tilde{E}_{\tilde{k}}[\,]$. By virtue of eqn. (7) we then find $P(spike|\tilde{k},\{z^i\},M) = \tilde{E}_{\tilde{k}}[M]/E[M]$. To evaluate the variance, we need the posterior expectation of $f_m^2$. This can be computed by adding two spikes at $\tilde{k}$.
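Given the `evidences` function sketched after Table 1 (whose optional phantom-spike argument implements eqn. (16)), the predictive rate ratio could be computed as below. Again a sketch, not an optimized implementation: a practical version would reuse intermediate results across the different $\tilde{k}$ rather than rerunning the whole recursion.

    def predictive_rate(trains, k_tilde, M, alpha, beta):
        # P(spike | k_tilde, {z^i}, M) = E_tilde[M] / E[M]  (section 4)
        E = evidences(trains, M, alpha, beta)
        E_tilde = evidences(trains, M, alpha, beta, extra_spike_at=k_tilde)
        return E_tilde[M] / E[M]

    # a full PSTH at fixed M: one ratio per time interval
    # rates = [predictive_rate(trains, k, 6, 1.0, 32.0) for k in range(trains.shape[1])]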
5 Model selection vs. model averaging

To choose the best $M$ given $\{z^i\}$, or better, a probable range of $M$s, we need to determine the model posterior
$$P(M|\{z^i\}) = \frac{P(\{z^i\}|M)P(M)}{\sum_m P(\{z^i\}|m)P(m)} \quad (17)$$
where $P(M)$ is the prior over $M$, which we assume to be uniform. The sum in the denominator runs over all values of $m$ which we choose to include, at most $0 \le m \le T-1$. Once $P(M|\{z^i\})$ is evaluated, we could use it to select the most probable $M$. However, making this decision means 'contriving' information, namely that all of the posterior probability is concentrated at $M$. Thus we should rather average any predictions over all possible $M$, even if evaluating such an average has a computational cost of $O(T^3)$, since $M \le T-1$. If the structure of the data allows, it is possible, and useful given a large enough $T$, to reduce this cost by finding a range of $M$ such that the risk of excluding a model, even though it provides a good description of the data, is low. In analogy to the significance levels of orthodox statistics, we shall call this risk $\epsilon$. If the posterior of $M$ is unimodal (which it has been in most observed cases; see fig. 3, right, for an example), we can then choose the smallest interval of $M$s around the maximum of $P(M|\{z^i\})$ such that
$$P(M_{min} \le M \le M_{max}|\{z^i\}) \ge 1 - \epsilon \quad (18)$$
and carry out the averages over this range of $M$ after renormalizing the model posterior.
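In code, selecting such a range could look like the sketch below (ours; the greedy expansion is adequate when the posterior is unimodal, as assumed above). It takes the array E returned by the evidence algorithm of Table 1.

    import numpy as np

    def model_range(E, eps=0.1):
        # Posterior over M (eqn (17), uniform P(M)) and the smallest interval
        # around its maximum with mass >= 1 - eps (eqn (18)).
        post = np.asarray(E) / np.sum(E)
        lo = hi = int(np.argmax(post))
        while post[lo:hi + 1].sum() < 1.0 - eps:
            left = post[lo - 1] if lo > 0 else -1.0
            right = post[hi + 1] if hi + 1 < len(post) else -1.0
            if left >= right:
                lo -= 1                     # extend towards the larger mass
            else:
                hi += 1
        return lo, hi, post[lo:hi + 1] / post[lo:hi + 1].sum()  # renormalized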
6 Examples and comparison to other methods

6.1 Data acquisition

We obtained data through [7], where the experimental protocols have been described. Briefly, extracellular single-unit recordings were made using standard techniques from the upper and lower banks of the anterior part of the superior temporal sulcus (STSa) and the inferior temporal cortex (IT) of two monkeys (Macaca mulatta) performing a visual fixation task. Stimuli were presented for 333 ms followed by a 333 ms inter-stimulus interval in random order. The anterior-posterior extent of the recorded cells was from 7 mm to 9 mm anterior of the interaural plane, consistent with previous studies showing visual responses to static images in this region [8, 9, 10, 11]. The recorded cells were located in the upper bank (TAa, TPO), lower bank (TEa, TEm) and fundus (PGa, IPa) of STS and in the anterior areas of TE (AIT of [12]). These areas are rostral to FST and we collectively call them the anterior STS (STSa); see [13] for further discussion. The recorded firing patterns were turned into distinct samples, each of which contained the spikes from 300 ms before to 600 ms after the stimulus onset with a temporal resolution of 1 ms.

6.2 Inferring PSTHs

To see the method in action, we used it to infer a PSTH from 32 spike trains recorded from one of the available STSa neurons (see fig. 2, A). Spike times are relative to the stimulus onset. We discretized the interval from 100 ms pre-stimulus to 500 ms post-stimulus into $\Delta t = 1$ ms time intervals and computed the model posterior (eqn. (17)) (see fig. 3, right).

[Figure 2 appears here. Panel A is a raster plot of the 32 spike trains; panels B-D plot $P(spike)$ against time (ms after stimulus onset) for the three estimators described in the caption.]

Figure 2: Predicting a PSTH/SDF with 3 different methods. A: the dataset used in this comparison consisted of 32 spike trains recorded from a STSa neuron. Each tick mark represents a spike. B: PSTH inferred with our Bayesian binning method. The thick line represents the predictive firing rate (section 4), the thin lines show the predictive firing rate ±1 standard deviation. Models with $4 \le M \le 13$ were included on a risk level of $\epsilon = 0.1$ (see eqn. (18)). C: bar PSTH (solid lines), optimal bin size 26 ms, and line PSTH (dashed lines), optimal bin size 78 ms, computed by the methods described in [1, 2]. D: SDF obtained by smoothing the spike trains with a 10 ms Gaussian kernel.
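For reference, the conventional SDF of panel D can be produced along these lines (a sketch; kernel width and boundary handling are the usual free choices):

    import numpy as np
    from scipy.ndimage import gaussian_filter1d

    def gaussian_sdf(trains, sigma_ms=10.0, dt_ms=1.0):
        # average spike probability per interval, smoothed with a Gaussian
        # kernel of width sigma_ms (fig. 2, D)
        return gaussian_filter1d(trains.mean(axis=0), sigma=sigma_ms / dt_ms)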
The prior parameters were equal for all bins and set to $\alpha_m = 1$ and $\beta_m = 32$. This choice corresponds to a firing probability of 0.03 in each 1 ms time interval (30 spikes/s), which is typical for the neurons in this study.¹ Models with $4 \le M \le 13$ (expected bin sizes between 23 ms and 148 ms) were included on an $\epsilon = 0.1$ risk level (eqn. (18)) in the subsequent calculation of the predictive firing rate (i.e. the expected firing rate, hence the continuous appearance) and standard deviation (fig. 2, B). Fig. 2, C, shows a bar PSTH and a line PSTH computed with the recently developed methods described in [1, 2]. Roughly speaking, these methods try to optimize a compromise between minimal within-bin variance and maximal between-bin variance. In this example, the bar PSTH consists of 26 bins. Graph D in fig. 2 depicts a SDF obtained by smoothing the spike trains with a 10 ms wide Gaussian kernel, which is a standard way of calculating SDFs in the neurophysiological literature.

¹ Alternatively, one could search for the $\alpha_m, \beta_m$ which maximize $P(\{z^i\}|\alpha_m,\beta_m) = \sum_M P(\{z^i\}|M)P(M|\alpha_m,\beta_m)$, where $P(\{z^i\}|M)$ is given by eqn. (8). Using a uniform $P(M|\alpha_m,\beta_m)$, we found $\alpha_m \approx 2.3$ and $\beta_m \approx 37$ for the data in fig. 2, A.

All tested methods produce results which are, upon cursory visual inspection, largely consistent with the spike trains. However, Bayesian binning is better suited than Gaussian smoothing to model steep changes, such as the transient response starting at 100 ms. While the methods from [1, 2] share this advantage, they suffer from two drawbacks: firstly, the bin boundaries are evenly spaced, hence the peak of the transient is later than the scatterplots would suggest. Secondly, because the bin duration is the only parameter of the model, these methods are forced to put many bins even in intervals that are relatively constant, such as the baselines before and after the stimulus-driven response. In contrast, Bayesian binning, being able to put bin boundaries anywhere in the time span of interest, can model the data with fewer bins: the model posterior has its maximum at $M = 6$ (7 bins), whereas the bar PSTH consists of 26 bins.

6.3 Performance comparison

[Figure 3 appears here. Its left panel shows histograms of relative frequency against CV error relative to Bayesian binning for the three competing methods; its right panel plots the model posterior $P(M|\{z^i\})$ against $M$.]

Figure 3: Left: Comparison of Bayesian binning with competing methods by 5-fold crossvalidation. The CV error is the negative expected log-probability of the test data. The histograms show relative frequencies of CV error differences between 3 competing methods and our Bayesian binning approach. Gaussian: SDFs obtained by Gaussian smoothing of the spike trains with a 10 ms kernel. Bar PSTH and line PSTH: PSTHs computed by the binning methods described in [1, 2]. Right: Model posterior $P(M|\{z^i\})$ (see eqn. (17)) computed from the data shown in fig. 2. The shape is fairly typical for model posteriors computed from the neural data used in this paper: a sharp rise at a moderately low $M$ followed by a maximum (here at $M = 6$) and an approximately exponential decay. Even though a maximum $M$ of 699 would have been possible, $P(M > 23|\{z^i\}) < 0.001$. Thus, we can accelerate the averaging process for quantities of interest (e.g. the predictive firing rate, section 4) by choosing a moderately small maximum $M$.

For a more rigorous method comparison, we split the data into distinct sets, each of which contained the responses of a cell to a different stimulus. This procedure yielded 336 sets from 20 cells with at least 20 spike trains per set. We then performed 5-fold crossvalidation; the crossvalidation error is given by the negative expected log-probability of the data (spike or gap) in the test sets:
$$\mathrm{CV\ error} = -\langle \log(P(spike|t)) \rangle. \quad (19)$$
Thus, we measure how well the PSTHs predict the test data.
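Concretely, the test score could be computed as in this sketch (ours; the clipping constant and the averaging over all test intervals are our assumptions about the exact convention):

    import numpy as np

    def cv_error(p_spike, test_trains, tiny=1e-12):
        # eqn (19): negative expected log-probability of the test data.
        # p_spike[t] = predicted P(spike|t); test_trains: (N, T) binary array.
        p = np.clip(p_spike, tiny, 1.0 - tiny)      # guard against log(0)
        ll = test_trains * np.log(p) + (1 - test_trains) * np.log(1 - p)
        return -ll.mean()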
The Gaussian SDFs were discretized into 1 ms time intervals prior to the procedure. We averaged the CV error over the 5 estimates to obtain a single estimate for each of the 336 neuron/stimulus combinations. On average, the negative log-likelihood of our Bayesian approach on the test data (0.04556 ± 0.00029, mean ± SEM) was significantly better than that of any of the other methods (10 ms Gaussian kernel: 0.04654 ± 0.00028; bar PSTH: 0.04739 ± 0.00029; line PSTH: 0.04658 ± 0.00029). To directly compare the performance of different methods, we calculated the difference in the CV error for each neuron/stimulus combination. Here a positive value indicates that Bayesian binning predicts the test data more accurately than the alternative method. Fig. 3, left, shows the relative frequencies of CV error differences between the 3 other methods and our approach. Bayesian binning predicted the data better than the three other methods in at least 295/336 cases, with a minimal difference of −0.0008, indicating the general utility of this approach.

7 Summary

We have introduced an exact Bayesian binning method for the estimation of PSTHs. Besides treating uncertainty (a real problem with small neurophysiological datasets) in a principled fashion, it also outperforms competing methods on real neural data. It offers automatic complexity control because the model posterior can be evaluated. While its computational cost is significantly higher than that of the methods we compared it to, it is still fast enough to be useful: evaluating the predictive probability takes less than 1 s on a modern PC², with a small memory footprint (<10 MB for 512 spike trains). Moreover, our approach can easily be adapted to extract other characteristics of neural responses in a Bayesian way, e.g. response latencies or expected bin boundary positions. Our method reveals a clear and sharp initial response onset, a distinct transition from the transient to the sustained part of the response and a well-defined offset. An extension towards joint PSTHs from simultaneous multi-cell recordings is currently being implemented.

² 3.2 GHz Intel Xeon™, SuSE Linux 10.0

References

[1] H. Shimazaki and S. Shinomoto. A recipe for optimizing a time-histogram. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 1289-1296. MIT Press, Cambridge, MA, 2007.
[2] H. Shimazaki and S. Shinomoto. A method for selecting the bin size of a time histogram. Neural Computation, 19(6):1503-1527, 2007.
[3] J.O. Berger. Statistical Decision Theory and Bayesian Analysis. Springer, New York, 1985.
[4] D. Endres and P. Földiák. Bayesian bin distribution inference and mutual information. IEEE Transactions on Information Theory, 51(11), 2005.
[5] D. Endres. Bayesian and Information-Theoretic Tools for Neuroscience. PhD thesis, School of Psychology, University of St. Andrews, U.K., 2006. http://hdl.handle.net/10023/162.
[6] M. Hutter. Bayesian regression of piecewise constant functions. Technical Report arXiv:math/0606315v1, IDSIA-14-05, 2006.
[7] M.W. Oram, D. Xiao, B. Dritschel, and K.R. Payne. The temporal precision of neural signals: A unique role for response latency? Philosophical Transactions of the Royal Society, Series B, 357:987-1001, 2002.
[8] C.J. Bruce, R. Desimone, and C.G. Gross. Visual properties of neurons in a polysensory area in superior temporal sulcus of the macaque. Journal of Neurophysiology, 46:369-384, 1981.
[9] D.I. Perrett, E.T. Rolls, and W. Caan. Visual neurons responsive to faces in the monkey temporal cortex. Experimental Brain Research, 47:329-342, 1982.
[10] G.C. Baylis, E.T. Rolls, and C.M. Leonard. Functional subdivisions of the temporal lobe neocortex. Journal of Neuroscience, 7(2):330-342, 1987.
[11] M.W. Oram and D.I. Perrett. Time course of neural responses discriminating different views of the face and head. Journal of Neurophysiology, 68(1):70-84, 1992.
[12] K. Tanaka, H. Saito, Y. Fukada, and M. Moriya. Coding visual images of objects in the inferotemporal cortex of the macaque monkey. Journal of Neurophysiology, pages 170-189, 1991.
[13] N.E. Barraclough, D. Xiao, C.I. Baker, M.W. Oram, and D.I. Perrett. Integration of visual and auditory information by superior temporal sulcus neurons responsive to the sight of actions. Journal of Cognitive Neuroscience, 17, 2005.