A neural network implementing optimal state estimation based on dynamic spike train decoding

Omer Bobrowski(1), Ron Meir(1), Shy Shoham(2) and Yonina C. Eldar(1)
Department of Electrical Engineering(1) and Biomedical Engineering(2)
Technion, Haifa 32000, Israel
{bober@tx},{rmeir@ee},{sshoham@bm},{yonina@ee}.technion.ac.il

Neural Information Processing Systems, NIPS 21, 2007

Abstract

It is becoming increasingly evident that organisms acting in uncertain dynamical environments often employ exact or approximate Bayesian statistical calculations in order to continuously estimate the environmental state, integrate information from multiple sensory modalities, form predictions and choose actions. What is less clear is how these putative computations are implemented by cortical neural networks. An additional level of complexity is introduced because these networks observe the world through spike trains received from primary sensory afferents, rather than directly. A recent line of research has described mechanisms by which such computations can be implemented using a network of neurons whose activity directly represents a probability distribution across the possible "world states". Much of this work, however, uses various approximations, which severely restrict the domain of applicability of these implementations. Here we make use of rigorous mathematical results from the theory of continuous time point process filtering, and show how optimal real-time state estimation and prediction may be implemented in a general setting using linear neural networks. We demonstrate the applicability of the approach with several examples, and relate the required network properties to the statistical nature of the environment, thereby quantifying the compatibility of a given network with its environment.

1 Introduction

A key requirement of biological or artificial agents acting in a random dynamical environment is estimating the state of the environment based on noisy observations.
While it is becoming clear that organisms employ some form of Bayesian inference, it is not yet clear how the required computations may be implemented in networks of biological neurons. We consider the problem of a system receiving multiple state-dependent observations (possibly arising from different sensory modalities) in the form of spike trains, and construct a neural network which, based on these noisy observations, is able to optimally estimate the probability distribution of the hidden world state. The present work continues a line of research attempting to provide a probabilistic Bayesian framework for optimal dynamical state estimation by biological neural networks. In this framework, first formulated by Rao (e.g., [8, 9]), the time-varying probability distributions are represented in the neurons' activity patterns, while the network's connectivity structure and intrinsic dynamics are responsible for performing the required computation. Rao's networks use linear dynamics and discrete time to approximately compute the log-posterior distributions from noisy continuous inputs (rather than actual spike trains). More recently, Beck and Pouget [1] introduced networks in which the neurons directly represent and compute the posterior probabilities (rather than their logarithms) from discrete-time approximate firing rate inputs, using non-linear mechanisms such as multiplicative interactions and divisive normalization. Another relevant line of work is that of Brown and colleagues, as well as others (e.g., [4, 11, 13]), where approximations of optimal dynamical estimators from spike-train based inputs are calculated, without, however, addressing the question of neural implementation. Our approach is formulated within a continuous time point process framework, circumventing many of the difficulties encountered in previous work based on discrete time approximations and input smoothing.
Moreover, using tools from the theory of continuous time point process filtering (e.g., [3]), we are able to show that a linear system suffices to yield the exact posterior distribution for the state. The key element in the approach is switching from posterior distributions to a new set of functions which are simply non-normalized forms of the posterior distribution. While posterior distributions generally obey non-linear differential equations, these non-normalized functions obey a linear set of equations, known as the Zakai equations [15]. Intriguingly, these linear equations contain the full information required to reconstruct the optimal posterior distribution! The linearity of the system provides many advantages for interpretation and analysis, not least of which is an explicit solution, which illustrates the clear distinction between observation-dependent and observation-independent contributions. Such a separation leads to a characterization of the system performance in terms of prior knowledge and real-time observations. Since the input observations appear directly as spike trains, no temporal information is lost. The present formulation allows us to consider inputs arising from several sensory modalities, and to determine the contribution of each modality to the posterior estimate, thereby extending to the temporal domain previous work on optimal multimodal integration, which was mostly restricted to the static case. Inherent differences between the modalities, related to temporal delays and different shapes of tuning curves, can be incorporated and quantified within the formalism. In a historical context we note that a mathematically rigorous approach to point process based filtering was developed during the early 1970s following the seminal work of Wonham [14] for finite state Markov processes observed in Gaussian noise, and of Kushner [7] and Zakai [15] for diffusion processes.
One of the first papers presenting a mathematically rigorous approach to nonlinear filtering in continuous time based on point process observations was [12], where the exact nonlinear differential equations for the posterior distributions are derived. The presentation in Section 4 summarizes the main mathematical results initiated by the latter line of research, adapted mainly from [3], and serves as a convenient starting point for many possible extensions.

2 A neural network as an optimal filter

Consider a dynamic environment characterized at time t by a state X_t, belonging to a set of N states, namely X_t \in \{s_1, s_2, \ldots, s_N\}. We assume the state dynamics is Markovian with generator matrix Q. The matrix Q, [Q]_{ij} = q_{ij}, is defined [5] by requiring that for small values of h,

Pr[X_{t+h} = s_i | X_t = s_i] = 1 + q_{ii} h + o(h),
Pr[X_{t+h} = s_j | X_t = s_i] = q_{ij} h + o(h)   (j \neq i).

The normalization requirement is that \sum_j q_{ij} = 0. This matrix controls the process' infinitesimal progress according to \dot{\pi}(t) = \pi(t) Q, where \pi_i(t) = Pr[X_t = s_i]. The state X_t is not directly observable, but is only sensed through a set of M random state-dependent observation point processes \{N_t^{(k)}\}_{k=1}^M. We take each point process N_t^{(k)} to represent the spiking activity of the k-th sensory cell, and assume these processes to be doubly stochastic Poisson counting processes(1) with state-dependent rates \lambda_k(X_t). These processes are assumed to be independent, given the current state X_t. The objective of state estimation (a.k.a. nonlinear filtering) is to obtain a differential equation for the posterior probabilities

p_i(t) = Pr[X_t = s_i | N_{[0,t]}^{(1)}, \ldots, N_{[0,t]}^{(M)}],    (1)

where N_{[0,t]}^{(k)} = \{N_s^{(k)}\}_{0 \le s \le t}. In the sequel we denote Y_0^t = \{N_{[0,t]}^{(1)}, \ldots, N_{[0,t]}^{(M)}\} and refer the reader to Section 4 for precise mathematical definitions. We interpret the rate \lambda_k as providing the tuning curve for the k-th sensory input.
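As a concrete illustration of this generative model, the following sketch (Python with NumPy; the function name `simulate` and its interface are ours, not part of the paper) samples a trajectory of the finite-state Markov process with generator Q, and then draws the M doubly stochastic Poisson spike trains by thinning. It assumes no absorbing states (q_ii < 0 for all i).

```python
import numpy as np

def simulate(Q, lam, T, rng=None):
    """Sample a continuous-time Markov chain with generator Q, then draw each
    cell's doubly stochastic Poisson spikes, at rate lam[k, X_t], by thinning."""
    rng = rng or np.random.default_rng(0)
    N = Q.shape[0]
    t, x = 0.0, int(rng.integers(N))
    path = [(0.0, x)]                            # piecewise-constant state trajectory
    while t < T:
        t += rng.exponential(1.0 / -Q[x, x])     # holding time ~ Exp(-q_ii)
        if t >= T:
            break
        p = Q[x].clip(min=0.0)                   # jump probabilities q_ij / (-q_ii)
        x = int(rng.choice(N, p=p / p.sum()))
        path.append((t, x))
    # Thinning: propose spikes at the maximal rate, accept with prob lam_k(X_t)/lam_max.
    spikes = []
    lam_max = lam.max()
    for k in range(lam.shape[0]):
        # crude cap on the number of proposals; adequate for a short sketch
        ts = np.cumsum(rng.exponential(1.0 / lam_max, size=int(5 * lam_max * T)))
        for s in ts[ts < T]:
            x_s = [st for (tt, st) in path if tt <= s][-1]   # state at time s
            if rng.random() < lam[k, x_s] / lam_max:
                spikes.append((s, k))
    return path, sorted(spikes)
```

Here `lam` is an M x N array holding \lambda_k(s_i); the thinning construction is a standard way to sample an inhomogeneous Poisson process and is not specific to this paper.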
In other words, the k-th sensory cell responds with strength \lambda_k(s_i) when the input state is X_t = s_i.

(1) Implying that the rate function itself is a random process.

The required differential equations for p_i(t) are considerably simplified, with no loss of information [3], by considering a set of non-normalized `probability functions' \rho_i(t), such that p_i(t) = \rho_i(t) / \sum_{j=1}^N \rho_j(t). Based on the theory presented in Section 4 we obtain

\dot{\rho}_i(t) = \sum_{j=1}^N Q_{ji} \rho_j(t) + \sum_{k=1}^M (\lambda_k(s_i) - 1) \left[ \sum_n \delta(t - t_n^k) - 1 \right] \rho_i(t),    (2)

where \{t_n^k\} denote the spiking times of the k-th sensory cell. This equation can be written in vector form by defining \Lambda_k = diag(\lambda_k(s_1) - 1, \lambda_k(s_2) - 1, \ldots, \lambda_k(s_N) - 1) and \rho = (\rho_1, \ldots, \rho_N)^T, leading to

\Lambda = \sum_{k=1}^M \Lambda_k,    (3)

\dot{\rho}(t) = (Q^T - \Lambda) \rho(t) + \sum_{k=1}^M \Lambda_k \sum_n \delta(t - t_n^k) \rho(t).    (4)

Equations (2) and (4) can be interpreted as the activity of a linear neural network, where \rho_i(t) represents the firing rate of neuron i at time t, and the matrix (Q^T - \Lambda) represents the synaptic weights (including self-weights); see Figure 1 for a graphical display of the network. Assuming that the tuning functions \lambda_k are unimodal, decreasing in all directions from some maximal value (e.g., Gaussian or truncated cosine functions), we observe from (2) that the impact of an input spike at time t is strongest on cell i for which \lambda_k(s_i) is maximal, and decreases significantly for cells j for which s_j is `far' from s_i. This effect can be modelled using excitatory/inhibitory connections, where neurons representing similar states excite each other, while neurons corresponding to very different states inhibit each other (e.g., [2]). This issue will be elaborated on in future work. Several observations are in order regarding (4). (i) The solution of (4) provides the optimal posterior state estimator given the spike train observations, i.e., no approximation is involved. (ii) The equations are linear even though the equations obeyed by the posterior probabilities p_i(t) are nonlinear.
(iii) The temporal evolution breaks up neatly into an observation-independent term, which can be conceived of as implementing a Bayesian dynamic prior, and an observation-dependent term, which contributes each time a spike occurs. Note that a similar structure was observed recently in [1]. (iv) The observation process affects the posterior estimate through two terms. First, input processes with strong spiking activity affect the activity more strongly. Second, the k-th input affects most strongly the components of \rho(t) corresponding to states with large values of the tuning curve \lambda_k(s_i). (v) At this point we assume that the matrix Q is known. In a more general setting, one can expect Q to be learned on a slower time scale, through interaction with the environment. We leave this as a topic for future work.

Figure 1: A graphical depiction of the network implementing optimal filtering of M spike train inputs.

Multi-modal inputs

A multi-modal scenario may be envisaged as one in which a subset of the sensory inputs arises from one modality (e.g., visual) while the remaining inputs arise from a different sensory modality (e.g., auditory). These modalities may differ in the shapes of their receptive fields, their response latencies, etc. The framework developed above is sufficiently general to deal with any number of modalities, but consider for simplicity just two modalities, denoted by V and A. It is straightforward to extend the derivation of (4), leading to

\dot{\rho}(t) = (Q^T - \Lambda^v - \Lambda^a) \rho(t) + \left[ \sum_{k=1}^{M_v} \Lambda_k^v \sum_n \delta(t - t_n^{v,k}) + \sum_{k=1}^{M_a} \Lambda_k^a \sum_n \delta(t - t_n^{a,k}) \right] \rho(t).    (5)

Prediction

The framework can easily be extended to prediction, defined as the problem of calculating the future posterior distribution p_i^h(t) = Pr[X_{t+h} = s_i | Y_0^t]. It is easy to show that the non-normalized probabilities \rho^h(t) can be calculated using the vector differential equation

\dot{\rho}^h(t) = (Q^T - \tilde{\Lambda}) \rho^h(t) + \sum_{k=1}^M \tilde{\Lambda}_k \sum_n \delta(t - t_n^k) \rho^h(t),    (6)

with the initial condition \rho^h(0) = e^{hQ^T} \rho(0), and where \tilde{\Lambda}_k = e^{hQ^T} \Lambda_k e^{-hQ^T}. Interestingly, the equations obtained are identical to (4), except that the system parameters are modified.

Simplified equation

When the tuning curves of the sensory cells are uniformly distributed Gaussians (e.g., spatial receptive fields), namely \lambda_k(x) = \lambda_{max} exp(-(x - k\Delta x)^2 / 2\sigma^2), it can be shown [13] that for small enough \Delta x and a large number of sensory cells, \sum_{k=1}^M \lambda_k(x) \approx \bar{\lambda} for all x, implying that \Lambda \approx (\bar{\lambda} - M) I. Therefore the matrix \Lambda has no effect on the solution of (4), except for an exponential attenuation that is applied to all the cells simultaneously. Therefore, in cases where the number of sensory cells is large, \Lambda can be omitted from (4). This means that between spike arrivals, the system behaves solely according to the a-priori knowledge about the world, and when a spike arrives, this information is reshaped according to the firing cell's tuning curve.

3 Theoretical Implications and Applications

Viewing (4) we note that between spike arrivals, the input has no effect on the system. Therefore, the inter-arrival dynamics is simply \dot{\rho}(t) = (Q^T - \Lambda) \rho(t). Defining t_n as the n-th arrival time of a spike from any one of the sensors, the solution in the interval (t_n, t_{n+1}) is

\rho(t) = e^{(t - t_n)(Q^T - \Lambda)} \rho(t_n).

When a new spike arrives from the k-th sensory neuron at time t_n, the system is modified within an infinitesimal window of time as

\rho_i(t_n^+) = \rho_i(t_n^-) + \rho_i(t_n^-)(\lambda_k(s_i) - 1) = \rho_i(t_n^-) \lambda_k(s_i).    (7)

Thus, at the exact time of a spike arrival from the k-th sensory cell, the vector \rho is reshaped according to the tuning curve of the input cell that fired this spike. Assuming n spikes occurred before time t, we can derive an explicit solution to (4), given by

\rho(t) = e^{(t - t_n)(Q^T - \Lambda)} \prod_{i=1}^n \left[ (I + \Lambda_{k(t_i)}) e^{(t_i - t_{i-1})(Q^T - \Lambda)} \right] \rho(0),    (8)

where k(t_i) is the index of the cell that fired at t_i, I is the identity matrix, and we assumed initial conditions \rho(0) at t_0 = 0.
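The explicit solution (8) suggests a direct implementation: propagate \rho with the matrix exponential of (Q^T - \Lambda) between spikes, and rescale it by the firing cell's tuning curve at each spike, as in (7). A minimal sketch follows (Python with NumPy/SciPy; the function name `decode` and the per-step renormalization, added purely for numerical stability, are ours and not part of the paper):

```python
import numpy as np
from scipy.linalg import expm

def decode(Q, lam, spike_times, spike_cells, T, rho0):
    """Exact point-process filter for a finite-state Markov chain.
    Q: N x N generator matrix; lam: M x N array of tuning curves lam[k, i] = lambda_k(s_i);
    spike_times/spike_cells: sorted arrival times and the index of the cell that fired."""
    M, N = lam.shape
    Lam = np.diag(lam.sum(axis=0) - M)       # Lambda = sum_k diag(lambda_k - 1)
    A = Q.T - Lam                            # drift matrix of the linear network
    rho = rho0.astype(float)
    t_prev = 0.0
    for t, k in zip(spike_times, spike_cells):
        rho = expm(A * (t - t_prev)) @ rho   # observation-independent propagation
        rho = lam[k] * rho                   # spike from cell k reshapes rho, Eq. (7)
        rho /= rho.sum()                     # renormalize (scale does not affect p)
        t_prev = t
    rho = expm(A * (T - t_prev)) @ rho
    return rho / rho.sum()                   # posterior p_i(t) = rho_i / sum_j rho_j
```

Because the posterior is a ratio of the \rho_i, rescaling \rho at any point leaves the decoded probabilities unchanged, which is what makes the renormalization step harmless.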
3.1 Demonstrations

We demonstrate the operation of the system on several synthetic examples. First consider a small object moving back and forth on a line, jumping between a set of discrete states, and being observed by a retina with M sensory cells. Each world state s_i describes the particle's position, and each sensory cell k generates a Poisson spike train with rate \lambda_k(X_t), taken to be a Gaussian \lambda_{max} exp(-(x - x_k)^2 / 2\sigma^2). Figure 2(a) displays the motion of the particle for a specific choice of matrix Q, and 2(b) presents the spiking activity of 10 position sensitive sensory cells. Finally, Figure 2(c) demonstrates the tracking ability of the system, where the width of the gray trace corresponds to the prediction confidence. Note that the system is able to distinguish between 25 different states rather well with only 10 sensory cells. In order to enrich the system's estimation capabilities, we can include additional parameters within the state of the world. Considering the previous example, we create a larger set of states: \tilde{s}_{ij} = (s_i, d_j), where d_j denotes the current movement direction (in this case d_1 = up, d_2 = down). We add a population of sensory cells that respond differently to different movement directions. This lends further robustness to the state estimation. As can be seen in Figure 2(d)-(f), when for some reason the input of the sensory cells is blocked (and the sensory cells fire spontaneously), the system estimates a movement that continues in the same direction. When the blockade is removed, the system is resynchronized with the input. It can be seen that even during periods where sensory input is absent, the general trend is well predicted, even though the estimated uncertainty is increased. By expanding the state space it is also possible for the system to track multiple objects simultaneously. In Figure 2(g)-(i) we present tracking of two simultaneously moving objects.
This is done simply by creating a new state space, \tilde{s}_{ij} = (s_i^1, s_j^2), where s^k denotes the state (location) of the k-th object.

Figure 2: Tracking the motion of an object in 1D. (a) The object's trajectory. (b) Spiking activity of 10 sensory cells. (c) Decoded position estimation with confidence interval. Each of the 10 sensory cells has a Gaussian tuning curve of width \sigma = 2 and maximal firing rate \lambda_{max} = 25. (d)-(f) Tracking based on position and direction information. The red bar marks the time when the input was blocked, and the green bar marks the time when the blockade was removed. Here we used 10 place-cells and 4 direction-cells (marked in red). (g)-(i) Tracking of two objects simultaneously. The network activity in (i) represents Pr[X_t^1 = s_i^1, X_t^2 = s_j^2 | Y_0^t].

3.2 Behavior Characterization

The solution of the filtering equation (4) depends on two processes, namely the recurrent dynamics due to the first term, and the sensory input arising from the second term. Recall that the connectivity matrix Q is essentially the generator matrix of the state transition process, and as such, incorporates prior knowledge about the world dynamics. The second term, consisting of the sensory input, contributes to the state estimator update every time a spike occurs.
Thus, a major question relates to the interplay between the a-priori knowledge embedded in the network through Q and the incoming sensory input. In particular, an important question relates to tailoring the system parameters (e.g., the tuning curves \lambda_k) to the properties of the external world. As a simple characterization of the generator matrix Q, we consider the diagonal and non-diagonal terms. The diagonal term q_{ii} is related to the average time spent in state i through E[T_i] = -1/q_{ii} [5], and thus we define

\tau(Q) = -\frac{1}{N} \left( q_{11}^{-1} + \cdots + q_{NN}^{-1} \right),

as a measure of the transition frequency of the process, where small values of \tau correspond to a rapidly changing process. A second relevant measure relates to the regularity in the transition between the states. To quantify this, consider a state i, and define a probability vector q^i consisting of the N - 1 elements \{Q_{ij}\}, j \neq i, normalized so that the sum of the elements is 1. The entropy of q^i is a measure of the state transition irregularity from state i, and we define H(Q) as the average of this entropy over all states. In summary, we lump the main properties of Q into \tau(Q), related to the rapidity of the process, and H(Q), measuring the transition regularity. Clearly, these variables are but one heuristic choice for characterizing the Markov process dynamics, but they will enable us to relate the `world dynamics' to the system behavior. The sensory input influence on the system is controlled by the tuning curves. To simplify the analysis we assume uniformly placed Gaussian tuning curves, \lambda_k(x) = \lambda_{max} exp(-(x - k\Delta x)^2 / 2\sigma^2), which can be characterized by two parameters - the maximum value \lambda_{max} and the width \sigma. Note, however, that our model does not require any special constraints on the tuning curves. Figure 3 examines the system performance under different world setups.
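Both characterizations of Q are direct to compute from the generator matrix. A small sketch (Python with NumPy; the function names are illustrative, and the entropy is taken in bits, a base the paper leaves unspecified):

```python
import numpy as np

def tau(Q):
    """Average expected holding time: tau(Q) = -(1/N) * sum_i 1/q_ii."""
    return -np.mean(1.0 / np.diag(Q))

def transition_entropy(Q):
    """H(Q): entropy of each row's normalized off-diagonal rates, averaged over states."""
    N = Q.shape[0]
    H = 0.0
    for i in range(N):
        q = np.delete(Q[i], i)          # the N-1 off-diagonal rates out of state i
        q = q / q.sum()                 # normalize to a probability vector q^i
        H += -np.sum(q * np.log2(q, where=q > 0, out=np.zeros_like(q)))
    return H / N
```

For a uniform 3-state generator with q_ii = -2 and all off-diagonal rates equal to 1, this gives tau = 0.5 and H = 1 bit (two equally likely successor states per state).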
We measure the performance using the L1 error of the maximum a-posteriori (MAP) estimator built from the posterior distribution generated by the system. The MAP estimator, obtained by selecting the cell with the highest firing activity \rho_i(t), is optimal under the present setting (leading to the minimal probability of error), and can be easily implemented in a neural network by a Winner-Take-All circuit. The choice of the L1 error is justified in this case since the states \{s_i\} represent locations on a line, thereby endowing the state space with a distance measure. In Figure 3(a) we can see that as \tau(Q) increases, the error diminishes, an expected result, since slower world dynamics are easier to analyze. The effect of H(Q) is opposite - lower entropy implies higher confidence in the next state. Therefore we can see that the error increases with H(Q) (fig. 3(b)). The last issue we examine relates to the behavior of the system when an incorrect Q matrix is used (i.e., the world model is incorrect). It is clear from Figure 3(c) that for low values of M (the number of sensory cells), using the wrong Q matrix increases the error level significantly. However, as the value of M increases, the differences are reduced. This phenomenon is expected, since the more observations are available about the world, the less weight need be assigned to the a-priori knowledge.

Figure 3: State estimation error for different world dynamics and model misspecification. (a) Effect of state rapidity \tau(Q). (b) Effect of transition entropy H(Q). (c) Effect of misspecification, comparing the correct model with wrong Q matrices of the same or different \tau(Q). For (a) and (b) M = 17, N = 17, \sigma = 3, \lambda_{max} = 50, and for (c) N = 25, \sigma = 3, \lambda_{max} = 50.

In Figure 4 we examine the effect of the tuning curve parameters on the system's performance.
Given a fixed number of input cells, if the tuning curves are too narrow (fig. 4(a) top), they will not cover the entire state space. On the other hand, if the tuning curves are too wide (fig. 4(a) bottom), the cells' responses are very similar for all states. Therefore we get an error function that has a local minimum (fig. 4(b)). It remains for future work to determine the optimal value of \sigma for a given model. The effect of different values of \lambda_{max} is obvious - higher values of \lambda_{max} lead to more spikes per sensory cell, which increases the system's accuracy. Clearly, under physiological conditions, where high firing rates are energetically costly, we would expect a tradeoff between accuracy and energy expenditure.

Figure 4: The effect of the tuning curve parameters on performance. (a) Tuning curves of low, medium and high width \sigma. (b) L1 error as a function of log(\sigma), for \lambda_{max} = 50, 25, 10.

4 Mathematical Framework and Derivations

We summarize the main mathematical results related to point process filtering, adapted mainly from [3]. Consider a finite-state continuous-time Markov process X_t \in \{s_1, s_2, \ldots, s_N\} with a generator matrix Q that is being observed via a set of (doubly stochastic) Poisson processes with state-dependent rate functions \lambda_k(X_t), k = 1, \ldots, M.

Consider first a single point process observation N_0^t = \{N_s\}_{0 \le s \le t}. We denote the joint probability law for the state and observation process by P_1. The objective is to derive a differential equation for the posterior probabilities (1). This is the classic nonlinear filtering problem from systems theory (e.g., [6]). More generally, the problem can be phrased as computing E_1[f(X_t) | N_0^t], where, in the case of (1), f is a vector function, with components f_i(x) = 1[x = s_i].

We outline the derivation required to obtain such an equation, using a method referred to as change of measure (e.g., [3]). The basic idea is to replace the computationally hard evaluation of E_1[f(X_t) | N_0^t] by a tractable computation based on a simple probability law. Consider two probability spaces (\Omega, F, P_0) and (\Omega, F, P_1) that differ only in their probability measures. P_1 is said to be absolutely continuous with respect to P_0 (denoted by P_1 << P_0), if for all A \in F, P_0(A) = 0 implies P_1(A) = 0. Assuming P_1 << P_0, it can be proved that there exists a random variable L(\omega), \omega \in \Omega, such that for all A \in F,

P_1(A) = E_0[1_A L] = \int_A L(\omega) dP_0(\omega),    (9)

where E_0 denotes the expectation with regard to P_0. The random variable L is called the Radon-Nikodym derivative of P_1 with respect to P_0, and is denoted by L(\omega) = dP_1(\omega)/dP_0(\omega). Consider two continuous-time random processes X_t, N_t that have different interpretations under the different probability measures P_0, P_1:

P_0: X_t is a finite-state Markov process, X_t \in \{s_1, s_2, \ldots, s_N\};
     N_t is a Poisson process with a constant rate of 1, independent of X_t.    (10)

P_1: X_t is a finite-state Markov process, X_t \in \{s_1, s_2, \ldots, s_N\};
     N_t is a doubly-stochastic Poisson process with rate function \lambda(X_t).    (11)

The following avatar of Bayes' formula (eq. 3.5 in chap. 6 of [3]) supplies a way to calculate the conditional expectation E_1[f(X_t) | N_0^t] based on P_1 in terms of an expectation w.r.t. P_0,

E_1[f(X_t) | N_0^t] = E_0[L_t f(X_t) | N_0^t] / E_0[L_t | N_0^t],    (12)

where L_t = dP_{1,t}/dP_{0,t}, and P_{0,t} and P_{1,t} are the restrictions of P_0 and P_1, respectively, to the sigma-algebra generated by \{N_0^t, X_0\}. We refer the reader to [3] for precise definitions. Using (1) and (12) we have

p_i(t) = E_1[f_i(X_t) | N_0^t] = E_0[L_t f_i(X_t) | N_0^t] / E_0[L_t | N_0^t].    (13)

Since the denominator is independent of i, it can be regarded as a normalization factor. Thus, defining \rho_i(t) = E_0[L_t f_i(X_t) | N_0^t], it follows that p_i(t) = \rho_i(t) / \sum_{j=1}^N \rho_j(t).
Based on the above derivation, one can show ([3], chap. 6.4) that the \{\rho_i(t)\} obey the stochastic differential equation (SDE)

d\rho_i(t) = \sum_{j=1}^N Q_{ji} \rho_j(t) dt + (\lambda(s_i) - 1) \rho_i(t) (dN_t - dt).    (14)

An SDE of the form d\rho(t) = a(t)dt + b(t)dN_t should be interpreted as follows. If at time t, no jump occurred in the counting process N_t, then \rho(t + dt) - \rho(t) \approx a(t)dt, where dt denotes an infinitesimal time interval. If a jump occurred at time t, then \rho(t + dt) - \rho(t) \approx a(t)dt + b(t). Since the jump locations are random, \rho(t) is a stochastic process, hence the term SDE. Now, this derivation can be generalized to the case where there are M observation processes N_t^{(1)}, N_t^{(2)}, \ldots, N_t^{(M)} with different rate functions \lambda_1(X_t), \lambda_2(X_t), \ldots, \lambda_M(X_t). In this case the differential equation for the non-normalized posterior probabilities is

d\rho_i(t) = \sum_{j=1}^N Q_{ji} \rho_j(t) dt + \sum_{k=1}^M (\lambda_k(s_i) - 1) \rho_i(t) (dN_t^{(k)} - dt).    (15)

Recalling that N_t^{(k)} is a counting process, namely dN_t^{(k)}/dt = \sum_n \delta(t - t_n^k), we obtain (2), where t_n^k is the arrival time of the n-th event in the k-th observation process.

5 Discussion

In this work we have introduced a linear recurrent neural network model capable of exactly implementing Bayesian state estimation and prediction from input spike trains in real time. The framework is mathematically rigorous and requires few assumptions, is naturally formulated in continuous time, and is based directly on spike train inputs, thereby sacrificing no temporal resolution. The setup is ideally suited to the integration of several sensory modalities, and retains its optimality in this setting as well. The linearity of the system renders an analytic solution possible, and a full characterization in terms of a-priori knowledge and online sensory input. This framework sets the stage for many possible extensions and applications, of which we mention several examples.
(i) It is important to find a natural mapping between the current abstract neural model and more standard biological neural network models. One possible approach was mentioned in Section 2, but other options are possible and should be pursued. Additionally, the implementation of the estimation network (namely, the variables \rho_i(t)) using realistic spiking neurons is still open. (ii) At this point the matrix Q in (4) is assumed to be known. Combining approaches to learning Q and adapting the tuning curves \lambda_k in real time will lend further plausibility and robustness to the system. (iii) The present framework, based on doubly stochastic Poisson processes, can be extended to more general point processes, using the filtering framework described in [10]. (iv) Currently, each world state is represented by a single neuron (a grandmother cell). This is clearly a non-robust representation, and it would be worthwhile to develop more distributed and robust representations. Finally, the problem of experimental verification of the framework is a crucial step in future work.

Acknowledgments

The authors are grateful to Rami Atar for his helpful advice on nonlinear filtering.

References

[1] J.M. Beck and A. Pouget. Exact inferences in a neural implementation of a hidden Markov model. Neural Comput, 19(5):1344-1361, 2007.
[2] R. Ben-Yishai, R.L. Bar-Or, and H. Sompolinsky. Theory of orientation tuning in visual cortex. Proc Natl Acad Sci U S A, 92(9):3844-8, Apr 1995.
[3] P. Brémaud. Point Processes and Queues: Martingale Dynamics. Springer, New York, 1981.
[4] U.T. Eden, L.M. Frank, V. Solo, and E.N. Brown. Dynamic analysis of neural encoding by point process adaptive filtering. Neural Computation, 16:971-998, 2004.
[5] G.R. Grimmett and D.R. Stirzaker. Probability and Random Processes. Oxford University Press, third edition, 2001.
[6] A.H. Jazwinski. Stochastic Processes and Filtering Theory. Academic Press, 1970.
[7] H.J. Kushner.
Dynamical equations for optimal nonlinear filtering. J. Differential Equations, 3:179-190, 1967.
[8] R.P.N. Rao. Bayesian computation in recurrent neural circuits. Neural Comput, 16(1):1-38, 2004.
[9] R.P.N. Rao. Neural models of Bayesian belief propagation. In K. Doya, S. Ishii, A. Pouget, and R.P.N. Rao, editors, Bayesian Brain, chapter 11. MIT Press, 2006.
[10] A. Segall, M. Davis, and T. Kailath. Nonlinear filtering with counting observations. IEEE Trans. Information Theory, 21(2):143-149, 1975.
[11] S. Shoham, L.M. Paninski, M.R. Fellows, N.G. Hatsopoulos, J.P. Donoghue, and R.A. Normann. Statistical encoding model for a primary motor cortical brain-machine interface. IEEE Trans Biomed Eng, 52(7):1312-22, 2005.
[12] D.L. Snyder. Filtering and detection for doubly stochastic Poisson processes. IEEE Transactions on Information Theory, IT-18:91-102, 1972.
[13] N. Twum-Danso and R. Brockett. Trajectory estimation from place cell data. Neural Netw, 14(6-7):835-844, 2001.
[14] W.M. Wonham. Some applications of stochastic differential equations to optimal nonlinear filtering. J. SIAM Control, 2(3):347-369, 1965.
[15] M. Zakai. On the optimal filtering of diffusion processes. Z. Wahrscheinlichkeitstheorie verw. Gebiete, 11:230-243, 1969.