Tractable Nonparametric Bayesian Inference in Poisson Processes with Gaussian Process Intensities

Ryan Prescott Adams, Cavendish Laboratory, University of Cambridge, Cambridge CB3 0HE, UK. rpa23@cam.ac.uk
Iain Murray, Department of Computer Science, University of Toronto, Toronto, Ontario M5S 3G4. murray@cs.toronto.edu
David J.C. MacKay, Cavendish Laboratory, University of Cambridge, Cambridge CB3 0HE, UK. mackay@mrao.cam.ac.uk

Abstract

The inhomogeneous Poisson process is a point process that has varying intensity across its domain (usually time or space). For nonparametric Bayesian modeling, the Gaussian process is a useful way to place a prior distribution on this intensity. The combination of a Poisson process and GP is known as a Gaussian Cox process, or doubly-stochastic Poisson process. Likelihood-based inference in these models requires an intractable integral over an infinite-dimensional random function. In this paper we present the first approach to Gaussian Cox processes in which it is possible to perform inference without introducing approximations or finite-dimensional proxy distributions. Our method, the Sigmoidal Gaussian Cox Process, uses a generative model for Poisson data to enable tractable inference via Markov chain Monte Carlo. We compare our method to competing approaches on synthetic data and apply it to several real-world data sets.

1. Introduction

The Poisson process is a widely-used model for point data in temporal and spatial settings. The inhomogeneous variant of the Poisson process allows the rate of arrivals to vary in time (or space), but typically we do not have a preconceived idea of the appropriate functional form for this variation. In this setting, it is often desirable to use another stochastic process to describe nonparametrically the variation in the Poisson intensity function. This construction is called a doubly-stochastic Poisson process, or Cox process (Cox, 1955), and has been applied in a variety of settings, e.g. neuroscience (Cunningham et al., 2008b), astronomy (Gregory & Loredo, 1992), and forestry (Heikkinen & Arjas, 1999).

One variant of the Cox process is the Gaussian Cox process, in which the intensity function is a transformation of a random realization from a Gaussian process (GP). From a modeling perspective, this is a particularly convenient way to specify general prior beliefs about the intensity function via a kernel, without having to choose a particular functional form. Unfortunately, likelihood-based inference in this model is generally intractable, due to the need to integrate an infinite-dimensional random function.

Various approximations have been introduced to deal with this intractability. The classic approach of Diggle (1985) uses Parzen-type kernel densities to construct a nonparametric estimator, with the bandwidth chosen via the empirical Ripley's function (Ripley, 1977). Nonparametric Bayesian approaches to the Gaussian Cox process have been studied by Rathbun and Cressie (1994) and Møller et al. (1998), both of which introduced tractable finite-dimensional proxy distributions via discretization. There have also been nonparametric Bayesian approaches to inhomogeneous Poisson process inference that do not use underlying Gaussian processes, e.g. Dirichlet process mixtures of Beta distributions (Kottas & Sansó, 2007).

In this paper we present the first approach to a Gaussian Cox process model that enables fully-nonparametric Bayesian inference via Markov chain Monte Carlo (MCMC), without requiring either numeric approximation or a finite-dimensional proxy distribution. We achieve this tractability by extending recently-developed MCMC methods for probability density functions (Adams et al., 2009).
2. The Model

In this section we review the Poisson process and specify our model, the Sigmoidal Gaussian Cox Process (SGCP), which transforms a Gaussian process into a nonparametric prior distribution on intensity functions. We then show that the SGCP allows exact simulation of Poisson data from a random infinite-dimensional intensity function, without performing intractable integrals. This approach is similar to that taken for general density modeling by Adams et al. (2009).

2.1. The Poisson Process

We consider the inhomogeneous Poisson process on a domain S, which we will take to be R^D. The Poisson process is parameterized by an intensity (or rate) function λ(s) : S → R⁺, where R⁺ denotes the nonnegative real numbers. The random number of events N(T) within a subregion T ⊂ S is Poisson-distributed with parameter λ_T = ∫_T λ(s) ds, and the numbers of events in disjoint subsets are independent. Figure 1 shows an example of a one-dimensional Poisson intensity function along with three independently-drawn event sequences.

Figure 1. Three realizations of events from an inhomogeneous Poisson process in time, along with the associated intensity function.

2.2. A GP Prior on Poisson Intensities

We introduce a random scalar function g(s) : S → R and place a Gaussian process prior on it, which means that the prior distribution over any discrete set of function values {g(s_n)}_{n=1}^N is a multivariate Gaussian distribution. These distributions can be consistently defined with a positive-definite covariance function C(·,·) : S × S → R and a mean function m(·) : S → R, both parameterized by hyperparameters θ. For a more detailed review of Gaussian processes see, e.g., Rasmussen and Williams (2006).

In the Sigmoidal Gaussian Cox Process, we transform the random g(s) into a random intensity function λ(s) via

    λ(s) = λ* σ(g(s)),    (1)

where λ* is an upper bound on λ(s) and σ(·) is the logistic function, σ(z) = (1 + e^{−z})^{−1}. Equation 1 squashes g(s) through a sigmoid so that each g(s) corresponds to a random function with outputs between zero and λ*. In this paper we use the logistic function as the sigmoid. The choice of λ* is part of the prior and can be included in inference, as described in Section 3.5.
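To make Equation 1 concrete, the following is a minimal sketch (Python/NumPy) that draws g at a finite set of one-dimensional locations from a GP prior and squashes it to obtain λ(s) on (0, λ*). The zero mean, the squared-exponential kernel (the kernel we use in Section 4), and the settings ell, amp, and lam_star are illustrative assumptions for the example, not prescriptions of the model.

    import numpy as np

    def se_kernel(s1, s2, ell=1.0, amp=1.0):
        # Squared-exponential covariance C(s, s'); ell and amp are
        # illustrative hyperparameter settings.
        return amp**2 * np.exp(-0.5 * ((s1[:, None] - s2[None, :]) / ell)**2)

    def sample_intensity_at(points, lam_star=5.0, rng=None):
        # Draw g at a finite set of 1-D locations from a zero-mean GP prior,
        # then squash through the logistic sigmoid as in Equation 1.
        rng = np.random.default_rng() if rng is None else rng
        K = se_kernel(points, points) + 1e-8 * np.eye(len(points))  # jitter
        g = rng.multivariate_normal(np.zeros(len(points)), K)
        return lam_star / (1.0 + np.exp(-g))    # lambda(s) in (0, lambda*)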
2.3. Generating Poisson Data from Random Gaussian Process Intensities

We use the transformation of Equation 1 because it allows us to simulate exact Poisson data from a random intensity function drawn from the prior provided by the Gaussian process. By exact we mean that the data are not biased by, for example, the starting state of a finite Markov chain. We generate these exact data via thinning, a point-process variant of rejection sampling introduced by Lewis and Shedler (1979). We extend the thinning procedure to simultaneously sample the function g(s) from the Gaussian process.

We wish to generate a set of events {s_k}_{k=1}^K on some subregion T of S, drawn according to a Poisson process whose intensity function λ(s) is the result of applying Equation 1 to a random function g(s) drawn from the GP. We do this by first simulating a set of events {ŝ_j}_{j=1}^J from a homogeneous Poisson process with intensity λ*. If μ(·) is a measure on S, then we first sample J, the number of events in T, by drawing it from a Poisson distribution with parameter λ* μ(T). Next, the J events {ŝ_j}_{j=1}^J are distributed uniformly within T.

The {ŝ_j}_{j=1}^J are now events randomly drawn from a homogeneous Poisson process with intensity λ* on T. Next, we treat these {ŝ_j}_{j=1}^J as input points for a Gaussian process and sample the function g(s) at these locations, generating a corresponding set of function values, denoted {g(ŝ_j)}_{j=1}^J. We now use the thinning procedure to choose which K ≤ J points of {ŝ_j}_{j=1}^J we will keep, so that the kept points, denoted {s_k}_{k=1}^K, are drawn from an inhomogeneous Poisson process with an intensity function λ(s) consistent with the {g(ŝ_j)}_{j=1}^J we have just simulated from the Gaussian process. We do this by generating J uniform random variates on (0, 1), denoted {r_j}_{j=1}^J, and accepting only the events for which r_j < σ(g(ŝ_j)). These accepted events form the set {s_k}_{k=1}^K. This procedure is shown in Algorithm 1 and graphically in Figure 2.

Algorithm 1 Simulate data from a Poisson process on region T with random λ(s) drawn as in Equation 1
Inputs: Region T, upper bound λ*, GP functions m(s) and C(s, s')
Outputs: Exact Poisson events E = (s_1, s_2, ...)
 1: V ← μ(T)                                 {Compute the measure of T.}
 2: J ~ Poisson(λ* V)                        {Draw the number of events.}
 3: {ŝ_j}_{j=1}^J ~ Uniform(T)               {Distribute the events uniformly in T.}
 4: {g(ŝ_j)}_{j=1}^J ~ GP(C(·,·), m(·), θ, {ŝ_j}_{j=1}^J)   {Sample the function at the events from the GP.}
 5: E ← ∅                                    {Initialize the set of accepted events.}
 6: for j ← 1 ... J do
 7:   r_j ~ Uniform(0, 1)                    {Draw a uniform random variate on the unit interval.}
 8:   if r_j < σ(g(ŝ_j)) then                {Apply acceptance rule.}
 9:     E ← E ∪ ŝ_j                          {Add ŝ_j to accepted events.}
10:   end if
11: end for
12: return E

Figure 2. The generative procedure for the SGCP: a) The upper-bounding intensity λ* is shown, along with a Poisson time series generated from it. b) At each point of the time series, a sample is drawn from the Gaussian process; this function is squashed through the logistic function so that it is everywhere positive and upper-bounded by λ*. c) Variates are drawn uniformly on (0, λ*) in the vertical coordinate. d) If the uniform variates are greater than the random function, the corresponding events are discarded. The kept events are drawn from the inhomogeneous Poisson process corresponding to the random intensity λ(s).
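Algorithm 1 translates directly into code. The following is a minimal one-dimensional sketch on the interval T = [0, t_max] (so μ(T) = t_max), under the same illustrative assumptions as before: a zero-mean GP with a squared-exponential kernel and hypothetical settings ell, amp, and lam_star.

    import numpy as np

    def simulate_sgcp(t_max, lam_star, ell=1.0, amp=1.0, rng=None):
        # Algorithm 1 on the interval T = [0, t_max], so mu(T) = t_max.
        rng = np.random.default_rng() if rng is None else rng
        J = rng.poisson(lam_star * t_max)           # lines 1-2
        s_hat = rng.uniform(0.0, t_max, size=J)     # line 3
        # Line 4: sample g at the J locations from the (zero-mean) GP.
        d = s_hat[:, None] - s_hat[None, :]
        K = amp**2 * np.exp(-0.5 * (d / ell)**2) + 1e-8 * np.eye(J)
        g = rng.multivariate_normal(np.zeros(J), K)
        # Lines 6-11: thin; keep each event with probability sigma(g(s_hat)).
        accept = rng.uniform(size=J) < 1.0 / (1.0 + np.exp(-g))
        return np.sort(s_hat[accept])

    events = simulate_sgcp(t_max=10.0, lam_star=5.0)  # one exact SGCP draw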
3. Inference

We have so far defined a model for generating data from an inhomogeneous Poisson process using a GP-based prior on the intensity function. We now address the problem of inference: given a set of K events, denoted {s_k}_{k=1}^K, within a region T, and using the SGCP model of Section 2 as the prior, what is the posterior distribution over λ(s)? The Poisson process likelihood function is

    p({s_k}_{k=1}^K | λ(s)) = exp{ −∫_T λ(s) ds } ∏_{k=1}^K λ(s_k).    (2)

For random infinite-dimensional λ(s), such as in the Log Gaussian Cox Process or the SGCP of Section 2, the integral inside the exponential cannot be evaluated. We write Bayes' theorem for our model, using g to indicate the infinite-dimensional object corresponding to g(s):

    p(g | {s_k}_{k=1}^K) = [ GP(g) exp{ −∫_T λ* σ(g(s)) ds } ∏_{k=1}^K λ* σ(g(s_k)) ]
                           / [ ∫ dg GP(g) exp{ −∫_T λ* σ(g(s)) ds } ∏_{k=1}^K λ* σ(g(s_k)) ].    (3)

This posterior distribution is doubly-intractable (Murray et al., 2006), due to the presence of an intractable integral over T in the numerator and an intractable integral over g in the denominator. Standard Markov chain Monte Carlo methods are unable to deal with intractability in the likelihood, as in Equations 2 and 3. We also face the basic intractability that we cannot naïvely represent the posterior distribution over the infinite-dimensional g, even if we could perform the integral calculations.

3.1. Tractability Via Latent Variables

Rather than performing MCMC inference directly via the posterior in Equation 3, we augment the posterior distribution to make the Markov chain tractable. Specifically, we consider the Poisson data to have been generated as in Section 2, and the additional latent variables are: 1) the total number of "thinned" events, M; 2) the locations of the thinned events, {s̃_m}_{m=1}^M; 3) the values of the function g(s) at the thinned events, denoted g_M = {g(s̃_m)}_{m=1}^M; and 4) the values of g(s) at the observed events, denoted g_K = {g(s_k)}_{k=1}^K. The generative procedure did not require integrating an infinite-dimensional random function, nor did it require knowledge of g(s) or λ(s) at more than a finite number of locations. By considering the procedure as a latent variable model, we inherit these convenient properties for inference. The joint distribution over the fixed data {s_k}_{k=1}^K, the number of thinned events M, the locations of the thinned events {s̃_m}_{m=1}^M, and the function value vectors g_M and g_K is

    p({s_k}_{k=1}^K, M, {s̃_m}_{m=1}^M, g_{M+K} | λ*, T, θ)
        = (λ*)^{K+M} exp{ −λ* μ(T) } ∏_{k=1}^K σ(g(s_k)) ∏_{m=1}^M σ(−g(s̃_m))
          × GP(g_{M+K} | {s_k}_{k=1}^K, {s̃_m}_{m=1}^M, θ),    (4)

where g_{M+K} denotes the vector concatenating g_M and g_K. Note that σ(−z) = 1 − σ(z), and that we have integrated out the vertical coordinates r that determined acceptance and rejection. We use three kinds of Markov transitions to sample from this joint distribution: 1) changing M, the number of latent thinned events; 2) changing the locations {s̃_m}_{m=1}^M of the thinned events; and 3) changing the latent function vector g_{M+K}. We address hyperparameter inference in Section 3.5.

3.2. Sampling the Number of Thinned Events

We use Metropolis–Hastings to sample the number of thinned events, M. We define a function b(K, M) : N × N → (0, 1) that gives the Bernoulli probability of proposing an insertion versus a deletion. An insertion move consists of proposing a new event s̃' drawn uniformly from T, followed by a draw of the corresponding function value g(s̃') from the Gaussian process, conditioned on the current state g_{M+K}. This proposal has density

    q_ins(M+1 ← M) = (b(K, M) / μ(T)) GP(g(s̃') | s̃', g_{M+K}).    (5)

A deletion move consists of selecting the event m to remove randomly and uniformly from the M events in the current state. This proposal has density

    q_del(M−1 ← M) = (1 − b(K, M)) / M.    (6)

We incorporate the joint distribution of Equation 4 to find the Metropolis–Hastings acceptance ratios of each type of proposal, integrating out the vertical coordinate r:

    a_ins = (1 − b(K, M+1)) λ* μ(T) / [ (M+1) b(K, M) (1 + exp{g(s̃')}) ]    (7)
    a_del = M b(K, M−1) (1 + exp{g(s̃_m)}) / [ (1 − b(K, M)) λ* μ(T) ].    (8)

The proposal probability b(K, M) can safely be set to 1/2. Tuning and domain knowledge can almost certainly yield better choices, but we have not explored this topic extensively. In practice we have found it useful to make several (~10) of these transitions for each of the transitions described in Sections 3.3 and 3.4.
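A minimal sketch of one such birth/death transition follows, assuming the constant choice b(K, M) = b, μ(T) = t_max, and the same illustrative kernel as before; the function names and the handling of the M = 0 boundary (a proposed deletion is simply skipped) are simplifications for the example, not the paper's implementation.

    import numpy as np

    def ker(a, b_, ell=1.0, amp=1.0):
        # Squared-exponential covariance (illustrative assumption).
        return amp**2 * np.exp(-0.5 * ((a[:, None] - b_[None, :]) / ell)**2)

    def insert_delete_step(S, g, n_obs, t_max, lam_star, b=0.5, rng=None):
        # One birth/death move on the thinned events (Equations 5-8).
        # S holds the n_obs observed locations first, then the M thinned
        # ones; g holds the GP function values at S.
        rng = np.random.default_rng() if rng is None else rng
        M = len(S) - n_obs
        if rng.uniform() < b:                               # insertion
            s_new = rng.uniform(0.0, t_max)
            # Draw g(s_new) from the GP conditioned on the current
            # g_{M+K}; this conditional draw cancels in the ratio.
            k_x = ker(np.array([s_new]), S)[0]
            K_inv = np.linalg.inv(ker(S, S) + 1e-8 * np.eye(len(S)))
            mean = k_x @ K_inv @ g
            var = max(ker(np.array([s_new]), np.array([s_new]))[0, 0]
                      - k_x @ K_inv @ k_x, 1e-12)
            g_new = mean + np.sqrt(var) * rng.standard_normal()
            a_ins = ((1 - b) * lam_star * t_max /
                     ((M + 1) * b * (1 + np.exp(g_new))))   # Equation 7
            if rng.uniform() < a_ins:
                S, g = np.append(S, s_new), np.append(g, g_new)
        elif M > 0:                                         # deletion
            i = n_obs + rng.integers(M)                     # uniform choice
            a_del = (M * b * (1 + np.exp(g[i])) /
                     ((1 - b) * lam_star * t_max))          # Equation 8
            if rng.uniform() < a_del:
                S, g = np.delete(S, i), np.delete(g, i)
        return S, g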
3.3. Sampling the Locations of Thinned Events

Given the number of thinned events, M, we also wish to sample from the posterior distribution on the locations of the events, {s̃_m}_{m=1}^M. We again use Metropolis–Hastings: we iterate over each of the M thinned events and propose a new location s̃'_m via a proposal density q(s̃'_m ← s̃_m). We then draw a function value g(s̃'_m) from the Gaussian process, conditioned on the current state g_{M+K}. The Metropolis–Hastings acceptance ratio for this proposal, integrating out the vertical coordinate r, is

    a_loc = q(s̃_m ← s̃'_m) (1 + exp{g(s̃_m)}) / [ q(s̃'_m ← s̃_m) (1 + exp{g(s̃'_m)}) ].    (9)

Typically, perturbative proposals on the order of the Gaussian process length scale are appropriate for these Metropolis–Hastings steps. If the move is accepted, the old values s̃_m and g(s̃_m) can safely be forgotten.

3.4. Sampling the Function

We use Hamiltonian Monte Carlo (Duane et al., 1987) for inference of the function values g_{M+K}, to take advantage of gradient information and make efficient proposals. We perform gradient calculations in the "whitened" space obtained by linearly transforming g_{M+K} with the inverse Cholesky decomposition of the covariance matrix Σ, as this results in a better-conditioned space for the calculations. The log conditional posterior distribution is

    ln p(g_{M+K} | M, {s_k}_{k=1}^K, {s̃_m}_{m=1}^M, θ)
        = −(1/2) g_{M+K}ᵀ Σ^{−1} g_{M+K}
          − Σ_{k=1}^K ln(1 + exp{−g(s_k)})
          − Σ_{m=1}^M ln(1 + exp{g(s̃_m)}) + const.    (10)
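As a concrete illustration of Equation 10 and the whitening transformation, the sketch below evaluates the log conditional posterior and its gradient with respect to the whitened variables ν, where g = Lν and L is the Cholesky factor of Σ. This is only the objective an HMC implementation would use; the leapfrog integration itself is omitted, and the ordering of entries in g is an assumption of the example.

    import numpy as np

    def log_post_and_grad(nu, L, n_obs):
        # Equation 10 in whitened coordinates: g = L @ nu, so the GP prior
        # term reduces to -0.5 * nu @ nu. The first n_obs entries of g
        # correspond to observed events s_k, the rest to thinned events.
        g = L @ nu
        sig = 1.0 / (1.0 + np.exp(-g))                 # sigma(g)
        lp = (-0.5 * nu @ nu
              + np.sum(np.log(sig[:n_obs]))            # -ln(1 + exp(-g(s_k)))
              + np.sum(np.log(1.0 - sig[n_obs:])))     # -ln(1 + exp(+g(s~_m)))
        # d lp / d g is sigma(-g) at observed events and -sigma(g) at the
        # thinned events; the chain rule through g = L @ nu gives L.T @ ...
        dlp_dg = np.concatenate([1.0 - sig[:n_obs], -sig[n_obs:]])
        return lp, L.T @ dlp_dg - nu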
3.5. Hyperparameter Inference

Given the data, the thinned events, and the function values g_{M+K}, we might also like to take advantage of hierarchical Bayesian inference to sample from the posterior distribution on any hyperparameters in the covariance and mean functions. This can be performed straightforwardly using Hamiltonian Monte Carlo.

The upper-bound parameter λ* can also be inferred as part of the MCMC procedure. Conditioned on M, K, and the thinned event locations, the union of {s_k}_{k=1}^K and {s̃_m}_{m=1}^M is drawn from a homogeneous Poisson process on T with rate λ*. The Gamma distribution with shape parameter α and inverse-scale parameter β provides a conditionally-conjugate prior for λ*. We can sample directly from the conditional posterior, which is Gamma with parameters

    α_post = α + K + M,    β_post = β + μ(T).
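This conditionally-conjugate update is a one-line Gibbs step; a sketch follows, with the prior settings alpha and beta assumed for illustration.

    import numpy as np

    def resample_lam_star(K, M, mu_T, alpha=1.0, beta=1.0, rng=None):
        # Gibbs update for lambda*: conditioned on the K observed and M
        # thinned events, lambda* ~ Gamma(alpha + K + M, beta + mu(T)).
        # NumPy parameterizes the Gamma by shape and scale = 1/rate.
        rng = np.random.default_rng() if rng is None else rng
        return rng.gamma(alpha + K + M, 1.0 / (beta + mu_T))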
3.6. Predictive Samples

The predictive distribution is often of interest in Bayesian modeling. This is the distribution arising on sets of events when the posterior over intensity functions and hyperparameters is integrated out. For the SGCP model, this corresponds to generating an entirely new time series from the model, integrating out g(s). It is straightforward to generate data from this distribution as part of the MCMC inference: after any given step in the Markov chain, run the generative procedure of Algorithm 1, but condition on the current g_{M+K} when drawing from the Gaussian process. That is, in Line 4 of Algorithm 1, condition on {s_k, g(s_k)}_{k=1}^K and {s̃_m, g(s̃_m)}_{m=1}^M. This provides new time series drawn from the predictive distribution.
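A sketch of this predictive step follows: relative to the earlier simulate_sgcp sketch, only Line 4 changes, with g at the new locations drawn from the GP conditioned on the chain's current state (S, g). The kernel and its settings are the same illustrative assumptions as before.

    import numpy as np

    def ker(a, b, ell=1.0, amp=1.0):
        # Squared-exponential covariance (illustrative assumption).
        return amp**2 * np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell)**2)

    def predictive_sample(S, g, t_max, lam_star, rng=None):
        # Algorithm 1, except that Line 4 conditions on the current MCMC
        # state: locations S (observed and thinned) with function values g.
        rng = np.random.default_rng() if rng is None else rng
        J = rng.poisson(lam_star * t_max)
        s_hat = rng.uniform(0.0, t_max, size=J)
        # Conditional GP draw of g(s_hat) given g at S.
        K_inv = np.linalg.inv(ker(S, S) + 1e-8 * np.eye(len(S)))
        k_x = ker(s_hat, S)
        mean = k_x @ K_inv @ g
        cov = ker(s_hat, s_hat) - k_x @ K_inv @ k_x.T + 1e-8 * np.eye(J)
        g_hat = rng.multivariate_normal(mean, cov)
        accept = rng.uniform(size=J) < 1.0 / (1.0 + np.exp(-g_hat))
        return np.sort(s_hat[accept])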
4. Empirical Results

We performed two types of empirical analysis of our approach. We used synthetic data with known λ(s) to compare the SGCP to other comparable nonparametric approaches, and we applied our method to two real-world data sets, one temporal and one spatial.

4.1. Synthetic Data

We created three one-dimensional data sets using the following intensity functions:

1. A sum of an exponential and a Gaussian bump: λ1(s) = 2 exp{−s/15} + exp{−((s − 25)/10)²} on the interval [0, 50]. 53 events.
2. A sinusoid with increasing frequency: λ2(s) = 5 sin(s²) + 6 on [0, 5]. 29 events.
3. λ3(s) is the piecewise linear function shown in Figure 3(c), on the interval [0, 100]. 235 events.

We compared the SGCP to the classical kernel smoothing (KS) approach of Diggle (1985). We performed edge-corrected kernel smoothing using a quartic kernel, with the recommended mean-square minimization technique for bandwidth selection. We also compared to the most closely-related nonparametric Bayesian technique, the Log Gaussian Cox Process of Rathbun and Cressie (1994) and Møller et al. (1998). To implement this method, we used discretization to make a finite-dimensional approximation and applied Markov chain Monte Carlo, running it with 10 bins (LGCP10), 25 bins (LGCP25), and 100 bins (LGCP100). We used the squared-exponential kernel for both the SGCP and the LGCP, and sampled from the hyperparameters of both models.

We report the numerically-estimated ℓ2 distance between the mean λ(s) provided by each method and the known true function. We also report the mean log predictive probability of ten additional held-out time series generated from the same (known) λ(s); these predictive probabilities were calculated by numerical integration of the functions. The results are provided in Table 1, and the resulting estimates (excluding LGCP10 and LGCP25) are shown in Figure 3.

Table 1. The SGCP is compared with the main frequentist kernel smoothing (KS) method and with the Log Gaussian Cox Process, which requires binning; results are shown with ten (LGCP10), 25 (LGCP25), and 100 (LGCP100) bins. The comparisons were done against time series from known intensity functions, on ℓ2 norm to the true function and the mean predictive log probability (lp) of ten unseen time series from the same intensity function.

                  SGCP      KS     LGCP10   LGCP25   LGCP100
    λ1(s)  ℓ2     4.20    5.96     6.12     5.44     6.65
           lp   -45.11  -46.00   -46.80   -45.24   -46.41
    λ2(s)  ℓ2    38.38   70.34    53.27    43.51    73.71
           lp    24.45   23.36    22.89    25.29    28.19
    λ3(s)  ℓ2    11.41   90.76    22.14    10.79    30.56
           lp   -43.39  -53.67   -52.31   -47.16   -46.47

Figure 3. Synthetic time series and mean functions from the analyzed estimation methods. Thin solid lines show the true intensities, thick solid lines show the SGCP posterior mean, dot-dashed lines show the kernel estimator (KS), and dotted lines show the posterior mean of the LGCP with 100 bins (LGCP100). Panels: (a) synthetic data and model fits for λ1(s); (b) for λ2(s); (c) for λ3(s).

4.2. Coal Mining Disaster Data

We ran the Markov chain Monte Carlo inference procedure on the classic coal mine disaster data of Jarrett (1979). These data are the dates of 191 coal mine explosions that each killed ten or more men in Britain between 15 March 1851 and 22 March 1962. Figure 5(a) shows the events along the top, together with the inferred mean intensity function and approximate quartile error bars. Figure 5(b) shows a normalized histogram of the inferred upper-bounding intensity λ*, and Figure 5(c) a normalized histogram of the number of latent thinned events, M.

Figure 5. Results of MCMC inference applied to the coal mine disaster data: (a) coal mine data, with the inferred mean intensity and quartile error bars; (b) normalized histogram of the sampled λ* values; (c) normalized histogram of the sampled number of thinned events M.

4.3. Redwoods Data

We used a standard data set from spatial statistics to demonstrate the Sigmoidal Gaussian Cox Process in two dimensions. These data are the locations of redwood trees studied by Ripley (1977) and others. There are 195 points and, as in previous studies, they have been scaled to the unit square. Figure 4(a) shows the data along with the inferred mean intensity function. These data are useful for examining the placement of latent thinned events: Figure 4(b) shows a normalized histogram of where these events tended to be located during the MCMC run. As expected, it is approximately a "negative" of the mean intensity; the thinned events migrate to places where it is necessary to "peg down" the intensity function.

Figure 4. Left: redwood locations scaled to the unit square, with the mean intensity estimate from the SGCP; the black line shows the contour at λ(s) = 150. Right: a normalized histogram of the locations of thinned events, illustrating the tendency of these events to migrate to areas where λ(s) is small.

5. Discussion

5.1. Computational Concerns

Gaussian processes have significant computational demands: O(n³) time complexity and O(n²) space complexity for n input points. When performing inference in the SGCP model, each MCMC step therefore costs O((K + M)³), as the thinned events must be included in the GP. The approach we present is thus infeasible for data sets with more than several thousand events. Mixing of the Markov chain is also a potential computational concern, but by using Hamiltonian Monte Carlo we have had good results; autocorrelation plots from the λ1(s) inference are shown in Figure 6.

Figure 6. Two autocorrelation functions from the MCMC run on function λ1(s). The dashed line shows the autocorrelation of λ* and the solid line shows the autocorrelation of the numerically-estimated ℓ2 distance between the estimated function and the true function.

5.2. Contrasting With Other Methods

The motivation for the Gaussian Cox process is primarily the ease with which one can specify prior beliefs about the variation of the intensity function of a Poisson process, without specifying a particular functional form. Several methods have been proposed for approximating this model, most notably the discretization method of Rathbun and Cressie (1994) and Møller et al. (1998), to which we have compared the SGCP. Cunningham et al. (2008a) also use a discretization-based approximation for the Cox process and report performance improvements on their domain of interest. This method, however, suffers from three deficiencies for general application: it is a finite-dimensional proxy method; it uses the renewal-process formalism, which cannot easily be generalized beyond the time domain; and its model is inconsistent in that it assigns prior mass to negative intensity functions.

The Dirichlet process mixture of Beta distributions (Kottas & Sansó, 2007) has the appealing characteristic that it allows tractable nonparametric Bayesian inference via relatively standard MCMC methods. However, it is unclear how to choose the hyperparameters of an infinite mixture of fixed densities to represent beliefs about a Poisson process intensity. The underlying infinite-dimensional stochastic process is the Dirichlet, which is fixed throughout space and/or time; variations in the intensity arise only from the parametric variations of the distributions being mixed. Also, while the marginalization properties of the Gaussian Cox process are straightforward to understand if the region of interest changes, a mixture of Betas appears to have discontinuities when the studied region is expanded.

The Sigmoidal Gaussian Cox Process is superior to the frequentist kernel density approach (Diggle, 1985) in several ways. First, we obtain samples from the posterior rather than a point estimate of the unknown intensity function. Second, we are able to perform bandwidth selection in a principled way by sampling from the hyperparameters of the Gaussian process. Third, we are able to incorporate arbitrary nonstationary Gaussian processes into the framework without modification. Finally, our method does not suffer from detrimental edge effects. These improvements do, however, come at some computational cost.

5.3. Variations on the Model

There are several ways in which the Sigmoidal Gaussian Cox Process could be modified for different modeling situations. For example, to arrive at bounded random intensities we used a constant dominating function λ*, but other tractable parametric forms would be suitable and potentially more efficient. Also, we used the logistic function in Equation 1 to map from arbitrary functions to the bounded domain, but other functions could be used to achieve different properties. For example, if σ(·) were the normal cumulative distribution function and the Gaussian process prior were zero-mean and stationary with unit amplitude, then the random intensities λ(s) from the model would be marginally uniform beneath λ*.
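The last point is the probability integral transform: if g(s) ~ N(0, 1) marginally, then Φ(g(s)) is Uniform(0, 1), so λ(s) = λ* Φ(g(s)) is marginally uniform on (0, λ*). A quick numerical check of this property (illustrative only, not code from the paper; assumes SciPy is available):

    import numpy as np
    from scipy.stats import kstest, norm

    # If g ~ N(0, 1) marginally, Phi(g) is Uniform(0, 1), hence
    # lambda = lam_star * Phi(g) is marginally uniform on (0, lam_star).
    g = np.random.default_rng(0).standard_normal(100_000)
    print(kstest(norm.cdf(g), "uniform"))  # large p-value: consistent with uniform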
5.4. Summary of Contributions

We have introduced a novel method of inference for the Gaussian Cox process that avoids the intractability typical of such models. Our model, the Sigmoidal Gaussian Cox Process, uses a generative prior that allows exact Poisson data to be generated from a random intensity function drawn from a transformed Gaussian process. With the ability to generate exact data, we can simulate a Markov chain on the posterior distribution of infinite-dimensional intensity functions without approximation and with finite computational resources. Our approach stands in contrast to other nonparametric Bayesian approaches to the inhomogeneous Poisson process in that it requires neither a crippling of the model nor a finite-dimensional approximation.

Acknowledgements

The authors wish to thank Maneesh Sahani and Zoubin Ghahramani for valuable comments. Ryan Adams thanks the Gates Cambridge Trust and the Canadian Institute for Advanced Research.

References

Adams, R. P., Murray, I., & MacKay, D. J. C. (2009). The Gaussian process density sampler. Advances in Neural Information Processing Systems 21 (pp. 9–16).

Cox, D. R. (1955). Some statistical methods connected with series of events. Journal of the Royal Statistical Society, Series B, 17, 129–164.

Cunningham, J., Shenoy, K., & Sahani, M. (2008a). Fast Gaussian process methods for point process intensity estimation. International Conference on Machine Learning 25 (pp. 192–199).

Cunningham, J., Yu, B., Shenoy, K., & Sahani, M. (2008b). Inferring neural firing rates from spike trains using Gaussian processes. Advances in Neural Information Processing Systems 20 (pp. 329–336).

Diggle, P. (1985). A kernel method for smoothing point process data. Applied Statistics, 34, 138–147.

Duane, S., Kennedy, A. D., Pendleton, B. J., & Roweth, D. (1987). Hybrid Monte Carlo. Physics Letters B, 195, 216–222.

Gregory, P. C., & Loredo, T. J. (1992). A new method for the detection of a periodic signal of unknown shape and period. The Astrophysical Journal, 398, 146–168.

Heikkinen, J., & Arjas, E. (1999). Modeling a Poisson forest in variable elevations: a nonparametric Bayesian approach. Biometrics, 55, 738–745.

Jarrett, R. G. (1979). A note on the intervals between coal-mining disasters. Biometrika, 66, 191–193.

Kottas, A., & Sansó, B. (2007). Bayesian mixture modeling for spatial Poisson process intensities, with applications to extreme value analysis. Journal of Statistical Planning and Inference, 137, 3151–3163.

Lewis, P. A. W., & Shedler, G. S. (1979). Simulation of a nonhomogeneous Poisson process by thinning. Naval Research Logistics Quarterly, 26, 403–413.

Møller, J., Syversveen, A. R., & Waagepetersen, R. P. (1998). Log Gaussian Cox processes. Scandinavian Journal of Statistics, 25, 451–482.

Murray, I., Ghahramani, Z., & MacKay, D. (2006). MCMC for doubly-intractable distributions. Uncertainty in Artificial Intelligence 22 (pp. 359–366).

Rasmussen, C. E., & Williams, C. K. I. (2006). Gaussian processes for machine learning. Cambridge, MA: MIT Press.

Rathbun, S. L., & Cressie, N. (1994). Asymptotic properties of estimators for the parameters of spatial inhomogeneous Poisson point processes. Advances in Applied Probability, 26, 122–154.

Ripley, B. D. (1977). Modelling spatial patterns. Journal of the Royal Statistical Society, Series B, 39, 172–212.