Self-Exciting Point Processes

Hawkes Processes, Self-Exciting Point Processes, Poisson Processes with Stochastic Intensity

The names are different but the idea is the same. In our everyday life we can observe many dynamic systems with cascading events. For example, in a conversation between two entities we may consider that each sentence one pronounces makes it more likely for the other one to talk. We consider the pronunciation of a message as an event. If many events happen for entity1, the model should have the happening of events for entity2 become more likely. This is the kind of dynamics presented here. The C/Python code is available at the bottom of this page.

Introducing Homogeneous Poisson Processes

Poisson Processes are frequently used when modeling events that happen at a random time. A good example is nuclear fission. This class of models is very restrictive because there cannot be any memory in the process that is modeled. If we model the happening of events for entities 1 and 2 with Poisson Processes of intensities 0.12 and 0.25 respectively, here is what we get:

(click on the picture to zoom in, jump values are rescaled)

What happens before an arbitrary time t, for a given process, does not affect what is going to happen next. Moreover , if we consider what has happened for process 1, it does not affect at all what is going to happen for process 2.

This model is very poor because it is entirely static. What we observe practically, in seismic activity, neuro-dynamics or stock markets, is quite different. What happens at a given moment is strongly dependent on what has occurred previously. If there is an earthquake at place1, then it is more likely that a replica will affect place2 if they are near one from the other. 

Let us present a time series analogy. For now we have two independent white noises. However we want to have an Auto-Regressive process of dimension 2. Unfortunately, it is difficult to introduce a notion of Auto-Regressive dynamics when it comes to Jump Processes. The Auto-regressive model will be hidden. It will not directly affect the jump but it will impact the Poisson Intensity. 

This class of models was introduced in A. G. Hawkes, Point spectra of some mutually exciting point processes, 1971 and A. G. Hawkes, Sprectra of some self-exciting and mutually exciting point processes, 1971. They have been extensively studied in D. J. Daley, D. Vere-Jones, An Introduction of the Theory of Point Processes. They are generally called Self-Exciting Point Processes or Hawkes Processes.

Non Homogeneous Poisson Processes

A Poisson Process (


) with intensity 
λ is defined by the following set of rules
  • The probability that there is a single jump between t and t+h is P(Xt+h − Xt =1) = λh o(h
  • The probability that there are no jumps between t and t+h is P(Xt+h − Xt =0) = 1 − λh o(h
  • The probability that there are more than one jump between t and t+h is P(Xt+h Xt  > 1) = o (h
(o(h) is a quantity that converges towards 0 as h goes to 0).

Hence these equations mean that the instantaneous likelihood of a jump per second is asymptotically λ. They also imply that with probability 1 there are no simultaneous jumps. Such a definition implies that the time between two consecutive jumps follows an exponential law of parameter λ. As the average time between two jumps is 1/λ, the higher λ, the more jumps per unit of time.

(click on the pictures to zoom in, jump values are rescaled)

Defining a Jump Process which intensity changes dynamically is not so different. We denote λt the intensity as it varies with t. We also consider the filtration associated with the Process (Xt)t>0 that we note (Ft). This (Ftobject represents all the information that is known of at a given time t from the previous observation of the process.

  • The probability, conditional to Ft, that there is a single jump between t and t+h is P(Xt+h − Xt =1 | Ft) = λh+o(h
  • The probability, conditional to Ft, that there are no jumps between t and t+h is P(Xt+h − Xt =0 | Ft) = 1 − λh+o(h
  • The probability, conditional to Ft, that there are more than one jump between t and t+h is P (Xt+h Xt > | Ft)  = o (h
Again, the probability that two jumps occur at the same time is 0.

Now, if we start with a deterministic intensity λt that varies with t, here is what we get:

(click on the picture to zoom in, jump values are rescaled)

In that case the dynamics are more interesting. As the intensity decreases, fewer jumps are simulated. Here it is important to note that the simulation process is non trivial. Simulating a Poisson Process with constant intensity is very easy. One simulates independent variables that follow an exponential law of parameter λ (they represent the times between consecutive jumps) and cumulatively sums them up.

When the intensity varies however, a technique such as the thinning procedure of Lewis and Shedler should be used. This acceptance/rejection method is described in Y. Ogata, On Lewis' Simulation Method for Point Processes. It is then very important to note that we cannot simulate such a process with the following algorithm
  1. For a given jump at time t, simulate time to next jump with t that follows exponential law of parameter λ
  2. Get the time of the next jump as t + t
  3. Compute the new current intensity as λt+∆t
  4. Go back to step 1 with t = t + t
This algorithm is flawed because it does not take into account the fact that λ varies between t + t. The acceptance/rejection method emulates a process that takes into account all the values of the intensity as it varies.

This method of Lewis and Shedler is valid for a deterministic intensity and for a stochastic one as well.

Poisson Process with a stochastic intensity, Self-exciting Point Process

The definition of a Non Homogeneous Poisson Process specifies the probability, conditional to Ft,  of a jump between t and t+h: P(Xt+X=1 | Ft) = λh+o(h). However it does not describe the dynamics of λt.

Above, an example is presented that shows the case of a deterministic λt. Things get interesting when λt becomes stochastic, in particular when it depends on the jumps that occurred previously.

In a Self-exciting Point Process, λt follows the dynamics
  • λt  μsumsJ(X)tφ(ts· 1 , where J(X)t is the set of jumps that occurred before t (1 is the standard size of the jumps)
  • μt 0 and deterministic (most of the time we will choose a constant value)
  • The convolution kernel φ is ≥ 0
  • The convolution kernel is causal, φ (t) = 0 if t < 0
  • The convolution kernel is stable, its integral on the real axis is < 1 (it dissipates energy)
We will use an exponentially decaying kernel: φ (t) = α · eβt if t > 0, else 0. It is a short memory kernel that offers fast simulation methods. In that case the stability condition becomes α<β.

Here is an example of such a process, in dimension 1, with parameters μ = 0.13, α = 0.023 and β = 0.11:

(click on the picture to zoom in, jump values are rescaled)

Multi-dimensional Self-exciting Point Process

Now that we have stochastic intensity Point Processes we are able to model more complicated phenomena. A Jump Process which intensity depends on the previous jumps is not enough though. It cannot model interacting entities. We want a model where jumps on entity1 excite entity2, that is to say increase the Poisson intensity of entity2.

Such a model is easy to set up now, we will simply use a matrix-based extension of the definitions above.

The multi-dimensional point-process is 

the intensities are

and the equation

defines the dynamics ( is the convolution product). As the dX are Dirac measures, the convolutions become discrete sums such like in the unidimensional definition.

The stability condition now becomes that the matrix of the integrals of the convolution kernels has all its Eigen-values strictly inside the unit sphere. The matrix is

for exponential φ kernels, this matrix becomes

Here is an example of simulation for a 4-dimensional process:

(click on the pictures to zoom in, jumps are rescaled)

The parameters of the simulation are 

Such a multidimensional process can model a vast range of phenomena. We could use it to fit changes of states in a network composed of 4 devices that interact with each other.

Simulating stock prices with Self-exciting Point Processes

In E. Bacry, S. Delattre, M. Hoffmann, J-F. Muzy, Modeling microstructure noise with mutually exciting point processes, 2010 and E. Bacry, J-F. Muzy, Hawkes model for price and trades high-frequency dynamics, 2013, several models of Self-exciting Point Processes are given that model high frequency stock market dynamics.

Three phenomena are studied:
  • Mean reversion of a single price (microstructure noise that biaises volatility estimators)
  • Correlation between two prices (and the Epps effect that biaises correlation estimators)
  • Market impact, trade order flow autocorrelation and mean reversion (joint modeling)
We will focus here on the simplest model: the mean reverting price. At very high frequency scale, when we observe trade prices, an obvious negative autocorrelation appears in the mid-price (average of best-bid and best-ask). That is to say that whenever the price goes up of one increment (0.05$ for example), it is very likely that it will then go down of one increment afterward. That is to say that the price reverts towards its mean.

Many models try to offer good explanations for such a phenomenon. The most famous one the high frequency white noise where the price is considered to be a hidden process blurred by a white noise. This model is poor however because it needs a time grid. That is to say that the price should be observed at t, t+∆t, t+nt, t+(n+1)∆t , with a constant duration between observation times. The other problem with this model is that the price can generally take any real value.

Practically, the price can only take discrete values separated by a minimum resolution called the tick size. For stocks, the tick size can be of 0.01$ or 0.0001€, which is generally small enough. For future contracts however, the tick is generally bigger which means that the price will vary very discretely on a sparse grid. The time on the contrary is continuous. A trade can occur at any point in time and two trades can be separated by less than a millisecond.

In such a context, Jump Processes seem very realistic. They model the occurrence of price variations. The variations in price themselves always have the same value of 1 tick. This last assumption is coarse, it does not hold at all for stock prices, but makes sense for future contracts (it is also possible to use a Compound Jump Process in order to have different variation sizes). The trajectory that we will simulate will therefore be as follows:

(click on the picture to zoom in)

Mean-reverting price modeling

We will use a 2-dimensional process with exponential kernels. The jump process will be

where the first component symbolizes upward price jumps and the second symbolizes downward price jumps. The simulated resulting price will then be the sum of the jumps of the first component minus the jumps of the second component. The intensities are

and the core equation

We model the mean reversion mechanism in a very straightforward way. We consider that the jumps of the first component do not increase its own intensity only increase the intensity of the second component and vice versa. This means that an increase in price will make a decrease more likely.

The matrix of alphas is therefore 

and the beta matrix is

The symmetry in the matrices means that the upward and downward price jumps impact each other the very same way. It will be useful when calibrating the model. The fewer parameters to estimate the better. Here is the kind of trajectories we simulate:

(click on the pictures to zoom in, the tick size here is scaled up to 1)

What is very nice here is the speed of the simulation (the core is written in C). It only takes 4 seconds to simulate 10000 trajectories of 1 hour each with more than 10 million jumps overall.

(click on the picture to zoom in)

The influence of the three parameters, μ, α, β, is very easy to grasp:
  • μ is a background exogenous intensity. The total number of jumps will increase proportionally to μ if α and β do not vary
  • α is the quantity of energy a single jump will bring in the intensity
  • β is the rate at which this energy will decay

Estimation by Maximum Likelihood Ratio

This technique of Maximum Likelihood Estimation (MLE in the code), is very usual. For a given component m of the multidimensional process, the log-likelihood of a trajectory is

where Jumps(m) is the set of jumps of the mth component. Λm (0, Tm) is the normalized terminal time of the component, it is defined by

This quantity is very interesting, we will come back to it in the final section. Now we focus on the computation of the Likelihood.

We use a simulated mean reverting price trajectory. It represents 5 days of 7 hours of trading each. The values of the parameters are
  • μ = 0.20
  • α = 0.025
  • β = 0.17
We plot a heat map of the Log-Likelihood for μ = 0.20 and varying α and β. The white marker shows the actual α and β parameters

(click on the picture to zoom in)

We can see that if the actual point is very close to the estimated maximum, the hyperboles α/β show very little gradient in the value of the likelihood. This is problematic when it comes to computing the maximum. Generally the precision is not so good. We estimate a couple (α,β) with a ratio close to the actual ratio but the respective estimated values of α and β can be far from the truth. The result also depends on the maximum search algorithm that is used. A Nelder-Mead algorithm generally gives good results if the polygon has enough points.

With a good maximization technique the MLE can give good results as long as the number of parameters that need to be estimated is low. As soon as the number of components in the Point Process increases or if there are too many distinct parameters, the precision of the estimation is not satisfying.

Checking the simulation algorithm with the normalized time Λ

If we consider a component m of a multidimensional process and not(ti ) its jumps, the normalized durations between consecutive jumps, Λm (ti, ti+1), follow an exponential law of parameter 1. If we want to check that the acceptance/rejection simulation is theoretically acceptable, we compare the empirical percentiles of the sampled Λ(ti, ti+1) with the theoretical percentiles of the exponential law of parameter 1.

Here is an example of QQ plot that we get for a given batch of simulations:

(click on the picture to zoom in)

The alignment of the QQ plot with the line y = x means that the simulation is valid.


The code given bellow in C code wrapped in Python with Cython. It is very straightforward.

In C, we directly access numpy arrays declared in the Python part of the code. This makes things quite fast, the only drawback is that the C arrays we use are all unidimensional.

In the code, in order to make the computation of the simulations, the normalized times and the likelihood much faster we widely use the multiplicative properties of the exponential decay. This implies that simulation with a long memory decay such as 1 / (1 + t) would be much longer because all past values of the process should be taken into account for each intensity computation. Here, we do not have to do that. We simply update the intensity decay by multiplying it by exp(-β(t-s)) where t is the instant of the new computation and s the last instant for which the intensity has been computed.

There is a maximum number of jumps that can be kept in memory in the C part of the code. This value can be changed in the .pyx file if overflows occur because too many jumps need to be simulated. However, if this happens systematically, even for a short simulated period, this means that the process is unstable and has theoretically an infinite number of jumps. In that case, the value of α/β needs to be decreased.

  • A. G. Hawkes, Point spectra of some mutually exciting point processes, 1971
  • D. J. Daley, D. Vere-Jones, An Introduction to the Theory of Point Processes: Volume 1: Elementary Thoery and Mehods, second edition, 2002
  • Y. Ogata, On Lewis' Simulation Method for Point Processes, 1986
  • T. Ozaki, Maximum Likelihood Estimation of Hawkes Self-Exciting Point Processes, 1979
  • E. Bacry, S. Delattre, M. Hoffmann, J. F. Muzy, Modeling microstructure noise with mutually exciting point processes, 2010
  • E. Bacry, S. Delattre, M. Hoffmann, J. F. Muzy, Scaling limits for Hawkes processes and application to financial statistics, 2012
  • E. Bacry, J. F. Muzy, Hawkes model for price and trades high-frequency dynamics, 2013
  • P. Hewlet, Clustering of order arrivals, price impact and trade path optimization, 2006
Quant Lullaby,
Nov 3, 2013, 2:34 PM