Hawkes Processes, Self-Exciting Point Processes, Poisson Processes with Stochastic Intensity

The names differ but the idea is the same. In everyday life we can observe many dynamic systems with cascading events. For example, in a conversation between two entities, we may consider that each sentence one of them pronounces makes it more likely for the other one to talk. We consider the pronunciation of a message as an event. If many events happen for entity 1, the model should make the occurrence of events for entity 2 more likely. This is the kind of dynamics presented here. The C/Python code is available at the bottom of this page.

Introducing Homogeneous Poisson Processes

Poisson processes are frequently used to model events that happen at random times. A good example is nuclear fission. This class of models is very restrictive because the modeled process cannot have any memory. If we model the occurrence of events for entities 1 and 2 with Poisson processes of intensities 0.12 and 0.25 respectively, here is what we get: (figure: two independent Poisson trajectories; jump values are rescaled)

What happens before an arbitrary time t, for a given process, does not affect what is going to happen next. Moreover, what has happened for process 1 does not affect at all what is going to happen for process 2. This model is very poor because it is entirely static. What we observe in practice, in seismic activity, neurodynamics or stock markets, is quite different: what happens at a given moment depends strongly on what has occurred previously. If there is an earthquake at place 1, an aftershock is more likely to affect place 2 if the two places are close to each other. Let us present a time-series analogy. For now we have two independent white noises, but what we want is something like an autoregressive process of dimension 2. Unfortunately, it is difficult to introduce a notion of autoregressive dynamics when it comes to jump processes.
The autoregressive dynamics will be hidden: they will not act directly on the jumps but on the Poisson intensity. This class of models was introduced in A. G. Hawkes, Point spectra of some mutually exciting point processes, 1971, and A. G. Hawkes, Spectra of some self-exciting and mutually exciting point processes, 1971. They have been extensively studied in D. J. Daley, D. Vere-Jones, An Introduction to the Theory of Point Processes. They are generally called Self-Exciting Point Processes or Hawkes Processes.

Non-Homogeneous Poisson Processes

A Poisson process (X_t)_{t≥0} of constant intensity λ satisfies, for every t ≥ 0 and h → 0:

P(X_{t+h} − X_t = 1) = λ·h + o(h)
P(X_{t+h} − X_t > 1) = o(h)
Hence these equations mean that the instantaneous likelihood of a jump per second is asymptotically λ. They also imply that with probability 1 there are no simultaneous jumps. Such a definition implies that the time between two consecutive jumps follows an exponential law of parameter λ. As the average time between two jumps is 1/λ, the higher λ, the more jumps per unit of time.

Defining a jump process whose intensity changes dynamically is not so different. We denote λ_t the intensity as it varies with t. We also consider the filtration (F_t) associated with the process (X_t)_{t≥0}. This object (F_t) represents all the information known at a given time t from the previous observation of the process. Conditionally on F_t, the jump probabilities become:

P(X_{t+h} − X_t = 1 | F_t) = λ_t·h + o(h)
P(X_{t+h} − X_t > 1 | F_t) = o(h)
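The exponential law of the inter-arrival times gives an immediate simulation recipe for the constant-intensity case. Here is a toy Python sketch (the function name is ours; the optimized code is the C at the bottom of the page):

```python
import random

def simulate_homogeneous_poisson(lam, horizon, seed=None):
    """Jump times of a Poisson process of constant intensity lam on [0, horizon].

    The times between consecutive jumps are i.i.d. exponential variables of
    parameter lam; the jump times are their cumulative sums.
    """
    rng = random.Random(seed)
    times = []
    t = 0.0
    while True:
        t += rng.expovariate(lam)  # exponential gap of mean 1/lam
        if t > horizon:
            return times
        times.append(t)
```

Over a long horizon the empirical jump rate approaches λ, as expected from the 1/λ average gap.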
Again, the probability that two jumps occur at the same time is 0. Now, if we start with a deterministic intensity λ_t that varies with t, here is what we get: (figure: simulated jumps with a deterministic, time-varying intensity; jump values are rescaled)

In that case the dynamics are more interesting: as the intensity decreases, fewer jumps are simulated. It is important to note here that the simulation procedure is non-trivial. Simulating a Poisson process with constant intensity is very easy: one simulates independent variables that follow an exponential law of parameter λ (they represent the times between consecutive jumps) and cumulatively sums them up. When the intensity varies, however, a technique such as the thinning procedure of Lewis and Shedler should be used. This acceptance/rejection method is described in Y. Ogata, On Lewis' Simulation Method for Point Processes. It is very important to note that we cannot simulate such a process with the following naive algorithm: discretize time with a step ∆t and, at each step, register a jump between t and t + ∆t with probability λ_t·∆t.
This algorithm is flawed because it does not take into account the fact that λ varies between t and t + ∆t. The acceptance/rejection method emulates a process that takes into account all the values of the intensity as it varies. The method of Lewis and Shedler is valid both for a deterministic intensity and for a stochastic one.

The definition of a non-homogeneous Poisson process specifies the probability, conditionally on F_t, of a jump between t and t + h:

P(X_{t+h} − X_t = 1 | F_t) = λ_t·h + o(h)

However it does not describe the dynamics of λ_t itself. Above, an example was presented with a deterministic λ_t. Things get interesting when λ_t becomes stochastic, in particular when it depends on the jumps that occurred previously. In a self-exciting point process, λ_t follows the dynamics

λ_t = μ + ∫_0^t φ(t − s) dX_s = μ + Σ_{t_i < t} φ(t − t_i)

where μ > 0 is a baseline intensity, φ is a positive excitation kernel and the t_i are the jump times of the process itself.
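The thinning method mentioned above can be sketched for these self-exciting dynamics. This toy Python version (function names are ours, not the page's C code) assumes a non-increasing kernel φ, so that the intensity evaluated just after the current instant bounds the intensity until the next jump:

```python
import math
import random

def simulate_by_thinning(mu, kernel, horizon, seed=None):
    """Lewis-Shedler/Ogata thinning for lambda_t = mu + sum(kernel(t - t_i)).

    Assumes a non-increasing kernel, so that the intensity evaluated just
    after the current instant bounds the intensity until the next jump.
    """
    rng = random.Random(seed)
    jumps = []
    t = 0.0
    while True:
        lam_star = mu + sum(kernel(t - s) for s in jumps)  # current upper bound
        t += rng.expovariate(lam_star)                      # candidate jump time
        if t >= horizon:
            return jumps
        lam_t = mu + sum(kernel(t - s) for s in jumps)      # true intensity at t
        if rng.random() * lam_star <= lam_t:                # accept w.p. lam_t / lam_star
            jumps.append(t)
```

This sketch recomputes the full sum at each candidate and is therefore quadratic in the number of jumps; the exponential kernel below allows a much faster recursive update.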
We will use an exponentially decaying kernel: φ(t) = α·e^(−βt) if t > 0, else 0. It is a short-memory kernel that offers fast simulation methods. In that case the stability condition becomes α < β. Here is an example of such a process, in dimension 1, with parameters μ = 0.13, α = 0.023 and β = 0.11: (figure: one-dimensional self-exciting process; jump values are rescaled)

Multidimensional Self-Exciting Point Process

Now that we have point processes with stochastic intensity, we are able to model more complicated phenomena. A jump process whose intensity depends only on its own previous jumps is not enough though: it cannot model interacting entities. We want a model where jumps of entity 1 excite entity 2, that is to say increase the Poisson intensity of entity 2. Such a model is easy to set up now; we simply use a matrix-based extension of the definitions above. The multidimensional point process is X_t = (X_t^1, …, X_t^M), the intensities are λ_t = (λ_t^1, …, λ_t^M), and the dynamics are defined by

λ_t^m = μ_m + Σ_n (φ_{mn} ⋆ dX^n)_t = μ_m + Σ_n ∫_0^t φ_{mn}(t − s) dX_s^n

(⋆ is the convolution product). As the dX^n are sums of Dirac measures, the convolutions become discrete sums, as in the unidimensional definition. The stability condition now becomes that the matrix of the integrals of the kernels, whose entries are ∫_0^∞ φ_{mn}(t) dt = α_{mn}/β_{mn} for exponential kernels, has all its eigenvalues strictly inside the unit disc. Here is a simulation in dimension 4 (figures: the four components of a multidimensional self-exciting process; jumps are rescaled; the kernel matrix and the parameters of the simulation are given on the original pictures). Such a multidimensional process can model a vast range of phenomena. We could use it to fit changes of states in a network composed of 4 devices that interact with each other.

Simulating stock prices with Self-Exciting Point Processes

In E. Bacry, S. Delattre, M. Hoffmann, J.-F. Muzy, Modeling microstructure noise with mutually exciting point processes,
2010, and E. Bacry, J.-F. Muzy, Hawkes model for price and trades high-frequency dynamics, 2013, several models of self-exciting point processes are given that describe high-frequency stock market dynamics. Three phenomena are studied:
We will focus here on the simplest model: the mean-reverting price. At very high frequency, when we observe trade prices, an obvious negative autocorrelation appears in the mid-price (average of best bid and best ask). That is to say that whenever the price goes up by one increment ($0.05 for example), it is very likely to go down by one increment shortly afterward: the price reverts towards its mean. Many models try to offer good explanations for this phenomenon. The most famous one is the high-frequency white noise model, where the price is considered to be a hidden process blurred by a white noise. This model is poor, however, because it needs a time grid: the price is observed at t, t + ∆t, …, t + n∆t, t + (n+1)∆t, …, with a constant duration between observation times. The other problem with this model is that the price can generally take any real value. In practice, the price can only take discrete values separated by a minimum resolution called the tick size. For stocks, the tick size can be $0.01 or €0.0001, which is generally small enough. For futures contracts, however, the tick is generally bigger, which means that the price varies very discretely on a sparse grid. Time, on the contrary, is continuous: a trade can occur at any point in time and two trades can be separated by less than a millisecond. In such a context, jump processes seem very realistic. They model the occurrence of price variations; the variations themselves always have the same size of 1 tick. This last assumption is coarse, it does not hold at all for stock prices, but it makes sense for futures contracts (it is also possible to use a compound jump process in order to have different variation sizes). The trajectory that we will simulate will therefore look as follows: (figure: a piecewise-constant price path moving by one tick at a time)

Mean-reverting price modeling

We will use a 2-dimensional process with exponential kernels.
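Since the stability condition involves the spectral radius of the matrix of kernel integrals (α_{mn}/β_{mn} for exponential kernels), it is easy to check a parameter set numerically before running a simulation. A small self-contained Python sketch (the helper names are ours, not part of the C code of this page):

```python
def branching_matrix(alpha, beta):
    """Matrix of kernel integrals: entry (m, n) is alpha[m][n] / beta[m][n]."""
    return [[a / b for a, b in zip(ra, rb)] for ra, rb in zip(alpha, beta)]

def spectral_radius(mat, iterations=200):
    """Largest eigenvalue modulus by power iteration (adequate for the
    small nonnegative matrices used here)."""
    n = len(mat)
    v = [1.0] * n
    radius = 0.0
    for _ in range(iterations):
        w = [sum(mat[i][j] * v[j] for j in range(n)) for i in range(n)]
        radius = max(abs(x) for x in w)
        if radius == 0.0:
            return 0.0
        v = [x / radius for x in w]
    return radius

def is_stable(alpha, beta):
    """Stability of the multidimensional process with exponential kernels:
    spectral radius of the integral matrix strictly below 1."""
    return spectral_radius(branching_matrix(alpha, beta)) < 1.0
```

For the two-component model described next, the integral matrix has eigenvalues ±α/β, so the condition is again α < β.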
The jump process will be X_t = (X_t^1, X_t^2), where the first component counts upward price jumps and the second one counts downward price jumps. The simulated price is then the number of jumps of the first component minus the number of jumps of the second one. The intensities are (λ_t^1, λ_t^2) and the core equations are

λ_t^1 = μ + Σ_{t_i ∈ Jumps(2), t_i < t} α·e^(−β(t − t_i))
λ_t^2 = μ + Σ_{t_i ∈ Jumps(1), t_i < t} α·e^(−β(t − t_i))

We model the mean-reversion mechanism in a very straightforward way: the jumps of the first component do not increase its own intensity, they only increase the intensity of the second component, and vice versa. This means that an increase in price makes a decrease more likely. The matrix of alphas is therefore

( 0  α )
( α  0 )

and the beta matrix is constant, equal to β in every entry. The symmetry in the matrices means that the upward and downward price jumps impact each other in the very same way. It will be useful when calibrating the model: the fewer parameters to estimate, the better. Here is the kind of trajectory we simulate: (figures: simulated mean-reverting price paths; the tick size here is scaled up to 1)

What is very nice here is the speed of the simulation (the core is written in C): it only takes 4 seconds to simulate 10000 trajectories of 1 hour each, with more than 10 million jumps overall. The influence of the three parameters μ, α, β is very easy to grasp: μ sets the baseline jump rate, α the magnitude of the excitation caused by a jump, and β the speed at which this excitation fades.
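The two coupled components can be simulated with the same thinning idea. Here is a slow but self-contained Python sketch of the mean-reverting price model (function names are ours; the fast implementation is the C/Cython code at the bottom of the page):

```python
import math
import random

def simulate_price(mu, alpha, beta, horizon, seed=None):
    """Two cross-exciting components simulated by thinning: upward jumps
    raise the intensity of downward jumps and vice versa. Returns the two
    lists of jump times; the price in ticks is N_up(t) - N_down(t)."""
    rng = random.Random(seed)
    up, down = [], []

    def lam_up(t):    # excited by past downward jumps
        return mu + sum(alpha * math.exp(-beta * (t - s)) for s in down)

    def lam_down(t):  # excited by past upward jumps
        return mu + sum(alpha * math.exp(-beta * (t - s)) for s in up)

    t = 0.0
    while True:
        lam_star = lam_up(t) + lam_down(t)  # bound: intensities decay until the next jump
        t += rng.expovariate(lam_star)
        if t >= horizon:
            return up, down
        l_up, l_down = lam_up(t), lam_down(t)
        u = rng.random() * lam_star
        if u < l_up:
            up.append(t)            # upward tick
        elif u < l_up + l_down:
            down.append(t)          # downward tick
        # otherwise the candidate is rejected
```

By symmetry, the two components have the same average activity, so the simulated price fluctuates around its starting value, which is exactly the mean reversion we wanted.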
Estimation by Maximum Likelihood

This technique of Maximum Likelihood Estimation (MLE in the code) is very classical. For a given component m of the multidimensional process, the log-likelihood of a trajectory observed on [0, T] is

ln L^m = Σ_{t_i ∈ Jumps(m)} ln λ^m_{t_i} − Λ^m(0, T)

where Jumps(m) is the set of jumps of the m-th component and Λ^m(0, T) is the normalized terminal time of the component, defined by

Λ^m(0, T) = ∫_0^T λ^m_s ds

This quantity is very interesting; we will come back to it in the final section. Now we focus on the computation of the likelihood. We use a simulated mean-reverting price trajectory representing 5 days of 7 hours of trading each. The actual values of the parameters are those marked on the figure below.
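For exponential kernels the log-likelihood of one component can be computed in linear time with the classical recursion R_i = e^(−β(t_i − t_{i−1}))·(1 + R_{i−1}). A plain-Python sketch for the 1-dimensional case (the function name is ours; the C code relies on the same idea):

```python
import math

def hawkes_loglik(mu, alpha, beta, jumps, horizon):
    """Log-likelihood of a 1-D exponential-kernel trajectory on [0, horizon],
    in O(n) thanks to the recursion R_i = e^(-beta*(t_i - t_{i-1})) * (1 + R_{i-1})."""
    loglik = 0.0
    r = 0.0
    prev = None
    for t in jumps:
        if prev is not None:
            r = math.exp(-beta * (t - prev)) * (1.0 + r)
        loglik += math.log(mu + alpha * r)  # log intensity just before the jump
        prev = t
    # normalized terminal time Lambda(0, horizon), integrated in closed form
    compensator = mu * horizon + (alpha / beta) * sum(
        1.0 - math.exp(-beta * (horizon - t)) for t in jumps)
    return loglik - compensator
```

The same closed-form compensator, evaluated at each jump time instead of the horizon, yields the normalized times Λ used in the final section.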
We plot a heat map of the log-likelihood for μ = 0.20 and varying α and β. The white marker shows the actual α and β parameters. (figure: log-likelihood heat map in the (α, β) plane)

We can see that, although the actual point is very close to the estimated maximum, the likelihood shows very little gradient along the hyperbolas of constant ratio α/β. This is problematic when it comes to locating the maximum, and the precision is generally not so good: we estimate a couple (α, β) with a ratio close to the actual ratio, but the respective estimated values of α and β can be far from the truth. The result also depends on the maximum-search algorithm that is used. A Nelder-Mead algorithm generally gives good results if the initial simplex is well chosen. With a good maximization technique the MLE can give good results as long as the number of parameters to estimate is low. As soon as the number of components in the point process increases, or if there are too many distinct parameters, the precision of the estimation is not satisfying.

Checking the simulation algorithm with the normalized time Λ

By the time-rescaling theorem, mapping the jump times of a component through its normalized time Λ turns them into a homogeneous Poisson process of intensity 1, so the rescaled inter-arrival times should be i.i.d. exponential variables of parameter 1. Here is an example of the QQ plot of these rescaled inter-arrivals against the exponential distribution that we get for a given batch of simulations: (figure: QQ plot aligned with the line y = x)

The alignment of the QQ plot with the line y = x means that the simulation is valid.

Implementation

The code given below is C wrapped in Python with Cython. It is very straightforward: in C, we directly access numpy arrays declared in the Python part of the code. This makes things quite fast; the only drawback is that the C arrays we use are all unidimensional. In order to make the computation of the simulations, the normalized times and the likelihood much faster, we make heavy use of the multiplicative property of the exponential decay. A simulation with a long-memory kernel such as 1/(1 + t) would be much slower, because all past values of the process would have to be taken into account for each intensity computation. Here, we do not have to do that.
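The multiplicative property can be checked directly: between two computation instants with no intervening jump, the excess intensity λ(s) − μ is simply multiplied by e^(−β(t − s)). A small sketch comparing the O(1) update with the brute-force sum (function names are ours):

```python
import math

def intensity_bruteforce(mu, alpha, beta, jumps, t):
    """Sum the exponential kernel over every past jump (slow)."""
    return mu + sum(alpha * math.exp(-beta * (t - s)) for s in jumps if s < t)

def intensity_updated(lam_s, mu, beta, s, t):
    """O(1) update, valid when no jump occurs in (s, t]: the excess
    intensity decays by the factor exp(-beta * (t - s))."""
    return mu + (lam_s - mu) * math.exp(-beta * (t - s))
```

This identity is what makes the exponential kernel so cheap: the whole past is summarized by a single number.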
We simply update the decaying part of the intensity by multiplying it by exp(−β(t − s)), where t is the instant of the new computation and s the last instant at which the intensity was computed. There is a maximum number of jumps that can be kept in memory in the C part of the code. This value can be changed in the .pyx file if overflows occur because too many jumps need to be simulated. However, if this happens systematically, even for a short simulated period, it means that the process is unstable and the number of jumps explodes. In that case, the value of α/β needs to be decreased.

References

A. G. Hawkes, Point spectra of some mutually exciting point processes, 1971.
A. G. Hawkes, Spectra of some self-exciting and mutually exciting point processes, 1971.
D. J. Daley, D. Vere-Jones, An Introduction to the Theory of Point Processes.
Y. Ogata, On Lewis' Simulation Method for Point Processes, 1981.
E. Bacry, S. Delattre, M. Hoffmann, J.-F. Muzy, Modeling microstructure noise with mutually exciting point processes, 2010.
E. Bacry, J.-F. Muzy, Hawkes model for price and trades high-frequency dynamics, 2013.
