Current Theme: Market Generation and Applications
Workshop on Market Generator Models
Organised by:
Sam Cohen (Oxford), Blanka Horvath (King's College), Kathrin Glau (Queen Mary University), Lukasz Szpruch (Edinburgh)
and The Alan Turing Institute
https://turing-uk.zoom.us/webinar/register/WN_fWk3f16qQRuFfJzuXo5_lw
16:10 Edgar Alonso Lopez-Rojas (EalaX)
Financial Synthetic Data is the New Oil for FinCrime Analytics.
16:30 Alexei Kondratyev and Christian Schwarz (Standard Chartered Bank)
Data Anonymisation, Outlier Detection and Fighting Overfitting with Restricted Boltzmann Machines.
17:00 Panel Discussion
Isabelle Flückinger, Robert Graumans, Cody Hyndman, Alexei Kondratyev, Gordon Lee, Christian Schwarz
17:40 Patrick Kidger (Oxford)
Neural Controlled Differential Equations for Irregular Time Series.
(See also: Signatory library)
Edgar Alonso Lopez-Rojas (EalaX): Financial Synthetic Data is the New Oil for FinCrime Analytics
Abstract:
Financial data is heavily constrained by customer privacy regulations such as GDPR. This hampers collaboration between stakeholders on financial problems such as optimising Anti-Money Laundering (AML) tools and reducing financial crime. Solutions based on Machine Learning (ML) are on the way, but the quality data required to train the models is simply not available. The three biggest drawbacks of using ML for AML are the lack of labelled data, the class imbalance of illicit financial activity and, finally, the evolving threat of FinCrime, which makes training datasets obsolete. All of these drawbacks derive from the problem of unknown 'hidden crime'. We address these problems using advanced financial simulation. By creating specially enriched digital synthetic twins of financial data, we are now able to develop and benchmark advanced solutions based on machine learning. We add to our models the known dynamics of normal and criminal customers, tailored to match realistic crime scenarios in our financial institutions. These synthetic datasets are the new oil for ML to tackle complex problems and improve our AML controls. Our simulators output augmented, non-confidential synthetic data, resulting in trustworthy enriched synthetic financial data ready for providers of advanced analytics solutions.
Alexei Kondratyev and Christian Schwarz (Standard Chartered Bank) : Data Anonymisation, Outlier Detection and Fighting Overfitting with Restricted Boltzmann Machines
Abstract:
We propose a novel approach to the anonymisation of datasets through non-parametric learning of the underlying multivariate distribution of dataset features and generation of new synthetic samples from the learned distribution. The main objective is to ensure equal (or better) performance of classifiers and regressors trained on synthetic datasets in comparison with the same classifiers and regressors trained on the original data. The ability to generate an unlimited number of synthetic data samples from the learned distribution can be a remedy in fighting overfitting when dealing with small original datasets. When the synthetic data generator is trained as an autoencoder with a bottleneck information-compression structure, we can also expect to see a reduced number of outliers in the generated datasets, thus further improving the generalisation capabilities of the classifiers trained on synthetic data. We achieve these objectives with the help of the Restricted Boltzmann Machine, a special type of generative neural network that possesses all the required properties of a powerful data anonymiser. The talk is based on joint work with B. Horvath.
The article can be accessed at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3526436 .
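To give a flavour of the approach described above, here is a minimal sketch of a Bernoulli Restricted Boltzmann Machine trained with one-step contrastive divergence (CD-1) on toy binary data and then used to generate synthetic samples by Gibbs sampling. The architecture, toy data and hyperparameters are illustrative assumptions, not the setup of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Bernoulli-Bernoulli Restricted Boltzmann Machine trained with CD-1."""
    def __init__(self, n_visible, n_hidden):
        self.W = rng.normal(0, 0.01, (n_visible, n_hidden))
        self.b = np.zeros(n_visible)   # visible bias
        self.c = np.zeros(n_hidden)    # hidden bias

    def sample_h(self, v):
        p = sigmoid(v @ self.W + self.c)
        return p, (rng.random(p.shape) < p).astype(float)

    def sample_v(self, h):
        p = sigmoid(h @ self.W.T + self.b)
        return p, (rng.random(p.shape) < p).astype(float)

    def cd1_step(self, v0, lr=0.05):
        ph0, h0 = self.sample_h(v0)
        pv1, v1 = self.sample_v(h0)
        ph1, _ = self.sample_h(pv1)
        # gradient approximated by one step of contrastive divergence
        self.W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
        self.b += lr * (v0 - pv1).mean(axis=0)
        self.c += lr * (ph0 - ph1).mean(axis=0)

    def generate(self, n_samples, n_gibbs=50):
        """Draw synthetic (anonymised) samples by running a Gibbs chain from noise."""
        v = (rng.random((n_samples, len(self.b))) < 0.5).astype(float)
        for _ in range(n_gibbs):
            _, h = self.sample_h(v)
            _, v = self.sample_v(h)
        return v

# toy "dataset": 8-bit patterns where adjacent features are duplicated (correlated)
data = np.repeat((rng.random((200, 4)) < 0.5).astype(float), 2, axis=1)
rbm = RBM(n_visible=8, n_hidden=16)
for epoch in range(200):
    rbm.cd1_step(data)

synthetic = rbm.generate(100)
print(synthetic.shape)
```

The synthetic samples are drawn from the learned distribution rather than copied from the original rows, which is what makes the generator usable as an anonymiser.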
Patrick Kidger (Oxford): Neural Controlled Differential Equations for Irregular Time Series
Abstract:
Neural ordinary differential equations are an attractive option for modelling temporal dynamics. However, a fundamental issue is that given some (learnt) vector field, the solution to an ordinary differential equation is determined by its initial condition; there is no mechanism for adjusting the trajectory based on subsequent observations. In this work, we demonstrate how this may be resolved through the well-understood mathematics of controlled differential equations. The resulting neural controlled differential equation model is directly applicable to the general setting of partially-observed irregularly-sampled multivariate time series, and (unlike previous work on this problem) may utilise memory-efficient adjoint-based backpropagation even across observations. We demonstrate that it outperforms similar (ODE or RNN based), state of the art models in empirical studies on several datasets. We provide additional theoretical results showing that our model is a universal approximator, and that it subsumes the apparently-similar continuous-RNN models. The talk is based on joint work with Cristopher Salvi, James Morrill, James Foster and Terry Lyons.
Patrick is also a co-author of the Signatory project, which can be found here: https://arxiv.org/abs/2001.00706.
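As a rough illustration of the mechanism in the abstract: a neural CDE evolves a hidden state z via dz = f(z) dX, so every new observation enters the dynamics through the path increment dX, which is exactly the adjustment mechanism a neural ODE lacks. The explicit-Euler sketch below uses an untrained random-weight vector field and a toy irregularly sampled series (both assumptions for illustration); in practice one would train f and use dedicated tooling such as the authors' torchcde/Signatory libraries.

```python
import numpy as np

rng = np.random.default_rng(1)

hidden, channels = 4, 2  # hidden state size; path channels (time + value)

# toy vector field f: R^hidden -> R^{hidden x channels}, one tanh layer,
# with random (untrained) weights standing in for learned parameters
W1 = rng.normal(0, 0.5, (hidden, 16))
W2 = rng.normal(0, 0.5, (16, hidden * channels))

def vector_field(z):
    return (np.tanh(z @ W1) @ W2).reshape(hidden, channels)

def cde_solve(z0, X):
    """Explicit-Euler solve of dz = f(z) dX along an observed path X.

    X has shape (n_obs, channels); each increment dX carries a new
    observation into the dynamics.
    """
    z = z0.copy()
    for k in range(len(X) - 1):
        dX = X[k + 1] - X[k]
        z = z + vector_field(z) @ dX
    return z

# irregularly sampled series: channel 0 is time, channel 1 a noisy signal
t = np.sort(rng.uniform(0, 1, 20))
X = np.stack([t, np.sin(6 * t) + 0.1 * rng.normal(size=20)], axis=1)
zT = cde_solve(np.zeros(hidden), X)
print(zT.shape)
```

Because time is included as a channel, irregular sampling needs no special handling: larger gaps simply produce larger increments in the time channel.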
15:00 Beatrice Acciaio and Tianlin Xu (London School of Economics)
15:30 Charles-Albert Lehalle (Capital Fund Management)
16:00 Panel Discussion
Rama Cont, Alexei Kondratyev, Terry Lyons, Charles-Albert Lehalle, Josef Teichmann
16:40 Christa Cuchiero (University of Vienna) and Josef Teichmann (ETH Zurich)
17:20 Hans Buehler and Ben Wood (JP Morgan, tbc) and Blanka Horvath (King's)
17:40 Lukasz Szpruch (Edinburgh) and Imanol Perez Arribas (Oxford)
Abstracts and further information can be found below.
THIS WORKSHOP IS BOOKED OUT. Further registrations will be put on the waiting list.
Beatrice Acciaio and Tianlin Xu (LSE): Learning dynamic GANs via Causal Optimal Transport
Abstract:
We propose a version of Generative Adversarial Networks suitable for generating sequential data. Inspired by Genevay, Peyré, and Cuturi [Learning Generative Models with Sinkhorn Divergences, arXiv:1706.00292v3], we use transport-based costs and an entropic penalisation that allows the use of Sinkhorn divergences. In order to take sequentiality into account, we impose the causality constraint on the transport plans. Remarkably, this naturally provides a way to parametrise the cost function that will be learned by the discriminator.
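For readers unfamiliar with the loss underlying this construction, the sketch below computes the entropy-regularised transport cost via Sinkhorn fixed-point iterations and the debiased Sinkhorn divergence between two empirical samples. The Gaussian toy data, the squared-Euclidean cost and the regularisation strength are illustrative assumptions, and the causality constraint on transport plans discussed in the talk is not imposed here.

```python
import numpy as np

def sinkhorn_cost(x, y, eps=1.0, n_iter=200):
    """Entropy-regularised OT cost between the empirical measures on x and y."""
    C = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)  # squared-Euclidean cost
    K = np.exp(-C / eps)                                        # Gibbs kernel
    a = np.full(len(x), 1.0 / len(x))                           # uniform weights
    b = np.full(len(y), 1.0 / len(y))
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):              # Sinkhorn fixed-point iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]      # entropic transport plan
    return np.sum(P * C)

def sinkhorn_divergence(x, y, eps=1.0):
    """Debiased Sinkhorn divergence (Genevay, Peyre and Cuturi style)."""
    return (sinkhorn_cost(x, y, eps)
            - 0.5 * sinkhorn_cost(x, x, eps)
            - 0.5 * sinkhorn_cost(y, y, eps))

rng = np.random.default_rng(2)
real = rng.normal(0.0, 1.0, (64, 2))   # "data" samples
fake = rng.normal(0.5, 1.0, (64, 2))   # "generator" samples, shifted mean
d_self = sinkhorn_divergence(real, real)   # zero by symmetry of the debiasing
d_cross = sinkhorn_divergence(real, fake)
print(f"self: {d_self:.4f}, cross: {d_cross:.4f}")
```

In a GAN setting this divergence (with a learned, causally constrained cost in place of the fixed squared-Euclidean one) would serve as the training signal for the generator.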
Charles-Albert Lehalle (CFM) : Mean Field Game Driven Market Simulations
Abstract:
The use of Reinforcement Learning (RL) opens the door to model-free control, and hence can leverage any collection of trajectories to tune the control, provided they are numerous enough. This collection can be made of simulated data generated according to Markovian and highly structured models, but also of historical data. The latter are usually not numerous enough, and have to be augmented by trajectories generated in a non-parametric way, reproducing the main statistical properties of past observations. I will present a way to generate trajectories that we introduced in 2011 with Olivier Guéant and Julien Razafinimanana (see "High-frequency simulations of an order book: a two-scale approach", Econophysics of Order-driven Markets, pp. 73-92). It is particularly well adapted when the control influences the trajectories, as in optimal trading, where actions influence price moves in a feedback, price-impact driven, loop. The idea is to use historical data to infer a mean field that will be used as the "master seed" of the simulation in a second step. One then needs to write the transitions in the state space as conditional expectations with respect to the distance between the observed state and the mean field. A Monte-Carlo simulation then generates dynamics according to these conditional expectations. With this methodology, optimal trading algorithms can be trained or backtested. To conclude, I will list some of the drawbacks of using non-parametric simulations versus model-driven ones.
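A much-simplified sketch of the two-step methodology described above (a mean field inferred from historical paths, then Monte-Carlo transitions resampled conditionally on the distance between the current state and the mean field) might look as follows. The synthetic "historical" random-walk data and the quantile binning of distances are illustrative assumptions, not the construction of the cited paper.

```python
import numpy as np

rng = np.random.default_rng(3)

# stand-in for historical data: 500 random-walk price paths of 100 steps
n_paths, n_steps = 500, 100
hist = np.cumsum(rng.normal(0, 1, (n_paths, n_steps)), axis=1)

# --- step 1: infer the mean field from historical data (the "master seed")
mean_field = hist.mean(axis=0)

# --- step 2: transitions as conditional laws given the distance between
# the observed state and the mean field, estimated by quantile binning
dist = hist[:, :-1] - mean_field[:-1]     # signed distance to mean field
incr = np.diff(hist, axis=1)              # observed one-step transitions
edges = np.quantile(dist, [0.0, 0.25, 0.5, 0.75, 1.0])
bins = np.clip(np.digitize(dist, edges[1:-1]), 0, 3)

# empirical increment pool per distance bin
pools = [incr[bins == k] for k in range(4)]

def simulate(n_sim=200):
    """Monte-Carlo paths whose transitions are resampled conditionally
    on how far the current state sits from the mean field."""
    x = np.zeros((n_sim, n_steps))
    for t in range(n_steps - 1):
        d = x[:, t] - mean_field[t]
        k = np.clip(np.digitize(d, edges[1:-1]), 0, 3)
        x[:, t + 1] = x[:, t] + np.array([rng.choice(pools[j]) for j in k])
    return x

sim = simulate()
print(sim.shape)
```

Because transitions depend on the simulated state (via its distance to the mean field), a trading action that moves the state also changes the subsequent dynamics, which is the feedback property the abstract emphasises.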
Christa Cuchiero (University of Vienna) and Josef Teichmann (ETH Zurich): Deep calibration of LSV models
The talk is based on joint work with Wahid Khosrawi.
Hans Buehler and Ben Wood (JP Morgan, tbc) and Blanka Horvath (King's): A Data-driven Market Simulator for Small Data Environments
Abstract:
In this talk we present a parsimonious generative model that works reliably even in environments where the amount of available training data is notoriously small. Furthermore, we discuss how a rough-paths perspective combined with a parsimonious Variational Autoencoder framework provides a powerful way of encoding and evaluating financial time series data in such environments. Lastly, we also discuss some pricing and hedging considerations in a DNN framework and their connection to Market Generation. The talk is based on joint work of H. Buehler, B. Horvath, T. Lyons, I. Perez Arribas, and B. Wood.
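As a hint of how a rough-paths perspective encodes a time series, the sketch below computes depth-2 signature features of a time-augmented toy price path: the level-1 terms are the total increments and the level-2 terms are the iterated integrals, accumulated segment by segment for a piecewise-linear path. This is a generic illustration of signature features, not the encoding pipeline of the paper.

```python
import numpy as np

def signature_level2(path):
    """Depth-2 signature of a piecewise-linear path of shape (n, d).

    Level 1 is the total increment; level 2 collects the iterated
    integrals int dX_i dX_j, built up segment by segment.
    """
    dX = np.diff(path, axis=0)            # segment increments
    d = path.shape[1]
    level1 = dX.sum(axis=0)
    level2 = np.zeros((d, d))
    running = np.zeros(d)                 # increment accumulated so far
    for step in dX:
        # contribution of one linear segment to the iterated integrals
        level2 += np.outer(running, step) + 0.5 * np.outer(step, step)
        running += step
    return np.concatenate([level1, level2.ravel()])

# time-augmented toy price path: channel 0 is time, channel 1 the price
t = np.linspace(0.0, 1.0, 50)
price = np.sin(2 * np.pi * t)
path = np.stack([t, price], axis=1)
feat = signature_level2(path)
print(feat.shape)  # d + d*d = 6 features for d = 2
```

Such fixed-length feature vectors summarise a path of arbitrary length, which is one reason signature encodings are attractive when training data are scarce.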
Lukasz Szpruch (Edinburgh) and Imanol Perez Arribas (Oxford): Neural SDEs - two perspectives.
Abstract:
Classical financial risk models provide only an approximate description of reality, and the risk of using an inadequate model, often called Knightian uncertainty, is hard to detect. Modern data science techniques are opening the door to more robust data-driven risk models. However, these models suffer from a lack of interpretability. Further, for many machine learning models it is not clear how to consistently calibrate under both the Q and P measures. Our work aims to achieve the best of both worlds. In the first part of the talk, we combine (deep) neural networks with classical stochastic risk models. We demonstrate that the approach can be used to produce robust bounds for pricing complex financial products, or to simulate future market scenarios needed for computing risk profiles or hedging strategies.
In the second part of the talk, we will provide a rough-path perspective on neural SDEs that allows for swift calibration to vanilla and exotic products. Importantly, our approach is underpinned by rigorous mathematical analysis that provides theoretical guarantees for the convergence of the presented algorithms.
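A minimal sketch of what simulating a neural SDE can look like: Euler-Maruyama applied to dS = sigma(t, S) S dW, with a tiny untrained network standing in for the learnable diffusion coefficient. The network architecture and all parameters here are illustrative assumptions; in the setting of the talk the weights would be calibrated to market prices.

```python
import numpy as np

rng = np.random.default_rng(4)

# tiny random-weight network standing in for the learnable diffusion sigma;
# its weights are NOT trained here -- this only illustrates the simulation
W1 = rng.normal(0, 0.5, (2, 8))
W2 = rng.normal(0, 0.5, (8, 1))

def sigma(t, s):
    """Neural diffusion coefficient sigma(t, S), kept strictly positive."""
    h = np.tanh(np.stack([np.full_like(s, t), s], axis=1) @ W1)
    return 0.2 * (1.0 + 0.5 * np.tanh(h @ W2)[:, 0])

def simulate_neural_sde(s0=1.0, T=1.0, n_steps=100, n_paths=1000):
    """Euler-Maruyama simulation of dS = sigma(t, S) S dW (driftless)."""
    dt = T / n_steps
    s = np.full(n_paths, s0)
    for k in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), n_paths)
        s = s + sigma(k * dt, s) * s * dW
    return s

ST = simulate_neural_sde()
print(ST.shape)
```

The dynamics are driftless, so the simulated price is a martingale and its Monte-Carlo mean stays close to the initial value; adding a learnable drift under P while keeping the driftless dynamics under Q is one way to view the consistent Q/P calibration the abstract mentions.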