Text-based approach to measuring shadow banking activities: Summary of methods

The shadow banking index is constructed from bank financial reports, or 10-Ks, which are processed using computational linguistics tools and machine learning techniques. The index is based solely on risk disclosures made by publicly traded and nonbank Canadian financial institutions. The data for each bank from 1997 to 2017 are extracted from SEDAR using a web scraping algorithm. We then run a topic modeling module to obtain a 20-factor Latent Dirichlet Allocation (LDA) model for all years. Next, we extract from the 20 factors 40 bigrams and trigrams related to shadow banking activities. Because LDA puts high probability weights on bigrams and trigrams that appear in the risks disclosed by many banks, these bigrams and trigrams are systemically important rather than idiosyncratic. Finally, we use cosine similarities to score each bank designated as a systemically important institution and determine its final risk exposure.
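The final scoring step can be illustrated with a minimal sketch. It is not the authors' implementation: the phrase list SHADOW_PHRASES, the helper names, and the equal weighting of every phrase in the reference vector are all assumptions made for illustration, and the LDA step that would produce the 40 bigrams and trigrams is not shown. The sketch counts bigrams and trigrams in a risk disclosure and compares the counts with the reference phrase vector using cosine similarity.

```python
import math
import re
from collections import Counter

# Hypothetical phrase list; in the method described above, the 40 bigrams
# and trigrams would come from the 20-factor LDA model.
SHADOW_PHRASES = [
    "asset backed",
    "securitization vehicles",
    "off balance sheet",
    "repurchase agreements",
]


def ngrams(text, sizes=(2, 3)):
    """Count the word bigrams and trigrams that occur in `text`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for n in sizes:
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_bank(disclosure, phrases=SHADOW_PHRASES):
    """Score one disclosure against the reference phrase list."""
    counts = ngrams(disclosure)
    # Keep only the counts of phrases in the reference vocabulary.
    bank_vec = {p: counts[p] for p in phrases if counts[p] > 0}
    # Assumption: every reference phrase carries equal weight.
    ref_vec = {p: 1 for p in phrases}
    return cosine_similarity(bank_vec, ref_vec)


example = ("The bank relies on repurchase agreements and "
           "off-balance-sheet securitization vehicles.")
score = score_bank(example)
```

A disclosure that mentions none of the reference phrases scores 0.0, and the score rises toward 1 as the mix of phrases in the disclosure approaches the reference vector. Weighting the reference phrases by their LDA probabilities instead of equally would be a natural variant, but the summary above does not state which weighting is used.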

Overview of some topics detected using Latent Dirichlet Allocation