Vitaly Meursault

Machine Learning Economist

Federal Reserve Bank of Philadelphia

10 N Independence Mall W, Philadelphia, PA 19106

vitaly.meursault@phil.frb.org

Adjunct Professor of Finance

AI in Business (2024)

Carnegie Mellon University

Papers

Operationalizing the Search for Less Discriminatory Alternatives in Fair Lending (with Talia Gillis and Berk Ustun)

FAccT '24: Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency

The Less Discriminatory Alternative is a key provision of the disparate impact doctrine in the United States. In fair lending, this provision mandates that lenders must adopt models that reduce discrimination when they do not compromise their business interests. In this paper, we develop practical methods to audit for less discriminatory alternatives. Our approach is designed to verify the existence of less discriminatory machine learning models – by returning an alternative model that can reduce discrimination without compromising performance (discovery) or by certifying that an alternative model does not exist (refutation). We develop a method to fit the least discriminatory linear classification model in a specific lending task – by minimizing an exact measure of disparity (e.g., the maximum gap in group FNR) and enforcing hard performance constraints for business necessity (e.g., on FNR and FPR). We apply our method to study the prevalence of less discriminatory alternatives on real-world datasets from consumer finance applications. Our results highlight how models may inadvertently lead to unnecessary discrimination across common deployment regimes, and demonstrate how our approach can support lenders, regulators, and plaintiffs by reliably detecting less discriminatory alternatives in such instances.
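As a rough illustration of the disparity measure named in the abstract (the maximum gap in group FNR), here is a minimal Python sketch. It only evaluates the measure for a fixed set of predictions; it does not attempt the paper's model-fitting or certification procedure. The function names `group_fnr` and `max_fnr_gap` and the toy data are hypothetical.

```python
import numpy as np

def group_fnr(y_true, y_pred, groups):
    """Return {group: FNR}, where FNR is the share of actual positives predicted negative."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    groups = np.asarray(groups)
    rates = {}
    for g in np.unique(groups):
        positives = (groups == g) & (y_true == 1)
        if positives.sum() == 0:
            continue  # FNR is undefined for a group with no actual positives
        rates[g.item()] = float(np.mean(y_pred[positives] == 0))
    return rates

def max_fnr_gap(y_true, y_pred, groups):
    """Largest difference in FNR between any two groups."""
    rates = group_fnr(y_true, y_pred, groups)
    return max(rates.values()) - min(rates.values())

# Toy data (illustrative only): 1 is the positive class.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 1]
groups = ["a", "a", "b", "b", "a", "b"]

print(group_fnr(y_true, y_pred, groups))
print(max_fnr_gap(y_true, y_pred, groups))
```

A search for a less discriminatory alternative would compare this gap across candidate models that all satisfy the same performance constraints.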

One Threshold Doesn't Fit All: Advancing Fairness in Lending Through Machine Learning (with Daniel Moulton, Larry Santucci, and Nathan Schor)

Conditionally accepted at Journal of Policy Analysis and Management (JPAM)

Modeling advances create credit scores that predict default better overall, but raise concerns about their effect on protected groups. Focusing on low- and moderate-income (LMI) areas, we use an approach from the Fairness in Machine Learning literature — fairness constraints via group-specific prediction thresholds — and show that gaps in true positive rates (% of non-defaulters identified by the model as such) can be significantly reduced if separate thresholds can be chosen for non-LMI and LMI tracts. However, the reduction isn’t free as more defaulters are classified as good risks, potentially affecting both consumers’ welfare and lenders’ profits. The trade-offs become more favorable if the introduction of fairness constraints is paired with the introduction of more sophisticated models, suggesting a way forward. Overall, our results highlight the potential benefits of explicitly considering sensitive attributes in the design of loan approval policies and the potential benefits of output-based approaches to fairness in lending.
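The idea of group-specific prediction thresholds can be sketched in a few lines of Python. This is a simplified illustration, not the paper's procedure: it assumes each applicant has a score where higher means more likely to repay, and it picks, for each group, the threshold that approves a target share of non-defaulters. The function names, the group labels, and the data are hypothetical.

```python
import numpy as np

def threshold_for_tpr(scores, y, target_tpr):
    """Highest threshold that approves at least target_tpr of non-defaulters (y == 1)."""
    good = np.sort(scores[y == 1])[::-1]  # non-defaulter scores, highest first
    k = int(np.ceil(target_tpr * len(good) - 1e-9))  # small guard against float error
    k = min(max(k, 1), len(good))
    return float(good[k - 1])

def group_thresholds(scores, y, groups, target_tpr):
    """Choose a separate threshold per group so each reaches the same target TPR."""
    scores, y, groups = np.asarray(scores), np.asarray(y), np.asarray(groups)
    return {
        g.item(): threshold_for_tpr(scores[groups == g], y[groups == g], target_tpr)
        for g in np.unique(groups)
    }

def approval_rates(scores, y, groups, thresholds):
    """Per group: (TPR, share of defaulters approved) under the given thresholds."""
    scores, y, groups = np.asarray(scores), np.asarray(y), np.asarray(groups)
    out = {}
    for g, t in thresholds.items():
        in_group = groups == g
        approved = scores >= t
        tpr = float(np.mean(approved[in_group & (y == 1)]))
        bad_approved = float(np.mean(approved[in_group & (y == 0)]))
        out[g] = (tpr, bad_approved)
    return out

# Toy data (illustrative only): y == 1 marks a non-defaulter.
scores = [0.4, 0.5, 0.6, 0.7, 0.3, 0.55, 0.6, 0.7, 0.8, 0.9, 0.5, 0.65]
y      = [1,   1,   1,   1,   0,   0,    1,   1,   1,   1,   0,   0]
groups = ["lmi"] * 6 + ["non_lmi"] * 6

thresholds = group_thresholds(scores, y, groups, target_tpr=0.5)
print(thresholds)
print(approval_rates(scores, y, groups, thresholds))
```

In this toy example a single threshold set at the non-LMI level would approve no LMI non-defaulters, while separate thresholds equalize the true positive rate. The second number in each tuple tracks the cost side the abstract mentions: the share of defaulters classified as good risks.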

PEAD.txt: Post-Earnings Announcement Drift Using Text (with Pierre Liang, Bryan Routledge, and Madeline Scanlon)

Journal of Financial and Quantitative Analysis (JFQA), 2023 

We construct a new numerical measure of earnings announcement surprises, standardized unexpected earnings call text (SUE.txt), that does not explicitly incorporate the reported earnings value. SUE.txt generates a text-based post-earnings announcement drift (PEAD.txt) larger than the classic PEAD and can be used to create a profitable trading strategy. Leveraging the prediction model underlying SUE.txt, we propose new tools to study the news content of text: paragraph impact and a paragraph classification scheme based on the business curriculum. With these tools, we document many asymmetries in the distribution of news across content types, demonstrating that earnings calls contain a wide range of news about firms and their environment.

Working papers

Mapping Inventions in the Space of Ideas, 1836-2022: Representation, Measurement, and Validation (with Ina Ganguli, Jeffrey Lin, and Nicholas Reynolds)

How well can different methods meaningfully represent inventions in the "space of ideas"? We evaluate four leading natural language processing (NLP) models, each of which produces a different numerical representation of patent text. We design three novel, domain-specific validation tasks to select between these representations. Sentence-BERT (S-BERT) significantly outperforms other widely-used NLP models, creating metrics better aligned with both expert and non-expert human judgment about patent similarity. The choice of representation matters significantly for economic measurement. According to S-BERT, contemporaneous patents have declined in similarity over more than a century, as inventions have "spread out" on an expanding knowledge frontier. Other representations report ambiguous or diverging patterns. We reproduce the S-BERT result using newly-digitized records of historical interferences, which show secular declines in the rate of multiple invention. Our results highlight the importance of validation and model selection as an essential step in constructing and using measures derived from patent text. 

We are extending our analysis to include the latest generation of "ChatGPT-era" embedding models. OpenAI's latest embeddings significantly outperform S-BERT, which already performed strongly, in our main validation task. We are in the process of fully integrating results based on these new embeddings into our paper.
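To make the notion of similarity among contemporaneous patents concrete, the sketch below averages pairwise similarity over a set of text embeddings. It assumes cosine similarity as the metric and uses made-up two-dimensional vectors; the paper's actual metric, embedding dimensions, and aggregation may differ. The function name `mean_pairwise_cosine` is hypothetical.

```python
import numpy as np

def mean_pairwise_cosine(embeddings):
    """Average cosine similarity over all distinct pairs of row vectors."""
    X = np.asarray(embeddings, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalize each row
    sims = X @ X.T                                    # cosine similarity matrix
    upper = np.triu_indices(X.shape[0], k=1)          # each pair once, no diagonal
    return float(sims[upper].mean())

# Toy "embeddings" for three patents from the same period (illustrative only).
cohort = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(mean_pairwise_cosine(cohort))
```

Computing this quantity separately for each cohort of patents would give one possible time series of within-period similarity.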

Corporate Disclosure: Facts or Opinions? (with Shimon Kogan)

A large body of literature documents the link between textual communication (e.g., news articles, earnings calls) and firm fundamentals, either through pre-defined “sentiment” dictionaries or through machine learning approaches. Surprisingly, little is known about why textual communication matters. In this paper, we take a step in that direction by developing a new methodology to automatically classify statements into objective (“facts”) and subjective (“opinions”) and apply it to transcripts of earnings calls. The large-scale estimation suggests several novel results: (1) facts and opinions are both prominent parts of corporate disclosure, taking up roughly equal parts, (2) higher prevalence of opinions is associated with investor disagreement, (3) anomaly returns are realized around the disclosure of opinions rather than facts, and (4) facts have a much stronger correlation with contemporaneous financial performance, but facts and opinions have an equally strong association with financial results for the next quarter.

The Language of Earnings Announcements

This study quantifies and characterizes the information content of earnings announcement language via a statistical model of language that extracts the latent factors most associated with absolute returns around the time earnings announcements are released. The language of earnings announcements explains 11% of the variation in absolute announcement returns out-of-sample. That is comparable to the explanatory power of standard numerical variables. Using the latent factors to recover the features that are important, we show that the information content depends on what is mentioned, how it is mentioned, and where in a document it is mentioned. Findings show that earnings components are more important than bottom line net income. Sentiment and forward-lookingness amplify the information content of all themes, and information content is more concentrated at the beginnings of texts.

Bank Credit Supply and Shadow Mortgage Lending SSRN Page

With a novel application of a simple supply-demand decomposition methodology to residential mortgage markets, I analyze the role of bank credit supply, shadow lenders' own supply, and local demand in lending growth by shadow mortgage companies. I show that shadow lending grew faster in counties exposed to increases in bank credit supply. At the same time, shadow firms' own supply shocks explain more variation in shadow lending growth than bank supply shocks. These results suggest that shadow lenders have operational advantages over banks, but are also connected to them, perhaps via warehouse lines of credit.