Coding Projects

"Integrating Domain Knowledge for Financial QA: A Multi-Retriever RAG Approach with LLMs" (with Yukun Zhang and Samyak Jain, report, poster)

We address errors in financial numerical-reasoning QA tasks that stem from a lack of domain knowledge in finance. Despite recent advances in LLMs, financial numerical questions remain challenging because they require both specific domain knowledge in finance and complex multi-step numeric reasoning. We build a RAG-like multi-retriever system that retrieves both external domain knowledge and internal question context as inputs to our generator. Our best neural-symbolic generator model outperforms the FinQA baseline on both execution accuracy and program accuracy. Our model also outperforms the non-expert human crowd, although it does not yet reach the level of the expert human crowd.

Keywords:  Python, PyTorch, NLP, RAG, BERT, DPR, FAISS, FinRAD
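To make the two metrics mentioned above concrete, here is a minimal sketch of how execution accuracy and program accuracy can differ for a single prediction. It assumes a simplified FinQA-style program format with only four arithmetic operations and `#n` references to earlier steps. The function names (`parse_program`, `execute`, `execution_match`, `program_match`) are hypothetical and not taken from the project, and the exact-match program check is a simplification rather than the benchmark's own equivalence test.

```python
import re

# Assumed, simplified operation set; the real benchmark DSL is richer.
OPS = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
}

# One step looks like op(arg1,arg2); arguments are numbers or #n references.
STEP = re.compile(r"(\w+)\(([^(),]+),([^(),]+)\)")


def parse_program(program):
    """Split 'op(a, b), op(#0, c)' into a list of (op, arg1, arg2) tuples."""
    compact = program.replace(" ", "")
    return [m.groups() for m in STEP.finditer(compact)]


def resolve(token, results):
    """Turn an argument into a number, following #n references to earlier steps."""
    if token.startswith("#"):
        return results[int(token[1:])]
    return float(token)


def execute(program):
    """Run each step in order and return the value of the last step."""
    results = []
    for op, left, right in parse_program(program):
        results.append(OPS[op](resolve(left, results), resolve(right, results)))
    return results[-1]


def execution_match(pred, gold, tol=1e-4):
    """Execution accuracy for one example: do the final values agree?"""
    try:
        return abs(execute(pred) - execute(gold)) <= tol
    except (KeyError, IndexError, ValueError, ZeroDivisionError):
        return False  # unparseable or invalid programs count as wrong


def program_match(pred, gold):
    """Program accuracy for one example (simplified to exact step match)."""
    return parse_program(pred) == parse_program(gold)


gold = "subtract(5829, 5735), divide(#0, 5735)"
pred = "divide(94, 5735)"  # right answer, different reasoning steps
print(execution_match(pred, gold))  # True
print(program_match(pred, gold))    # False
```

In this toy case the prediction reaches the correct value without reproducing the gold reasoning steps, which is why the two metrics are reported separately.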

"Solar Panel Detection on Satellite Images" (with Camila Nicollier, poster)

We train and implement a model that automates solar panel detection in satellite images. We deploy the state-of-the-art YOLOv10 architecture only a few days after its release, with satisfactory results. Automating solar panel identification is relevant to renewable energy, where the need to keep track of these installations has grown rapidly and solar developers have few, if any, tools to quickly identify existing projects in a specific area.

Keywords:  Python, PyTorch, Computer Vision, Object Detection, YOLO, YOLOv10, YOLOv9, CNN, Fast R-CNN

"Sentiment Analysis of Stocktwits and Stock Returns Prediction" (with Sang Ahn, report, poster)

Does public sentiment influence the returns of micro-cap stocks? Prior literature documents a strong correlation between stock price changes and public sentiment for large-cap stocks. However, it is unclear whether the same holds for micro-cap companies. We investigate this question by first predicting the sentiment of over 80,000 tweets associated with micro-cap stocks from the Stocktwits website. We then study whether those sentiments help predict the trends of monthly returns. We find that, in some cases, investors' sentiments move in the opposite direction to actual returns.

Keywords:  Python, PyTorch, BERT, RoBERTa, Sentiment Analysis, VADER, Hugging Face, Matplotlib, Pandas

"Textual Analysis of WhatsApp Conversations" (GitHub, Deployed App)

As an avid WhatsApp user, I have been intrigued by common questions such as: Who talks the most among my friends in the group chat? At what times are people most active? Who is the most positive person in the group? To answer those questions, and to help you answer them for your own friend group, I created and deployed an app that receives a text file with your group chat and provides a detailed breakdown of the conversation trends in your group, including: (i) a sender-level analysis of how messaging habits within the group evolve; (ii) word clouds of the most frequently used words in the chat as a whole and per sender; (iii) a sentiment analysis of the chat; and (iv) the times of day at which messages are sent.

Keywords:  Python, NLTK, Sentiment Analysis, VADER, Hugging Face, Matplotlib, Pandas
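As an illustration of the sender-level and time-of-day breakdowns described above, here is a minimal sketch using only the Python standard library. It is not the deployed app's code. It assumes one particular export layout, `M/D/YY, HH:MM - Sender: message`, whereas real exports vary by locale and platform, and the names `parse_chat` and `summarize` are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed export layout: "1/5/24, 09:07 - Alice: Morning!"
STAMP = r"(\d{1,2}/\d{1,2}/\d{2}), (\d{1,2}:\d{2}) - "
MESSAGE = re.compile("^" + STAMP + r"([^:]+): (.*)$")
SYSTEM = re.compile("^" + STAMP)  # timestamped line without a sender


def parse_chat(text):
    """Return a list of {'time', 'sender', 'text'} dicts from an exported chat."""
    messages = []
    for line in text.splitlines():
        match = MESSAGE.match(line)
        if match:
            date, time, sender, body = match.groups()
            stamp = datetime.strptime(f"{date} {time}", "%m/%d/%y %H:%M")
            messages.append({"time": stamp, "sender": sender, "text": body})
        elif SYSTEM.match(line):
            continue  # skip notices such as "Alice created group"
        elif messages:
            # A line with no timestamp continues the previous message.
            messages[-1]["text"] += "\n" + line
    return messages


def summarize(messages):
    """Count messages per sender and per hour of the day."""
    per_sender = Counter(m["sender"] for m in messages)
    per_hour = Counter(m["time"].hour for m in messages)
    return per_sender, per_hour
```

Calling `summarize(parse_chat(text))` on the contents of an exported file gives two counters: `per_sender.most_common(1)` answers "who talks the most", and `per_hour` shows when the group is most active.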

"Rock Song Lyric Generator" (GitHub, Deployed App)

Generative AI project that generates song lyrics starting from a user-provided first line. The model is a GPT-2 model fine-tuned on a novel dataset of over 50,000 rock-song lyrics. It generates songs for different genres and different song structures. Please refer to the demos page in the deployed app to see some examples.

Keywords:  Python, GenAI, PyTorch, GPT-2, Hugging Face, Transformers, Pandas, rock, lyrics, music, text generation, fine-tuning, scraping

"Exploratory Analysis of Supermarket Sales" (GitHub)

Exploratory data analysis of the supermarket sales database from Kaggle using R and Jupyter Notebooks. I use econometric and statistical analysis along with data visualization to answer three questions: What drives the purchased quantity? What drives the total purchased amount? What drives the rating that customers give based on their experience?

Keywords:  RStudio, Analysis, Visualization, ggplot2, Econometrics, Statistics

Research

"Disclosure and Insider Trading Under 10b5-1 Plans" (solo authored, in progress)

I analyze the strategic disclosure of material non-public information by insiders under the newly amended 10b5-1 plans. Using an analytical model, I study the dynamics between an insider ('he'), who possesses private information and intends to sell shares, and an outsider ('she'), who values the firm's shares based on the information accessible to her. In addition, there is an exogenous probability that the insider obtains further private information, which he can choose to disclose or withhold. I delineate the conditions under which insiders initiate and terminate their plans. The findings reveal that, compared with the pre-amendment scenario, when information asymmetry between insiders and outsiders is sufficiently high, the 10b5-1 amendments result in only insiders with more unfavourable private information setting up plans. When there is no significant information asymmetry, the decision whether to trade relies solely on the information that the insider can disclose. To better represent insider trading under 10b5-1 plans, I also propose a more sophisticated model and outline the future steps for calibrating it with available data.

How much discretion should a biased manager give to an employee when the latter has to incur a cost to acquire information relevant to the optimal action path? We show that the answer depends on the nature of the task. If the cost of effort is exogenous and the employee only decides whether to exert effort to obtain precise information, then discretion increases with the cost. If instead the employee must select a level of effort, where the cost increases with the effort and more effort raises the probability of obtaining precise information, then discretion depends on the level of the bias: compared to the costless benchmark, it is higher for sufficiently low biases and lower for sufficiently high biases. The manager faces a trade-off: increased discretion encourages information acquisition but reduces the alignment of the employee’s action with the manager’s preferred action. The reduced discretion in the high-bias case results from the manager insuring herself against the possibility that the employee remains uninformed. Our results provide guidance to managers on how much discretion to give an employee depending on the task.