Selected Projects

"Linear Regression in C" (in progress, GitHub repository)

Implemented a module that performs linear regression from scratch in C, along with a companion linear algebra module that handles all the necessary matrix operations. The program takes a dataset from the user, who can specify the learning rate and the number of iterations. Work still in progress.
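As a rough illustration of what the learning rate and the number of iterations control, the sketch below fits a one-variable linear model by batch gradient descent. It is written in Python for brevity rather than in the project's C, it assumes a mean-squared-error loss, and the function name and default values are hypothetical rather than taken from the repository.

```python
def fit_linear_regression(xs, ys, learning_rate=0.01, iterations=1000):
    """Fit y ~ w * x + b by batch gradient descent on mean squared error.

    Hypothetical helper for illustration only; not the repository's API.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(iterations):
        # Residuals of the current fit on every sample
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient, scaled by the learning rate
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


if __name__ == "__main__":
    # Points on the line y = 2x + 1
    slope, intercept = fit_linear_regression(
        [0, 1, 2, 3, 4], [1, 3, 5, 7, 9], learning_rate=0.05, iterations=5000
    )
    print(slope, intercept)
```

A learning rate that is too large makes the updates diverge, while one that is too small needs many more iterations to converge, which is why both are left to the user.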

Keywords: C, Linear Regression, Linear Algebra, Machine Learning

"Integrating Domain Knowledge for Financial QA: A Multi-Retriever RAG Approach with LLMs" (with Yukun Zhang and Samyak Jain, report, poster)

We aim to reduce the errors in financial numerical reasoning QA tasks that arise from a lack of domain knowledge in finance. We build a RAG-like multi-retriever system that retrieves both external domain knowledge and internal question contexts as inputs for our generator. Despite recent advances in LLMs, financial numerical questions are challenging because they require specific domain knowledge in finance as well as complex multi-step numeric reasoning. Our model outperforms the non-expert human crowd, yet it is still not at the expert human level. Our best neural-symbolic generator model outperforms the FinQA baseline on both execution and program accuracy.

Keywords: Python, PyTorch, NLP, RAG, BERT, DPR, FAISS, FinRAD

"Solar Panel Detection on Satellite Images" (with Camila Nicollier, poster)

We train and implement a model that automates solar panel detection from satellite images. We deploy the then-new state-of-the-art YOLOv10 architecture only a few days after its release, with satisfactory results. Automating solar panel identification is a relevant task in the context of renewable energy, where the need to keep track of these installations has grown rapidly and solar developers have few or no tools to quickly identify existing projects in a specific area.

Keywords: Python, PyTorch, Computer Vision, Object Detection, YOLO, YOLOv10, YOLOv9, CNN, Fast R-CNN

"Rock Song Lyric Generator" (GitHub, Deployed App)

Generative AI project that generates song lyrics starting from a user-provided first line. The model is based on GPT-2, fine-tuned on a novel dataset of over 50,000 rock-song lyrics. The model generates songs for different genres and different song structures. Please refer to the demos page in the deployed app to see some examples.

Keywords: Python, GenAI, PyTorch, GPT-2, Hugging Face, Transformers, Pandas, rock, lyrics, music, text generation, fine-tuning, scraping

Coding Projects

"Sentiment Analysis of Stocktwits and Stock Returns Prediction" (with Sang Ahn, report, poster)

Does the sentiment of public opinion influence the returns of micro-cap stocks? Prior literature documents a strong correlation between stock price changes and public sentiment for large-cap stocks. However, it is unclear whether the same holds for micro-cap companies. We investigate that question by first predicting the sentiment of over 80,000 tweets associated with micro-cap stocks from the Stocktwits website. We then study whether those sentiments help predict the trends of monthly returns. We find that, in some cases, investors' sentiments move in opposition to the actual returns.

Keywords: Python, PyTorch, BERT, RoBERTa, Sentiment Analysis, VADER, Hugging Face, Matplotlib, Pandas

"Textual Analysis of WhatsApp Conversations" (GitHub, Deployed App)

As an avid WhatsApp user, I have been intrigued by common questions such as: who talks the most in my friends' group chat? At what time are people most active? And who is the most positive person in the group? To answer those questions, and to help you answer them for your own friend group, I created and deployed an app that takes a text file of your group chat and provides a detailed breakdown of the conversation trends in the group, including: (i) a sender-level analysis of how messaging habits evolve within the group; (ii) word clouds of the most frequently used words, for the chat as a whole and per sender; (iii) a sentiment analysis of the chat; and (iv) the times at which messages are sent.
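To give a flavor of the sender-level and time-of-day breakdowns, here is a minimal sketch that counts messages per sender and per hour. It assumes each exported line looks like "12/31/21, 21:15 - Alice: Happy new year!"; real exports vary by locale and device, and the regular expression and function name below are hypothetical, not the deployed app's code.

```python
import re
from collections import Counter

# Assumed export line format (an assumption; it varies by locale and device):
#   "12/31/21, 21:15 - Alice: Happy new year!"
LINE_RE = re.compile(
    r"^(\d{1,2}/\d{1,2}/\d{2,4}), (\d{1,2}):(\d{2}) - ([^:]+): (.*)$"
)


def summarize_chat(lines):
    """Return (messages per sender, messages per hour of day) as Counters."""
    per_sender = Counter()
    per_hour = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            # System notices and continuation lines of multi-line
            # messages do not match the pattern and are skipped.
            continue
        _date, hour, _minute, sender, _text = match.groups()
        per_sender[sender] += 1
        per_hour[int(hour)] += 1
    return per_sender, per_hour


if __name__ == "__main__":
    sample = [
        "12/31/21, 21:15 - Alice: Happy new year!",
        "12/31/21, 21:16 - Bob: You too",
        "1/1/22, 09:02 - Alice: Brunch?",
    ]
    senders, hours = summarize_chat(sample)
    print(senders.most_common(), sorted(hours.items()))
```

Keeping the counts in `Counter` objects makes it easy to rank senders with `most_common()` or to pass the hourly totals to a plotting library.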

Keywords: Python, NLTK, Sentiment Analysis, VADER, Hugging Face, Matplotlib, Pandas

"Exploratory Analysis of Supermarket Sales" (GitHub)

Exploratory data analysis of the supermarket sales database from Kaggle using R and Jupyter Notebooks. I use econometric and statistical analysis along with data visualization to answer three questions: what drives the purchased quantity? What drives the total purchased amount? And what drives the rating that customers give based on their experience?

Keywords: RStudio, Analysis, Visualization, ggplot2, Econometrics, Statistics

Research

"Disclosure and Insider Trading Under 10b5-1 Plans" (solo authored, in progress)

I analyze the strategic timing of disclosure of material non-public information by insiders who trade under the newly amended 10b5-1 plans. By developing an analytical model, I study the dynamics between an insider ('he'), who possesses private information and intends to sell shares of his own firm, and the market ('she'), who values the firm's shares based on the publicly available information. In contrast to prior models studying 10b5-1 plans, I consider the possibility that insiders can strategically time their disclosures in order to trade based on material non-public information. I capture this idea by introducing into my model an exogenous probability that the insider observes additional information that he can disclose. I delineate the conditions under which an insider can initiate and terminate plans and disclose his additional private information. I find that the amendments increased price efficiency and the probability of disclosure, although they still allow the insider to profit from material non-public information. I also show preliminary descriptive statistics and comment on future work to test my model with data.

"Discretion to Encourage Information Acquisition" (with Nicolas Riquelme, submitted)

How much discretion should a biased manager give to an employee when the latter has to incur a cost to acquire information relevant to the optimal action path? We show that the answer depends on the nature of the task. If the task is one where the cost of effort is exogenous and the employee has to decide whether to exert effort to obtain precise information, then discretion increases with the cost. If instead the employee must select a level of effort, where the cost of effort increases with its level and more effort raises the probability of obtaining precise information, then discretion depends on the level of the bias: it is higher (compared to the costless benchmark) for sufficiently low biases and lower for sufficiently high biases. The manager faces a trade-off: increased discretion encourages information acquisition, but it reduces the alignment of the employee's action with the manager's preferred action. The reduced discretion in the high-bias case is a result of the manager insuring herself against the possibility that the employee remains uninformed. Our results provide guidance to managers on how much discretion to give an employee depending on the task.