National University of Singapore

Department of Industrial Systems Engineering & Management
Department of Economics

B.Eng (ISE) and B.Soc.Sci (Honours) Final Year Integrated Project (2023/2024)

Integrating Textual Analysis into Machine Learning for Asset Return Prediction

Ying Fangfei Anna

Summary

(a) Purpose and Problem

This report investigates how sentiment analysis from platforms like Sina Weibo can improve predictive models for stock index returns in the Chinese and U.S. technology sectors. It contributes by integrating sentiment features with traditional stock-level characteristics and macroeconomic predictors, filling a gap in the existing literature. Additionally, it explores the impact of local and foreign investor sentiments on asset returns, providing insights for global market investors.

(b) Key Findings

Model Specification Sensitivity. The impact of Sina Weibo sentiment on asset return prediction depends on the model specification: regression vs. classification, and linear/logit vs. CatBoost models. Sina Weibo sentiment contributes to model performance, but its effectiveness differs across model types.

Sentiment Distribution Features. Distribution features such as the standard deviation and skewness of sentiment emerged as crucial predictors in asset return prediction models. These features capture the variability and asymmetry of the sentiment data, providing insight into market sentiment dynamics.
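As an illustration of what such distribution features look like, the sketch below computes the mean, standard deviation and skewness of post-level sentiment scores for each day. It is a minimal example under stated assumptions, not the pipeline used in this project: the function name, the (date, score) input format and the use of population moments are all hypothetical choices made for the illustration.

```python
from collections import defaultdict
from math import sqrt


def sentiment_distribution_features(posts):
    """Summarise post-level sentiment per day.

    posts: iterable of (day, score) pairs (hypothetical input format).
    Returns {day: {"mean": ..., "std": ..., "skew": ...}}.
    """
    by_day = defaultdict(list)
    for day, score in posts:
        by_day[day].append(score)

    features = {}
    for day, scores in by_day.items():
        n = len(scores)
        mean = sum(scores) / n
        # Population (biased) second and third central moments.
        m2 = sum((s - mean) ** 2 for s in scores) / n
        m3 = sum((s - mean) ** 3 for s in scores) / n
        std = sqrt(m2)
        # Skewness is undefined when all scores are equal; report 0.0 here.
        skew = m3 / std ** 3 if std > 0 else 0.0
        features[day] = {"mean": mean, "std": std, "skew": skew}
    return features
```

A day with symmetric scores such as -1, 0, 1 yields zero skewness, while a day with scores 0, 0, 3 yields positive skewness because a few strongly positive posts pull the right tail out.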

Market Dependency. Sina Weibo sentiment has a stronger influence on the Chinese market than on the U.S. market. This discrepancy can be attributed to the global nature of the U.S. market, which attracts investors worldwide, whereas Sina Weibo users represent only a fraction of the global investor base.

Cross-Market Dynamics. The Chinese and U.S. technology sectors exhibit different dynamics between asset returns and predictors due to inherent structural and cultural differences. Despite these disparities, our findings suggest some level of transferability in the predictive power of models between the two markets.

Delayed Investor Reactions. Our analysis revealed delayed investor reactions in both the Chinese and U.S. markets, with improved model performance at h = 5 for China and h = 21 for the U.S. in response to Sina Weibo sentiment signals. Understanding these delays is crucial for accurate asset return predictions.
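To make the role of the horizon concrete, the sketch below builds an h-period-ahead return target and pairs it with the feature observed at the start of the window. It assumes that h denotes the number of periods (for example trading days) between the sentiment observation and the end of the return window, and that returns are log returns; both are assumptions made for illustration, and the function names are hypothetical.

```python
from math import log


def forward_log_returns(prices, h):
    """Return r[t] = log(P[t+h] / P[t]) for every t with a price h periods ahead."""
    if h < 1:
        raise ValueError("h must be a positive number of periods")
    return [log(prices[t + h] / prices[t]) for t in range(len(prices) - h)]


def align_features_with_targets(features, prices, h):
    """Pair the feature observed at t with the return realised over (t, t+h].

    The last h observations are dropped because their targets are not yet known.
    """
    targets = forward_log_returns(prices, h)
    return list(zip(features[:len(targets)], targets))
```

Under these assumptions, comparing model performance across horizons amounts to rebuilding the aligned pairs for each candidate h (for instance 1, 5 and 21) and refitting the same model on each set.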

(c) Conclusion

In conclusion, this study emphasizes the role of sentiment analysis in asset return prediction and the need for adaptive models that can accommodate temporal variations in sentiment signals. Sentiment features improve model performance, but their effectiveness is contingent upon model specifications and market environments. These findings offer insight into how market sentiment influences asset returns and set the stage for future research on new methodologies for sentiment-based prediction in financial markets.

(d) Recommendations

Based on the key findings, several recommendations can be proposed to enhance the effectiveness of asset return prediction models. First, given the sensitivity of sentiment-based features to model specifications, it is imperative to conduct thorough sensitivity analyses across different model types, including regression vs. classification and linear/logit vs. tree-based models such as CatBoost. This approach will enable researchers and practitioners to identify the most suitable model specifications for capturing the nuances of sentiment data in various market contexts.

Furthermore, incorporating sentiment distribution features, such as the standard deviation and skewness of sentiment, into asset return prediction models should be prioritized. These features capture the variability and asymmetry of sentiment data, thereby enhancing the predictive power of the models.

Additionally, considering the market dependency of sentiment signals, particularly the stronger influence of Sina Weibo sentiment on the Chinese market than on the U.S. market, future research should explore tailored approaches to sentiment analysis for different market environments.

Finally, given the observed delayed investor reactions in both markets, efforts should be made to develop models that account for these delays, thereby improving the accuracy of asset return predictions.