6 MMAC thesis projects in Statistical Theory and Applied Statistics
(26/Sept/2025)
Statistical Theory
1. Establishing Convergence Rates for Maximum Likelihood Estimators in Non-Parametric Observation-Driven Filters in High-Entropy Spaces
Description: This project investigates the asymptotic properties of maximum likelihood estimators (MLEs) in nonparametric score-driven and observation-driven frameworks. You will study convergence rates in high-dimensional spaces whose entropy diverges to infinity, a setting in which traditional parametric assumptions break down. The focus is on linking modern sieve estimation theory with statistical efficiency in observation-driven models.
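For orientation, the parametric starting point of a score-driven filter can be sketched as follows. This is a minimal illustration only, assuming a Gaussian observation density with time-varying variance and inverse-information scaling of the score, in which case the update reduces to a GARCH(1,1)-type recursion. The function name and parameter values are hypothetical and are not part of the project specification; the nonparametric setting of the thesis would replace the fixed density with a sieve approximation.

```python
import numpy as np

def gas_variance_filter(y, omega, alpha, beta):
    """Gaussian score-driven variance filter (illustrative sketch).

    Update: f[t+1] = omega + alpha * s[t] + beta * f[t],
    where s[t] = y[t]**2 - f[t] is the score of the Gaussian
    log-density with respect to the variance, scaled by the
    inverse Fisher information.
    Positivity of f holds if omega > 0, alpha >= 0 and beta >= alpha.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    f = np.empty(n)
    # Initialise at the unconditional mean of f (requires |beta| < 1).
    f[0] = omega / (1.0 - beta)
    loglik = 0.0
    for t in range(n):
        # Gaussian log-density of y[t] given the filtered variance f[t].
        loglik += -0.5 * (np.log(2.0 * np.pi * f[t]) + y[t] ** 2 / f[t])
        if t + 1 < n:
            scaled_score = y[t] ** 2 - f[t]
            f[t + 1] = omega + alpha * scaled_score + beta * f[t]
    return f, loglik

# Hypothetical usage on simulated white noise.
rng = np.random.default_rng(1)
y = rng.normal(size=300)
f, loglik = gas_variance_filter(y, omega=0.05, alpha=0.1, beta=0.9)
```

Maximising `loglik` over `(omega, alpha, beta)` gives the parametric MLE whose nonparametric analogue is the object of study.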
References:
Chen, X. (2007). "Large Sample Sieve Estimation of Semi-Nonparametric Models." *Handbook of Econometrics, Vol. 6B*, 5549–5632.
van der Vaart, A. W. (1998). *Asymptotic Statistics*. Cambridge University Press.
Creal, D., Koopman, S. J., & Lucas, A. (2013). "Generalized Autoregressive Score Models with Applications." *Journal of Applied Econometrics*, 28(5), 777–795.
Harvey, A. C. (2013). *Dynamic Models for Volatility and Heavy Tails: With Applications to Financial and Economic Time Series*. Cambridge University Press.
Supervisor: Francisco Blasques
Co-supervisor: Cláudia Nunes
2. Consistency and Asymptotic Normality for Autoencoder Neural Networks Featuring Time-Varying Decoders
Description: This thesis explores the theoretical properties of autoencoders when the decoder parameters evolve dynamically over time. The aim is to establish conditions for consistency and asymptotic distributions of estimators in this dynamic statistical-learning framework. By bridging autoencoder architectures with observation-driven time-series models, this project opens a path for rigorous inference in data-rich, dynamic environments.
References:
Hinton, G. E., & Salakhutdinov, R. R. (2006). "Reducing the Dimensionality of Data with Neural Networks." *Science*, 313(5786), 504–507.
Baldi, P. (2012). "Autoencoders, Unsupervised Learning, and Deep Architectures." *ICML Workshop on Unsupervised and Transfer Learning*, 37–50.
Creal, D., Koopman, S. J., & Lucas, A. (2011). "A Dynamic Multivariate Heavy-Tailed Model for Time-Varying Volatilities and Correlations." *Journal of Business & Economic Statistics*, 29(4), 552–563.
Blasques, F., Koopman, S. J., & Lucas, A. (2015). "Information-Theoretic Optimality of Observation-Driven Time Series Models for Continuous Responses." *Biometrika*, 102(2), 325–343.
Supervisor: Francisco Blasques
Co-supervisor: Cláudia Nunes
3. Stochastic Properties of Higher-Order Max-INAR Processes
Description: Integer-valued autoregressive processes with maximum operations (Max-INAR) are valuable for modeling count data with extreme behavior. This thesis extends the theory to higher-order Max-INAR models, focusing on stochastic properties such as stationarity, ergodicity, dependence structure, and higher-order moments. The project also connects to observation-driven frameworks, providing fertile ground for both probabilistic analysis and practical applications.
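To make the object of study concrete, the sketch below simulates one possible higher-order Max-INAR recursion. It assumes the first-order form X_t = max(α ∘ X_{t-1}, ε_t), with ∘ denoting binomial thinning, and extends it to order p by taking the maximum over p thinned lags. That extension, the Poisson innovations, and all names and parameter values are assumptions made for illustration, not the definition the thesis will necessarily adopt.

```python
import numpy as np

def simulate_max_inar(alphas, n, innovation_mean=2.0, burn_in=200, seed=0):
    """Simulate X_t = max(a_1 o X_{t-1}, ..., a_p o X_{t-p}, eps_t).

    'o' is binomial thinning: a o X ~ Binomial(X, a).
    Innovations eps_t are Poisson (an illustrative assumption).
    Returns the last n values of the path and of the innovations.
    """
    rng = np.random.default_rng(seed)
    p = len(alphas)
    total = n + burn_in + p
    eps = rng.poisson(innovation_mean, size=total)
    x = np.zeros(total, dtype=np.int64)
    x[:p] = eps[:p]  # start the recursion from the first p innovations
    for t in range(p, total):
        # Thin each of the p previous counts independently.
        thinned = [rng.binomial(x[t - k], a)
                   for k, a in enumerate(alphas, start=1)]
        x[t] = max(max(thinned), eps[t])
    return x[-n:], eps[-n:]

# Hypothetical second-order example.
x, eps = simulate_max_inar(alphas=[0.5, 0.3], n=500)
```

Simulated paths like `x` can be used to check conjectured stationarity or moment results numerically before attempting proofs.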
References:
Scotto, M. G., Weiß, C. H., Möller, T. A., et al. (2018). "The max-INAR(1) Model for Count Processes." *TEST*, 27, 850–870.
Scotto, M. G., & Gouveia, S. (2021). "On the Extremes of the max-INAR(1) Process for Time Series of Counts." *Communications in Statistics - Theory and Methods*, 52(4), 1136–1154.
Gorgi, P. (2018). "Time-Varying INAR Models with Applications." *Journal of Time Series Analysis*, 39(4), 499–514.
Blasques, F., Koopman, S. J., & Lucas, A. (2014). "Stationarity and Ergodicity of Univariate Generalized Autoregressive Score Processes." *Electronic Journal of Statistics*, 8(1), 1088–1112.
Supervisor: Francisco Blasques
Co-supervisor: Cláudia Nunes
_____________________________
Applied Statistics
4. Design and Deploy Reinforcement Learning Algorithms for a Mental-Health Chatbot
Description: You will design and implement reinforcement learning algorithms for a chatbot created in collaboration with psychology experts. The chatbot aims to help users struggling with mental health and psychological challenges. You will work closely with data scientists and software engineers from the new startup company Miggo, as well as researchers from CEMAT IST, Faculdade de Psicologia das Universidades de Lisboa e Porto, and Departamento de Ciências da Comunicação da Lusófona. The project combines statistical learning, reinforcement learning, and generative AI to create a socially impactful digital health solution.
References:
Sutton, R. S., & Barto, A. G. (2018). *Reinforcement Learning: An Introduction*. MIT Press.
Kaelbling, L. P., Littman, M. L., & Moore, A. W. (1996). "Reinforcement Learning: A Survey." *Journal of Artificial Intelligence Research*, 4, 237–285.
Brown, T. et al. (2020). "Language Models are Few-Shot Learners." *NeurIPS*.
Lewis, P., Perez, E., Piktus, A., et al. (2020). "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." *NeurIPS*.
Supervisor: Francisco Blasques
Co-supervisor: Cláudia Nunes
5. Dynamic Models for Commodity Forecasts with Textual Sentiment Integration
Description: This thesis develops dynamic econometric models to forecast commodity prices while integrating textual sentiment scores extracted from news. You will implement Monte Carlo–based scenario analysis to assess uncertainty and stress-test forecasts. You will collaborate with software engineers and data scientists from the startup company **Forecast Factor**, and researchers from the **Vrije Universiteit Amsterdam** and **Aarhus University**. The project combines classical econometrics with modern NLP techniques for forecasting and decision support.
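As a rough illustration of Monte Carlo scenario analysis with a sentiment input, the sketch below simulates price paths from an AR(1) model for log returns with an exogenous sentiment regressor, then compares a baseline scenario with a stressed one. The model form, the function name, and every parameter value are hypothetical placeholders; the models developed in the thesis may differ substantially.

```python
import numpy as np

def simulate_price_paths(last_price, last_return, sentiment_path,
                         phi=0.2, beta=0.01, sigma=0.02,
                         n_paths=5000, seed=0):
    """Monte Carlo price paths under r_t = phi*r_{t-1} + beta*s_t + e_t.

    r_t is the log return, s_t a given sentiment scenario and
    e_t ~ N(0, sigma^2). Returns an array of shape (n_paths, horizon).
    """
    rng = np.random.default_rng(seed)
    horizon = len(sentiment_path)
    returns = np.empty((n_paths, horizon))
    prev = np.full(n_paths, float(last_return))
    for h in range(horizon):
        shock = rng.normal(0.0, sigma, size=n_paths)
        prev = phi * prev + beta * sentiment_path[h] + shock
        returns[:, h] = prev
    # Cumulate log returns and map back to price levels.
    return last_price * np.exp(np.cumsum(returns, axis=1))

# Baseline: neutral sentiment. Stress test: persistently negative sentiment.
baseline = simulate_price_paths(100.0, 0.0, np.zeros(20))
stressed = simulate_price_paths(100.0, 0.0, np.full(20, -1.0))

# 5%, 50% and 95% quantiles of the terminal price in each scenario.
baseline_q = np.percentile(baseline[:, -1], [5, 50, 95])
stressed_q = np.percentile(stressed[:, -1], [5, 50, 95])
```

Using the same seed for both scenarios applies identical shocks to each, so differences between `baseline_q` and `stressed_q` reflect the sentiment scenario alone rather than simulation noise.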
References:
Box, G. E. P., Jenkins, G. M., & Reinsel, G. C. (2015). *Time Series Analysis: Forecasting and Control*. Wiley.
Pindyck, R. S., & Rotemberg, J. J. (1990). "The Excess Co-Movement of Commodity Prices." *The Economic Journal*, 100(403), 1173–1189.
Loughran, T., & McDonald, B. (2011). "When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10‐Ks." *Journal of Finance*, 66(1), 35–65.
Bollen, J., Mao, H., & Zeng, X. (2011). "Twitter Mood Predicts the Stock Market." *Journal of Computational Science*, 2(1), 1–8.
Supervisor: Francisco Blasques
Co-supervisor: Cláudia Nunes
6. AI-Driven Predictive and Prescriptive Models for Agricultural Irrigation and Feeding
Description: This thesis focuses on developing predictive and prescriptive AI models to optimize plant irrigation and feeding. The aim is to replace traditional agricultural practices with a rational, sustainable, and healthy AI-driven optimization of the use of soil, water, fertilizers, pesticides, and other inputs. You will work with software engineers and data scientists from **Orchid Potential** and researchers at **CEMAT**, applying state-of-the-art machine learning and spatial modeling techniques to agriculture.
References:
LeCun, Y., Bengio, Y., & Hinton, G. (2015). "Deep Learning." *Nature*, 521(7553), 436–444.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). "ImageNet Classification with Deep Convolutional Neural Networks." *NeurIPS*.
Lobell, D. B., Cassman, K. G., & Field, C. B. (2009). "Crop Yield Gaps: Their Importance, Magnitudes, and Causes." *Annual Review of Environment and Resources*, 34, 179–204.
Pantazi, X. E., Moshou, D., & Bravo, C. (2016). "Active Learning System for Weed Species Recognition Based on Hyperspectral Sensing." *Biosystems Engineering*, 146, 193–202.
Supervisor: Francisco Blasques
Co-supervisor: Cláudia Nunes