My Research

My papers and ongoing projects are listed below.

Approximate Bayesian Inference for the Interaction Types 1, 2, 3 & 4 with Application in Disease Mapping

Abstract: In this paper we present a new approach for fitting spatiotemporal models in disease mapping using the interaction types 1, 2, 3, and 4. Accounting for spatiotemporal interactions in disease-mapping models makes inference more useful for revealing unknown patterns in the data. However, when the number of locations and/or the number of time points is large, inference becomes computationally challenging due to the large number of constraints required, and this holds for various inference architectures, including Markov chain Monte Carlo (MCMC) and Integrated Nested Laplace Approximations (INLA). We reformulate the INLA approach based on dense matrices to fit the intrinsic spatiotemporal models with the four interaction types while accounting for the sum-to-zero constraints, and we discuss how the new approach can be implemented in a high-performance computing framework. The computing time of the new approach does not depend on the number of constraints, and it can be up to 40 times faster than INLA in realistic scenarios. The approach is verified by a simulation study and a real data application, and it is implemented in the R package INLAPLUS and the Python header function inla1234(). (Paper)
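
For context, the four interaction types of Knorr-Held (2000) can be written via Kronecker products of the temporal and spatial structure matrices, and the rank deficiency of each product determines how many sum-to-zero constraints the interaction needs; a standard formulation, sketched in LaTeX:

```latex
% Structure matrix R_delta of the interaction term for each type, where
% R_T is the temporal (random walk) and R_S the spatial (Besag) structure
% matrix, and I_T, I_S are identities. More structure means more
% sum-to-zero constraints: e.g., type IV with an RW1 in time and a
% connected map needs S + T - 1 constraints.
\[
\begin{aligned}
\text{Type I:}   &\quad R_\delta = I_T \otimes I_S, \\
\text{Type II:}  &\quad R_\delta = R_T \otimes I_S, \\
\text{Type III:} &\quad R_\delta = I_T \otimes R_S, \\
\text{Type IV:}  &\quad R_\delta = R_T \otimes R_S.
\end{aligned}
\]
```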

Smart Gradient - An adaptive technique for improving gradient estimation

Abstract: Computing the gradient of a function provides fundamental information about its behavior. This information is essential for several applications and algorithms across various fields. One common application that requires gradients is optimization techniques such as stochastic gradient descent, Newton's method, and trust region methods. However, these methods usually require a numerical computation of the gradient at every iteration, which is prone to numerical errors. We propose a simple limited-memory technique for improving the accuracy of a numerically computed gradient in this gradient-based optimization framework by exploiting (1) a coordinate transformation of the gradient and (2) the history of previously taken descent directions. The method is verified empirically by extensive experimentation on both test functions and real data applications. The proposed method is implemented in the R package smartGrad and in C++. (Paper)
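
A minimal numerical sketch of the idea, assuming the orthonormal basis is built from the direction history via QR and completed with canonical axes (the paper's actual implementation may differ):

```python
import numpy as np

def smart_gradient(f, x, history, h=1e-4):
    """Estimate grad f(x) by central differences taken along an orthonormal
    basis aligned with previous descent directions, then mapped back."""
    n = x.size
    # Pad the direction history with canonical axes so the basis is complete,
    # then orthonormalize; QR plays the role of Gram-Schmidt here.
    Q, _ = np.linalg.qr(np.column_stack([history, np.eye(n)]))
    G = Q[:, :n]
    # (f(x + h g_i) - f(x - h g_i)) / (2h) approximates g_i^T grad f(x).
    g_y = np.array([(f(x + h * G[:, i]) - f(x - h * G[:, i])) / (2 * h)
                    for i in range(n)])
    # G is orthonormal, so grad f(x) = G G^T grad f(x) is recovered as G @ g_y.
    return G @ g_y

# Toy check on a quadratic, whose exact gradient at x is x itself.
f = lambda x: 0.5 * x @ x
x = np.array([1.0, -2.0, 3.0])
history = np.array([[1.0, 0.1], [0.0, 1.0], [0.2, 0.0]])  # two past directions
print(smart_gradient(f, x, history))  # approximately [1, -2, 3]
```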

Complex interactions: space × time × age

Description: As an extension of the interaction types 1, 2, 3 & 4 proposed by Knorr-Held (2000), and using Integrated Nested Laplace Approximations for dense matrices, we are fitting a fertility model for Malawi with a more complex interaction: space × time × age. This project is a collaboration with Prof. Jon Wakefield and Yanhau Wu, University of Washington. Start date: Nov 22, 2022; in progress.
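
A natural way to write the fully structured three-way analogue of type IV is a triple Kronecker product; this is an illustrative sketch only, not necessarily the parameterization used in the project:

```latex
% Fully structured space x time x age interaction, by analogy with type IV:
% the structure matrix is the Kronecker product of the three marginal
% structure matrices, and its rank deficiency again dictates the number
% of identifiability (sum-to-zero) constraints.
\[
R_\delta = R_{\text{space}} \otimes R_{\text{time}} \otimes R_{\text{age}}
\]
```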

A smoothing adaptive Bayesian spatial model for disease mapping with mixture of neighbors

Abstract: The focus of this paper is to extend the Besag model into a smoothing-adaptive model by assigning different precision parameters to a predetermined number of partitions of the region. The extra flexibility in the precision parameters of the smoothing-adaptive Besag model may provide a better fit to the data than the stationary Besag model. We therefore extend the prior for the stationary model to a prior for a non-stationary model, with the goal of smoothing the spatial effect based on the adjacency structure of the model. (This project is in progress)
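
To make the extension concrete: the intrinsic Besag prior penalizes differences between neighboring regions with one global precision, while the adaptive version lets the precision vary across the predetermined partitions. The following is a sketch; the exact parameterization, e.g. how edge precisions combine across partition boundaries, may differ in the paper:

```latex
% Stationary Besag: a single precision tau for all neighbor pairs i ~ j.
\[
\pi(x \mid \tau) \;\propto\; \tau^{(n-1)/2}
  \exp\!\Big(-\frac{\tau}{2} \sum_{i \sim j} (x_i - x_j)^2\Big)
\]
% Adaptive sketch: the edge precision tau_{ij} depends on the partitions
% k(i), k(j) of the two neighbors, e.g. via the geometric mean of the
% partition-level precisions.
\[
\pi(x \mid \tau_1, \dots, \tau_K) \;\propto\;
  \exp\!\Big(-\frac{1}{2} \sum_{i \sim j} \tau_{ij} (x_i - x_j)^2\Big),
\qquad \tau_{ij} = \sqrt{\tau_{k(i)}\,\tau_{k(j)}}
\]
```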

Leave-group-out cross-validation for INLA+

As a joint project between INLAPLUS and LGOCV, we are adding another model selection criterion for fitting models with complex interaction types. Here is the abstract of the LGOCV paper:

Abstract: Evaluating predictive performance is essential after fitting a model, and leave-one-out cross-validation is a standard method. However, it is often not informative for a structured model with many possible prediction tasks. As a solution, leave-group-out cross-validation is an extension where the left-out groups adapt to different prediction tasks. In this paper, we propose an automatic group construction procedure for leave-group-out cross-validation to estimate the predictive performance when the prediction task is not specified. We also propose an efficient approximation of leave-group-out cross-validation for latent Gaussian models. We implement both procedures in the R-INLA software.
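
Conceptually, leave-group-out CV refits the model with a whole group of observations removed and scores the prediction at the point of interest. The sketch below spells this out with a naive refit; R-INLA instead computes an efficient approximation, and the group construction here (nearby time points) is just one example of a prediction task:

```python
import numpy as np

def lgocv(fit, score, X, y, groups):
    """Naive leave-group-out cross-validation: for each point i, refit the
    model without the whole group around i, then score the prediction at i."""
    scores = []
    for i, left_out in enumerate(groups):
        keep = np.setdiff1d(np.arange(len(y)), left_out)
        model = fit(X[keep], y[keep])            # refit without the group
        scores.append(score(model, X[i], y[i]))  # predictive score at i
    return float(np.mean(scores))

# Toy task: judge how well a linear trend forecasts past its neighbors,
# so each group is the window of nearby time points around i.
rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=50))
X = np.arange(50.0).reshape(-1, 1)
groups = [np.arange(max(0, i - 2), min(50, i + 3)) for i in range(50)]

fit = lambda Xk, yk: np.polyfit(Xk[:, 0], yk, deg=1)
score = lambda m, xi, yi: (np.polyval(m, xi[0]) - yi) ** 2
print(lgocv(fit, score, X, y, groups))  # mean squared prediction error
```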

The Importance of Accounting for Parameter Uncertainty in SF-6D Value Sets and Its Impact on Studies that Use the SF-6D to Measure Health Utility

Abstract: The parameter uncertainty in the six-dimensional health state short form (SF-6D) value sets is commonly ignored. There are two sources of parameter uncertainty: uncertainty around the estimated regression coefficients and uncertainty around the model’s specification. This study explores these two sources of parameter uncertainty in the value sets using probabilistic sensitivity analysis (PSA) and a Bayesian approach. Methods: We used data from the original UK SF-6D valuation study to evaluate the extent of parameter uncertainty in the value set. First, we re-estimated the Brazier model to replicate the published estimated coefficients. Second, we estimated standard errors around the predicted utility of each SF-6D state to assess the impact of parameter uncertainty on these estimated utilities. Third, we used a Monte Carlo simulation technique to account for the uncertainty around these estimates. Finally, we used a Bayesian approach to quantify parameter uncertainty in the value sets. The extent of parameter uncertainty in SF-6D value sets was also assessed using data from the Hong Kong valuation study. Results: Including parameter uncertainty results in wider confidence/credible intervals and improved coverage probability under both approaches. Using PSA, the mean widths of the 95% confidence intervals for the mean utilities were 0.1394 (range: 0.0565–0.2239) and 0.0989 (0.0048–0.1252) with and without parameter uncertainty, respectively; using the Bayesian approach, the mean width was 0.1478 (0.053–0.1665). Upon evaluating the impact of parameter uncertainty on estimates of a population’s mean utility, the true standard error was underestimated by 79.1% (PSA) and 86.15% (Bayesian) when parameter uncertainty was ignored. Conclusions: Parameter uncertainty around the SF-6D value set has a large impact on the predicted utilities and estimated confidence intervals. This uncertainty should be accounted for when using SF-6D utilities in economic evaluations; ignoring it could mislead policy decisions. (Paper)
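
The PSA step can be sketched as follows: draw regression coefficients from their estimated sampling distribution and propagate each draw to a predicted utility. Everything numeric below is a placeholder (invented coefficients, covariance, and state coding), not the Brazier model's actual estimates:

```python
import numpy as np

# Probabilistic sensitivity analysis around a value-set regression
# (schematic: coefficient values, covariance, and state coding are invented).
rng = np.random.default_rng(1)

beta_hat = np.array([-0.05, -0.08, -0.10, -0.06])  # hypothetical decrements
cov_hat = 0.0001 * np.eye(4)                       # hypothetical covariance

x = np.array([1, 0, 1, 1])  # indicator coding of one hypothetical health state

# Draw coefficient vectors from their (asymptotic) sampling distribution
# and propagate each draw to a predicted utility u = 1 + x' beta.
draws = rng.multivariate_normal(beta_hat, cov_hat, size=10_000)
utilities = 1.0 + draws @ x

lo, hi = np.percentile(utilities, [2.5, 97.5])
print(f"mean utility {utilities.mean():.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```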

Separately, as part of the Data Mining course at AUB, I worked on a classification project in Python using five supervised machine learning (ML) algorithms: k-nearest neighbors (KNN), AdaBoost, Random Forest (RF), Decision Tree (DT), and support vector machine (SVM). This project is not published.
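
The five classifiers map directly onto scikit-learn estimators; a self-contained sketch on synthetic data (the course dataset is not public, so make_classification stands in for it):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for the course dataset.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

models = {
    "KNN": KNeighborsClassifier(),
    "AdaBoost": AdaBoostClassifier(),
    "RF": RandomForestClassifier(),
    "DT": DecisionTreeClassifier(),
    "SVM": SVC(),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5).mean()  # 5-fold CV accuracy
    print(f"{name}: {acc:.3f}")
```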

Towards Adaptive Display: Prediction of Website Key Objects Recognition

Abstract: An appropriate design of a website greatly affects its success in terms of content and commercial use. Content-wise, a website’s accessibility and visibility play an important role in the dissemination of information. Commercially, having a prediction or estimation of users’ preferences contributes to personalizing the purchasing experience and therefore maximizing profit. Accordingly, managers and web developers continuously work on enhancing their websites to attract users and increase their conversion rate (converting a website visit into an actual purchase). The purpose of this work is to examine the effects of website clutter on users’ content comprehension. Clutter can be defined as the presence of a large amount of task-irrelevant data that leads to slower and less accurate task performance.

Using eye-tracking technology, we explore users’ behavior in identifying the content of websites. Our model merges features extracted from web tracking tools, the characteristics of website key objects, and the degree of complexity of the website. A classification model is then built to predict whether a user can recognize a given key object on the web page.
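
Schematically, the three feature sources are concatenated into one design matrix and fed to a binary classifier; in this sketch all feature names and the label rule are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
n = 200
fixation_time = rng.gamma(2.0, 1.0, n)    # eye-tracking feature (invented)
object_area = rng.uniform(0.01, 0.3, n)   # key-object characteristic (invented)
clutter_score = rng.uniform(0.0, 1.0, n)  # website complexity feature (invented)

# Merge the three feature sources into one design matrix.
X = np.column_stack([fixation_time, object_area, clutter_score])
# Toy label: recognition is less likely on cluttered pages.
y = (fixation_time * (1 - clutter_score) + rng.normal(0, 0.3, n) > 1.0).astype(int)

# Binary classifier: does the user recognize the key object or not?
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X, y)
print("train accuracy:", clf.score(X, y))
```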