Research & Projects

Project 1: Machine Learning

Error Analysis for Generative Adversarial Networks (GANs)

A generative adversarial network (GAN) is a machine learning (ML) model introduced by Ian Goodfellow. A GAN can, for example, create a realistic image of a person who does not exist. A GAN has two main parts, the Generator and the Discriminator, both built as neural networks. The Generator creates fake data, such as images or audio, and tries to fool the Discriminator; the Discriminator's job is to tell real data from fake. During training the two networks compete and improve with each round of updates. In our research on GAN applications, we noticed that many researchers face a common problem: getting the model to perform well and converge effectively. To tackle this, we conducted an in-depth error analysis of GAN performance.
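For reference, the standard minimax objective from Goodfellow et al., against which this kind of error analysis is measured, is

\min_G \max_D V(G, D) = \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))],

where the generator G maps noise z to samples and the discriminator D outputs the probability that its input is real.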

Objectives

We started by measuring a GAN's performance through its distance from the best possible value of its objective function. Our goal then became to develop general methods for measuring GAN errors and to analyze how quickly the estimator converges, with the aim of obtaining sharp convergence rates.
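One standard way to make the comparison with the best possible outcome precise uses the classical result that, for the optimal discriminator D^{*}_G, the objective reduces to a Jensen-Shannon divergence,

V(G, D^{*}_G) = -\log 4 + 2\,\mathrm{JSD}(p_{\mathrm{data}} \,\|\, p_g),

so the global optimum -\log 4 is attained exactly when the generated distribution p_g equals p_{\mathrm{data}}, and the excess over -\log 4 quantifies the generator's error. (This is the textbook decomposition; the precise error measures used in the paper may differ.)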

Results

This project was challenging because little prior literature was available and the analysis required advanced tools from mathematics and statistics. With strong support from my advisor and sustained effort, we succeeded. Our results show significant improvements in convergence rates, which can directly benefit GAN training and performance, and our approach extends to the theoretical analysis of other machine learning algorithms.


Link: https://arxiv.org/pdf/2310.15387.pdf

Project 2: Machine Learning

Convergence rates of f-divergence metrics for the GAN estimator

This project builds on the previous one, which focused on f-divergence metrics for the GAN estimator. In this extension, we used the minimization of the GAN objective function to study convergence rates under different f-divergences.
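For context, an f-divergence between distributions P and Q with densities p and q is defined, for a convex function f with f(1) = 0, as

D_f(P \,\|\, Q) = \int q(x)\, f\!\left(\frac{p(x)}{q(x)}\right) dx,

and different choices of f recover the familiar metrics studied below.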

Objectives

In this work, we established convergence rates for several f-divergence metrics for the GAN estimator, including total variation, Kullback-Leibler divergence, Pearson chi-square divergence, squared Hellinger divergence, and Jensen-Shannon divergence. We then derived the convergence rates by incorporating each of these metrics into the same neural network structure.
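Each of these metrics corresponds to a particular generator function f (up to normalizing conventions):

Total variation: f(t) = \tfrac{1}{2}\lvert t - 1 \rvert
Kullback-Leibler: f(t) = t \log t
Pearson chi-square: f(t) = (t - 1)^2
Squared Hellinger: f(t) = (\sqrt{t} - 1)^2
Jensen-Shannon: f(t) = \tfrac{1}{2}\big[\, t \log \tfrac{2t}{t+1} + \log \tfrac{2}{t+1} \,\big]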

Results

We achieved robust results through the application of rigorous mathematical and statistical methods. Specifically, we derived an oracle inequality, which proved useful in bounding f-divergence metrics for the GAN estimator. Our approach yielded improved outcomes, and this methodology can be extended to enhance existing results found in the literature. 
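As a schematic illustration only (not the exact statement in the paper), oracle inequalities of this kind typically bound the estimation error by an approximation term plus a stochastic term,

d_f(\hat{p}_n, p^{*}) \;\lesssim\; \inf_{g \in \mathcal{G}} d_f(g, p^{*}) \;+\; \Delta_n(\mathcal{D}, \mathcal{G}),

where \mathcal{G} and \mathcal{D} are the generator and discriminator classes and \Delta_n is a stochastic error term that shrinks with the sample size n; convergence rates follow from balancing the two terms.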

Project 3: Machine Learning 

Objective

The Information Maximizing Generative Adversarial Network (InfoGAN) can be understood as a minimax problem involving two networks, a discriminator and a generator, together with a mutual information term. InfoGAN incorporates several components, including latent variables, mutual information, and the objective function. This research demonstrates that the two objective functions in InfoGAN become equivalent as the discriminator and generator sample sizes approach infinity. The equivalence is established by bounding the gap between the empirical and population versions of the objective function, with the bound determined by the Rademacher complexity of the discriminator and generator function classes. Using a two-layer network for both the discriminator and generator, with Lipschitz and non-decreasing activation functions, validates this equality.
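For reference, the InfoGAN objective of Chen et al. augments the usual minimax value V(D, G) with a variational lower bound L_I on the mutual information between the latent code c and the generated sample,

\min_{G, Q} \max_D \; V_{\mathrm{InfoGAN}}(D, G, Q) = V(D, G) - \lambda\, L_I(G, Q),

where Q is the auxiliary network that approximates the posterior over c and \lambda > 0 weights the mutual-information term.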


Results

This paper demonstrates that InfoGAN can be formulated as an objective function with a regularized generator, without employing a latent code. This objective function and its empirical counterpart are equivalent when using Lipschitz, non-decreasing activation functions in a two-layer network; however, the equivalence holds only as the discriminator and generator sample sizes become infinitely large. The Rademacher complexity bound plays a crucial role in establishing this equality. Investigating this property for the lower bound of the regularized objective function is a potential direction for future research.
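The general shape of the bound involved is the standard Rademacher generalization inequality (the paper's constants and assumptions may differ): for a class \mathcal{F} of functions taking values in [0, 1], with probability at least 1 - \delta,

\sup_{f \in \mathcal{F}} \left( \mathbb{E}[f(X)] - \frac{1}{n}\sum_{i=1}^{n} f(X_i) \right) \le 2\,\mathfrak{R}_n(\mathcal{F}) + \sqrt{\frac{\log(1/\delta)}{2n}},

where \mathfrak{R}_n(\mathcal{F}) is the Rademacher complexity of the class; both terms vanish as n \to \infty, which is what drives the equivalence of the empirical and population objectives.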


Link: https://arxiv.org/pdf/2310.00443.pdf



Biostatistics: Project 1

Data scaling analysis for Poisson and gamma generalized linear models

This was my summer research internship project in the Department of Pharmacy Administration, Center for Pharmaceutical Marketing and Management, University of Mississippi, under the supervision of Dr. Bentley. In biostatistics, researchers often use traditional scaling techniques, such as dividing or multiplying the data by specific constants, to simplify analysis.

Objectives

When comparing the gamma generalized linear model (GLM) with the Poisson GLM under such scaling, we made an interesting discovery: in the gamma GLM, the parameter estimates stay stable but the standard errors vary, whereas in the Poisson GLM both the parameter estimates and the standard errors remain unchanged. We conducted our research on health data and used SAS programming to investigate the underlying reasons for these differences between the Poisson and gamma GLMs.
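The original analysis was carried out in SAS on health data; the following is a minimal, hypothetical Python sketch (statsmodels, simulated data) of the kind of comparison involved: fit Poisson and gamma GLMs with a log link before and after multiplying the response by a constant, and compare the coefficient estimates and standard errors.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
X = sm.add_constant(x)

# Simulated responses with mean exp(0.5 + 0.3 x), matching a log link
mu = np.exp(0.5 + 0.3 * x)
y_pois = rng.poisson(mu)
y_gamma = rng.gamma(shape=2.0, scale=mu / 2.0)

def fit_and_report(y, family, label):
    # Fit the GLM and print coefficient estimates and standard errors
    res = sm.GLM(y, X, family=family).fit()
    print(label, "coef:", np.round(res.params, 4), "SE:", np.round(res.bse, 4))

c = 100.0  # hypothetical scaling constant
fit_and_report(y_pois,      sm.families.Poisson(), "Poisson, original")
fit_and_report(y_pois * c,  sm.families.Poisson(), "Poisson, scaled  ")
fit_and_report(y_gamma,     sm.families.Gamma(link=sm.families.links.Log()), "Gamma, original  ")
fit_and_report(y_gamma * c, sm.families.Gamma(link=sm.families.links.Log()), "Gamma, scaled    ")

Comparing the printed output row by row shows which quantities move under scaling for each family.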

Results

Our findings reveal that the weight parameter and the construction of the information matrix differ between the Poisson and gamma GLMs, and these differences have a notable impact on parameter estimation and the calculation of standard errors.

Biostatistics: Project 2

Geographically weighted regression model for spatial point data


This is the second project of my summer internship, carried out under the supervision of Dr. K. Bhattacharya. Geographically Weighted Regression (GWR) is a spatial statistical technique used to explore spatially varying relationships between a dependent variable and a set of independent variables. When applied to spatial point data, the goal of GWR is to better understand how the relationships between variables differ across space.
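As a rough illustration of the mechanics (a minimal sketch on simulated data, not the internship's actual analysis), GWR fits a separate weighted least-squares regression at each observation location, with kernel weights that decay with distance, which yields the local parameter estimates and local t-values discussed below.

import numpy as np

def gwr_local_fit(coords, X, y, bandwidth):
    # Basic geographically weighted regression: at each location, fit a
    # weighted least-squares model with Gaussian kernel weights.
    n, p = X.shape
    betas = np.zeros((n, p))
    t_values = np.zeros((n, p))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)   # distances to location i
        w = np.exp(-0.5 * (d / bandwidth) ** 2)          # Gaussian kernel weights
        W = np.diag(w)
        C = np.linalg.solve(X.T @ W @ X, X.T @ W)        # local WLS "hat"-type matrix
        beta = C @ y                                     # local coefficient estimates
        resid = y - X @ beta
        sigma2 = (w * resid ** 2).sum() / (w.sum() - p)  # crude local variance estimate
        cov = sigma2 * (C @ C.T)                         # approximate covariance of beta
        betas[i] = beta
        t_values[i] = beta / np.sqrt(np.diag(cov))       # local t-values
    return betas, t_values

# Toy spatial point data (hypothetical) with a slope that drifts over space
rng = np.random.default_rng(1)
coords = rng.uniform(0, 10, size=(200, 2))
x1 = rng.normal(size=200)
X = np.column_stack([np.ones(200), x1])
y = (1.0 + 0.2 * coords[:, 0]) * x1 + rng.normal(scale=0.5, size=200)

betas, tvals = gwr_local_fit(coords, X, y, bandwidth=2.0)
print(betas[:3])   # local intercept and slope at the first three locations
print(tvals[:3])   # corresponding local t-values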

Objective

 The objective is to develop an effective method for synthesizing and presenting the extensive mappable results derived from local Geographically Weighted Regression (GWR) models, specifically focusing on local parameter estimates and local t-values. The aim is to provide a comprehensive and easily interpretable representation of the spatially varying relationships and significance levels, addressing the challenge of information overload for users of GWR methods. 

Results

By implementing this refined map design, researchers can explore and interpret spatial nonstationarity more efficiently and meaningfully, enabling stakeholders to make informed decisions and develop targeted interventions that account for the nuanced spatial dynamics identified through the GWR analysis.

Future Independent Research Projects


A Bayesian-based Generative Adversarial Network (BGAN) for risk prediction can be a novel and powerful approach to incorporating uncertainty quantification into risk prediction models. Combining Bayesian methods with GANs can offer improved model robustness, better calibration, and enhanced decision-making capabilities, particularly in complex and uncertain business environments. Here is an outline of a potential research idea:

Title: Bayesian-Based Generative Adversarial Network for Uncertainty-Aware Risk Prediction in Business and Health Science Environments

Introduction:

Methodology:

Model Implementation:

Evaluation and Results:

Discussion and Implications: