Ph.D. Research

Ph.D. Work:

 Quantitative assessment of the growth of biological populations has produced many mathematical equations. It is a challenging problem to select the best estimating model from a set of models as one model may serve as a close approximation of the other by appropriate choice of the parameter(s). The objective of our work is:

Details of my Ph.D. Work


During my Ph.D., I have worked in population biology and computational biology, focusing on novel methodologies for detecting parameter variation (continuous and stochastic) in real data sets from various domains using single-population continuous growth models. I also collaborate on fish food chain modeling (intra-guild predation modeling) and on the stability analysis of conditional moments. The details of my research are given in the following diagram. I am also keenly interested in research in mathematical epidemiology.


Ph.D. Thesis-related research:

Quantitative assessment of the growth of biological populations has produced many mathematical equations, and over time population growth modeling has become a separate area of research in population biology. Every growth equation has a unique mathematical structure; however, one model may serve as a close approximation of another through an appropriate choice of the parameter(s). Selecting the most accurate model from a set of equations with similar shapes therefore remains a challenging problem. In addition, when growth equations are fitted to real data sets, the model parameters are assumed to be fixed but unknown quantities, although environmental and demographic changes may cause one or more parameters to vary over time. The main objective of my thesis is therefore to critically investigate the properties of existing growth models and to develop efficient mathematical and statistical techniques for analyzing real data sets. The thesis also provides methodology for the statistical detection of continuous parameter variation in nonlinear models. The title of my thesis is “Mathematical Analysis of Biological Growth Models with Continuously and Stochastically Varying Parameters with Applications to Real Data”.

Research problem 1:

Growth curve models serve as the mathematical framework for qualitative studies of growth in many areas of applied science, and because of their extensive use, several distinct models have been developed over a long period of time (Bhowmick et al., 2014). Such models have many practical applications. Most research on population models has treated the parameters as fixed but unknown quantities, estimated by the non-linear least squares method together with confidence intervals. However, due to natural randomness, the parameters may vary over time (Banks, 1994). The problem is that, when an experimenter has observed data over a time period, it is uncertain whether a parameter is fixed or changing with time. Even if biological theory suggests that a particular model parameter changes over time, estimating that parameter empirically can be difficult. In our first paper, we address these issues by proposing a new methodology to detect parameter variation in real data using the interval-specific rate parameter (ISRP) proposed by Bhowmick et al. (2014).
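To make the idea of an interval-specific rate concrete, here is a minimal sketch in Python. It assumes a logistic model with a known carrying capacity K and uses the closed-form fact that x / (K - x) grows exponentially at rate r; the exact ISRP expressions in Bhowmick et al. (2014) may differ. The function names and the data are hypothetical and noise-free.

```python
import math

def logistic(t, r, K, x0):
    """Closed-form logistic curve with constant rate r."""
    return K / (1.0 + (K / x0 - 1.0) * math.exp(-r * t))

def isrp_logistic(times, sizes, K):
    """Interval-specific estimate of r on each [t_i, t_{i+1}].

    Uses the fact that x / (K - x) grows exponentially at rate r
    under the logistic model (K is assumed known here).
    """
    profile = []
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        odds_now = sizes[i] / (K - sizes[i])
        odds_next = sizes[i + 1] / (K - sizes[i + 1])
        profile.append(math.log(odds_next / odds_now) / dt)
    return profile

# Hypothetical, noise-free illustration data (not from the cited studies).
K, x0 = 100.0, 5.0
times = [float(t) for t in range(0, 11)]

# Case 1: constant rate, so the ISRP profile is flat.
constant = [logistic(t, 0.5, K, x0) for t in times]
flat_profile = isrp_logistic(times, constant, K)

# Case 2: the rate decays over time, r(t) = 0.5 * exp(-0.1 t).
# The logistic odds then equal the initial odds times exp(integral of r).
def varying(t):
    integral = 0.5 * (1.0 - math.exp(-0.1 * t)) / 0.1
    odds = (x0 / (K - x0)) * math.exp(integral)
    return K * odds / (1.0 + odds)

decaying_profile = isrp_logistic(times, [varying(t) for t in times], K)
```

In this sketch a flat profile is consistent with a fixed rate parameter, while a profile that drifts over the intervals suggests that the rate changes with time.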


We first showed that one model can be obtained from another through a suitable continuous transformation of the parameters, which establishes an interconnection between existing models in the literature. To build this interconnection, we chose four key models: logistic, theta-logistic, exponential, and confined exponential. For a given set of training data points, we select the optimal model among these four by non-linear least squares fitting. We then plot the ISRP profiles of the parameters of the optimal model; the profile indicates whether any parameter variation is present. If variation is present, the ISRP profile, read together with the interconnecting flowcharts, indicates how the parameter varies with time. This enables the experimenter to extrapolate the inference to more complex models and significantly reduces the effort involved in model fitting exercises. The proposed idea has been verified using simulated data and real data sets from three domains: marketing (LCD-TV sales data from Trappey and Wu, 2008), biology (cattle growth data from Kenward, 1987, and the number of horses and mules on US farms, 1865-1960, from Banks, 1994), and epidemiology (COVID-19 data from Germany). We believe that this work will be helpful for practitioners in the field of growth studies.
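As an illustration of how such interconnections can arise, the four key models can be written in their usual differential forms. The symbols X (population size), r (rate), K (carrying capacity or upper bound), and θ (shape) are notational assumptions, and the reductions listed are simple textbook ones, not necessarily the transformations used in the paper's flowcharts.

```latex
\begin{align*}
\text{Exponential:} \quad & \frac{dX}{dt} = rX \\
\text{Confined exponential:} \quad & \frac{dX}{dt} = r\,(K - X) \\
\text{Logistic:} \quad & \frac{dX}{dt} = rX\left(1 - \frac{X}{K}\right) \\
\text{Theta-logistic:} \quad & \frac{dX}{dt} = rX\left[1 - \left(\frac{X}{K}\right)^{\theta}\right]
\end{align*}

% Reductions of the theta-logistic model:
\begin{align*}
\theta = 1: \quad & rX\left[1 - \frac{X}{K}\right] && \text{(logistic)} \\
\theta > 0,\ K \to \infty: \quad & rX\left[1 - \left(\frac{X}{K}\right)^{\theta}\right] \to rX && \text{(exponential)} \\
\theta = -1: \quad & rX\left[1 - \frac{K}{X}\right] = -r\,(K - X) && \text{(confined exponential with rate } -r\text{)}
\end{align*}
```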

Research problem 2:

In our first paper, we developed a methodology to detect parameter variation in real data sets using the ISRP distribution of the parameters. Interval-specific estimates are therefore an integral part of the methodology, and Bhowmick et al. (2014) provide a conceptual overview and derivation of the ISRP. However, for highly non-linear models and non-monotonic data, the ISRP of the parameters cannot be derived by the method of Bhowmick et al. (2014), which leaves our methodology vulnerable. Hence, in our second paper, we propose a novel methodology for estimating the ISRP based on a localized maximum likelihood estimator (localized MLE). For theoretical validation, we check the distribution under the null hypothesis, taking the von Bertalanffy model as the test bed, and draw power curves as a cross-check. We then verified the method with real data (cattle growth data from Kenward, 1987). Next, we compare the two methodologies to determine which is more appropriate for selecting the best-fitting model for real data sets. In this comparison, we examine stability, efficiency, and parameter sensitivity, and find that our method performs better than the existing one. The proposed methodology also saves time and effort, since the ISRP no longer needs to be derived analytically, and it applies to complex models and non-monotonic data sets for which the existing methodology fails to yield the ISRP.
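The following Python sketch shows one way a localized estimate of a rate parameter could be computed numerically, without an analytical ISRP formula. It is not the estimator from the paper. It assumes a logistic model with known K, independent Gaussian errors of constant variance (so that maximising the likelihood in r is the same as minimising the squared error), and a sliding window whose width is chosen arbitrarily. All function names are hypothetical.

```python
import math

def logistic_step(x_start, r, K, dt):
    """Logistic solution advanced by dt from x_start with rate r."""
    return K / (1.0 + (K / x_start - 1.0) * math.exp(-r * dt))

def window_sse(r, times, sizes, K):
    """Sum of squared errors on one window, anchored at its first point.

    With independent Gaussian errors of constant variance, minimising
    this is equivalent to maximising the likelihood in r.
    """
    t0, x0 = times[0], sizes[0]
    return sum((x - logistic_step(x0, r, K, t - t0)) ** 2
               for t, x in zip(times[1:], sizes[1:]))

def golden_section_min(f, lo, hi, tol=1e-8):
    """Minimise a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2.0

def localized_isrp(times, sizes, K, width=4, r_bounds=(0.0, 2.0)):
    """Estimate r on each sliding window of `width` consecutive points."""
    estimates = []
    for i in range(len(times) - width + 1):
        ts, xs = times[i:i + width], sizes[i:i + width]
        estimates.append(
            golden_section_min(lambda r: window_sse(r, ts, xs, K), *r_bounds))
    return estimates

# Hypothetical noise-free logistic data with true rate 0.5.
K, x0, true_r = 100.0, 5.0, 0.5
times = [float(t) for t in range(0, 11)]
sizes = [logistic_step(x0, true_r, K, t) for t in times]
estimates = localized_isrp(times, sizes, K)
```

Because each window is fitted numerically, the same loop would in principle work for any model whose trajectory can be evaluated, including models for which no closed-form interval-specific expression is available.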

Research problem 3:

In the previous two research problems, we focused mainly on detecting time-dependent parameter variation in real data sets. However, density-dependent parameter variation is also reported in the literature, and detecting it is needed for a better understanding of growth phenomena. In our next problem, we therefore shift our focus to developing a methodology to detect density-dependent parameter variation in real data sets using our computation-based ISRP. Here we again use the localized MLE method to estimate the ISRP and plot its distribution against size to detect density-dependent parameter variation. The method has been validated through simulation studies, and we have also applied it to two data sets from two different domains.
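As a small illustration of reading a rate profile against size instead of time, the sketch below generates noise-free theta-logistic data and computes an interval-specific logistic rate for each interval. The setup is an assumption made for illustration: with θ = 2, the theta-logistic model is equivalent to a logistic model whose rate is r(1 + X/K), so the logistic rate should rise with size. The closed-form interval rate used here is a stand-in for the localized MLE, and all names and values are hypothetical.

```python
import math

def theta_logistic(t, r, K, theta, x0):
    """Closed-form theta-logistic curve (logistic in the variable X**theta)."""
    y0, cap = x0 ** theta, K ** theta
    y = cap / (1.0 + (cap / y0 - 1.0) * math.exp(-r * theta * t))
    return y ** (1.0 / theta)

def isrp_vs_size(times, sizes, K):
    """Pairs (interval midpoint size, interval-specific logistic rate)."""
    pairs = []
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        odds_now = sizes[i] / (K - sizes[i])
        odds_next = sizes[i + 1] / (K - sizes[i + 1])
        rate = math.log(odds_next / odds_now) / dt
        pairs.append((0.5 * (sizes[i] + sizes[i + 1]), rate))
    return pairs

# Hypothetical parameter values chosen only for illustration.
r, K, theta, x0 = 0.3, 100.0, 2.0, 5.0
times = [0.5 * k for k in range(0, 21)]
sizes = [theta_logistic(t, r, K, theta, x0) for t in times]
profile = isrp_vs_size(times, sizes, K)
```

Plotting the second element of each pair against the first gives the size profile; a non-flat profile points to density-dependent variation in the rate under the fitted model.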


Research problem 4:

Our previous three research problems focused on establishing methods to detect continuous parameter variation in real data sets. Such studies can also be carried out for models in which a parameter changes randomly over time (stochastic variation). As a first step, we gathered the existing research on stochastic population modeling. In the literature on stochastic growth models for single species, a few key growth equations dominate, such as the logistic, Gompertz, exponential, Richards, Bertalanffy-Richards, and theta-logistic equations. Among these, the logistic growth model with stochastic treatment has attracted researchers' attention in many different disciplines, so our review concentrates on the use of stochastic logistic models in population biology. Our survey reports a split of the studies on logistic growth equations into harvesting and non-harvesting equations. It also identifies the importance of data-driven research on stochastic growth equations and of selecting appropriate models using multi-model inferential techniques. Based on this survey, we have identified five key research problems that may require special attention.

Collaboration research:

In addition to my thesis work on population and computational biology, I also work on food chain dynamics and statistical stability analysis.

Research problem 1:

In this research work, we focus on food chain dynamics using intra-guild predation (IGP) modeling. We study the Chitala-Mugil-Shrimp fish dynamics; these species have a wide distribution in African and Asian countries and have been classified as endangered (EN) by the Conservation Assessment and Management Plan. In this paper, we explore the causes of the decline of Notopterus chitala in its natural habitat. Our investigation is based on fish data collected from the Bhagirathi River at Diamond Harbour, Malancha, and Raidighi, West Bengal, India. Based on the literature, we consider two variants of IGP models with Chitala as the top predator, Mugil as the intermediate predator, and shrimp as the basal prey. We calibrate these models within a Bayesian framework to estimate the posterior distributions of the parameters, and use the reversible-jump Markov chain Monte Carlo method to obtain posterior model probabilities and select the most suitable model. The selected model allows us to investigate the cause of the decline in the Chitala population: the primary reason for its low availability is the high extinction risk of the Mugil population. Sensitivity analysis confirms that the biomass conversion rate from Mugil to Chitala is the most significant parameter. We believe that this study may be useful for developing management strategies for Chitala conservation.

Research problem 2:

The next research problem belongs to population biology, where we focus on analyzing the statistical stability of conditional moments. In population biology, parametric growth models are essential for explaining growth patterns, and historical data points are an essential tool for predicting future population growth. In studies of prediction based on historical data, the stability of the equilibrium distribution at large time points has received considerable attention from researchers. Our work analyzes the stability of moments of different orders of the relative changes in population size, using the logistic model as a test bed for assessing the stability of population sizes. We also analyze the stability behavior of two-dimensional models, for which we chose predator-prey dynamics with Holling type I and type II functional responses, where the prey follows a logistic growth profile.
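For reference, a standard textbook form of such a predator-prey system is given below. The notation is an assumption, not taken from the paper: x is the prey size, y the predator size, r and K the prey growth rate and carrying capacity, a the attack rate, h the handling time, e the conversion efficiency, and m the predator mortality rate.

```latex
\begin{align*}
\frac{dx}{dt} &= r x\left(1 - \frac{x}{K}\right) - f(x)\,y, \\
\frac{dy}{dt} &= e\,f(x)\,y - m\,y,
\end{align*}
\begin{align*}
\text{Holling type I:} \quad & f(x) = a x, \\
\text{Holling type II:} \quad & f(x) = \frac{a x}{1 + a h x}.
\end{align*}
```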

 

To determine whether population sizes are stable, we examine the behavior of the moments of population size over time using a stochastic logistic growth model. We use two different relative growth rate (RGR) estimates to investigate the convergence of the moments. The simulation study indicates that the two estimators behave almost identically: their expected values are zero, and their variance profiles stabilize around zero after some time points. We also define a conditional statistic based on the first RGR estimator and verify its stability by simulation. Hence, the conditional moments of the logistic model for these two estimators are convergent and stable. We then investigate the stability of interacting populations for the first RGR estimator, with the simulation study conducted by drawing data from a multivariate normal setup. For the predator-prey model with Holling type I, both the conditional mean and the conditional variance cluster around zero, and the skewness and kurtosis profiles further verify their stability. For the predator-prey model with Holling type II, however, the conditional mean does not cluster around zero because of the limit cycle stability of the model. To claim stability of this model, the experimenter therefore has to look at the conditional variance profile instead of the conditional mean profile: the conditional mean does not cluster around a fixed point, whereas the conditional variance clusters closely around zero.
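A minimal simulation sketch of this kind of moment check is given below in Python. Everything specific in it is an assumption made for illustration: the stochastic logistic model is taken as dX = rX(1 - X/K)dt + σX dW and simulated by the Euler-Maruyama scheme, the RGR estimate is the log-ratio ln(X[n+1]/X[n])/dt, and the parameter values are arbitrary. It is not the simulation design of the study.

```python
import math
import random

def simulate_rgr(n_paths=200, n_steps=400, dt=0.1, r=0.5, K=100.0,
                 x0=10.0, sigma=0.01, seed=1):
    """Euler-Maruyama paths of dX = rX(1 - X/K)dt + sigma*X dW.

    Returns, for every step, the list of log-ratio RGR values
    ln(X[n+1] / X[n]) / dt across all paths.
    """
    rng = random.Random(seed)
    rgr = [[] for _ in range(n_steps)]
    for _ in range(n_paths):
        x = x0
        for n in range(n_steps):
            noise = sigma * x * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            x_next = x + r * x * (1.0 - x / K) * dt + noise
            rgr[n].append(math.log(x_next / x) / dt)
            x = x_next
    return rgr

def mean_and_variance(values):
    """Sample mean and unbiased sample variance."""
    m = sum(values) / len(values)
    v = sum((u - m) ** 2 for u in values) / (len(values) - 1)
    return m, v

rgr = simulate_rgr()
early_mean, early_var = mean_and_variance(rgr[0])
late_mean, late_var = mean_and_variance(rgr[-1])
```

Under these assumptions the mean RGR starts well above zero while the population is growing and settles near zero once the paths fluctuate around K, with the variance remaining at a small level set by the noise intensity.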

Research problem 3:

Having proposed methodologies to detect parameter variation, with applications to real data sets from different domains, in our two previous papers, we now apply them to very large data sets (COVID-19 data from 25 countries) to provide a synthesis of our findings. Using these methodologies, we try to determine how early predictions about the COVID-19 pandemic can be made by looking at single-population dynamics. For this purpose, we take COVID-19 data from different countries and divide each country's data into training and test sets in three different ratios (1:1, 7:3, and 9:1). Using existing methodologies, we first find the best-fitting model from the training data, use it to construct a prediction interval for the future, and check that interval against the test data. We then repeat the same steps with our proposed methodologies: find the best-fitting model, construct a prediction interval, and test its accuracy. Finally, we compare the proposed and existing methods to determine how effective our methods are at making early predictions about pandemics from a single population's dynamics alone.
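The evaluation protocol can be sketched in Python as follows. The helper names are hypothetical, the series is a placeholder, not COVID-19 data, and the prediction interval itself is taken as given, since it depends on the fitted model. The sketch covers only the chronological split by ratio and the check of how many held-out observations fall inside an interval.

```python
def split_series(values, train_parts, test_parts):
    """Split a time-ordered series by a train:test ratio such as 7:3."""
    cut = len(values) * train_parts // (train_parts + test_parts)
    return values[:cut], values[cut:]

def interval_coverage(observed, lower, upper):
    """Fraction of held-out observations inside [lower, upper]."""
    hits = sum(1 for y, lo, hi in zip(observed, lower, upper) if lo <= y <= hi)
    return hits / len(observed)

# Placeholder series standing in for one country's time-ordered counts.
series = list(range(1, 101))
splits = {ratio: split_series(series, *ratio)
          for ratio in [(1, 1), (7, 3), (9, 1)]}
```

Keeping the split chronological (earliest observations for training) matches the goal of early prediction, since the model never sees data from the period it is asked to forecast.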

References: