The Speakers

Professor Jong-Min Kim

Jong-Min Kim, Professor of Statistics, University of Minnesota at Morris, USA

Personal website: http://cda.morris.umn.edu/~jongmink/CV.html

Title: Copula Directional Dependence and Its Applications

Abstract

By a theorem due to Sklar (1959), a multivariate distribution can be represented in terms of its underlying margins by binding them together with a copula function. Copulas are useful devices for explaining the dependence structure between variables because they eliminate the influence of the marginals. The copula method for understanding multivariate distributions has a relatively short history in the statistics literature; most of its statistical applications have arisen in the last twenty years. In this talk, the history of copulas will be briefly introduced. The main content of the talk is as follows: first, copula directional dependence, which captures the direct interactions among data, will be introduced. Second, diverse applications of copula directional dependence will be presented, including earthquakes in East Asian countries (China, Japan, Korea), air pollution in China, gene interaction, and finance.
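Sklar's theorem states that a joint distribution function H with continuous margins F and G can be written as H(x, y) = C(F(x), G(y)) for a unique copula C. A common first step in copula analysis is to replace each margin by its scaled ranks (pseudo-observations) u_i = R_i/(n + 1). The minimal Python sketch below, assuming continuous data without ties, computes pseudo-observations and Spearman's rho, a dependence measure that depends only on the copula, and shows that a strictly increasing transformation of one margin leaves both unchanged. The function names and data are illustrative, and this is not the speaker's directional-dependence method.

```python
import math

def pseudo_observations(values):
    """Scaled ranks R_i / (n + 1) of a sample, assuming no ties."""
    n = len(values)
    # order[k] is the index of the k-th smallest value
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return [r / (n + 1) for r in ranks]

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the pseudo-observations."""
    u, v = pseudo_observations(x), pseudo_observations(y)
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

x = [0.3, 1.2, -0.5, 2.1, 0.9]
y = [1.0, 2.5, 0.2, 1.9, 3.0]
# Transforming the first margin by exp (strictly increasing) does not change the result.
print(spearman_rho(x, y), spearman_rho([math.exp(t) for t in x], y))
```

Because only ranks enter the computation, any rank-based dependence summary inherits this invariance to the marginal distributions.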

Professor Mekki Terbeche

Professor at USTO, Oran

Faculty of Mathematics and Informatics, Department of Mathematics

Biography: https://drive.google.com/file/d/1wfep5vPC-_0w5txDh9zqF2GgUPahG0Xn/view?usp=sharing

TITLE: An Overview of Gaussian Function in Mathematics and Statistics

Why this topic?

The Gaussian (normal) distribution can be derived from the Gaussian function.

The Gaussian distribution is one of the most widely used models in data science and the most important probability distribution in statistics, because it approximately describes many natural phenomena such as age, height, test scores, IQ scores, the sum of the rolls of two dice, and so on.

In Mathematics

1. The Fourier transform of a Gaussian function is also a Gaussian function.

2. The Gaussian function belongs to the Schwartz space.

3. Gaussian functions are used to solve the heat and diffusion equations and to define the Weierstrass transform.

In Statistics

The Gaussian function is widely used to describe the normal distribution, whose importance comes from the Central Limit Theorem.

In Signal Processing

This function is used to define Gaussian filters.

In Image Processing

Two-dimensional Gaussians are used for Gaussian blur.
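As a concrete illustration of these uses, the Gaussian function is f(x) = a exp(-(x - b)^2 / (2c^2)); choosing a = 1/(c sqrt(2 pi)) gives the normal density with mean b and standard deviation c. A Gaussian filter or blur convolves data with a sampled, normalized version of this function, and a two-dimensional Gaussian kernel is the outer product of two one-dimensional kernels. The Python sketch below is a minimal, dependency-free illustration; truncating the kernel at about three standard deviations and clamping indices at the signal edges are assumed conventions, not requirements.

```python
import math

def gaussian(x, a=1.0, b=0.0, c=1.0):
    """Gaussian function a * exp(-(x - b)^2 / (2 c^2))."""
    return a * math.exp(-((x - b) ** 2) / (2 * c ** 2))

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density: the Gaussian function with a = 1 / (sigma * sqrt(2 pi))."""
    return gaussian(x, 1.0 / (sigma * math.sqrt(2 * math.pi)), mu, sigma)

def gaussian_kernel_1d(sigma, radius=None):
    """Sampled Gaussian, normalized to sum to 1, for use as a smoothing filter."""
    if radius is None:
        radius = max(1, int(math.ceil(3 * sigma)))  # assumed truncation at ~3 sigma
    weights = [gaussian(i, c=sigma) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def gaussian_kernel_2d(sigma):
    """2-D Gaussian blur kernel as the outer product of two 1-D kernels (separability)."""
    k = gaussian_kernel_1d(sigma)
    return [[ki * kj for kj in k] for ki in k]

def blur_1d(signal, sigma):
    """Convolve a signal with the 1-D Gaussian kernel, clamping indices at the edges."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    n = len(signal)
    return [sum(k[j + r] * signal[min(max(i + j, 0), n - 1)] for j in range(-r, r + 1))
            for i in range(n)]

# Smoothing a single spike spreads it into a bell shape.
print(blur_1d([0, 0, 0, 1, 0, 0, 0], sigma=1.0))
```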

Yichuan Zhao and Xue Yu, Georgia State University

Title: Jackknife empirical likelihood inference for the accelerated failure time model

Abstract

The accelerated failure time (AFT) model is a useful semiparametric model for right-censored data and an alternative to the commonly used proportional hazards model. Statistical inference for the AFT model has attracted considerable attention. However, the estimators of the regression parameters are difficult to compute because the rank-based estimating equations are not smooth. Brown and Wang (Stat Med 26(4):828–836, 2007) used an induced smoothing approach, which smooths the estimating functions to obtain point and variance estimators. In this paper, a more computationally efficient method called jackknife empirical likelihood (JEL) is proposed to make inference for the accelerated failure time model without computing the limiting variance. Results from extensive simulations suggest that the JEL method outperforms the traditional normal approximation method in most cases. Finally, two real data sets are analyzed to illustrate the proposed method.
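The general jackknife empirical likelihood recipe (Jing, Yuan and Zhou, 2009) replaces a statistic T_n by its jackknife pseudo-values V_i = n T_n - (n - 1) T_{n-1}^{(-i)} and then applies Owen's empirical likelihood for a mean to the V_i; the resulting -2 log likelihood ratio is compared with a chi-square quantile with 1 degree of freedom, so no limiting variance has to be estimated. The Python sketch below implements this generic recipe for any statistic passed in as a function. It is not the speakers' AFT procedure: for the AFT model the statistic would be built from the rank-based estimating function at a hypothesized regression parameter, which is not implemented here, and all function names are hypothetical.

```python
import math

def jackknife_pseudo_values(data, statistic):
    """Pseudo-values V_i = n*T(data) - (n - 1)*T(data with observation i removed)."""
    n = len(data)
    t_full = statistic(data)
    return [n * t_full - (n - 1) * statistic(data[:i] + data[i + 1:]) for i in range(n)]

def el_statistic_for_mean(values, theta, tol=1e-12, max_iter=500):
    """-2 log empirical likelihood ratio for H0: E[V] = theta (Owen's EL for a mean)."""
    d = [v - theta for v in values]
    if min(d) >= 0 or max(d) <= 0:
        return math.inf  # theta is outside the convex hull of the values
    # The Lagrange multiplier must keep every 1 + lam*d_i strictly positive.
    lo, hi = -1.0 / max(d), -1.0 / min(d)

    def score(lam):
        # Strictly decreasing in lam, from +inf near lo to -inf near hi.
        return sum(di / (1.0 + lam * di) for di in d)

    for _ in range(max_iter):  # bisection for score(lam) = 0
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * di) for di in d)

def jel_statistic(data, statistic, theta):
    """Jackknife empirical likelihood: EL for a mean applied to the pseudo-values."""
    return el_statistic_for_mean(jackknife_pseudo_values(data, statistic), theta)

def sample_mean(xs):
    return sum(xs) / len(xs)

data = [2.1, 3.4, 1.9, 4.2, 2.8, 3.3, 2.5, 3.9]
# theta lies in the approximate 95% JEL confidence set if the statistic is below 3.841
print(jel_statistic(data, sample_mean, 3.0) < 3.841)
```

For the sample mean the pseudo-values are just the observations, so JEL reduces to ordinary empirical likelihood; the recipe pays off for nonlinear statistics such as U-statistics or rank-based estimating functions.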

Kesaobaka Molebatsi, Lesego Gabaitiri, Lucky Mokgatlhe and Eric Tchetgen Tchetgen

Title: Doubly robust estimation of the Human Immunodeficiency Virus (HIV) incidence rate in a cross-sectional cohort study design

Abstract

Previously, we established that efficient estimates of the Human immunodeficiency virus (HIV) incidence rate can be obtained by combining error-prone self-reports with error-free formal documentation of individuals’ testing history in a pooled cross-sectional cohort study design, while simultaneously accounting for selection bias and misclassification error. To account for selection bias into the validated sub-sample, where both self-reports and formal documentation are available, we previously used inverse probability weighting (IPW) methods, which depend on correct specification of the unknown propensity score (PS) model for selection into the validated sub-sample. If the PS model is misspecified, such estimators are generally biased. We propose two new augmented inverse probability weighting (AIPW) estimators for a cross-sectional cohort study design that additionally require fitting a regression model (RM) for the underlying population full data. These estimators are guaranteed to remain consistent and asymptotically normal if either the PS model or the RM is correctly specified, a property called double robustness. Our approach therefore extends the theory of double robustness to the cross-sectional cohort study design. To investigate the large-sample properties of the new AIPW estimators, we perform extensive Monte Carlo simulations and compare them with other available techniques. We then use our methods to estimate the HIV incidence rate among individuals who tested HIV-negative 1.5 and 5 years prior to enrolment in the Botswana Combination Prevention Project. Our results demonstrate that highly accurate estimates of the HIV incidence rate can be obtained in the less resource-intensive cross-sectional cohort study design by carefully accounting for multiple potential sources of bias simultaneously.
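To make double robustness concrete, consider the simplest version of the problem: estimating a population mean E[Y] when Y is observed only in a validated sub-sample (R_i = 1), with selection at random given covariates. With fitted propensity scores pi_i = P(R_i = 1 | X_i) and regression predictions m_i of E[Y | X_i], the textbook AIPW estimator is mu_hat = (1/n) sum_i [R_i Y_i / pi_i - (R_i - pi_i) / pi_i * m_i], which is consistent if either the propensity model or the regression model is correct. The Python sketch below takes the fitted values as inputs; it is a generic illustration under these assumptions, not the speakers' estimators for HIV incidence in the cross-sectional cohort design.

```python
def ipw_mean(r, y, pi):
    """Inverse probability weighted estimate of E[Y]; y[i] is ignored when r[i] == 0."""
    n = len(r)
    return sum(yi / pii for ri, yi, pii in zip(r, y, pi) if ri == 1) / n

def aipw_mean(r, y, pi, m):
    """Augmented IPW (doubly robust) estimate of E[Y].

    r  : 1 if Y is observed (validated sub-sample), 0 otherwise
    y  : outcomes; entries with r == 0 may be None
    pi : fitted propensity scores P(R = 1 | X), all > 0
    m  : fitted regression predictions of E[Y | X] for every unit
    """
    n = len(r)
    total = 0.0
    for ri, yi, pii, mi in zip(r, y, pi, m):
        ipw_term = yi / pii if ri == 1 else 0.0
        # Augmentation term: averages to zero when pi is correct,
        # and corrects the IPW term when m is correct.
        total += ipw_term - (ri - pii) / pii * mi
    return total / n

# Hypothetical toy data: two of five outcomes are unobserved.
r = [1, 0, 1, 1, 0]
y = [2.0, None, 3.0, 5.0, None]
pi = [0.5] * 5
m = [2.0, 4.0, 3.0, 5.0, 1.0]
print(ipw_mean(r, y, pi), aipw_mean(r, y, pi, m))
```

When the regression predictions match the observed outcomes exactly, as in this toy example, the AIPW estimate equals the average of the predictions over all units, whatever propensity scores are used.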

Nicholas Siame Adam, Halima S Twabi, Samuel O Manda

Title: A Simulation Study for Evaluating the Performance of Clustering Measures for Multilevel Logistic Regression

Abstract

Background: Multilevel binary regression models are widely used in the health sciences to analyze binary outcome data that have a hierarchical structure. The models account for nested sources of variability arising from different levels of the hierarchy. Because outcomes can be correlated between individuals as a result of these nested structures, traditional logistic regression models are inappropriate; multilevel logistic regression models are the appropriate approach for estimating the strength of dependence in the outcomes of subjects within the same higher-level unit. For a two-level binary regression model, the intra-class correlation coefficient (ICC) has often been used to measure the strength of clustering at the level-2 units. In this study, we compare the performance of the ICC with other, less well-known measures of level-2 clustering, namely the median odds ratio (MOR), the 80% interval odds ratio (IOR-80), and the sorting out index (SOI), in the analysis of a two-level logistic regression model. Their performance is assessed through a simulation study and an application to a real-life example on under-five anemia in Malawi.

Methods: Two-level binary outcome data sets were simulated by varying the size and number of clusters, as well as the extent of variation between the clusters. Multilevel logistic regression models were then fitted to each data set, and the accuracy of the estimates of the regression parameters and of the four clustering measures was assessed using the average root mean square error (RMSE). We also applied the four clustering measures to investigate the level of heterogeneity in an analysis of anemia among children aged 6 to 59 months clustered across 850 communities in Malawi. The variable of interest was child anemia, recorded as 1 if a child had anemia and 0 otherwise.

Results: The simulation results show that the estimates of the regression parameters were accurate for level-2 variances smaller than 0.5, even with 10 clusters of size 5. For variances larger than 1, the regression parameters were estimated less accurately, even with cluster sizes of 250. The ICC and SOI performed better at all sizes of the level-2 variance, while the MOR and IOR-80 performed poorly for a small number of clusters regardless of cluster size. For a moderate number of clusters (say 10) and cluster size (say 5 subjects), the SOI and ICC were estimated with very small bias. The performance of all four clustering measures improved with more clusters and larger cluster sizes for all level-2 variances. In the application to the child anemia data, the estimates of the ICC, MOR, SOI, and IOR-80 were 0.122, 1.898, 0.567, and (0.345, 3.978), respectively.

Conclusion: Cluster sizes of 50 and at least 100 clusters would be adequate to provide reliable estimates of the regression coefficients of multilevel logistic regression models. For the clustering measures, at least 300 clusters and clusters of at least 50 level-1 units would be adequate to measure the strength of clustering with negligible bias. The sorting out index (SOI) provided the best accuracy in estimating both the regression parameters and the clustering strength in the analysis of the multilevel logistic regression model. We recommend using the SOI especially when there are few clusters, mostly of small size, and the level-2 variance is expected to be large.

Keywords: Heterogeneity measures, Clusters, Multilevel modelling, Child anemia
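For a two-level random-intercept logistic model with level-2 variance sigma^2, the usual latent-variable formulas are ICC = sigma^2 / (sigma^2 + pi^2/3) and MOR = exp(sqrt(2 sigma^2) * z_0.75), where z_0.75 is the 75th percentile of the standard normal distribution; for a cluster-level covariate with log odds ratio beta, the 80% interval odds ratio is exp(beta ± sqrt(2 sigma^2) * z_0.90). The Python sketch below codes these formulas; the SOI is left out because its definition is not given in the abstract. As a rough consistency check, the level-2 variance implied by the reported ICC of 0.122 gives an MOR of about 1.91, close to, though not identical with, the reported 1.898.

```python
import math
from statistics import NormalDist

def icc_latent(sigma2):
    """Latent-variable ICC for a two-level logistic model: sigma2 / (sigma2 + pi^2/3)."""
    return sigma2 / (sigma2 + math.pi ** 2 / 3)

def sigma2_from_icc(icc):
    """Invert the latent ICC to recover the level-2 variance."""
    return icc * (math.pi ** 2 / 3) / (1 - icc)

def median_odds_ratio(sigma2):
    """MOR = exp(sqrt(2 * sigma2) * z_0.75)."""
    return math.exp(math.sqrt(2 * sigma2) * NormalDist().inv_cdf(0.75))

def interval_odds_ratio(beta, sigma2, coverage=0.80):
    """IOR for a cluster-level covariate with log odds ratio beta (80% by default)."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # 0.90 quantile for 80% coverage
    s = math.sqrt(2 * sigma2)
    return math.exp(beta - z * s), math.exp(beta + z * s)

sigma2 = sigma2_from_icc(0.122)
print(round(sigma2, 3), round(median_odds_ratio(sigma2), 3))
```

An IOR that contains 1, as the reported (0.345, 3.978) does, indicates that the covariate effect is small relative to the unexplained between-cluster variation.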

Marcus Nunes

Title: Teaching Statistical Learning in Developing Countries

Abstract: In this talk we define statistical learning as the area of knowledge that uses Statistics and Computer Science tools for understanding and modeling data. Its main goal is to find a predictive model based on collected data. There are many free resources, such as computer software and data sources, to help anyone who wants to teach this subject in developing countries. We will show through examples how statistical educators can take advantage of some of these resources to prepare their students for the changes that are happening in the statistical world.

Thatayaone Moakofi, Broderick Oluyede and Boikanyo Makubate

Title: The Half Logistic Log-logistic Weibull Distribution: Model, Properties and Applications

Abstract

A new three-parameter distribution named the Half Logistic Log-Logistic Weibull (HLLLoGW) distribution is developed. This model includes the Half Logistic Log-Logistic (HLLL), Half Logistic Log-Logistic Exponential (HLLLE) and Half Logistic Log-Logistic Rayleigh (HLLLR) distributions as sub-models. Structural properties including shapes, the hazard function, reverse hazard function, quantile function, moments, conditional moments, mean deviations, Bonferroni and Lorenz curves, Rényi entropy and the distribution of order statistics are presented. We adopt the maximum likelihood method to estimate the model parameters. To check the accuracy of the maximum likelihood estimates, various simulations were performed for different parameter settings and sample sizes. Finally, numerical examples are provided to test the goodness of fit of the proposed model compared to other models.

Kedumetse Vati and László Székelyhidi

Title: Moment functions on hypergroup joins

Abstract

Moment functions play a basic role in probability theory. A natural generalization can be defined on hypergroups, which leads to the concept of generalized moment function sequences. In a former paper we studied some function classes on hypergroup joins which play a basic role in spectral synthesis. Moment functions are also important building blocks of spectral synthesis. All these functions can be characterized by well-known functional equations. In this talk we describe generalized moment function sequences on hypergroup joins.

Roland A. Matsouaka, Duke University School of Medicine

Title: Robust statistical inference for the matched net benefit and win ratio

Abstract:

As alternatives to the time-to-first-event analysis of composite endpoints, the net benefit (NB) and the win ratio (WR), which assess treatment effects using component outcomes prioritized by clinical importance, have been proposed. However, statistical inference for NB and WR relies on large-sample assumptions, which can lead to an invalid test statistic and unsatisfactory confidence intervals, especially when the sample size is small or the proportion of wins is near 0 or 1.

In this talk, we show how to address these limitations in a paired-sample design. We first introduce a new test statistic under the null hypothesis of no treatment difference. Then, we present new ways to estimate the confidence intervals of NB and WR. The confidence interval estimation uses the method of variance estimates recovery (MOVER), which combines two separate individual-proportion confidence intervals into a hybrid interval for each estimand of interest. We assess the performance of the proposed test statistic and MOVER confidence interval estimations through simulation studies.

The results show that the MOVER confidence intervals are as good as the large-sample confidence intervals when the sample is large and the proportion of wins is bounded away from 0 and 1. Moreover, the MOVER intervals outperform their competitors when the sample is small or the proportion of wins is near the boundaries 0 and 1. We illustrate the method (and its competitors) using three examples from randomized clinical studies.
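The MOVER idea can be illustrated for a difference theta_1 - theta_2. Given point estimates and separate confidence limits (l_1, u_1) and (l_2, u_2), the hybrid interval is L = theta_1 - theta_2 - sqrt((theta_1 - l_1)^2 + (u_2 - theta_2)^2) and U = theta_1 - theta_2 + sqrt((u_1 - theta_1)^2 + (theta_2 - l_2)^2). The Python sketch below applies it with Wilson score intervals to NB = P(win) - P(loss). It assumes the two proportions are independent, which is a simplification: in a paired design wins and losses come from the same pairs and are correlated, and a paired MOVER would add a correlation term that is not shown here. The counts in the usage line are hypothetical.

```python
import math
from statistics import NormalDist

def wilson_interval(successes, n, level=0.95):
    """Wilson score interval for a single proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z / denom * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2))
    return center - half, center + half

def mover_difference(p1, ci1, p2, ci2):
    """MOVER interval for p1 - p2 from two separate intervals (independence assumed)."""
    l1, u1 = ci1
    l2, u2 = ci2
    diff = p1 - p2
    lower = diff - math.sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upper = diff + math.sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return lower, upper

def net_benefit_ci(wins, losses, n_pairs, level=0.95):
    """Point estimate and MOVER-style interval for NB = P(win) - P(loss)."""
    pw, pl = wins / n_pairs, losses / n_pairs
    ci = mover_difference(pw, wilson_interval(wins, n_pairs, level),
                          pl, wilson_interval(losses, n_pairs, level))
    return pw - pl, ci

print(net_benefit_ci(wins=18, losses=4, n_pairs=25))
```

Because each Wilson limit stays inside [0, 1], the resulting interval for NB stays inside [-1, 1], which is one reason such hybrid intervals behave well when the proportion of wins is near 0 or 1.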

Diawara Norou, Old Dominion University

Title: Modelling Spatio-temporal Point Processes and Inferences Using Moran Statistics

Abstract

Moran statistics measure spatial autocorrelation. They quantify the degree of dispersion (or spread) of events or objects in space. When investigating counts of events in an area, a single Moran statistic may not give a sufficient summary of the spread of autocorrelation, and the time factor cannot be ignored. However, by partitioning the area and computing the Moran statistics of the subareas over time, patterns among local neighbors that are not otherwise apparent are uncovered. The main consequence of our results is that our time-dependent Moran statistics are calculated from an explicit algorithm in a Monte Carlo simulation setting.

We show simulation results under a multilevel Poisson process where the dependence among the levels is captured by the rate of increase of the disease spread over time, steered by a common factor in the scale. An analysis of new COVID-19 cases over time in four selected countries, using data published by the World Health Organization, is presented.
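For reference, the global Moran's I for values x_1, ..., x_n with spatial weights w_ij (w_ii = 0) is I = (n / W) * sum_ij w_ij (x_i - xbar)(x_j - xbar) / sum_i (x_i - xbar)^2, where W is the sum of all weights; values above the null expectation -1/(n - 1) indicate positive spatial autocorrelation. The Python sketch below computes I and, as a simple stand-in for a time-dependent statistic, one I per time period for the same set of subareas. The binary "areas in a row" weights are an illustrative assumption, and this is not the speakers' algorithm or their multilevel Poisson model.

```python
def morans_i(x, w):
    """Global Moran's I for values x and a spatial weight matrix w (w[i][i] = 0).

    x must not be constant, otherwise the denominator is zero.
    """
    n = len(x)
    mean = sum(x) / n
    z = [xi - mean for xi in x]
    total_weight = sum(sum(row) for row in w)
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi ** 2 for zi in z)
    return (n / total_weight) * (num / den)

def line_adjacency(n):
    """Binary weights for n areas in a row: neighbors share a border."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]

def morans_i_over_time(counts_by_period, w):
    """One Moran's I per time period for the same subareas."""
    return [morans_i(counts, w) for counts in counts_by_period]

w = line_adjacency(4)
print(morans_i_over_time([[1, 2, 3, 4], [1, 0, 1, 0]], w))
```

For the four areas in a row, the smoothly increasing pattern [1, 2, 3, 4] gives I = 1/3 (positive autocorrelation), while the alternating pattern [1, 0, 1, 0] gives I = -1 (perfect negative autocorrelation).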

Nonhle Channon Mdziniso

Title: Parametric Analysis of Renal Failure Data using the Exponentiated Odd Weibull Distribution

Abstract:

In this work, we analyze renal failure data from patients with mesangioproliferative glomerulonephritis (MPGN), which were modeled non-parametrically by Vikse et al. (2002) using the Kaplan-Meier curve. In their work, they showed that the clinical variables large increase in serum creatinine (LISC) and systolic blood pressure >160 mmHg (SBP>160), and the morphological variables benign nephrosclerosis present (BNP) and interstitial score group 5-6 (IS5-6), were among the variables indicating progression to end-stage renal failure (ESRF). However, the survival curves associated with these variables may be difficult to model with the parametric distributions existing in the literature. We therefore introduce a four-parameter Odd Weibull extension, the exponentiated Odd Weibull (EOW) distribution, which is very versatile in modeling lifetime data: its hazard function can exhibit ten different shapes, and its density can take various shapes. Basic properties of the EOW distribution are presented. In the presence of random censoring, a small simulation study is conducted to assess the coverage probabilities of the parameters of the EOW distribution estimated by the maximum likelihood method. Our results show that the EOW distribution is convenient and reliable for analyzing the MPGN data, since it provides an excellent fit for the variables LISC, SBP>160, BNP, and IS5-6. Furthermore, the advantages of using the EOW distribution over the Kaplan-Meier curve are discussed. Comparisons of the EOW distribution with other Weibull-related distributions are also presented.

Hamza EROL

Title: High Performance Computing Systems in Big Data Analytics: Solution Environments and Coding

Abstract

This study explains the concept of stream or multilayered big data and defines the complexity of such data. It describes the high performance computing systems used in big data analytics: the single processor, single core structure (standard computing architecture); the single processor, multi-core structure (parallel computing architecture); and the multiprocessor, multi-core structure (distributed computing architecture). The computing environments used in these systems are examined, with emphasis on hardware computing (hardware-accelerated computing) and software computing (software-optimized computing). The learning methods applied in big data analytics, namely statistical learning, machine learning and deep learning, are described, and artificial intelligence is explained as the result- or product-oriented application of these learning methods. Finally, Google Colaboratory is presented as a web-based solution environment for stream or multilayered big data analytics, together with Python applications for code development in this environment.

Keywords: Global world economies, Intelligent management of the economy, Stream big data, Multilayer big data, Size of big data, Complexity of big data, Anaconda, NetworkX, PySpark, PyCUDA, Google Colaboratory.

Simbarashe Chamunorwa, Broderick Oluyede, Boikanyo Makubate and Chipepa Fastel

Botswana International University of Science and Technology

Title: The Exponentiated Odd Weibull-Topp-Leone-G Family of Distributions with Applications

Abstract: In this paper, a new generalized family of distributions called the exponentiated odd Weibull-Topp-Leone-G (EOW-TL-G) family of distributions is presented. Various mathematical properties of the new family, including an expansion of the density, the distribution of order statistics, Rényi entropy, moments, the generating function, probability weighted moments and the quantile function, are derived, together with the maximum likelihood estimates. A simulation study to examine the efficiency of the maximum likelihood estimates is also conducted. We also present three real data examples to demonstrate the flexibility of the EOW-TL-LLoG distribution, a member of the family, compared to several non-nested models.

Olusegun Michael Otunuga, Ph.D.

Assistant Professor & Advisor for Pi Mu Epsilon, WV Chapter

Department of Mathematics | Marshall University

Title: Time-dependent probability distribution for the number of infections in a stochastic SIS epidemic model: case study COVID-19

Abstract:

We derive the time-dependent probability distribution of the number of infected individuals at a given time in a stochastic Susceptible-Infected-Susceptible (SIS) epidemic model. The mean, variance, skewness and kurtosis of the distribution are obtained as functions of time. We study the effect of noise intensity on the distribution and then derive and analyze the effect of changes in the transmission and recovery rates of the disease. Our analysis reveals that the time-dependent probability density function exists if the basic reproduction number is greater than one. It converges in the long run to the Dirac delta function (entirely concentrated at zero) as the basic reproduction number tends to one from above.

The result is applied to analyze the probability distribution of the aggregate number of COVID-19 cases in the United States for the period from January 22, 2020 to March 23, 2021. Findings show that the distribution shifts its concentration to the right until it concentrates entirely on the infection carrying capacity as the infection growth rate increases or the recovery rate decreases. The disease eradication and disease persistence thresholds are also calculated.
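One way to see the shape of such a time-dependent distribution is by simulation. Writing the SIS drift as beta I (N - I)/N - gamma I = r I (1 - I/K), with growth rate r = beta - gamma and carrying capacity K = N(1 - 1/R_0), R_0 = beta/gamma, the Python sketch below draws Euler-Maruyama paths and collects I(t) at a fixed time. The multiplicative noise term sigma I dW and all parameter values are assumptions made for illustration; the talk's model and its closed-form distribution may use a different noise structure. With sigma = 0 the path settles at K, which shrinks to 0 as R_0 decreases to 1.

```python
import math
import random

def sis_em_path(i0, beta, gamma, n_pop, sigma, t_end, dt, seed=None):
    """One Euler-Maruyama path of dI = (beta*I*(N - I)/N - gamma*I) dt + sigma*I dW.

    The multiplicative noise term sigma*I dW is an assumed form for illustration.
    """
    rng = random.Random(seed)
    steps = int(round(t_end / dt))
    i = float(i0)
    path = [i]
    for _ in range(steps):
        drift = beta * i * (n_pop - i) / n_pop - gamma * i
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
        i = max(i + drift * dt + sigma * i * dw, 0.0)  # keep the count non-negative
        path.append(i)
    return path

def carrying_capacity(beta, gamma, n_pop):
    """K = N(1 - 1/R0) with R0 = beta/gamma; positive only when R0 > 1."""
    return n_pop * (1.0 - gamma / beta)

def terminal_sample(i0, beta, gamma, n_pop, sigma, t_end, dt, n_paths, seed=0):
    """Monte Carlo sample of I(t_end), approximating its distribution at time t_end."""
    rng = random.Random(seed)
    return [sis_em_path(i0, beta, gamma, n_pop, sigma, t_end, dt,
                        seed=rng.randrange(2 ** 32))[-1]
            for _ in range(n_paths)]

# Hypothetical parameters: R0 = 2.5 and N = 1000, so K = 600.
sample = terminal_sample(10, 0.5, 0.2, 1000, sigma=0.05, t_end=40.0, dt=0.05, n_paths=300)
print(sum(sample) / len(sample), carrying_capacity(0.5, 0.2, 1000))
```

Repeating the run with a larger beta or a smaller gamma moves K, and with it the bulk of the simulated distribution, to the right.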

Eric Numfor

Department of Mathematics

Augusta University, Augusta, GA

Email: enumfor@augusta.edu

Title: A Malaria-HIV/AIDS Co-infection Model with Optimal Treatment and Insecticide-treated Bednets

Abstract:

The concurrent use of multiple strategies has been recommended as an effective way to reduce malaria and its burden. In this talk, we present a mathematical model for malaria-HIV/AIDS co-infection and control that incorporates malaria treatment, insecticide-treated bednets, and HIV/AIDS treatment. The existence of a backward bifurcation is established. The optimal impact of malaria treatment, insecticide-treated bednets and HIV/AIDS treatment is assessed by formulating and analyzing an optimal control problem, to gain a qualitative understanding of how different combinations of these controls should be used to reduce disease prevalence in a malaria-HIV/AIDS endemic setting.

Ephraim Agyingi, Tamas Wiandt and Sophia Maggelakis

School of Mathematical Sciences, Rochester Institute of Technology

Rochester, New York, United States

Email (Ephraim Agyingi): eoasma@rit.edu

Title: A mathematical model of thermography applied to tungiasis inflammation of the skin

Abstract

The application of thermography is emerging in the diagnosis of several diseases, including tungiasis (also known as jigger infestation). Jigger infestation is a tropical disease that disproportionately affects the poor and is caused by sand fleas burrowing into the skin of the host. Infestation by Tunga penetrans manifests as a small swollen lesion with a black dot at the center, which can grow to the size of a pea. Jigger infestation can also lead to bacterial infection of the skin region and subsequently to other serious conditions such as the formation of abscesses, tissue death and gangrene. In this paper we develop a model of heat transfer in tungiasis-associated inflammation of the skin by considering tungiasis as a growing lesion. The model, which is governed by the Pennes equation, uses the steady-state temperature at the skin surface to study the underlying lesion. Numerical simulations that investigate the presence of tungiasis, as well as bacterial coinfection at the skin site, are presented.

Nourridine Siewe, Assistant Professor at the School of Mathematical Sciences, Rochester Institute of Technology, Rochester, NY, USA

Email: nxssma@rit.edu

Title: TGF-beta inhibition can overcome cancer primary resistance to PD-1 blockade: a mathematical model

Abstract:

Over recent years, immune checkpoint inhibitors have demonstrated impressive clinical responses in cancer patients, but some patients do not respond at all to checkpoint blockade, exhibiting primary resistance. Primary resistance to PD-1 blockade is reported to occur under conditions of an immunosuppressive tumor environment, caused by myeloid-derived suppressor cells (MDSCs), and of T cell exclusion, due to an increased level of T regulatory cells (Tregs). Since TGF-β activates Tregs, a TGF-β inhibitor may overcome primary resistance to anti-PD-1. Indeed, recent mouse experiments show that combining anti-PD-1 with anti-TGF-β yields significant therapeutic improvements compared to anti-TGF-β alone. The present paper introduces two cancer-specific parameters and, correspondingly, develops a mathematical model which explains how primary resistance to PD-1 blockade occurs, in terms of the two cancer-specific parameters, and how, in combination with anti-TGF-β, anti-PD-1 provides significant benefits. The model is represented by a system of partial differential equations, and the simulations are in agreement with the recent mouse experiments. In some cancer patients, treatment with anti-PD-1 results in rapid progression of the disease, known as hyperprogressive disease (HPD). The mathematical model can also explain how this situation arises, and it predicts that HPD may be reversed by combining anti-TGF-β with anti-PD-1. The model is used to demonstrate how the two cancer-specific parameters may serve as biomarkers in predicting the efficacy of combination therapy with PD-1 and TGF-β inhibitors.

Mymuna Monem and Divine Wanduku

Department of Mathematics and Statistics,

Florida International University,

11101 Southwest 13th St,

Miami, FL 33199, USA

Department of Mathematical Sciences,

Georgia Southern University,

65 Georgia Ave. Room 3309

P.O. Box 8093, Statesboro, GA 30460, USA

Title: A mathematical model for the spread of rumors on complex social networks

Abstract

Rumors affect our everyday emotional and physical lives. The effects are detrimental when the rumors are toxic, such as terrorist ideas and cyberbullying. The advent of the internet, online social media (such as Facebook, Twitter and microblogs) and other online social network communication forums has brought huge benefits to human life, but it has also facilitated the spread of malicious rumors and cyberbullying. In this study, we present a chain-binomial mathematical model for the stochastic spread of a malicious rumor. The model consists of spreaders (I), who post malicious messages on websites. Ignorants (S) are infected and become exposed (E) to the malicious rumors after reading the posts. Exposed individuals who are eager to spread the messages on other susceptible websites are labelled “weakly exposed”. Other exposed individuals who have a change of mind and are reluctant to spread the messages are labelled “strongly exposed”. The “weakly exposed” become spreaders, and the “strongly exposed” become stiflers (R). We show how to derive the model on a social network and present the transition probabilities. Our model is a Markov chain with trinomial transition probabilities. We find maximum likelihood estimates for the probability of getting infected by a terrorist and the probability of becoming reluctant to spread terrorist ideas. We present numerical simulation results, with graphs and other figures showing how the malicious rumor spreads on the social network over time.

Contact

Email addresses: mmone014@fiu.edu and dwanduku@georgiasouthern.edu

Bismark Oduro

Department of Mathematics and Physical Sciences

California University of Pennsylvania, California, PA, USA.

Email: oduro@calu.edu

Title: Optimal treatment strategies for controlling vector-borne diseases like Chagas

Abstract

Chagas disease is a major health problem in rural South and Central America, where an estimated 8 to 11 million people are infected. It is a vector-borne disease caused by the parasite Trypanosoma cruzi, which is transmitted to humans mainly through the bite of insect vectors from several species of so-called kissing bugs. One of the control measures to reduce the spread of the disease is insecticide spraying of housing units to prevent infestation by the vectors. However, re-infestation of units by vectors has been shown to occur as early as four to six months after insecticide-based control interventions. I will present ordinary differential equation models of type SIRS that shed light on the long-term cost effectiveness of certain strategies for controlling re-infestation by vectors. The results show that an initially very high spraying rate may push the system into a region of the state space with low endemic levels of infestation that can be maintained in the long run at relatively moderate cost.

Omotomilola Jegede and Divine Wanduku

Department of Mathematics and Statistics,

Old Dominion University, VA, USA

Email: ojege001@odu.edu

Department of Mathematical Sciences,

Georgia Southern University,

65 Georgia Ave. Room 3309

P.O. Box 8093, Statesboro, GA 30460, USA

Email: dwanduku@georgiasouthern.edu

Title: Stochastic Modelling of an SVEIRS Markov chain epidemic model with multiple discrete delay times and sensitivity analysis

Abstract:

This study presents a general discrete-time SEIRS epidemic Markov chain model with vaccination. The model incorporates finite delay times for disease incubation, the infectiousness of infected individuals, and natural and artificial immunity periods. The model represents the different states of the disease in the population using two discrete time decomposition measurements: the current time of a person’s state, and how long a person has been in the current state. Two sub-models are derived based on whether the drive to get vaccinated is inspired by close contacts with infectious individuals or otherwise. Some special epidemic models are studied, and sensitivity analysis is conducted on these models to determine how vaccination and infection affect disease eradication in a population.

Whatmore Sengweni and Broderick Oluyede

Department of Mathematics and Statistical Sciences,

Botswana International University of Science and Technology, Palapye, BW.

Title: The Type II Half Logistic-Kumaraswamy-G Family of Distributions with Applications

Abstract

In this talk, a new generalized distribution called the Type II Half Logistic-Kumaraswamy-G (TIIHL-Kum-G) family of distributions is proposed and studied. Some structural properties of the new distribution, including moments, conditional moments, probability weighted moments, the distribution of the order statistics and Rényi entropy, are derived. The maximum likelihood estimation technique is used to estimate the model parameters. A simulation study to examine the bias and mean square error of the maximum likelihood estimators is presented, and applications to real datasets are given to illustrate the usefulness of the model.

Keywords: Type II Half Logistic distribution, Kumaraswamy distribution

Lilian Giibwa¹, Nazarius M. Tumwesigye¹, Simon P. S. Kibira¹, John M. Ssenkusu¹

¹Makerere University School of Public Health, P.O Box 7072, Kampala, Uganda

Title: Risk prediction of schistosomiasis infection among preschool children in Uganda using random forests

Abstract

Introduction: Schistosomiasis is a major public health concern in many tropical and sub-tropical regions of the world, including Uganda. In Uganda, the national prevalence estimates of schistosomiasis among Pre-School Children (PSC) were 31% in 2016 and 41.9% in 2017, making them the most at-risk population. The development of risk prediction models for schistosomiasis in Uganda and similar settings has not been explored, yet existing models may not be applicable because exposures vary across settings.

Study objective: We aimed to develop a risk prediction model for schistosomiasis infection among PSC in Uganda using random forests (RFs).

Methods: National schistosomiasis prevalence survey data on PSC, collected in 2016 and 2017 by the Performance Monitoring and Accountability 2020 project of Makerere University School of Public Health (MakSPH) and Johns Hopkins School of Public Health, were used for this study. The data were analysed in R software, and the RF machine learning technique was employed to develop the risk prediction model. The out-of-bag (OOB) error rate, accuracy, sensitivity, specificity, precision, F measure and area under the receiver operating characteristic curve (AUROC) were computed on an evaluation set to assess model appropriateness.

Results: The developed RF model had a 37.3% OOB error rate, 63% accuracy, 30% sensitivity, 81% specificity, 46% precision, 36% F measure and an AUROC of 0.547.

Conclusion: The developed RF model did not perform well enough to predict the risk of schistosomiasis infection among PSC. However, the specificity of the model was much higher than its sensitivity, implying that it would work better as a diagnostic tool than as a screening tool. Further studies should be conducted to explore better-performing schistosomiasis risk prediction models, as such models may act as a basis for prediction in Uganda and similar settings.
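The reported metrics are all functions of the confusion matrix on the evaluation set. The short Python sketch below shows the standard definitions; as a consistency check, the F measure is the harmonic mean of precision and sensitivity, and 2(0.46)(0.30)/(0.46 + 0.30) ≈ 0.36, matching the reported 36%. The function names are illustrative.

```python
def f_measure(precision, sensitivity):
    """Harmonic mean of precision and sensitivity (recall)."""
    return 2 * precision * sensitivity / (precision + sensitivity)

def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of infected children predicted as infected
    specificity = tn / (tn + fp)   # share of uninfected children predicted as uninfected
    precision = tp / (tp + fp)     # share of positive predictions that are correct
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure(precision, sensitivity),
    }

# Consistency check with the reported precision (46%) and sensitivity (30%).
print(round(f_measure(0.46, 0.30), 2))  # 0.36
```

The formulas also make the conclusion above concrete: a high specificity with a low sensitivity means that few uninfected children are flagged, but most infected children are missed.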

Tsirizani M. Kaombe¹, Samuel O.M. Manda¹,²,³

¹Department of Mathematical Sciences, Chancellor College, University of Malawi, Zomba, Malawi

²Biostatistics Research Unit, South African Medical Research Council, Pretoria 0001, South Africa

³Department of Statistics, University of Pretoria, Pretoria 0002, South Africa

Title: Detecting influential data in multivariate survival models

Abstract

Statistical techniques for detecting influential data are well developed and commonly used in linear regression and, to some extent, in linear mixed-effects models. However, although multivariate survival models are widely applied, diagnostic tools for them remain scarce. In this paper, we extend the martingale-based residuals and leverage commonly used in univariate survival regression to derive influence statistics for the multivariate survival model. We illustrate the performance of the proposed influence statistics with simulations. We then apply the tools to clustered child survival data to identify influential clusters of observations and assess their impact on the estimates of the fixed-effect coefficients.

Keywords: Clustered data; Survival model; Regression coefficients; Group influence.
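Martingale residuals are the univariate building block that the abstract extends. The minimal Python sketch below computes them for a Cox proportional hazards model. It assumes the regression coefficients have already been estimated and uses the Breslow estimator of the cumulative baseline hazard. The cluster totals are only a crude illustration of grouping residuals by cluster; they are not the influence statistics proposed in the paper. All function names and data here are hypothetical.

```python
import math


def martingale_residuals(times, events, linear_predictors):
    """Cox-model martingale residuals M_i = delta_i - exp(eta_i) * Lambda0(t_i).

    times:             observed follow-up times
    events:            1 if the event occurred, 0 if censored (delta_i)
    linear_predictors: eta_i = x_i' beta, with beta assumed already estimated
    Lambda0 is the Breslow estimate of the cumulative baseline hazard.
    """
    risk = [math.exp(eta) for eta in linear_predictors]
    event_times = sorted({t for t, d in zip(times, events) if d == 1})
    # Breslow jump at event time s: (# events at s) / sum of exp(eta_j) over the risk set.
    jumps = []
    for s in event_times:
        d_s = sum(1 for t, d in zip(times, events) if d == 1 and t == s)
        at_risk = sum(r for t, r in zip(times, risk) if t >= s)
        jumps.append((s, d_s / at_risk))
    residuals = []
    for t_i, d_i, r_i in zip(times, events, risk):
        cum_hazard = sum(jump for s, jump in jumps if s <= t_i)
        residuals.append(d_i - r_i * cum_hazard)
    return residuals


def cluster_totals(residuals, clusters):
    """Hypothetical helper: sum of residuals within each cluster (crude group summary)."""
    totals = {}
    for r, c in zip(residuals, clusters):
        totals[c] = totals.get(c, 0.0) + r
    return totals


# Hypothetical data: six children in three clusters (e.g. families).
res = martingale_residuals(
    times=[2, 3, 3, 5, 8, 9],
    events=[1, 1, 0, 1, 0, 1],
    linear_predictors=[0.2, -0.1, 0.0, 0.5, -0.3, 0.1],
)
print(cluster_totals(res, clusters=["a", "a", "b", "b", "c", "c"]))
```

With the Breslow estimator, the martingale residuals sum to zero over the whole sample for any coefficient values, which makes a convenient sanity check.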

Thabiso Malomo*, Kago Kebotsamang†

Title: Modelling Mortality Rates in Botswana

In Botswana, life tables have not been updated for a long time. The latest life tables from Statistics Botswana, the official organisation for disseminating national statistics, are for 2011. These life tables may not reflect current mortality, which affects end users of life tables such as insurance companies and pension funds. There is therefore a need to update the life tables, and this study aims to estimate mortality rates and construct life tables for Botswana. We compare two models, the Heligman-Pollard (HP) and Lee-Carter (LC) models, to determine which is more suitable for modelling mortality rates in Botswana. The HP model parameters are estimated through a Bayesian melding approach with incremental mixture importance sampling. For the LC model, we use a specialised iterative regression approach based on a Poisson likelihood. Both models provided good fits. We found that life expectancy in Botswana ranges from 65 to 68 years for males and from 70 to 74 years for females. Comparing the two models, we found that they were very similar and yielded almost the same results; however, the HP model produced a smoother fit than the LC model. We also compared the estimated life tables with those of the World Health Organization (WHO) and concluded that they are significantly different: the WHO life tables estimate lower life expectancies than the newly generated life tables.
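In its standard eight-parameter form, the HP model specifies the odds of death at age x as the sum of a childhood term, an accident hump and a senescent term: q_x / (1 - q_x) = A^((x+B)^C) + D exp(-E (ln x - ln F)^2) + G H^x. The Python sketch below evaluates this form and converts the resulting death probabilities into life expectancy at birth. It uses a simple life table that assumes deaths occur, on average, half-way through each year of age, and it closes the table by setting the last death probability to 1. The parameter values are hypothetical placeholders, not the Botswana estimates, and the Bayesian melding estimation step is not shown.

```python
import math


def hp_qx(x, A, B, C, D, E, F, G, H):
    """Death probability q_x from the eight-parameter Heligman-Pollard form."""
    child = A ** ((x + B) ** C)                      # childhood mortality
    # Accident hump; its limit as x -> 0 is 0 when E > 0, so use 0 at age 0.
    hump = 0.0 if x == 0 else D * math.exp(-E * (math.log(x) - math.log(F)) ** 2)
    senescent = G * H ** x                           # old-age (Gompertz-type) term
    odds = child + hump + senescent
    return odds / (1.0 + odds)


def life_expectancy_at_birth(qx):
    """e_0 from single-year death probabilities, assuming mid-year deaths.

    The last entry of qx should be 1.0 so that the table closes.
    """
    survivors = 1.0       # l_x with radix 1
    person_years = 0.0    # running sum of L_x
    for q in qx:
        next_survivors = survivors * (1.0 - q)
        person_years += (survivors + next_survivors) / 2.0
        survivors = next_survivors
    return person_years   # equals e_0 because l_0 = 1


# Hypothetical parameter values for illustration only (NOT estimates for Botswana).
params = dict(A=0.0005, B=0.01, C=0.1, D=0.001, E=10.0, F=20.0, G=0.00005, H=1.1)
qx = [hp_qx(x, **params) for x in range(110)] + [1.0]
e0 = life_expectancy_at_birth(qx)
print(round(e0, 1))
```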

Keywords: Botswana life tables; Heligman-Pollard; Lee-Carter; Bayesian melding; Iterative estimation approach.


*Correspondence to: tmalomo@outlook.com

†Department of Statistics, University of Botswana, Private Bag 00705, Gaborone, Botswana. Email: kebotsamangk@ub.ac.bw

Getnet Melak Assegie¹, Stefano Bonnini²

¹University of Parma; ²University of Ferrara

Title: Nonparametric approach for missing data in hypothesis testing: a case study

BACKGROUND: Missing observations are common in many statistical data analyses. Missingness may arise from detection limits, dropout from the study, unwillingness to answer, treatment side effects, and so on. Consequently, carrying out a hypothesis test and obtaining reliable inferential output may be problematic. The typical remedy is to handle the missing data before testing. Deletion and imputation are the most common solutions, but they rely on the strong assumption that data are missing completely at random (MCAR). In practice, however, the missingness mechanism may be missing not at random (MNAR), and parametric tests usually give biased results under MNAR. We propose a nonparametric method for hypothesis testing with missing data. It is based on the permutation approach and handles missingness without deleting or imputing the missing values.

CASE STUDY: We considered the 2016 Ethiopia Demographic and Health Survey (EDHS 2016) dataset on children's nutrition in Ethiopia, which contains missing values. The case study compares the health status of children under 5 years old fed with breast milk only with that of children of the same age fed with breast milk and other food. Health status is represented by three variables: height-for-age percentile, height-for-age standard deviation, and weight-for-age standard deviation. We test the hypothesis that children fed with breast milk and food have better health status. Because some observations are not available, this is a multivariate test with missing data. A suitable solution for such a problem is the combined permutation test. This nonparametric methodology is robust, flexible, and powerful for multivariate tests in general. It is also a valid solution to the problem of missing data.

METHOD: To handle missingness, we define a binary variable that takes the value 1 when an observation is missing and 0 otherwise, and treat it as a fourth response variable. Following the combined permutation test approach (Bonnini et al., 2014), we break the multivariate problem down into four partial permutation tests. The global p-value of the multivariate test is then obtained through a suitable combination of the partial p-values. The permutation test follows the principle of conditioning on the pooled sample under the null hypothesis (Pesarin and Salmaso, 2010). Inference based on the permutation test is easy to interpret and distribution-free.
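The combination step can be illustrated with a small Python sketch of the nonparametric combination (NPC) of partial permutation tests. It is a simplified illustration under stated assumptions, not the authors' implementation. The partial statistic (difference in means over the observed values), the use of Fisher's combining function and the names `mean_diff` and `npc_test` are all assumptions. The missing-data statistics described by Pesarin and Salmaso (2010) are more refined. The key idea shown is that one permutation of the group labels is applied to all responses at once, so their dependence is preserved.

```python
import bisect
import math
import random


def mean_diff(values, labels):
    """Mean of group 1 minus mean of group 0, using observed (non-None) values only."""
    g1 = [v for v, g in zip(values, labels) if g == 1 and v is not None]
    g0 = [v for v, g in zip(values, labels) if g == 0 and v is not None]
    if not g1 or not g0:
        return 0.0  # no information for this labelling
    return sum(g1) / len(g1) - sum(g0) / len(g0)


def npc_test(responses, labels, directions, n_perm=999, seed=1):
    """Combined (NPC) permutation test with Fisher's combining function.

    responses:  K response vectors; None marks a missing value.
    labels:     0/1 group label per unit.
    directions: per response, +1 (group 1 larger), -1 (group 1 smaller), 0 (two-sided).
    Returns (global_p, partial_p).
    """
    rng = random.Random(seed)

    def partial_stats(lab):
        out = []
        for vals, d in zip(responses, directions):
            diff = mean_diff(vals, lab)
            out.append(abs(diff) if d == 0 else d * diff)
        return out

    # Row 0 is the observed labelling; each later row permutes the labels
    # jointly for all K responses.
    table = [partial_stats(labels)]
    perm = list(labels)
    for _ in range(n_perm):
        rng.shuffle(perm)
        table.append(partial_stats(perm))

    n_rows = len(table)
    k = len(responses)
    sorted_cols = [sorted(row[j] for row in table) for j in range(k)]

    def p_of(j, t):
        # Share of rows (observed one included) whose partial statistic is >= t.
        return (n_rows - bisect.bisect_left(sorted_cols[j], t)) / n_rows

    # Fisher combination -2 * sum(log p) applied to every row's partial p-values.
    combined = [-2.0 * sum(math.log(p_of(j, row[j])) for j in range(k)) for row in table]
    global_p = sum(c >= combined[0] for c in combined) / n_rows
    partial_p = [p_of(j, table[0][j]) for j in range(k)]
    return global_p, partial_p


# Hypothetical toy data (not EDHS): 10 units per group, one missing value in y1.
labels = [0] * 10 + [1] * 10
y1 = list(range(20))
y1[3] = None
y2 = [5, 3, 6, 2, 7, 4, 5, 6, 3, 4] * 2
missing = [1 if v is None else 0 for v in y1]  # missingness indicator as extra response
global_p, partial_p = npc_test([y1, y2, missing], labels, directions=[1, 1, 0])
```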

RESULTS: The global p-value is 0.0433, hence the null hypothesis of equal health status is rejected at the 5% significance level. To attribute the global significance to one or more partial tests, a closed testing method was applied to adjust the partial p-values and control the familywise error rate (FWER). The only adjusted p-value below 0.05 is that of height-for-age standard deviation (0.0432). Children who received food in addition to breast milk therefore had a better height-for-age standard deviation. The proposed method is preferable to parametric approaches that rely on deleting statistical units or imputing missing values.
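Closed testing can be carried out in several ways. The abstract does not say which closed-testing procedure was used. As an illustration only, the sketch below uses Holm's step-down procedure, which is the shortcut for closed testing when every intersection hypothesis is tested with a Bonferroni test. It should not be read as the authors' adjustment. The function name `holm_adjust` and the input p-values are hypothetical.

```python
def holm_adjust(pvalues):
    """Holm step-down adjusted p-values (closed testing with Bonferroni local tests)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply the rank-th smallest p-value by (m - rank), cap at 1, and keep
        # the sequence monotone so adjusted p-values never decrease with rank.
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[i]))
        adjusted[i] = running_max
    return adjusted


# Hypothetical partial p-values for four partial tests.
print(holm_adjust([0.010, 0.040, 0.030, 0.200]))
```

A partial hypothesis is rejected at FWER level 0.05 when its adjusted p-value is below 0.05.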

CONCLUSIONS: The proposed permutation test is exact, unbiased, and consistent (Pesarin, 2001). It is also well suited to missing-data analysis because it takes into account the dependence among the components of a multivariate response variable. Therefore, it is suitable for missing-data problems in which the missingness mechanism is MNAR.

KEYWORDS: hypothesis testing, missing data, nonparametric statistics, permutation test

REFERENCES

Bonnini, S., Corain, L., Marozzi, M., & Salmaso, L. (2014). Nonparametric hypothesis testing: rank and permutation methods with applications in R. John Wiley & Sons.

Pesarin, F. (2001). Multivariate permutation tests: with applications in biostatistics (Vol. 240). Chichester: John Wiley & Sons.

Pesarin, F., & Salmaso, L. (2010). Permutation tests for complex data: theory, applications and software. John Wiley & Sons.

Halima Twabi

Department of Mathematical Sciences, University of Malawi, Chancellor College, Zomba, Malawi

Samuel Manda

Biostatistics Research Unit, South African Medical Research Council, Pretoria, South Africa

Department of Statistics, University of Pretoria, Pretoria, South Africa. Email: Samuel.Manda@mrc.ac.za

Dylan Small

Department of Statistics, University of Pennsylvania, Philadelphia, United States of America. Email: dsmall@wharton.upenn.edu

Hans-Peter Kohler

Department of Sociology, University of Pennsylvania, Philadelphia, United States of America. Email: hpkohler@pop.upenn.edu

Title: A Comparison of Statistical Methods for Modelling Multiple Outcomes in Child Growth Studies

In both clinical trials and observational studies, multiple outcomes are often collected to measure treatment effectiveness or to investigate the association of the outcomes with other exposures of interest. For example, a child health study may investigate the association between appropriate complementary feeding and several measured child growth indicators. These include mid-upper-arm circumference, head circumference, height-for-age, weight-for-age and weight-for-height. A simple option is to consider each growth measure separately and analyze it independently of the others. However, child growth measures are likely to be correlated because they measure related aspects of a child's growth, and separate analyses ignore this correlation.

In this paper, we employ and contrast three multivariate methods, namely multivariate analysis of variance (MANOVA), shared random-effects models and pairwise joint models. We use them to analyze the association between appropriate complementary feeding and three child growth outcomes: height-for-age, weight-for-age and weight-for-height. The fit and performance of the approaches are compared using the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC).
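Both criteria trade goodness of fit against model complexity: AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L. Here L is the maximised likelihood, k the number of estimated parameters and n the number of observations, and the model with the smaller value is preferred. The Python sketch below computes both criteria and picks the preferred model under each. The log-likelihoods, parameter counts and sample size are hypothetical. They were chosen only to show that BIC's heavier penalty (ln n > 2 once n >= 8) can favour a more parsimonious model than AIC does.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k ln(n) - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def preferred_models(fits, n_obs):
    """fits maps model name -> (log_likelihood, n_params); returns (AIC choice, BIC choice)."""
    by_aic = min(fits, key=lambda name: aic(*fits[name]))
    by_bic = min(fits, key=lambda name: bic(fits[name][0], fits[name][1], n_obs))
    return by_aic, by_bic


# Hypothetical fits, all assumed to be on the same n = 800 children.
fits = {
    "MANOVA": (-1510.2, 12),
    "shared random effects": (-1495.8, 14),
    "pairwise joint": (-1485.0, 21),
}
print(preferred_models(fits, n_obs=800))
```

The comparison is meaningful only when all models are fitted to the same observations and their likelihoods are computed on the same scale.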

Key words: Multivariate outcomes; Multivariate statistics; Child nutrition and growth.
