Biological Growth Curve Models

Introduction

This material has been developed to cater to the requirement for participants in the Workshop on Growth curve Models in Population Dynamics Using R for Biologists organized by the Agricultural and Ecological Research Unit, Indian Statistical Institute, Kolkata, from 12- 13th February 2019. The module covers some fitting of growth equations to the real data and also solves the equations using packages from R related to differential equations.


Growth Curve Models

Scientific formalization of the notion of growth in living organisms is an age-old problem. For a long time, it has been a challenging issue for scientists to develop appropriate measures of growth or the rate of growth of living organisms. Such measures are receiving renewed importance in applied sciences, e.g., in zoology, botany, ecology, population dynamics, demography, cell dynamics, bacterial growth, finance, etc. The sigmoid functions, Gompertz, General Logistic, and General von Bertalanffy and their associated differential equations have applications to model self-limited population growth in diverse fields, e.g., sociology, fish growth, plant growth, and tumor growth. Particularly in the fisheries literature, much has been discussed on models using von Bertalanffy growth law, including criticisms, testing for parameter differences, bioenergetic applications, and re-parameterizations. There have also been theoretical approaches to defining a general framework to study growth models and a new family of sigmoid growth functions has been introduced, namely, Trans-General Logistic, Trans-General von Bertalanffy, and Trans-Gompertz. Growth curve models are increasingly used in several areas of interdisciplinary research. For example, growth models play an important role in modeling the density regulation in the abundance of natural populations. The growth models such as logistic, theta-logistic, Gompertz, etc. have potential applications in population dynamics that may be used for predictions and forecasting extinctions. The following references are very helpful:

1. Tsoularis, A. and Wallace, J. 2002. Analysis of Logistic growth Models. Mathematical Biosciences, 179, 21-55.

2. Seber, G. A. F. and Wild, C. J. 2003. Nonlinear Regression. John Wiley and Sons, Inc.

In this workshop, we will study some common growth curve models which have immense applications across disciplines. I will include R code in between the concepts of growth models. In this section, you will also learn how to solve differential equations using R.

  • Growth models are usually represented as a differential equation that describes the evolution of population over time (that is temporal evolution). Such differential equations describe the rate of change in population size. These growth models are usually developed for modeling the growth of the population under certain assumptions (usually specified by the domain experts to incorporate biological hypotheses).

  • Once the differential equation has been formed, it is of interest to obtain the size of the population at some time point t from the rate equations. Often, it is not possible to solve the differential equations analytically, in this case, numerical methods play a critical role. Let it be taken care of by the mathematicians.

  • Due to the increasing availability of open-source efficient software, we can solve them numerically using them. In this workshop, we shall use R as a tool to solve the differential equations describing the growth of the populations.

  • During the lectures of Dr. Sabyasachi Bhattacharya, we have seen various growth equations ranging from fundamental exponential growth to Allee growth equations. In this session, we will learn how we can use R to solve those differential equations in R.

I will use appropriate notes and comments whenever required. Here are the following growth models:

  • Exponential Model: Let us assume that a simple homogeneous population is growing under the assumption that, all the changes in the populations' size or density are regulated by births and deaths only. Let the per capita birth rate is b and the per capita death rate is d. Then, the relative growth rate is given by 1/X dX/dt = b - d. This gives an ordinary differential equation with an initial condition X(0) = X_0, say. Let r = b - d. r is known as intrinsic growth rate and plays a very important role in population dynamics. One can easily solve the equation to get the solution for X_t as a function of t, which is X_t = X_0 exp(rt). It easy to see that, for r>0, the population explodes as time increases, r<0, population dies out and for r=0, the population size remains constant. I take this occasion to introduce to solve this differential equation numerically. I keep the syntax standard as the same format can be used again and again for other models. Here is the code:

state = c(X=10) # Initial condition

parameters = c(r=0.3) # The parameter values

library(deSolve) # package for numerical solution of differential eqn.

times = seq(0, 10, by = 0.01)

expFun = function(t, state, param){

with(as.list(c(state,param)),{

dX = r*X;

return(list(c(dX)));

})

}

out = ode(y = state, times = times, func = expFun, parms = parameters)

head(out) # Check the output

curve(0.3*x^0, 0, 20, lwd=2, col=2, main= "PGR profile", xlab = "population size", ylab= "per capita growth rate")

curve(0.3*x, 0, 20, lwd=2, col=2, main= "AGR proifle", xlab="population size", ylab = "absolute growth rate")

plot(times, out[,"X"],main="Size profile", xlab = "time", type="l", ylab = "population size", col = 2, lwd=2)


  • Logistic Growth Model: Note that, in the exponential model, the growth is governed by constant per capita birth and death rates that lead to unlimited growth. In real life situations, this is unrealistic. In the natural environment, the populations are always limited by resources that lead to intra-specific competition when the population size becomes large. This concept leads to the density dependent growth regulation. Let us assume that, the per capita growth rate linearly decreases with population size and reaches zero at the carrying capacity, commonly denoted by K. This leads to the logistic equation given by 1/X dX/dt = r(1-X/K), as usual this ordinary autonomous differential equation is supported by the initial condition X(0) = X_0. The exact solution is given by X_t = K/[1+(K/X_0-1)exp(-rt)]. It can be easily seen that as t tends to infinity, the population size stabilizes to carrying capacity K.

logisticFun = function(t, state, param){

with(as.list(c(state,param)),{

dX = r*X*(1-X/K);

return(list(c(dX)));

})

}

state = c(X = 20)

parameters = c(r=0.5, K = 50)

library(deSolve)

times = seq(0, 30, by = 0.01)

out = ode(y = state, times = times, func = logisticFun, parms = parameters)

head(out)

plot(out, ylim=c(0, 100), main = "Solution of Logistic Model", xlab = "time", ylab = "population size", col = 2, lwd = 2)

In the following code, we generate multiple solutions of the logistic differential equation for different initial conditions. Run the code and note that all the solutions are converging to carrying capacity K (=50, in the program). Let us first plot the Relative growth rate (RGR) prfile and see these shapes are matching with slides shown by Dr. Bhattacharya.

r = 0.8; K = 50

curve(r*(1-x/K), 0, 60, lwd=2, col=2, main= "PGR profile", xlab = "population size", ylab= "per capita growth rate")

curve(r*x*(1-x/K), 0, 60, lwd=2, col=2, main= "AGR proifle", xlab="population size", ylab = "absolute growth rate")

One of the very important concepts in growth curve analysis is the stability of the equilibrium point. By an equilibrium point, we understand the size where there is no growth (Mathematically. dX_t/dt = 0). For example, in the logistic growth equation, there are two equilibrium points, one is zero and the other is K. There is a technique, called linear stability analysis, which can be utilized to check for the stability of the equilibrium points.

Recall in the discussion where we have seen that in a logistic model the population always converges to the carrying capacity of the environment. Let us try to mimic this in terms of a computer simulation. The idea is to start with various different initial conditions and plot the solution curves and see where all the solutions converge. In the following figure, we will see that all the solutions converge to the same carrying capacity.

init.pop = c(1,10,20,30,40,50,60,70,80,90)

logisticFun = function(t, state, param){

with(as.list(c(state,param)),{

dX = r*X*(1-X/K);

return(list(c(dX)));

})

}

library(deSolve)

parameters = c(r=0.5, K = 50)

times = seq(0, 30, by = 0.01)

for(i in 1:length(init.pop)){

state = c(X = init.pop[i])

out = ode(y = state, times = times, func = logisticFun, parms = parameters)

head(out)

if(i==1)

plot(out, ylim=c(0, 100), main = "Solution of Logistic Model", xlab = "time", ylab = "population size", col = i, lwd = 2)

else

lines(out, col = i, lwd = 2)

}

abline(h = 50, lwd = 2, col = 115)


  • Gompertz Model: The gompertz describes the dynamics of a population that grows with an intrinsic rate of growth that decays exponentially. The equation is commonly written in the non-autonomous differential equation given by dX/dt = r*exp(-\alpha*t)X. The equation can also be written as autonomous differential equation as dX/dt = alpha*X*ln(K/X), where K is the carrying capacity. Here is the code:

state = c(X = 1)

parameters = c(alpha=0.5, K = 50)

library(deSolve)

times = seq(0, 30, by = 0.01)

GompertzFun = function(t, state, param){

with(as.list(c(state,param)),{

dX = alpha*X*log(K/X);

return(list(c(dX)));

})

}

out = ode(y = state, times = times, func = GompertzFun, parms = parameters)

plot(out, xlab = "time", ylab = "population size", col = 1, lwd = 2, main="Solution of Gompertz model")

# Code for multiple solutions for different initial popualtion size

init.pop = c(1,10,20,30,40,50,60,70,80,90)

parameters = c(alpha=0.5, K = 50)

library(deSolve)

times = seq(0, 30, by = 0.01)

for(i in 1:length(init.pop)){

state = c(X = init.pop[i])

out = ode(y = state, times = times, func = gompertzFun, parms = parameters)

head(out)

if(i==1)

plot(out, ylim=c(0, 100), main = "Solution of Gompertz Model", xlab = "time", ylab = "population size", col = i, lwd = 2)

else

lines(out, col = i, lwd = 2)

}

abline(h = 50, lwd = 2, col = 115)

# Plot of per capita growth profile

alpha = 0.5; K = 50;

curve(alpha * log(K/x), 0, 55, lwd=2, col=2, main= "PGR profile", xlab = "population size", ylab= "per capita growth rate")

curve(alpha * x * log(K/x), 0, 55, lwd=2, col=2, main= "AGR profile", xlab = "population size", ylab= "absolute growth rate")

  • The evolution of Gompertz equation came from a very important biological assumption. Under the exponential growth law, the intrinsic growth rate of the population (r) is assumed to be constant over time. However, it might happen that the intrinsic growth rate is decaying over time at an exponential rate (r(t) = r_0 exp(-alpha*t)). r_0 is a constant and alpha is the decaying rate.

  • A critical role of growth curve models: For example, the gompertz growth curve model sand the logistic growth curve models both look similar, however, they have a very different intrinsic properties. For example, given any initial population size, logistic growth equation will give the population to converge at carrying capacity only. However, if we change the initial population size, the population may converge to a different capacity.

  • So, starting with multiple models and statistical selection of the best model may not be always give rise to a biologically correct model. It is a tendency for many statisticians to compare models and give the best model to the biologists. In such a scenario, the statistical model selection through ISRP (Interval specific rate parameter) would give a higher chance of correct model identification (Bhowmick et al. 2014, Journal of Biological Physics). In fact, the probability of correct identification of model is better than the Fisher growth model (Pal et al., 2018, Journal of Theoretical Biology).


  • Theta-Logistic Equation: The θ-logistic model (Gilpin and Ayala, 1973; Gilpin et al., 1976) defines a general class of models of density regulation as the function of only a single parameter θ that regulates the density dependent mechanism. In this model framework, the population is assumed to have maximum fitness when they are small in numbers. As population size increases per capita growth rate decreases monotonically, due to intra-specific competition being main regulatory mechanism (unlike Allee model, see below). The θ-logistic model is given by: dX_t/dt = rX_t(1-(X_t/K)^theta)

theta = c(0.1, 0.5, 1, 1.2, 1.5)

thetalogisticFun = function(t, state, param){

with(as.list(c(state,param)),{

dX = r*X*(1-(X/K)^theta);

return(list(c(dX)));

})

}

library(deSolve)

state = c(X = 1)

times = seq(0, 50, by = 0.1)

for(i in 1:length(theta)){

parameters = c(r=0.5, K = 80, theta = theta[i])

out = ode(y = state, times = times, func = thetalogisticFun, parms = parameters)

if(i==1)

plot(out, ylim=c(0, 100), xlab = "time", ylab = "population size", col = i, lwd = 2, main="Solution of logistic equation")

else

lines(out, col = i, lwd = 2)

}

# Per-capita growth profile for different values of theta

r = 0.8; K = 50

theta = c(0.1, 0.5, 1, 1.2, 1.5, 2, 5)

X=seq(0, 50, by = 1)

PGR=matrix(NA, nrow =length(theta), ncol = length(X))

AGR=matrix(NA, nrow =length(theta), ncol = length(X))

for(i in 1:length(theta)){

PGR[i,] = r*(1-(X/K)^(theta[i]))

}

for(i in 1:length(theta)){

AGR[i,] = r*X*(1-(X/K)^(theta[i]))

}

matplot(X, t(PGR), xlim=c(0,55), ylim=c(0,1), type="l", lwd = 2, ylab="per capita growth rate", xlab="population size", main = "PGR profile")

matplot(X, t(AGR), xlim=c(0,55), type="l", lwd = 2, ylab="Absolute capita growth rate", xlab="population size", main = "AGR profile")

Few important notes from the Clark et al. (2010)

  • Note 1: Concave r–N curves are typical of so-called ‘r-selected’ organisms where density dependence acts strongly at lower densities, whereas convex curves arise from ‘K-selected’ species where density dependence acts to reduce growth only at higher densities (although these terms have fallen out of favour).

  • Note 2: The growth response is parameterized jointly by r_m and theta. For a population growing from low abundance, the parameter h reflects those aspects of a species’ evolved life history (demographic rates) that determine how abruptly growth slows as abundance interacts with resource availability (Freckleton et al. 2003; Sibly et al. 2005) and type of competition (Johst, Berryman & Lima 2008).

  • Note 3: When concave (theta < 1), the growth response (i.e. the ‘r–N curve’) ideally characterizes a population unable to recover quickly from extrinsic perturbations, whereas a convex growth response (theta > 1) implies that density feedback occurs mainly above some (relatively large) threshold abundance (Fowler 1981; Owen-Smith 2006), typical, for instance, of the population dynamics in large mammals (McCullough 1999; Owen-Smith 2006).


  • Allee Growth Equations: The Allee effect, named after ecologist W. C. Allee, corresponds to density-mediated drop in population fitness when they are small in numbers (Allee, 1931; Dennis, 1989; Fowler and Baker, 1991; Stephens and Sutherland, 1999). The harmful effects of inbreeding depression, mate limitation, predator satiation etc. reduce fitness as the population size decreases. For such dynamics, the maximum fitness is achieved by the species at an intermediate population size, unlike logistic or theta-logistic growth models. Such observations usually correspond to the mechanisms giving rise to an Allee effect. In recent decades, due to an increasing number of threatened and endangered species, the Allee effect has received much attention from conservation biologists. The related theoretical consequences and the empirical evidence have made the Allee effect an important component in both theoretical and applied ecology. In general, there are two types of Allee effects are considered in the natural populations across a variety of taxonomic groups, viz. component and demographic Allee effect. The component Allee effect modifies some component of individual fitness with the changes in population sizes or density. If the per capita growth rate is low at small density but remains positive is called the weak demographic Allee effect. The strong Allee effect is characterized by a threshold density below which the per capita growth rate is negative that leads to extinction deterministically. The critical density is called 120 the Allee threshold and has significant importance in conservation biology (Hackney and McGraw, 2001), population management (Myers et al., 1995; Liermann and Hilborn, 1997) and invasion control (Johnson et al., 2006). There are a large number of studies available in the literature on the mathematical modeling of the demographic Allee effect (see Table 3.1 Courchamp et al. (2008)).

A = 30; # Threshold population size

K= 80; # Carrying capacity of the environment

init.pop = c(10, 20, 40, 50, 60, 100) # Differential initial population size

AlleeFun = function(t, state, param){

with(as.list(c(state,param)),{

dX = r*X*(X/A - 1)*(1 - X/K);

return(list(c(dX)));

})

}

parameters = c(r=0.5 , K = 80, A = 30)

times = seq(0, 10, by = 0.01)

for(i in 1:length(init.pop)){

state = c(X = init.pop[i])

out = ode(y = state, times = times, func = AlleeFun, parms = parameters)

if(i==1)

plot(out, main = "Solution of Allee growth equation", ylim=c(0, 100), xlab = "time", ylab = "population size", col = i, lwd = 2)

else

lines(out, col = i, lwd = 2)

}

abline(h = A, col = 6, lwd=2, lty=2)

abline(h = K, col = 8, lwd=2, lty=3)

text(0, 83, "K")

text(0, 33, "A")

#Plotting of per capita growth profile

r=0.8; A=30; K=80

curve(r*(x/A - 1)*(1 - x/K), main ="PGR profile", 15, 90, col = 2, lwd = 2, xlab ="Population size", ylab = "per capita growth rate")

abline(h=0, lwd =2)

text(30, -.1, "A")

text(80, -.1, "K")

curve(r*x*(x/A - 1)*(1 - x/K), main ="AGR profile", 0, 90, col = 2, lwd = 2, xlab ="Population size", ylab = "absolute growth rate")

abline(h=0, lwd =2)

text(30, -4, "A")

text(80, -4, "K")

Exercise-1. Read about the packages alr3 and alr4. This will be very helpful while discussing nonlinear regression.

Exercise-2. Solve the exponential, logistic, theta logistic, gompertz and Allee model analytically and sketch the solution curve in R without using the function ode from deSolve package.

Nonlinear Regression

We have already discussed some aspects of linear regression using R particularly the use of the function lm in R. This function is particularly useful for the linear models or nonlinear models which can be converted to a linear model. However, there are many models that can not be written as a linear function of the parameters. In such a case, nonlinear regression modeling is required. In R the nls function takes care of the estimation of parameters for nonlinear function. Since we have already learned about the different growth curve models, I shall use these models as nonlinear function whose parameters are to be estimated from given data. To do this, we shall write functions for all the models.

The idea is that, for each of the growth model X_t = f(t), we shall simulate the data at different time points. Then we fit the model using the simulated data and obtain the parameter estimates using nls function. We shall compare the true curve and the estimated curve by means of graphs. Note that, we have already solved the growth equations in Exercise-16 to obtain X_t as a function of t. Run the following codes, the idea will be clear.

Note: Before going into the real data we shall study a little bit about the statistical simulation. The idea is very simple. The idea is to simulate data artificially. Every time, we may not have the data to fit a model. However, R will allow us to incorporate the randomness in the observations by artificially simulating random numbers.

Exponential Model

x0 = 5; r=0.6;

t = seq(0, 5, by = 0.1)

x = x0*exp(r*t) + rnorm(length(t), 0, 9)

data = data.frame(t, x)

plot(t, x, type = "p", xlab = "time", ylab = "Size")

legend('topleft', legend = "Exponential Model", bty="n")

curve(x0*exp(r*x), col = 4, lwd= 2, add = TRUE)

ExpModel = function(t, x0, r){

x0*exp(r*t)

}

fit = nls(x ~ ExpModel(t, x0, r), data = data, start = list(x0 = 6, r = 0.7))

summary(fit)

lines(t, fitted.values(fit), col = 2, lwd = 2)

legend("bottomright", legend = c("Estimated", "True"), col = c(2, 4), lwd = c(2,2), bty = "n")

Logistic Model

LogisticModel = function(t, x0, r, K){

K/(1+(K/x0 - 1)*exp(-r*t))

}

x0 = 5; r = 0.8; K = 20;

t = seq(0, 15, by = 0.3)

x = LogisticModel(t, x0 = 5, r = 0.8, K = 20) + rnorm(length(t), 0, 2)

data = data.frame(t, x)

plot(t, x, type = "p", xlab = "time", ylab = "Size")

legend('topleft', legend="Logistic Model", bty="n")

curve(K/(1+(K/x0 - 1)*exp(-r*x)), lwd = 2, col = 2, add = T)

fit = nls(x ~ LogisticModel(t, x0, r, K), data = data, start = list(x0 = 6, r = 1, K = 20) )

summary(fit)

lines(t, fitted.values(fit), col = 3, lwd = 3)

legend("bottomright", legend = c("Estimated", "True"), col = c(3, 2), lwd = c(2,2), bty = "n")

library(car)

deltaMethod(fit,"1/r*log(K/x0-1)") #Estimate and variance of function of parameters using #delta method

theta-Logistic Model

ThetaLogisticModel = function(t, r, x0, K, theta){

K/(1+exp(-theta*r*t)*((K/x0)^(theta)-1))^(1/theta)

}

x0 = 5; r = 0.4; K = 20; theta = 0.7;

t = seq(0, 15, by = 0.3)

x = ThetaLogisticModel(t, r = 0.4, x0 = 5, K = 20, theta = 0.7) + rnorm(length(t), 0, 1)

data = data.frame(t, x)

plot(t, x, type = "p", xlab = "time", ylab = "Size")

legend('topleft', legend="theta-Logistic Model", bty="n")

curve(ThetaLogisticModel(x, r, x0, K, theta ), lwd = 2, col = 3, add = TRUE)

fit = nls(x ~ ThetaLogisticModel(t, r, x0, K, theta), data = data,

start = list(x0 = 5, r = 0.4, K = 20, theta = 0.7))

lines(t, fitted.values(fit), col = 2, lwd = 2)

legend("bottomright", legend = c("Estimated", "True"), col = c(2, 3), lwd = c(2,2), bty = "n")

Gompertz Model

GompertzModel = function(t, x0, r, alpha){

x0*exp(r*(1-exp(-alpha*t))/alpha)

}

x0 = 5; alpha = 0.4; r = 0.5

t = seq(0, 15, by = 0.3)

x = GompertzModel(t, r = 0.5, x0 = 5, alpha = 0.4) + rnorm(length(t), 0, 1)

data = data.frame(t, x)

plot(t, x, type = "p", xlab = "time", ylab = "Size")

legend('topleft', legend = "Gompertz Model", bty = "n")

curve((x0*exp(r*(1-exp(-alpha*x))/alpha)), lwd = 2, col = 3, add = T)

fit = nls(x ~ GompertzModel(t, x0, r, alpha), data = data, start = list(x0 = 5, r = 0.5, alpha = 0.4))

lines(t, fitted.values(fit), col = 2, lwd = 2)

legend("bottomright", legend = c("Estimated", "True"), col = c(2, 3), lwd = c(2,2), bty = "n")

Bootstrapping nonlinear regression model

Note that we have already computed the bootstrap estimates of the parameters of the linear regression mdoel. The same thing can be done for nonlinear regression model also. First, we will show by explicitly writing codes and then we shall see the use of special packages in R to obtain the bootstrap distribution of the parameters of nonlinear models. We take the logistic model as a case study. It can be easily modified for other models.

LogisticModel = function(t, x0, r, K){

K/(1+(K/x0 - 1)*exp(-r*t))

}

control = nls.control(maxiter = 500)

x0 = 5; K = 20; r = 0.8

t = seq(0, 15, by = 0.3)

x = LogisticModel(t, x0 = 5, r = 0.8, K = 20) + rnorm(length(t), 0, 2)

data = data.frame(t, x)

B = 500

boot.x0 = numeric(B); boot.r = numeric(B); boot.K = numeric(B)

for(i in 1:B){

brow = sample(1:nrow(data), replace = T)

newdata = data[brow,]

fit = nls(x~LogisticModel(t, x0, r, K), control=control,

data = newdata, start = list(x0 = 5, K = 20, r = 0.8))

boot.x0[i] = coef(fit)[1]

boot.K[i] = coef(fit)[2]

boot.r[i] = coef(fit)[3]

}

par(mfrow=c(3,3))

hist(boot.x0, probability = T, col = "grey", main = "histogram of x0")

plot(boot.x0, boot.K, type = "p", col = "red")

plot(boot.x0, boot.r, type = "p", col = "red")

plot(boot.K, boot.x0, type = "p", col = "red")

hist(boot.K, probability = T, col="grey", main = "histogram of K")

plot(boot.K, boot.r, type = "p", col="red")

plot(boot.r, boot.x0, type = "p", col="red")

plot(boot.r, boot.K, type = "p", col="red")

hist(boot.r, probability = T, col="grey", main = "histogram of r")

Note: nls.control() can be used to control the estimation algorithm used by nls function. Type nls.control() in the console and check.

Note: Available methods for extracting various quantities from and nls() model fit.

  • anova Comparison of several nested model fits

  • coef Parameter estimates from model fit

  • confint Profile likelihood based confidence intervals from MASS package

  • fitted The prediction based on the fitted model on the used data set

  • residuals Residuals from the model fit

  • summary Summary of the output

  • vcov Variance-Covariance matrix of the parameter estimates.


  • Real Data Analysis

Data were collected on length of fish, Cirrhinus mrigala, at 12 consecutive time points for each of the four equi-spaced directions to be referred to as A, B, C and D, emanating at 45◦ from the center of the lake to its four corners. At each time point, 12 measurements were available. Fishes were combined in the hoop nets placed at an equal radial distance from the center in each of these directions. The details of the data set can be found in Bhowmick and Bhattacharya, 2014 (Mathematical Biosciences, Elsevier Publications). Click the link for the data:



setwd("C:/STUDY/Supervisor/Growth Project/R Programs") # setting of working directory


#LOCATION A

data.A = read.table("LocationA.txt", header = FALSE, sep = "\t", na.strings = "NA")

time = 1:12

class(data.A)

dim(data.A)

matplot(time, t(data.A), type = "p", col = 1, ylab = "Observed length (cm)", xlab = "Time (week)", main = "Location A")

x = colMeans(data.A)

t = 1:12

data = data.frame(t, x)

  • Fitting of exponential model to the real data

fit = nls(x ~ ExpModel(t, x0, r), data = data, start = list(x0 = x[1], r = 0.7))

summary(fit)

lines(t, fitted.values(fit), col = "orange", lwd = 2)

resid(fit)

mse = mean(resid(fit)^2)

  • Fitting of logistic model to the real data

fit1 = nls(x ~ LogisticModel(t, x0, r, K), data = data, start = list(x0 = x[1], r = 0.7, K = x[12]) )

summary(fit1)

lines(t, fitted.values(fit1), col = "red", lwd = 2)

resid(fit1)

mse1 = mean(resid(fit1)^2)

mse1

  • Fitting of theta logistic model to the real data

fit2 = nls(x ~ ThetaLogisticModel(t, r, x0, K, theta), data = data, start = list(x0 = x[1], r = 0.4, K = x[12], theta = 1))

summary(fit2)

lines(t, fitted.values(fit2), col = "blue", lwd = 2)

resid(fit2)

mse2 = mean(resid(fit2)^2)

  • Fitting of the gompertz model to the real data

fit3 = nls(x~GompertzModel(t, x0, r, alpha), data = data, start = list(x0 = x[1], r = 0.5, alpha = 0.4))

lines(t, fitted.values(fit3), col = "green", lwd = 2)

resid(fit3)

mse3 = mean(resid(fit3)^2)

legend("topleft", legend = c("Exponential", "Logistic", "Theta-logistic", "Gompertz"), col = c("orange","red", "blue", "green"), lwd = c(2,2,2,2), bty = "n")

  • Checking the accuracy measures for all the model fitting results

mse = mean(resid(fit)^2)

mse1 = mean(resid(fit1)^2)

mse2 = mean(resid(fit2)^2)

mse3 = mean(resid(fit3)^2)

c(mse, mse1, mse2,mse3)


mse # for exponential

mse1 # for logistic

mse2 # for theta-logistic

mse3 # for gompertz model

  • If MSE is considered as the measure of goodness of fit then theta-logistic model is the best model describing the data. We can also use AIC for selecting the best model.

aic = AIC(fit)

aic1 = AIC(fit1)

aic2 = AIC(fit2)

aic3 = AIC(fit3)

c(aic, aic1, aic2, aic3)


AIC(fit) # for exponential

AIC(fit1) # for logistic

AIC(fit2) # for theta-logistic

AIC(fit3) # for gompertz model

  • A model with a lower AIC is preferred over the model with a higher AIC. If AIC is considered as the measure of goodness of fit then the theta-logistic model is the best model describing the data.

  • You may have to take caution and check the definition of AIC. The maximum value of AIC may also give an indication of better model or the minimum value of AIC. It depends on the way how the AIC computation has been implemented in the software.

  • Another cautionary tale for AIC is that if all the candidates in the set of models are poor, still AIC returns the best model. (A critical discussion can be found in the Burnham and Anderson, 2002) (We may discuss in the class as well)

summary(fit2) #Summary of the best model.


  • So, in the above, we have fitted the size data with respect to time. In the next section, we will fit the RGR data over time. As discussed by Dr. Bhattacharya, RGR plays a critical role in biological growth curve analysis. One obvious statistical advantage is that, growth model expressed in terms of RGR contains one lesser parameter than the size based model.

  • Now in the following we shall fit the growth models in which relative growth rate has been expressed as a function of time. A natural question is that why to study the behavior of RGR as a function of time. In the context of the fish data analysis, scientists are interested to know during which week, the fish reach the growth rate of the fish was maximum.

  • The program fits the model R(t) = f(t), where R(t) is the relative growth rate at time point t. Most of the results appeared in the Bhowmick and Bhattacharya, 2014 (Mathematical Biosciences)

  • The section will directly connect the first session taken by Dr. Bhattachary where he stated that "Identification of growth profile is easier using the RGR profile than the size profile (11:21 AM, 12th February, 2019)


library(car) # This package may no longer be available.

options(digits=3)


#This function generates the summary of the output of nls run

get.summary <- function(fit){

params <- coef(fit); fitted <- fitted(fit); resid <- resid(fit)

std.err <- summary(fit)[["coefficients"]][,"Std. Error"]

p.val <- summary(fit)[["coefficients"]][,"Pr(>|t|)"]

mss <- sum((fitted - mean(fitted))^2); rss <- sum(resid^2)

R2 <- mss/(mss + rss); rmse <- sqrt(rss); AIC <- AIC(fit)

summary <- c(as.double(params[1]), as.double(p.val[1]), as.double(std.err[1]), as.double(params[2]), as.double(p.val[2]), as.double(std.err[2]), as.double(R2), as.double(AIC), as.double(rmse))

summary <- signif(summary, digits=4)

}

  • Loading of the RGR data over time

setwd("C:/STUDY/Supervisor/Growth Project/Latex_Growth_2/Figure")

Dataset = read.table("LocationA-RGR.txt", head=T)

tmp <- t(Dataset)

par(mfrow=c(1,1))

plot(1:11,tmp[,1], xlab = expression(bold(Time(weeks))), ylab = expression(bold(RGR)), xlim=c(1,12), ylim=c(0,.3))

for(i in 2:11) {

points(tmp[,i])

}

x0= mean(tmp[,1]); mean <-rep(NA,11); t <- 1:11; for(i in 1:11){mean[i] = mean(tmp[i,])}

  • The following section will contain the fitting of the models using the nonlinear regression techniques

nls.out.c.2 <- nls(mean~ b*exp(-a*t)*t^2, start = list(b=.0898, a=0.55))

lines(predict(nls.out.c.2), lwd=2, col= 2, lty=1)

nls.out.c.1 <- nls(mean~ b*exp(-a*t)*t^1, start = list(b=.14, a=0.35))

lines(predict(nls.out.c.1), lwd=2, col= 3, lty=1)


nls.out.exp <- nls(mean~ b, start = list(b=.14))

lines(predict(nls.out.exp), lwd=2, col= 4, lty=1)


nls.out.gom <- nls(mean~ b*exp(-a*t), start = list(b=.14, a =.55))

lines(predict(nls.out.gom), lwd=2, col= 5, lty=1)


nls.out.pow <- nls(mean~ b* log(1 + a/(1+a*t)), start = list(b=1.55, a =.0904))

lines(predict(nls.out.pow), lwd=2, col= 6, lty=1)


nls.out.log <- nls(mean~ b* exp(-b*t)*(a-x0)/(x0+exp(-b*t)*(a-x0)), start = list(b=.48, a =.4))

lines(predict(nls.out.log), lwd=2, col= 7, lty=1)


legend("topright", c("c=2", "c=1", "gompertz", "power", "logistic"), col=c(2,3,5,6,7),lty=c(1,1,1,1,1), lwd=c(2,2,2,2,2), bty="n")

  • Summarizing the output we have just fitted.

sum.nls.out.c.2 <- get.summary(nls.out.c.2 )

sum.nls.out.c.1 <- get.summary(nls.out.c.1 )

sum.nls.out.exp <- get.summary(nls.out.exp )

sum.nls.out.gom <- get.summary(nls.out.gom )

sum.nls.out.pow <- get.summary(nls.out.pow )

sum.nls.out.log <- get.summary(nls.out.log )

  • Combined summary of the results. A good way of generating output for mathematicians.

final.sum.header <- c("b", "Pr(>|t|)", "Std.Err(b)", "a", "Pr(>|t|)", "Std.Err(a)", "R-square", "AIC", "RMSE")

final.sum.location <- rbind(final.sum.header, sum.nls.out.c.2, sum.nls.out.c.1, sum.nls.out.exp, sum.nls.out.gom, sum.nls.out.pow, sum.nls.out.log)

View(final.sum.location)

library(xtable)

xtable(final.sum.location)

  • In the above we have fitted RGR model for the location A. In this case as well, we can select the best model using RGR profiles.


Model Selection using ISRP

    • we will fit here the data on the mean of the size measurements at each time point with respect to time.

setwd("C:/STUDY/Supervisor/Growth Project/R Programs")

Dataset <- read.table("LocationA.txt", header=FALSE, sep="\t", na.strings="NA", dec=".", strip.white=TRUE)

library(relimp, pos=4)

matplot(t(Dataset), type = "p", pch = "o", ylab = "size (in cm)", xlab = "time")


t <- seq(1, 12, 1)

t2 <- seq(1, 11, 1)

size_vals = as.vector(colMeans(Dataset)) #Location A


###### GOMPERTZ MODEL FIT ########

windows.options(width=10, height=10)

gomest <- nls(size_vals ~ x0*exp((b/c)*(1-exp(-c*t))), start = list(x0=1, b=0.32, c=0.2454298))

lines(t, fitted(gomest), lty = 1, col = "red", lwd=2)


###### LOGISTIC MODEL FIT ########

logest <- nls(size_vals ~ x0/(1+exp(b*(c-t))), start = list(x0=10, b=.34, c=1))

lines(t, fitted(logest), lty = 2, col = "green", lwd=2)


###### THETA-LOGISTIC MODEL FIT ####

log_theta_est <- nls(size_vals ~ a*(1 + 3.64 * exp((-b*t)/d))^(-d), start = list(a=10, b =.34, d=1))

lines(t, fitted(log_theta_est), lty = 2, col = "green", lwd=2)


###### EXPONENTIAL MODEL FIT ######

expest <- nls(size_vals~x0*exp(b*t), start = list(x0=1, b =1))

lines(t, fitted(expest), lty = 6, col = "blue", lwd=2)

legend("topleft", legend = c("Gompertz", "Logistic", "Exponential"), lty = c(1,2,6), col = c("red","green","blue"), lwd = c(3.5,3.5,3.5), bty = "n")


####### Fit of the extended growth models #######

matplot(t(Dataset), type = "p", pch = "o", ylab = "size (in cm)", xlab = "time")

#Fit of Proposed Models 1

Prop.Model1 <- nls(size_vals ~ d1*exp((b1/a1)*(1/a1-(exp(-a1*t))*(t+1/a1))), start = list(d1=5, b1=0.18, a1=0.48))

lines(t, fitted(Prop.Model1), lty = 1, col = "red", lwd=2)


#Fit of Proposed Models 2

Prop.Model2 <- nls(size_vals ~ d1*exp((b1/a1)*(1/a1-(exp(-a1*t))*(t^2+2*t/a1+2/a1^2))), start = list(d1=10, b1=0.15, a1=0.66))

lines(t, fitted(Prop.Model2), lty = 6, col = "blue", lwd=2)

legend("topleft", legend = c("Proposed Model 1", "Proposed Model 2"), lty = c(1,2), col = c("red","blue"), lwd = c(3.5,3.5), bty = "n")

In the above output (figures) we can see that almost all the models are going very close to training data and according to the existing model selection methods is very difficult to assess. For example, if the fitting of two models gives the AIC as 63.5 and 64.1. Both the model seem to fit very well. This fitting is always based on a sample data, hence this decision is subject to variation across samples. ISRP is a useful method to give the correct identification of the model. For the above fitted models let us compute ISRP over different time points for each of the candidates.

gomest <- nls(size_vals ~ x0*exp((b/c)*(1-exp(-c*t))), start = list(x0=1, b=0.32, c=0.2454298))

gom.isrp <- rep(NA, length(t)-1);

rate.param = as.numeric(coef(gomest)[2]);

C = as.numeric(coef(gomest)[3]);

gom.isrp <- (C*exp(C*t[2:12])/(exp(C)-1))*log(size_vals[2:12]/size_vals[1:11])

plot(1:11, gom.isrp, lty = 1, type = "o", col = 1, lwd = 2, ylim = c(0,0.55), xlab="Time Interval", ylab="ISRP")

abline(h = rate.param, lty = 1, col = 1, lwd = 2)


logcest <- nls(size_vals ~ x0/(1+exp(b*(c-t))), start = list(x0=10, b=.34, c=1))

log.isrp <- rep(NA, length(t)-1); A = as.numeric(coef(logcest)[1]);

rate.param = as.numeric(coef(logcest)[2]); C = as.numeric(coef(logcest)[3]);

log.isrp <- log(((A/size_vals[1:11]) - 1)/((A/size_vals[2:12]) - 1))

points(1:11, log.isrp, lty = 2, type = "o", col = 2, lwd=2, ylim=c(0,0.5))

abline(h = rate.param, lty = 2, col = 2, lwd = 2)


Prop.Model1 <- nls(size_vals ~ x0*exp((b/c)*(1/c-(exp(-c*t))*(t+1/c))), start = list(x0=5, b=0.2, c=.2))

Prop.Model1.isrp <- rep(NA, length(t)-1); A = as.numeric(coef(Prop.Model1)[1]);

rate.param = as.numeric(coef(Prop.Model1)[2]); C = as.numeric(coef(Prop.Model1)[3]);

Prop.Model1.isrp <- ((C*exp(C*t[1:11]) / (t[1:11]+(1/C)-exp(-C)*(t[1:11]+1+(1/C))))* log(size_vals[2:12]/size_vals[1:11]))

points(1:11, Prop.Model1.isrp, lty = 3, type = "o", col = 3, lwd=2, ylim=c(0,0.5))

abline(h = rate.param, lty = 3, col = 3, lwd = 2)


Prop.Model2 <- nls(size_vals ~ x0*exp((b/c)*(1/c-(exp(-c*t))*(t^2+2*t/c+2/c^2))), start = list(x0=7, b=0.08, c=0.5))

Prop.Model2.isrp <- rep(NA, length(t)-1); A = as.numeric(coef(Prop.Model2)[1]);

rate.param = as.numeric(coef(Prop.Model2)[2]); C = as.numeric(coef(Prop.Model2)[3]);

Prop.Model2.isrp <- ((C*exp(C*t[1:11]) / (t[1:11]*t[1:11]+(2/C^2) + 2*t[1:11]/C-exp(-C)*((t[2:12])^2+(2/C^2)+2*(t[2:12]/C))))* log(size_vals[2:12]/size_vals[1:11]))

points(1:11, Prop.Model2.isrp, lty = 4, type = "o", col = 4, lwd=2, ylim=c(0,0.5))

abline(h = rate.param, lty = 4, col = 4, lwd = 2)


legend("topright", legend = c("Gompertz", "Logistic", "Proposed Model 1", "Proposed Model 2"), lty = 1:4, col = 1:4, lwd = c(2,2,2,2), bty = "n")

  • Deviation of ISRP from the rate parameter line gives indication of the incorrect model specification. Thus, we can rank the preferences of the models and accordingly chose a model for the statistical inference about the population under investigation. The extent of departure between the line of the constant rate parameter of the corresponding growth law and the estimated ISRP reflects the deviation of the assumed model from the true law.

  • However, to develop it as a general statistical procedure it has to tested over multiple data sets. This can be carried out by bootstrapping which allows to generate sample from the same sample data values which has been collected from the population.

  • The following code will perform bootstrap on ISRP

setwd("C:/STUDY/Supervisor/Growth Project/R Programs")

Dataset <- read.table("LocationA.txt", header=FALSE, sep="\t", na.strings="NA", dec=".", strip.white=TRUE)

matplot(1:12, t(Dataset), type = "l", lty = 1:12, xlab = "Time (Weeks)", ylab = "Length of fish (in cm)", main = "Location A")


data <- data.matrix(Dataset) #Convert data to a matrix

ind <- 1:12 #Row indices that to be sampled

B = 1500 #Number of bootstrap replicates

t <- seq(1, 12, 1)

gom.sigma <- rep(NA, length.out=B)

log.sigma <- rep(NA, length.out=B)

mod1.sigma <- rep(NA, length.out=B)

mod2.sigma <- rep(NA, length.out=B)


for (i in 1:B){

row.ind <- sample(ind, replace = TRUE)

bootmat <- rbind(data[row.ind,])

boot.mean <- colSums(bootmat)/12;

#Gompertz fit

gomest <- nls( boot.mean ~ x0*exp((b/c)*(1-exp(-c*t))), start = list(x0=1, b=0.32, c=0.2454298))

gom.isrp <- rep(NA, length(t)-1); rate.param = as.numeric(coef(gomest)[2]); C = as.numeric(coef(gomest)[3]);

gom.isrp <- (C*exp(C*t[2:12])/(exp(C)-1))*log(boot.mean[2:12]/boot.mean[1:11])

gom.sigma[i] <- sd(gom.isrp)

logcest <- nls(boot.mean ~ x0/(1+exp(b*(c-t))), start = list(x0=10, b=.34, c=1))

log.isrp <- rep(NA, length(t)-1); A = as.numeric(coef(logcest)[1]);

rate.param = as.numeric(coef(logcest)[2]); C = as.numeric(coef(logcest)[3]);

log.isrp <- log(((A/boot.mean[1:11]) - 1)/((A/boot.mean[2:12]) - 1))

log.sigma[i] <- sd(log.isrp)

Prop.Model1 <- nls(boot.mean ~ x0*exp((b/c)*(1/c-(exp(-c*t))*(t+1/c))), start = list(x0=5, b=0.2, c=.2))

Prop.Model1.isrp <- rep(NA, length(t)-1); A = as.numeric(coef(Prop.Model1)[1]);

rate.param = as.numeric(coef(Prop.Model1)[2]); C = as.numeric(coef(Prop.Model1)[3]);

Prop.Model1.isrp <- ((C*exp(C*t[1:11]) / (t[1:11]+(1/C)-exp(-C)*(t[1:11]+1+(1/C))))* log(boot.mean[2:12]/boot.mean[1:11]))

mod1.sigma[i] <- sd(Prop.Model1.isrp)

Prop.Model2 <- nls(boot.mean ~ x0*exp((b/c)*(1/c-(exp(-c*t))*(t^2+2*t/c+2/c^2))), start = list(x0=7, b=0.08, c=0.5))

Prop.Model2.isrp <- rep(NA, length(t)-1); A = as.numeric(coef(Prop.Model2)[1]);

rate.param = as.numeric(coef(Prop.Model2)[2]); C = as.numeric(coef(Prop.Model2)[3]);

Prop.Model2.isrp <- ((C*exp(C*t[1:11]) / (t[1:11]*t[1:11]+(2/C^2) + 2*t[1:11]/C-exp(-C)*((t[2:12])^2+(2/C^2)+2*(t[2:12]/C))))* log(boot.mean[2:12]/boot.mean[1:11]))

mod2.sigma[i] <- sd(Prop.Model2.isrp)

}


library(Rlab)

par(mfrow=c(2,2))

hplot(gom.sigma, prob=TRUE, col="lightgrey", nclass = 10, main="Gompertz", xlab = expression(sigma(bold(d))))

hplot(log.sigma, prob=TRUE, col="lightgrey", nclass = 10, main="Logistic", xlab = expression(sigma(bold(d))))

hplot(mod1.sigma, prob=TRUE, col="lightgrey", nclass = 10, main="Proposed Model 1", xlab= expression(sigma(bold(d))))

hplot(mod2.sigma, prob=TRUE, col="lightgrey", nclass = 10, main="Proposed Model 2", xlab= expression(sigma(bold(d))))


alpha <- 0.05

ci.gom <- quantile(gom.sigma, c(alpha/2, 0.5, 1 - alpha/2))

ci.log <- quantile(log.sigma, c(alpha/2, 0.5, 1 - alpha/2))

ci.mod1 <- quantile(mod1.sigma, c(alpha/2, 0.5, 1 - alpha/2))

ci.mod2 <- quantile(mod2.sigma, c(alpha/2, 0.5, 1 - alpha/2))


#Plot of confidence interval

par(mfrow=c(1,1))

names <- c("Gompertz", "Logistic", "Model 1", "Model 2")

ci.low <- c(as.numeric(ci.gom[1]), as.numeric(ci.log[1]), as.numeric(ci.mod1[1]), as.numeric(ci.mod2[1]))

ci.up <- c(as.numeric(ci.gom[3]), as.numeric(ci.log[3]), as.numeric(ci.mod1[3]), as.numeric(ci.mod2[3]))

ci.mid <- c(as.numeric(ci.gom[2]), as.numeric(ci.log[2]), as.numeric(ci.mod1[2]), as.numeric(ci.mod2[2]))

x <- 1:4*2-1

plot(ci.mid~x, cex=2, xlim=c(0,8), ylim=c(0.01,.2), xaxt="n", xlab="", ylab= expression(paste("CI of ",sigma(bold(d)))), lwd=3, col="red")

axis(1, at=x, labels=names)

arrows(x, ci.low,x, ci.up, code=3, length=0.1, angle=90, lwd=3)

Introduction to R Programming for Biologists

x = 3

y = 4

print(x)


z = x + y

print(z)

print(z)


z = x-y

print(z)


z = x*y

print(z)


z = x^2

print(z)


# Idea of storing numbers in R

time = 1:12

print(time)

length(time)

size = c(3.4, 4.2, 4.6, 5.1, 5.8, 6.1, 6.3, 6.8,7.1,9.3, 9.5, 9.9)

length(size)

data = data.frame(time, size)

print(data)

View(data)


plot(time, size)

plot(time, size, col = "red")

plot(time, size, col = "red", type = "l")

plot(time, size, col = "red", type = "b")

plot(time, size, col = "red", type = "b", pch = "*")

plot(time, size, col = "red", type = "b", pch = "*", cex = 3)

plot(time, size, col = "red", type = "b", pch = "*", cex = 3, lwd = 2)

plot(time, size, col = "red", type = "b", pch = "*", cex = 3, lwd = 2, xlab ="Time (weeks)")

plot(time, size, col = "red", type = "b", pch = "*", cex = 3, lwd = 2, xlab ="Time (weeks)", ylab = "Size (in cm)")


size2 = c(4, 4.2, 3.9, 4.8, 5.2, 6.1, 6.6, 7.1, 6.9, 7.4, 8.1, 8.9)

length(size2)


data = cbind(data, size2)

View(data)


lines(time, size2, col = "blue", type = "b", pch = "*", cex = 3, lwd = 2, xlab ="Time (weeks)", ylab = "Size (in cm)")

legend("topleft", legend =c("Fish 1", "Fish 2"), col = c("red","blue"), lwd=c(2,2), pch = c("*","*"))


# A little bit of fitting

plot(time, size2, col = "blue", type = "b", pch = "*", cex = 3, lwd = 2, xlab ="Time (weeks)", ylab = "Size (in cm)", cex.lab = 2)

data = data.frame(time, size2)

fit = lm(size2 ~ time, data = data)

summary(fit)

coef(fit)

abline(fit, col = "red", lwd=2)


Important Research Papers

  1. Pal, Arijit, Bhowmick, A.R., Yeasmin, F. and Bhattacharya, S. Evolution of Model Specific Relative Growth Rate: Its Genesis and Performance Over Fisher's Growth Rates, Journal of Theoretical Biology (doi: 10.1016/j.jtbi.2018.02.012) Vol-444, pp 11-27, 2018.

  2. Chakraborty, B., Bhowmick, A. R., Chattopadhyay, J. and Bhattacharya, S. Physiological Responses of Fish under Environmental Stress and extension of Growth (Curve) Models, Ecological Modelling (2017) Vol 363, pp 172-186, November 10, 2017.

  3. Mukhopadhyay, S. Hazra, A, Bhowmick, A. R., Bhattacharya, S. On Comparison of Relative Growth Rates under Different Environmental Conditions with Application to Biological Data. Metron. (2016) DOI: 10.1007/s40300-016-0102-y. December 2016, Vol 74, Issue 3, pp 311-337.

  4. Bhowmick, A. R., Bhattacharya, S. A New Growth Curve Model for Biological Growth: Some Inferential Studies on the Growth of Cirrhinus mrigala. Mathematical Biosciences 254, 28-41 (2014) DOI: 10.1016/j.mbs.2014.06.004

  5. Bhowmick, A. R., Chattopadhyay, G.and Bhattacharya, S. Simultaneous Identification of Growth Law and Estimation of Rate Parameter: A New Approach. Journal of Biological Physics 40(1) 71-95 (2014). doi: 10.1007/s10867-013-9336-6

  6. Bhattacharya, S.,Basu, A. N. and Bandyopadhyay, S. "Goodness-of-Fit Testing for Exponential Polynomial Growth Curves". (Communications in Statistics - Theory and Methods, 38: 1-24, 2009).