Blag

Violinplotter version 2 is out

[2021-04-01]

Violinplotter version 2 is out with the option to:

  • plot sample size of 1 as a point,

  • append sample sizes, and

  • include or exclude specific error bars.

Plus a quick fix (parsing explanatory variables) is on track to be released as version 2.0.1.

Fitting logistic growth rate with optim()

[2021-03-26]

Given a logistic growth function of the form:

logistic_func = function(y_max, k, t0, t){
    y_max / (1 + exp(-k*(t-t0)))
}

where y_max is the maximum level reached (e.g. maximum biomass or maximum number of individuals in a population), k is the growth rate constant (which determines how steep the curve is), t0 is the time at which the growth rate is at its maximum (the inflection point), and t is time, we can fit our growth data as a logistic function of time using the cost function:

cost_func = function(par, data){
    y_max = par[1]
    k = par[2]
    t0 = par[3]
    t = data[,1]
    y = data[,2]
    y_pred = logistic_func(y_max=y_max, k=k, t0=t0, t=t)
    return(sum((y-y_pred)^2))   ### sum of squared errors between observed and predicted growth
}

We can use R's optimisation function `optim()` while being mindful of the minimum and maximum possible parameter values, i.e. try not to use infinities in the lower and upper bounds of the parameter search space.

To illustrate, given the following growth data across 13 days on 2 treatments:

TRT = c(rep("A", times=13), rep("B", times=13))
DAY = rep(0:12, times=2)
GROWTH = c(0, 0, 0, 3, 16, 22, 69, 82, 104, 122, 132, 137, 138,
           0, 0, 0, 17, 46, 64, 107, 123, 135, 139, 139, 139, 139)

We fit these data to the logistic function by finding the optimal parameter values:

vec_colours = list(A="red", B="blue")
vec_ymax = c(); vec_k = c(); vec_t0 = c()
plot(x=DAY, y=GROWTH, xlab="DAY", ylab="GROWTH", type="n"); grid()
for (trt in c("A", "B")){
    ### fit the logistic function per treatment by minimising the sum of squared errors
    optim_out = optim(par=c(1, 1e-4, 1),
                      fn=cost_func,
                      data=cbind(DAY[TRT==trt], GROWTH[TRT==trt]),
                      method="L-BFGS-B",
                      lower=c(0, 1e-8, 0),
                      upper=c(200, 3, 12))
    y_max = optim_out$par[1]
    k = optim_out$par[2]
    t0 = optim_out$par[3]
    ### plot the observed data and the fitted curve
    t = seq(min(DAY[TRT==trt]), max(DAY[TRT==trt]), length=100)
    y_pred = logistic_func(y_max=y_max, k=k, t0=t0, t=t)
    points(x=DAY[TRT==trt], y=GROWTH[TRT==trt], type="b", pch=20, col=vec_colours[[trt]])
    lines(x=t, y=y_pred, col=vec_colours[[trt]])
    vec_ymax = c(vec_ymax, y_max); vec_k = c(vec_k, k); vec_t0 = c(vec_t0, t0)
}
legend("bottomright", legend=c(paste0("A: ymax=", round(vec_ymax[1]), "; k=", round(vec_k[1],2), "; t0=", round(vec_t0[1])),
                               paste0("B: ymax=", round(vec_ymax[2]), "; k=", round(vec_k[2],2), "; t0=", round(vec_t0[2]))),
       col=unlist(vec_colours), pch=20, lty=1)
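
As a quick sanity check (not part of the original snippet), `optim()` also returns a convergence code and the final value of the cost function, which we can inspect inside the loop:

### convergence code 0 means L-BFGS-B finished successfully;
### $value is the residual sum of squares at the optimum
print(optim_out$convergence)
print(optim_out$value)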



Arimedhargeomean

[2021-03-12]

Inspired by Randall Munroe's post, I made a recursive R function which repeatedly takes the arithmetic mean, median, harmonic mean, and geometric mean of its input until they converge. The geometric mean is dropped if the product of the input is negative. :-P

F = function(x, epsilon=0.0001) {
    if (sd(x) < epsilon) {
        ### the means have converged
        out = x[1]
    } else {
        mu_1 = mean(x)                      ### arithmetic mean
        mu_2 = median(x)                    ### median
        mu_3 = length(x) / (sum(1/x))       ### harmonic mean
        mu_4 = prod(x)^(1/length(x))        ### geometric mean (NaN if the product is negative)
        if (is.na(mu_4)){
            ### drop the geometric mean when it is undefined
            out = F(x=c(mu_1, mu_2, mu_3))
        } else {
            out = F(x=c(mu_1, mu_2, mu_3, mu_4))
        }
    }
    return(out)
}

Test:

x1 = 1:10
x2 = rnorm(100)
F(x=x1)
F(x=x2)

The Bulmer effect

[2021-03-11]

The Bulmer effect is the reduction in the additive genetic variance of a highly polygenic trait after every generation under directional selection. This effect arises from the correlations (linkage disequilibrium) that selection generates between pairs of loci, whether they are physically linked (in close proximity) or merely randomly associated.

Why does combustion release energy?

[2021-01-19]

From Schmidt-Rohr (2015):

"The double bond in O2 is much weaker than other double bonds or pairs of single bonds, and therefore the formation of the stronger bonds in CO2 and H2O results in the release of energy, which is given off as heat or increases thermal motion. This explains why fire is hot regardless of fuel composition. The bond energies in the fuel play only a minor role; for example, the total bond energy of CH4 is nearly the same as that of CO2."

I used to think that breaking the strong bonds in the fuel releases energy; however, it is the formation of the even stronger bonds in the combustion products, i.e. CO2 and H2O, that releases energy. Energy is required to break the bonds in CO2, so it follows that forming those same bonds releases energy.
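
As a rough back-of-the-envelope check (the numbers below are approximate textbook average bond enthalpies, not values from the paper), the energy balance for CH4 + 2 O2 -> CO2 + 2 H2O can be sketched in R:

### approximate average bond enthalpies in kJ/mol (assumed textbook values)
CH = 413   # C-H bond in CH4
OO = 498   # O=O double bond
CO = 799   # C=O double bond in CO2
OH = 463   # O-H bond in H2O
energy_to_break  = 4*CH + 2*OO   # bonds broken in the fuel and in O2
energy_released  = 2*CO + 4*OH   # stronger bonds formed in CO2 and 2 H2O
energy_to_break - energy_released   # roughly -800 kJ/mol, i.e. a net release of energy

The dominant term is the relatively weak O=O bonds being replaced by the much stronger C=O and O-H bonds of the products.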

Fst

[2021-01-08]

In simple terms, Fst is based on the ratio between the observed heterozygosity (Ho) and the expected heterozygosity (He).

The expected heterozygosity can be computed following the Hardy-Weinberg equilibrium principle, e.g. for a biallelic locus this is 2p(1-p) where p is the frequency of one of the alleles.

On one extreme, if the ratio between observed and expected heterozygosities in a population is zero, i.e. Ho/He = 0, then this means that the individuals or subpopulations are not intercrossing (i.e. assortative mating and/or selfing and/or asexually reproducing). Hence the population is said to be highly structured. In terms of Fst:

Fst = 1 - (Ho/He).

Therefore, if Ho/He = 0 then Fst = 1, which indicates highly structured populations.

On the other extreme, a population with Ho/He = 1 (i.e. Fst = 0) means that the individuals or subpopulations are intercrossing freely as if they were a single panmictic population; hence the population is said to have no structure.
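
To make this concrete, here is a minimal sketch in R using made-up genotype counts at a single biallelic locus (the counts and variable names are hypothetical, just to illustrate the formula above):

n_AA = 60; n_Aa = 10; n_aa = 30            ### hypothetical genotype counts
n = n_AA + n_Aa + n_aa
p = (2*n_AA + n_Aa) / (2*n)                ### frequency of the A allele
Ho = n_Aa / n                              ### observed heterozygosity
He = 2 * p * (1 - p)                       ### expected heterozygosity under Hardy-Weinberg
Fst = 1 - (Ho/He)                          ### following the formula above
c(p=p, Ho=Ho, He=He, Fst=Fst)

With these counts the observed heterozygosity is well below the Hardy-Weinberg expectation, so Fst comes out high (about 0.78), consistent with a structured population.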

Confidence interval

[2021-01-08]

Confidence interval = estimate ± z * standard error of the estimate (for a sample mean, the standard error is sd(x)/sqrt(n)), where z is the number of standard deviations from the mean of the standard normal distribution needed to cover a given two-tailed probability (e.g. ±1.96 standard deviations for a 95% confidence level, and ±2.58 for 99%).
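
For instance, a minimal sketch of computing an approximate 95% confidence interval for a sample mean in base R (the data are simulated, just for illustration):

x = rnorm(50, mean=10, sd=2)              ### hypothetical sample
se = sd(x) / sqrt(length(x))              ### standard error of the mean
c(mean(x) - 1.96*se, mean(x) + 1.96*se)   ### approximate 95% confidence interval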

A 95% confidence interval is NOT interpreted as: there is a 95% chance that the true mean lies within this particular interval.

A 95% confidence interval is an estimate specific to the experiment performed. If the experiment were repeated, a different confidence interval would likely be estimated.

What a 95% confidence interval means is that, if we were to repeat the experiment many times, we expect 95% of the confidence intervals estimated independently from each experiment to contain the true mean.

It is not the specific confidence interval that has a 95% chance of containing the true mean, but the collection of confidence intervals estimated from multiple independent experiments.
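
This interpretation is easy to check by simulation. A minimal sketch in R (my own illustration): simulate many experiments with a known true mean and count how often the interval covers it.

set.seed(123)
true_mean = 10
n = 30
n_experiments = 10000
covered = 0
for (i in 1:n_experiments) {
    x = rnorm(n, mean=true_mean, sd=2)
    se = sd(x) / sqrt(n)
    lower = mean(x) - 1.96*se
    upper = mean(x) + 1.96*se
    covered = covered + (lower <= true_mean & true_mean <= upper)
}
covered / n_experiments   ### expected to be close to 0.95 (slightly less, since 1.96 ignores the t-distribution correction for n=30)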

Be careful with the precision of numbers in R

[2020-01-07]

In reality, 10,000,000 / 3 = 3,333,333.333...

However, if you compute this in R, the console shows 3,333,333 with no decimal places, because R prints only 7 significant digits by default.

Likewise, instead of showing that 10,000,001 / 5 = 2,000,000.20, R displays the answer as the integer-looking 2,000,000.

In Python, this is avoided by making either or both the divisor and dividend floating point numbers, e.g. 10000001.0/5 (and in Python 3 the / operator performs floating-point division by default), which prints 2000000.2.

In Julia, the quotient is displayed in scientific notation, so the decimal part is not hidden either.

However, it is important to note that limits in precision exist regardless of the language used, and R does store the full double-precision value. My point here is that R's default display is limited to 7 significant digits (about 1x10^7), which can silently hide decimal places.
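
To actually see the stored value, we can raise the number of displayed significant digits; a minimal sketch (options(digits=) and the digits argument of print() are standard base R):

x = 10000000 / 3
print(x)                  ### 3333333  (default display: 7 significant digits)
print(x, digits=15)       ### shows the decimal places that were hidden
options(digits=15)        ### or raise the default display precision for the session
10000001 / 5              ### now prints 2000000.2
sprintf("%.10f", x)       ### or format explicitly with a fixed number of decimal places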