Rating Criteria

Score as many points as possible!

For each of the criteria below, we will rank all participating tools.

Similar to ski racing, n equally performing tools will share the same rank, leading to n-1 vacant ranks thereafter.
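The tie rule above can be sketched in a few lines of Python. This is only an illustration: the function name `competition_ranks` and the example values are hypothetical, and it assumes that "equally performing" means exactly equal metric values.

```python
def competition_ranks(values, lower_is_better=True):
    """Return 1-based ranks; tied values share a rank and the ranks after them stay vacant."""
    ordered = sorted(values, reverse=not lower_is_better)
    first_position = {}
    for position, value in enumerate(ordered, start=1):
        # Only the first (best) position of each distinct value is kept.
        first_position.setdefault(value, position)
    return [first_position[value] for value in values]


# Hypothetical metric values for four tools (smaller is better).
example_values = [0.12, 0.05, 0.12, 0.30]
example_ranks = competition_ranks(example_values)
print(example_ranks)
```

Here two tools tie for rank 2, so rank 3 stays vacant and the last tool receives rank 4.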

1. Number of states:

n = 2, 3, ..., 100?
The closer to the ground truth model, the better.

2. Goodness of the kinetic rate constants: the closer to the ground truth model, the better.

How do we assess models whose connectivity between states differs from the original?

The rate constants are rated as follows:

For each state, dwell time histograms are simulated from both the ground truth model and the inferred rate model, using the same simulation parameters as for the challenge dataset.
Both are compared to the dwell time histogram of the challenge dataset using the Cramér–von Mises criterion (i.e. the integrated squared difference between the cumulative dwell time distributions).
The absolute difference of the two integrals (one for the ground truth & one for the inferred model) is computed for each state, and summed over all states.
The smaller the sum, the better the goodness of fit.
If degenerate states are present in the challenge dataset, combined dwell time histograms are compiled.
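The procedure above could look roughly like the following sketch. It is not the organizers' evaluation code. It assumes unbinned dwell times instead of histograms, and it integrates the squared difference of the empirical cumulative distributions over time, which is one possible reading of "integrated squared difference". All function names are hypothetical.

```python
import numpy as np


def ecdf_at(sample, t):
    """Empirical cumulative distribution of `sample`, evaluated at the points `t`."""
    s = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(s, t, side="right") / s.size


def integrated_sq_cdf_diff(sample_a, sample_b):
    """Integral over time of the squared difference between two empirical CDFs."""
    grid = np.sort(np.concatenate([sample_a, sample_b]))
    diff_sq = (ecdf_at(sample_a, grid) - ecdf_at(sample_b, grid)) ** 2
    # Both CDFs are step functions, constant between neighbouring grid points,
    # so this sum is the exact integral.
    return float(np.sum(diff_sq[:-1] * np.diff(grid)))


def rate_score(challenge, truth_sim, inferred_sim):
    """Sum over states of |integral(truth vs. challenge) - integral(inferred vs. challenge)|.

    Each argument maps a state label to an array of dwell times.
    """
    total = 0.0
    for state in challenge:
        d_truth = integrated_sq_cdf_diff(truth_sim[state], challenge[state])
        d_inferred = integrated_sq_cdf_diff(inferred_sim[state], challenge[state])
        total += abs(d_truth - d_inferred)
    return total


# Hypothetical two-state example with exponentially distributed dwell times.
rng = np.random.default_rng(0)
challenge = {1: rng.exponential(1.0, 500), 2: rng.exponential(0.5, 500)}
truth_sim = {1: rng.exponential(1.0, 500), 2: rng.exponential(0.5, 500)}
inferred_sim = {1: rng.exponential(1.3, 500), 2: rng.exponential(0.4, 500)}
score = rate_score(challenge, truth_sim, inferred_sim)
```

For degenerate states, the dwell times of the indistinguishable states would presumably be concatenated into one array per observable state before calling `rate_score`.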

3. Error bounds:

The uncertainty measures are the most difficult to rate in a meaningful way, in part because various definitions coexist (standard deviations, confidence intervals, credible intervals, ...).

However, a strict lower limit for the uncertainty of each rate is given by the variability of the finite dataset itself: multiple simulations of 150 time traces show a certain spread in their dwell time distributions. We can use this spread – i.e. the standard deviation observed within 200 simulated datasets for each ground truth model – as a lower limit for the uncertainty of the rate constants. No analysis method can be more precise than this intrinsic uncertainty.
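As a rough illustration of this intrinsic spread, the sketch below repeats a toy simulation 200 times and reports the standard deviation of the estimated rate. It rests on several assumptions that are not part of the challenge description: a single state with exponentially distributed dwell times, a hypothetical 10 dwells per trace, and the simple estimator k = 1 / (mean dwell time).

```python
import numpy as np


def intrinsic_rate_spread(k_true, n_datasets=200, n_traces=150,
                          dwells_per_trace=10, seed=0):
    """Standard deviation of the estimated rate across repeated simulated datasets.

    `dwells_per_trace` is a hypothetical value chosen for illustration only.
    """
    rng = np.random.default_rng(seed)
    n_dwells = n_traces * dwells_per_trace
    # One row per simulated dataset, one column per observed dwell time.
    dwells = rng.exponential(1.0 / k_true, size=(n_datasets, n_dwells))
    k_estimates = 1.0 / dwells.mean(axis=1)
    return float(k_estimates.std(ddof=1))


spread = intrinsic_rate_spread(k_true=1.0)
print(f"intrinsic spread of k: {spread:.4f}")
```

Under these assumptions the spread should be close to k divided by the square root of the number of dwells per dataset. A reported uncertainty well below that value would claim more precision than the finite dataset can support.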

Further, we suggest the following expression as a measure to compare different results – the smaller C, the better:

This is a pragmatic solution. Please let us know if you know a better one.

(Criterion 3 was slightly modified compared to the first version, to make it less dependent on the rate deviation Δk, which is already rated by criterion 2.)

Sketch of original & inferred rates k, and reported uncertainty interval Δe.

Bonus: Ease of use

We encourage developers to keep the end user in mind. Thus, user-friendly aspects, such as comprehensible documentation, video tutorials, etc., will be considered & reported as a bonus.

Hypothetical overall performance plot comparing several tools, as well as their scores regarding criteria 1, 2, 3 in red, yellow, and green, respectively. Score = n - rank.
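The relation Score = n - rank can be written out as a short sketch. The caption does not say how the overall performance is obtained from the three criteria, so summing the per-criterion scores is an assumption here. The function name and the example ranks are hypothetical.

```python
def scores_from_ranks(ranks_by_criterion):
    """Convert per-criterion ranks into scores (n - rank) and an assumed overall sum.

    `ranks_by_criterion` maps a criterion label to a list with one rank per tool.
    """
    n_tools = len(next(iter(ranks_by_criterion.values())))
    per_criterion = {
        criterion: [n_tools - rank for rank in ranks]
        for criterion, ranks in ranks_by_criterion.items()
    }
    # Assumption: the overall score of a tool is the sum over all criteria.
    overall = [sum(tool_scores) for tool_scores in zip(*per_criterion.values())]
    return per_criterion, overall


# Hypothetical ranks of four tools for criteria 1, 2 and 3.
example_rank_table = {
    "states": [1, 2, 2, 4],
    "rates": [2, 1, 3, 4],
    "errors": [1, 1, 3, 4],
}
per_criterion, overall = scores_from_ranks(example_rank_table)
print(overall)
```

With n = 4 tools, the best rank earns 3 points per criterion and the last rank earns 0.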