Rating Criteria
Score as many points as possible!
For each of the criteria below, we will rank all participating tools.
As in ski racing, n equally performing tools share the same rank, and the following n-1 ranks are left vacant.
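The tie rule above is usually called standard competition ranking ("1224" ranking). Below is a minimal Python sketch. It assumes each tool is summarized by one numeric metric per criterion and that a tie means exactly equal values; in practice a tolerance may be more appropriate. `competition_ranks` is a hypothetical helper, not part of any challenge tooling.

```python
def competition_ranks(values, higher_is_better=False):
    """Rank tools by a metric; ties share a rank and leave gaps (1, 2, 2, 4)."""
    # Order tool indices from best to worst according to the metric.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=higher_is_better)
    ranks = [0] * len(values)
    for pos, i in enumerate(order):
        if pos > 0 and values[i] == values[order[pos - 1]]:
            # Tie with the previous tool: share its rank.
            ranks[i] = ranks[order[pos - 1]]
        else:
            # Rank equals 1 + number of tools strictly ahead.
            ranks[i] = pos + 1
    return ranks

# Example: lower metric is better (e.g. the criterion-2 sum).
print(competition_ranks([0.3, 0.1, 0.3, 0.5]))  # [2, 1, 2, 4]
```

Here two tools tie at rank 2, so one rank (3) is left vacant.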
1. Number of states:
- n = 2, 3, ..., 100?
- The closer to the ground truth model, the better.
2. Goodness of the kinetic rate constants: the closer to the ground truth model, the better.
How can models be assessed whose connectivity between states differs from that of the ground truth model?
The rate constants are rated as follows:
- For each state, dwell time histograms are simulated for the ground truth model and the inferred rate model (with the same parameters as the challenge dataset).
- Both are compared to the dwell time histogram of the challenge dataset using the Cramér–von Mises criterion (i.e. the integrated squared difference between the cumulative dwell time distributions).
- The absolute difference between the two integrals (one for the ground truth model, one for the inferred model) is computed for each state and summed over all states.
- The smaller the sum, the better the fit.
- If degenerate states are present in the challenge dataset, combined dwell time histograms are compiled.
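A minimal Python sketch of this rate-constant score, assuming the dwell times are already available as plain lists per state (the simulation step is not shown). Following the wording above, it integrates the squared difference of two empirical CDFs over dwell time t. The classical Cramér–von Mises statistic weights the squared difference by dF instead, so the exact weighting used in the challenge is an assumption here. `ecdf`, `integrated_sq_diff`, and `rate_score` are hypothetical names. For degenerate states, the corresponding dwell time lists would be merged before calling `rate_score`.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical CDF of a sample as a function of t."""
    xs = sorted(sample)
    n = len(xs)
    return lambda t: bisect_right(xs, t) / n


def integrated_sq_diff(a, b):
    """Exact integral over t of (F_a(t) - F_b(t))**2 for two empirical CDFs."""
    fa, fb = ecdf(a), ecdf(b)
    # Both ECDFs are step functions that only change at sample points,
    # so the integrand is constant between consecutive breakpoints
    # (and zero before the first and after the last one).
    pts = sorted(set(a) | set(b))
    total = 0.0
    for left, right in zip(pts, pts[1:]):
        total += (fa(left) - fb(left)) ** 2 * (right - left)
    return total


def rate_score(data, truth_sim, inferred_sim):
    """Sum over states of |D(truth, data) - D(inferred, data)|; smaller is better.

    Each argument maps a state label to a list of dwell times.
    """
    return sum(
        abs(integrated_sq_diff(truth_sim[s], data[s])
            - integrated_sq_diff(inferred_sim[s], data[s]))
        for s in data
    )


print(integrated_sq_diff([1, 2], [3]))  # 1.25
```

Because the empirical CDFs are step functions, the integral is computed exactly rather than on a time grid.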
3. Error bounds:
Uncertainty measures are the most difficult to rate meaningfully, partly because various definitions coexist (standard deviations, confidence intervals, credible intervals, ...).
However, a strict lower limit for the uncertainty of each rate is given by the variability of the finite dataset itself: repeated simulations of 150 time traces show a certain spread in their dwell time distributions. We use this spread (i.e. the standard deviation observed across 200 simulated datasets for each ground truth model) as a lower limit for the uncertainty of the rate constants. No analysis method can be more precise than this intrinsic uncertainty.
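To illustrate where such a lower limit comes from, the sketch below repeatedly simulates a dataset and measures the spread of a rate estimate. It is deliberately simplified. It assumes a single exponentially distributed dwell time with rate k, estimated as 1 / mean dwell time. The challenge itself simulates complete sets of 150 traces from each ground truth model, which this sketch does not reproduce. `rate_spread` and its parameter `n_dwells` (the number of dwells per simulated dataset) are hypothetical.

```python
import random
import statistics


def rate_spread(k, n_dwells, n_datasets=200, seed=0):
    """Standard deviation of a rate estimate across repeated simulated datasets.

    Assumes a single exponentially distributed dwell time with rate k
    and estimates k in each dataset as 1 / mean dwell time.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_datasets):
        dwells = [rng.expovariate(k) for _ in range(n_dwells)]
        estimates.append(1.0 / statistics.fmean(dwells))
    return statistics.stdev(estimates)


print(rate_spread(k=10.0, n_dwells=1000))
```

For a single exponential, the spread should come out close to k / sqrt(n_dwells), here roughly 0.32. A reported uncertainty smaller than this spread would be implausibly optimistic.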
Further, we suggest the following expression as a measure to compare different results – the smaller C, the better:
This is a pragmatic solution. Please let us know if you know a better one.
(Criterion 3 was slightly modified compared to the first version, to make it less dependent on the rate deviation Δk, which is already rated by criterion 2.)
Sketch of original & inferred rates k, and reported uncertainty interval Δe.
Bonus: Ease of use
We encourage developers to keep the end user in mind. User-friendly aspects, such as comprehensible documentation and video tutorials, will therefore be considered and reported as a bonus.
Hypothetical overall performance plot comparing several tools, together with their scores for criteria 1, 2, and 3 in red, yellow, and green, respectively. Score = n - rank, where n is the number of participating tools.
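A minimal sketch of the scoring rule in the caption. It assumes n is the number of participating tools and that the overall score is the sum of the per-criterion scores; the caption does not state how the criteria are combined. The tool names and ranks are made up for illustration, and `scores_from_ranks` is a hypothetical helper.

```python
def scores_from_ranks(ranks_per_criterion, n_tools):
    """Convert ranks to scores (n - rank) and total them per tool.

    ranks_per_criterion maps a criterion name to {tool: rank}.
    """
    totals = {}
    for ranks in ranks_per_criterion.values():
        for tool, rank in ranks.items():
            totals[tool] = totals.get(tool, 0) + (n_tools - rank)
    return totals


# Hypothetical ranks for three tools; A and B tie on criterion 1.
ranks = {
    "states": {"A": 1, "B": 1, "C": 3},
    "rates": {"A": 2, "B": 1, "C": 3},
    "errors": {"A": 1, "B": 3, "C": 2},
}
print(scores_from_ranks(ranks, n_tools=3))  # {'A': 5, 'B': 4, 'C': 1}
```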