# Rating criteria

### You score as much as possible!

For each of the criteria below, we will rank all participating tools.

Similar to ski racing, n equally performing tools will share the same rank, leading to n-1 vacant ranks thereafter.
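The tie rule above corresponds to what is often called standard competition ranking. The following is a minimal sketch of it, assuming each tool has a single error value per criterion where smaller is better; the function name, tool names, and values are hypothetical and serve only as illustration.

```python
def competition_ranks(errors):
    """Rank tools so that ties share a rank and the following ranks stay vacant.

    `errors` maps a (hypothetical) tool name to its error value; smaller is better.
    """
    ordered = sorted(errors.items(), key=lambda item: item[1])
    ranks = {}
    previous_value = None
    previous_rank = 0
    for position, (tool, value) in enumerate(ordered, start=1):
        if value == previous_value:
            # Same performance as the previous tool: share its rank.
            ranks[tool] = previous_rank
        else:
            # New performance level: the rank equals the position in the sorted
            # list, which automatically skips the ranks left vacant by ties.
            ranks[tool] = position
            previous_rank = position
            previous_value = value
    return ranks


example = {"toolA": 0.10, "toolB": 0.25, "toolC": 0.25, "toolD": 0.40}
print(competition_ranks(example))
# {'toolA': 1, 'toolB': 2, 'toolC': 2, 'toolD': 4}
```

Here two tools tie for rank 2, so rank 3 stays vacant and the next tool receives rank 4.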

1. Number of states:

• n = 2, 3, ... 100?
• The closer to the ground truth model, the better.

2. Goodness of the kinetic rate constants: the closer to the ground truth model, the better.

How do we assess models whose connectivity between states may differ from the original?

The rate constants are rated as follows:

• For each state, dwell time histograms are simulated for the ground truth model and the inferred rate model (with the same parameters as the challenge dataset).
• Both are compared to the dwell time histogram of the challenge dataset using the Cramér–von Mises criterion (i.e. the integrated squared difference between the cumulative dwell time distributions).
• The absolute difference of the two integrals (one for the ground truth & one for the inferred model) is computed for each state, and summed over all states.
• The smaller the sum, the better the goodness of fit.
• If degenerate states are present in the challenge dataset, combined dwell time histograms are compiled.
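The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the organizers' actual implementation: the squared difference between empirical cumulative distributions is integrated over time on a uniform grid with the trapezoid rule, dwell times are passed as plain lists per state, and degenerate states are assumed to have been pooled under a single key beforehand. All function names and the example rates are hypothetical.

```python
import random
from bisect import bisect_right


def ecdf(sorted_samples, t):
    """Empirical cumulative distribution of the sorted samples, evaluated at t."""
    return bisect_right(sorted_samples, t) / len(sorted_samples)


def integrated_squared_difference(dwells_a, dwells_b, n_grid=1000):
    """Integral over time of the squared difference between two empirical CDFs."""
    a, b = sorted(dwells_a), sorted(dwells_b)
    upper = max(a[-1], b[-1])
    step = upper / (n_grid - 1)
    values = [(ecdf(a, i * step) - ecdf(b, i * step)) ** 2 for i in range(n_grid)]
    # Trapezoid rule on a uniform grid.
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


def rate_score(challenge, ground_truth, inferred):
    """Sum over states of |I_ground_truth - I_inferred|; smaller is better.

    Each argument maps a state label to a list of dwell times.
    """
    total = 0.0
    for state in challenge:
        i_gt = integrated_squared_difference(ground_truth[state], challenge[state])
        i_inf = integrated_squared_difference(inferred[state], challenge[state])
        total += abs(i_gt - i_inf)
    return total


# Hypothetical single-state example with exponentially distributed dwell times.
rng = random.Random(1)
challenge = {"state 1": [rng.expovariate(2.0) for _ in range(500)]}
ground_truth = {"state 1": [rng.expovariate(2.0) for _ in range(5000)]}
inferred = {"state 1": [rng.expovariate(2.6) for _ in range(5000)]}
print(rate_score(challenge, ground_truth, inferred))
```

Because the score compares dwell time distributions per state rather than individual rate constants, it can be evaluated even when the inferred model's connectivity differs from that of the ground truth.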

3. Error bounds:

Uncertainty measures are the most difficult to rate in a meaningful way, in part because various definitions coexist (standard deviations, confidence intervals, credible intervals, ...).

However, a strict lower limit for the uncertainty of each rate is given by the variability of the finite dataset itself: multiple simulations of 150 time traces show a certain spread in their dwell time distributions. We can use this spread – i.e. the standard deviation observed within 200 simulated datasets for each ground truth model – as a lower limit for the uncertainty of the rate constants. No analysis method can be more precise than this intrinsic uncertainty.
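To illustrate the idea of such an intrinsic lower limit, here is a deliberately simplified sketch. It assumes a single exponential dwell time distribution with a known rate, a hypothetical fixed number of dwells per trace, and a rate estimate given by the inverse mean dwell time; noise, finite trace length, and multi-state kinetics are all ignored. Only the numbers 150 (traces) and 200 (datasets) come from the text above; everything else is an assumption.

```python
import random
import statistics


def rate_spread(true_rate, n_datasets=200, n_traces=150, dwells_per_trace=10, seed=0):
    """Standard deviation of the estimated rate across simulated datasets.

    `dwells_per_trace` is a hypothetical value chosen only for illustration.
    """
    rng = random.Random(seed)
    n_dwells = n_traces * dwells_per_trace
    estimates = []
    for _ in range(n_datasets):
        dwells = [rng.expovariate(true_rate) for _ in range(n_dwells)]
        # For exponential dwell times, the inverse mean dwell time is the
        # maximum-likelihood estimate of the rate.
        estimates.append(1.0 / statistics.fmean(dwells))
    return statistics.stdev(estimates)


spread = rate_spread(true_rate=2.0)
print(f"lower limit for the rate uncertainty: {spread:.3f}")
```

In this idealized case the spread scales approximately as the rate divided by the square root of the number of observed dwells, so a reported uncertainty well below that value would be more precise than the data allow.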

Further, we suggest the following expression as a measure to compare different results – the smaller C, the better:

This is a pragmatic solution. Please let us know if you know a better one.

(Criterion 3 was slightly modified compared to the first version, to make it less dependent on the rate deviation Δk, which is already rated by criterion 2.)

Figure: Sketch of the original and inferred rates k, and the reported uncertainty interval Δe.

Bonus: Ease of use

We encourage developers to keep the end user in mind. Thus, user-friendly aspects, such as comprehensible documentation, video tutorials, etc., will be considered and reported as a bonus.

Figure: Hypothetical overall performance plot comparing several tools, with their scores for criteria 1, 2, and 3 shown in red, yellow, and green, respectively. Score = n - rank.
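The relation Score = n - rank can be sketched as follows. Two assumptions are made here that the text does not state explicitly: n is taken to be the number of participating tools, and the overall performance is taken to be the sum of the per-criterion scores. The tool names and ranks are hypothetical.

```python
def scores_from_ranks(ranks_per_criterion):
    """Convert ranks into scores via score = n - rank and sum them per tool.

    `ranks_per_criterion` maps a criterion name to {tool name: rank}.
    n is assumed to be the number of participating tools.
    """
    totals = {}
    for criterion, ranks in ranks_per_criterion.items():
        n_tools = len(ranks)
        for tool, rank in ranks.items():
            # Summing over criteria is an assumption about the overall score.
            totals[tool] = totals.get(tool, 0) + (n_tools - rank)
    return totals


ranks = {
    "criterion 1": {"toolA": 1, "toolB": 2, "toolC": 2},
    "criterion 2": {"toolA": 3, "toolB": 1, "toolC": 2},
    "criterion 3": {"toolA": 1, "toolB": 1, "toolC": 3},
}
print(scores_from_ranks(ranks))
# {'toolA': 4, 'toolB': 5, 'toolC': 2}
```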