lre07

Subsystem score format spec

THE BIG PICTURE:

For a given (P,L), where P is a partition generated by Albert and L is a list of subsystems, we can create a (fused) submittable system, which outputs (C,E), where C is a set of development-test criteria and E is a set of submit scores for the new eval data. That is, our whole process is

  (P,L)  -->  (C,E).

We can then use C to decide whether we submit E or not and which E will be our primary system.

Here, it is the joint job of Albert and David to do (P,L) --> Z = [Z_1, Z_2, ...], where the Z_k are (zipped) score bundles, one per subsystem k. It is Niko's job to do Z --> (C,E). The purpose of this email is to clarify the format of each score bundle Z_k:

SCORE BUNDLE SPEC:

Every score bundle must include:

(As discussed, I can allow system score-vectors to be wider than 14, but please let the first 14 be those of the target languages in alphabetical order.)

1. 100 x back.i.j.wide score files.

2. 10 x test.i.wide score files. Each test.i.wide is tested only on the trials of test jack i. As for the score-producing SVMs, there are two options here; if you generate both, I can test which works better.

a. SVM or GMM scores are averaged over 10 inner jacks. (Linear SVMs can be averaged before scoring).

b. SVM or GMM is re-trained over all available (or desired) data, but excluding trials in test jack i.

(I think a is easier and quicker and probably quite OK, but I can test if b helps).

3. 10 x sanity.i.wide score files. These scores are produced by the same SVMs as test.i.wide, but run over all trials in all 10 test jacks. So sanity.i.wide is 10 times larger than test.i.wide.

4. 10 x eval.i.wide score files. These scores are produced by the same SVMs as test.i.wide, but run over all 7530 eval trials.
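For linear SVMs, option (a) above can exploit the fact that an average of affine scores equals the score of the averaged SVM. A minimal numpy sketch (all array names and sizes here are illustrative, not part of the spec):

```python
import numpy as np

# Sketch of option (a) for linear SVMs: the score w.x + b is affine in x,
# so averaging the 10 inner-jack scores equals scoring once with the
# averaged weights.

rng = np.random.default_rng(0)
n_inner, dim, n_trials = 10, 5, 4
W = rng.normal(size=(n_inner, dim))      # one weight vector per inner jack
b = rng.normal(size=n_inner)             # one bias per inner jack
X = rng.normal(size=(n_trials, dim))     # trial feature vectors

score_avg = (X @ W.T + b).mean(axis=1)   # average the 10 per-jack scores
w_bar, b_bar = W.mean(axis=0), b.mean()  # or average the SVMs first...
score_fused = X @ w_bar + b_bar          # ...and score once

assert np.allclose(score_avg, score_fused)
```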

Here, 3 is mostly a mechanism to ensure the correctness of all of our software. I'll process the sanity scores in exactly the same way as the eval scores, all the way to a submit file, and since we have the answer key we can check it. If the sanity submit file is OK, then we can assume we went through all the right steps to produce the eval submit file.
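As a quick consistency check, the full bundle contents of items 1-4 can be enumerated in a few lines. This is only a hypothetical helper: the file-name patterns come from the spec above, but the helper name and flat layout are assumptions.

```python
# Hypothetical helper that enumerates the files one score bundle must
# contain (assuming 10 outer jacks).

def expected_files(n_jacks=10):
    files = []
    for i in range(1, n_jacks + 1):
        for j in range(1, n_jacks + 1):
            files.append(f"back.{i}.{j}.wide")   # item 1: 100 files
    for i in range(1, n_jacks + 1):
        files.append(f"test.{i}.wide")           # item 2: dev-test scores
        files.append(f"sanity.{i}.wide")         # item 3: all-jack scores
        files.append(f"eval.{i}.wide")           # item 4: 7530 eval trials
    return files

assert len(expected_files()) == 130              # 100 + 3*10
```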

Finally: SUMMARY OF THE BACKEND (BE) FUNCTION:

- My code will produce 10 separate backends B_i, one for every outer jack i, with 10 separate dev-test results C_i = (Cllr_i, CDET_i), which can be averaged to C = (Cllr, CDET). We'll use C for the final submit decisions.

- For sanity and eval, average over jacks: a_t = (1/10) sum_i B_i(s_it), where s_it are the sanity/eval scores calculated by David and Albert, for every jack i and every sanity/eval trial t.
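The jack-averaging step above can be sketched as follows. The backends B_i are stand-ins here (illustrative affine maps, with made-up shapes); only the averaging structure matters.

```python
import numpy as np

# Sketch of a_t = (1/10) sum_i B_i(s_it): apply each jack's backend to the
# sanity/eval score vectors, then average over jacks i.

n_jacks, n_trials, width = 10, 6, 14
rng = np.random.default_rng(1)
S = rng.normal(size=(n_jacks, n_trials, width))  # s_it per jack i, trial t

# one illustrative backend per jack: B_i(s) = A_i @ s + c_i
A = rng.normal(size=(n_jacks, width, width))
c = rng.normal(size=(n_jacks, width))

per_jack = np.einsum('ikl,itl->itk', A, S) + c[:, None, :]  # B_i(s_it)
a = per_jack.mean(axis=0)                                   # average over i

assert a.shape == (n_trials, width)
```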

- Process the sanity/eval trial scores a_t into NIST submit-file format.

- Evaluate CDET on the sanity submit file as a sanity check.
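For reference, a generic miss/false-alarm detection cost at a fixed threshold can be sketched as below. The weights C_miss = C_fa = 1 and P_target = 0.5 are the usual LRE-style defaults, but the official CDET definition (averaged over target/non-target language pairs) is in the eval plan, so treat this as illustrative only.

```python
# Illustrative detection cost at a fixed threshold; not the official
# CDET recipe, just the basic miss/false-alarm trade-off it is built on.

def cdet(scores_tgt, scores_non, threshold=0.0,
         c_miss=1.0, c_fa=1.0, p_target=0.5):
    p_miss = sum(s < threshold for s in scores_tgt) / len(scores_tgt)
    p_fa = sum(s >= threshold for s in scores_non) / len(scores_non)
    return c_miss * p_target * p_miss + c_fa * (1 - p_target) * p_fa

assert cdet([1.0, 2.0], [-1.0, -2.0]) == 0.0   # perfect separation
assert cdet([-1.0], [1.0]) == 1.0              # everything wrong
```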