Submission Format
Participants in the SqCLIR shared task must follow the standard TREC ad hoc submission format. Each submission file contains the ranked results for the corresponding set of topics, one result per line. Each line must consist of six whitespace-separated fields, in the following order:
Query ID – Derived from the spoken query file name.
The constant string “Q0”.
Document ID – Retrieved from the document collection.
Rank – The rank assigned to the document.
Score – An integer or floating-point value assigned by the retrieval system.
Run ID – Described in more detail below.
Please note the following specific requirements:
Field 1 (Query ID) should match the spoken query file name.
All results for each query must appear consecutively within the file.
Field 3 (Document ID) corresponds to the ID used in the document collection.
The scores in Field 5 must be listed in non-increasing order for each query. If multiple documents have the same score, the TREC evaluation software will order them alphabetically by document ID, rather than by the order they appear in the file.
Fields 2 and 4 are ignored by the evaluation software but must be present so that each line contains the required six fields.
An example of a correctly formatted submission is shown below:
1 Q0 DOC1 1 2.73 en-team1-ADBT-run1
1 Q0 DOC2 2 2.71 en-team1-ADBT-run1
1 Q0 DOC3 3 2.61 en-team1-ADBT-run1
1 Q0 DOC4 4 2.05 en-team1-ADBT-run1
1 Q0 DOC5 5 1.89 en-team1-ADBT-run1
At most 1,000 ranked results may be submitted per query. Submissions exceeding this limit will be truncated to the top 1,000 results per query.
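These rules can be checked mechanically before submission. Below is a minimal Python sketch of such a check (the function name and the choice to raise on the first violation are our own; official validation may behave differently):

import sys

def validate_run(path):
    # Tracks, per query, how many results have been seen and the last score,
    # so the grouping, ordering, and size rules stated above can be enforced.
    seen = {}          # query_id -> [result_count, last_score]
    finished = set()   # queries whose block of results has already ended
    with open(path) as f:
        for n, line in enumerate(f, start=1):
            fields = line.split()
            if len(fields) != 6:
                raise ValueError(f"line {n}: expected 6 fields, got {len(fields)}")
            qid, q0, doc_id, rank, score, run_id = fields
            if q0 != "Q0":
                raise ValueError(f"line {n}: field 2 must be the constant string Q0")
            score = float(score)
            if qid in finished:
                raise ValueError(f"line {n}: results for query {qid} are not consecutive")
            if qid not in seen:
                finished.update(seen)  # a new query block starts; close all earlier ones
                seen[qid] = [0, float("inf")]
            if score > seen[qid][1]:
                raise ValueError(f"line {n}: scores for query {qid} are not non-increasing")
            seen[qid][0] += 1
            seen[qid][1] = score
            if seen[qid][0] > 1000:
                raise ValueError(f"line {n}: more than 1,000 results for query {qid}")

if __name__ == "__main__":
    validate_run(sys.argv[1])
    print("run file passes the format checks")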
Run ID
Run IDs must start with a string identifying the document collection, chosen from the following options:
en- for English FIRE collection documents
hi- for Hindi FIRE collection documents
bn- for Bengali FIRE collection documents
gu- for Gujarati FIRE collection documents
The second field of the Run ID must be your registered team name, followed by a dash. The final field is a free-form run name, as shown in the examples below.
Additionally, the Run ID may include an optional third field, composed of characters describing the retrieval characteristics of the run. The characters represent the following aspects:
A: Automatic run (no human intervention)
M: Manual run (with human intervention)
T: Translated documents (using machine translation)
N: Native language documents (without translation)
K: Transcribed queries (using speech-to-text models)
E: English queries
H: Hindi queries
G: Gujarati queries
B: Bengali queries
S: Sparse retrieval
D: Dense retrieval
L: Learned sparse retrieval
H: Hybrid retrieval (combination of dense, sparse, or learned sparse methods)
For example:
en-TEAM3-ADEKT-MyTestRun: Represents an automatic run using dense retrieval on machine-translated Bengali documents, with queries transcribed from spoken English.
gu-TEAM3-AGD-MyTestRun: Represents an automatic run using dense retrieval on Gujarati documents with Gujarati queries.
gu-TEAM3-MyTestRun: Represents a run on the Gujarati collection with no retrieval characteristics indicated.
The characteristics field in the Run ID is optional, but if included it will be used to categorize runs in the final track results. Participants are encouraged to include it for proper categorization.
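As an illustration, a compliant Run ID can be assembled from its parts as in the following sketch (the helper name and argument names are ours, not part of the task specification):

def make_run_id(collection, team, run_name, characteristics=""):
    # collection: one of the prefixes listed above ("en", "hi", "bn", "gu")
    # characteristics: optional string of characteristic letters, e.g. "ADEKT"
    assert collection in {"en", "hi", "bn", "gu"}, "unknown collection prefix"
    parts = [collection, team]
    if characteristics:
        parts.append(characteristics)
    parts.append(run_name)
    return "-".join(parts)

# Reproduces the examples above:
print(make_run_id("en", "TEAM3", "MyTestRun", "ADEKT"))  # en-TEAM3-ADEKT-MyTestRun
print(make_run_id("gu", "TEAM3", "MyTestRun"))           # gu-TEAM3-MyTestRun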
Evaluation
For both tasks, retrieval results will be evaluated using the standard Mean Average Precision (MAP) metric for ranking quality, as well as Mean Reciprocal Rank (MRR), Recall@100, and Recall@1000.
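For reference, each of these measures reduces to a simple per-query computation averaged over all topics. The sketch below illustrates the per-query values in Python (our own illustration of the standard definitions; official scores will be produced by the TREC evaluation software):

def average_precision(ranked_docs, relevant):
    # Sum of the precision values at each rank where a relevant document appears,
    # divided by the total number of relevant documents for the query.
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

def reciprocal_rank(ranked_docs, relevant):
    # 1 / rank of the first relevant document, or 0 if none is retrieved.
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def recall_at_k(ranked_docs, relevant, k):
    # Fraction of the relevant documents retrieved in the top k results.
    return len(set(ranked_docs[:k]) & relevant) / len(relevant) if relevant else 0.0

# MAP, MRR, and Recall@k for a run are the means of these values over all queries.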