Conversational Multi-Doc QA
Workshop & International Challenge @ WSDM'24 - Xiaohongshu.Inc
Introduction
Despite progress in chatbots built on large language models, conversational question answering (QA) remains challenging, especially for current or trending topics. A typical solution is to provide relevant documents for the models to reference; however, these documents can also overwhelm or mislead the language models.
We invite you to participate in this conversational QA challenge, which features a mix of relevant and irrelevant documents from Xiaohongshu. Your systems will be trained on real-world data and assessed with criteria covering both lexical and semantic relatedness. The top-3 teams will be awarded prizes of $1500, $1000, and $500 USD, respectively.
Quick start guide
Download the training dataset: we will send you the download link after you fill in the Dataset Download Agreement.
Join the community on Codabench for feedback and discussion about the competition https://www.codabench.org/forums/1691/.
Submit your entry via the competition server https://www.codabench.org/competitions/1772.
Share the prize based on your rank on the Phase 2 (Test set) leaderboard! 🤑😃🥳
Dates
Dec 25, 2023: 🚀 We launch Phase 1 (Eval set) in our Conversational Multi-Doc QA competition.
Feb 1, 2024: Move to Phase 2 (Test set). Rankings achieved in this phase will be used to determine the final award list. Competition registration closes.
Feb 15, 2024 (11:59PM Pacific Time): Phase 2 evaluation server closes.
March 4-8, 2024: Conversational Multi-Doc QA workshop in Mérida, Yucatán, Mexico. Presentations by the top-3 winning teams.
Organizers & Sponsors
Dataset
Download: Please sign the Dataset Download Agreement; we will send you the download link within 2 days.
Format: the training/eval/test data are all given in `json` format; each sample includes the following fields:
uuid: string, a unique identifier for each example
history: list of question-answer pairs (strings), the sequential QA turns preceding the current question
documents: list of strings, at most 5 reference documents
question: string, user question
answer: string, reference answer (not given in eval/test data)
keywords: list of strings, reference keywords that should ideally appear in the answer (not given in the training/eval/test sets)
Example:
```
# Training example.
{
    "uuid": "xxxxx",
    "history": [
        {"question": xxx, "answer": xxx},
        {"question": xxx, "answer": xxx},
        ...
    ],
    "documents": [
        "Jun 17th through Fri the 21st, 2024 at the Seattle Convention Center, Vancouver Convention Center.",
        "Workshops within a “track” will take place in the same room (or be co-located), and workshop organizers will be asked to work closely with others in their track ...",
        ...
    ],
    "question": "Where will CVPR 2024 happen?",
    "answer": "CVPR 2024 will happen at the Seattle Convention Center, Vancouver.",
    "keywords": # Will not be given.
    [
        "Vancouver", "CVPR 2024", "Seattle Convention Center"
    ]
}
```

```
# Submission example for the eval/test phase.
[
    {
        "uuid": "xxxxx",
        "prediction": "CVPR 2024 will happen at the Seattle Convention Center, Vancouver."
    },
    ...
]
```
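For reference, here is a minimal sketch of loading the training file and inspecting one sample, assuming the released data is a single JSON list saved as `train.json` (the actual file name may differ):
```
import json

# Placeholder file name; use the file you receive after signing the agreement.
with open("train.json", encoding="utf-8") as f:
    samples = json.load(f)

sample = samples[0]
print(sample["uuid"])            # unique identifier
print(sample["question"])        # current user question
print(len(sample["documents"]))  # at most 5 reference documents
for turn in sample["history"]:   # preceding QA turns
    print(turn["question"], "->", turn["answer"])
```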
Rules
Your models are required to answer user questions based on the conversational history and the provided reference documents.
Input: History, Reference Documents, Question
Output: Answer.
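The challenge does not prescribe how these inputs are fed to your model; purely as an illustration, one simple way to assemble them into a single prompt might look like the hypothetical template below:
```
def build_prompt(history, documents, question):
    # Concatenate prior QA turns, the reference documents, and the current
    # question into one prompt string. The exact template is up to each team.
    turns = "\n".join(f"Q: {t['question']}\nA: {t['answer']}" for t in history)
    docs = "\n".join(f"[Doc {i + 1}] {d}" for i, d in enumerate(documents))
    return f"{docs}\n\nConversation so far:\n{turns}\n\nQ: {question}\nA:"
```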
Model Scale Requirement: Ensure that your model size does not exceed 14 billion (14B) parameters. The overall solution will be reviewed after the submission deadline.
Q1: What exactly does "model size" refer to? In scenarios involving the use of Mixture of Experts (MoE) or ensemble models, are we considering the total combined size, the size of the largest individual model, or the size of the model actually employed during inference?
A1: The term "model size" refers to the total combined size of all models used in an MoE or ensemble setup, e.g., an 8x7B MoE model is counted as 56B even though it only activates 2x7B parameters at inference time.
Q2: Is there a method to verify the model size?
A2: Yes, you can check the size of your model using the following code snippet:
```
# `all_inference_models()` is a placeholder for your own code that loads every
# model used at inference time (including all experts / ensemble members).
model = all_inference_models()
# Total parameter count, in billions.
model_size = sum(p.numel() for p in model.parameters()) / 10**9
assert model_size <= 14
```
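For an MoE or ensemble setup, the same check can be applied across every model your pipeline loads. The sketch below assumes PyTorch models loaded with Hugging Face `transformers`; the checkpoint names are placeholders, not part of the challenge:
```
from transformers import AutoModelForCausalLM

def total_params_in_billions(models):
    # Sum parameters over every model used anywhere in the inference pipeline.
    return sum(p.numel() for m in models for p in m.parameters()) / 10**9

# Placeholder checkpoint names; replace with whatever your solution actually loads.
models = [
    AutoModelForCausalLM.from_pretrained("your-org/model-a"),
    AutoModelForCausalLM.from_pretrained("your-org/model-b"),
]
assert total_params_in_billions(models) <= 14
```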
Evaluation
Submission:
We use codabench to hold the competition, please refer to https://www.codabench.org/competitions/1772/ for details.
Format: participants should submit their results in `json` format as a list of examples, each of which includes the following fields:
uuid: string, a unique identifier for each test example
prediction: string, your answer
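Here is a minimal sketch of producing the submission file, assuming `answer_question` stands in for your own inference code and that `eval.json` / `submission.json` are placeholder file names:
```
import json

# Load the eval/test samples (placeholder file name).
with open("eval.json", encoding="utf-8") as f:
    samples = json.load(f)

# `answer_question` is a placeholder for your own system.
predictions = [
    {"uuid": s["uuid"],
     "prediction": answer_question(s["history"], s["documents"], s["question"])}
    for s in samples
]

with open("submission.json", "w", encoding="utf-8") as f:
    json.dump(predictions, f, ensure_ascii=False, indent=2)
```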
Criterion
Metrics:
Keywords Recall: whether the answers contain the reference keywords, checked by exact matching (see the keywords field in the example data).
Character-level ROUGE-L: how similar the answers are to the reference answers under fuzzy character-level matching (see the answer field in the example data).
Word-level ROUGE-L: how similar the answers are to the reference answers under fuzzy word-level matching (see the answer field in the example data). (Feb 1, 2024 update)
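The official scoring script is not reproduced here; the snippet below is only a rough, unofficial sketch of what these metrics typically compute (exact tokenization and normalization may differ):
```
def keywords_recall(prediction, keywords):
    # Fraction of reference keywords that appear verbatim in the prediction.
    return sum(kw in prediction for kw in keywords) / max(len(keywords), 1)

def rouge_l_f1(pred_tokens, ref_tokens):
    # ROUGE-L: F-measure based on the longest common subsequence (LCS).
    m, n = len(pred_tokens), len(ref_tokens)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if pred_tokens[i] == ref_tokens[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    lcs = dp[m][n]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / m, lcs / n
    return 2 * precision * recall / (precision + recall)

# Character-level: rouge_l_f1(list(prediction), list(reference))
# Word-level:      rouge_l_f1(prediction.split(), reference.split())
```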
Ranking Procedure:
The overall performance will be determined by examining the mean rank of the above metrics on the Phase 2 (Test set) leaderboard.
In cases where teams have the same mean rank, preference will be given to the team with the higher Word-level ROUGE-L score.
Report Submission Portal
If your team is in the top-3 on the final Phase 2 leaderboard, please send an email to docqa.wsdm24@gmail.com.
Email subject format: “YourName-Submission-WSDM24-MultiDocQA”.
Please include metadata such as your team members, institution, etc.
Attach your technical report and other relevant materials to the email.
Attach your code, checkpoint, and a detailed guide to reproduce the results. Ensure that the model size does not exceed 14B parameters.
Contact
If you have a question about the submission format or if you are still having problems with your submission, please create a topic in the competition forum (rather than contact the organizers directly by e-mail) and we will answer it as soon as possible.
The Conversational Multi-Doc QA competition @ WSDM'24 was created by:
Yan Gao, Lingzhi Li, Chen Zhang, Shiwei Wu (Xiaohongshu.Inc)
Please contact docqa.wsdm24@gmail.com or zhangchen3@xiaohongshu.com or wushiwei@xiaohongshu.com if you have further questions.
WeChat Group link:
Leaderboard
See the competition page for the leaderboard: https://www.codabench.org/competitions/1772/.