ALQAC 2026 introduces a single shared task on Vietnamese legal case understanding. Given a short case query derived from a Vietnamese court judgment, participating systems must predict whether the plaintiff or the defendant wins the case. In addition to the final prediction, systems are expected to retrieve supporting evidence from the case-content corpus and relevant legal provisions from the law corpus.
The task is designed to evaluate agentic legal AI systems that can combine case-level factual understanding, legal provision retrieval, evidence grounding, and outcome prediction. Instead of providing the full judgment directly to participants, the organizers expose segmented case content through official APIs. This setting encourages systems to actively search for relevant information, reason over retrieved evidence, and produce a verifiable prediction.
For each test instance, participants are given a short natural-language query describing the dispute. The query includes the main parties, the disputed legal relationship or asset, a brief summary of the plaintiff's claim and the defendant's position, and a question asking whether the plaintiff or the defendant is more likely to win.
Participants must build a system that:
1. Reads the provided case query.
2. Calls the official Case Content API to retrieve relevant case segments.
3. Retrieves relevant legal provisions from the provided law corpus.
4. Predicts the final outcome of the case.
5. Submits the predicted outcome together with supporting case evidence and legal provisions.
The competition contains only one task: Legal Case Outcome Prediction.
The expected prediction is one of four labels:
- A_WIN: The court fully accepts all of the plaintiff's claims.
- PARTIAL_A_WIN: The court partially accepts the plaintiff's claims, and the accepted portion is greater than 50%.
- PARTIAL_B_WIN: The court partially accepts the plaintiff's claims, but the accepted portion is 50% or less.
- B_WIN: The court fully rejects all of the plaintiff's claims.
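The four labels split the court's decision by the share of the plaintiff's claims that were accepted. The sketch below makes the thresholds explicit. It assumes the accepted portion is already available as a fraction between 0 and 1; the task description does not say how that portion is measured. The function name `label_from_accepted_fraction` is hypothetical.

```python
def label_from_accepted_fraction(accepted: float) -> str:
    """Map the fraction of the plaintiff's claims accepted by the court to a label.

    Hypothetical helper: assumes `accepted` is already a number in [0, 1].
    """
    if not 0.0 <= accepted <= 1.0:
        raise ValueError(f"accepted fraction must be in [0, 1], got {accepted}")
    if accepted == 1.0:
        return "A_WIN"          # all claims accepted
    if accepted == 0.0:
        return "B_WIN"          # all claims rejected
    if accepted > 0.5:
        return "PARTIAL_A_WIN"  # partial acceptance, more than 50%
    return "PARTIAL_B_WIN"      # partial acceptance, 50% or less
```

Under this reading, a court that accepts exactly half of the claims falls into `PARTIAL_B_WIN`, because that label covers "50% or less".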
Legal case outcome prediction requires more than simple text classification. A strong system must understand the legal dispute, identify the claims of the parties, retrieve relevant factual evidence from the case record, retrieve applicable legal provisions, and reason about how the court is likely to resolve the dispute.
This task aims to encourage research on:
- Vietnamese legal judgment understanding.
- Retrieval-augmented legal reasoning.
- Agentic interaction with legal APIs.
- Evidence-grounded legal prediction.
- Vietnamese legal corpus retrieval.
- Transparent and verifiable legal AI systems.
The task setting reflects a realistic legal AI scenario: a system receives an initial case description, then must actively retrieve additional information before making a prediction.
Participants will work with two main resources.
***Case Query Input***
Each test case includes a short query generated from a Vietnamese court judgment. The query is intended to simulate the initial information given to a legal AI agent.
Example:
```json
{
  "case_id": "0001",
  "case_query": "Ông Nguyễn Khắc Vũ H1 (nguyên đơn) và Chu Quang Nguyễn H2 (bị đơn) tranh chấp hợp đồng chuyển nhượng quyền sử dụng đất đối với một phần thửa 366. Nguyên đơn yêu cầu được công nhận hợp đồng chuyển nhượng cho diện tích nêu trên. Agent cần dự đoán nguyên đơn thắng kiện hay bị đơn thắng kiện?"
}
```
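In English, the query reads: "Mr. Nguyễn Khắc Vũ H1 (plaintiff) and Chu Quang Nguyễn H2 (defendant) are in a dispute over a contract transferring land use rights for part of land parcel 366. The plaintiff asks the court to recognize the transfer contract for that area. The agent must predict whether the plaintiff or the defendant wins." The query does not reveal the court's reasoning, the final decision, or the winner of the case.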
1. Case Content Corpus
The case content is segmented into smaller chunks and hosted by the organizers. Participants do not receive the full raw judgments directly for the test set. Instead, they must retrieve case segments through the official Case Content API.
Case segments may contain information such as:
Participants are expected to call the API to identify the most relevant case segments for each query.
2. Law Corpus
The law corpus is provided to all participating teams. Teams may build their own retrieval system over this corpus to identify relevant legal provisions.
Each entry in the law corpus may include fields such as the following, where `content` lists the articles of the law, each identified by an `aid`:
```json
{
  "law_id": "001abc",
  "content": [
    {
      "aid": "aaa",
      "content_Article": "Luật này quy định về việc thành lập, tổ chức, hoạt động, kiểm soát đặc biệt, tổ chức lại, giải thể tổ chức tín dụng;... "
    }
  ]
}
```
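Teams choose their own law retrieval method. As one possible baseline, the sketch below indexes every article of the law corpus with Okapi BM25, the same scoring family the Case Content API reports, using only the Python standard library. It assumes the corpus is a list of entries shaped like the example above. Text is split on non-word characters, which for Vietnamese yields syllables rather than words, so a dedicated Vietnamese word segmenter would likely do better. The class name `LawBM25` is hypothetical, and `k1` and `b` are common defaults, not values set by the organizers.

```python
import math
import re
import unicodedata
from collections import Counter


def tokenize(text: str) -> list:
    # NFC-normalize so precomposed Vietnamese letters match, lowercase,
    # then split on non-word characters (yields syllable-level tokens).
    return re.findall(r"\w+", unicodedata.normalize("NFC", text).lower())


class LawBM25:
    """Okapi BM25 index over the articles (law_id, aid) of the law corpus."""

    def __init__(self, corpus: list, k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.keys, self.docs, self.lengths = [], [], []
        for law in corpus:
            for article in law["content"]:
                tokens = tokenize(article["content_Article"])
                self.keys.append((law["law_id"], article["aid"]))
                self.docs.append(Counter(tokens))
                self.lengths.append(len(tokens))
        self.avg_len = sum(self.lengths) / max(len(self.lengths), 1)
        # Document frequency: how many articles contain each term.
        df = Counter(term for doc in self.docs for term in doc)
        n = len(self.docs)
        # BM25 IDF with +1 inside the log so it stays non-negative.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def search(self, query: str, top_k: int = 5) -> list:
        """Return the top_k ((law_id, aid), score) pairs, best first."""
        terms = tokenize(query)
        scored = []
        for key, tf, length in zip(self.keys, self.docs, self.lengths):
            score = 0.0
            for term in terms:
                freq = tf.get(term, 0)
                if freq == 0:
                    continue
                norm = freq + self.k1 * (1 - self.b + self.b * length / self.avg_len)
                score += self.idf[term] * freq * (self.k1 + 1) / norm
            scored.append((key, score))
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]
```

For example, `LawBM25(law_corpus).search(case_query, top_k=10)` returns `(law_id, aid)` keys that can be turned into `law_evidence` items. A lexical baseline like this is easy to debug and can later be combined with a dense retriever.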
Every request to the Case Content API must include your team's secret token in the `X-API-Key` header. This is the same token the organizers issued to your team for leaderboard submissions. Requests without a valid token are rejected with HTTP 403.
An example `X-API-Key` value: `alqac_xxxxxxxxxxxxxxxxxxxxxxxx`
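`POST /retrieve`: retrieve the top-ranked evidence segment for a query within one case.
Request body (JSON): `query` (string), the search text, and `case_id` (string), the case to search within, as in the examples below.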
Response 200 — Top-1 Segment
```json
{
  "results": [
    {
      "score": 0.886,
      "text": "Người có quyền lợi nghĩa vụ liên quan: ...",
      "chunk_id": "case_1087_0037_chunk_2"
    }
  ]
}
```
chunk_id — the segment id to record in your submission's case_evidence.
score — BM25 relevance score (higher is more relevant).
Exactly one segment is returned per call; issue multiple queries to gather more evidence.
Examples
CURL
```bash
curl -X POST https://alqac-api.ngrok.pro/retrieve \
  -H "X-API-Key: $ALQAC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query": "tranh chấp quyền sử dụng đất", "case_id": "case_1087_0037"}'
```
PYTHON
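```python
import requests

# Ask the Case Content API for the single best-matching segment in one case.
resp = requests.post(
    "https://alqac-api.ngrok.pro/retrieve",
    headers={"X-API-Key": "YOUR_TEAM_TOKEN"},  # your team's secret token
    json={
        "query": "tranh chấp quyền sử dụng đất",  # search text ("land use rights dispute")
        "case_id": "case_1087_0037",              # case to search within
    },
    timeout=30,
)
resp.raise_for_status()  # raises on 403 (invalid token), 429 (rate limited), etc.
for hit in resp.json()["results"]:
    # Record hit["chunk_id"] in case_evidence; hit["score"] is the BM25 relevance.
    print(hit["chunk_id"], round(hit["score"], 3), hit["text"][:120])
```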
Rate limits & errors
The API is limited to 1 request every 5 seconds per team. Pace your requests accordingly; requests that exceed the limit are rejected with HTTP 429.
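One way to stay within the limit is to pace calls on the client side and retry only on 429. The sketch below does not depend on a particular HTTP library. `send` is any zero-argument callable that returns `(status_code, payload)`; for example, it could wrap the `requests.post` call shown above and return `(resp.status_code, resp)`. The names `Pacer` and `call_paced` are hypothetical. The task description does not say whether calls rejected with 429 count toward the per-case API budget, so the retry count is kept small.

```python
import time
from typing import Any, Callable, Optional, Tuple


class Pacer:
    """Blocks so that consecutive calls are at least `min_interval` seconds apart."""

    def __init__(self, min_interval: float = 5.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now


def call_paced(pacer: Pacer, send: Callable[[], Tuple[int, Any]],
               max_retries: int = 2) -> Tuple[int, Any]:
    """Send one request through the pacer, retrying (still paced) on HTTP 429."""
    status, payload = 0, None
    for _ in range(max_retries + 1):
        pacer.wait()
        status, payload = send()
        if status != 429:
            break
    return status, payload
```

Because the limit applies per team, all code paths should share a single `Pacer` instance. This sketch is not thread-safe, so parallel workers would need a lock around `wait()`.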
How retrieval affects your score
The number of API calls per case feeds the 20% Penalized Case Recall component: case-evidence recall is multiplied by an API-efficiency factor that gives full credit up to 2·n calls and decays to zero at 5·n (where n is the number of segments in the case). Retrieve thoroughly, but economically — see the scoring rules for the full formula.
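The two breakpoints are fixed (full credit up to 2n calls, zero at 5n), but the shape of the decay between them is not. The sketch below assumes linear decay, so its numbers are illustrative until they are checked against the official scoring rules. The function names are hypothetical.

```python
def api_efficiency(calls: int, n_segments: int) -> float:
    """API-efficiency factor, ASSUMING linear decay between 2n and 5n calls."""
    full, zero = 2 * n_segments, 5 * n_segments
    if calls <= full:
        return 1.0   # within budget: full credit
    if calls >= zero:
        return 0.0   # at or beyond 5n calls: no credit
    return (zero - calls) / (zero - full)


def penalized_case_recall(recall: float, calls: int, n_segments: int) -> float:
    """Case-evidence recall multiplied by the efficiency factor."""
    return recall * api_efficiency(calls, n_segments)
```

With n = 10 segments, 35 calls sit halfway between 20 and 50. Under this assumption, a case recall of 0.8 would therefore be scored as 0.4.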
The public test input will be released as a JSON file.
Example:
```json
[
  {
    "case_id": "0001",
    "case_query": "Ông Nguyễn Khắc Vũ H1 (nguyên đơn) và Chu Quang Nguyễn H2 (bị đơn) tranh chấp hợp đồng chuyển nhượng quyền sử dụng đất đối với một phần thửa 366. Nguyên đơn yêu cầu được công nhận hợp đồng chuyển nhượng cho diện tích nêu trên. Agent cần dự đoán nguyên đơn thắng kiện hay bị đơn thắng kiện"
  },
  {
    "case_id": "0002",
    "case_query": "..."
  }
]
```
Field descriptions:
| Field | Type | Description |
|---|---|---|
| `case_id` | string | Public identifier of the test case. |
| `case_query` | string | Short natural-language description of the dispute and the prediction question. |
The input file will not include the gold verdict, court reasoning, court decision, or gold evidence.
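A small loader can catch format problems early. The sketch below parses the input and checks only the two documented fields. Read the file as UTF-8, because the queries contain Vietnamese diacritics, for example `with open("public_test.json", encoding="utf-8") as f: cases = load_test_input(f.read())`. The file name and the function name are hypothetical.

```python
import json


def load_test_input(text: str) -> list:
    """Parse the public test input and check the documented fields."""
    cases = json.loads(text)
    if not isinstance(cases, list):
        raise ValueError("test input must be a JSON list")
    seen = set()
    for i, case in enumerate(cases):
        if not isinstance(case, dict):
            raise ValueError(f"item {i}: expected a JSON object")
        for field in ("case_id", "case_query"):
            if not isinstance(case.get(field), str):
                raise ValueError(f"item {i}: '{field}' must be a string")
        if case["case_id"] in seen:
            raise ValueError(f"item {i}: duplicate case_id {case['case_id']}")
        seen.add(case["case_id"])
    return cases
```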
1. Submission Format
Each team must submit a single JSON file named: submission.json
The submission must contain a list of predictions, one object per test case.
Example:
```json
[
  {
    "case_id": "0001",
    "prediction": "A_WIN",
    "law_evidence": [
      { "law_id": "47/2010/QH12", "aid": 270 },
      { "law_id": "47/2010/QH12", "aid": 271 },
      { "law_id": "91/2015/QH13", "aid": 357 }
    ]
  }
]
```
a. Required Fields
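Each item in `law_evidence` must follow the format used in the example above: an object with a `law_id` (the law identifier) and an `aid` (the article identifier within that law).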
b. Prediction Labels
The `prediction` field must be one of the four labels defined for the task: `A_WIN`, `PARTIAL_A_WIN`, `PARTIAL_B_WIN`, or `B_WIN`.
If a case contains multiple claims, teams should focus on the main claim described in the `case_query`.
2. Evaluation
This section defines the official evaluation metrics for the Legal Case Outcome Prediction task.
The final score consists of three components:
- Outcome Accuracy: whether the system correctly predicts the winning side.
- Penalized Case Evidence Recall: whether the system retrieves the correct case-content evidence, with a penalty for excessive API calls.
- Micro Law Evidence F1: whether the system retrieves the correct legal provisions from the law corpus.
The final score is defined as:
2.1 Definitions
2.2 Outcome Accuracy
This component rewards systems that correctly predict whether the plaintiff or the defendant wins the case.
2.3 Case Evidence Recall
For each case, the system submits a set of case-content evidence segments.
This component measures how many gold case evidence segments are successfully retrieved by the system.
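Read at face value, recall for one case is the share of gold segments, identified by `chunk_id`, that appear in the submitted case evidence. The sketch below assumes evidence is compared as sets of `chunk_id` strings, so duplicates do not count twice. How a case with no gold segments is scored is not specified; returning 0.0 is a convention chosen here. The function name is hypothetical.

```python
def case_evidence_recall(predicted: list, gold: list) -> float:
    """Fraction of gold chunk_ids that appear among the predicted chunk_ids."""
    gold_set = set(gold)
    if not gold_set:
        return 0.0  # convention chosen for this sketch; not specified by the rules
    return len(gold_set & set(predicted)) / len(gold_set)
```

For example, predicting `["c_1", "c_2", "c_9", "c_2"]` against gold `["c_1", "c_2", "c_3"]` recovers 2 of 3 gold segments. The extra `c_9` does not lower recall; it only costs API calls.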
2.4 API Efficiency Penalty
The API call budget is case-dependent. Larger cases have more segments, so they are allowed more API calls.
2.5 Penalized Case Evidence Recall
2.6 Micro Law Evidence F1
Law evidence is evaluated using micro-averaged F1 over the full test set.
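Micro-averaging pools true positives, false positives, and false negatives across all cases before computing precision and recall. As a result, cases with many gold provisions weigh more than they would in a per-case (macro) average. The sketch below assumes each provision is matched as a hashable key such as a `(law_id, aid)` tuple. The organizers' exact matching key is not stated, and the function name is hypothetical.

```python
from typing import Dict, Hashable, Iterable


def micro_f1(predicted: Dict[str, Iterable[Hashable]],
             gold: Dict[str, Iterable[Hashable]]) -> float:
    """Micro-averaged F1: pool TP/FP/FN over all cases, then compute P, R, F1."""
    tp = fp = fn = 0
    for case_id in set(gold) | set(predicted):
        pred = set(predicted.get(case_id, ()))
        ref = set(gold.get(case_id, ()))
        tp += len(pred & ref)   # correct provisions
        fp += len(pred - ref)   # predicted but not gold
        fn += len(ref - pred)   # gold but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

If case `0001` has gold `{(47/2010/QH12, 270), (91/2015/QH13, 357)}` and the system predicts articles 270 and 271 of 47/2010/QH12, while case `0002` is predicted perfectly with one provision, the pooled counts are TP = 2, FP = 1, FN = 1. The resulting F1 is 2/3.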
2.7 Full Final Score Formula
2.8 Example
2.9 Practical Interpretation
The metric is designed to reward systems that:
a. Predict the correct outcome.
b. Retrieve the correct case evidence.
c. Retrieve the relevant legal provisions.
d. Use the Case Content API efficiently.
3. Submission Validation
A submission may be rejected or partially ignored if it violates the required format.
The organizers may validate the following conditions:
- Every test case has exactly one submitted prediction.
- Every `case_id` exists in the official test set.
- There are no duplicate `case_id`s.
- The `prediction` value is one of the valid labels (`A_WIN`, `PARTIAL_A_WIN`, `PARTIAL_B_WIN`, `B_WIN`).
- `law_evidence` is a list of valid legal provision identifiers from the law corpus.
- The JSON file is valid and can be parsed automatically.
Duplicate evidence items may be automatically deduplicated before scoring.
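Checking these conditions locally before uploading can help avoid wasting one of the limited daily submissions. The sketch below covers the listed checks, assuming the test-set `case_id`s and the corpus `law_id`s are available as sets. It accepts both `law_evidence` shapes that appear in the examples: `{"law_id": ..., "aid": ...}` objects and bare `law_id` strings. It does not flag duplicate evidence, since the organizers may deduplicate it. The label set assumes the four labels from the label list; narrow it if only `A_WIN` and `B_WIN` are accepted. All names are hypothetical.

```python
import json
from collections import Counter

# Assumption: the four labels from the label list.
VALID_LABELS = {"A_WIN", "PARTIAL_A_WIN", "PARTIAL_B_WIN", "B_WIN"}


def validate_submission(text: str, test_ids: set, valid_law_ids: set) -> list:
    """Return a list of problems found in a submission.json string (empty if none)."""
    try:
        items = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(items, list):
        return ["top level must be a JSON list"]
    errors = []
    counts = Counter(item.get("case_id") for item in items if isinstance(item, dict))
    for case_id, count in counts.items():
        if count > 1:
            errors.append(f"duplicate case_id {case_id}")
        if case_id not in test_ids:
            errors.append(f"unknown case_id {case_id}")
    for case_id in sorted(test_ids - set(counts)):
        errors.append(f"missing prediction for {case_id}")
    for item in items:
        if not isinstance(item, dict):
            errors.append("every item must be a JSON object")
            continue
        case_id = item.get("case_id")
        if item.get("prediction") not in VALID_LABELS:
            errors.append(f"{case_id}: invalid prediction {item.get('prediction')!r}")
        evidence = item.get("law_evidence", [])
        if not isinstance(evidence, list):
            errors.append(f"{case_id}: law_evidence must be a list")
            continue
        for entry in evidence:
            # Accept an object with a law_id or a bare law_id string.
            law_id = entry.get("law_id") if isinstance(entry, dict) else entry
            if law_id not in valid_law_ids:
                errors.append(f"{case_id}: unknown law_id {law_id!r}")
    return errors
```

A trailing comma such as the one after `"001aac",` makes `json.loads` fail, so the validator reports it as invalid JSON before any other check runs.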
***Example Submission***
```json
[
  {
    "case_id": "0001",
    "prediction": "A_WIN",
    "law_evidence": [
      "001abc",
      "001aac"
    ]
  },
  {
    "case_id": "0002",
    "prediction": "B_WIN",
    "law_evidence": [
      "001abb",
      "001aba"
    ]
  }
]
```
1. System Requirements and Reproducibility
Participating teams are encouraged to submit a short technical report describing their method, including:
- Retrieval strategy for the Case Content API.
- Retrieval strategy for the law corpus.
- Reasoning and prediction method.
- Models and tools used.
- Prompting or agent design, if applicable.
- Post-processing and validation steps.
The organizers may request source code, configuration files, or logs for verification and reproducibility.
2. Notes for Participants
- The case query is not sufficient to solve the task reliably. Systems should retrieve additional case segments through the official API.
- The law corpus is provided separately and should be used to retrieve relevant legal provisions.
- The `explanation` field is optional and is not the principal scoring component, but it may be used for qualitative analysis.
- The official ranking is based on the final score defined above.
- Participants should ensure that their submission file strictly follows the required JSON format.
3. Model Limitations
- Proprietary Models Prohibited: The use of closed or proprietary systems, including but not limited to ChatGPT, GPT-4, Claude, Gemini, or any other non-open API-based models, is strictly prohibited.
- Model Size Limitation: To ensure fairness and accessibility, and to encourage resource-conscious approaches, only open-weight models with fewer than 10 billion parameters are allowed. This levels the playing field for teams with limited computational resources.
- External Dataset Limitations: While querying online legal databases is permitted, the use of externally annotated datasets specifically created for legal question answering or legal entailment (e.g., pre-labeled QA pairs or entailment examples) is explicitly not allowed.
- Submission Limits: To prevent system overload and encourage deliberate optimization, teams are strictly limited to a maximum of 3 submissions per day on the official leaderboard.
Any results or submissions obtained in violation of these rules will be disregarded in the final team ranking.