Table Retrieval

Task Overview

The goal of the Table Retrieval (TR) task is to find the table that contains the data needed to answer a question, given the question and a single document (an annual securities report).

Task Definition

Input: Question (question), annual securities report HTML (doc_id)
Output: Table (table_id)
Evaluation: Accuracy
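
Accuracy can be read here as the fraction of questions whose predicted table_id exactly matches the gold table_id. The following is a minimal sketch under that reading, not the organizers' official scorer. The function name tr_accuracy, the dict layout keyed by question ID, and the second question entry are illustrative assumptions, as is the choice to count a missing prediction as incorrect.

```python
def tr_accuracy(gold: dict[str, str], pred: dict[str, str]) -> float:
    """Fraction of gold questions whose predicted table_id matches exactly.

    Assumption: a question with no prediction counts as incorrect.
    """
    if not gold:
        raise ValueError("gold must contain at least one question")
    correct = sum(1 for qid, table_id in gold.items() if pred.get(qid) == table_id)
    return correct / len(gold)


gold = {
    "question_train1": "S100ITAZ-0000000-tab1",
    "question_train2": "S100ITAZ-0000000-tab2",  # hypothetical entry
}
pred = {
    "question_train1": "S100ITAZ-0000000-tab1",
    "question_train2": "S100ITAZ-0000000-tab5",  # hypothetical wrong guess
}
print(tr_accuracy(gold, pred))  # one of two correct -> 0.5
```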

Dataset

An example from questions_tr.json:

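{
  "question_train1": {
    "question": "大和ハウス工業の2019年の個別のShareholdersEquityにおける「自己株式の処分」を含む表は？",
    "doc_id": "S100ITAZ",
    "table_id": "S100ITAZ-0000000-tab1"
  }
}

In English, the question reads: "Which table contains 'Disposal of treasury shares' in Daiwa House Industry's 2019 non-consolidated ShareholdersEquity?"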
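
To make the record layout concrete, the sketch below parses an excerpt shaped like the example above and walks over its entries. It assumes the file is a single JSON object keyed by question ID. The inline string stands in for the contents of questions_tr.json, and the helper name iter_questions is illustrative.

```python
import json
from typing import Iterator

# Inline stand-in for the file contents. With the real file, read it with
# open("questions_tr.json", encoding="utf-8") and json.load instead.
SAMPLE = """
{
  "question_train1": {
    "question": "大和ハウス工業の2019年の個別のShareholdersEquityにおける「自己株式の処分」を含む表は？",
    "doc_id": "S100ITAZ",
    "table_id": "S100ITAZ-0000000-tab1"
  }
}
"""


def iter_questions(data: dict) -> Iterator[tuple[str, str, str, str]]:
    """Yield (question_id, question, doc_id, table_id) for each record."""
    for qid, rec in data.items():
        yield qid, rec["question"], rec["doc_id"], rec["table_id"]


records = list(iter_questions(json.loads(SAMPLE)))
for qid, question, doc_id, table_id in records:
    print(qid, doc_id, table_id)
```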

Each entry is structured as follows.

question_id (object)
- question_{data type}{sequence number}
- The data type is one of train, valid, or test.
- Sequence numbers are not consecutive within each task; this prevents question sentences from being duplicated across tasks.
question: Question sentence (str)
- This is the question sentence used as the query.
- All questions include the company name, period, and item name (search target).
- Some questions also include the individual (non-consolidated) or consolidated distinction and the member element, when these are available in the XBRL.
doc_id (str)
- S100{4 capital letters}
- This is the document number assigned by EDINET.
table_id (str)
- {doc_id}-{HTML file number (7 digits)}-tab{consecutive number}
- This is an ID assigned by the task organizers to each table in the HTML annual securities report.
- It is provided as the "table-id" attribute of the table tag in the HTML.
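
The ID formats above can be checked mechanically, and the candidate tables of a report can be listed by reading the "table-id" attribute. The sketch below uses only the Python standard library. The regular expression follows the stated description literally (S100 plus four capital letters); if IDs in the data contain other characters, such as digits, the character class would need widening. The names parse_table_id and TableIdCollector and the HTML fragment are illustrative assumptions.

```python
import re
from html.parser import HTMLParser

# Pattern transcribed from the format description:
# {doc_id}-{HTML file number (7 digits)}-tab{consecutive number}
TABLE_ID_RE = re.compile(r"(S100[A-Z]{4})-(\d{7})-tab(\d+)")


def parse_table_id(table_id: str) -> tuple[str, str, int]:
    """Split a table_id into (doc_id, HTML file number, table number)."""
    m = TABLE_ID_RE.fullmatch(table_id)
    if m is None:
        raise ValueError(f"unexpected table_id format: {table_id!r}")
    doc_id, file_no, tab_no = m.groups()
    return doc_id, file_no, int(tab_no)


class TableIdCollector(HTMLParser):
    """Collect the "table-id" attribute of every <table> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.table_ids: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            table_id = dict(attrs).get("table-id")
            if table_id is not None:
                self.table_ids.append(table_id)


# Hypothetical HTML fragment; real reports are far larger.
html = '<table table-id="S100ITAZ-0000000-tab1"><tr><td>x</td></tr></table>'
collector = TableIdCollector()
collector.feed(html)
print(collector.table_ids)
print(parse_table_id(collector.table_ids[0]))
```

The collected IDs form the candidate set for a given doc_id, so a retrieval system only has to choose among them.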
