The goal of the Table Retrieval (TR) task is to find the table that contains the data needed to answer a question, given the question and a single document (an annual securities report).
Input: Question (question), annual securities report HTML (doc_id)
Output: Table (table_id)
Evaluation: Accuracy
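As a rough illustration of the metric, the sketch below computes accuracy as the fraction of questions whose predicted table_id exactly matches the gold table_id. This exact-match reading is an assumption, and the function name `accuracy`, the dictionaries, and the entry `question_train2` are hypothetical and not part of the task data.

```python
def accuracy(gold: dict[str, str], predicted: dict[str, str]) -> float:
    """Fraction of gold questions whose predicted table_id matches exactly.

    Questions missing from `predicted` count as wrong (an assumption).
    """
    if not gold:
        raise ValueError("gold must contain at least one question")
    correct = sum(
        1 for qid, table_id in gold.items() if predicted.get(qid) == table_id
    )
    return correct / len(gold)


# Hypothetical gold labels and system output, keyed by question_id.
gold = {
    "question_train1": "S100ITAZ-0000000-tab1",
    "question_train2": "S100ITAZ-0000000-tab2",  # made-up entry
}
predicted = {
    "question_train1": "S100ITAZ-0000000-tab1",
    "question_train2": "S100ITAZ-0000000-tab5",  # wrong table
}

print(accuracy(gold, predicted))  # 0.5
```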
An example from questions_tr.json:
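{
  "question_train1": {
    "question": "大和ハウス工業の2019年の個別のShareholdersEquityにおける「自己株式の処分」を含む表は?",
    "doc_id": "S100ITAZ",
    "table_id": "S100ITAZ-0000000-tab1"
  }
}
In English, the question reads: "Which table contains 'Disposal of treasury stock' in the individual (non-consolidated) ShareholdersEquity of Daiwa House Industry for 2019?"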
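A minimal sketch of reading such a file follows. It parses an inline copy of the excerpt rather than the real questions_tr.json, and the helper `gold_table_ids` is a hypothetical name. Skipping entries without a table_id is an assumption, made in case test entries omit the answer.

```python
import json

# Inline copy of the excerpt; in practice the same structure would be
# loaded from questions_tr.json with json.load().
raw = """
{
  "question_train1": {
    "question": "大和ハウス工業の2019年の個別のShareholdersEquityにおける「自己株式の処分」を含む表は?",
    "doc_id": "S100ITAZ",
    "table_id": "S100ITAZ-0000000-tab1"
  }
}
"""

questions = json.loads(raw)


def gold_table_ids(questions: dict) -> dict[str, str]:
    """Map each question_id to its gold table_id, skipping entries without one."""
    return {
        qid: entry["table_id"]
        for qid, entry in questions.items()
        if "table_id" in entry
    }


for qid, entry in questions.items():
    # Each entry carries the query text and the document to search in.
    print(qid, entry["doc_id"], entry["question"])

print(gold_table_ids(questions))
```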
The template is structured as follows.
question_id (object)
  question_{data type}{sequence number}
  The data type is one of train, valid, or test.
  Within each task the sequence numbers are not contiguous; gaps are left so that question sentences are not duplicated between tasks.
  question: Question sentence (str)
    This is the question sentence used as the query.
    All questions include the company name, period, and item name (search target).
    Some questions also include the individual/consolidated element and the member element, when these are available in the XBRL.
  doc_id (str)
    S100{4 capital letters}
    This is the document number assigned by EDINET.
  table_id (str)
    {doc_id}-{HTML file number (7 digits)}-tab{consecutive number}
    This is an ID assigned by the Task Organizer to each table in the HTML annual securities report.
    It is provided as the "table-id" attribute of the table tag in the HTML.
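The ID formats above can be checked mechanically. The sketch below is illustrative only: the regular expressions simply transcribe the patterns as described here (real EDINET document numbers may not always fit the four-capital-letter pattern), and the helper names and the one-table HTML snippet are hypothetical. It uses only the Python standard library to split the IDs and to collect the "table-id" attributes from table tags.

```python
import re
from html.parser import HTMLParser

# Patterns transcribed from the field descriptions (assumed to be exact).
QUESTION_ID_RE = re.compile(r"question_(train|valid|test)(\d+)")
TABLE_ID_RE = re.compile(r"(S100[A-Z]{4})-(\d{7})-tab(\d+)")


def parse_question_id(question_id: str) -> tuple[str, int]:
    """Split a question_id into (data type, sequence number)."""
    m = QUESTION_ID_RE.fullmatch(question_id)
    if m is None:
        raise ValueError(f"unexpected question_id: {question_id!r}")
    return m.group(1), int(m.group(2))


def parse_table_id(table_id: str) -> tuple[str, str, int]:
    """Split a table_id into (doc_id, HTML file number, table number)."""
    m = TABLE_ID_RE.fullmatch(table_id)
    if m is None:
        raise ValueError(f"unexpected table_id: {table_id!r}")
    return m.group(1), m.group(2), int(m.group(3))


class TableIdCollector(HTMLParser):
    """Collect the table-id attribute of every <table> tag, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.table_ids: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            value = dict(attrs).get("table-id")
            if value is not None:
                self.table_ids.append(value)


# Hypothetical stand-in for one HTML file of an annual securities report.
html = (
    '<html><body><table table-id="S100ITAZ-0000000-tab1">'
    "<tr><td>1</td></tr></table></body></html>"
)
collector = TableIdCollector()
collector.feed(html)

print(parse_question_id("question_train1"))      # ('train', 1)
print(parse_table_id("S100ITAZ-0000000-tab1"))   # ('S100ITAZ', '0000000', 1)
print(collector.table_ids)                       # ['S100ITAZ-0000000-tab1']
```

The list in `collector.table_ids` is the candidate set a retrieval system would rank for a given question and doc_id.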