The goal of the Table QA (TQA) task is to extract the value that answers a question, given the question and a single table from a securities report.
Input: a question (question) and a table from an annual securities report HTML file (table_id)
Output: Value (value) and cell (cell_id)
Evaluation: Accuracy
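Accuracy can be read as the fraction of questions answered correctly. The following is a minimal sketch, not the official scorer: it assumes exact string match on one field and predictions keyed by question ID in the same shape as questions_tqa.json. The helper name accuracy and the second entry (question_train3) are made up for illustration.

```python
def accuracy(gold: dict, pred: dict, field: str = "value") -> float:
    """Fraction of gold questions whose predicted `field` matches exactly.

    Assumption: exact string equality; the official evaluation may differ.
    """
    if not gold:
        return 0.0
    correct = sum(
        1
        for qid, answer in gold.items()
        # A missing prediction counts as wrong.
        if pred.get(qid, {}).get(field) == answer[field]
    )
    return correct / len(gold)


# Illustrative data: question_train3 is invented for this sketch.
gold = {
    "question_train2": {"cell_id": "S100ITAZ-0000000-tab1-r1c1", "value": "1033000000000"},
    "question_train3": {"cell_id": "S100ITAZ-0000000-tab1-r2c1", "value": "5"},
}
pred = {
    "question_train2": {"cell_id": "S100ITAZ-0000000-tab1-r1c1", "value": "1033000000000"},
    "question_train3": {"cell_id": "S100ITAZ-0000000-tab1-r2c2", "value": "5"},
}

print(accuracy(gold, pred, "value"))    # both values match
print(accuracy(gold, pred, "cell_id"))  # one of two cells matches
```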
An example from questions_tqa.json:
{
  "question_train2": {
    "question": "大和ハウス工業の2019年の個別のShareholdersEquityにおける「自己株式の処分」を示すセルは?",
    "table_id": "S100ITAZ-0000000-tab1",
    "cell_id": "S100ITAZ-0000000-tab1-r1c1",
    "value": "1033000000000"
  }
}
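A file in this shape can be read with a standard JSON parser. The sketch below parses one entry written as strict JSON and walks its fields; iter_questions is a hypothetical helper, and the use of .get() for cell_id and value rests on the assumption that some entries may not carry answers.

```python
import json

# One entry in the shape shown above, written as strict JSON.
raw = """
{
  "question_train2": {
    "question": "大和ハウス工業の2019年の個別のShareholdersEquityにおける「自己株式の処分」を示すセルは?",
    "table_id": "S100ITAZ-0000000-tab1",
    "cell_id": "S100ITAZ-0000000-tab1-r1c1",
    "value": "1033000000000"
  }
}
"""


def iter_questions(data: dict):
    """Yield (question_id, question, table_id, cell_id, value) per entry."""
    for qid, entry in data.items():
        # Assumption: cell_id/value may be absent, so fall back to None.
        yield (
            qid,
            entry["question"],
            entry["table_id"],
            entry.get("cell_id"),
            entry.get("value"),
        )


# For the real file, use json.load() on questions_tqa.json opened as UTF-8.
questions = json.loads(raw)
for qid, question, table_id, cell_id, value in iter_questions(questions):
    print(qid, table_id, cell_id, value)
```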
Each entry has the following structure.
question_id (object)
  question_{data type}{sequence number}
  The data type is one of train, valid, or test.
  Sequence numbers are not consecutive within a task; this prevents question texts from being duplicated between tasks.
table_id (str)
  {doc_id}-{HTML file number (7 digits)}-tab{serial number}
  This is an ID assigned by the Task Organizer to each table in the HTML files.
  It is provided as the "table-id" attribute of the table tag in the HTML.
cell_id (str)
  {table_id}-r{row number}c{column number}
  This is an ID assigned by the Task Organizer to each cell in the HTML files.
  It is provided as the "cell-id" attribute of the th and td tags in the HTML.
value (str)
  This is the raw value obtained from the XBRL report.
  For numeric values the unit is taken into account, so note that the value will not exactly match the string contained in the HTML cell.
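The ID formats above can be split with regular expressions, and a cell can be located through its "cell-id" attribute. The following is a sketch using only the Python standard library. The helper names, the HTML snippet, and its cell texts are invented for illustration, and the code assumes cells do not contain nested tables.

```python
import re
from html.parser import HTMLParser

QUESTION_ID = re.compile(r"^question_(train|valid|test)(\d+)$")
CELL_ID = re.compile(
    r"^(?P<table_id>(?P<doc_id>.+)-(?P<file_no>\d{7})-tab(?P<tab>\d+))"
    r"-r(?P<row>\d+)c(?P<col>\d+)$"
)


def parse_question_id(question_id: str) -> tuple:
    """Split 'question_train2' into ('train', 2)."""
    m = QUESTION_ID.match(question_id)
    if m is None:
        raise ValueError(f"unexpected question_id: {question_id}")
    return m.group(1), int(m.group(2))


def parse_cell_id(cell_id: str) -> dict:
    """Split a cell_id into table_id, doc_id, file number, table, row, column."""
    m = CELL_ID.match(cell_id)
    if m is None:
        raise ValueError(f"unexpected cell_id: {cell_id}")
    parts = m.groupdict()
    for key in ("tab", "row", "col"):
        parts[key] = int(parts[key])
    return parts


class CellTextFinder(HTMLParser):
    """Collect the text inside the th/td whose cell-id attribute matches."""

    def __init__(self, cell_id: str):
        super().__init__()
        self.cell_id = cell_id
        self.inside = False
        self.found = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td") and dict(attrs).get("cell-id") == self.cell_id:
            self.inside = True
            self.found = True

    def handle_endtag(self, tag):
        # Assumes no nested tables inside the target cell.
        if self.inside and tag in ("th", "td"):
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.chunks.append(data)


def cell_text(html: str, cell_id: str):
    """Return the stripped text of the matching cell, or None if not found."""
    finder = CellTextFinder(cell_id)
    finder.feed(html)
    finder.close()
    return "".join(finder.chunks).strip() if finder.found else None


# Invented snippet; real report HTML and cell contents will differ.
html = (
    '<table table-id="S100ITAZ-0000000-tab1">'
    '<tr><th cell-id="S100ITAZ-0000000-tab1-r0c0">item</th>'
    '<th cell-id="S100ITAZ-0000000-tab1-r0c1">amount</th></tr>'
    '<tr><td cell-id="S100ITAZ-0000000-tab1-r1c0">label</td>'
    '<td cell-id="S100ITAZ-0000000-tab1-r1c1"><p>1,033</p></td></tr>'
    "</table>"
)

print(parse_question_id("question_train2"))
print(parse_cell_id("S100ITAZ-0000000-tab1-r1c1"))
print(cell_text(html, "S100ITAZ-0000000-tab1-r1c1"))
```

Because value takes the unit into account, the text returned by cell_text should not be compared with value directly.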