Unifying, Understanding, and Utilizing Unstructured Data in Financial Reports
The U4 Task aims at the accurate extraction of information from tables in financial reports (annual securities reports).
It consists of two subtasks: Table Retrieval and Table QA.
Table Retrieval: given a single document, find the table that contains the data needed to answer a question.
Table QA: given a single table, extract the value that answers a question.
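The two subtasks above can be illustrated with a minimal Python sketch. It is not the official data format, baseline, or evaluation code: the Table class, the label-overlap scoring, and the toy figures are all assumptions made for illustration, and the actual file layout is described in the data repository.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Table:
    """Hypothetical table: an ID plus a row label -> column label -> cell mapping."""
    table_id: str
    cells: dict[str, dict[str, str]]


def retrieve_table(tables: list[Table], question: str) -> Table:
    """Table Retrieval (toy version): pick the table whose row and column
    labels appear most often in the question text."""
    lowered = question.lower()

    def score(table: Table) -> int:
        labels = set(table.cells)
        for row in table.cells.values():
            labels.update(row)
        return sum(1 for label in labels if label.lower() in lowered)

    return max(tables, key=score)


def answer_from_table(table: Table, row_label: str, column_label: str) -> str:
    """Table QA (toy version): read the cell addressed by a row and a column label."""
    return table.cells[row_label][column_label]


# Made-up figures for illustration only; they come from no real report.
document = [
    Table("t1", {"Net sales": {"FY2020": "900", "FY2021": "1,000"}}),
    Table("t2", {"Number of employees": {"FY2020": "480", "FY2021": "500"}}),
]
question = "What were the net sales in FY2021?"

best = retrieve_table(document, question)
answer = answer_from_table(best, "Net sales", "FY2021")
print(best.table_id, answer)
```

A real system would have to cope with Japanese text, merged cells, and units, so simple label matching like this is only a starting point for thinking about the two inputs and outputs.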
Number of companies: 100 (those included in the TOPIX 100 index for FY2021)
Number of annual securities reports per company: 1 (those submitted for FY2021)
Total number of annual securities reports: 100 (1 document x 100 companies = 100)
Our dataset was released in July 2024. The procedure for participating is as follows.
NTCIR-18 registration: http://research.nii.ac.jp/ntcir/ntcir-18/howto.html
Create a U4 account (the team name for NTCIR-18 registration is required)
Click "Registration" on the leaderboard below, enter the required information, and submit.
Data download: https://github.com/nlp-for-japanese-securities-reports/ntcir18-u4
Yasutomo Kimura, Otaru University of Commerce, Japan
Eisaku Sato, Otaru University of Commerce, Japan
Kazuma Kadowaki, The Japan Research Institute, Limited, Japan
Hokuto Ototake, Fukuoka University, Japan