This subtask aims to extract agricultural management types and indicators from tabular data that is represented in various ways in PDF files, and to convert that information into a unified format.
■ Input Data
The input files are as follows:
Agricultural technical documents: Documents in PDF and Excel formats published by each local government
Task description document: A document that describes, for each prefecture, the data to be extracted and the expected format and values of the structured output
■ Output Data
The outputs are JSON documents in a unified format that describe management types and management indexes.
Management types: Information that specifies the crops to be cultivated and their scale, and summarizes the resulting income and expenditures. Extract the required information from the input data and structure it into the following four unified formats:
Premise: List of crops to be cultivated and the prerequisites for simulations, such as target income and labor hours
Cultivation scale: List of crops and their cultivated areas
Profits and losses: Income and expenditure items that form the basis for achieving the set target income
Capital equipment: List of equipment and materials required for cultivation, with their depreciation costs
Management indexes: Information on the labor force, land, machinery, equipment, materials, etc., and the cultivation schedule required for each crop. Extract the required information from the input data and structure it into the following three unified formats:
Profits and losses: Income and expenditure items per unit of cultivated area
Work technologies: Technologies, equipment, and materials used to cultivate each crop
Work schedule: Annual schedule of each work task and the time required for each task
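The sections above are named only in prose here; the actual JSON schema is defined in the task description document. The following is a minimal sketch of how one unified output document might be organized, assuming one object per output group and one list of records per section. All key names (`management_type`, `premise`, `work_schedule`, and so on) and the helpers `empty_document` and `missing_sections` are hypothetical and are not the official format.

```python
import json

# Hypothetical section keys: the task names the sections only in prose,
# so these identifiers are illustrative, not the official schema.
MANAGEMENT_TYPE_SECTIONS = (
    "premise",             # crops and simulation prerequisites (target income, labor hours)
    "cultivation_scale",   # crops and cultivated areas
    "profits_and_losses",  # income and expenditure items behind the target income
    "capital_equipment",   # equipment/materials and their depreciation costs
)
MANAGEMENT_INDEX_SECTIONS = (
    "profits_and_losses",  # income and expenditure items per unit of cultivated area
    "work_technologies",   # technologies, equipment and materials per crop
    "work_schedule",       # annual schedule and time required for each task
)


def empty_document() -> dict:
    """Return an empty unified-format skeleton with one record list per section."""
    return {
        "management_type": {key: [] for key in MANAGEMENT_TYPE_SECTIONS},
        "management_index": {key: [] for key in MANAGEMENT_INDEX_SECTIONS},
    }


def missing_sections(doc: dict) -> list:
    """List 'group/section' paths that are absent from an output document."""
    missing = []
    for group, keys in (("management_type", MANAGEMENT_TYPE_SECTIONS),
                        ("management_index", MANAGEMENT_INDEX_SECTIONS)):
        present = doc.get(group, {})
        missing.extend(f"{group}/{key}" for key in keys if key not in present)
    return missing


if __name__ == "__main__":
    doc = empty_document()
    # ensure_ascii=False keeps Japanese crop or item names readable in the output.
    print(json.dumps(doc, indent=2, ensure_ascii=False))
    print(missing_sections(doc))  # → []
```

A check such as `missing_sections` can catch structurally incomplete outputs before any value-level comparison is made.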
■ Evaluations
Precision, recall, and F1 score computed over the management types and management indicators in the unified format
TBA
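The matching criteria for the evaluation are still to be announced, so the following is only a sketch of one common way to compute these metrics. It assumes that predicted and gold JSON documents are flattened into (path, value) pairs and compared by exact match, counting duplicates as a multiset, and that precision = matched / predicted, recall = matched / gold, and F1 = 2PR / (P + R). The helper names `flatten` and `prf1` are hypothetical.

```python
from collections import Counter


def flatten(obj, path=""):
    """Yield (path, value) pairs for every scalar leaf of a nested JSON value.

    List elements share the path suffix "[]", so item order does not matter.
    """
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from flatten(value, f"{path}/{key}")
    elif isinstance(obj, list):
        for value in obj:
            yield from flatten(value, f"{path}[]")
    else:
        yield (path, obj)


def prf1(predicted, gold):
    """Micro-averaged precision, recall and F1 over exact (path, value) matches."""
    pred = Counter(flatten(predicted))
    ref = Counter(flatten(gold))
    tp = sum((pred & ref).values())  # multiset intersection = matched pairs
    precision = tp / sum(pred.values()) if pred else 0.0
    recall = tp / sum(ref.values()) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy values for illustration only; not taken from any task data.
    gold = {"premise": [{"crop": "crop A", "area": 100},
                        {"crop": "crop B", "area": 20}]}
    pred = {"premise": [{"crop": "crop A", "area": 100},
                        {"crop": "crop B", "area": 30}]}
    print(prf1(pred, gold))  # → (0.75, 0.75, 0.75)
```

Because list elements are flattened without their record boundaries, a value placed in the wrong record still counts as a match; a stricter scheme would first align predicted records with gold records (for example by crop name) and then compare field by field.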