Data Formatting

The training data is now available on GitHub. The train.json file provides the Premise-Statement-Evidence information, and the CT json folder contains the full set of complete CTRs, one JSON file per trial.

Train.json 

The test.json file has the same format as train.json, except that it omits the "Label", "Primary_evidence_index", and "Secondary_evidence_index" entries. See below for "Comparison" and "Single" examples.

Comparison Example

    },

    "dbed5471-c2fc-45b5-b26f-430c9fa37a37": {

        "Type": "Comparison",

        "Section_id": "Adverse Events",

        "Primary_id": "NCT00093145",

        "Secondary_id": "NCT00703326",

        "Statement": "Heart-related adverse events were recorded in both the primary trial and the secondary trial.",

        "Label": "Entailment",

        "Primary_evidence_index": [

            0,

            3

        ],

        "Secondary_evidence_index": [

            0,

            7,

            8,

            9,

            10

        ]

    }

Single Example

    },

    "83b83400-1439-462d-bba3-42817b5b1fa1": {

        "Type": "Single",

        "Section_id": "Adverse Events",

        "Primary_id": "NCT00777049",

        "Statement": "Most of the cases of CHF in the primary trial, were in cohort 1.",

        "Label": "Entailment",

        "Primary_evidence_index": [

            0,

            6,

            14,

            20

        ]

    }
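
As a rough illustration of how the two entry types differ, the sketch below parses the two examples above and collects the trial IDs and evidence indexes for each. It assumes train.json is a single JSON object keyed by entry ID, as the fragments suggest. The name `summarize_entries` and the inline `SAMPLE_JSON` string are hypothetical and stand in for reading the real file. The `.get()` calls cover the fields that test.json omits.

```python
import json

# Hypothetical inline sample that mirrors the two entries shown above.
# In practice the dictionary would come from json.load() on train.json.
SAMPLE_JSON = """
{
  "dbed5471-c2fc-45b5-b26f-430c9fa37a37": {
    "Type": "Comparison",
    "Section_id": "Adverse Events",
    "Primary_id": "NCT00093145",
    "Secondary_id": "NCT00703326",
    "Statement": "Heart-related adverse events were recorded in both the primary trial and the secondary trial.",
    "Label": "Entailment",
    "Primary_evidence_index": [0, 3],
    "Secondary_evidence_index": [0, 7, 8, 9, 10]
  },
  "83b83400-1439-462d-bba3-42817b5b1fa1": {
    "Type": "Single",
    "Section_id": "Adverse Events",
    "Primary_id": "NCT00777049",
    "Statement": "Most of the cases of CHF in the primary trial, were in cohort 1.",
    "Label": "Entailment",
    "Primary_evidence_index": [0, 6, 14, 20]
  }
}
"""


def summarize_entries(entries):
    """Return one summary dict per entry (assumed layout: entry ID -> fields)."""
    summaries = []
    for entry_id, entry in entries.items():
        trial_ids = [entry["Primary_id"]]
        # Only "Comparison" entries carry a secondary trial.
        if entry["Type"] == "Comparison":
            trial_ids.append(entry["Secondary_id"])
        summaries.append({
            "entry_id": entry_id,
            "type": entry["Type"],
            "section": entry["Section_id"],
            "trial_ids": trial_ids,
            # These three fields are absent from test.json, hence .get().
            "label": entry.get("Label"),
            "primary_evidence": entry.get("Primary_evidence_index", []),
            "secondary_evidence": entry.get("Secondary_evidence_index", []),
        })
    return summaries


entries = json.loads(SAMPLE_JSON)
for summary in summarize_entries(entries):
    print(summary["type"], summary["trial_ids"], summary["label"])
```
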

CT Json

Each CTR is contained in an individual file named after its ID, e.g. NCT00003199.json. Each file contains 5 entries: "Clinical Trial ID", "Intervention", "Eligibility", "Results", and "Adverse Events". Each section entry holds a list of sentences from the CTR. When retrieving evidence, the indexes are zero-based positions in the list under the relevant section. The example below has been edited for simplicity.

{

    "Clinical Trial ID": "NCT00003199",

    "Intervention": [

        "INTERVENTION 1: ",

        "  TX/Maintenance Therapy for Stage IIIB/IV Breast Cancer",

        "  See Detailed Description.",

        "  tamoxifen citrate: Given orally",

        "  busulfan: Given orally",

        "  thiotepa: Given IV",

        "  melphalan: Given IV",

        "  aldesleukin: Given SC",

        "  sargramostim: Given SC",

        "  peripheral blood stem cell transplantation: Undergo autologous peripheral blood stem cell infusion",

        "  radiation therapy: May undergo radiotherapy after completion of IL-2/GM-CSF"

    ],

    "Eligibility": [

        "Inclusion Criteria:",

        "  Hepatic function: Bilirubin =< 2 mg%; SGOT or SGPT =< 2.5 x institutional normal",

        "  Renal function: Creatinine =< 2.0 mg/dl or a creatinine clearance >= 50 mg/min",

        "  Pre-Study tests have been performed as outlined in the Study Calendar",

        "  Patients will begin IL-2/GM-CSF therapy if they meet the following criteria post transplant:",

        "  Can start therapy 30 to 100 days after transplant",

        "  Karnofsky performance status > 60",

        "  Total bilirubin =< 2.5 x upper limit of normal",

        "  SGOT =< 2.5 x upper limit of normal",

        "  Creatinine =< 2.0 mg/dl",

        "Exclusion Criteria:",

        "  Patients with a Karnofsky Performance Score less than 70",

        "  Patient is pregnant",

        "  Patient is seropositive for the human immunodeficiency virus",

        "  Patients with a history of seizures",

        "  Patients with hypersensitivity to E.coli preparations",

        "  Patients with active auto-immune disease",

        "  Patients with a history of CNS lesion (brain or carcinoid meningitis)",

        "  Patients with significant active infection precluding transplant",

        "  Patients who have had CD34+ selection of their PBSC products",

        "  Patients will not receive IL-2/GM-CSF therapy if they:",

        "  Are > 100 days from transplant",

        "  Have documented disease progression after transplant",

        "  Have an active infection",

        "  Currently have pericardial effusions, pleural effusions or ascites",

        "  Are on steroids",

        "  Currently have a Grade 3 toxicity from BuMelTT",

        "  If the patient does not wish to receive the therapy"

    ],

    "Results": [

        "Outcome Measurement: ",

        "  Event-free Survival",

        "  Event-free survival of patients treated for inflammatory (Stage IIIb) and responsive stage IV breast cancer with BUMELTT and PBSC support and low dose immunotherapy with IL2 and GM-CSF.",

        "  Time frame: 11 years",

        "Results 1: ",

        "  Arm/Group Title: TX/Maintenance Therapy for Stage IIIB/IV Breast Cancer",

        "  Arm/Group Description: See Detailed Description.",

        "  tamoxifen citrate: Given orally",

        "  busulfan: Given orally",

        "  thiotepa: Given IV",

        "  melphalan: Given IV",

        "  aldesleukin: Given SC",

        "  sargramostim: Given SC",

        "  radiation therapy: May undergo radiotherapy after completion of IL-2/GM-CSF",

        "  Overall Number of Participants Analyzed: 50",

        "  Measure Type: Count of Participants",

        "  Unit of Measure: Participants  Stage IIIB Disease: 18 participants",

        "  11  61.1%",

        "  Stage IV Disease: 32 participants",

        "9  28.1%"

    ],

    "Adverse Events": [

        "Adverse Events 1:",

        "  Total: 2/50 (4.00%)",

        "  Prolonged hospitalization for post-transplant complications [1]1/50 (2.00%)",

        "  Pulmonary Emboli [2]1/50 (2.00%)"

    ]

}
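
Evidence retrieval can be sketched as a simple list lookup. The example below is a minimal sketch, assuming that the "Section_id" value of a train.json entry matches a section key in the CTR file (as "Adverse Events" does in the examples above) and that indexes are zero-based. The function names `load_ctr` and `get_evidence`, the `ct_folder` parameter, and the indexes `[0, 3]` are hypothetical and chosen for illustration only. `SAMPLE_CTR` is a cut-down copy of the file shown above.

```python
import json
from pathlib import Path


def load_ctr(ct_folder, trial_id):
    """Load one CTR file, assuming it is stored as <ct_folder>/<trial_id>.json."""
    path = Path(ct_folder) / f"{trial_id}.json"
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def get_evidence(ctr, section_id, indices):
    """Return the sentences at the given positions in one section of a CTR."""
    sentences = ctr[section_id]
    return [sentences[i] for i in indices]


# Cut-down copy of the NCT00003199 example above, used in place of a file read.
SAMPLE_CTR = {
    "Clinical Trial ID": "NCT00003199",
    "Adverse Events": [
        "Adverse Events 1:",
        "  Total: 2/50 (4.00%)",
        "  Prolonged hospitalization for post-transplant complications [1]1/50 (2.00%)",
        "  Pulmonary Emboli [2]1/50 (2.00%)",
    ],
}

# Hypothetical evidence indexes, not taken from train.json.
evidence = get_evidence(SAMPLE_CTR, "Adverse Events", [0, 3])
print(evidence)
```

For a "Comparison" entry, the same lookup would be run twice: once on the primary trial with "Primary_evidence_index" and once on the secondary trial with "Secondary_evidence_index".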