Project names are extracted from the text files in the following ways:
1. Using an NLP tool:
2. Reading the words between the following keywords:
3. Taking the words that start with a capital letter and appear together with the word 'project'
4. If nothing is found, taking the sentence parts that contain the word 'project'
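The last two steps in the list above can be sketched roughly as follows. This is only an illustration, not the code that produced the results: the regular expressions and the function names (`capitalized_project_names`, `project_sentence_parts`, `fallback_extract`) are assumptions, and steps 1 and 2 are left out because the NLP tool and the keywords are not specified here.

```python
import re

# Assumed pattern: a run of capitalized words directly followed by "project".
CAPITALIZED_RUN = re.compile(r"\b(?:[A-Z][\w-]*\s+)+(?:[Pp]roject|PROJECT)\b")
# Assumed sentence boundary: punctuation or a line break.
SENTENCE_SPLIT = re.compile(r"[.!?\n]+")


def capitalized_project_names(text):
    """Step 3: capitalized words that come with the word 'project'."""
    # Collapse internal whitespace so names spanning several spaces compare equal.
    return [" ".join(m.group(0).split()) for m in CAPITALIZED_RUN.finditer(text)]


def project_sentence_parts(text):
    """Step 4: sentence parts that contain the word 'project'."""
    parts = (p.strip() for p in SENTENCE_SPLIT.split(text))
    return [p for p in parts if re.search(r"\bproject\b", p, re.IGNORECASE)]


def fallback_extract(text):
    """Try step 3 first; fall back to step 4 only if it finds nothing."""
    names = capitalized_project_names(text)
    return names if names else project_sentence_parts(text)


print(fallback_extract("Drilling at the West Musgrave Project continued."))
print(fallback_extract("the project was abandoned. Nothing else here."))
```

The first call returns the capitalized name, while the second has no capitalized run in front of 'project' and so falls back to the sentence part.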
The code was run on 33,299 files to extract the project names. The process took roughly 1.6 hours and produced a 7,561 KB file holding the project names.
Files containing 4 or fewer characters are marked as empty.
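A minimal sketch of the empty-file rule, assuming the characters are counted on the raw file text with whitespace included (the original code may count differently). The function name `is_empty` and the parameter `max_chars` are hypothetical.

```python
def is_empty(text: str, max_chars: int = 4) -> bool:
    """Return True when the text has no more than `max_chars` characters."""
    # Assumption: whitespace counts as characters; nothing is stripped first.
    return len(text) <= max_chars


print(is_empty("abcd"))   # True
print(is_empty("abcde"))  # False
```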
Duration of process: Start: 17/01/2018 11:17:28 AM, End: 17/01/2018 12:55:59 PM
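The elapsed time can be checked from the two timestamps above. This is a small sketch assuming the dates are in day/month/year order (the only reading under which 17/01 is valid) with a 12-hour clock.

```python
from datetime import datetime

# Assumed timestamp layout: day/month/year, 12-hour clock with AM/PM.
FMT = "%d/%m/%Y %I:%M:%S %p"

start = datetime.strptime("17/01/2018 11:17:28 AM", FMT)
end = datetime.strptime("17/01/2018 12:55:59 PM", FMT)

elapsed = end - start
print(elapsed)  # 1:38:31
```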
Project names for the Files in the Data Folder
[
{
"FileID": 0,
"FileName": "a070455_partial surrender report e59-907_9771170.json",
"Empty": false,
"Projects": [
"Windimurrra Project",
"Windimurra Project"
],
"ProjectMentions": []
},
{
"FileID": 1,
"FileName": "a070682_a70682_9855451.json",
"Empty": false,
"Projects": [],
"ProjectMentions": [
"LIST OF FIGURES Project"
]
},
{
"FileID": 2,
"FileName": "a071009_plac04081_report_v1.0_doir_12722043.json",
"Empty": false,
"Projects": [],
"ProjectMentions": [
"David Gibbons Project"
]
},
{
"FileID": 3,
"FileName": "a071053_e16_263partialsurrender_17405691.json",
"Empty": false,
"Projects": [],
"ProjectMentions": [
"Callion Project"
]
},
{
"FileID": 4,
"FileName": "a071095_a71095_14949032.json",
"Empty": true,
"Projects": null,
"ProjectMentions": null
},
{
"FileID": 5,
"FileName": "a071096_a71096_14966153.json",
"Empty": true,
"Projects": null,
"ProjectMentions": null
},
{
"FileID": 6,
"FileName": "a071105_a71105_15072417.json",
"Empty": false,
"Projects": [],
"ProjectMentions": [
"Find project",
"LENNON FIND PROJECT",
"Jabiru Metals Ltd Lennons Find Project",
"Section Lennons Find Project"
]
},
...
{
"FileID": 9,
"FileName": "a071565_pan_2005a_18059306.json",
"Empty": false,
"Projects": [
"Panorama CopperZinc Project",
"Panorama Project",
"CBH-owned Sulphur Springs Development Project",
"Sipas Panorama Exploration Project"
],
"ProjectMentions": []
},
{
"FileID": 10,
"FileName": "a071575_e51_1077_2005s_15695127.json",
"Empty": false,
"Projects": [
"Gabanintha"
],
"ProjectMentions": []
},
...
{
"FileID": 33821,
"FileName": "a97631_a097631_c139_2012_2012an_15812058.json",
"Empty": false,
"Projects": [
"West Musgrave Project"
],
"ProjectMentions": []
},
{
"FileID": 33822,
"FileName": "a97631_a97631_appendixb.json",
"Empty": false,
"Projects": [],
"ProjectMentions": [
"West Musgraves Project",
"E S L T D West Musgraves Project"
]
}
]
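The output above can be read back and summarized along these lines. This is a hedged sketch: the inline `sample` reuses three records shown above in place of the real output file, and the `summarize` function and its category names are illustrative assumptions.

```python
import json

# Three records copied from the listing above, standing in for the full file.
sample = """
[
  {"FileID": 3, "FileName": "a071053_e16_263partialsurrender_17405691.json",
   "Empty": false, "Projects": [], "ProjectMentions": ["Callion Project"]},
  {"FileID": 4, "FileName": "a071095_a71095_14949032.json",
   "Empty": true, "Projects": null, "ProjectMentions": null},
  {"FileID": 10, "FileName": "a071575_e51_1077_2005s_15695127.json",
   "Empty": false, "Projects": ["Gabanintha"], "ProjectMentions": []}
]
"""


def summarize(records):
    """Count empty files, files with project names, and files with mentions only."""
    # Empty files carry null (None) lists, so truthiness checks handle them safely.
    return {
        "total": len(records),
        "empty": sum(1 for r in records if r["Empty"]),
        "with_projects": sum(1 for r in records if r["Projects"]),
        "mentions_only": sum(
            1 for r in records if not r["Projects"] and r["ProjectMentions"]
        ),
    }


records = json.loads(sample)
print(summarize(records))
```

For the real output, `json.load` on the saved file would replace the inline sample.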