Conal Brown of ABB suggests using text analytics and natural language processing (Technical Language Processing) to analyse and categorise engineering inspection records (which his company has boatloads of). He describes these pieces of his puzzle:
1. Analysis and categorisation of inspection records
Use of text analytics and natural language processing techniques to understand contents of datasets
(Probably) based on clustering algorithms and weakly supervised or unsupervised deep neural networks
2. Extraction of useful knowledge from inspection records
Test hypotheses, identify patterns and extract useful knowledge from the datasets (i.e. confirm the significance and relationships/correlations of the features, events and processes that the literature review identified as influencing the system integrity)
Potentially combine 'black box' neural methods with domain knowledge
3. Application of knowledge to engineering design and management contexts to support decision-making
Refine the methodology to include uncertainty quantification to show the limits/boundaries of the knowledge
Apply to industrial examples
Develop tool/methodology/framework (depends on successful research outcome)
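As a hedged illustration of step 1, a first pass at categorising free-text inspection records could be as simple as TF-IDF vectors plus k-means clustering. This is only a sketch: scikit-learn is assumed to be available, and the example records below are invented, not from ABB's database.

```python
# Minimal sketch of phase 1: clustering free-text inspection records.
# Assumes scikit-learn; the records are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

records = [
    "External corrosion and wall loss observed at pipe weld",
    "Severe pitting corrosion near weld seam, further wall loss likely",
    "Vibration noted at pump bearing, amplitude within limits",
    "Pump bearing vibration increased since last inspection",
]

# Turn each record into a sparse vector of TF-IDF word weights.
vectors = TfidfVectorizer(stop_words="english").fit_transform(records)

# Group the records into two clusters.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

# The two corrosion records should share a label, as should the two
# vibration records.
print(labels)
```

In practice a weakly supervised deep model would replace both the bag-of-words representation and the clusterer, but the pipeline shape (vectorise, then group, then inspect the groups) is the same.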
Conal says:
The Document Abstraction Markup language idea is an interesting one. A reason I've suggested analysing ABB's database with NLP is the lack of structure in the reports. For inspection reports/asset health data there are some international standards that could be used as a starting point (https://en.wikipedia.org/wiki/ISO_15926, potentially https://en.wikipedia.org/wiki/IEC_62264). I also found a book chapter that is very relevant (Ontology-Based Knowledge Platform to Support Equipment Health in Plant Operations <https://link-springer-com.manchester.idm.oclc.org/chapter/10.1007/978-3-319-15326-1_8>). However, the chapter only has 3 citations, so it hasn't led to much! There is a balance between forcing too much structure and not having enough. Perhaps there is scope for a hybrid approach where a markup language can be combined with AI/ML techniques? A markup language provides the basic structure but the AI/ML can <<Scott says: did you mean "and...can" or "but...can't"?>> identify meaning within that structure. This would be similar to what is being termed scientific machine learning, where ML techniques are used to fit the parameters of a known physical model (e.g. an ODE model) to real-world data (https://sciml.ai/).
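The scientific machine learning idea Conal mentions can be made concrete with a toy fit: the parameter of a known physical model recovered from noisy data. A hedged sketch, assuming NumPy/SciPy and entirely synthetic data:

```python
# Toy version of the SciML idea: fit the rate constant k of the known
# physical model dx/dt = -k*x (solution x = exp(-k*t) for x(0) = 1)
# to noisy synthetic observations.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
t = np.linspace(0, 5, 30)
true_k = 0.8
data = np.exp(-true_k * t) + rng.normal(0, 0.02, t.size)  # noisy "measurements"

def model(t, k):
    # Closed-form solution of the ODE; in a real SciML setting this would
    # be a numerical ODE solve.
    return np.exp(-k * t)

(k_fit,), _ = curve_fit(model, t, data, p0=[1.0])
print(k_fit)  # should land close to the true value 0.8
```

In the hybrid approach Conal suggests, the markup plays the role of the known model and the ML fills in the free parameters (the meaning within the structure).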
Of the other things of yours I've been reading, what most resonated with my thoughts around the inspection database is that it has many of the features of 'bad data' https://sites.google.com/site/crappydata/home, resulting in uncertainty.
The project/studentship on communicating uncertainty to humans is an interesting opportunity, and it is encouraging to see that engineering practice is specifically mentioned in the project overview. It bears directly on inspection.
The API 581 industry standard on Risk-based inspection defines inspection effectiveness as:
The ability of the inspection activity to reduce the uncertainty in the damage state of the equipment or component. Inspection effectiveness categories are used to reduce uncertainty in the models for calculating the probability of failure.
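That definition reads naturally as a Bayes update: the inspection is a noisy observation of the damage state, and its effectiveness is how sharply it concentrates the distribution. The sketch below is illustrative only, not API 581's actual procedure; the states, prior, and effectiveness numbers are all invented.

```python
# Illustrative (not API 581's actual) calculation of how an inspection
# reduces uncertainty in a damage state, via a simple Bayes update.
prior = {"minor": 0.5, "moderate": 0.3, "severe": 0.2}

def posterior(prior, reported, effectiveness):
    # Model: with probability `effectiveness` the inspection reports the
    # true state; otherwise it reports another state uniformly at random.
    like = {s: effectiveness if s == reported
            else (1 - effectiveness) / (len(prior) - 1) for s in prior}
    z = sum(like[s] * prior[s] for s in prior)
    return {s: like[s] * prior[s] / z for s in prior}

# A highly effective inspection reporting "severe" concentrates the
# distribution far more than a poor one does.
print(posterior(prior, "severe", 0.9))
print(posterior(prior, "severe", 0.4))
```

The inspection effectiveness categories in the standard play the role of the `effectiveness` parameter here: a better category means a sharper likelihood and hence a smaller residual uncertainty.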
Communication of inspection findings would be a factor in their effectiveness. <<I think it was the CEO of GE or HP or something who said "I wish I knew what we know">> I can also see a broader range of application around work ABB is doing in Asset Health / Asset Performance Management tools. We are developing a lot of dashboards / risk indicators that reduce risk to an N×N matrix, a single dimensionless number, or, worse, a traffic light. Although this may be necessary to aggregate a lot of data, is it the best way to communicate the true picture of what is known and what isn't?
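Conal's worry about dashboards can be made concrete with a toy reduction: quite different risk pictures collapse into the same traffic-light colour once probability and consequence are binned into a matrix cell, and the uncertainty around either number disappears entirely. All the numbers and thresholds below are invented.

```python
# Toy dashboard reduction: probability and consequence (both on 0-1)
# are binned into a 3x3 matrix, then into a traffic-light colour.
# Thresholds are invented; the point is what the reduction throws away.
def traffic_light(prob, consequence):
    row = min(int(prob * 3), 2)          # which probability band
    col = min(int(consequence * 3), 2)   # which consequence band
    score = row + col
    return "green" if score <= 1 else "amber" if score <= 2 else "red"

# Two rather different risk pictures land on the same colour, and any
# uncertainty in the inputs is not represented at all.
print(traffic_light(0.45, 0.40))
print(traffic_light(0.34, 0.66))
```

Whatever replaces the traffic light would need to carry at least an interval or a second number for uncertainty, which connects this thread back to the communicating-uncertainty studentship above.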
Scott had said:
As you know, I got excited by our original correspondence, and I have put together some loose leaf topics that seem to be related into a Google Site. See https://sites.google.com/site/davmarkup/inspection-records which is a placeholder for your project should it be allowed to flower. Use the navigation links on the left of the screen to see the larger context I see for it. It may be a bit crazy, and it is definitely “pre-decisional” as they say. I need to involve some colleagues in computer science to see whether it already exists. I am pretty sure that it is possible, but it would be a serious research effort to show that it is useful, friendly and “light-weight” enough to be widely adopted. I would certainly be interested in your thoughts.
Some notes from Conal's first supervisory meeting:
A proper view of numbers of the future must include respect for numbers of the past
Taking serious account of annotations (C. Brown)
Quantitatively interpreting numeric hedges and linguistic shields
Recovering their implicit uncertainties from measurement protocols
Assessing uncertainties from associated or relevant validation studies
Inferring uncertainties from naked numbers (A. Shlyakhter)
Advanced bias correction manuscript
Not excluding or neglecting values that are difficult to pin down (C. Brown & N. Gray)
Remembering original units and measurement conditions
The “not excluding” bullet refers to developing interval and other imprecise methods so we don’t have to throw away or neglect data with missing or censored values. Conal does it with time ranges in linear regressions, and Nick does it with burn data in logistic regression.
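The interval idea is easy to sketch: represent a censored or hedged value as a range rather than discarding it, and propagate the ranges through the summary statistic. A minimal illustration with invented data (the real methods, e.g. interval regression on time ranges, are of course more involved):

```python
# "Not excluding" sketch: represent censored or hedged values as
# (lower, upper) intervals instead of dropping them, then bound the mean.
# The data are invented for illustration.
data = [
    (4.2, 4.2),   # an exact measurement
    (0.0, 2.0),   # "below detection limit of 2" (left-censored)
    (5.0, 7.0),   # "somewhere between 5 and 7"  (a hedged value)
]

# Bounds on the sample mean: every value at its lower end, then its upper.
lo = sum(a for a, _ in data) / len(data)
hi = sum(b for _, b in data) / len(data)
print([round(lo, 2), round(hi, 2)])  # the mean lies somewhere in this interval
```

Dropping the two imprecise values would leave a "mean" of 4.2 computed from a single number; the interval keeps all three observations and is honest about what they do and do not pin down.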
Considerations
Data accessibility (commercial, proprietary & privacy issues; access issues; data structure issues; ethics)
Analyses (data extractions, statistical summaries, content analysis, sentiment analysis, decision analysis)
Tracking (data provenance, Statshow, etc.)
Compatibility (professional standards; industry and business practices; ISO standards)
What is the thing you are making?
What problem(s) are you solving?
Who is the audience?
Software architecture
Links
https://sites.google.com/view/conalbrown/home
https://www.nist.gov/el/technical-language-processing-community-interest