Topic: AI-Assisted Machinery Functional Safety Risk Assessment
Over the past several years, I have carried out independent research on applying modern AI techniques, in particular Large Language Models (LLMs), Natural Language Processing (NLP), and Natural Language Understanding (NLU), to machinery functional safety risk assessment.
The research traces its origins to early explorations in 2021-2022, when my colleague and former CEO at innotec GmbH, Mr. Kieviet, introduced me to the idea of employing AI for PLr estimation. It started with the exploration of neural models and chatbot-style interfaces, which were evaluated for assisting machinery safety engineers in determining the required Performance Level (PLr).
Since then, I have spent considerable (free) time on rigorous research, and the work has progressed into a systematic scientific body of work covering dataset construction, model evaluation, prompt-engineering strategies, scenario-based benchmarks, and reproducible methodology design.
With my colleague and current CEO, Mr. Claudio Gregorio, at innotec GmbH (TÜV Austria Group), I am pushing research and development forward on this and related topics (AI + CRA, cybersecurity workflows and risk assessment, etc. - more on this later!).
I am open to collaboration, joint R&D projects, dataset or tool development, and research partnerships that build upon this foundation. Please feel free to get in touch if you find this work interesting or would like to explore collaboration opportunities (or just drop a comment :-)).
This page summarises my main contributions and key publications from 2022 to 2025 on this topic.
The initial phase focused on creating interactive and automated assistants for machinery designers. These prototypes explored whether machine-learning models and AI-based reasoning could support structured risk assessment tasks.
Key publications from this foundational period include:
AI-Based Assistant for Determining the Required Performance Level for a Safety Function, IECON 2022
Padma Iyenghar, Yuxia Hu, Michael Kieviet, Elke Pulvermüller, Juergen Wuebbelmann
(Introduced one of the first chatbot-style PLr assistants for ISO 13849)
A Chatbot Assistant for Reducing Risk in Machinery Design, INDIN 2023
Padma Iyenghar et al.
(Explored conversational risk assessment workflows)
Experimentation on NN Models for Hazard Identification in Machinery Functional Safety, INDIN 2023
Padma Iyenghar et al.
(Evaluated neural network models for hazard identification)
These works established the technical direction and demonstrated the feasibility of AI-assisted safety reasoning.
A major step forward was the creation of a structured dataset for machinery safety scenarios aligned with ISO 12100 Annex B. This dataset supports reproducible PLr classification experiments using deterministic prompting and fixed templates.
GitHub link: https://github.com/piyenghar/hazardscenariosISO12100AnnexB
The dataset is now used in multiple benchmark studies and has become a foundation for systematic evaluation of LLM behavior in functional safety tasks.
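The PLr classification task these experiments target follows the risk graph of ISO 13849-1, which maps severity (S1/S2), frequency or duration of exposure (F1/F2), and possibility of avoidance (P1/P2) to a required Performance Level a through e. As a minimal sketch of that deterministic mapping (function and variable names here are my own illustrative choices, not taken from the dataset or papers):

```python
# Deterministic PLr lookup based on the ISO 13849-1 risk graph.
# The (S, F, P) -> PLr table below reflects the standard's risk graph;
# the function name and error handling are illustrative.

RISK_GRAPH = {
    ("S1", "F1", "P1"): "a",
    ("S1", "F1", "P2"): "b",
    ("S1", "F2", "P1"): "b",
    ("S1", "F2", "P2"): "c",
    ("S2", "F1", "P1"): "c",
    ("S2", "F1", "P2"): "d",
    ("S2", "F2", "P1"): "d",
    ("S2", "F2", "P2"): "e",
}

def required_performance_level(severity: str, frequency: str, avoidance: str) -> str:
    """Return PLr (a..e) for severity S1/S2, exposure F1/F2, avoidance P1/P2."""
    key = (severity, frequency, avoidance)
    if key not in RISK_GRAPH:
        raise ValueError(f"Invalid risk graph parameters: {key}")
    return RISK_GRAPH[key]

# Example: serious injury (S2), frequent exposure (F2), scarcely avoidable (P2)
print(required_performance_level("S2", "F2", "P2"))  # -> e
```

A deterministic reference implementation like this is what makes rule-based evaluation of LLM answers possible: the model's free-text output can be compared against an unambiguous ground truth for each scenario.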
Padma Iyenghar
(Introduced the dataset formally and demonstrated its application in LLM benchmarking)
The research evolved into a detailed investigation of LLM capabilities for PLr determination and machinery safety risk assessment. This includes deterministic prompting (temperature=0), rule-based evaluation, exact/partial/no-match scoring, and multi-model benchmarking (GPT-4+, GPT-5.1-mini, Gemini models, DeepSeek, LLaMA variants).
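One plausible reading of the exact/partial/no-match scoring mentioned above can be sketched as follows; the names, the one-level tolerance for a "partial" match, and the aggregation are illustrative assumptions of mine, not the papers' actual evaluation code:

```python
# Rule-based scoring of a model's PLr prediction against ground truth:
# "exact" if the PLr matches, "partial" if it is within one level on the
# a..e scale, "no_match" otherwise. The tolerance choice is illustrative.
from collections import Counter

PL_LEVELS = ["a", "b", "c", "d", "e"]

def score_plr(predicted: str, expected: str) -> str:
    if predicted not in PL_LEVELS or expected not in PL_LEVELS:
        return "no_match"
    diff = abs(PL_LEVELS.index(predicted) - PL_LEVELS.index(expected))
    if diff == 0:
        return "exact"
    if diff == 1:
        return "partial"
    return "no_match"

# Aggregating over a (toy) benchmark run of (predicted, expected) pairs:
results = [("e", "e"), ("d", "e"), ("b", "e")]
tally = Counter(score_plr(p, g) for p, g in results)
print(tally)  # one exact, one partial, one no_match
```

Combined with deterministic prompting (temperature=0) and fixed templates, such scoring lets the same scenario set be replayed across models (GPT, Gemini, DeepSeek, LLaMA variants) with comparable, reproducible metrics.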
Major publications in this direction include:
Clever Hans in the Loop? A Critical Examination of ChatGPT in a Human-In-The-Loop Framework for Machinery Functional Safety Risk Analysis, MDPI Eng Journal, 2025
Analyses whether LLMs exhibit “Clever Hans” behaviour, producing correct answers for the wrong internal reasons, and evaluates how ChatGPT behaves when a human remains in the loop during ISO 12100 / ISO 13849 risk assessments. The study identifies reliability risks, behavioural shortcuts, and the boundaries of trust in human–LLM collaboration for safety-critical tasks.
Empirical Evaluation of Reasoning LLMs in Machinery Functional Safety Risk Assessment and the Limits of Anthropomorphized Reasoning, MDPI Electronics 2025
(In-depth, large-scale benchmarking study of state-of-the-art reasoning LLMs for risk assessment, with statistical evaluation; published as a feature paper)
Implementation of an AI-Based Expert System for Functional Safety of Machinery, Risk Analysis Journal (2025)
(Discusses prototype development of an expert system for AI-supported risk assessment that keeps all data within the organization)
This research introduces structured datasets for PLr classification, establishes reproducible benchmarking methods, develops deterministic prompting strategies, and provides scenario-level evaluation techniques for analysing LLM behaviour in machinery functional safety. It benchmarks multiple advanced AI models, examines chain-of-thought reasoning and OT security assessment, and contributes to the emergence of AI-assisted functional safety engineering as a recognised research field.
It represents one of the first systematic and reproducible examinations of how LLMs perform on machinery functional safety tasks. The work demonstrates how LLMs can support safety-critical workflows such as PLr classification, structured reasoning, and dataset-driven evaluation, while also identifying the limitations, constraints, and necessary human-in-the-loop safeguards that must be considered when applying these models in real engineering contexts.
This research direction is ongoing, with future work focusing on model reliability, deterministic reasoning, dataset expansion, and deeper integration of LLMs into early-stage safety engineering processes. Potential areas for collaboration include:
Opportunities to expand dataset-driven safety analysis across broader machinery domains and new ISO/IEC standards
Development of AI-assisted tools for hazard identification, PLr estimation, or generating safety documentation
Exploration of hybrid human–AI workflows to enhance consistency and reduce effort in early design stages
Joint benchmarking of next-generation LLMs for structured safety and risk-reasoning tasks
Integration of LLM-based support tools into industrial workflows, digital engineering toolchains, and educational environments
Collaboration on open datasets, reproducible pipelines, and evaluation frameworks for functional safety
Thank you for reading and for taking the time to explore this work.