ClinicalBERT Named Entity Recognition (NER) Module
(Free, optional Python utility for extracting medical terms from text claim documents)
TA MedMal is an analysis system designed to interpret, internalise, and understand medical malpractice documents. It includes its own evolving medicolegal vocabulary, which grows through a learn‑by‑doing approach and relies on its internal entity‑detection routines to expand over time.
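In addition to its own tools, TA MedMal can make use of publicly available expert resources. One such resource is ClinicalBERT, a BERT-based language model adapted to clinical text by academic researchers (the underlying BERT architecture originated at Google). The example code further down loads a pre-trained biomedical NER model from the Hugging Face Hub (d4data/biomedical-ner-all). Because this is an external add‑on rather than a built‑in component of TA MedMal, the Python code used to run it is provided below for demonstration and experimentation only.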
A future version of TA MedMal may offer a free trial that can optionally make use of this output. This is not currently available.
How the Extracted Terms Are Used
The module highlights candidate medical terms found in claim documents. In the TA MedMal model these candidates are intended for human review, not automatic acceptance.
In other words:
The module suggests terms it finds in the text
A human reviewer decides which of those terms are meaningful
Only the terms that are approved are incorporated into the structured data
This keeps the structured data accurate and prevents irrelevant or misleading items from being introduced. The module assigns each candidate a provisional type, but nothing is accepted automatically and there is no behind‑the‑scenes decision‑making.
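The suggest, review, approve flow can be sketched in a few lines of Python. This is only an illustration, not part of TA MedMal: the "approved" column, the Y/N convention, and the sample rows are all assumptions made for the example.

```python
import csv
import io

# Hypothetical review sheet: the module's suggestions plus an "approved"
# column (an assumed name) filled in by a human reviewer.
REVIEWED_CSV = """ID,label,span,approved
C001,condition,sepsis,Y
C001,anatomy,the,N
C002,procedure,laparotomy,y
C002,condition,pain,
"""

def approved_terms(csv_text):
    """Return only the rows a reviewer explicitly marked as approved."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row for row in reader
        if (row.get("approved") or "").strip().upper() == "Y"
    ]

accepted = approved_terms(REVIEWED_CSV)
for row in accepted:
    print(row["ID"], row["span"])
```

Rows left blank are treated as not approved, so a term only reaches the structured data after an explicit decision.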
Downloadable ClinicalBERT NER Module
All downloadable files are provided in plain‑text form. You can open and inspect everything before running it. There are no executables, installers, or compiled components — just simple, transparent Python code.
Expected Input Format
Your CSV file must contain at least these two columns (the example script reads the file with cp1252 encoding):
ID — unique identifier for each claim
Text — the free‑text content to analyse
Example input file path: c:/data/TAM/Input/TA_Master.csv
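Before running the model it may be worth checking that a file meets these requirements. The following is a minimal sketch using only the standard library; the function name check_input and the sample data are hypothetical, and the duplicate-ID check simply follows from the requirement that each ID be unique.

```python
import csv
import io

REQUIRED_COLUMNS = ("ID", "Text")

def check_input(csv_file):
    """Return a list of problems found in an open CSV file (empty if none)."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return [f"missing column: {c}" for c in missing]
    problems = []
    seen = set()
    for row_no, row in enumerate(reader, start=1):  # counts data rows only
        claim_id = (row["ID"] or "").strip()
        if not claim_id:
            problems.append(f"row {row_no}: empty ID")
        elif claim_id in seen:
            problems.append(f"row {row_no}: duplicate ID {claim_id}")
        seen.add(claim_id)
        if not (row["Text"] or "").strip():
            problems.append(f"row {row_no}: empty Text")
    return problems

# In-memory sample; the second row repeats an ID and has no text.
sample = io.StringIO("ID,Text\nC001,Patient developed sepsis.\nC001,\n")
problems = check_input(sample)
print(problems)
```

For a real file, the same function could be called with open(path, newline="", encoding="cp1252"), matching the encoding the example script uses.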
Output Format
The module produces a CSV file listing:
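The claim ID (column ID)
The term found (column span)
The type of term, e.g. condition, anatomy, procedure (column label)
Character offsets of the term within the text (columns start and end)
Confidence score between 0 and 1 assigned by the model (column confidence)
Document length in characters (column doc_char_length)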
This output is designed for manual review, so you can decide which terms are relevant.
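During review it can help to see each term in context. The sketch below assumes the offsets behave like Python slice indexes into the original Text value (start inclusive, end exclusive); the helper name context_snippet and the sample sentence are made up for illustration.

```python
def context_snippet(text, start, end, width=30):
    """Return the matched term in brackets with up to `width` characters either side."""
    left = text[max(0, start - width):start]
    right = text[end:end + width]
    return f"{left}[{text[start:end]}]{right}"

# Made-up sentence; in practice start and end would come from the output CSV.
text = "The patient developed sepsis after the laparotomy."
start = text.index("sepsis")
end = start + len("sepsis")
snippet = context_snippet(text, start, end, width=10)
print(snippet)  # developed [sepsis] after the
```

If the bracketed text does not match the span column for a row, the input text has probably changed since the module was run.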
Installation & Setup
Install Python 3.10+
Install dependencies:
pip install transformers pandas torch
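Place your input CSV in the folder the script reads from (c:/data/TAM/Input/ in the example code) and make sure the output folder (c:/data/TAM/BertOutput/) exists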
Run the script from the command line
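A quick way to confirm the setup before the first run is to check the Python version and whether the three packages can be found. This is an optional sketch using only the standard library; the function name missing_packages is hypothetical.

```python
import importlib.util
import sys

REQUIRED = ("transformers", "pandas", "torch")

def missing_packages(names=REQUIRED):
    """Return the names of packages that cannot be found for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]

python_ok = sys.version_info >= (3, 10)
missing = missing_packages()
print("Python 3.10+:", "yes" if python_ok else "no")
print("Missing packages:", ", ".join(missing) or "none")
```

Any package listed as missing can be installed with the pip command shown in the setup steps.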
Important Notes
This module is optional and experimental.
It is provided as-is, with no support, no warranty, and no guarantee of future compatibility.
A future free trial may support importing this output, but this is not currently available.
Extracted terms are suggestions only and should be reviewed by a human before being used.
All downloadable files are plain‑text, fully inspectable, and easy to audit.
Best‑Endeavours Disclaimer
This module is provided free of charge, without warranty or support. It is intended for experimentation and evaluation only. Use is entirely at your own discretion and risk.
Example Code (included for download)
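import sys

import pandas as pd
from transformers import pipeline

# Force UTF-8 console output so non-ASCII terms print cleanly on Windows.
sys.stdout.reconfigure(encoding="utf-8")

# Load the token-classification (NER) pipeline; "simple" aggregation merges
# sub-word tokens into whole-entity spans.
pipe = pipeline(
    task="token-classification",
    model="d4data/biomedical-ner-all",
    aggregation_strategy="simple",
)

df = pd.read_csv("c:/data/TAM/Input/TA_Master.csv", encoding="cp1252")

rows = []
for n, (_, row) in enumerate(df.iterrows(), start=1):
    # Treat a missing Text value as empty so len() does not fail on NaN.
    text = "" if pd.isna(row["Text"]) else str(row["Text"])
    doc_char_length = len(text)
    print(f"Processing row {n} of {len(df)} (ID: {row['ID']})")
    # Skip the model call for blank text; there is nothing to extract.
    preds = pipe(text) if text.strip() else []
    for p in preds:
        rows.append({
            "ID": row["ID"],
            "label": p["entity_group"],
            "span": p["word"],
            "start": p["start"],
            "end": p["end"],
            "confidence": float(p["score"]),
            "doc_char_length": doc_char_length,
        })

# Fix the column order so the CSV keeps its header even when no terms are found.
columns = ["ID", "label", "span", "start", "end", "confidence", "doc_char_length"]
out = pd.DataFrame(rows, columns=columns)
out.to_csv("c:/data/TAM/BertOutput/bert_full_output.csv", index=False)
print("Done!")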
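Once the script has finished, the same term often appears in many rows. One possible way to shorten the review list is to group suggestions by type and lower-cased term. The sketch below is not part of the downloadable module; the function name summarise and the sample rows are illustrative, and only the column names come from the script's output.

```python
import csv
import io
from collections import defaultdict

# Tiny stand-in for bert_full_output.csv, using the same column names.
SAMPLE_OUTPUT = """ID,label,span,start,end,confidence,doc_char_length
C001,condition,sepsis,22,28,0.91,50
C002,condition,Sepsis,5,11,0.84,120
C002,procedure,laparotomy,40,50,0.77,120
"""

def summarise(csv_file):
    """Group suggestions by (label, lower-cased term) for review."""
    groups = defaultdict(lambda: {"count": 0, "max_confidence": 0.0, "ids": set()})
    for row in csv.DictReader(csv_file):
        key = (row["label"], row["span"].strip().lower())
        g = groups[key]
        g["count"] += 1
        g["max_confidence"] = max(g["max_confidence"], float(row["confidence"]))
        g["ids"].add(row["ID"])
    # Most frequent terms first, then alphabetical for a stable order.
    return sorted(groups.items(), key=lambda kv: (-kv[1]["count"], kv[0]))

summary = summarise(io.StringIO(SAMPLE_OUTPUT))
for (label, term), stats in summary:
    print(label, term, stats["count"], stats["max_confidence"], len(stats["ids"]))
```

A reviewer can then approve or reject each distinct term once instead of once per occurrence; whether case-insensitive grouping is appropriate depends on the vocabulary being curated.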