Sample Code

ClinicalBERT Named Entity Recognition (NER) Module

(Free, optional Python utility for extracting medical terms from free‑text claim documents)

TA MedMal is an analysis system designed to interpret, internalise, and understand medical malpractice documents. It includes its own evolving medicolegal vocabulary, which grows through a learn‑by‑doing approach and relies on its internal entity‑detection routines to expand over time.

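In addition to its own tools, TA MedMal can make use of publicly available expert resources. One such resource is ClinicalBERT, a publicly available clinical language model built on BERT, the language model originally developed by Google. Because this is an external add‑on rather than a built‑in component of TA MedMal, the Python code used to run it is provided below for demonstration and experimentation only. The example script on this page loads the d4data/biomedical-ner-all model through the Hugging Face transformers library.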

A future version of TA MedMal may offer a free trial that can optionally make use of this output. This is not currently available.

How the Extracted Terms Are Used

The module highlights candidate medical terms found in claim documents. In the TA MedMal model these candidates are intended for human review, not automatic acceptance.

In other words:

The module suggests terms it finds in the text
A human reviewer decides which of those terms are meaningful
Only the terms that are approved are incorporated into the structured data

This review step keeps irrelevant or misleading items out of the structured data. No suggested term is accepted automatically, and no decisions are made behind the scenes.
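The suggest, review, approve flow described above can be sketched in a few lines. This is a minimal illustration only: the "approved" flag and the sample records are hypothetical and are not something the module itself produces.

```python
# Hypothetical review step: the "approved" flag is an assumption made for
# illustration, not a column written by the module.
def approved_terms(candidates):
    """Return only the candidate terms a human reviewer has explicitly approved."""
    return [c for c in candidates if c.get("approved") is True]


# Made-up candidates, as they might look after a reviewer has marked some of them.
candidates = [
    {"ID": "C001", "span": "sepsis", "label": "condition", "approved": True},
    {"ID": "C001", "span": "the patient", "label": "condition", "approved": False},
    {"ID": "C002", "span": "appendectomy", "label": "procedure"},  # not yet reviewed
]

structured = approved_terms(candidates)
print([c["span"] for c in structured])
```

Terms that were rejected or not yet reviewed are both left out, so nothing reaches the structured data without an explicit approval.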

Downloadable ClinicalBERT NER Module

All downloadable files are provided in plain‑text form. You can open and inspect everything before running it. There are no executables, installers, or compiled components — just simple, transparent Python code.

Expected Input Format

Your CSV file must contain at least the following two columns:

ID — unique identifier for each claim
Text — the free‑text content to analyse

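Example input file path: c:/data/TAM/Input/TA_Master.csv

The example script reads this file using the cp1252 (Windows‑1252) encoding. If your CSV is saved in a different encoding, such as UTF‑8, change the encoding argument in the script to match.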
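Before running the module, it can help to confirm that a file meets these requirements. The sketch below uses only the Python standard library; the function name check_input and the sample data are made up for illustration, and the duplicate-ID check is an assumption drawn from the "unique identifier" requirement.

```python
import csv
import io

REQUIRED_COLUMNS = {"ID", "Text"}


def check_input(handle):
    """Return a list of problems found in an input CSV; an empty list means none were found."""
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        return [f"missing column(s): {', '.join(sorted(missing))}"]
    problems, seen = [], set()
    for row_no, row in enumerate(reader, start=2):  # row 1 is the header
        if row["ID"] in seen:
            problems.append(f"row {row_no}: duplicate ID {row['ID']!r}")
        seen.add(row["ID"])
        if not (row["Text"] or "").strip():
            problems.append(f"row {row_no}: empty Text")
    return problems


# Made-up sample data. For a real file, pass
# open(path, newline="", encoding="cp1252") instead.
sample = io.StringIO("ID,Text\nC001,Patient developed sepsis.\nC001,\n")
print(check_input(sample))
```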

Output Format

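The module produces a CSV file with one row per term found. The columns are:

ID: the claim ID
span: the term found
label: the type of term (e.g., condition, anatomy, procedure)
start, end: the character offsets of the term within the text
confidence: the model's confidence score
doc_char_length: the length of the document in characters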

This output is designed for manual review, so you can decide which terms are relevant.
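Two small helpers can make the manual review easier: one checks that a row's character offsets really point at the reported term, and one orders rows so the least confident suggestions are looked at first. This is a sketch under assumptions: the helper names and sample rows are hypothetical, and the comparison ignores case and surrounding spaces because the term reported by a model may not match the source text character for character.

```python
def span_matches(text, start, end, span):
    """Check that text[start:end] is the reported term, ignoring case and outer spaces."""
    return text[start:end].strip().lower() == span.strip().lower()


def review_order(rows):
    """Sort output rows so the least confident suggestions come first."""
    return sorted(rows, key=lambda r: r["confidence"])


# Made-up document and output rows, using the column names the script writes.
text = "Patient developed sepsis after appendectomy."
rows = [
    {"ID": "C001", "span": "sepsis", "start": 18, "end": 24, "confidence": 0.97},
    {"ID": "C001", "span": "appendectomy", "start": 31, "end": 43, "confidence": 0.62},
]

for r in review_order(rows):
    ok = span_matches(text, r["start"], r["end"], r["span"])
    print(r["span"], r["confidence"], "offsets ok" if ok else "offsets differ")
```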

Installation & Setup

Install Python 3.10+
Install dependencies:

Code

pip install transformers pandas torch

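Place your input CSV at the path the script expects (c:/data/TAM/Input/TA_Master.csv), and make sure the output folder (c:/data/TAM/BertOutput) exists, or edit both paths in the script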
Run the script from the command line
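The setup steps above can be checked before the first run with a short script. This is a minimal sketch using only the standard library; the function names are hypothetical, and the paths are the ones that appear in the example script.

```python
import importlib.util
import sys
from pathlib import Path


def missing_packages(names):
    """Return the packages from names that cannot be found in this Python environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def setup_problems(input_csv, output_dir, packages=("transformers", "pandas", "torch")):
    """Collect human-readable setup problems; an empty list means the checks passed."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python 3.10 or later is required")
    problems += [f"package not installed: {n}" for n in missing_packages(packages)]
    if not Path(input_csv).is_file():
        problems.append(f"input file not found: {input_csv}")
    if not Path(output_dir).is_dir():
        problems.append(f"output folder not found: {output_dir}")
    return problems


for problem in setup_problems("c:/data/TAM/Input/TA_Master.csv", "c:/data/TAM/BertOutput"):
    print(problem)
```

If nothing is printed, the environment and folders match what the example script expects.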

Important Notes

This module is optional and experimental.
It is provided as-is, with no support, no warranty, and no guarantee of future compatibility.
A future free trial may support importing this output, but this is not currently available.
Extracted terms are suggestions only and should be reviewed by a human before being used.
All downloadable files are plain‑text, fully inspectable, and easy to audit.

Best‑Endeavours Disclaimer

This module is provided free of charge, without warranty or support. It is intended for experimentation and evaluation only. Use is entirely at your own discretion and risk.

Example Code (included for download)

python

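import sys

import pandas as pd
from transformers import pipeline

# Force UTF-8 console output so non-ASCII characters in IDs print cleanly.
sys.stdout.reconfigure(encoding="utf-8")

# Load the token-classification (NER) pipeline. The "simple" aggregation
# strategy groups adjacent tokens into whole entity spans.
pipe = pipeline(
    task="token-classification",
    model="d4data/biomedical-ner-all",
    aggregation_strategy="simple",
)

# Read the input claims; change the encoding if your CSV is not cp1252.
df = pd.read_csv("c:/data/TAM/Input/TA_Master.csv", encoding="cp1252")

columns = ["ID", "label", "span", "start", "end", "confidence", "doc_char_length"]
rows = []

for idx, row in df.iterrows():
    # Treat a missing Text cell as an empty document instead of failing on len().
    text = "" if pd.isna(row["Text"]) else str(row["Text"])
    doc_char_length = len(text)
    print(f"Processing row {idx+1} of {len(df)} (ID: {row['ID']})")

    # Skip the model call when there is no text to analyse.
    preds = pipe(text) if text.strip() else []

    for p in preds:
        rows.append({
            "ID": row["ID"],
            "label": p["entity_group"],
            "span": p["word"],
            "start": p["start"],
            "end": p["end"],
            "confidence": p["score"],
            "doc_char_length": doc_char_length,
        })

# Name the columns explicitly so the header is written even if no terms were found.
out = pd.DataFrame(rows, columns=columns)
# The output folder must already exist.
out.to_csv("c:/data/TAM/BertOutput/bert_full_output.csv", index=False)

print("Done!")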
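BERT-family models accept only a limited number of tokens per input (typically 512), so a long claim document may not be analysed in full by a single call; exactly how over-length input is handled depends on the library version and settings. One possible workaround, sketched here as an assumption rather than as part of the module, is to split the text into overlapping character windows and shift each prediction's offsets back to the full document. The names windows and shift and the window sizes are hypothetical.

```python
def windows(text, size=1000, overlap=200):
    """Yield (offset, chunk) pairs that cover text with overlapping character windows."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    for offset in range(0, max(len(text) - overlap, 1), step):
        yield offset, text[offset:offset + size]


def shift(pred, offset):
    """Return a copy of a prediction with start/end made relative to the full document."""
    return {**pred, "start": pred["start"] + offset, "end": pred["end"] + offset}


# Small demonstration with a short string and tiny windows.
for offset, chunk in windows("abcdefghij", size=4, overlap=1):
    print(offset, chunk)

# With the pipeline from the example script (pipe is defined there, not here):
# preds = [shift(p, off) for off, chunk in windows(text) for p in pipe(chunk)]
```

A term that falls inside an overlap region may be reported twice, so the combined predictions would need de-duplicating, for example on start, end, and label. A term cut by a window edge may also be reported only in part, which is why the windows overlap.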
