I worked as a Data Scientist for 3 years at DeepTek, where my primary focus was developing computer-vision-based segmentation models for chest X-ray scans. Beyond training models across multiple experiments, I owned the entire model lifecycle: data collection, cleaning, auditing, packaging for deployment, performance monitoring, and reporting to customers.
Another major part of my work was carrying out extensive statistical validation studies of our AI models. Some of these supported our applications to regulatory bodies and frameworks such as the US FDA, CE marking, HSA and the Thai FDA. Of these, the published studies can be found here:
I led a team of interns in developing an internal tool to evaluate ML experiments at scale.
Tools like MLflow, W&B and Neptune are good at monitoring training runs, but they don't let you visually compare what happened after training. Instead of training 10 models, fixing a threshold for each, and comparing static metrics, we created an interactive tool that aggregated performance metrics across runs and visualised the relevant plots: ROC and PR curves, confusion matrices, and probability distributions.
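A minimal sketch of the aggregation idea (not the internal tool itself): keep each run's raw probabilities, compute threshold-free metrics once, and compare across runs rather than at one frozen threshold. The run names, helper functions and toy data below are illustrative.

```python
import numpy as np

def auroc(y_true, y_prob):
    """Probability a random positive outranks a random negative (ties count 0.5)."""
    pos = y_prob[y_true == 1][:, None]
    neg = y_prob[y_true == 0][None, :]
    return float((pos > neg).mean() + 0.5 * (pos == neg).mean())

def confusion_at(y_true, y_prob, thr):
    """Confusion-matrix counts at one operating point; the dashboard
    recomputes this as the user moves the threshold slider."""
    pred = y_prob >= thr
    return {
        "tp": int((pred & (y_true == 1)).sum()),
        "fp": int((pred & (y_true == 0)).sum()),
        "fn": int((~pred & (y_true == 1)).sum()),
        "tn": int((~pred & (y_true == 0)).sum()),
    }

# Toy labels and two hypothetical runs' predicted probabilities.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
runs = {
    "run_a": np.array([0.1, 0.4, 0.8, 0.7, 0.9, 0.3, 0.6, 0.2]),
    "run_b": np.array([0.3, 0.7, 0.6, 0.9, 0.5, 0.2, 0.8, 0.4]),
}
summary = {name: auroc(y_true, p) for name, p in runs.items()}
```

Because the raw probabilities are retained per run, any threshold-dependent view (confusion matrix, sensitivity/specificity) can be recomputed interactively instead of being baked in at training time.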
We created a small API that required adding only ~15-20 lines to your training script. Start training in the evening, let it run overnight, and see the metrics on the dashboard the next morning, with no notebooks to run just to print and plot performance.
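To give a feel for what a hook like that might look like, here is a hedged sketch of a tiny run logger a training script could call each epoch, writing predictions to disk for a dashboard to aggregate later. All names here (`RunLogger`, `log_epoch`, the file layout) are hypothetical, not the real internal API.

```python
import json
import pathlib
import tempfile

class RunLogger:
    """Appends one JSON record per epoch to a per-run .jsonl file."""

    def __init__(self, run_name, out_dir):
        self.path = pathlib.Path(out_dir) / f"{run_name}.jsonl"

    def log_epoch(self, epoch, y_true, y_prob):
        # One line per epoch; a dashboard process can tail or re-read
        # these files to rebuild metrics without touching the training job.
        record = {"epoch": epoch, "y_true": list(y_true), "y_prob": list(y_prob)}
        with self.path.open("a") as f:
            f.write(json.dumps(record) + "\n")

# Usage inside a (toy) training loop:
out_dir = tempfile.mkdtemp()
logger = RunLogger("unet_baseline", out_dir)
for epoch in range(2):
    logger.log_epoch(epoch, [0, 1, 1], [0.2, 0.7, 0.9])

records = [json.loads(line) for line in logger.path.read_text().splitlines()]
```

Decoupling logging (append-only files per run) from visualisation (a dashboard that reads them) is what lets training run unattended overnight while the metrics show up the next morning.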