Client Type: Early-stage health tech startup
Domain: Malaria / Infectious Disease · Sub-Saharan Africa
Tools: Python · pandas · SQL · Power BI · Relational Data Modelling
Project Type: Concept Project · Mock Portfolio
Date: March 2026
VitaLoop Health had been running for 18 months. Forty Community Health Agents were moving through households across three regions, conducting malaria screenings, distributing medications, and referring patients to clinics. The operation was real. The work was real. But when leadership sat down with prospective Series A investors and was asked how the programme was performing, they could not answer with confidence.
The data existed. It just could not be trusted.
Agent visit logs lived in a Google Sheet that had grown messy over time. Stock records were tracked separately in regional Excel files that nobody reconciled against each other. Referral outcomes were supposed to be captured in a shared tracker, but the column was mostly empty. Three sources. No connections. No single version of the truth.
One comment from the ops lead almost killed the funding conversation: "We are still figuring out our data infrastructure."
That was the moment VitaLoop needed a different kind of help.
The mandate was straightforward, even if the work was not. Audit the data. Fix what was broken. Build something that leadership could actually use to make decisions and show to investors. The deliverables were three: a clean relational data model, an operational dashboard, and an executive insight report.
The underlying question driving all of it was simpler still: what is actually happening in the field, and where do the problems live?
The first thing I did was look at the raw data without cleaning anything. Before building a system, I needed to understand exactly what had gone wrong and why.
What I found across the three source files was a catalogue of quiet failures. There were 2,465 duplicate visit rows, inflating agent performance numbers by nearly 5 percent. Over 4,000 rows had no agent code attached, meaning almost one in ten field visits could not be traced back to a specific person. The same three regions were written fifteen different ways across the files (Northern, northern, North, NORTHERN, Northern Region, and so on), which meant any attempt to join the data on region would silently drop or mismatch rows. In over 1,800 rows, a test result had been recorded as positive even though the field indicating whether a test was conducted at all was marked false. Twenty-five stock entries showed negative closing inventory, a physical impossibility that would have generated false stockout alerts. And 57.7 percent of all patient referrals had no outcome recorded whatsoever.
None of these were catastrophic in isolation. Together, they meant that every number VitaLoop had been looking at was, to some degree, wrong.
The audit finding I kept coming back to was the referral outcome gap. Nearly 1,900 patients had been referred to health facilities. For most of them, there was no record of what happened next. Whether they attended. Whether they were treated. Whether they ever arrived. That is not a data quality problem. That is a visibility problem with direct implications for whether the programme is actually working.
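The audit checks described above can be sketched in pandas. This is a minimal illustration, not the project's actual audit script: the column names (`agent_code`, `test_conducted`, `closing_stock`, `outcome`) and the toy rows are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical toy data; column names are assumptions, not VitaLoop's real files.
visits = pd.DataFrame({
    "visit_id":       [1, 2, 2, 3, 4, 5],
    "agent_code":     ["A01", "A02", "A02", None, "A03", "A01"],
    "region":         ["Northern", "northern", "northern", "North", "NORTHERN", "Southern"],
    "test_conducted": [True, True, True, False, False, True],
    "test_result":    ["negative", "positive", "positive", "positive", None, "positive"],
})
stock = pd.DataFrame({"region": ["Northern", "Southern"], "closing_stock": [120, -4]})
referrals = pd.DataFrame({
    "referral_id": [1, 2, 3, 4],
    "outcome":     ["attended", None, None, "treated"],
})

# A positive result logged on a visit where no test was conducted.
contradiction = visits["test_result"].eq("positive") & ~visits["test_conducted"]

# Count problems only; nothing is modified at the audit stage.
audit = {
    "duplicate_visit_rows":          int(visits.duplicated().sum()),
    "missing_agent_code":            int(visits["agent_code"].isna().sum()),
    "distinct_region_spellings":     int(visits["region"].nunique()),
    "positive_without_test":         int(contradiction.sum()),
    "negative_closing_stock":        int((stock["closing_stock"] < 0).sum()),
    "referrals_missing_outcome_pct": round(float(referrals["outcome"].isna().mean() * 100), 1),
}

for check, value in audit.items():
    print(f"{check}: {value}")
```

Counting before cleaning keeps the audit honest: each figure describes the raw files as they were, so later fixes can be measured against it.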
Once the audit was complete, I built the data model from scratch.
The schema I designed has five tables: regions, agents, agent visits, stock inventory, and referrals. Every table connects through clean foreign keys, with regions as the central hub. The design decision I am most deliberate about is the referral outcome field. It is intentionally nullable, and a null is left as a null: a blank means no outcome was ever recorded, which is not the same as an outcome recorded as "unknown". That distinction matters when you are trying to measure a programme's reach into the community, and it is the kind of thing that gets lost when you default missing data to a catch-all label.
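A minimal sketch of a five-table schema along these lines, using SQLite through Python's standard `sqlite3` module as a stand-in database. The column names, the sample rows, and the particular arrangement of six foreign keys are assumptions for illustration; the write-up names the tables but does not list their columns or relationships.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical columns; six REFERENCES clauses, with regions as the hub.
conn.executescript("""
CREATE TABLE regions (
    region_id   INTEGER PRIMARY KEY,
    region_name TEXT NOT NULL UNIQUE
);
CREATE TABLE agents (
    agent_id   INTEGER PRIMARY KEY,
    agent_code TEXT NOT NULL UNIQUE,
    region_id  INTEGER NOT NULL REFERENCES regions(region_id)
);
CREATE TABLE agent_visits (
    visit_id       INTEGER PRIMARY KEY,
    agent_id       INTEGER NOT NULL REFERENCES agents(agent_id),
    region_id      INTEGER NOT NULL REFERENCES regions(region_id),
    visit_date     TEXT NOT NULL,
    test_conducted INTEGER NOT NULL,
    test_result    TEXT
);
CREATE TABLE stock_inventory (
    stock_id      INTEGER PRIMARY KEY,
    region_id     INTEGER NOT NULL REFERENCES regions(region_id),
    item          TEXT NOT NULL,
    closing_stock INTEGER NOT NULL CHECK (closing_stock >= 0)
);
CREATE TABLE referrals (
    referral_id INTEGER PRIMARY KEY,
    visit_id    INTEGER NOT NULL REFERENCES agent_visits(visit_id),
    region_id   INTEGER NOT NULL REFERENCES regions(region_id),
    outcome     TEXT  -- nullable on purpose, no DEFAULT: NULL means nothing was recorded
);
""")

# Hypothetical sample rows: one visit, two referrals, one with no outcome recorded.
conn.execute("INSERT INTO regions VALUES (1, 'Southern')")
conn.execute("INSERT INTO agents VALUES (1, 'A01', 1)")
conn.execute("INSERT INTO agent_visits VALUES (1, 1, 1, '2025-01-15', 1, 'positive')")
conn.executemany(
    "INSERT INTO referrals VALUES (?, ?, ?, ?)",
    [(1, 1, 1, "attended"), (2, 1, 1, None)],
)

# COUNT(*) counts every referral; COUNT(outcome) skips NULLs.
total, recorded = conn.execute(
    "SELECT COUNT(*), COUNT(outcome) FROM referrals"
).fetchone()
print(f"{recorded} of {total} referrals have an outcome recorded")
```

Because the missing outcome stays NULL rather than becoming a label such as "unknown", the gap between the two counts is the feedback-loop gap itself, and it cannot be mistaken for a recorded answer.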
With the schema in place, I built a Python pipeline to simulate the kind of messy data VitaLoop would realistically have, apply the cleaning logic, and export five structured tables ready for analysis. The pipeline is fully documented, with every transformation decision traceable.
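To give a sense of what the cleaning logic involves, here is a small pandas sketch covering three of the audit issues: inconsistent region names, duplicate rows, and contradictory test results. The mapping table, column names, and sample rows are hypothetical, and flagging contradictions for review rather than overwriting them is an assumption about how such rows might be handled, not a description of the actual pipeline.

```python
import pandas as pd

# Hypothetical lookup from normalised spelling to canonical region name.
REGION_MAP = {
    "northern": "Northern",
    "north": "Northern",
    "northern region": "Northern",
}

def clean_region(name):
    """Lowercase, trim, and collapse whitespace, then map to a canonical name.

    Unrecognised spellings return None so they surface for review
    instead of being guessed at.
    """
    key = " ".join(str(name).lower().split())
    return REGION_MAP.get(key)

# Hypothetical messy rows: visit 1 is duplicated, visit 2 is contradictory.
raw = pd.DataFrame({
    "visit_id":       [1, 1, 2, 3, 4],
    "region":         ["Northern", "Northern", "north", "NORTHERN ", "Northern Region"],
    "test_conducted": [True, True, False, True, True],
    "test_result":    ["positive", "positive", "positive", "negative", "positive"],
})

clean = raw.copy()
# Normalise first, so rows that differ only in region spelling dedupe correctly.
clean["region"] = clean["region"].map(clean_region)
clean = clean.drop_duplicates().reset_index(drop=True)
# Flag, rather than silently fix, a positive result with no test conducted.
clean["needs_review"] = clean["test_result"].eq("positive") & ~clean["test_conducted"]

print(clean)
```

Normalising region names before dropping duplicates matters for ordering: otherwise two copies of the same visit spelled "Northern" and "northern" would both survive.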
The Power BI dashboard was built to answer five questions that any investor or programme manager would reasonably ask. Which agents are underperforming? Where is malaria burden highest, and how is it trending? Which regions are approaching stockout? Where are patient referrals falling through? And what does it cost to confirm a single malaria case in each region?
Each panel on the dashboard was designed with a specific decision in mind, not just a metric to display.
The most striking finding was not the one I expected.
Southern Region has the highest malaria burden in the network. Its RDT confirmation rate rose consistently over the 18-month period, peaking at 53 percent in June 2024. It also has the lowest cost per confirmed case at $6.91, compared to $7.67 in Northern and $8.80 in Eastern. By every operational measure, Southern is the most effective region VitaLoop runs. It is also the most critically undersupplied.
At the time of analysis, Southern Region's RDT kit and ACT course inventory had dropped below reorder threshold. The region carrying the highest disease burden, delivering the most efficient results, was days away from running out of the supplies needed to do its job.
That is the kind of finding that does not show up in a spreadsheet. It only becomes visible when agent activity, stock data, and regional burden metrics are connected in a single system and examined together.
The second finding that stands out is the cost gap between Eastern and Southern. Eastern Region spends 27 percent more per confirmed malaria case than Southern, driven by a combination of lower confirmation rates and higher supply consumption per positive result. Resources are not flowing to where they produce the most impact. A burden-weighted allocation model would close that gap within two resupply cycles.
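The 27 percent figure follows directly from the per-region costs quoted earlier. A short worked check, using only those three published numbers and taking Southern as the baseline:

```python
# Cost per confirmed malaria case (USD), as reported in this case study.
cost_per_case = {"Southern": 6.91, "Northern": 7.67, "Eastern": 8.80}

baseline = cost_per_case["Southern"]

# Percentage premium each region pays relative to Southern.
premium_pct = {
    region: round((cost / baseline - 1) * 100, 1)
    for region, cost in cost_per_case.items()
}

for region, pct in premium_pct.items():
    print(f"{region}: {pct:+.1f}% vs Southern")
```

Eastern comes out at roughly 27.4 percent above Southern, and Northern at roughly 11 percent.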
And then there is the referral outcome gap. Among the referrals that do have outcomes recorded, the results are encouraging: the majority of patients attended or were treated. The programme works when patients reach a facility. The problem is that for 57.7 percent of referrals, there is no feedback loop at all. No confirmation. No follow-up. No way to know.
The system VitaLoop came away with is not just a dashboard. It is a foundation.
The relational schema is designed to grow. As new agents are added, new regions brought online, and new data sources connected, the model supports it without restructuring. The Power BI dashboard updates automatically as new data flows in, giving leadership a live view of field operations rather than a monthly retrospective. The insight report translates the technical findings into plain language that works equally well in a board meeting and a funding pitch.
For the Series A conversation, VitaLoop can now walk an investor through what is happening in each region, how efficiently resources are being used, where the programme has gaps, and precisely what actions would close them. That is a different conversation from the one that nearly ended the deal.
Data existed before this engagement. What was missing was the structure to make it say something.
Data audit across 3 source files with 51,974 raw records
Relational schema design in SQL, 5 tables and 6 relationships
Python data simulation and cleaning pipeline using pandas
Power BI dashboard with 6 DAX measures and 7 interactive panels
Executive insight report grounded in real operational findings
© 2026 Anthony Okeibuno. Health Data Systems Strategist.