Post date: May 08, 2021 10:08:28 AM
The president of New York City Health and Hospitals Corporation (HHC), the largest municipal healthcare system in the United States, personally assured me that HHC did not need to worry about disclosing personal patient information because it had secure sequestration protocols in place to prevent it. Google the phrase "hhc data breach" to understand why we were meeting. The data breaches at HHC continued for many months after our meeting, until HHC's IT department got the immediate situation under some control. The longer-term problem may still be waiting to be resolved.
Like all serious issues, data security is not simple. It requires defense in depth: multiple layers of protection. The well-publicized data breaches at HHC were the result of clumsiness rather than specific, intelligent attacks. The approach to data security at HHC, as at hospitals almost everywhere, relies on sequestration (firewalls), encryption, and erasure. But these measures turn out to be ham-handed, insufficient, and self-defeating, because hospitals intentionally share medical information in the form of
hospital performance summaries,
patient satisfaction surveys,
accreditation reports (data sharing),
medical errors reporting,
registries (trauma, cardiac, cancer, etc.),
within-institution and across-institution data sharing.
For example, HHC touts its cardiac care on its public website (right). Such information releases are useful and important, and they are often required by the regulatory bodies that govern hospitals. They also serve the commercial interests of individual hospitals. But such sharing risks inadvertent disclosure, and it is vulnerable to re-identification attacks. It is especially risky when multiple releases are made with their dates openly stated.
<<Details about how such attacks are possible and the extent and severity of the risk are redacted from this document.>>
Hospitals collect an enormous amount of sensitive data about individuals. The value of these data to medical science, and to public health more broadly, is incalculable but surely great. Clearly, more subtle strategies are needed: strategies that protect private patient data while still allowing information to be shared, in a way that progressively protects patients' personal information from disclosure at the individual level. Anonymization (of which de-identification is only a tiny piece) may be the key to this puzzle. The National Institutes of Health (NIH) funded a multiyear project to develop infrastructure for an information-preserving approach to privacy protection based on anonymization. See the presentation attached to this page, summarized here as an abstract.
Protecting patient privacy while releasing medical data for research
Patient health records contain a great deal of information that would be useful in medical research, but access to these data is impossible or severely limited because most personal health records are private. To be effective, anonymization strategies must usually go much further than simply omitting explicit identifiers, because even statistics computed from groups of records can often be leveraged by hackers to re-identify individuals. Methods that balance the informativeness of data for research against the information loss required to minimize disclosure risk are needed before these private data can be widely released to researchers who can use them to improve medical knowledge and public health. We are developing an integrated software system that provides solutions for anonymizing data based on interval generalization, controlling data utility, and performing statistical analyses and making inferences using interval statistics. For more information see https://sites.google.com/site/abprivacysbir/.
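To make the idea of interval generalization and interval statistics concrete, here is a minimal sketch under several assumptions; it is not the software described in the abstract. It assumes a single numeric attribute (age) generalized into fixed-width bins, and it borrows the standard k-anonymity criterion (every released value is shared by at least k records) as a stand-in for "disclosure risk". All function names (`generalize`, `smallest_group`, `finest_width_for_k`, `interval_mean`) are hypothetical. The utility trade-off shows up as bin width: wider bins protect more records but give looser bounds on statistics such as the mean.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Interval = Tuple[float, float]

def generalize(value: float, width: float) -> Interval:
    """Replace an exact value with the fixed-width bin [lo, lo + width] containing it."""
    lo = (float(value) // width) * width
    return (lo, lo + width)

def smallest_group(intervals: List[Interval]) -> int:
    """Size of the smallest set of records that share the same generalized value."""
    return min(Counter(intervals).values())

def finest_width_for_k(values: Sequence[float], widths: Sequence[float], k: int) -> Optional[float]:
    """Return the narrowest candidate width (most informative release) at which
    every bin holds at least k records (assumed k-anonymity criterion), or None."""
    for w in sorted(widths):
        if smallest_group([generalize(v, w) for v in values]) >= k:
            return w
    return None

def interval_mean(intervals: List[Interval]) -> Interval:
    """Bounds on the sample mean when each datum is known only to lie in its interval:
    the mean of the lower endpoints and the mean of the upper endpoints."""
    n = len(intervals)
    lo = sum(a for a, _ in intervals) / n
    hi = sum(b for _, b in intervals) / n
    return (lo, hi)

if __name__ == "__main__":
    ages = [34, 37, 41, 45, 48, 52, 58, 59]  # illustrative values, not real patient data
    w = finest_width_for_k(ages, [5, 10, 20, 25], k=2)
    released = [generalize(a, w) for a in ages]
    print(w, interval_mean(released))
```

With these illustrative ages, a width of 10 is the narrowest candidate that puts at least two records in every bin, and the released intervals bound the mean between 41.25 and 51.25 (the exact mean, 46.75, lies inside). Note that generalizing one attribute in isolation says nothing about combinations of quasi-identifiers, which a real system would have to consider.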