generAtion of Reliable syntheTIc health data for Federated leArning in seCure daTa Spaces
The main goal of the multidisciplinary and interdisciplinary ARTIFACTS project is to advance and scale up data-driven healthcare by developing novel methods for privacy-preserving data utilization in collaborative data hubs that form secure data spaces. In particular, the methods will be applied and tested in the domain of lung cancer, specifically stage III patients.
Standardization of multi-modal data to 1-D or 2-D objects for further synthetic health data generation. We aim to demonstrate that generative adversarial networks (GANs) can generate synthetic health data that are statistically indistinguishable from the original data.
Determining under which conditions statistical validation is possible.
Extending previous results to more complex multi-modal data, including heterogeneous data from electronic health records.
Extending previous results to newer generative algorithms, especially those used in large language models (LLMs).
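As an illustration of what a statistical validation check could look like, the following minimal sketch compares one numeric variable from a real cohort with its synthetic counterpart, using a two-sample Kolmogorov-Smirnov statistic and a permutation test. It is not the project's validation protocol: the function names, the choice of test, and the toy values are assumptions made purely for illustration.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of samples a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def permutation_p_value(real, synthetic, n_permutations=500, seed=0):
    """Estimate how often a random relabelling of the pooled samples
    gives a KS statistic at least as large as the observed one."""
    rng = random.Random(seed)
    observed = ks_statistic(real, synthetic)
    pooled = list(real) + list(synthetic)
    n_real = len(real)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:n_real], pooled[n_real:]) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Toy values only (hypothetical, not patient data).
real_values = [61.0, 64.5, 58.2, 70.1, 66.3, 59.9, 63.4, 68.0]
synthetic_values = [60.2, 65.1, 57.8, 69.5, 67.0, 61.3, 62.9, 66.8]
p_value = permutation_p_value(real_values, synthetic_values)
```

A large p-value only means that this test found no evidence of a difference for that one variable. It does not by itself establish that the synthetic data are indistinguishable, and multi-modal data would need multivariate checks as well.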
In current federated learning, institutions use a similar model architecture, but each institution obtains its own weights locally; these weights are then shared and incrementally integrated. We propose:
Research in federated learning at the level of data. We claim that models trained on local real-world data plus external synthetic health data (“glocal” models) will obtain better results than those trained only on local data.
Proposal of several federated learning structures combining local real-world data, synthetic health data from external partners and state-of-the-art federated learning algorithms based on models and/or outputs.
Local data are stored in local secured data lakes, and only synthetic health data are shared. A further level of security is added by managing synthetic data through secure data spaces.
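The contrast between federation at the level of models and federation at the level of data can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `fit_line`, `federated_average` and `glocal_training_set` are hypothetical names, a one-variable least-squares line stands in for any real model, and size-weighted averaging (as in FedAvg) is one common aggregation rule rather than necessarily the one the project will adopt.

```python
def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def federated_average(site_weights, site_sizes):
    """Model-level federation: size-weighted mean of per-site weights.
    Only weights leave each institution."""
    total = sum(site_sizes)
    n_params = len(site_weights[0])
    return tuple(
        sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
        for i in range(n_params)
    )


def glocal_training_set(local_real, external_synthetic_sets):
    """Data-level federation: local real records plus the synthetic
    records shared by external partners. Real records never leave
    the site that owns them."""
    combined = list(local_real)
    for synthetic in external_synthetic_sets:
        combined.extend(synthetic)
    return combined


# Toy (x, y) records only; all values are hypothetical.
local_real = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2)]
partner_synthetic = [
    [(3.0, 7.1), (4.0, 8.8)],   # synthetic data from partner A
    [(5.0, 11.2), (6.0, 12.9)],  # synthetic data from partner B
]

# "Glocal" model: trained on local real data plus external synthetic data.
glocal_model = fit_line(glocal_training_set(local_real, partner_synthetic))

# Model-level alternative: each site fits locally, then weights are averaged.
site_models = [fit_line(local_real)] + [fit_line(s) for s in partner_synthetic]
site_sizes = [len(local_real)] + [len(s) for s in partner_synthetic]
averaged_model = federated_average(site_models, site_sizes)
```

In the data-level variant a single model sees all records at once, whereas the model-level variant only ever combines parameters. Whether the first yields better results is the question the project sets out to study; the sketch makes no claim either way.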