NSF REU Summer 2021
Across science and engineering disciplines, large datasets are increasingly common and make it possible for researchers to produce profound new discoveries and contributions. However, most of these scientists and engineers do not specialize in data science, so they may face technical barriers when trying to draw insights from their data. DVf is a domain-specific functional programming language designed with general engineers and scientists in mind; it offers a set of declarative, language-based facilities that address this problem. Because DVf works with datasets and machine learning models, it must handle potentially sensitive information with care. As a foundational design principle, DVf is built on a state-of-the-art scientific workflow framework, so research efforts that address common security issues in scientific workflows, such as provenance access control policies, should also be extended to the DVf infrastructure. Furthermore, a user of DVf may apply their domain knowledge to determine which features would yield the most successful model, which essentially makes the related DVf program the user's intellectual property. Exposing such information to unauthorized parties would violate the user's intellectual property rights. Finally, some adversaries may attempt to reverse engineer a model to learn about the dataset used to train it. The aim of this project is to investigate and enhance the security and confidentiality of information within this programming infrastructure, with the goal of protecting personal data and intellectual property.