§ TDSC’19 DeepChain: Auditable and Privacy-Preserving Deep Learning with Blockchain-based Incentive
Abstract: Deep learning can achieve higher accuracy than traditional machine learning algorithms on a variety of machine learning tasks. Recently, privacy-preserving deep learning has drawn tremendous attention from the information security community, in which neither the training data nor the trained model is expected to be exposed. Federated learning is a popular learning mechanism in which multiple parties upload local gradients to a server, and the server updates the model parameters with the collected gradients. However, federated learning neglects many security problems: for example, participants may behave incorrectly during gradient collection or parameter updating, and the server may be malicious as well. In this article, we present a distributed, secure, and fair deep learning framework named DeepChain to solve these problems. DeepChain provides a value-driven, blockchain-based incentive mechanism that forces participants to behave correctly. Meanwhile, DeepChain guarantees data privacy for each participant and provides auditability for the whole training process. We implement a prototype of DeepChain and conduct experiments on a real dataset under different settings; the results show that DeepChain is promising.
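For concreteness, the sketch below illustrates the generic federated aggregation step that DeepChain builds on: parties compute local gradients and a server averages them into a parameter update. It is a minimal stand-in only; the function names, the linear model, and the learning rate are illustrative, and DeepChain's blockchain incentive, encryption, and audit mechanisms are deliberately omitted.

```python
# Minimal sketch of a generic federated-averaging round (illustrative only;
# not DeepChain's actual protocol, which adds incentives, privacy, and audit).
import numpy as np

def local_gradient(weights: np.ndarray, X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """A party's local step: gradient of mean squared error for a linear model."""
    preds = X @ weights
    return 2.0 * X.T @ (preds - y) / len(y)

def aggregate(gradients: list[np.ndarray]) -> np.ndarray:
    """Server-side step: average the collected local gradients."""
    return np.mean(gradients, axis=0)

def federated_round(weights: np.ndarray, parties: list[tuple[np.ndarray, np.ndarray]], lr: float = 0.1) -> np.ndarray:
    grads = [local_gradient(weights, X, y) for X, y in parties]  # uploaded by participants
    return weights - lr * aggregate(grads)                       # parameter update with collected gradients
```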
Abstract: ML-as-a-service (MLaaS) is becoming increasingly popular and is revolutionizing people's lives. A natural requirement for MLaaS, however, is to provide highly accurate prediction services. To achieve this, current MLaaS systems integrate and combine multiple well-trained models in their services. Yet, in reality, there is no easy way for MLaaS providers, especially startups, to collect sufficiently many well-trained models from individual developers, due to the lack of incentives. In this article, we aim to fill this gap by building a model marketplace, called Golden Grain, to facilitate model sharing; it enforces a fair model-money swapping process between individual developers and MLaaS providers. Specifically, we deploy the swapping process on the blockchain and further introduce a blockchain-empowered model benchmarking process that transparently determines model prices according to their authentic performance, so as to motivate faithful contributions of well-trained models. In particular, to ease the blockchain overhead of model benchmarking, our marketplace carefully offloads the heavy computation and designs a secure off-chain/on-chain interaction protocol based on a trusted execution environment (TEE), ensuring both the integrity and authenticity of benchmarking. We implement a prototype of Golden Grain on the Ethereum blockchain and conduct extensive experiments using standard benchmark datasets to demonstrate the practically affordable performance of our design.
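The sketch below illustrates, under simplifying assumptions, the off-chain/on-chain split described above: an enclave benchmarks a model off-chain and authenticates the score, and a (simulated) contract records a price only if the report verifies. The HMAC key stands in for real remote attestation or signatures, and all names here (`tee_benchmark`, `MarketplaceContract`, `price_per_point`) are hypothetical rather than Golden Grain's actual interface.

```python
# Hypothetical off-chain benchmarking / on-chain settlement sketch.
# HMAC is a stand-in for TEE remote attestation; not the real protocol.
import hashlib
import hmac
import json

TEE_ATTESTATION_KEY = b"enclave-measurement-key"   # stand-in for an attested enclave key

def tee_benchmark(model_id: str, accuracy: float) -> dict:
    """Off-chain: the enclave benchmarks the model and authenticates the result."""
    report = {"model_id": model_id, "accuracy": accuracy}
    payload = json.dumps(report, sort_keys=True).encode()
    report["mac"] = hmac.new(TEE_ATTESTATION_KEY, payload, hashlib.sha256).hexdigest()
    return report

class MarketplaceContract:
    """On-chain side (simulated): records model prices derived from verified benchmarks."""
    def __init__(self):
        self.prices = {}

    def submit_report(self, report: dict, price_per_point: float = 1.0) -> bool:
        body = {k: v for k, v in report.items() if k != "mac"}
        payload = json.dumps(body, sort_keys=True).encode()
        expected = hmac.new(TEE_ATTESTATION_KEY, payload, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, report["mac"]):
            return False                              # reject unverified benchmark results
        self.prices[body["model_id"]] = body["accuracy"] * price_per_point
        return True
```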
Abstract: We propose a new approach for privacy-preserving and verifiable convolutional neural network (CNN) testing in a distrustful multi-stakeholder environment. The approach aims to enable a CNN model developer to convince a user of the CNN's truthful performance over non-public data from multiple testers, while respecting model and data privacy. To balance security and efficiency, we integrate three tools with CNN testing: collaborative inference, homomorphic encryption (HE), and zero-knowledge succinct non-interactive arguments of knowledge (zk-SNARKs). We start by strategically partitioning a CNN model into a private part, kept locally by the model developer, and a public part, outsourced to an outside server. The private part runs over HE-protected test data sent by a tester and transmits its outputs to the public part, which accomplishes the subsequent computations of the CNN testing. Second, the correctness of the above CNN testing is enforced by generating zk-SNARK-based proofs, with an emphasis on optimizing the proving overhead of two-dimensional (2-D) convolution operations, since these operations dominate the proving time. We present a new quadratic matrix program (QMP)-based arithmetic circuit with a single multiplication gate that expresses the 2-D convolutions between multiple filters and inputs in a batch manner. Third, we aggregate multiple proofs with respect to the same CNN model but different testers' test data (i.e., different statements) into one proof, and ensure that the validity of the aggregated proof implies the validity of the original proofs. Lastly, our experimental results demonstrate that our QMP-based zk-SNARK is nearly $13.9\times$ faster than the existing quadratic arithmetic program (QAP)-based zk-SNARK in proving time, and $17.6\times$ faster in setup time, for high-dimension matrix multiplication. Moreover, the QAP-based zk-SNARK's limitation of handling only a bounded number of multiplications is relieved.
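The worked example below shows the matrix view of 2-D convolution that a batched, single-multiplication circuit can exploit: with an im2col layout, convolving several filters with an input collapses into one matrix product. It only conveys this intuition; it is not the QMP or zk-SNARK construction itself, and the sizes are arbitrary.

```python
# Worked example: batched 2-D convolution expressed as one matrix multiplication
# (the intuition behind the batched circuit, not the actual QMP/zk-SNARK).
import numpy as np

def im2col(x: np.ndarray, k: int) -> np.ndarray:
    """Unfold all k x k patches of a 2-D input into the columns of a matrix."""
    h, w = x.shape
    cols = [x[i:i + k, j:j + k].ravel()
            for i in range(h - k + 1) for j in range(w - k + 1)]
    return np.stack(cols, axis=1)               # shape: (k*k, num_patches)

rng = np.random.default_rng(0)
x = rng.integers(0, 10, size=(5, 5))            # input feature map
filters = rng.integers(-3, 3, size=(4, 3, 3))   # 4 filters of size 3 x 3

# One matrix product computes all 4 convolutions at once.
F = filters.reshape(4, -1)                      # (4, 9)
out = F @ im2col(x, 3)                          # (4, 9) @ (9, 9) -> (4, 9)

# Cross-check against a direct sliding-window convolution.
direct = np.array([[np.sum(x[i:i + 3, j:j + 3] * f) for i in range(3) for j in range(3)]
                   for f in filters])
assert np.array_equal(out, direct)
```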
Abstract: Data holders, such as mobile apps, hospitals, and banks, are capable of training machine learning (ML) models and enjoying many intelligent services. To benefit more individuals who lack data and models, a convenient approach is needed that makes trained models from various sources available for prediction serving, but such an approach has yet to truly take off owing to three issues: (i) incentivizing prediction truthfulness; (ii) boosting prediction accuracy; and (iii) protecting model privacy. We design FedServing, a federated prediction serving framework that addresses all three issues. First, we customize an incentive mechanism based on Bayesian game theory which ensures that providers joining at a Bayesian Nash Equilibrium will provide truthful (not meaningless) predictions. Second, working jointly with the incentive mechanism, we employ truth discovery algorithms to aggregate truthful but possibly inaccurate predictions, boosting prediction accuracy. Third, providers can locally deploy their models, and their predictions are securely aggregated inside TEEs. Attractively, our design supports popular prediction formats, including top-1 label, ranked labels, and posterior probability. Besides, blockchain is employed as a complementary component to enforce exchange fairness. By conducting extensive experiments, we validate the expected properties of our design. We also empirically demonstrate that FedServing reduces the risk of certain membership inference attacks.
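As a rough illustration of the aggregation step, the sketch below runs a generic truth-discovery loop over top-1 label predictions: it alternates weighted voting with reliability re-estimation. This is a textbook-style scheme executed in the clear, not FedServing's exact algorithm or its TEE-protected implementation.

```python
# Generic truth-discovery sketch for top-1 label predictions (illustrative only).
from collections import Counter

def truth_discovery(predictions: list[list[str]], rounds: int = 10) -> list[str]:
    """predictions[p][q] = provider p's label for query q; returns aggregated labels."""
    n_providers, n_queries = len(predictions), len(predictions[0])
    weights = [1.0] * n_providers
    truths = [""] * n_queries
    for _ in range(rounds):
        # 1) Estimate truths by weighted vote.
        for q in range(n_queries):
            votes = Counter()
            for p in range(n_providers):
                votes[predictions[p][q]] += weights[p]
            truths[q] = votes.most_common(1)[0][0]
        # 2) Re-estimate provider reliability from agreement with current truths.
        for p in range(n_providers):
            agree = sum(predictions[p][q] == truths[q] for q in range(n_queries))
            weights[p] = (agree + 1) / (n_queries + 2)   # smoothed accuracy estimate
    return truths

# Example: the third provider disagrees most often, so its influence shrinks.
preds = [["cat", "dog", "bird"], ["cat", "dog", "cat"], ["dog", "cat", "bird"]]
print(truth_discovery(preds))   # -> ['cat', 'dog', 'bird']
```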
Abstract: Autonomous Vehicles (AVs) take advantage of Machine Learning (ML) to improve the experience of self-driving. However, large-scale collection of AVs' data for training inevitably leads to privacy leakage. Federated Learning (FL) is proposed to solve privacy leakage problems, but it is exposed to security threats such as model inversion and membership inference. Therefore, the vulnerabilities of FL should be brought to the forefront when applying it to AVs. We propose BDFL, a novel Byzantine-Fault-Tolerant (BFT) decentralized FL method with privacy preservation for AVs. In this paper, a Peer-to-Peer (P2P) FL with BFT is built by extending the HydRand protocol. To protect its model, each AV uses a Publicly Verifiable Secret Sharing (PVSS) scheme, which allows anyone to verify the correctness of encrypted shares. Evaluation results on the MNIST dataset show that introducing decentralized FL into the AV area is feasible and that the proposed BDFL is superior to other BFT-based FL methods. Furthermore, experimental results on the KITTI dataset indicate the practicality of BDFL in improving the performance of multi-object recognition in AV scenarios. Finally, experiments on the MNIST and KITTI datasets also show that the proposed PVSS-based data privacy preservation scheme has no side effect on model parameters.
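To make the secret-sharing idea concrete, the sketch below uses a Feldman-style verifiable secret sharing over toy group parameters as a stand-in for PVSS: public commitments to the polynomial coefficients let anyone check a share against them. Genuine PVSS (as used with HydRand) additionally allows verifying encrypted shares without seeing the plaintext; that step, and realistic group sizes, are omitted here.

```python
# Feldman-style verifiable secret sharing over toy parameters, as a simplified
# stand-in for PVSS (real PVSS also verifies *encrypted* shares).
import random

P, Q, G = 23, 11, 4          # toy group: G generates an order-Q subgroup of Z_P*

def deal(secret: int, threshold: int, n_parties: int):
    """Split `secret` into n shares; publish commitments to the polynomial coefficients."""
    coeffs = [secret % Q] + [random.randrange(Q) for _ in range(threshold - 1)]
    shares = {i: sum(c * pow(i, j, Q) for j, c in enumerate(coeffs)) % Q
              for i in range(1, n_parties + 1)}
    commitments = [pow(G, c, P) for c in coeffs]       # public, enables verification
    return shares, commitments

def verify(i: int, share: int, commitments: list[int]) -> bool:
    """Anyone can check party i's share against the public commitments."""
    lhs = pow(G, share, P)
    rhs = 1
    for j, c in enumerate(commitments):
        rhs = rhs * pow(c, pow(i, j, Q), P) % P
    return lhs == rhs

shares, comms = deal(secret=7, threshold=3, n_parties=5)
assert all(verify(i, s, comms) for i, s in shares.items())
```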
Abstract: The ''Right to be Forgotten" rule in machine learning (ML) practice enables some individual data to be deleted from a trained model, as pursued by recently developed machine unlearning techniques. To truly comply with the rule, a natural and necessary step is to verify if the individual data are indeed deleted after unlearning. Yet, previous parameter-space verification metrics may be easily evaded by a distrustful model trainer. Thus, Thudi et al. recently present a call to action on algorithm-level verification in USENIX Security'22. We respond to the call, by reconsidering the unlearning problem in the scenario of machine learning as a service (MLaaS), and proposing a new definition framework for Proof of Unlearning (PoUL) on algorithm level. Specifically, our PoUL definitions (i) enforce correctness properties on both the pre and post phases of unlearning, so as to prevent the state-of-the-art forging attacks; (ii) highlight proper practicality requirements of both the prover and verifier sides with minimal invasiveness to the off-the-shelf service pipeline and computational workloads. Under the definition framework, we subsequently present a trusted hardware-empowered instantiation using SGX enclave, by logically incorporating an authentication layer for tracing the data lineage with a proving layer for supporting the audit of learning. We customize authenticated data structures to support large out-of-enclave storage with simple operation logic, and meanwhile, enable proving complex unlearning logic with affordable memory footprints in the enclave. We finally validate the feasibility of the proposed instantiation with a proof-of-concept implementation and multi-dimensional performance evaluation.