Data Intensive Analysis and Computing (DIAC) Lab, kno.e.sis Center
1. Introduction: Graph datasets have invaluable use in business
applications and scientific research. Because of the growing size
and dynamically changing nature of graphs, graph data owners
may want to use public cloud infrastructures to store, process,
and perform graph analytics. However, when outsourcing data
and computation, data owners are at burden to develop methods
to preserve data privacy and data ownership from curious cloud
providers. This demonstration exhibits a prototype system for
privacy-preserving spectral analysis framework for large graphs
in public clouds (PrivateGraph) that allows data owners to collect
graph data from data contributors, and store and conduct secure
graph spectral analysis in the cloud with preserved privacy
and ownership. This demo system lets its audience interactively
learn the major cloud-client interaction protocols: the privacy-
preserving data submission, the secure Lanczos and Nystrom
approximate eigen-decomposition algorithms that work over
encrypted data, and the outcome of an important application of
spectral analysis - spectral clustering. The demo users will understand the intrinsic relationship
amongst costs, result quality, privacy, and scalability of the proposed framework. 2. Key Techniques: Additive Homomorphic Encryption (AHE) Scheme to build pseudo-homomorphic Matrix and Vector Operations with one of the operands unencrypted. E( ) represents a homomorphic encryption function below.
LWE based randomization to protect the unencrypted operand in Lanczos. If b is an unencrypted operand vector, the perturbed vector, b’, is given as below, where, matrix A is known to adversaries; secret vector s and noise e are randomly generated vectors, and q is a large prime.
Multiplicative Matrix Noise to protect the unencrypted operand in Nystrom. If V is an unencrypted matrix, the perturbed matrix, V’, is given as below, where matrix R is a uniform random invertible matrix, and D is uniform random additive noise matrix; q is a large prime:
Differential Privacy based sparse data encoding! Incorporate differential privacy to construct sparse submission mechanism which considers node-degree statistics functions. We add fake edges to rows of sparse matrices and preserve ε- differential privacy based on Laplace mechanism.2. Framework Architecture: Figure 1 shows the PrivateGraph framework and the involved three parties: the public cloud (server), the data owner (client), and the data contributors. The data owner periodically runs graph algorithms on the evolving graph datasets to generate models. It also generates the public-private key pair and distributes the public key to the data contributors. Data contributors upload encrypted graph data (i.e., edges) to the cloud through secure interfaces. Alternatively, the data owner can also directly upload to the cloud any in-house graph datasets collected via other channels. The cloud and data owner collaboratively run the secure eigen-approximation algorithms over encrypted dataset residing in the cloud without compromising privacy. The expensive operations O(N2) are
carried out by the cloud while the lighter computations O(N)
by the client. Threat Model: Data contributors’ privacy, data-ownership,
and privacy of data in between computations are the assets at
stake. PrivateGraph preserves these assets from “honest
but curious” cloud without compromising the utility of graph
datasets. A. Privacy Preserving Data SubmissionFig. 2: The graph data is represented as normalized Laplacian matrix. The entries are converted to precision preserving integers and then encrypted before submission to the cloud. We enable data owner collect subscribing users' data over at cloud in a confidential manner. Graph data is represented as normalized graph Laplacian matrix that can be used for special clustering. In order to use the Additive Homomorphic Encryption schemes for encryption the message
space is converted to non-negative integers. For matrices in the
real value domain, we apply a reversible affine transformation
which preserves
d decimal places:Each data contributor represents one or a few nodes in the graph, and contributes to the corresponding row(s) of this graph matrix. Encoding and encrypting the entire user vector in a dense format is straightforward, i.e. convert each element of the submission row to precision preserving integers and then encrypt with the AHE scheme. However, Most graphs in popular applications are sparse. It is desirable to skip encrypting some of the zero entries to save storage, communication, and computation costs, while still providing sufficient protection to privacy. We design techniques and algorithm that protects the privacy of node degrees and links using differential privacy while preserving data sparsity and authenticity.
1. Keep the non-zero elements 2. Add fake edges to hide node degree and real edges - randomly picked zero edges are encrypted before submission 3. Satisfy bin-based differential privacy and preserve data authenticity (not implemented at this point) Note: Submissions are sparse in this version of our implementation - encrypting only the present edges.
Fig 3: Lanczos Method. Cloud computes in parallel expensive pseudo-homomorphic matrix-vector multiplication E(Ab, where _{i}’)bis a perturbed vector and _{i}' A is the encrypted graph matrix; Data Owner generates perturbed vectors b’ decrypts intermediate results, and recovers vector _{i},b_{i}_{ }. The interactions happen for a number of iterations.Fig 4: Nystrom Method. Cloud sends to data owner encrypted subsampled matrix W. Data owner decrypts matrix W and computes small matrix V which it perturbs with a random matrix R and send the result to cloud. Cloud computes in parallel pseudo-homomorphic matrix-matrix multiplication E(CVR), where C is the sub matrix of A containing the corresponding sampled columns. Data Owner decrypts and recover CV from CVR. The interactions are one round. B. Sample Run 4. Downloading and Running: On the cloud server run the cloud program - cloud.py - under the services directory. Provide the IP address of the cloud server.python cloud.py http://xxx.xxx.xx.xxxOn the data owner computer run the program data owner.py under the services directory. Also provide the IP address of the data owner computer python dataowner.py http://xxx.xxx.xx.xxxOpen the interface program pgraph.py at the data owner computer or any other devices. Provide the IP address of the data owner's serverpython testgui.py http://xxx.xxx.xx.xxxTo create new datasets and make more test programs -create an edge file first
Open the test programs test.py and testnys.py in case you create your own datasets and manually select the dataset you want to run.python test.py http://xxx.xxx.xx.xxxpython testny.py http://xxx.xxx.xx.xxx4. References: "PrivateGraph: A Cloud-Centric System for Privacy-Preserving Spectral Analysis of Large Encrypted Graphs," by Sagar Sharma and Keke Chen ICDCS 2017 Atlanta. (pdf) "Privacy-Preserving Spectral Analysis of Large Graphs in Public Clouds", by Sagar Sharma, James Powers, and Keke Chen, is accepted by ACM Symposium on Information, Computer and Communications Security (ASIACCS) 2016.(pdf) 5. Project Members: Advisor: Keke Chen, Associate Professor at Wright State University Sagar Sharma, Ph.D. student at Wright State University |