PrivateGraph

A Cloud-Centric System for Spectral Analysis of Large Encrypted Graphs
    Data Intensive Analysis and Computing (DIAC) Lab, kno.e.sis Center




1. Introduction:

Graph datasets have invaluable use in business applications and scientific research. Because of the growing size and dynamically changing nature of graphs, graph data owners may want to use public cloud infrastructures to store, process, and perform graph analytics. However, when outsourcing data and computation, data owners are at burden to develop methods to preserve data privacy and data ownership from curious cloud providers. This demonstration exhibits a prototype system for privacy-preserving spectral analysis framework for large graphs in public clouds (PrivateGraph) that allows data owners to collect graph data from data contributors, and store and conduct secure graph spectral analysis in the cloud with preserved privacy and ownership. This demo system lets its audience interactively learn the major cloud-client interaction protocols: the privacy- preserving data submission, the secure Lanczos and Nystrom approximate eigen-decomposition algorithms that work over encrypted data, and the outcome of an important application of spectral analysis - spectral clustering. The demo users will understand the intrinsic relationship amongst costs, result quality, privacy, and scalability of the proposed framework. 


2. Key Techniques: 

Additive Homomorphic Encryption (AHE) Scheme to build pseudo-homomorphic Matrix and Vector Operations with one of the operands unencrypted. E( ) represents a homomorphic encryption function below.

Dot product: E(xT y) = E(xT )y = xT E(y)

Matrix-vector multiplication: E(Ax) = E(A)x = AE(x)

Matrix–matrix multiplication: E(AR) = E(A)R =AE(R)

LWE based randomization to protect the unencrypted operand in Lanczos. If b is an unencrypted operand vector, the perturbed vector, b’, is given as below, where, matrix A is known to adversaries; secret vector s and noise e are randomly generated vectors, and q is a large prime.

 b’=As + e + b mod q 

Multiplicative Matrix Noise to protect the unencrypted operand in Nystrom. If V is an unencrypted matrix, the perturbed matrix, V’, is given as below, where matrix R is a uniform random invertible matrix, and D is uniform random additive noise matrix; q is a large prime:

 V’=(V+D) R mod q

Differential Privacy based sparse data encoding! Incorporate differential privacy to construct sparse submission mechanism which considers node-degree statistics functions. We add fake edges to rows of sparse matrices and preserve ε- differential privacy based on Laplace mechanism.



2. Framework Architecture:

Fig. 1: The PrivateGraph framework. Distributed users contribute to the graph data which is encrypted and stored in public cloud. The graph spectral analytics happen with cloud- data owner collaboration. 


Figure 1 shows the PrivateGraph framework and the involved three parties: the public cloud (server), the data owner (client), and the data contributors. The data owner periodically runs graph algorithms on the evolving graph datasets to generate models. It also generates the public-private key pair and distributes the public key to the data contributors. Data contributors upload encrypted graph data (i.e., edges) to the cloud through secure interfaces. Alternatively, the data owner can also directly upload to the cloud any in-house graph datasets collected via other channels. The cloud and data owner collaboratively run the secure eigen-approximation algorithms over encrypted dataset residing in the cloud without compromising privacy. The expensive operations O(N2) are carried out by the cloud while the lighter computations O(N) by the client. 

Threat Model: Data contributors’ privacy, data-ownership, and privacy of data in between computations are the assets at stake. PrivateGraph preserves these assets from “honest but curious” cloud without compromising the utility of graph datasets. 

A. Privacy Preserving Data Submission



Fig. 2:  The graph data is represented as normalized Laplacian matrix. The entries are converted to precision preserving integers and then encrypted before submission to the cloud.


We enable data owner collect subscribing users' data over at cloud in a confidential manner. Graph data is represented as normalized graph Laplacian matrix that can be used for special clustering. In order to use the Additive Homomorphic Encryption schemes for encryption the message space is converted to non-negative integers. For matrices in the real value domain, we apply a reversible affine transformation which preserves d decimal places:

Each data contributor represents one or a few nodes in the graph, and contributes to the corresponding row(s) of this graph matrix. Encoding and encrypting the entire user vector in a dense format is straightforward, i.e. convert each element of the submission row to precision preserving integers and then encrypt with the AHE scheme. However, Most graphs in popular applications are sparse. It is desirable to skip encrypting some of the zero entries to save storage, communication, and computation costs, while still providing sufficient protection to privacy. We design techniques and algorithm that protects the privacy of node degrees and links using differential privacy while preserving data sparsity and authenticity.

For sparsity:

     1. Keep the non-zero elements

     2. Add fake edges to hide node degree and real edges - randomly picked zero edges are encrypted before submission

     3. Satisfy bin-based differential privacy and preserve data authenticity 

      (not implemented at this point)


Note: Submissions are sparse in this version of our implementation - encrypting only the present edges. 

B. Privacy Preserving Lanczos and Nystrom Algorithms




Fig 3: Lanczos Method. Cloud computes in parallel expensive pseudo-homomorphic matrix-vector multiplication E(Abi’), where bi' is a perturbed vector and A is the encrypted graph matrix; Data Owner generates perturbed vectors b’i, decrypts intermediate results, and recovers vector bi . The interactions happen for a number of iterations.


Fig 4: Nystrom Method. Cloud sends to data owner encrypted subsampled matrix W. Data owner decrypts matrix W and computes small matrix V which it perturbs with a random matrix R and send the result to cloud.
Cloud computes in parallel pseudo-homomorphic matrix-matrix multiplication E(CVR), where C is the sub matrix of A containing the corresponding sampled columns. Data Owner decrypts and recover CV from CVR. The interactions are one round.

3. Demo System:
A. Demo Workflow

Fig 5: The PrivateGraph Demo Workflow: The gray boxes represent the back-end while the transparent boxes are client-facing.

B. Sample Run 



4. Downloading and Running:

Download this zip file which contains contains the binary code (Linux built), UI, and data. Copy the unzip folder to both data owner and cloud provider's computers.

On the cloud server run the cloud program - cloud.py - under the services directory. Provide the IP address of the cloud server.

python cloud.py http://xxx.xxx.xx.xxx

On the data owner computer run the program data owner.py under the services directory. Also provide the IP address of the data owner computer

python dataowner.py http://xxx.xxx.xx.xxx

Open the interface program pgraph.py at the data owner computer or any other devices. Provide the IP address of the data owner's server

python testgui.py http://xxx.xxx.xx.xxx


To create new datasets and make more test programs
        -create an edge file first
-use "build/spgtest data_file" to generate a new file with extension .sp which stores the encrypted         normalized graph laplacian matrix (L=D^-1W - I). The top-k eigenvectors of -L can be used for spectral clustering.


Open the test programs test.py and testnys.py in case you  create your own datasets and manually select the dataset you want to run.

python test.py http://xxx.xxx.xx.xxx
python testny.py http://xxx.xxx.xx.xxx


4. References:

"PrivateGraph: A Cloud-Centric System for Privacy-Preserving Spectral Analysis of Large Encrypted Graphs," by Sagar Sharma and Keke Chen ICDCS 2017 Atlanta. (pdf)

"Privacy-Preserving Spectral Analysis of Large Graphs in Public Clouds", by Sagar Sharma, James Powers, and Keke Chen, is accepted by ACM Symposium on Information, Computer and Communications Security (ASIACCS) 2016.(pdf)


5. Project Members:

Advisor: Keke Chen, Associate Professor at Wright State University
Sagar Sharma, Ph.D. student at Wright State University