
DeepWalk - Online Learning of Social Representations

News

  • DeepWalk released at KDD'14!  


Abstract

We present DeepWalk, a novel approach for learning latent representations of vertices in a network. These latent representations encode social relations in a continuous vector space, which is easily exploited by statistical models. DeepWalk generalizes recent advancements in language modeling and unsupervised feature learning (or deep learning) from sequences of words to graphs. DeepWalk uses local information obtained from truncated random walks to learn latent representations by treating walks as the equivalent of sentences. We demonstrate DeepWalk’s latent representations on several multi-label network classification tasks for social networks such as BlogCatalog, Flickr, and YouTube. Our results show that DeepWalk outperforms challenging baselines which are allowed a global view of the network, especially in the presence of missing information. DeepWalk’s representations can provide F1 scores up to 10% higher than competing methods when labeled data is sparse. In some experiments, DeepWalk’s representations are able to outperform all baseline methods while using 60% less training data. DeepWalk is also scalable. It is an online learning algorithm which builds useful incremental results, and is trivially parallelizable. These qualities make it suitable for a broad class of real-world applications such as network classification and anomaly detection.
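
The idea of treating truncated random walks as sentences can be illustrated with a minimal Python sketch. This is not the released implementation: the function names, the toy graph, and the parameter values are all hypothetical, and the sketch only builds the walk "corpus" that a word-embedding model such as skip-gram could then be trained on.

```python
import random


def truncated_random_walk(adjacency, start, walk_length, rng):
    """Return one uniform random walk of at most `walk_length` vertices."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = adjacency.get(walk[-1], [])
        if not neighbors:
            # Dead end: stop early, so the walk is shorter than walk_length.
            break
        walk.append(rng.choice(neighbors))
    return walk


def build_corpus(adjacency, walks_per_vertex, walk_length, seed=0):
    """Start `walks_per_vertex` walks at every vertex; each walk is a 'sentence'."""
    rng = random.Random(seed)
    vertices = list(adjacency)
    corpus = []
    for _ in range(walks_per_vertex):
        rng.shuffle(vertices)  # visit the start vertices in a random order
        for vertex in vertices:
            walk = truncated_random_walk(adjacency, vertex, walk_length, rng)
            # Vertex ids play the role of words, so store them as strings.
            corpus.append([str(v) for v in walk])
    return corpus


# Hypothetical toy graph: vertex -> list of neighbors.
toy_graph = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
corpus = build_corpus(toy_graph, walks_per_vertex=2, walk_length=5)
```

Each entry of `corpus` is a list of vertex ids that can be handed to any language-modeling tool expecting tokenized sentences.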

DeepWalk is joint work with Rami Al-Rfou and Steven Skiena.

Code

An implementation of DeepWalk is available on GitHub: https://github.com/phanein/deepwalk



Usage

Example Usage
$ deepwalk --input example_graphs/karate.adjlist --output karate.embeddings

--input input_filename

  1. --format adjlist for an adjacency list, e.g.:

    1 2 3 4 5 6 7 8 9 11 12 13 14 18 20 22 32
    2 1 3 4 8 14 18 20 22 31
    3 1 2 4 8 9 10 14 28 29 33
    ...
    
  2. --format edgelist for an edge list, e.g.:

    1 2
    1 3
    1 4
    ...
    
  3. --format mat for a Matlab MAT file containing an adjacency matrix

    (note: you must also specify the variable name of the adjacency matrix with --matfile-variable-name)
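
To make the two text formats concrete, here is a small Python sketch that reads them into a neighbor dictionary. It is an illustration only, not the parser used by deepwalk: the function names are hypothetical, node ids are kept as strings, and treating every edge as undirected is an assumption of this sketch.

```python
from collections import defaultdict


def parse_adjlist(lines):
    """Each line is: node neighbor1 neighbor2 ... (whitespace separated)."""
    adjacency = {}
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        node, neighbors = tokens[0], tokens[1:]
        adjacency.setdefault(node, []).extend(neighbors)
    return adjacency


def parse_edgelist(lines):
    """Each line is: node1 node2 (one edge per line)."""
    adjacency = defaultdict(list)
    for line in lines:
        tokens = line.split()
        if len(tokens) < 2:
            continue  # skip blank or malformed lines
        u, v = tokens[0], tokens[1]
        adjacency[u].append(v)
        adjacency[v].append(u)  # assumption: edges are undirected
    return dict(adjacency)


# The sample lines shown in the format descriptions above.
adj = parse_adjlist([
    "1 2 3 4 5 6 7 8 9 11 12 13 14 18 20 22 32",
    "2 1 3 4 8 14 18 20 22 31",
    "3 1 2 4 8 9 10 14 28 29 33",
])
edges = parse_edgelist(["1 2", "1 3", "1 4"])
```

Both functions accept any iterable of lines, so an open file object can be passed directly.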

--output output_filename

The output representations are written in skipgram format: the first line is a header giving the number of nodes and the dimension d, and every other line is a node-id followed by its d-dimensional representation:

34 64
1 0.016579 -0.033659 0.342167 -0.046998 ...
2 -0.007003 0.265891 -0.351422 0.043923 ...
...
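
The output file can be read back with a few lines of Python. The sketch below is a minimal reader assuming the layout described above (a header line, then one node per line); the function name `load_embeddings` and the shortened four-dimensional sample are hypothetical and exist only for illustration.

```python
def load_embeddings(lines):
    """Parse skipgram-format text into (num_nodes, dim, {node_id: vector})."""
    lines = iter(lines)
    header = next(lines).split()
    num_nodes, dim = int(header[0]), int(header[1])
    embeddings = {}
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        vector = [float(x) for x in tokens[1:]]
        if len(vector) != dim:
            raise ValueError(f"node {tokens[0]}: expected {dim} values, got {len(vector)}")
        embeddings[tokens[0]] = vector
    return num_nodes, dim, embeddings


# Hypothetical shortened sample: 2 nodes, 4 dimensions.
sample = [
    "2 4",
    "1 0.016579 -0.033659 0.342167 -0.046998",
    "2 -0.007003 0.265891 -0.351422 0.043923",
]
num_nodes, dim, embeddings = load_embeddings(sample)
```

To read a real output file, pass an open file object instead of the sample list, for example `load_embeddings(open("karate.embeddings"))`.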

Full Command List
The full list of command line options is available with $ deepwalk --help

Requirements

  • numpy
  • scipy

(may have to be independently installed)

Installation

  1. cd deepwalk
  2. pip install -r requirements.txt
  3. python setup.py install


Data

The datasets for the multi-label classification tasks we evaluated on come from Lei Tang and are available for download at his personal website. (If his site goes offline, contact me.)

Citing DeepWalk

BibTeX reference (other formats are available from the ACM DL):
@inproceedings{Perozzi:2014:DOL:2623330.2623732,
 author = {Perozzi, Bryan and Al-Rfou, Rami and Skiena, Steven},
 title = {DeepWalk: Online Learning of Social Representations},
 booktitle = {Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
 series = {KDD '14},
 year = {2014},
 isbn = {978-1-4503-2956-9},
 location = {New York, New York, USA},
 pages = {701--710},
 numpages = {10},
 url = {http://doi.acm.org/10.1145/2623330.2623732},
 doi = {10.1145/2623330.2623732},
 acmid = {2623732},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {deep learning, latent representations, learning with partial labels, network classification, online learning, social networks},
} 