This is the documentation page for the Stanford GPS (Graph Processing System) project.
GPS is an open-source system for scalable, fault-tolerant, and easy-to-program execution of algorithms on extremely large graphs. GPS is similar to Google's proprietary Pregel system and to Apache Giraph. It is a distributed system designed to run on a cluster of machines, such as Amazon's EC2.
This documentation contains information about:
- How to set up GPS in your cluster.
- How to program GPS (i.e. add your algorithm).
- How to run GPS.
- Detailed description of the GPS API.
- Input graph format.
- Online and offline monitoring website.
- How to use the Green-Marl high-level language and compile Green-Marl programs to GPS, as an alternative to programming GPS directly.
Email Group: If you are considering using GPS, please join the email group for users: firstname.lastname@example.org
Overview of GPS
The GPS architecture consists of a single master task and a number of worker tasks. Below is a diagram depicting the overall GPS architecture. GPS uses Apache MINA.
The input graph (directed, possibly with values on edges) is distributed across machines, and vertices send each other messages to perform a computation. Computation is divided into iterations called supersteps. Analogous to the map() and reduce() functions of the MapReduce framework, in each superstep a user-defined function called vertex.compute() is applied to each vertex in parallel. The user expresses the logic of the computation by implementing vertex.compute(). This design is based on Valiant's Bulk Synchronous Parallel model of computation. A detailed description can be found in the original Pregel paper.
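To make the superstep model concrete, the following is a minimal single-process sketch in Java that propagates the maximum vertex value through a directed graph. It does not use the GPS API: the class BspMaxValueSketch and its methods compute(), run(), and emptyInboxes() are hypothetical names chosen for illustration, and the loop visits vertices sequentially where GPS would apply vertex.compute() in parallel across workers.

```java
import java.util.ArrayList;
import java.util.List;

/** Single-process illustration of superstep-based computation. NOT the GPS API. */
class BspMaxValueSketch {

    /**
     * Hypothetical per-vertex logic, playing the role of vertex.compute().
     * Reads messages sent in the previous superstep, updates the vertex value,
     * and queues messages that will be delivered in the next superstep.
     */
    static int compute(int superstep, int currentValue, List<Integer> incoming,
                       int[] outNeighbors, List<List<Integer>> nextInbox) {
        int newValue = currentValue;
        for (int message : incoming) {
            newValue = Math.max(newValue, message);
        }
        // Send in the first superstep, and afterwards only when the value grew.
        if (superstep == 0 || newValue > currentValue) {
            for (int target : outNeighbors) {
                nextInbox.get(target).add(newValue);
            }
        }
        return newValue;
    }

    /** Runs supersteps until no messages are in flight; returns final vertex values. */
    static int[] run(int[][] outEdges, int[] initialValues) {
        int n = initialValues.length;
        int[] values = initialValues.clone();
        List<List<Integer>> inbox = emptyInboxes(n);
        boolean messagesInFlight = true;
        for (int superstep = 0; messagesInFlight; superstep++) {
            List<List<Integer>> nextInbox = emptyInboxes(n);
            // Sequential here; a distributed system would run this loop in parallel.
            for (int v = 0; v < n; v++) {
                values[v] = compute(superstep, values[v], inbox.get(v),
                                    outEdges[v], nextInbox);
            }
            // Barrier: messages sent in this superstep become visible in the next one.
            messagesInFlight = false;
            for (List<Integer> box : nextInbox) {
                if (!box.isEmpty()) {
                    messagesInFlight = true;
                }
            }
            inbox = nextInbox;
        }
        return values;
    }

    static List<List<Integer>> emptyInboxes(int n) {
        List<List<Integer>> boxes = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            boxes.add(new ArrayList<>());
        }
        return boxes;
    }

    public static void main(String[] args) {
        // Directed cycle 0 -> 1 -> 2 -> 0 with initial values 3, 6, 2.
        int[][] outEdges = {{1}, {2}, {0}};
        int[] result = run(outEdges, new int[]{3, 6, 2});
        System.out.println(java.util.Arrays.toString(result));
    }
}
```

The sketch stops when a superstep produces no messages, which is one simple termination rule; the actual halting mechanism and method signatures in GPS are described in the API documentation and may differ.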