JanusGraph is an OLTP graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster. JanusGraph is a transactional database that can support thousands of concurrent users executing complex graph traversals in real time. It can use Cassandra as backend storage. JanusGraph is a fork of TitanDB; as one commentator put it, "JanusGraph picks up where TitanDB left off". JanusGraph could become the de facto reference provider implementation for TinkerPop.
Apache TinkerPop is a graph computing framework for both graph databases (OLTP) and graph analytic systems (OLAP). JanusGraph supports queries written with Apache TinkerPop. Below are some examples of questions that TinkerPop queries can answer.
// What are the names of the managers in the management chain going from Gremlin to the CEO?
// What is the distribution of job titles amongst Gremlin's collaborators?
// Get a ranking of the most relevant products for Gremlin given his purchase history.
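The three questions above can be made concrete with a small sketch. The following is plain Python over a toy in-memory graph, not Gremlin and not JanusGraph's API; the vertex names, the edge labels (`manages`, `created`, `bought`) and the helper functions are all hypothetical and chosen only for illustration. In Gremlin the same questions would be expressed as traversals using steps such as `repeat()`, `until()`, `path()` and `groupCount()`.

```python
from collections import Counter

# Toy property graph (hypothetical data): vertex properties and labelled edges.
VERTICES = {
    "gremlin": {"title": "engineer"},
    "alice": {"title": "manager"},
    "bob": {"title": "ceo"},
    "carol": {"title": "engineer"},
    "dave": {"title": "designer"},
}
# Each edge is (source, label, target).
EDGES = [
    ("alice", "manages", "gremlin"),
    ("bob", "manages", "alice"),
    ("gremlin", "created", "project-x"),
    ("carol", "created", "project-x"),
    ("dave", "created", "project-x"),
    ("gremlin", "bought", "book"),
    ("carol", "bought", "book"),
    ("carol", "bought", "lamp"),
    ("dave", "bought", "book"),
    ("dave", "bought", "lamp"),
    ("dave", "bought", "desk"),
]


def out(vertex, label):
    """Targets of outgoing edges with the given label."""
    return [t for s, l, t in EDGES if s == vertex and l == label]


def in_(vertex, label):
    """Sources of incoming edges with the given label."""
    return [s for s, l, t in EDGES if t == vertex and l == label]


def management_chain(person):
    """Follow incoming 'manages' edges upward until a CEO is reached."""
    chain = []
    current = person
    while VERTICES[current]["title"] != "ceo":
        managers = in_(current, "manages")
        if not managers:
            break  # no manager recorded; stop rather than loop forever
        current = managers[0]
        chain.append(current)
    return chain


def collaborator_titles(person):
    """Count the job titles of other people who created the same things."""
    titles = Counter()
    for thing in out(person, "created"):
        for other in in_(thing, "created"):
            if other != person:
                titles[VERTICES[other]["title"]] += 1
    return titles


def recommend(person):
    """Rank products bought by people who share a purchase with `person`."""
    owned = set(out(person, "bought"))
    scores = Counter()
    for product in owned:
        for other in in_(product, "bought"):
            if other == person:
                continue
            for candidate in out(other, "bought"):
                if candidate not in owned:
                    scores[candidate] += 1
    return scores.most_common()


print(management_chain("gremlin"))
print(dict(collaborator_titles("gremlin")))
print(recommend("gremlin"))
```

A real graph database answers the same questions without scanning every edge, because it indexes adjacency per vertex; the linear scans here only keep the sketch short.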
Gremlin is a graph traversal language and virtual machine developed by Apache TinkerPop. Gremlin works for both OLTP-based graph databases and OLAP-based graph processors. Gremlin's automata and functional language foundation enable it to naturally support imperative and declarative querying, host language agnosticism, user-defined domain specific languages, an extensible compiler/optimizer, single- and multi-machine execution models, and hybrid depth- and breadth-first evaluation.
Other graph database systems
- Amazon Neptune - Fully-managed graph database service.
- Bitsy - A small, fast, embeddable, durable in-memory graph database.
- Blazegraph - RDF graph database with OLTP support.
- CosmosDB - Microsoft's distributed OLTP graph database.
- ChronoGraph - A versioned graph database.
- DSEGraph - DataStax graph database with OLTP and OLAP support.
- GRAKN.AI - Distributed OLTP/OLAP knowledge graph system.
- Hadoop (Spark) - OLAP graph processor using Spark.
- HGraphDB - OLTP graph database running on Apache HBase.
- IBM Graph - OLTP graph database as a service.
- JanusGraph - Distributed OLTP and OLAP graph database with BerkeleyDB, Apache Cassandra and Apache HBase support.
- JanusGraph (Amazon) - The Amazon DynamoDB Storage Backend for JanusGraph.
- Neo4j - OLTP graph database (embedded and high availability).
- neo4j-gremlin-bolt - OLTP graph database (using Bolt Protocol).
- OrientDB - OLTP graph database.
- Apache S2Graph - OLTP graph database running on Apache HBase.
- Sqlg - OLTP implementation on SQL databases.
- Stardog - RDF graph database with OLTP and OLAP support.
- TinkerGraph - In-memory OLTP and OLAP reference implementation.
- Titan - Distributed OLTP and OLAP graph database with BerkeleyDB, Apache Cassandra and Apache HBase support.
- Titan (Amazon) - The Amazon DynamoDB storage backend for Titan.
- Titan (Tupl) - The Tupl storage backend for Titan.
- Unipop - OLTP Elasticsearch and JDBC backed graph.
Scylla - Cassandra Killer?
Scylla is a next-generation NoSQL database that claims to deliver 10x the performance of Cassandra. It is written from the ground up in C++ and is said to deliver Redis-like performance. See the Scylla roadmap for what is planned. Scylla is a drop-in replacement for Cassandra 2.2, with support for:
- All Apache Cassandra Drivers
- Protocols: CQL, Thrift, JMX
- Tooling: cqlsh, nodetool, cassandra-stress, and all of Cassandra 2.2 tools
- SSTable format
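Because Scylla speaks the same CQL protocol and supports the Cassandra drivers, client code written for Cassandra should in principle need only different contact points. A minimal sketch, assuming the DataStax Python driver (`cassandra-driver`) is installed; the `ClusterTarget` helper and the host names are hypothetical and exist only for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterTarget:
    """Where to connect; a hypothetical helper, not part of any driver."""
    name: str
    contact_points: tuple
    port: int = 9042  # default CQL native transport port


# Hypothetical host names, for illustration only.
CASSANDRA = ClusterTarget("cassandra", ("cassandra-1.example.com",))
SCYLLA = ClusterTarget("scylla", ("scylla-1.example.com",))


def open_session(target):
    """Open a CQL session; the code path is identical for either database."""
    # Imported lazily so this sketch loads even without the driver installed.
    from cassandra.cluster import Cluster

    cluster = Cluster(contact_points=list(target.contact_points), port=target.port)
    return cluster.connect()


def release_version(session):
    """Run the same CQL statement regardless of which backend answers."""
    row = session.execute("SELECT release_version FROM system.local").one()
    return row.release_version
```

Switching backends then amounts to calling `open_session(SCYLLA)` instead of `open_session(CASSANDRA)`; whether every query behaves identically still has to be verified against the application's own workload.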
C++ applications can extract the maximum output from the available hardware resources. This is evident from the benchmark report too: matching the performance of a 3-node Scylla cluster might require as many as 30 Cassandra nodes. In the industry there is a push for C++-based products that lower hardware requirements and energy bills at the data center level. One drawback of C++ is that it has a significantly steeper learning curve than Java and lacks the rich standard libraries that the Java ecosystem is blessed with.
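To make the hardware argument concrete, the following back-of-the-envelope sketch computes how many nodes a cluster needs for a target throughput. The per-node throughput figures and the target are hypothetical placeholders chosen to reproduce the roughly 10x node ratio quoted above; they are not taken from the benchmark report.

```python
import math


def nodes_needed(target_ops_per_sec, ops_per_node):
    """Smallest node count whose combined throughput meets the target.

    Assumes throughput scales linearly with node count, which real
    clusters only approximate.
    """
    if ops_per_node <= 0:
        raise ValueError("ops_per_node must be positive")
    return math.ceil(target_ops_per_sec / ops_per_node)


# Hypothetical figures, for illustration only.
TARGET = 1_500_000            # operations per second the application needs
CASSANDRA_PER_NODE = 50_000   # assumed per-node throughput
SCYLLA_PER_NODE = 500_000     # assumed 10x per-node throughput

cassandra_nodes = nodes_needed(TARGET, CASSANDRA_PER_NODE)
scylla_nodes = nodes_needed(TARGET, SCYLLA_PER_NODE)
print(cassandra_nodes, scylla_nodes)
```

Under these assumptions the node counts differ by the same factor as the per-node throughput, which is where the savings in hardware and energy would come from.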
Benchmark reports: https://www.scylladb.com/product/benchmarks/
Although Scylla has higher throughput than Cassandra, the latter is more mature and battle-tested in numerous internet-scale applications, with commercial support from DataStax. Perhaps sticking with Cassandra is a good idea at this moment, letting Scylla gain more product maturity.