Analyzing the Performance of NoSQL Databases Using the Yahoo! Cloud Serving Benchmark
In recent years, alternatives to relational databases have emerged that offer advantages in performance, scalability, and suitability for cloud environments. Although their capabilities vary significantly, many in the industry have adopted the term “NoSQL” to describe these products. Here we evaluate the performance of two of them, Cassandra and MongoDB, using YCSB, an industry-standard benchmark created by Yahoo!.
YCSB (the Yahoo! Cloud Serving Benchmark) is a popular open-source specification and program suite, written in Java and developed at Yahoo!, for comparing the relative performance of NoSQL databases. Its workloads are used in many comparative studies of NoSQL databases.
We used YCSB as the basis for all our tests. To provide a common baseline, we used the same number of threads and the same number of operations in every test. We performed multiple independent runs of each test and recorded throughput and latency for every workload.
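As an illustration of how throughput and latency can be collected from several independent runs, the sketch below parses the summary lines that YCSB prints at the end of a run (for example `[OVERALL], Throughput(ops/sec), ...` and `[READ], AverageLatency(us), ...`) and averages each metric across runs. The function names `parse_ycsb_output` and `average_runs` are hypothetical helpers introduced here, not part of YCSB, and this is not necessarily how the results in this study were aggregated.

```python
from collections import defaultdict
from statistics import mean


def parse_ycsb_output(text):
    """Return {(section, metric): value} for summary lines of the form
    '[SECTION], Metric, value'. Lines that do not match are skipped."""
    metrics = {}
    for line in text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3 or not parts[0].startswith("["):
            continue
        section = parts[0].strip("[]")
        try:
            metrics[(section, parts[1])] = float(parts[2])
        except ValueError:
            # Non-numeric value: not a metric line we can use.
            continue
    return metrics


def average_runs(runs):
    """Average every metric over a list of parsed runs (hypothetical helper)."""
    collected = defaultdict(list)
    for run in runs:
        for key, value in run.items():
            collected[key].append(value)
    return {key: mean(values) for key, values in collected.items()}
```

Given the saved output of each run as a string, `average_runs([parse_ycsb_output(t) for t in outputs])` yields one averaged value per metric, such as `("OVERALL", "Throughput(ops/sec)")`.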
The workloads are as follows:
Workload A: Update-heavy workload: 50/50 mix of reads and writes
Workload B: Read-mostly workload: 95/5 mix of reads and writes
Workload C: Read-only workload: 100% reads
Workload D: Read-latest workload: more traffic on recent inserts
Workload E: Short ranges: short range-based queries
Workload F: Read-modify-write: read, modify, and update existing records
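The workload mixes above can be written down as YCSB core-workload properties. The sketch below is illustrative only: the property names (`readproportion`, `updateproportion`, `insertproportion`, `scanproportion`, `readmodifywriteproportion`, `requestdistribution`) are standard YCSB core properties, and the values for workloads D, E, and F follow the workload files that ship with YCSB rather than anything stated in this text. The helper `to_properties` is a hypothetical name.

```python
# Operation mix per workload, using YCSB core-workload property names.
# Values for D, E and F are taken from YCSB's shipped workload files
# (an assumption: the text above does not give their exact proportions).
WORKLOADS = {
    "A": {"readproportion": 0.5, "updateproportion": 0.5,
          "requestdistribution": "zipfian"},
    "B": {"readproportion": 0.95, "updateproportion": 0.05,
          "requestdistribution": "zipfian"},
    "C": {"readproportion": 1.0,
          "requestdistribution": "zipfian"},
    "D": {"readproportion": 0.95, "insertproportion": 0.05,
          "requestdistribution": "latest"},
    "E": {"scanproportion": 0.95, "insertproportion": 0.05,
          "requestdistribution": "zipfian"},
    "F": {"readproportion": 0.5, "readmodifywriteproportion": 0.5,
          "requestdistribution": "zipfian"},
}


def to_properties(name, recordcount, operationcount):
    """Render one workload as the text of a YCSB properties file."""
    lines = [f"recordcount={recordcount}",
             f"operationcount={operationcount}"]
    lines += [f"{key}={value}" for key, value in WORKLOADS[name].items()]
    return "\n".join(lines)
```

Keeping the mixes in one table makes it easy to check that each workload's proportions sum to 1 before a run is started.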
Below we discuss how the databases performed on the YCSB workloads for each of the durability configurations.
Workload A (update heavy): First, we examined Workload A, which has 50 percent reads and 50 percent updates. Figure 1 shows latency versus throughput curves for each database, for both read and update operations. Keeping the operation count at 1 million, we varied the number of records from 1000000 to 5000000 at intervals of 50000. As Figure 1a shows, read latency (in microseconds) decreased as the offered throughput (op/sec) increased.
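A sweep of this kind can be scripted by generating one YCSB invocation per record count. The sketch below only builds the argument lists and does not run anything. The `-P`, `-p`, `-threads`, and `-s` options and the `load`/`run` phases are part of the YCSB command-line client; the binding name, workload file path, and thread count passed in are assumptions, since the text does not state them. `record_counts` and `ycsb_command` are hypothetical helper names.

```python
def record_counts(start, stop, step):
    """Inclusive list of record counts from start to stop in steps of step."""
    return list(range(start, stop + 1, step))


def ycsb_command(phase, binding, workload_file, recordcount,
                 operationcount, threads):
    """Build the argument list for one YCSB invocation.

    phase is 'load' or 'run'; binding is a YCSB database binding name
    (for example 'mongodb' or 'cassandra-cql', assumed here).
    """
    if phase not in ("load", "run"):
        raise ValueError("phase must be 'load' or 'run'")
    return [
        "bin/ycsb", phase, binding,
        "-P", workload_file,
        "-p", f"recordcount={recordcount}",
        "-p", f"operationcount={operationcount}",
        "-threads", str(threads),
        "-s",  # print periodic status while the benchmark runs
    ]
```

For the range described above, `record_counts(1_000_000, 5_000_000, 50_000)` gives the sequence of record counts, and each value can be passed to `ycsb_command` with `operationcount=1_000_000`; the resulting list can be handed to `subprocess.run` on a machine where YCSB is installed.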
Other workloads: We repeated the process for the remaining workloads to compare the two databases, and the resulting plots displayed quite similar behavior. For workloads A, B, C, and D, the throughputs for MongoDB differed considerably from those for Cassandra, as the axis ranges indicate.
Workload B: The read latency curves plotted against throughput (op/sec) showed similar trends for both databases. The update latency curves, however, behaved differently. Cassandra's update latency decreases steadily, whereas MongoDB's fluctuates sharply and is unstable up to 11000 op/sec. Beyond 11000 op/sec, MongoDB's update latency becomes roughly constant.
Workload C: For workload C, the decrease for MongoDB (Fig 3b) was not as linear as it was for Cassandra. For MongoDB, the decrease is steep up to a throughput of 11500 op/sec, after which latency stays constant until 12260 op/sec and then decreases again.
Performance with Workload A
Performance with Workload B
Performance with Workload C
Workload D: We plotted read latency versus throughput and insert latency versus throughput. For MongoDB, read latency decreased more stably with throughput. The plots for insert latency (Fig 4c and 4d below) were quite interesting. In both cases, the behavior was initially unstable, that is, at lower record counts. However, Cassandra showed a decreasing trend beyond 5550 op/sec, while MongoDB showed an increase after 10320 op/sec.
Performance with Workload D
Performance with Workload F
The graphs gave us insight into the performance of each database under the different workloads. For workloads A, B, C, D, and F, read latency versus throughput (op/sec), recorded at intervals of 50000 for record counts from 100000 to 500000, showed similar decreasing plots. The performance for workload E was less predictable for the two databases.
Comparing the two databases, we inferred that although MongoDB and Cassandra are quite similar, Cassandra offers higher throughput than MongoDB for most workloads. While MongoDB's throughput (op/sec) decreases as the record count grows, Cassandra's plot is comparatively constant, and hence we conclude that Cassandra is more stable. The perf tool was used to analyze the load and run commands for the six workloads.
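One way to put a number on how "constant" a throughput curve is across record counts is the coefficient of variation (standard deviation divided by mean). This is an illustrative metric chosen here, not necessarily the criterion used to reach the conclusion above; `coefficient_of_variation` is a hypothetical helper.

```python
from statistics import mean, pstdev


def coefficient_of_variation(throughputs):
    """Relative spread of a throughput series: population std dev / mean.

    A value near 0 means throughput stays almost constant across the sweep;
    larger values mean it varies more with the record count.
    """
    avg = mean(throughputs)
    if avg == 0:
        raise ValueError("mean throughput is zero; ratio is undefined")
    return pstdev(throughputs) / avg
```

Applied to the measured throughput series of each database over the same record counts, the lower value would indicate the more stable system under this particular definition.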