Lab 3

Measure performance of Raft log replication (your lab 2)

You will now implement performance benchmarks for your Raft implementation.

Getting Started

Go to the same repo you used for lab 2. Let's create a separate branch for your lab 3 submission based on the lab 2 implementation.

$ git checkout -b lab-3-solution lab-2-solution
$ git push -u origin lab-3-solution

Download the lab3-release.tar.gz tarball from: https://drive.google.com/file/d/1ScNKcAEeAKhlYs5_OJZ1M7ENdn1ba7GM/view?usp=sharing

Move the tarball into the root directory of your lab and untar it with tar -zxvf lab3-release.tar.gz

Finally, do not forget to push the changes to the remote.

Task 1. Measure unloaded latency (4 pt)

Overview

In this step, you will create a closed-loop client for benchmarking. We provide skeleton code in app/latency.cpp.

When you run ./latency, it should run the closed-loop client and send 1000 synchronous proposals. After the 1000 proposals complete, it should output the average latency, the median (p50) latency, and the p99 latency.

Example:

######################################
# latAvg    latP50    latP99
# (ms)      (ms)      (ms)
--------------------------------------
  10        10        50

Please follow this layout strictly and clearly display the latAvg, latP50, and latP99 headers and values. Failure to clearly display the required information may result in a point deduction.

Step 1. Add synchronous propose RPC to your Raft implementation

Before measuring the latency of Raft, first create a new RPC that sends a proposal request and returns only once the command is committed.

Declare the following function in inc/rafty/raft.hpp:

ProposalResult Raft::propose_sync(const std::string &data)

Keep the signature exactly as shown above and implement the function in the corresponding source file (src/raft.cpp by default). This function must NOT return immediately. It must return only when the corresponding log entry is considered committed.

Step 2. Implement a closed loop client "latency"

In app/latency.cpp, we provide skeleton code to get you started. The skeleton is largely similar to the multinode application: it starts a number of Raft servers and lets you interact with them through RPCs. It therefore effectively serves as the "client".

Pay attention to lines 142-153, which show how to create and start the Raft servers. The code is reproduced here:

toolings::RaftTestCtrl ctrl(
    configs, node_tester_ports,
    std::string(node_path), std::string(ctrl_addr), fail_type,
    verbosity, logger
);

// don't forget to invoke this function to start up the raft servers
run_raft_servers(ctrl);

// do the measurement
measure_once(ctrl);

// don't forget to invoke this function to clean up the raft servers
cleanup_raft_servers(ctrl);

We provide two functions, run_raft_servers() and cleanup_raft_servers(), so that you can bootstrap and clean up the Raft servers properly. Do not forget to invoke both. The skeleton already invokes them for you in this part, but you will need to do so yourself in later parts of this lab.

You can put your measurement and output logic in the measure_once function.

Step 3. Measure latency of each proposal.

As mentioned, you can put your measurement and output logic in the measure_once function. Skeleton functions for metric calculation are also included. You may fill them in or write new functions; they are provided for reference only, and you are not required to use them.

In this step, you need to measure the latency of each request while looping over 1000 requests to the Raft servers. You must call ctrl.propose_to_all_sync() to propose to the Raft servers; you MAY NOT use other APIs. The proposed data can be any non-empty text.

In this step, keep the number of replicas at 3. This is the default, so you do not need to change anything.
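A closed-loop measurement can be sketched as follows: each request is timed with a monotonic clock, and the next request is issued only after the previous one returns. This is a minimal sketch, not the required implementation. The helper run_closed_loop and its propose parameter are hypothetical; propose stands in for a call to ctrl.propose_to_all_sync(), whose exact signature is not shown here.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Runs `count` back-to-back synchronous proposals and returns each latency in ms.
// `propose` is a hypothetical stand-in for the lab's synchronous propose call.
std::vector<double> run_closed_loop(
        const std::function<void(const std::string &)> &propose,
        std::size_t count) {
    using clock = std::chrono::steady_clock;  // monotonic, unaffected by wall-clock changes
    std::vector<double> latencies_ms;
    latencies_ms.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        const std::string data = "cmd-" + std::to_string(i);  // any non-empty payload
        const auto start = clock::now();
        propose(data);  // closed loop: blocks until this proposal completes
        const auto end = clock::now();
        latencies_ms.push_back(
            std::chrono::duration<double, std::milli>(end - start).count());
    }
    return latencies_ms;
}
```

Inside measure_once, the callable could be a lambda that forwards to ctrl.propose_to_all_sync(), for example run_closed_loop(my_lambda, 1000).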

Step 4. Format text

The last step after the measurement is to output the data. You need to clearly indicate the header (the label for each number) and the corresponding values. Failure to clearly display the numbers may result in a point deduction.
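One way to produce aligned columns is std::setw from <iomanip>. The sketch below is only an illustration: format_latency_report is a hypothetical helper, and the column widths, separator length, and two-decimal precision are assumptions rather than requirements of the lab.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Builds the three-column latency report as a string (hypothetical helper).
std::string format_latency_report(double avg_ms, double p50_ms, double p99_ms) {
    std::ostringstream out;
    out << std::left  // left-align every padded field
        << std::string(38, '#') << '\n'
        << "# " << std::setw(10) << "latAvg" << std::setw(10) << "latP50" << "latP99" << '\n'
        << "# " << std::setw(10) << "(ms)" << std::setw(10) << "(ms)" << "(ms)" << '\n'
        << std::string(38, '-') << '\n'
        << std::fixed << std::setprecision(2)  // two decimal places for values
        << "  " << std::setw(10) << avg_ms << std::setw(10) << p50_ms << p99_ms << '\n';
    return out.str();
}
```

Printing the result is then a single statement such as std::cout << format_latency_report(avg, p50, p99).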

Step 5. Report

In your report, describe how you calculate p50 and p99 numbers.
In your report, attach the outputted results.
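There are several accepted definitions of a percentile. One common choice is the nearest-rank method: sort the samples and take the value at rank ceil(p/100 * N). The sketch below assumes that definition; the function names are hypothetical, and other definitions (for example, linear interpolation) are equally valid as long as the report states which one is used.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Arithmetic mean of the samples (0 for an empty input).
double mean(const std::vector<double> &v) {
    return v.empty() ? 0.0 : std::accumulate(v.begin(), v.end(), 0.0) / v.size();
}

// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it. Takes a copy so it can sort.
double percentile(std::vector<double> v, double p) {
    if (v.empty()) return 0.0;
    std::sort(v.begin(), v.end());
    std::size_t rank = static_cast<std::size_t>(std::ceil(p * v.size() / 100.0));
    if (rank < 1) rank = 1;  // p = 0 maps to the minimum
    return v[rank - 1];      // ranks are 1-based
}
```

With 1000 samples, this definition picks the 500th smallest value for p50 and the 990th smallest for p99.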

Task 2. Measure maximum throughput of system (3 pt)

Now, let's scale the benchmark to multiple clients using multithreading. In this step, you will create another closed-loop client for benchmarking. Create an application named tput in the app folder.

When you run ./tput <MaxClientCount>, it should run the closed-loop multi-client benchmark, with each client sending 1000 synchronous proposals. This time the benchmark runs in several rounds. The first round starts with 1 client, and each subsequent round doubles the number of clients (threads). Stop once the client count exceeds the MaxClientCount supplied as the command-line argument. For each round, write the number of clients, average latency, median (p50) latency, p90 latency, p99 latency, and throughput to result.txt.

Example of result.txt:
##################################################################################
# clientCount   latAvg    latP50    latP90    latP99    throughput
#               (ms)      (ms)      (ms)      (ms)      (ops/sec)
----------------------------------------------------------------------------------
  1             10        10        20        50        100
  2             11        11        22        55        180
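The doubling schedule for the rounds can be sketched as a small helper. The name client_counts is hypothetical; the sketch simply lists 1, 2, 4, ... and stops before exceeding the given maximum.

```cpp
#include <vector>

// Client counts for successive rounds: 1, 2, 4, ... not exceeding max_clients.
std::vector<int> client_counts(int max_clients) {
    std::vector<int> counts;
    // long long avoids overflow when doubling near the int limit
    for (long long n = 1; n <= max_clients; n *= 2) {
        counts.push_back(static_cast<int>(n));
    }
    return counts;
}
```

For example, a MaxClientCount of 5 yields rounds with 1, 2, and 4 clients.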

Step 1. Create the tput application

Create a tput.cpp file in the app folder. You can start by copying the latency.cpp file.

Step 2. Multi-client

In this step, you need to refactor the latency app into a multi-client throughput application. In the throughput app, you need to conduct multiple rounds of "latency" tests. In each round, you should:

Create a new cluster of raft servers.
Start a different number of clients to concurrently propose to the raft servers. Each client uses a different thread to run the proposal loop (1000 proposals for each client, same as task 1).
Properly clean up the raft servers.

Step 3. Format Text

Again, after each round, you need to output the result. In this application, multiple rows of data should be written to result.txt. Each row (record) should show the number of clients involved, the average latency, the median (p50) latency, the p90 and p99 latencies, and the throughput (measured as the number of committed proposals per second).
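Writing one row per round can be sketched with two helpers that take any std::ostream, so the same code works with a std::ofstream opened on result.txt. Row, write_header, and write_row are hypothetical names, and the column widths and precision are assumptions.

```cpp
#include <iomanip>
#include <ostream>
#include <string>

// One record of result.txt (hypothetical struct).
struct Row {
    int clients;
    double avg_ms, p50_ms, p90_ms, p99_ms, tput_ops;
};

// Writes the banner, column names, and units once at the top of the file.
void write_header(std::ostream &out) {
    out << std::left << std::string(82, '#') << '\n'
        << "# " << std::setw(14) << "clientCount" << std::setw(10) << "latAvg"
        << std::setw(10) << "latP50" << std::setw(10) << "latP90"
        << std::setw(10) << "latP99" << "throughput" << '\n'
        << "# " << std::setw(14) << "" << std::setw(10) << "(ms)"
        << std::setw(10) << "(ms)" << std::setw(10) << "(ms)"
        << std::setw(10) << "(ms)" << "(ops/sec)" << '\n'
        << std::string(82, '-') << '\n';
}

// Appends one round's numbers, aligned under the header columns.
void write_row(std::ostream &out, const Row &r) {
    out << std::left << std::fixed << std::setprecision(2)
        << "  " << std::setw(14) << r.clients << std::setw(10) << r.avg_ms
        << std::setw(10) << r.p50_ms << std::setw(10) << r.p90_ms
        << std::setw(10) << r.p99_ms << r.tput_ops << '\n';
}
```

A typical use would open std::ofstream out("result.txt"), call write_header(out) once, and then call write_row(out, row) after each round.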

Step 4. Report

In your report, describe how you measure the throughput using the closed-loop client.
In your report, attach the outputted results.

Task 3. Measure and draw latency-throughput plot (2 pt)

Now, let's plot the latency and throughput measured from Task 2.
You should write a Python script lat-tput.py in the folder bench/. You need to create the folder yourself. No template is provided.

Step 1. Parse "result.txt"

First, create Python lists for saving the data.

clientCount = []
latAvg = []
latP50 = []
latP90 = []
latP99 = []
throughput = []

Read in the result.txt file generated in Task 2. If you are not sure how, see https://stackoverflow.com/questions/3277503/how-to-read-a-file-line-by-line-into-a-list
Then parse each line. There are many ways to do this; a simple one is str.split(), which splits a line on whitespace. Remember to skip the header lines (those starting with # or -). If you need to locate several result files by name pattern, glob can help: https://docs.python.org/3/library/glob.html

Step 2. Plot the parsed data

You may use any plotting method, but we suggest matplotlib.pyplot.scatter:
https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html

To plot the latency-throughput graph, use x=throughput and y=latAvg.

Then plot additional lines for the other latency statistics: p50, p90, and p99.

Step 3. Report

Include the plot with 4 lines (latAvg, p50, p90, and p99) in the report.
Don't forget to push your Python script to the Git repo. Your script will be tested with other students' result.txt files.

Task 4. Re-measure performance after configuration change (1 pt)

OK, you now have a basic latency and throughput benchmark for Raft. Let's use it for some interesting experiments.

Re-measure throughput and latency after each of the following configuration changes:

Increase the number of replicas to 5
Increase the number of replicas to 11

Please include a plot showing all three configurations (original, rep = 5, rep = 11) in the report.

Write a paragraph explaining why the performance changes in the way you measured.

Submission of Lab 3

Submit both your code (for Tasks 1, 2, and 3) and a 1-page report (covering Tasks 1, 2, 3, and 4).
Push all of your code (two apps), plotting script, and report.pdf to the same Git repo.

For submission, please follow these rules. Create a folder if it does not exist.

Place latency.cpp and tput.cpp in the folder app/
Place lat-tput.py in the folder bench/
Place report.pdf in the project root directory.
