Always feel free to seek help by posting questions or concerns on Piazza or coming to office hours. For complex problems, we encourage you to come to office hours so that you have dedicated time to work through the issues with us.
Always start your lab as early as possible. In this class's labs, building a first prototype is relatively easy, but making it correct and bug-free can be quite challenging and time-consuming.
In this lab, you will continue working on the Raft consensus protocol to complete log replication. This lab will be considerably more complex and harder to get right than the previous one.
We implemented leader election in the previous lab. Now, once a leader is established, it should correctly replicate its log when it receives a client's proposal. The leader drives consensus by sending the to-be-replicated log entries to the followers.
In this lab, you’ll implement the log replication part of the Raft design described in the extended paper. You WILL NOT implement: saving persistent state, cluster membership changes (Section 6), log compaction/snapshotting (Section 7).
Some general tips:
Start early. Although the amount of code isn’t large, getting it to work correctly will be challenging.
Read and understand the Raft paper and the Raft lecture notes before you start. Your implementation should follow the paper’s description closely, particularly Figure 2, since that’s what the tests expect.
Go to the same repo you used for lab 1 and create a separate branch for your lab 2 submission, based on your lab 1 implementation:
$ git checkout -b lab-2-solution lab-1-solution
$ git push -u origin lab-2-solution
Download the tarball included in the Piazza announcement --- "lab2-release.tar.gz" --- into the root of your project directory and run tar -xzvf lab2-release.tar.gz. It will overwrite some of the skeleton code, which now comes with 7 test cases for lab 2 and a newly updated tester that runs in an actual multi-process fashion.
Beyond that, the skeleton code, toolchain requirements, and build procedure are the same as in lab 1. If you have any questions on these matters, please refer to lab 1's spec.
Finally, do not forget to push the changes to the remote.
You should already have implemented the run() and get_state() functions according to lab 1's spec. Now, you need to correctly implement:
ProposalResult propose(const std::string& data);
Implement this member function.
It takes data (a command, represented as a C++ string) as its parameter.
It returns a ProposalResult (defined in inc/common/common.hpp), representing the proposal metadata.
Populate the field term with the term of this newly appended log entry.
Populate the field index with the index of this newly appended log entry.
Populate the field is_leader with the value of whether the current raft instance considers itself the leader.
You need to set is_leader to false if the current raft instance is not the leader; in that case you can return immediately without any additional handling of the proposal. (In a real-world implementation, you might instead forward the proposal to the actual leader.)
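As a rough illustration, a minimal sketch of propose could look like the following. The LogEntry type and the is_leader_, current_term_, and logs_ members are hypothetical names for your own internal state, and the 1-based indexing is an assumption; adapt them to your implementation.

    ProposalResult Raft::propose(const std::string& data) {
        std::lock_guard<std::mutex> lock(this->mtx);  // guard shared state
        ProposalResult result{};
        result.is_leader = this->is_leader_;          // hypothetical member
        if (!result.is_leader) {
            // Not the leader: return immediately, no further handling.
            return result;
        }
        // Append the entry locally; replication to followers happens
        // asynchronously (e.g., piggybacked on the next heartbeat round).
        LogEntry entry;                               // hypothetical type
        entry.term = this->current_term_;
        entry.index = this->logs_.size() + 1;         // assuming 1-based indices
        entry.data = data;
        this->logs_.push_back(entry);
        result.term = entry.term;
        result.index = entry.index;
        return result;
    }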
Lab 2 additional requirements
When the raft instance determines that a log entry has been committed, it should invoke the function void apply(const ApplyResult &result) (defined in inc/rafty/raft.hpp). The apply function takes an ApplyResult (defined in inc/common/common.hpp), which should contain the index and data of the committed log entry.
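For example, when the commit index advances, you might apply every newly committed entry in order. The commit_index_, last_applied_, and logs_ members below are assumptions, not part of the skeleton:

    // Call with this->mtx held, after commit_index_ has advanced.
    void Raft::apply_committed_entries() {
        while (this->last_applied_ < this->commit_index_) {
            this->last_applied_++;
            ApplyResult result;
            result.index = this->last_applied_;
            result.data = this->logs_[this->last_applied_ - 1].data;  // 1-based log
            this->apply(result);
        }
    }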
Other requirements that are the same as lab 1
All peer stubs are created for you in the std::unordered_map<uint64_t, RaftServiceStub> peers_, as a mapping from raft instance id to its stub. You can invoke a peer stub by using its ID, for example:
this->peers_[target_id]->AppendEntries(/* args */)
When invoking a remote RPC function, you MUST create a context using the provided create_context function and pass it into the RPC call.
Function signature (defined in the class Raft):
std::unique_ptr<grpc::ClientContext> create_context(uint64_t to) const;
Your remote RPC invocation should be something like this:
auto context = this->create_context(target_id);
grpc::Status status = this->peers_[target_id]->AppendEntries(context.get() /* , other args */);
The tester program will rely on this context to evaluate your implementation. Failing to use create_context to generate the context will result in failed test cases.
A mutex is defined in the Raft class as a protected member. You are free to use this mutex to protect concurrent data access. DO NOT delete this mutex. The mutex is defined as:
mutable std::mutex mtx;
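A typical pattern is to take the mutex with an RAII guard around any access to shared state, and to release it before blocking on RPCs:

    {
        std::lock_guard<std::mutex> lock(this->mtx);
        // read or modify shared raft state here
    }
    // lock released here; avoid holding mtx across blocking RPC calls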
For simulated network errors, RPC requests will be intercepted and returned immediately with an error status. For each RPC request, you should therefore check the returned grpc::Status; on a network error, grpc::Status::ok() will evaluate to false.
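Putting the pieces together, a guarded RPC invocation might look like the sketch below. AppendEntriesRequest and AppendEntriesResponse are placeholders for whatever message types your .proto defines:

    auto context = this->create_context(target_id);
    AppendEntriesRequest request;    // fill in term, prev log info, entries, ...
    AppendEntriesResponse response;  // placeholder message type names
    grpc::Status status =
        this->peers_[target_id]->AppendEntries(context.get(), request, &response);
    if (!status.ok()) {
        // Simulated network error: treat the message as lost and retry later.
        return;
    }
    // Otherwise, process `response` (e.g., update nextIndex/matchIndex).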
DO NOT use gRPC streaming in this series of labs.
We recommend doing this lab in the following steps:
You should have figured out how to send regular heartbeats to followers in lab 1. Now, you need to embed log entries in the heartbeat RPCs. With gRPC, this should be fairly straightforward.
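For instance, when the leader builds the request for a follower, it can attach every log entry starting at that follower's nextIndex; an empty entries list degenerates into a plain heartbeat. The field names below are placeholders for whatever your .proto defines:

    // Hypothetical proto fields; adapt to your own message definitions.
    AppendEntriesRequest request;
    request.set_term(this->current_term_);
    request.set_prev_log_index(next_index - 1);
    request.set_prev_log_term(this->term_at(next_index - 1));  // hypothetical helper
    for (uint64_t i = next_index; i <= this->logs_.size(); i++) {
        auto* e = request.add_entries();      // assuming a repeated entry field
        e->set_term(this->logs_[i - 1].term);
        e->set_index(i);
        e->set_data(this->logs_[i - 1].data);
    }
    request.set_leader_commit(this->commit_index_);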
In this part, you need to implement the AppendEntries RPC handler according to the paper's description. You also need to ensure the state transition is correctly handled when an outdated request is received.
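As a rough outline, the handler follows the receiver rules of Figure 2. The exact signature depends on your generated service class, and step_down / reset_election_timer are hypothetical helpers:

    grpc::Status Raft::AppendEntries(grpc::ServerContext* ctx,
                                     const AppendEntriesRequest* req,
                                     AppendEntriesResponse* resp) {
        std::lock_guard<std::mutex> lock(this->mtx);
        if (req->term() < this->current_term_) {
            resp->set_term(this->current_term_);
            resp->set_success(false);      // reject requests from stale leaders
            return grpc::Status::OK;
        }
        if (req->term() > this->current_term_) {
            this->step_down(req->term());  // adopt the newer term, become follower
        }
        this->reset_election_timer();
        // 1. Reject if the log has no entry matching (prev_log_index, prev_log_term).
        // 2. Delete any existing entry that conflicts with a new one (same
        //    index, different term) and everything that follows it.
        // 3. Append any new entries not already in the log.
        // 4. If leader_commit > commit_index_, set commit_index_ =
        //    min(leader_commit, index of last new entry) and apply.
        resp->set_term(this->current_term_);
        resp->set_success(true);
        return grpc::Status::OK;
    }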
After successfully replicating log entries to a majority, the leader should update its commit index and notify all the followers. In this part, make sure you invoke the provided apply function to signal the commitment of each corresponding log entry.
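On the leader side, one way to advance the commit index is to look for the highest index replicated on a majority, restricted to entries from the leader's current term (see Figure 2 and Section 5.4.2 of the paper). The match_index_ and cluster_size_ members are again hypothetical:

    // Call with this->mtx held, after a successful AppendEntries reply has
    // updated match_index_ for some follower.
    void Raft::maybe_advance_commit_index() {
        for (uint64_t n = this->logs_.size(); n > this->commit_index_; n--) {
            if (this->logs_[n - 1].term != this->current_term_) {
                break;  // only count entries from the current term (Section 5.4.2)
            }
            size_t replicas = 1;  // the leader itself holds the entry
            for (const auto& [id, match] : this->match_index_) {
                if (match >= n) replicas++;
            }
            if (replicas * 2 > this->cluster_size_) {
                this->commit_index_ = n;
                this->apply_committed_entries();  // sketched earlier
                break;
            }
        }
    }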
Consensus/agreement (the terms are interchangeable here) should complete fairly quickly when no network failure is present. Ideally, with no network failure, the raft cluster should reach consensus on a log entry within 4-5s. You can assume network latency is minimal (all processes are deployed on a single local machine).
You should not reach agreement when no majority quorum is available.
You should not send excessive RPCs or embed excessive data in the RPC calls. For example, with no network failure, one agreement process (from propose to commit) should involve, at most, 3 * (n - 1) RPCs, where n is the cluster size.
Similarly, under the same assumption as the previous bullet, such an operation should involve approximately 3 * (n - 1) * ds of data, where n is the cluster size and ds is the size of the proposed command.
We will be somewhat lenient about the RPC count and data size requirements, not quite as strict as described above. However, if you embed substantial data in each RPC or invoke unnecessary extra RPCs, you will fail some tests.
Your implementation should be free of data races. We will invoke the propose function concurrently, and your implementation should not break or become inconsistent under concurrent proposals.
Your implementation should be resilient to network failures. That is, agreement should eventually be reached once previously disconnected servers come back online.
Your implementation should be fast enough that lagging or diverged (e.g., previously partitioned) followers quickly catch up with the leader.
NOTE: you may need to implement the optimization described in the extended paper:
If desired, the protocol can be optimized to reduce the number of rejected AppendEntries RPCs. For example, when rejecting an AppendEntries request, the follower can include the term of the conflicting entry and the first index it stores for that term. With this information, the leader can decrement nextIndex to bypass all of the conflicting entries in that term; one AppendEntries RPC will be required for each term with conflicting entries, rather than one RPC per entry. In practice, we doubt this optimization is necessary, since failures happen infrequently and it is unlikely that there will be many inconsistent entries.
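A sketch of the leader-side reaction to such a rejection might look like this, assuming you add hypothetical conflict_term and conflict_index fields to your AppendEntries response (last_index_of_term is likewise a hypothetical helper over the leader's log):

    // Handle a rejected AppendEntries reply (fast nextIndex backup).
    if (!response.success()) {
        if (response.conflict_term() == 0) {
            // Follower's log is shorter than prev_log_index.
            this->next_index_[target_id] = response.conflict_index();
        } else if (uint64_t last = this->last_index_of_term(response.conflict_term());
                   last != 0) {
            // Leader also has entries of that term: retry just past them.
            this->next_index_[target_id] = last + 1;
        } else {
            // Leader has no entry of that term: skip the entire term.
            this->next_index_[target_id] = response.conflict_index();
        }
    }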
The tests for upcoming labs may fail your code if it runs too slowly. Try to write efficient code.
Do not write loops that execute continuously without pausing; doing so will slow your implementation enough to fail tests. Sleep the coroutine/thread when possible to avoid spinning and burning all available CPU.
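For example, a background loop can wait on a condition variable with a timeout instead of spinning. Here cv_ is a hypothetical std::condition_variable notified on new proposals, and stopped_ is a hypothetical shutdown flag (requires <condition_variable> and <chrono>):

    void Raft::replicator_loop() {
        std::unique_lock<std::mutex> lock(this->mtx);
        while (!this->stopped_) {
            // Wake on a new proposal or at the heartbeat interval,
            // whichever comes first; sleeps instead of burning CPU.
            this->cv_.wait_for(lock, std::chrono::milliseconds(50));
            if (this->stopped_) break;
            // ... send AppendEntries to followers as needed ...
        }
    }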
Be aware of outdated RPC replies. Due to (simulated) network failures, packets might be delayed. Handling delayed replies correctly is crucial to keep all server states consistent.
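A common defensive pattern is to record the term under which an RPC was sent and to discard the reply if the instance's state has changed in the meantime (again using the hypothetical current_term_ and is_leader_ members):

    uint64_t sent_term;
    {
        std::lock_guard<std::mutex> lock(this->mtx);
        sent_term = this->current_term_;  // term this RPC belongs to
    }
    // ... perform the RPC without holding the lock ...
    {
        std::lock_guard<std::mutex> lock(this->mtx);
        if (this->current_term_ != sent_term || !this->is_leader_) {
            return;  // outdated reply: state changed while the RPC was in flight
        }
        // ... safe to process the reply ...
    }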
In this lab, there are a total of 10 test cases. The skeleton code already provides 7 of them (their names end with B); the other 3 are hidden test cases.
To run the auto tester:
# proj root dir
cd ./build/integration_tests
./raft_test
This will run all available test cases. To run specific test cases, use ./raft_test --gtest_filter="*B", which runs all test cases whose names end with B (i.e., all test cases for lab 2).
We also updated the multinode app. With the new multinode app, you can also start a proposal (agreement); for example, you can run prop hello. Here is the updated command reference:
After starting up the node app, you will have five commands to control the raft instance. They are
r (run raft),
dis <id1{, id2}> (disconnect one or more nodes)
conn <id1{, id2}> (reconnect one or more nodes)
prop <data> (propose a command/data)
k (kill raft)
The new multinode app will also receive notifications of committed log entries from the raft nodes (as long as you invoke the apply function correctly). In the build/app/logs/multinode.log log file, you will find something like this after you propose something:
........
[2024-10-09 21:49:16.801] [multinode] [info] ApplyResult: id=2, index=1, data=hello
[2024-10-09 21:49:16.833] [multinode] [info] ApplyResult: id=1, index=1, data=hello
[2024-10-09 21:49:16.834] [multinode] [info] ApplyResult: id=0, index=1, data=hello
........
This tells you which node decided to commit the log entry, along with the associated data and log index.
Pre-grading is still available in this lab, so you can try your code against the hidden test cases. You will have 5 trials for this lab.
Pre-grading will be opened via an announcement.
Leave your latest modification and implementation on the "lab-2-solution" branch. Please do not push any commits or changes after the official deadline. The grader will check out the latest commit before the deadline for grading.
Make sure you do not do any of the following; otherwise, you will fail the labs, even if the code may pass the testing scripts.
Change/create files other than the files allowed.
Use shared variables between Raft instances.
Use file/network/IPC interface other than the gRPC interface.
Use external libraries other than googletest and spdlog.
Acknowledgment
This lab is inspired by MIT 6.824. The codebase has been redesigned as a C++ version.