First Biweekly Update

For this milestone, I looked specifically at how IPFS works, covering first its design and then its implementation.

On the design side, IPFS borrows much from its predecessors. Like BitTorrent, it decouples the name of a piece of content from its storage location; that name is the content identifier (CID). This decoupling allows content storage, delivery, and address management to all be decentralized.
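To make the idea of naming content by what it is rather than where it lives concrete, here is a minimal sketch. The function `toy_cid` is a hypothetical stand-in that uses a bare SHA-256 digest; a real CID also carries metadata such as which hash function was used, which this sketch omits.

```python
import hashlib


def toy_cid(content: bytes) -> str:
    """Hypothetical stand-in for a CID: the hex SHA-256 digest of the content."""
    return hashlib.sha256(content).hexdigest()


# The identifier depends only on the bytes, not on which peer stores them,
# so two unrelated peers derive the same name for the same content.
peer_a_store = {toy_cid(b"hello ipfs"): b"hello ipfs"}
peer_b_store = {toy_cid(b"hello ipfs"): b"hello ipfs"}
same_name = set(peer_a_store) == set(peer_b_store)
```

Because the name is derived from the content, any peer holding the bytes can serve a request for that name.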

For content indexing, IPFS uses a variant of the Kademlia distributed hash table (DHT). CIDs and peer IDs are both hashed with SHA-256, and the resulting 256-bit values serve as keys in the DHT, so content and peers share a single keyspace.
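The shared keyspace can be sketched as follows. Kademlia measures closeness with the XOR of two keys; the helper names (`dht_key`, `xor_distance`, `closest_peers`) and the byte-string identifiers are assumptions made for illustration, not the names used by any IPFS implementation.

```python
import hashlib


def dht_key(identifier: bytes) -> int:
    """Map a CID or peer ID (as raw bytes) to a 256-bit integer key."""
    return int.from_bytes(hashlib.sha256(identifier).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance metric: bitwise XOR of the two keys."""
    return a ^ b


def closest_peers(cid: bytes, peer_ids: list[bytes], k: int = 20) -> list[bytes]:
    """Return the k peer IDs whose keys are closest to the key of the CID."""
    target = dht_key(cid)
    return sorted(peer_ids, key=lambda p: xor_distance(dht_key(p), target))[:k]
```

Since CIDs and peer IDs land in the same keyspace, "the peers closest to a CID" is a well-defined set, which is what the index relies on.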

To support scalability, IPFS introduces a hierarchy among its peers. A peer first joins the system as a client node and asks its neighbors to connect back to it; if it ends up with three or more such connections, it is then considered a server node.
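The promotion rule can be written down as a small sketch. The threshold is taken from the description above, and the function and constant names are hypothetical.

```python
# Threshold as described in this update: three or more connections.
SERVER_THRESHOLD = 3


def dht_role(successful_connections: int) -> str:
    """Return the role a newly joined peer would take in this simplified model."""
    if successful_connections >= SERVER_THRESHOLD:
        return "server"
    return "client"
```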

Content is split into chunks, much like in BitTorrent. Each chunk is hashed and stored separately in a data structure called a Merkle directed acyclic graph (DAG), in which each node's hash covers the hashes of its children, so the root hash depends on every descendant. This makes content self-verifying: if any part of the content is changed, the recomputed root hash no longer matches the root CID.
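A minimal sketch of the self-verification property, assuming a two-level DAG (one root over a flat list of chunks) and a tiny chunk size chosen only so the example is readable. Real chunk sizes are much larger, and real DAGs can be deeper.

```python
import hashlib

CHUNK_SIZE = 4  # illustrative only; far smaller than any realistic chunk size


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def chunk(content: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split content into fixed-size chunks (the last one may be shorter)."""
    return [content[i:i + size] for i in range(0, len(content), size)]


def root_hash(content: bytes) -> bytes:
    """Hash each chunk, then hash the concatenated chunk hashes to get the root."""
    leaf_hashes = [h(c) for c in chunk(content)]
    return h(b"".join(leaf_hashes))


def verify(content: bytes, expected_root: bytes) -> bool:
    """Recompute the root from the received content and compare."""
    return root_hash(content) == expected_root
```

Changing a single byte in any chunk changes that chunk's hash, which in turn changes the root, so `verify` fails for tampered content.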

When a peer joins the system, it receives a unique peer ID. To reach its neighbors, a peer keeps a multiaddress for each one, which encodes the information needed to communicate with that neighbor across the various protocols of the internet protocol suite.
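A multiaddress is written as a path of protocol/value pairs, for example `/ip4/203.0.113.7/tcp/4001`. The sketch below splits such a string into pairs. It assumes every protocol component is followed by exactly one value, which is a simplification, and the peer ID in the usage example is a made-up placeholder.

```python
def parse_multiaddr(addr: str) -> list[tuple[str, str]]:
    """Split a textual multiaddress into (protocol, value) pairs.

    Simplifying assumption: each protocol takes exactly one value.
    """
    parts = addr.strip("/").split("/")
    if len(parts) % 2 != 0:
        raise ValueError(f"expected protocol/value pairs, got: {addr!r}")
    return list(zip(parts[0::2], parts[1::2]))


# Hypothetical address; "QmExamplePeer" is a placeholder, not a real peer ID.
pairs = parse_multiaddr("/ip4/203.0.113.7/tcp/4001/p2p/QmExamplePeer")
```

Each pair tells the dialing peer which protocol to use at that layer and with what parameter, which is how one string can describe a full connection path.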

On the implementation side, IPFS has two distinct operations: publication and retrieval.

For publication, the peer first assigns the piece of content its CID, generates a provider record, and attempts to store that record with the 20 peers closest to the CID. The peer then also publishes its peer record, which a retrieving peer later uses to map the provider's peer ID to a network address.
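The publication steps can be sketched with an in-memory model. Everything here is hypothetical scaffolding: the `Peer` class, the `publish` function, and the choice to store the peer record at the peers closest to the provider's own peer ID are assumptions for illustration, not a description of the actual implementation.

```python
import hashlib
from dataclasses import dataclass, field

K = 20  # number of peers that should hold each record, as described above


def dht_key(identifier: bytes) -> int:
    return int.from_bytes(hashlib.sha256(identifier).digest(), "big")


@dataclass
class Peer:
    peer_id: bytes
    provider_records: dict = field(default_factory=dict)  # cid -> set of provider peer IDs
    peer_records: dict = field(default_factory=dict)      # peer ID -> list of multiaddresses


def k_closest(key: bytes, network: list, k: int = K) -> list:
    """The k peers whose hashed IDs are XOR-closest to the hashed key."""
    target = dht_key(key)
    return sorted(network, key=lambda p: dht_key(p.peer_id) ^ target)[:k]


def publish(cid: bytes, provider: Peer, addrs: list, network: list) -> list:
    """Store a provider record near the CID and a peer record near the provider's ID."""
    holders = k_closest(cid, network)
    for peer in holders:
        peer.provider_records.setdefault(cid, set()).add(provider.peer_id)
    # Assumption: the peer record is stored with the peers closest to the provider's ID.
    for peer in k_closest(provider.peer_id, network):
        peer.peer_records[provider.peer_id] = addrs
    return holders
```

The provider record only says who has the content; the peer record is the separate piece that says how to reach that peer.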

For content retrieval, a peer first asks its neighbors for the CID and, as a fallback, performs a DHT lookup. Peers that receive the lookup request reply with the provider record if they hold it; otherwise, they direct the request toward peers whose peer IDs are closest to the CID of the content. Once the requesting peer has the peer ID of a peer hosting the content, it maps that ID to a network address using the peer record, and then connects to the hosting peer through its multiaddress.
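The retrieval order (neighbors first, DHT as fallback, then address resolution) can be sketched as below. The DHT walk itself is abstracted away as two dictionary lookups, and all names and data shapes are assumptions made for the example.

```python
from typing import Optional


def retrieve(
    cid: str,
    neighbours: list,
    dht_providers: dict,
    peer_records: dict,
) -> Optional[tuple]:
    """Return (source, provider peer ID, multiaddress or None), or None if not found."""
    # Step 1: ask directly connected neighbours whether they hold the content.
    for neighbour in neighbours:
        if cid in neighbour["blocks"]:
            return ("neighbour", neighbour["peer_id"], None)
    # Step 2: fall back to the DHT to find a provider record for the CID.
    provider_id = dht_providers.get(cid)
    if provider_id is None:
        return None
    # Step 3: resolve the provider's peer ID to a multiaddress via its peer record.
    return ("dht", provider_id, peer_records.get(provider_id))


# Hypothetical data for illustration only.
neighbours = [{"peer_id": "peer-A", "blocks": {"cid-1"}}]
dht_providers = {"cid-2": "peer-B"}
peer_records = {"peer-B": "/ip4/203.0.113.7/tcp/4001"}
```

With this data, `cid-1` is served by a neighbor without touching the DHT, while `cid-2` requires the lookup and the extra address-resolution step.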


References

[1] Trautwein et al. “Design and Evaluation of IPFS: A Storage Layer for the Decentralized Web”