Understanding DHT in BitTorrent: Protocols, Metadata Transfer, and Node Interaction

0. DHT Review

A recap of previous content:

BitTorrent is a protocol for file distribution. Its metadata (.torrent) files are encoded with bencode, and the content is divided into pieces, each verified by its SHA-1 hash. The earlier articles introduced the data structure of the metadata file, how peers exchange address information through a Tracker over HTTP, and how peers then communicate with each other directly.

In the Distributed Hash Table (DHT), each node has its own ID and routing table, and the peers downloading a given infohash can be looked up through KRPC queries in the DHT.

DHT: Reflection and Discussion on Previous Content

The fourth article in this series explained how the distributed hash table works. This article analyzes the extension protocol and the metadata transfer extension. Before starting, I want to discuss a few details that the specifications do not spell out. The answers below are judgments based on my own understanding, and other viewpoints are welcome.

This part covers the questions I had while reading the BEPs, together with my conclusions:

What is the connection between a downloader’s node ID (peer id) and the infohash of the content it is downloading?

There is no real connection. The infohash of the download is determined by the contents of the metadata file, while the node ID is chosen at random. They only look alike: both are 20-byte strings of the same size as a SHA-1 digest.

If there is no intrinsic link between node ID and infohash, how does one find nodes through infohash?

The infohash itself is the lookup key. Each DHT node can be seen as an enhanced version of a Tracker. Like a Tracker, it returns the IP address (or domain) and port of the peers downloading that infohash, but not their DHT node IDs. Because the lookup never needs the downloader's node ID, there is no conflict.

Can DHT obtain metadata files?

Obviously not, as repeatedly clarified in the text.

If metadata files cannot be retrieved, what is the role of DHT nodes?

As mentioned before, each node can be seen as a Tracker. When a node is queried with an infohash and knows peers for it, it returns those peers; otherwise it returns the nodes it knows that are closer to the infohash, so the search can continue with them.

As a downloader, to which nodes should announce_peer be sent?

When announcing that it is downloading a specific infohash, a peer generally sends announce_peer to the group of nodes closest to that infohash. The size of this group is usually the number of nodes held in the bucket corresponding to that infohash.

As a DHT node, if I know of more and closer nodes but still receive an announce_peer, how should it be handled?
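"Closest" here means smallest XOR distance between a node ID and the infohash. The following is a minimal sketch of selecting announce targets, not taken from any particular client; the function names and the tiny sample IDs are assumptions made for illustration, and the group size is simply passed in as k.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// xorDistance returns the byte-wise XOR of two 20-byte IDs. Distances are
// compared as big-endian unsigned integers.
func xorDistance(a, b [20]byte) [20]byte {
	var d [20]byte
	for i := range a {
		d[i] = a[i] ^ b[i]
	}
	return d
}

// closest returns up to k node IDs from known, ordered by increasing
// XOR distance to target (the infohash). The input slice is not modified.
func closest(target [20]byte, known [][20]byte, k int) [][20]byte {
	sorted := append([][20]byte(nil), known...)
	sort.Slice(sorted, func(i, j int) bool {
		di := xorDistance(sorted[i], target)
		dj := xorDistance(sorted[j], target)
		return bytes.Compare(di[:], dj[:]) < 0
	})
	if len(sorted) > k {
		sorted = sorted[:k]
	}
	return sorted
}

func main() {
	// Hypothetical IDs that differ only in their first byte.
	var target, n1, n2, n3 [20]byte
	target[0] = 0xda
	n1[0] = 0xdb // distance starts with 0x01
	n2[0] = 0x5a // distance starts with 0x80
	n3[0] = 0xd8 // distance starts with 0x02
	for _, id := range closest(target, [][20]byte{n2, n3, n1}, 2) {
		fmt.Printf("%x\n", id[:1])
	}
}
```

A real client would take the candidate IDs from the nodes returned by its get_peers lookups rather than from a fixed list.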

Even if other current node IDs are closer to the target infohash, announce_peer requests should still be processed and stored to help other users of the DHT network. The objective of the DHT protocol is to decentralize and share information, so even if a node is not the closest, it can still contribute to the entire network.

If a node sends announce_peer to nodes it believes are closest, but fails to discover many closer nodes, could this lead to failure in finding the node?

This is possible. However, it generally does not lead to the failure of finding the node because the DHT network is dynamic, with information continuously spreading and updating. In many DHT implementations, nodes selectively send messages to the closest nodes based on network and optimization strategies, which is an optimization measure and not a requirement of the protocol. It is important to note that this may increase network traffic and load.

2. DHT Extended Protocol

As previously mentioned, the DHT only provides node and peer information. It cannot transmit or exchange metadata, nor is it used to transfer files. Obtaining the metadata file from an infohash requires the metadata transfer extension specified in BEP 0009. This chapter discusses the extension protocol specified in BEP 0010, on which that extension is built; metadata exchange itself is covered in the next chapter.

The purpose of BEP 10 is to provide a simple, thin transport for extensions to BitTorrent without interfering with the base protocol.

BitTorrent Extensions

To advertise support for this extension protocol, a peer sets one of the reserved bits in the handshake message: bit 20 counting from the right, starting at 0. With the eight reserved bytes held in an array, the check is reserved[5] & 0x10.

When both parties in the handshake support this protocol, one new message is added:

  ID           Name
  0x14 (20)    extended
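As a sketch of how this bit fits into the 68-byte handshake from BEP 3, the following builds a handshake with the extension bit set and checks for it on a received one. The function names are hypothetical, and the all-zero infohash and peer ID in main are placeholders.

```go
package main

import "fmt"

const protocol = "BitTorrent protocol"

// buildHandshake assembles the handshake: length byte, protocol string,
// 8 reserved bytes, 20-byte infohash, 20-byte peer ID. It sets the
// extension protocol bit, reserved[5] & 0x10.
func buildHandshake(infoHash, peerID [20]byte) []byte {
	buf := make([]byte, 0, 68)
	buf = append(buf, byte(len(protocol)))
	buf = append(buf, protocol...)
	var reserved [8]byte
	reserved[5] |= 0x10
	buf = append(buf, reserved[:]...)
	buf = append(buf, infoHash[:]...)
	buf = append(buf, peerID[:]...)
	return buf
}

// supportsExtensions reports whether a received handshake has the
// extension protocol bit set.
func supportsExtensions(handshake []byte) bool {
	if len(handshake) < 68 {
		return false
	}
	reserved := handshake[20:28] // after 1 length byte and 19 protocol bytes
	return reserved[5]&0x10 != 0
}

func main() {
	var infoHash, peerID [20]byte // placeholders for illustration
	hs := buildHandshake(infoHash, peerID)
	fmt.Println(len(hs), supportsExtensions(hs))
}
```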

The extended message carries all of the extension protocol's functionality. It consists of:

  • 4-byte (uint32) message length, big-endian encoding
  • 1-byte message type, 20 or 0x14
  • 1-byte extended message ID; 0 indicates an extension handshake, and all other values are assigned by that handshake.
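The framing above can be sketched as follows. This is an illustrative encoder and decoder for one complete message already held in memory; the function names are assumptions, and a real client would read the length prefix from the socket first.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const msgExtended = 20 // 0x14

// frameExtended prepends the length prefix, the message type and the
// extended message ID to payload. The length counts everything after
// the 4-byte prefix.
func frameExtended(extID byte, payload []byte) []byte {
	buf := make([]byte, 4, 6+len(payload))
	binary.BigEndian.PutUint32(buf, uint32(2+len(payload)))
	buf = append(buf, msgExtended, extID)
	return append(buf, payload...)
}

// parseExtended validates one framed message and returns its extended
// message ID and payload.
func parseExtended(msg []byte) (extID byte, payload []byte, err error) {
	if len(msg) < 6 {
		return 0, nil, errors.New("message too short")
	}
	n := binary.BigEndian.Uint32(msg[:4])
	if uint64(n) != uint64(len(msg)-4) {
		return 0, nil, errors.New("length prefix does not match")
	}
	if msg[4] != msgExtended {
		return 0, nil, errors.New("not an extended message")
	}
	return msg[5], msg[6:], nil
}

func main() {
	// "de" is an empty bencoded dictionary, the smallest valid handshake payload.
	msg := frameExtended(0, []byte("de"))
	fmt.Printf("% x\n", msg)
	id, payload, err := parseExtended(msg)
	fmt.Println(id, string(payload), err)
}
```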

Extension Handshake

The payload of the handshake message is a bencoded dictionary. All of its keys are optional and case-sensitive, and unknown keys can be ignored. If this section is difficult to follow, the examples in the next chapter may help. The dictionary usually includes:

  • m: A dictionary of supported extensions, mapping each extension name to the message ID assigned to it. Setting the value to 0 indicates that the extension is not supported, and unknown names should be ignored.
  • p: Local TCP listening port. The side that accepted the connection does not need to send this key, because its port is already known.
  • v: Client name and version (UTF-8 encoded).
  • yourip: A string holding the compact representation of the IP address as seen from the other side, that is, the external IP address of the receiver (excluding port). This can be an IPv4 (4-byte) or IPv6 (16-byte) address.
  • ipv6: If the peer supports IPv6, the compact representation (16 bytes) of its address.
  • ipv4: If the peer supports IPv4, the compact representation (4 bytes) of its address.
  • reqq: The length of the client's request queue; requests beyond this number will be discarded. The default value in libtorrent is 250.

For instance, here is an example of a bencoded extended handshake payload:

d1:md11:LT_metadatai1e6:µT_PEXi2ee1:pi6881e1:v13:QCloud_rand 1e

The dictionary it encodes:

{
    "m": {
        "LT_metadata": 1,
        "µT_PEX": 2
    },
    "p": 6881,
    "v": "QCloud_rand 1"
}

BEP 10 itself does not define any concrete extension; each one is specified by its own BEP. The next chapter analyzes the metadata transfer extension as an example. BEP 0010 also includes a rationale section explaining why it does not use global message IDs, why extension names carry standardized prefixes, why a dictionary is used instead of an array, why identifiers are a single byte, and how constants are used. Interested readers can review it themselves.

Metadata Transfer Extension

Based on the above extension protocol, the metadata transfer extension allows clients to download metadata from peers, making magnet links possible.

In this process, metadata is processed in blocks of 16KiB (16384 bytes). The index of metadata blocks starts from 0. Except for the final block, which may be smaller, all blocks are 16KiB.
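The block arithmetic can be sketched as below, using the metadata_size value 31235 that appears in this article's examples. The helper names are assumptions for illustration.

```go
package main

import "fmt"

const blockSize = 16 * 1024 // 16 KiB

// pieceCount returns how many metadata blocks cover metadataSize bytes.
func pieceCount(metadataSize int) int {
	return (metadataSize + blockSize - 1) / blockSize
}

// pieceLen returns the length of the block with the given index,
// or 0 if the index is out of range. Only the last block may be short.
func pieceLen(metadataSize, piece int) int {
	if piece < 0 || piece >= pieceCount(metadataSize) {
		return 0
	}
	if rem := metadataSize - piece*blockSize; rem < blockSize {
		return rem
	}
	return blockSize
}

func main() {
	const size = 31235
	for i := 0; i < pieceCount(size); i++ {
		fmt.Printf("piece %d: %d bytes\n", i, pieceLen(size, i))
	}
}
```

For 31235 bytes this gives two blocks: a full 16384-byte block 0 and a 14851-byte block 1.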

Extension Header:

The metadata transfer extension adds ut_metadata to the "m" dictionary of the extension handshake message, and adds a metadata_size key to the handshake whose integer value is the size of the metadata in bytes. For example, here is a handshake message:

{
    "m": {
        "ut_metadata": 3
    },
    "metadata_size": 31235
}

The extension message has three types:

  • 0 Request
  • 1 Data
  • 2 Reject

Request

A request message asks for one piece of the metadata. Its dictionary contains only msg_type and piece.
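A request can be sketched as a complete packet as follows. The function name is hypothetical. The remoteID argument is an assumption about usage: it should be the ut_metadata value that the receiving peer listed in the "m" dictionary of its own extended handshake.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// metadataRequest builds an extended message asking for one metadata piece.
// The payload is the bencoded dictionary {"msg_type": 0, "piece": piece}.
func metadataRequest(remoteID byte, piece int) []byte {
	payload := fmt.Sprintf("d8:msg_typei0e5:piecei%dee", piece)
	pkt := make([]byte, 4, 6+len(payload))
	binary.BigEndian.PutUint32(pkt, uint32(2+len(payload)))
	pkt = append(pkt, 20, remoteID) // 20 = extended message type
	return append(pkt, payload...)
}

func main() {
	fmt.Printf("% X\n", metadataRequest(3, 0))
}
```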

Data

In a data message, an additional key total_size is added to the dictionary. It has the same meaning as metadata_size in the handshake.

The metadata piece follows the bencoded dictionary directly and is part of the same message, so the message length includes it.
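Because the raw piece is appended after the dictionary, a receiver has to find where the bencoded dictionary ends. The following is a minimal sketch that only skips over bencode structure without decoding it; the function names are assumptions, and the eight "x" bytes stand in for real piece data.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"strconv"
)

var errTruncated = errors.New("truncated bencode")

// skip returns the index just past the bencoded value that starts at b[i].
func skip(b []byte, i int) (int, error) {
	if i >= len(b) {
		return 0, errTruncated
	}
	switch c := b[i]; {
	case c == 'i': // integer: i<digits>e
		end := bytes.IndexByte(b[i:], 'e')
		if end < 0 {
			return 0, errTruncated
		}
		return i + end + 1, nil
	case c == 'l' || c == 'd': // list or dictionary: skip items until 'e'
		i++
		for i < len(b) && b[i] != 'e' {
			var err error
			if i, err = skip(b, i); err != nil {
				return 0, err
			}
		}
		if i >= len(b) {
			return 0, errTruncated
		}
		return i + 1, nil
	case c >= '0' && c <= '9': // string: <length>:<bytes>
		colon := bytes.IndexByte(b[i:], ':')
		if colon < 0 {
			return 0, errTruncated
		}
		n, err := strconv.Atoi(string(b[i : i+colon]))
		if err != nil {
			return 0, err
		}
		start := i + colon + 1
		if n > len(b)-start {
			return 0, errTruncated
		}
		return start + n, nil
	}
	return 0, errors.New("invalid bencode")
}

// splitData separates the payload of a data message into the bencoded
// dictionary and the raw metadata piece that follows it.
func splitData(payload []byte) (dict, piece []byte, err error) {
	if len(payload) == 0 || payload[0] != 'd' {
		return nil, nil, errors.New("payload does not start with a dictionary")
	}
	end, err := skip(payload, 0)
	if err != nil {
		return nil, nil, err
	}
	return payload[:end], payload[end:], nil
}

func main() {
	payload := []byte("d8:msg_typei1e5:piecei0e10:total_sizei34256eexxxxxxxx")
	dict, piece, err := splitData(payload)
	fmt.Println(string(dict), len(piece), err)
}
```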

Reject

A peer that does not have the requested piece responds with a reject message. For security reasons, for example as flood protection, a peer may also reject metadata requests from other connections.

3. Metadata Transfer Extension Example

Here again, please note: Frequently requesting DHT without contributing data to DHT is selfish and not recommended by the community.

The implementation of DHT is not the focus of this article, so an existing library is used: the Go-based DHT library (https://github.com/nictuku/dht) handles peer acquisition.

For demonstration purposes, the official release of ubuntu-22.04.3-live-server-amd64.iso is selected as an example, with its infohash being:

da1a0defb35d43a218fc7eb0fc8d4c6c44a3ed2d

The following program is used for node acquisition through DHT:

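package main

import (
	"fmt"
	"log"
	"os"
	"time"

	"github.com/nictuku/dht"
)

func main() {
	// Infohash of ubuntu-22.04.3-live-server-amd64.iso.
	ih, err := dht.DecodeInfoHash("da1a0defb35d43a218fc7eb0fc8d4c6c44a3ed2d")
	if err != nil {
		log.Fatalf("decode infohash: %v", err)
	}
	d, err := dht.New(nil)
	if err != nil {
		log.Fatalf("create DHT node: %v", err)
	}
	if err = d.Start(); err != nil {
		log.Fatalf("start DHT node: %v", err)
	}
	go drainresults(d)

	// Bootstrap from well-known routers.
	d.AddNode("router.bittorrent.com:6881")
	d.AddNode("dht.transmissionbt.com:6881")
	d.AddNode("router.utorrent.com:6881")

	// Repeat the peer query every 5 seconds; drainresults ends the process.
	for {
		d.PeersRequest(string(ih), false)
		time.Sleep(5 * time.Second)
	}
}

// drainresults prints the first 10 peer addresses received, then exits.
func drainresults(n *dht.DHT) {
	count := 0
	for r := range n.PeersRequestResults {
		for _, peers := range r {
			for _, x := range peers {
				fmt.Printf("%d: %v\n", count, dht.DecodePeerAddress(x))
				count++
				if count >= 10 {
					os.Exit(0)
				}
			}
		}
	}
}

This program starts a DHT node and sends a peer query for the infohash every 5 seconds until it has printed the first 10 peers. The results are as follows:

[Figure: DHT node query results]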

It is important to note that the DHT is an anonymous, publicly maintained environment that contains a lot of malicious data, so the peers it returns may be wrong. Check the connectivity of each peer individually, and contact the reachable ones with the Socket tool mentioned earlier. If these peers fail to connect or do not respond to requests, fetch more peers from the DHT. Then perform the handshake using the infohash, followed by the extended handshake, and send the metadata transfer request.

The handshake is the same as described earlier:

[Figure: Handshake information]

After receiving the handshake from the other party, check whether the extension bit is set. If it is, send an extended handshake packet. Because the packets here were sent manually, the peer's own extended handshake arrived almost immediately after ours was sent. With the message ID for ut_metadata determined from it, construct an extended handshake packet, like:

{
    "m": {
        "ut_metadata": 3
    },
    "p": 6881,
    "v": "QCloud_rand 1",
    "metadata_size": 31235
}

After encoding it and calculating the message length based on the content from earlier, construct the data packet:

[00 00 00 4D 14 00 64 31 3A 6D 64 31 31 3A 75 74 5F 6D 65 74 61 64 61 74 61 69 33 65 65 31 33 3A 6D 65 74 61 64 61 74 61 5F 73 69 7A 65 69 33 31 32 33 35 65 31 3A 70 69 36 38 38 31 65 31 3A 76 31 33 3A 51 43 6C 6F 75 64 5F 72 61 6E 64 20 31 65 ]
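The packet above can be reproduced with a small sketch. The bencode function below is a deliberately minimal encoder written for this example (it handles only ints, strings and string-keyed maps) and is not a library API; the names are assumptions. Keys are sorted, which is why metadata_size appears before p in the bytes.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"sort"
	"strconv"
)

// bencode encodes ints, strings and map[string]interface{} values.
// Any other type panics; this is enough for the handshake dictionary.
func bencode(v interface{}) []byte {
	switch x := v.(type) {
	case int:
		return []byte("i" + strconv.Itoa(x) + "e")
	case string:
		return []byte(strconv.Itoa(len(x)) + ":" + x)
	case map[string]interface{}:
		keys := make([]string, 0, len(x))
		for k := range x {
			keys = append(keys, k)
		}
		sort.Strings(keys) // bencoded dictionaries use sorted keys
		out := []byte{'d'}
		for _, k := range keys {
			out = append(out, bencode(k)...)
			out = append(out, bencode(x[k])...)
		}
		return append(out, 'e')
	}
	panic(fmt.Sprintf("unsupported type %T", v))
}

// extendedHandshake builds the full packet: length prefix, message type 20,
// extended message ID 0, then the bencoded dictionary.
func extendedHandshake() []byte {
	payload := bencode(map[string]interface{}{
		"m":             map[string]interface{}{"ut_metadata": 3},
		"metadata_size": 31235,
		"p":             6881,
		"v":             "QCloud_rand 1",
	})
	pkt := make([]byte, 4, 6+len(payload))
	binary.BigEndian.PutUint32(pkt, uint32(2+len(payload)))
	pkt = append(pkt, 20, 0)
	return append(pkt, payload...)
}

func main() {
	fmt.Printf("% X\n", extendedHandshake())
}
```

The length prefix comes out as 0x4D (77): 75 bytes of bencoded dictionary plus the two ID bytes.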

After completing the extended handshake, a metadata request can be sent, such as:

{
    "msg_type": 0,
    "piece": 0
}

Message type 0 is a request, which carries no data; here it asks for piece 0. Here is an example of a rejection:

[Figure: Peer rejects metadata request]

It is clearly seen that the reply message type is 2, which indicates rejection.

After switching to a different peer and sending the request again, the following successful response was received:

[Figure: Metadata request results]

The two requests marked by red boxes correspond to the responses of requests for pieces 2 and 0 of the data, respectively, with the start of the target metadata file marked by the blue box.

Repeating this cycle retrieves the entire metadata file. Once it is complete, its SHA-1 hash must be verified against the infohash to guard against transmission errors and malicious peers supplying incorrect data.
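The verification step can be sketched as follows. The function name is hypothetical, and main uses the string "abc" with its well-known SHA-1 digest as a stand-in for real metadata and a real infohash.

```go
package main

import (
	"bytes"
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// verifyMetadata reports whether the SHA-1 of the assembled metadata
// equals the hex-encoded infohash that was used to request it.
func verifyMetadata(metadata []byte, infoHashHex string) bool {
	want, err := hex.DecodeString(infoHashHex)
	if err != nil || len(want) != sha1.Size {
		return false
	}
	sum := sha1.Sum(metadata)
	return bytes.Equal(sum[:], want)
}

func main() {
	// Stand-in data: SHA-1("abc") is a9993e36...d89d.
	ok := verifyMetadata([]byte("abc"), "a9993e364706816aba3e25717850c26c9cd0d89d")
	fmt.Println(ok)
}
```

If the check fails, the assembled data should be discarded and requested again, preferably from a different peer.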

Magnet Links

The Magnet URI (Magnet Link) format is:

v1: magnet:?xt=urn:btih:<info-hash>&dn=<name>&tr=<tracker-url>&x.pe=<peer-address>
v2: magnet:?xt=urn:btmh:<tagged-info-hash>&dn=<name>&tr=<tracker-url>&x.pe=<peer-address>

So far this series has only analyzed the BitTorrent protocol proposed in BEP 3, not the newer version of the protocol, so v2 links are ignored for now and only v1 links are examined. With the background above, the v1 format should be easy to understand. Note only that, besides the 40-character hex form, clients should still support the 32-character base32-encoded infohash for compatibility.

For a link, only the infohash is mandatory, and all other parameters are optional.
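Extracting the infohash from a v1 link can be sketched like this, accepting both the 40-character hex form and the 32-character base32 form. The function name and the dn value in main are assumptions for illustration; the infohash is the one used in this article.

```go
package main

import (
	"encoding/base32"
	"encoding/hex"
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// infoHashFromMagnet returns the 20-byte v1 infohash from a magnet link.
func infoHashFromMagnet(link string) ([]byte, error) {
	u, err := url.Parse(link)
	if err != nil {
		return nil, err
	}
	if u.Scheme != "magnet" {
		return nil, errors.New("not a magnet link")
	}
	const prefix = "urn:btih:"
	for _, xt := range u.Query()["xt"] {
		if !strings.HasPrefix(xt, prefix) {
			continue // e.g. a v2 urn:btmh: entry
		}
		h := xt[len(prefix):]
		switch len(h) {
		case 40: // hex-encoded
			return hex.DecodeString(h)
		case 32: // base32-encoded, kept for compatibility
			return base32.StdEncoding.DecodeString(strings.ToUpper(h))
		}
	}
	return nil, errors.New("no v1 infohash found")
}

func main() {
	link := "magnet:?xt=urn:btih:da1a0defb35d43a218fc7eb0fc8d4c6c44a3ed2d&dn=example"
	h, err := infoHashFromMagnet(link)
	fmt.Printf("%x %v\n", h, err)
}
```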

4. Summary

This article first revisited points from the previous articles that could cause confusion, then examined the two extensions specified in BEP 10 and BEP 9, and finally walked through an example of querying the DHT and fetching metadata.

In practice, the DHT often returns a large amount of incorrect information, some of it obviously fake. While writing this article, I found several articles with similar content on the Chinese-language internet. Their programs all handle DHT requests the same way after joining the network: they return fabricated random data and frequently change their node IPs to avoid being blacklisted. Querying frequently, especially against the same node, can be considered impolite. Only querying without serving other nodes is undesirable behavior, and returning random erroneous data is a blunt disruption of the community, which is condemned and opposed here. (Also: for testing purposes, a node can mark itself as read-only to tell other nodes not to send it requests, without fabricating any responses.)

This article concludes here. Up to this point, the analysis of the specifications in BEP 3, 5, 9, and 10 has been completed. Subsequent articles will analyze more content, and links will be added here when they are available. Please stay tuned.
