0. Recap
Certainly! Here is the rewritten content:âPrevious articles: file distributionâ
- Introduction to BitTorrent Protocol (Part 1) Metadata Files https://cloud.tencent.com/developer/article/2332701
- Introduction to BitTorrent Protocol (Part 2) Tracker and Peers https://cloud.tencent.com/developer/article/2333043
- Introduction to BitTorrent Protocol (Part 3) Peer Data Transfer Example https://cloud.tencent.com/developer/article/2333677
Review of previous content:
BitTorrent is a protocol for distributing files, where metadata files use bencode encoding, fragments undergo SHA-1 hash verification, and the metadata file structure is introduced. Nodes exchange information via HTTP requests from the Tracker, with direct node-to-node communication.
The contents discussed in the previous three parts mainly refer to BEP 0003, the first version of the BitTorrent protocol. It is considered the final version, with little expected change. However, the content starting from this article might change over time or as the protocol evolves. A link to the protocol basis will be provided, and itâs recommended to consult the protocol during reading to accommodate content changes.
Before discussing fast exchange and new versions of the BitTorrent protocol, the Distributed Hash Table (DHT) is a crucial topic. While itâs not necessary for the BitTorrent protocol and it can function without it, some users might opt to disable it. Nevertheless, the role of distributed hash in the entire usage process is significant and cannot be overlooked.
1. Distributed Hash
As mentioned earlier, the Tracker plays a crucial role in BitTorrent data transmission. Downloaders send a GET request to the Tracker to announce themselves and obtain the addresses and ports of other participating downloaders. The Tracker requires a central server, which typically needs to support HTTP services, posing a significant instability. Some resources may lack a Tracker address, or the Tracker might be unreachable for various reasons, precluding the retrieval of other downloadersâ information. However, with the use of distributed hash in BitTorrent, this issue is resolved as each node can become a Tracker, exchanging data with others and jointly maintaining a vast network, facilitating the connection and transmission of these resources.
The aim of this article isnât to analyze or expound on the principles and implementation of distributed hash in detail, but rather to illuminate important parts. Some specifics and stipulations may be overlooked for ease of understanding. For those interested in the principles, refer to âKademlia: A Peer-to-peer Information System Based on the XOR Metricâ. Discussion of its history and development is beyond our scope.
Routing Table
Each node has a unique 160-bit (20-byte) node ID. Each node should have a routing table, including information on other nodes. Specifically, it should contain more routing information of nodes closer to itself, and a small amount of routing information of nodes further away.
Before proceeding, understand the concept of âdistance.â In implementation, this distance is unrelated to physical distance or network latency, but is relevant to its node ID. Distance is calculated using XOR operation, interpreted as an unsigned integer, with smaller values indicating closer proximity.
When maintaining the routing table, only retain information of nodes responding correctly within the last 15 minutes. Question nodes not responding for over 15 minutes by sending a response request to assess them (details will be elaborated later). If multiple consecutive requests fail to receive a response, the node connection is deemed poor, and these nodes should not remain in the routing table, with their weights reduced or abandoned.
The routing table should cover the entire node ID space, from 0 to 2160, segmented by buckets. Each bucket manages part of the space, with nodes inserted into the corresponding range bucket space. At initialization, only one bucket covers the entire space. Each bucket has a maximum capacity, currently 8 for the BEP0005 implementation. When a bucket contains 8 good nodes, itâs considered full.
When a bucket is full, and new nodes appear, assessment is required, with options to split or discard the bucket. After splitting, each new bucket manages half of the original range. Consider splitting or discarding from these aspects: if the nodeâs ID lies in the eligible bucket range, split the current bucket, while considering recent changes to the bucket and evaluating system performance and resource consumption comprehensively.
Consider a bucket changed under the following circumstances:
- Ping test and response on nodes within the bucket.
- Addition of a new node to the bucket.
- Replacement of a node in the bucket with another node.
If no changes occur to a bucket within 15 minutes, a ârefreshâ is warranted by selecting an ID randomly and conducting a âfind_nodesâ search. Nodes incapable of accepting queries from other nodes need to refresh all buckets periodically to maintain good connectivity.
In maintaining the routing table, nodes should attempt to find nodes closest to themselves by executing âfind_nodesâ towards increasingly closer nodes.
Metadata File Expansion
When a new node attempts to download BitTorrent data without a Tracker, it needs a metadata file. A tracker-less BitTorrent metadata file may exclude the Announce key, instead featuring a nodes key-value, containing a list of lists of domain names or IPs, alongside port numbers.
nodes = [["", ], ["", ], ...]nodes = [["127.0.0.1", 6881], ["your.router.node", 4804], ["2001:db8:100:0:d5c8:db3f:995e:c0f7", 1941]]
BitTorrent Extension Protocol
When nodes support DHT extensions, during the handshake, they will set the last bit of the last byte of reserved space to indicate support for this extension.
When a downloader receives this handshake information, the communication content increases:
Marker |
Description |
---|---|
0x09 |
PORT |
PORT information lets the counterpart know its DHT listening UDP port.
For an example based on previous articlesâ torrent information, Sockit listens on UDP port 9000 (0x2328) and sends handshake data supporting extensions alongside PORT information. Its UDP listening port receives DHT data from Transmission, a KRPC protocol ping query explained in later sections.
The sent PORT information is:
00 00 00 03 09 23 28
The message length is 3 bytes, with a type marker of 0x09; the valid content identifies UDP port 0x2328 (9000).
/>Sent handshake information />UDP received data
KRPC Protocol
The UDP received data described above is a KRPC ping query. KRPC is a simple remote procedure call (RPC) composed of encoded dictionaries transmitted via UDP, with encoding akin to BitTorrent metadata files, using Bencode encoding.
There are three message types: query, response, and error. For the DHT explored in this article, four queries exist: ping, find_node, get_peers, and announce_peer. Specifics will be analyzed in detail later.
The KRPC message type is a dictionary typically containing universal keys (t, y, v) and others. Taking the previous example:
{ "a": { "id": b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' }, "q": "ping", "t": b'pn\x00\x00', "y": "q" }
Each message includes a t, generated by the querying party, echoed in the response to indicate the corresponding query. y indicates the type, usually âqâ (query), ârâ (response), or âeâ (error). Each message with a client version string should include the âvâ key, but itâs not present in this request, and many implementations omit this key, so its presence shouldnât be assumed.
a and q are additional keys for the q query. q is a string containing the method name of the query. a is a dictionary including query parameters.
A y value of r or inclusion of the r key in the KRPC dictionary signifies a successful post-query response message, with the type as a dictionary.
A y value of e or inclusion of the e key signifies a failed post-query response message, with the type as a list.
For error responses, the first list element is the error code, and the second element is the error message string. Common examples:
Error Code |
Description |
---|---|
201 |
General Error |
202 |
Server Error |
203 |
Protocol Error |
204 |
Unknown Method |
For instance, d1:eli201e23:A Generic Error Occurred1:t2:aa1:y1:ee
indicates an error response, corresponding to
{ "t":"aa", "y":"e", "e":[201, "A Generic Error Occurred"]}
Now letâs examine the requests and respective responses:
All queries and responses contain an id key-value pair with the querying or responding nodeâs ID.
ping Request:
A ping request is the most basic, containing only the id parameter. Hereâs a request and response example:
// Requestd1:ad2:id20:abcdefghij0123456789e1:q4:ping1:t2:aa1:y1:qe// Returnd1:rd2:id20:mnopqrstuvwxyz123456e1:t2:aa1:y1:re
Corresponding to:
// Request{ "t":"aa", "y":"q", "q":"ping", "a":{"id":"abcdefghij0123456789"} }// Return{ "t":"aa", "y":"r", "r": {"id":"mnopqrstuvwxyz123456"} }
find_node Request:
The find_node request contains an additional target parameter, representing the target. Upon receiving a find_node request, a node should respond with a selection of nodes of maximum bucket capacity from its routing table, using compact IP address and port encoding (refer to Tracker section for compact encoding).
Hereâs a request-response pair example:
// Requestd1:ad2:id20:abcdefghij01234567896:target20:mnopqrstuvwxyz123456e1:q9:find_node1:t2:aa1:y1:qe// Returnd1:rd2:id20:0123456789abcdefghij5:nodes9:def456...e1:t2:aa1:y1:re
Corresponding to:
// Request{"t":"aa", "y":"q", "q":"find_node", "a": {"id":"abcdefghij0123456789", "target":"mnopqrstuvwxyz123456"}}// Return{"t":"aa", "y":"r", "r": {"id":"0123456789abcdefghij", "nodes": "def456..."}}
get_peers Request:
In the DHT protocol, a get_peers
request is for obtaining peer nodes associated with a specific torrentâs info hash (refer to metadata structure content). It additionally includes an info_hash parameter, indicating the infohash of the torrentâs peers to be found.
Upon receiving a get_peers
request, a node checks for peers tied to the specified info hash. If found, it returns a response with the values list. If not, it returns a response containing nodes.
Inclusion:
"token"
: a token for subsequentannounce_peer
requests."values"
: a list of peer node information strings, each string comprising compact peer address and port information."nodes"
: response compact encoded nodes of maximum bucket capacity, consistent with the previous description.
Example:
// Requestd1:ad2:id20:abcdefghij01234567899:info_hash20:mnopqrstuvwxyz123456e1:q9:get_peers1:t2:aa1:y1:qe// Response with resultsd1:rd2:id20:abcdefghij01234567895:token8:aoeusnth6:valuesl6:axje.u6:idhtnmee1:t2:aa1:y1:re// Response with closer nodesd1:rd2:id20:abcdefghij01234567895:nodes9:def456...5:token8:aoeusnthe1:t2:aa1:y1:re
Corresponding to:
// Request{ "t": "aa", "y": "q", "q": "get_peers", "a": { "id": "abcdefghij0123456789", "info_hash": "mnopqrstuvwxyz123456" }}// Response with results{ "t": "aa", "y": "r", "r": { "id": "abcdefghij0123456789", "token": "aoeusnth", "values": ["axje.u", "idhtnm"] }}// Response with closer nodes{ "t": "aa", "y": "r", "r": { "id": "abcdefghij0123456789", "token": "aoeusnth", "nodes": "def456..." }}
announce_peer Request:
The announce_peer
request announces that a node is downloading the content indicated by a specified info hash. This request includes:
"info_hash"
: the infohash of the torrent to announce downloading."port"
: the UDP port where the downloader is listening."token"
: the token obtained from a previousget_peers
response."implied_port"
: an optional parameter; if 0, use the âportâ parameter, otherwise use the incoming connection port, typically allowing NAT devices to receive responses correctly.
An announce_peer
request example:
d1:ad2:id20:abcdefghij01234567899:info_hash20:mnopqrstuvwxyz1234564:porti6881e5:token8:aoeusnthe1:q13:announce_peer1:t2:aa1:y1:qe
An equivalent JSON request example:
{ "t": "aa", "y": "q", "q": "announce_peer", "a": { "id": "abcdefghij0123456789", "info_hash": "mnopqrstuvwxyz123456", "port": 6881, "token": "aoeusnth", "implied_port": 0 }}
The corresponding response is a basic response comprising only the ID.
Read-Only DHT
Read-Only Distributed Hash (ReadOnly DHT) is introduced in BEP43. It includes a ro=1 key in the dictionary of each outgoing request, indicating its Read-Only status.
The Read-Only status is applicable to:
- Devices behind NAT attempting to penetrate, or for various reasons unable to be externally accessed;
- Devices where extra costs (like network traffic, battery loss) are a concern, especially mobile devices;
Distributed Hash Completed
At this point, weâve fully explored the distributed hash content (BEP5). You might notice that even after reading this portion, youâre still uncertain about obtaining a BitTorrent metadata file from an info hash, or acquiring a .torrent seed file. Correct, the DHT section doesnât offer metadata exchange. DHT aids in finding other peers downloading the content, but to acquire metadata through the info hash, implementation of BEP0010 extension protocol and BEP0009 metadata exchange extension is required, which will be elucidated in future articles. If updated, links will be placed here:
- Introduction to BitTorrent Protocol (Part 5) Extension Protocol and Metadata Transmission Extension
- Introduction to BitTorrent Protocol (Part 6) Peer Exchange, Local Service Discovery, Multi-Tracker, and Private Torrent
Once again, I urge readers of this article to adhere to community regulations and be good participants. Conducting DHT queries without providing any resources (especially when Read-Only status isnât marked), or repeatedly sending requests to nodes is considered impolite behavior.
Lastly, this article is participating in a writing activity advertisement: