Understanding the RTMP Protocol: A Deep Dive into the Live Streaming Handshake Process

In previous articles, we introduced setting up a live streaming environment based on RTMP. Next, let’s delve into the details of the RTMP protocol. Due to the protocol’s complexity, I will break it down into smaller modules through a series of articles distributed via this channel. You’re welcome to follow along. Let’s start with the RTMP protocol handshake.

RTMP stands for Real Time Messaging Protocol. This is an application layer protocol based on TCP. RTMP is designed for real-time communication, primarily used for video and data transmission between the Flash platform and streaming/interactive servers that support the RTMP protocol. It’s widely used in live streaming scenarios.

The process of establishing a connection with the RTMP protocol also involves a handshake. This article records the RTMP handshake process.

RTMP is based on TCP, which uses a three-way handshake (SYN, SYN-ACK, ACK) to establish a connection, as shown in the diagram:

[Figure: TCP three-way handshake]

Therefore, RTMP first has a three-way handshake at the TCP layer. After establishing the TCP connection, the protocol-level handshake of RTMP proceeds.

As a side note, packet capturing is often used to analyze and understand a protocol in depth. A common tool is Wireshark; on Linux, you can also use tcpdump or tshark (the command-line version of Wireshark). We will not elaborate on Wireshark specifics here. To study RTMP in the same way, capture the traffic of an RTMP session, which you can set up by referring to the previous articles on “live streaming”.

Simple Handshake Process

After the TCP connection is established, the client and server complete the RTMP handshake by exchanging three packets each. Unlike many other handshake protocols, RTMP exchanges fixed-size packets. The client sends C0, C1, and C2; the server sends S0, S1, and S2. C0 and S0 are 1 byte each, while C1, S1, C2, and S2 are 1536 bytes each.

Transmission Sequence

  • After connecting, the client starts the handshake by sending C0 and C1 to the server;
  • The server must wait for C0 before sending S0 and S1, and it may also wait for C1;
  • Once the client receives S1, it sends C2;
  • Once the server receives C1, it sends S2;
  • The handshake completes when the client has received S2 and the server has received C2. Neither side sends any other data before that point.

In practical applications, the client usually sends C0 and C1 together. Upon receiving C1, the server sends S0, S1, and S2 to the client. The client then sends C2 after receiving S1, completing the handshake.
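The practical sequence described above can be sketched in Python. This is a minimal illustration, not a production implementation: the function names (`client_handshake`, `server_handshake`, `run_loopback_demo`) are made up for this example, and C2/S2 are simplified to a verbatim echo of the peer's S1/C1. It runs both sides over a local socket pair, so no RTMP server is needed.

```python
import os
import socket
import struct
import threading
import time

RTMP_VERSION = 3
HANDSHAKE_SIZE = 1536  # size of C1, S1, C2 and S2 in bytes


def recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed during handshake")
        buf.extend(chunk)
    return bytes(buf)


def make_c1_or_s1():
    """4-byte timestamp, 4 zero bytes, then 1528 random bytes."""
    timestamp = int(time.monotonic() * 1000) & 0xFFFFFFFF
    return struct.pack(">II", timestamp, 0) + os.urandom(HANDSHAKE_SIZE - 8)


def client_handshake(sock):
    c1 = make_c1_or_s1()
    sock.sendall(bytes([RTMP_VERSION]) + c1)   # C0 and C1 sent together
    s0 = recv_exact(sock, 1)
    if s0[0] != RTMP_VERSION:
        raise ValueError(f"unexpected server version {s0[0]}")
    s1 = recv_exact(sock, HANDSHAKE_SIZE)
    sock.sendall(s1)                           # C2 (simplified: echo of S1)
    s2 = recv_exact(sock, HANDSHAKE_SIZE)
    return s2[8:] == c1[8:]                    # S2 should echo C1's random data


def server_handshake(sock):
    c0 = recv_exact(sock, 1)
    if c0[0] != RTMP_VERSION:
        raise ValueError(f"unsupported client version {c0[0]}")
    c1 = recv_exact(sock, HANDSHAKE_SIZE)
    s1 = make_c1_or_s1()
    # S0 + S1 + S2 in one write; S2 is simplified to an echo of C1.
    sock.sendall(bytes([RTMP_VERSION]) + s1 + c1)
    c2 = recv_exact(sock, HANDSHAKE_SIZE)
    return c2[8:] == s1[8:]                    # C2 should echo S1's random data


def run_loopback_demo():
    """Run both sides over a local socket pair; returns (client_ok, server_ok)."""
    client_sock, server_sock = socket.socketpair()
    result = {}
    worker = threading.Thread(
        target=lambda: result.update(server=server_handshake(server_sock)))
    worker.start()
    try:
        client_ok = client_handshake(client_sock)
        worker.join(timeout=5)
    finally:
        client_sock.close()
        server_sock.close()
    return client_ok, result.get("server")
```

Calling `run_loopback_demo()` exercises the whole exchange locally. Against a real server you would pass `client_handshake` a socket connected to the server instead.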

Handshake Packet Format

  • C0 and S0

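C0 and S0 packets are 1 byte and carry the RTMP version. The RTMP version is currently defined as 3. Values 0-2 were used by early proprietary products and are obsolete. Values 4-31 are reserved for future use, and 32-255 are not allowed, so that RTMP can be distinguished from text-based protocols, which always start with a printable character.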
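The version ranges above can be expressed as a small check on the single C0/S0 byte. This is only a sketch; the function name `classify_rtmp_version` and the returned labels are made up for this example.

```python
def classify_rtmp_version(value):
    """Classify the one-byte version carried in C0 or S0."""
    if not 0 <= value <= 255:
        raise ValueError("the version field is a single byte (0-255)")
    if value == 3:
        return "current"
    if value <= 2:
        return "obsolete"      # used by early proprietary products
    if value <= 31:
        return "reserved"
    return "not allowed"       # 32-255


# Example: the first byte of a captured handshake.
print(classify_rtmp_version(b"\x03"[0]))
```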

  • C1 and S1

C1 and S1 packets occupy 1536 bytes, containing a 4-byte timestamp, 4 bytes of zeros, and 1528 bytes of random numbers.
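As an illustration of this layout, here is a minimal sketch that builds and parses a C1/S1 packet with Python's `struct` module. The helper names `build_c1` and `parse_c1` are made up for this example, and the integer fields are packed big-endian (network byte order).

```python
import os
import struct

C1_SIZE = 1536
RANDOM_SIZE = C1_SIZE - 8  # 1528 bytes of random data


def build_c1(timestamp_ms):
    """4-byte timestamp + 4 zero bytes + 1528 random bytes."""
    header = struct.pack(">II", timestamp_ms & 0xFFFFFFFF, 0)
    return header + os.urandom(RANDOM_SIZE)


def parse_c1(packet):
    """Return (timestamp, zero_field, random_bytes) from a C1/S1 packet."""
    if len(packet) != C1_SIZE:
        raise ValueError(f"expected {C1_SIZE} bytes, got {len(packet)}")
    timestamp, zero_field = struct.unpack(">II", packet[:8])
    return timestamp, zero_field, packet[8:]


packet = build_c1(12345)
timestamp, zero_field, random_bytes = parse_c1(packet)
print(timestamp, zero_field, len(random_bytes))  # 12345 0 1528
```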

  • C2 and S2

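C2 and S2 packets are 1536 bytes. They contain a 4-byte echo of the peer’s timestamp (C2 carries S1’s timestamp, S2 carries C1’s timestamp), a 4-byte timestamp recording when the peer’s packet was read, and a 1528-byte echo of the random data from the peer’s packet.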
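To make the echo relationship concrete, here is a minimal sketch that builds a C2 from a received S1 and checks whether an S2 matches the C1 that was sent. It assumes the layout given in the public RTMP specification: the peer's timestamp, then the time at which the peer's packet was read, then an echo of the peer's 1528 random bytes. The helper names `build_c2` and `s2_matches_c1` are made up for this example.

```python
import struct

PACKET_SIZE = 1536


def build_c2(s1, read_time_ms):
    """Build C2: S1's timestamp, the time S1 was read, S1's random data."""
    if len(s1) != PACKET_SIZE:
        raise ValueError("S1 must be 1536 bytes")
    peer_timestamp = s1[:4]
    read_time = struct.pack(">I", read_time_ms & 0xFFFFFFFF)
    return peer_timestamp + read_time + s1[8:]


def s2_matches_c1(s2, c1):
    """True if S2 echoes C1's timestamp and random data."""
    if len(s2) != PACKET_SIZE or len(c1) != PACKET_SIZE:
        return False
    return s2[:4] == c1[:4] and s2[8:] == c1[8:]
```

The same `build_c2` logic applies on the server side for S2, with C1 in place of S1. Whether to reject a mismatching echo is an implementation choice; this sketch only reports it.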

Several Handshake States

  • Uninitialized: The protocol version is sent during this stage, and both client and server are uninitialized. The client sends the protocol version in the C0 packet. If the server supports that version, it responds with S0 and S1. If not, the server takes the appropriate action, which in RTMP is terminating the connection.
  • Version Sent: After the uninitialized state, both client and server enter the version sent state. The client waits for S1, and the server waits for C1. Upon receiving the respective packet, the client sends C2 and the server sends S2, and the state changes to ack sent.
  • Ack Sent: The client waits for S2, and the server waits for C2.
  • Handshake Complete: The client and server start exchanging information.
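The client-side view of these states can be sketched as a small state machine. The event names and the `advance` function are made up for this example, and the third state is named `ACK_SENT` here, following the name used in the public RTMP specification.

```python
from enum import Enum, auto


class HandshakeState(Enum):
    UNINITIALIZED = auto()
    VERSION_SENT = auto()
    ACK_SENT = auto()
    HANDSHAKE_COMPLETE = auto()


# Client-side transitions; the event names are illustrative only.
_TRANSITIONS = {
    (HandshakeState.UNINITIALIZED, "sent_c0_c1"): HandshakeState.VERSION_SENT,
    (HandshakeState.VERSION_SENT, "received_s1_sent_c2"): HandshakeState.ACK_SENT,
    (HandshakeState.ACK_SENT, "received_s2"): HandshakeState.HANDSHAKE_COMPLETE,
}


def advance(state, event):
    """Return the next state, or raise if the event is not valid in this state."""
    try:
        return _TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in state {state.name}")


state = HandshakeState.UNINITIALIZED
for event in ("sent_c0_c1", "received_s1_sent_c2", "received_s2"):
    state = advance(state, event)
print(state.name)  # HANDSHAKE_COMPLETE
```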

Next, we visualize a real RTMP handshake process using Wireshark:

Note that the Wireshark display filter for RTMP is rtmpt; don’t forget the trailing ‘t’.

Packet Capture Practicum

  • The source is the client (192.17.1.92). During the handshake, it first sends C0 and C1 packets to the server (192.17.1.200);
  • After the server receives the C0 and C1 packets, it immediately sends S0+S1+S2 packets;
  • Upon receiving the S2 packet from the server, the client sends the C2 packet, completing the RTMP handshake, and data exchange can begin.
  • Let’s look at the contents of C0 and C1 packets

  • S0+S1+S2 data

  • C2 data
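Because the server sends S0+S1+S2 back to back, the captured bytes can be split at fixed offsets (1, 1537, and 3073). The following sketch shows one way to do that for a byte string exported from a capture; the function name `split_server_handshake` is made up for this example.

```python
S0_SIZE = 1
S1_SIZE = 1536
S2_SIZE = 1536
SERVER_HANDSHAKE_SIZE = S0_SIZE + S1_SIZE + S2_SIZE  # 3073 bytes


def split_server_handshake(data):
    """Split the server's handshake bytes into (version, s1, s2)."""
    if len(data) < SERVER_HANDSHAKE_SIZE:
        raise ValueError(
            f"need at least {SERVER_HANDSHAKE_SIZE} bytes, got {len(data)}")
    version = data[0]                                  # S0
    s1 = data[S0_SIZE:S0_SIZE + S1_SIZE]               # bytes 1..1536
    s2 = data[S0_SIZE + S1_SIZE:SERVER_HANDSHAKE_SIZE] # bytes 1537..3072
    return version, s1, s2
```

Any bytes after offset 3073 already belong to the data exchange that follows the handshake, so the function ignores them.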

One more thing I almost forgot to mention: the default port for RTMP servers is 1935. This is also the port typically configured for RTMP in nginx servers, and it can be seen clearly in the packet capture.
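As a small illustration of the default port, here is a sketch that extracts the host and port from an RTMP URL and falls back to 1935 when no port is given. The function name `rtmp_host_port` and the URL paths used below are hypothetical.

```python
from urllib.parse import urlsplit

DEFAULT_RTMP_PORT = 1935


def rtmp_host_port(url):
    """Return (host, port) for an rtmp:// URL, defaulting the port to 1935."""
    parts = urlsplit(url)
    if parts.scheme != "rtmp":
        raise ValueError(f"not an rtmp:// URL: {url!r}")
    return parts.hostname, parts.port or DEFAULT_RTMP_PORT


# Hypothetical stream paths on the server address from the capture above.
print(rtmp_host_port("rtmp://192.17.1.200/live/test"))
print(rtmp_host_port("rtmp://192.17.1.200:1936/live/test"))
```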

Having reached this point, the process of RTMP protocol handshake is quite clear. The handshake primarily accomplishes two tasks: validating the RTMP version and sending some random data for network condition detection. A successful handshake means that normal network communication between the client and server can occur, and data exchange can proceed.

This concludes the current article. In the upcoming articles, we will explore how RTMP organizes data transmission over the network. We welcome you to join us in discovering this.