Understanding WebSocket Data Frames: A Deep Dive into Real-Time Communication

This article was written by IMWeb and Enchen and originally published in the IMWeb Community. Reproduction without permission is prohibited. The article covers WebSocket data frames.

WebSocket was developed to provide real-time, two-way communication between a client and a server. The WebSocket protocol runs on top of TCP. A connection begins with a special HTTP request (the opening handshake), sent over HTTP or HTTPS, that asks the server to upgrade the connection. Once the handshake succeeds, the same TCP connection stays open, and the client and server exchange data over it in real time.

Sending Data

All data in WebSocket is transmitted as frames. Frames sent by the client must be masked, whereas frames sent by the server must not be masked. If either side receives a frame that violates this rule, it must close the connection (for example, by sending a Close frame).
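Masking itself is a simple XOR: RFC 6455 transforms byte i of the payload by XORing it with byte i mod 4 of the 4-byte masking key, so applying the same operation a second time restores the original data. The sketch below (the helper name `mask_payload` is hypothetical) reproduces the masked "Hello" example from RFC 6455.

```python
def mask_payload(payload: bytes, masking_key: bytes) -> bytes:
    """XOR byte i of the payload with masking_key[i % 4].

    The same call also unmasks, because XOR is its own inverse.
    """
    if len(masking_key) != 4:
        raise ValueError("masking key must be exactly 4 bytes")
    return bytes(b ^ masking_key[i % 4] for i, b in enumerate(payload))


# Masking key and expected bytes taken from the RFC 6455 "Hello" example.
key = bytes([0x37, 0xFA, 0x21, 0x3D])
masked = mask_payload(b"Hello", key)
print(masked.hex(" "))              # 7f 9f 4d 51 58
print(mask_payload(masked, key))    # b'Hello'
```

A fixed key is used here only to make the output reproducible; a real client must choose a fresh, unpredictable key for every frame (for instance with `os.urandom(4)`).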

[Figure: WebSocket data frame structure]

FIN: Indicates whether this is the final fragment in a message, occupying 1 bit

RSV1, RSV2, RSV3: Reserved for extensions, each occupying 1 bit. They must be 0 unless a negotiated extension defines a meaning for them.

Opcode: Specifies the type of frame, occupying 4 bits

0x0: Indicates a continuation frame

0x1: Indicates a text frame

0x2: Indicates a binary frame

0x3-0x7: Reserved for future non-control frames

0x8: Indicates a connection close frame

0x9: Indicates a ping frame

0xA: Indicates a pong frame

0xB-0xF: Reserved for future control frames

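MASK: Occupies 1 bit, indicating whether the payload data is masked. If it is set to 1, a 4-byte masking key follows the payload length field(s), and the receiver uses it to unmask the payload.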

Payload length: Occupies 7 bits (extended by a further 16 or 64 bits when needed) and indicates the length of the payload data in bytes:

1. If the value is between 0 and 125, it is the actual payload length.

2. If the value is 126, the following 2 bytes hold the actual payload length as a 16-bit unsigned integer in network byte order (big-endian).

3. If the value is 127, the following 8 bytes hold the actual payload length as a 64-bit unsigned integer in network byte order (the most significant bit must be 0).
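The FIN, RSV, opcode, MASK, and payload length rules above can be sketched as a small header parser. This is an illustrative Python sketch, not a complete implementation: the names `Opcode`, `FrameHeader`, and `parse_header` are hypothetical, and parsing stops at the end of the length fields (a masking key, if present, starts at `offset`).

```python
import struct
from enum import IntEnum
from typing import NamedTuple


class Opcode(IntEnum):
    CONTINUATION = 0x0
    TEXT = 0x1
    BINARY = 0x2
    CLOSE = 0x8
    PING = 0x9
    PONG = 0xA


class FrameHeader(NamedTuple):
    fin: bool
    rsv1: int
    rsv2: int
    rsv3: int
    opcode: int
    masked: bool
    payload_length: int
    offset: int  # index of the first byte after the length fields


def parse_header(data: bytes) -> FrameHeader:
    if len(data) < 2:
        raise ValueError("a frame header is at least 2 bytes long")
    b0, b1 = data[0], data[1]
    fin = bool(b0 & 0x80)                       # bit 7 of byte 0
    rsv1, rsv2, rsv3 = (b0 >> 6) & 1, (b0 >> 5) & 1, (b0 >> 4) & 1
    opcode = b0 & 0x0F                          # low 4 bits of byte 0
    masked = bool(b1 & 0x80)                    # bit 7 of byte 1
    length = b1 & 0x7F                          # 7-bit payload length
    offset = 2
    if length == 126:                           # 16-bit big-endian length follows
        (length,) = struct.unpack_from("!H", data, offset)
        offset += 2
    elif length == 127:                         # 64-bit big-endian length follows
        (length,) = struct.unpack_from("!Q", data, offset)
        offset += 8
    return FrameHeader(fin, rsv1, rsv2, rsv3, opcode, masked, length, offset)


header = parse_header(bytes([0x81, 0x30]))
print(header.fin, Opcode(header.opcode).name, header.masked, header.payload_length)
# True TEXT False 48
```

`Opcode(header.opcode)` raises `ValueError` for the reserved opcodes, which a real receiver would treat as a protocol error.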

Example Analysis

Below is a data frame returned by the WebSocket server:

[Figure: data frame returned by the WebSocket server]

We can use Wireshark to capture the TCP packets and inspect the raw data.

The captured binary stream is shown in hexadecimal, and it unpacks as follows:

81 (hex) = 10000001 (binary) => FIN(1) + RSV1(0) + RSV2(0) + RSV3(0) + Opcode(0x1): a single, unfragmented text frame.

30 (hex) = 00110000 (binary) => MASK(0) + Payload length(0110000 = 0x30): the payload is unmasked and 48 bytes long.

The bytes from 3c through 6f are the text payload itself (48 bytes).
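Working in the other direction, the same two header bytes can be produced by an encoder. The sketch below (the name `encode_frame` is hypothetical) builds an unmasked, server-to-client frame with RSV1-3 left at 0. The article does not reproduce the 48 captured payload bytes, so a 48-byte placeholder string stands in for them, and only the header is compared with the capture.

```python
import struct


def encode_frame(payload: bytes, opcode: int = 0x1, fin: bool = True) -> bytes:
    """Build an unmasked frame, as a server would send it (RSV1-3 = 0)."""
    first = (0x80 if fin else 0x00) | (opcode & 0x0F)
    n = len(payload)
    if n <= 125:
        header = struct.pack("!BB", first, n)           # length fits in 7 bits
    elif n <= 0xFFFF:
        header = struct.pack("!BBH", first, 126, n)     # 16-bit extended length
    else:
        header = struct.pack("!BBQ", first, 127, n)     # 64-bit extended length
    return header + payload


# Hypothetical 48-byte placeholder; the captured payload is not reproduced here.
placeholder = ("0123456789abcdef" * 3).encode("ascii")
frame = encode_frame(placeholder)
print(frame[:2].hex(" "))   # 81 30
print(len(frame))           # 50
```

Because the MASK bit is never set, this encoder is only suitable for the server side; a client would also need to set MASK, append a masking key, and mask the payload.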