Analyzing the WebSocket Protocol with Wireshark: TCP Handshake and Data Transmission Insights

The previous article explained that WebSocket is an application-layer protocol built on top of TCP, and that its opening handshake is carried out over HTTP. Let’s use Wireshark, a powerful network packet capture tool, to observe how it works in practice.

Open Wireshark, enter websocket in the display filter bar to show only WebSocket traffic, then select one of the packets and follow its TCP stream (right-click, Follow, TCP Stream).

This isolates the packets exchanged when the browser opens the website, as shown in the figure below:

WebSocket protocol

Let’s specifically analyze what these packets are doing:

Connection Process

① First, let’s look at an old diagram – the TCP message format and handshake process O(∩_∩)O:

WebSocket protocol

As can be seen, the first step (the first three packets) is the TCP three-way handshake. Detailed information for each segment can be viewed by clicking on the packet and expanding its details; it will not be explained further here:

Corresponding explanation of the message format:
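To make the header layout concrete, here is a minimal Python sketch that decodes the fixed 20-byte part of a TCP header and lists the flags that are set, which is how the SYN, SYN+ACK and ACK packets of the handshake can be told apart. The function name parse_tcp_header and the sample port and sequence numbers are made up for illustration; they are not taken from the capture.

```python
import struct

# Flag bits in the low byte of the 16-bit "data offset + flags" field.
TCP_FLAGS = {0x01: "FIN", 0x02: "SYN", 0x04: "RST", 0x08: "PSH", 0x10: "ACK", 0x20: "URG"}


def parse_tcp_header(data: bytes) -> dict:
    """Decode the fixed 20-byte portion of a TCP header (options are ignored)."""
    if len(data) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urgent) = struct.unpack("!HHIIHHHH", data[:20])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        # The top 4 bits give the header length in 32-bit words.
        "header_len": (offset_flags >> 12) * 4,
        "flags": [name for bit, name in TCP_FLAGS.items() if offset_flags & bit],
        "window": window,
    }


# Hypothetical handshake packets: client port 50000 <-> server port 80.
syn = struct.pack("!HHIIHHHH", 50000, 80, 1000, 0, (5 << 12) | 0x02, 65535, 0, 0)
syn_ack = struct.pack("!HHIIHHHH", 80, 50000, 5000, 1001, (5 << 12) | 0x12, 65535, 0, 0)

print(parse_tcp_header(syn)["flags"])
print(parse_tcp_header(syn_ack)["flags"])
```

The same fields appear in Wireshark's "Transmission Control Protocol" section of the packet details, so the sketch can be used to cross-check what the tool displays.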

② The fourth packet is an HTTP request. Its packet details show that the client is asking to upgrade the connection to the WebSocket protocol:

PS: The correspondence between the detailed information in Wireshark packets and the OSI seven-layer model is as follows:

③ The fifth packet is again plain TCP. From the IP addresses, it can be seen that it is sent from the server to the client. It is an ACK, confirming that the previous packet was received (ACK packets are sent continuously for confirmation throughout the exchange). The sixth packet is the server’s HTTP response (101 Switching Protocols), which switches the connection from HTTP to the WS protocol:

Open the packet details of the sixth packet, as follows:
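One detail worth checking in the fourth and sixth packets is the pair of headers Sec-WebSocket-Key (request) and Sec-WebSocket-Accept (response). RFC 6455 defines the accept value as the Base64-encoded SHA-1 hash of the key concatenated with a fixed GUID. The sketch below reproduces that calculation; the function name compute_accept is an illustrative choice, and the sample key is the one used in the RFC rather than a value from this capture.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key: str) -> str:
    """Return the Sec-WebSocket-Accept value a server should send for a given key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key from RFC 6455, section 1.3.
print(compute_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

Pasting the Sec-WebSocket-Key from the captured request into compute_accept should give the Sec-WebSocket-Accept shown in the captured response, which confirms that the two packets belong to the same handshake.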

④ From here on, the connection has switched to the WS protocol, and the subsequent packets carry data as WebSocket frames. Check the detailed information:

PS: A frame whose Mask flag is set must be unmasked with its Masking-Key before the line-based text data can be read as plain text; otherwise it appears as garbled text, as shown above. Under RFC 6455, frames sent by the client are masked and frames sent by the server are not. For example, the ninth packet has Mask: False, so its transmitted data can be read directly:
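The unmasking itself is a simple XOR: each payload byte is XORed with one of the four Masking-Key bytes, cycling through the key. The following Python sketch shows this for short frames only (payload length up to 125 bytes, no extended length fields). The function names are illustrative, and the sample bytes are the "Hello" frames from RFC 6455, not data from this capture.

```python
def unmask(payload: bytes, masking_key: bytes) -> bytes:
    """XOR every payload byte with the masking key byte at position i mod 4."""
    if len(masking_key) != 4:
        raise ValueError("the masking key is always 4 bytes")
    return bytes(b ^ masking_key[i % 4] for i, b in enumerate(payload))


def parse_small_frame(frame: bytes):
    """Parse a frame with a 7-bit length; returns (fin, opcode, masked, payload)."""
    fin = bool(frame[0] & 0x80)       # FIN bit: last fragment of a message
    opcode = frame[0] & 0x0F          # 0x1 = text, 0x8 = close, 0x9 = ping, 0xA = pong
    masked = bool(frame[1] & 0x80)    # MASK bit
    length = frame[1] & 0x7F
    if length > 125:
        raise ValueError("extended payload lengths are not handled in this sketch")
    offset = 2
    if masked:
        key = frame[offset:offset + 4]
        offset += 4
        payload = unmask(frame[offset:offset + length], key)
    else:
        payload = frame[offset:offset + length]
    return fin, opcode, masked, payload


# Masked and unmasked text frames carrying "Hello" (examples from RFC 6455).
print(parse_small_frame(bytes.fromhex("818537fa213d7f9f4d5158")))
print(parse_small_frame(bytes.fromhex("810548656c6c6f")))
```

Both calls return the same payload, which mirrors what the capture shows: the masked frame looks garbled until the key is applied, while the frame with Mask: False is readable as is.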

PS: When observing the WS traffic of the stock index game, a similar pattern appears: the client sends only 1 and the server replies with 2. This is a heartbeat, exchanged so that each side knows the other is still alive.

Closing Process

① The client initiates the close (here by closing the browser; the server can also initiate a close toward the client)

Once a Close control frame is sent or received, it means that the _WebSocket closing handshake has been initiated_, and the WebSocket connection is in the CLOSING state. A Close control frame can contain a status code indicating the reason for closing. For example, 1000 indicates a normal closure. Refer to the article for specific error codes:
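As a rough illustration of how the status code is carried, the payload of a Close frame, when present, starts with a 2-byte big-endian status code followed by an optional UTF-8 reason string. The sketch below decodes such a payload; parse_close_payload is a hypothetical helper name, and the table lists only a few of the codes defined in RFC 6455.

```python
import struct

# A few of the status codes defined in RFC 6455 (not a complete list).
CLOSE_CODES = {
    1000: "normal closure",
    1001: "going away",
    1002: "protocol error",
}


def parse_close_payload(payload: bytes):
    """Return (status_code, reason) for an already unmasked Close frame payload."""
    if len(payload) == 0:
        return None, ""               # a Close frame may carry no body at all
    if len(payload) == 1:
        raise ValueError("a Close payload cannot be a single byte")
    (code,) = struct.unpack("!H", payload[:2])
    reason = payload[2:].decode("utf-8")
    return code, reason


# Hypothetical payload: status code 1000 with the reason "bye".
code, reason = parse_close_payload(struct.pack("!H", 1000) + b"bye")
print(code, CLOSE_CODES.get(code, "unknown"), reason)
```

Because the client's Close frame is masked, the payload has to be unmasked with the frame's Masking-Key before it is passed to a function like this.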

Note that the ACK packets here indicate that the server has received the client’s request. Why are there three TCP reply packets of the same length? Wireshark labels them “TCP segment of a reassembled PDU”. Roughly, when a host answers a query or command with more data than fits in one TCP segment (that is, more than the MSS, the maximum segment size), it sends the data as several packets (note: this is TCP segmentation, not IP fragmentation). Wireshark marks the packets that belong to the same response as “TCP segment of a reassembled PDU”; this does not affect our understanding of the closing process.
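The segmentation described above can be pictured with a small Python sketch that cuts a payload into pieces no larger than the MSS. This only illustrates the idea and is not how a real TCP stack is implemented; the function name, the 4000-byte payload and the MSS of 1460 bytes are assumed values, not numbers from the capture.

```python
from typing import List


def split_into_segments(data: bytes, mss: int) -> List[bytes]:
    """Cut application data into chunks of at most mss bytes each."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [data[i:i + mss] for i in range(0, len(data), mss)]


# Assumed values: a 4000-byte response and an MSS of 1460 bytes.
segments = split_into_segments(b"x" * 4000, 1460)
print([len(s) for s in segments])  # [1460, 1460, 1080]
```

Wireshark performs the reverse operation: it collects such segments and reassembles them into one higher-level message, labeling the intermediate pieces “TCP segment of a reassembled PDU”.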

The last four steps are the TCP four-way handshake that tears down the connection. You can look at an old diagram again O(∩_∩)O: