Recently, a reader asked about a peculiar issue. He wanted to capture packets from baidu.com and try his hand at packet inspection.
But he could not find those packets in his capture, which seemed quite strange.
Let me recreate his steps for you.
First, using the ping command, he identified the IP address being requested when accessing Baidu.
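As a rough sketch, the address can also be pulled out of the first line that ping prints. This assumes that line has the usual form "PING baidu.com (39.156.66.10) ..."; the helper name resolve_with_ping is made up for illustration.

```shell
# resolve_with_ping HOST: print the IPv4 address that ping reports for HOST.
# Assumes the first output line looks like "PING host (a.b.c.d) ...".
resolve_with_ping() {
    ping -c 1 "$1" | sed -n '1s/^[^(]*(\([0-9.]*\)).*/\1/p'
}

# Usage: resolve_with_ping baidu.com
```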
The result shows that accessing baidu.com means talking to 39.156.66.10.
He then used tcpdump to capture packets on the eth0 interface, saving everything sent to or from the IP 39.156.66.10 to the baidu.pcap file.
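A command matching this description might look like the sketch below. The wrapper name capture_host is hypothetical; the tcpdump options used are -i (interface), the host filter, and -w (write to a file). Capturing usually requires root privileges.

```shell
# capture_host IFACE IP FILE: save all packets to or from IP seen on IFACE into FILE.
capture_host() {
    tcpdump -i "$1" host "$2" -w "$3"
}

# The capture described above (stop it with Ctrl-C when done):
#   capture_host eth0 39.156.66.10 baidu.pcap
# which runs: tcpdump -i eth0 host 39.156.66.10 -w baidu.pcap
```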
At this point, he opened the baidu.com webpage in a browser, or simulated it with the curl command in another terminal window.
Logically speaking, the data packets from accessing baidu.com should have been captured.
Then, the packet capture was stopped.
Next, he opened the baidu.pcap file with Wireshark and entered http.host == "baidu.com" into the filter bar.
He found nothing.
Searching for Baidu packets in Wireshark yielded no results
Why is that?
At this point, experienced individuals might already know where the issue lies.
Why the Packet Was Not Captured
This is because he was accessing baidu.com over HTTPS, so the Host header and the actual request body of the HTTP message are encrypted.
Since they are encrypted, filtering on http.host is not possible.
However, even though the traffic is encrypted, it can still be filtered.
During the Client Hello stage of the HTTPS handshake, there is an extension called server_name that records which website you're trying to access. Filtering on this extension brings up the relevant packets.
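In current Wireshark versions the field for this extension is tls.handshake.extensions_server_name, so a display filter along the following lines should work in the filter bar. The same filter can be tried from the command line with tshark, Wireshark's terminal counterpart; the variable and function names below are only illustrative.

```shell
# Display filter matching Client Hello packets whose server_name is baidu.com.
SNI_FILTER='tls.handshake.extensions_server_name == "baidu.com"'

# list_client_hellos FILE: print the packets in FILE that match the filter.
list_client_hellos() {
    tshark -r "$1" -Y "$SNI_FILTER"
}

# Usage: list_client_hellos baidu.pcap
```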
You can search for Baidu packets using the server_name extension in TLS
Select one of the packets, right-click, and choose Follow TCP Stream.
Locate the TCP stream via right-click
All other related packets of this TCP connection can be displayed.
HTTPS packet capture
The screenshots show that the process includes a complete TCP handshake and a TLS encrypted handshake process, followed by two segments of encrypted information and a TCP teardown process.
It can be seen from packet 18 and packet 20 that one is a request packet from port 56028 to 443, and the other is a response packet from 443 back to 56028.
Generally, large, random-looking numbers like 56028 are port numbers picked at random by the client.
Port 443 is the server port number for HTTPS.
Plain HTTP uses port 80. Since this traffic is HTTPS, capturing on port 80 would catch nothing either.
We can roughly conclude that packets 18 and 20 are the client's request to baidu.com and the server's response, respectively.
Inspecting them shows that the URL and body are encrypted, so nothing readable can be found.
So the question arises. Is there a way to decrypt the data inside?
Yes, there is. Let's see how it's done.
Decrypting the Packet
First, execute tcpdump to capture packets again.
Then, in another terminal window, set the SSLKEYLOGFILE environment variable so that the encryption keys are exported, specifying the export path as /Users/xiaobaidebug/ssl.key.
Then continue executing the curl command or open the Chrome browser from the command line within the same terminal window. The aim is for curl or Chrome to inherit this environment variable.
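Put together, these two steps might look like the sketch below. SSLKEYLOGFILE and the path come from the text; the function name fetch_https is made up, and the Chrome path in the comment is the usual macOS location, which may differ on your machine.

```shell
# Ask TLS libraries that honor SSLKEYLOGFILE to log their session secrets here.
export SSLKEYLOGFILE=/Users/xiaobaidebug/ssl.key

# fetch_https HOST: request https://HOST with curl, discarding the response body.
# curl is started from this shell, so it inherits SSLKEYLOGFILE.
fetch_https() {
    curl -s -o /dev/null "https://$1"
}

# Usage, in the same terminal window:
#   fetch_https baidu.com
# or start Chrome from this shell instead (typical macOS path):
#   "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
```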
At this point, an ssl.key file will appear under /Users/xiaobaidebug/.
Next, follow the steps below to modify the Wireshark settings.
Open the Wireshark settings
After clicking "Protocols," scroll down to find the TLS option.
Locate "Protocols" in the settings
Enter the path to the exported ssl.key file here, in the (Pre)-Master-Secret log filename field.
Find the TLS section under Protocols
After clicking OK, you'll notice that packets 18 and 20 have been decrypted.
Content of the decrypted packet
You can then use http.host == "baidu.com" to filter out the data.
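The same decryption can be sketched on the command line with tshark. The -o option overrides a preference for a single run, and tls.keylog_file is the preference behind the key log field in the TLS settings; the function name is hypothetical.

```shell
# decrypt_and_filter PCAP KEYLOG: decrypt PCAP using KEYLOG, then show the
# HTTP requests whose Host header is baidu.com.
decrypt_and_filter() {
    tshark -r "$1" -o "tls.keylog_file:$2" -Y 'http.host == "baidu.com"'
}

# Usage: decrypt_and_filter baidu.pcap /Users/xiaobaidebug/ssl.key
```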
The decrypted packet allows filtering of Baidu's data packets
At this point, the issue of not being able to view the data packet is resolved.
However, a new question arises.
What exactly is the ssl.key file?
This involves understanding the encryption principle of HTTPS.
The HTTPS Handshake Process
The HTTPS handshake process is quite complex, so let's review it.
First, a TCP connection is established, since HTTPS is an application-layer protocol built on TCP.
Once the TCP connection has been established, the HTTPS stage can begin.
HTTPS encrypts with TLS (or its predecessor, SSL); the walkthrough below uses TLS 1.2 as the example.
In general, the entire encryption process is divided into two stages.
The first stage is the four-part TLS handshake, which mainly uses the properties of asymmetric encryption to exchange various pieces of information and eventually obtain a "session key."
The second stage is symmetric encryption communication based on the session key from the first stage.
The four-part TLS handshake
Let's start by looking at what the four-part TLS handshake consists of.
The First Handshake:
- Client Hello: The client tells the server which protocol versions it supports, such as TLS 1.2, and which cipher suites it supports, such as the common RSA-based ones, and provides a client random number.
The Second Handshake:
- Server Hello: The server replies to the client with a server random number, the server certificate, and the confirmed protocol version (e.g., TLS 1.2).
The Third Handshake:
- Client Key Exchange: The client generates a random number called the pre_master_key. Using the server's public key, obtained from the server certificate in the second handshake, it encrypts the pre_master_key and sends it to the server.
- Change Cipher Spec: The client now has three random numbers: the client random number, the server random number, and the pre_master_key. These three random numbers are used to calculate a session key. The client informs the server that subsequent communications will be encrypted using this session key.
- Encrypted Handshake Message: The client produces a hash of all communication data thus far, encrypts it using the session key, and sends it to the server for verification. The client's part of the handshake concludes here, which is why this is also called a Finished message.
The Fourth Handshake:
- Change Cipher Spec: Upon receiving the pre_master_key from the client (it was encrypted with the server's public key, but the server can decrypt it with its private key), the server combines the three random numbers in the same way to generate the session key. The server tells the client that communications will now be encrypted using this session key.
- Encrypted Handshake Message: Like the client, the server creates a hash of the communication data thus far, encrypts it with the session key, and sends it to the client for verification. This completes the handshake, so it is also termed a Finished message.
By the end of the four-part handshake, both the client and the server hold three random numbers. They are very important, so let's go over them once more.
The client's random number, generated during the first handshake, is called the client random.
In the second handshake, the server generates its own random number, the server random.
In the third handshake, the client generates another random number, the pre_master_key.
Together, these three random numbers produce the final symmetric encryption key, the "session key" mentioned earlier.
Three random numbers generate a symmetric key
Simply put, if you know these three random numbers, you can decrypt HTTPS communications.
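For TLS 1.2 this can be written down concretely. In the notation of RFC 5246, where PRF is the protocol's pseudorandom function and the double bar means concatenation, the pre_master_key and the two randoms first yield a 48-byte master secret, and the master secret and the two randoms then yield the key material for the symmetric cipher (roughly what this article calls the session key):

```latex
\begin{aligned}
\text{master\_secret} &= \mathrm{PRF}\bigl(\text{pre\_master\_key},\ \text{"master secret"},\ \text{client\_random} \parallel \text{server\_random}\bigr)[0..47] \\
\text{key\_block} &= \mathrm{PRF}\bigl(\text{master\_secret},\ \text{"key expansion"},\ \text{server\_random} \parallel \text{client\_random}\bigr)
\end{aligned}
```

Note that the two randoms appear in opposite order in the two steps.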
However, of these three random numbers, the client random and the server random are sent in plaintext and can be read by anyone. The pre_master_key is different: it is encrypted with the server's public key, so it is known only to the client and to anyone holding the corresponding server private key.
So the question becomes, how does one obtain this pre_master_key?
How to Get the pre_master_key
Since the server's private key is not easily accessible, the question is whether there's a way to obtain the pre_master_key from the client side.
There is a way.
When the client exchanges data with the server over HTTPS, it first establishes a TCP connection and then triggers the TLS handshake by calling into the client-side TLS library (such as OpenSSL or NSS).
With the environment variable SSLKEYLOGFILE set, you can influence the TLS library's behavior, causing it to write out a file containing the pre_master_key. This is the /Users/xiaobaidebug/ssl.key file mentioned above.
Inject the environment variable into curl and Chrome
Although TLS libraries support exporting a key file, the application itself must support triggering that export via the SSLKEYLOGFILE environment variable. In practice, not all applications do. However, popular tools like curl and the Chrome browser support it.
Content of the SSLKEYLOGFILE
Let's circle back to what's inside the ssl.key file.
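Each line in this file has three columns, separated by spaces.
The first column is the label CLIENT_RANDOM, signifying that the second column is the client random. The third column is the secret for that connection. Strictly speaking, in the key log format a CLIENT_RANDOM line stores the 48-byte master secret that TLS 1.2 derives from the pre_master_key and the two randoms, but for simplicity this article keeps calling it the pre_master_key.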
But again, a question arises.
With so many lines, how does Wireshark know which line's pre_master_key to use?
Wireshark can extract the client random from the data packet.
For instance, as shown below.
Client random in Client Hello
Observe that the client random number above ends with "bff63bbe5".
Similarly, you can find the server random in the data packet.
Locating the server random
Take this client random and compare it against the second column of the ssl.key file, line by line.
You will find the corresponding record.
Data in the ssl.key
Note how the string in the second column also ends with "bff63bbe5". This is the client random we identified earlier.
Take the third column from this line, and that's your pre_master_key.
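The line-by-line matching can be imitated with a few lines of shell. This only illustrates the lookup and is not Wireshark's implementation; the function name and the short hex strings in the demo are made up (real client randoms are 64 hex characters long).

```shell
# lookup_secret KEYLOG CLIENT_RANDOM: print the third column of the
# CLIENT_RANDOM line whose second column equals CLIENT_RANDOM (case-insensitive).
lookup_secret() {
    awk -v cr="$2" \
        '$1 == "CLIENT_RANDOM" && tolower($2) == tolower(cr) { print $3; exit }' "$1"
}

# Demo with made-up values in a temporary file.
demo=$(mktemp)
printf '%s\n' 'CLIENT_RANDOM 1111aaaa 2222bbbb' \
              'CLIENT_RANDOM 3333cccc 4444dddd' > "$demo"
lookup_secret "$demo" 3333cccc    # prints 4444dddd
rm -f "$demo"

# Real usage: lookup_secret /Users/xiaobaidebug/ssl.key <client random from the Client Hello>
```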
This gives Wireshark all three random numbers, so it can calculate the session key and decrypt the data.
Conversely, Wireshark cannot locate the correct line in the ssl.key file without the client random. The client random appears only in the first handshake message (Client Hello). Thus, to decrypt HTTPS packets, your capture must include the complete four-part TLS handshake. If you start capturing after the connection is already established and data is being exchanged, those packets cannot be decrypted.
Conclusion
- The article begins with capturing Baidu's data packets, demonstrating the basic steps of capturing with tcpdump and inspecting with Wireshark.
- HTTPS encrypts both the URL and the request body of HTTP, so filtering with http.host == "baidu.com" finds nothing directly.
- During the HTTPS handshake, various pieces of information, including three random numbers, are exchanged with the help of asymmetric encryption, and these are then used to generate a symmetric session key for encrypting subsequent data. Obtaining these three random numbers allows HTTPS-encrypted packets to be decrypted.
- The three random numbers are the client random, the server random, and the pre_master_key. The first two are in plaintext, while the third is encrypted with the server's public key and has to be obtained on the client side using SSLKEYLOGFILE.
- By setting the SSLKEYLOGFILE environment variable and having curl or Chrome request an HTTPS domain, the TLS library they invoke exports the ssl.key file. Each line of this file has three columns: the client random in the second column pinpoints the right record, and the third column holds the secret needed for decryption.