Understanding Packet Inspection: A Guide to Capturing and Decrypting Encrypted HTTPS Traffic

Recently, a reader asked about a peculiar issue. He wanted to capture the packets of a request to baidu.com and try his hand at packet inspection.

But he found that he couldn’t capture them, which seemed quite strange.

Let me recreate his steps for you.

First, using the ping command, he identified the IP address being requested when accessing Baidu.

The ping output showed that requests to baidu.com go to 39.156.66.10.

He then ran a tcpdump command to capture packets, telling it to save the traffic on the eth0 interface to and from 39.156.66.10 into the baidu.pcap file.

With the capture running, he opened the baidu.com webpage in a browser (running the curl command in another terminal window works just as well).

Logically speaking, the data packets from accessing baidu.com should have been captured.

He then stopped the packet capture.

Next, he opened the baidu.pcap file with Wireshark and entered http.host == "baidu.com" into the filter bar.

He found nothing.

Searching for Baidu packets in Wireshark yielded no results

Why is that?

Experienced readers may already know where the problem lies.

Why the Packet Was Not Captured

This is because he was accessing baidu.com over HTTPS, so the HTTP Host header and the rest of the request are encrypted.

Since the request is encrypted, Wireshark sees no HTTP fields in the capture, and filtering on http.host matches nothing.

However, even though the traffic is encrypted, it can still be filtered.

The Client Hello message of the TLS handshake carries a server_name extension, sent in plaintext, that records which website you’re trying to reach. Filtering on that extension brings the packets out.

You can search for Baidu packets using the server_name extension in TLS
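
To make the server_name extension more concrete, here is a minimal Python sketch of how a tool could pull the host name out of a raw Client Hello record. The helper names extract_sni and build_client_hello are made up for this illustration, the sample record is synthetic rather than taken from the capture above, and the parser assumes the whole Client Hello fits in a single TLS record. In Wireshark, the same value is exposed as the tls.handshake.extensions_server_name field.

```python
import struct
from typing import Optional

SERVER_NAME_EXT = 0  # extension type number assigned to server_name


def extract_sni(record: bytes) -> Optional[str]:
    """Return the host name from a TLS Client Hello record, or None."""
    try:
        # Content type 22 is "handshake"; handshake type 1 is Client Hello.
        if record[0] != 22 or record[5] != 1:
            return None
        pos = 5 + 4              # skip the record header and handshake header
        pos += 2 + 32            # client version + 32-byte client random
        pos += 1 + record[pos]   # session id (1-byte length prefix)
        (suites_len,) = struct.unpack_from("!H", record, pos)
        pos += 2 + suites_len    # cipher suites (2-byte length prefix)
        pos += 1 + record[pos]   # compression methods (1-byte length prefix)
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = pos + ext_total
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == SERVER_NAME_EXT:
                # Layout: list length (2), name type (1), name length (2), name.
                (name_len,) = struct.unpack_from("!H", record, pos + 3)
                return record[pos + 5 : pos + 5 + name_len].decode("ascii")
            pos += ext_len
        return None
    except (IndexError, struct.error, UnicodeDecodeError):
        return None  # truncated or malformed record


def build_client_hello(host: str) -> bytes:
    """Build a tiny synthetic Client Hello that carries only server_name."""
    name = host.encode("ascii")
    entry = b"\x00" + struct.pack("!H", len(name)) + name   # type 0 = host_name
    sni_data = struct.pack("!H", len(entry)) + entry
    ext = struct.pack("!HH", SERVER_NAME_EXT, len(sni_data)) + sni_data
    body = (
        b"\x03\x03"              # client version: TLS 1.2
        + bytes(32)              # client random (all zeros in this sample)
        + b"\x00"                # empty session id
        + b"\x00\x02\x00\x2f"    # one cipher suite: TLS_RSA_WITH_AES_128_CBC_SHA
        + b"\x01\x00"            # one compression method: null
        + struct.pack("!H", len(ext)) + ext
    )
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake


if __name__ == "__main__":
    print(extract_sni(build_client_hello("baidu.com")))
```

The point of the sketch is that no key is needed: the host name sits in the clear inside the first handshake message, which is why filtering on it works even when everything after the handshake is encrypted.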

Select one of the packets, right-click, and choose Follow TCP Stream.

Locate the TCP stream via right-click

Wireshark then displays all the other packets that belong to this TCP connection.

HTTPS packet capture

The screenshot shows a complete TCP handshake and a TLS handshake, followed by two segments of encrypted data and the TCP teardown.

It can be seen from packet 18 and packet 20 that one is a request packet from port 56028 to 443, and the other is a response packet from 443 back to 56028.

Generally, large, random-looking numbers like 56028 are ephemeral port numbers picked on the client side.

Port 443 is the server port number for HTTPS.

Plain HTTP uses port 80, which is why capturing on port 80 would catch nothing here either.

So we can reasonably conclude that packets 18 and 20 are the client’s request to baidu.com and the server’s response, respectively.

Open them up, though, and you’ll find that the URL and body are encrypted, so there is nothing readable inside.

So the question arises. Is there a way to decrypt the data inside?

Yes, there is. Let’s see how it’s done.

Decrypting the Packet

First, execute tcpdump to capture packets again.

Then, in another terminal window, set the SSLKEYLOGFILE environment variable so that the TLS keys are exported to /Users/xiaobaidebug/ssl.key.

In that same terminal window, run the curl command or launch the Chrome browser from the command line, so that curl or Chrome inherits the environment variable.

At this point, an ssl.key file will appear under /Users/xiaobaidebug/.

Next, follow the steps below to modify the Wireshark settings.

Open the Wireshark settings

After clicking “Protocols,” scroll down to find the TLS option.

Locate “Protocols” in the settings

Enter the path to the exported ssl.key file in the “(Pre)-Master-Secret log filename” field.

Find the TLS section under Protocols

After clicking OK, you’ll notice that packets 18 and 20 have been decrypted.

Content of the decrypted packet

You can then use http.host == "baidu.com" to filter out the data.

The decrypted packet allows filtering of Baidu’s data packets

At this point, the issue of not being able to view the data packet is resolved.

However, a new question arises.

What exactly is the ssl.key file?

To answer that, we need to understand how HTTPS encryption works.

The HTTPS Handshake Process

The HTTPS handshake process is quite complex, so let’s review it.

First, a TCP connection is established, since HTTPS is an application-layer protocol that runs over TCP.

Once the TCP connection is established, the HTTPS stage can begin.

HTTPS can be encrypted with TLS or its predecessor SSL. The example below uses TLS 1.2.

In general, the entire encryption process is divided into two stages.

The first stage is the four-part TLS handshake, which mainly uses asymmetric encryption to exchange various pieces of information and ends with both sides holding a “session key.”

The second stage is symmetric encryption communication based on the session key from the first stage.

The four-part TLS handshake

Let’s start by looking at what the four-part TLS handshake consists of.

The First Handshake:

  • Client Hello: The client tells the server which protocol versions it supports (such as TLS 1.2) and which cipher suites it supports (such as the common RSA-based ones), and provides a client random number.

The Second Handshake:

  • Server Hello: The server replies with a server random number, the server certificate, and the protocol version it has confirmed (e.g., TLS 1.2).

The Third Handshake:

  • Client Key Exchange: The client generates a random number called the pre_master_key. It encrypts the pre_master_key with the server’s public key, taken from the server certificate received in the second handshake, and sends it to the server.
  • Change Cipher Spec: The client now has three random numbers: the client random number, the server random number, and the pre_master_key. It uses them to calculate a session key and tells the server that subsequent communication will be encrypted with this session key.
  • Encrypted Handshake Message: The client computes a hash over all handshake data exchanged so far, encrypts it with the session key, and sends it to the server for verification. This concludes the client’s side of the handshake, which is why it is also called the Finished message.

The Fourth Handshake:

  • Change Cipher Spec: After receiving the pre_master_key from the client, the server decrypts it with its private key (it was encrypted with the server’s public key) and combines the three random numbers in the same way to generate the session key.
    The server then tells the client that subsequent communication will be encrypted with this session key.
  • Encrypted Handshake Message: Like the client, the server computes a hash over the handshake data exchanged so far, encrypts it with the session key, and sends it to the client for verification. This completes the handshake, so it is also called the Finished message.

By the end of the four-part handshake, both the client and the server hold the same three random numbers. They matter a great deal, so let’s go over them once more.

The client’s random number generated during the first handshake is called the client random.

In the second handshake, the server generates its own random number, the server random.

In the third handshake, the client generates another random number, the pre_master_key.

Together, these three random numbers are used to derive the final symmetric encryption key, the “session key” mentioned earlier.

Three random numbers generate a symmetric key

Simply put, if you know these three random numbers, you can decrypt HTTPS communications.
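
How three random numbers turn into keys can be sketched in Python. The following is a minimal illustration of the TLS 1.2 key derivation from RFC 5246, assuming a cipher suite that uses the SHA-256 based PRF and no extended master secret extension. The function names and the dummy input values are illustrative only and have nothing to do with the capture in this article.

```python
import hashlib
import hmac


def p_sha256(secret: bytes, seed: bytes, length: int) -> bytes:
    """P_hash expansion from RFC 5246, section 5, using HMAC-SHA256."""
    out = b""
    a = seed  # A(0) = seed
    while len(out) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()            # A(i)
        out += hmac.new(secret, a + seed, hashlib.sha256).digest()  # HMAC(secret, A(i) + seed)
    return out[:length]


def prf(secret: bytes, label: bytes, seed: bytes, length: int) -> bytes:
    """TLS 1.2 PRF: P_SHA256 over the label followed by the seed."""
    return p_sha256(secret, label + seed, length)


def derive_master_secret(pre_master: bytes, client_random: bytes,
                         server_random: bytes) -> bytes:
    """Combine the three random values into the 48-byte master secret."""
    return prf(pre_master, b"master secret", client_random + server_random, 48)


def derive_key_block(master_secret: bytes, client_random: bytes,
                     server_random: bytes, length: int) -> bytes:
    """Expand the master secret into key material (server random comes first)."""
    return prf(master_secret, b"key expansion", server_random + client_random, length)


if __name__ == "__main__":
    # Dummy values standing in for the numbers exchanged during the handshake.
    client_random = bytes(range(32))
    server_random = bytes(range(32, 64))
    pre_master = b"\x03\x03" + bytes(46)  # 2-byte version + 46 bytes (zeros here)

    master = derive_master_secret(pre_master, client_random, server_random)
    key_block = derive_key_block(master, client_random, server_random, 104)
    print(len(master), len(key_block))
```

The “session key” in this article corresponds to the keys sliced out of the key block. How many bytes are needed, and how they are split into encryption keys, MAC keys, and IVs, depends on the negotiated cipher suite, so the length of 104 above is only an example.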

Of these three random numbers, the client random and server random are sent in plaintext and are visible to anyone. The pre_master_key is not: it is encrypted with the server’s public key, so it is known only to the client and to whoever holds the server’s private key.

So the question becomes, how does one obtain this pre_master_key?

How to Get the pre_master_key

Since the server’s private key is not easily accessible, can we obtain the pre_master_key from the client’s side instead?

There is a way.

When the client talks to the server over HTTPS, it first establishes a TCP connection and then calls its TLS library (such as OpenSSL or NSS) to perform the TLS handshake.

Setting the SSLKEYLOGFILE environment variable changes the TLS library’s behavior so that it writes out a file containing the pre_master_key. This is the /Users/xiaobaidebug/ssl.key file mentioned above.

Inject the environment variable into curl and Chrome

Although TLS libraries can export key files, the application itself must support triggering that export through the SSLKEYLOGFILE environment variable. In practice, not all applications do. Popular tools like curl and the Chrome browser, however, support it.
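
What “application support” looks like can be sketched with Python’s standard ssl module, where a program opts in to key logging through the keylog_filename attribute of an SSLContext (available since Python 3.8 when built against OpenSSL 1.1.1 or newer). This is only an illustration of the idea, not how curl or Chrome implement it; the function names make_logging_context and fetch_status_line are made up for this example.

```python
import os
import socket
import ssl
from typing import Optional


def make_logging_context(path: Optional[str] = None) -> ssl.SSLContext:
    """Build a client TLS context that writes its secrets to a key log file.

    An explicit path wins; otherwise fall back to SSLKEYLOGFILE, mirroring
    what curl and Chrome do. (On Python 3.8+, create_default_context()
    already honours SSLKEYLOGFILE; setting the attribute makes the opt-in
    explicit.)
    """
    ctx = ssl.create_default_context()
    path = path or os.environ.get("SSLKEYLOGFILE")
    if path:
        ctx.keylog_filename = path  # the TLS library appends secrets here
    return ctx


def fetch_status_line(host: str, ctx: ssl.SSLContext) -> str:
    """Make one HTTPS request and return the first response line (needs network)."""
    request = b"HEAD / HTTP/1.1\r\nHost: " + host.encode("ascii") + b"\r\nConnection: close\r\n\r\n"
    with socket.create_connection((host, 443), timeout=5) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(request)
            return tls.recv(4096).split(b"\r\n", 1)[0].decode("latin-1")
```

Calling fetch_status_line("baidu.com", make_logging_context("ssl.key")) while tcpdump is running should leave a key log next to a capture that Wireshark can decrypt. A program that never configures key logging produces no such file, however the environment is set.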

Content of the SSLKEYLOGFILE

Let’s circle back to what’s inside the ssl.key file.

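In this file, each line has three columns.

The first column is the label CLIENT_RANDOM, which indicates that the second column is the client random and the third column is the secret for that connection. Strictly speaking, the key log format stores the 48-byte master secret here, which is already derived from the pre_master_key and the two random numbers. For decryption it plays the same role, so the rest of this article keeps calling it the pre_master_key.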

But again, a question arises.

With so many lines, how does Wireshark know which line’s pre_master_key to use?

Wireshark can extract the client random from the captured packets.

For instance, as shown below.

Client random in Client Hello

Observe that the client random number above ends with "bff63bbe5".

Similarly, you can find the server random in the data packet.

Locating the server random

Compare this client random against the second column of the ssl.key file, line by line, and you will find the matching record.

Data in the ssl.key

Note how the string in the second column also ends with "bff63bbe5". This is what we previously identified as the client random.

Take the third column from this line, and that’s your pre_master_key.

With it, Wireshark has all three random numbers and can compute the session key to decrypt the data.
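
The matching step can be sketched in a few lines of Python. This is a simplified illustration of the lookup, not Wireshark’s actual code: the function names are made up, the sample lines contain placeholder hex values rather than real secrets, and lines with any label other than CLIENT_RANDOM are simply skipped.

```python
from typing import Dict, Optional


def load_keylog(text: str) -> Dict[str, bytes]:
    """Index CLIENT_RANDOM lines by their second column (hex client random)."""
    secrets: Dict[str, bytes] = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: CLIENT_RANDOM <client random hex> <secret hex>
        if len(parts) == 3 and parts[0] == "CLIENT_RANDOM":
            secrets[parts[1].lower()] = bytes.fromhex(parts[2])
    return secrets


def find_secret(secrets: Dict[str, bytes], client_random: bytes) -> Optional[bytes]:
    """Return the third-column secret for the client random seen in a Client Hello."""
    return secrets.get(client_random.hex())


# Placeholder key log contents: two connections plus a comment line.
SAMPLE = "\n".join([
    "# sample key log with placeholder values",
    "CLIENT_RANDOM " + "11" * 32 + " " + "aa" * 48,
    "CLIENT_RANDOM " + "22" * 32 + " " + "bb" * 48,
])

if __name__ == "__main__":
    table = load_keylog(SAMPLE)
    secret = find_secret(table, bytes.fromhex("22" * 32))
    print(secret.hex() if secret else "no matching line")
```

Keying the table on the client random is what lets a single key log serve many connections at once, since every connection has its own client random and therefore its own line.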

The flip side is that Wireshark needs the client random to locate the right line in the ssl.key file, and the client random appears only in the first handshake message (Client Hello). So to decrypt HTTPS packets, the capture must contain the complete four-part TLS handshake. If the connection was already established and exchanging data when you started capturing, those packets cannot be decrypted.

Conclusion

  • The article starts from capturing Baidu’s data packets and walks through the basic steps of capturing them and inspecting them in Wireshark.
  • HTTPS encrypts both the URL and the request body of HTTP, so filtering with http.host == "baidu.com" finds nothing on its own.
  • During the HTTPS handshake, the two sides use asymmetric encryption to exchange various pieces of information, including three random numbers, which are then used to generate the symmetric session key that encrypts subsequent data.
    With these three random numbers, HTTPS packets can be decrypted.
  • The three random numbers are the client random, the server random, and the pre_master_key.
    The first two are sent in plaintext, while the third is encrypted with the server’s public key and has to be obtained from the client side using SSLKEYLOGFILE.
  • By setting the SSLKEYLOGFILE environment variable and having curl or Chrome request an HTTPS site, you make the TLS library export the ssl.key file. Each line of this file has three columns: the client random in the second column identifies the right record, and the third column holds the pre_master_key needed for decryption.