Mastering H264 Compression: Techniques, Tools, and Troubleshooting Tips

Before exploring some tips, let’s first understand what H264 compression is.

Let’s start with a problem.

At 720p resolution with 8 bits per pixel, a single uncompressed frame takes 1280*720*8 bits. At a frame rate of 15 frames per second, the data volume for one second of video would be:

1280*720*8*15/8/1024/1024 ≈ 13.18 MB
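The same arithmetic can be written as a small helper for trying other resolutions and frame rates. This is only a sketch: the function name raw_video_mb_per_second is made up for illustration, and it takes 1 MB to mean 1024*1024 bytes, as in the calculation above.

```python
def raw_video_mb_per_second(width, height, bits_per_pixel, fps):
    """Return the uncompressed data rate in MB/s (1 MB = 1024 * 1024 bytes)."""
    bits_per_frame = width * height * bits_per_pixel
    bytes_per_second = bits_per_frame * fps / 8  # 8 bits per byte
    return bytes_per_second / 1024 / 1024


if __name__ == "__main__":
    # 720p, 8 bits per pixel, 15 fps, as in the example above
    print(f"{raw_video_mb_per_second(1280, 720, 8, 15):.2f} MB/s")  # 13.18 MB/s
```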

This is undoubtedly unacceptable for users. Therefore, we need to compress the video to provide clear visuals at lower bit rates.

Common frame types include I-frames, P-frames, and B-frames:

I-frame: an intra-coded frame. It is compressed using only the data within the frame itself, so it can be decoded into a complete image on its own. It is also known as a keyframe. Typically, when packet loss is detected, the receiver immediately requests an I-frame from the sender.

P-frame: a forward-predicted frame. It uses inter-frame compression and stores only the differences between this frame and a previous one, i.e., the prediction residual and motion vectors.

B-frame: a bi-directionally predicted frame. It records the differences between the current frame and both preceding and following frames, which gives a higher compression ratio. However, decoding is more demanding and the frames must be reordered, so B-frames are generally not used in real-time communication.

The H264 bitstream is composed of individual NAL units (NALUs). We won’t delve into the specific format or the RTP packaging methods (single NAL unit mode, aggregation packet mode, and fragmentation unit mode), as these formats are fixed and relatively easy to understand, and plenty of resources are available online. Additionally, Wireshark shows the meaning of each field directly. Here’s an example:

[Screenshot: Wireshark packet details for an H264 RTP packet]

In this capture:

FU-A indicates a fragmentation unit; a Start bit of 1 marks the first fragment of a NAL unit; a nal_unit_type of 1 denotes a non-IDR slice; and P slice indicates that this is a slice of a P-frame. All of this can be read at a glance.
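To make these fields concrete, here is a minimal Python sketch that decodes the first two bytes of an FU-A payload: the FU indicator and the FU header, laid out as in RFC 6184. The function name parse_fu_a and the sample bytes are made up for illustration. The slice type (for example, P slice) lives in the slice header and is not decoded here.

```python
def parse_fu_a(payload: bytes) -> dict:
    """Decode the FU indicator and FU header at the start of an RTP payload."""
    if len(payload) < 2:
        raise ValueError("an FU-A payload needs at least 2 bytes")
    indicator, header = payload[0], payload[1]
    packet_type = indicator & 0x1F           # low 5 bits of the FU indicator
    if packet_type != 28:                    # 28 is the FU-A packet type
        raise ValueError(f"not an FU-A packet (type {packet_type})")
    return {
        "nri": (indicator >> 5) & 0x03,      # nal_ref_idc of the original NAL unit
        "start": bool(header & 0x80),        # S bit: first fragment
        "end": bool(header & 0x40),          # E bit: last fragment
        "nal_unit_type": header & 0x1F,      # type of the fragmented NAL unit
    }


if __name__ == "__main__":
    # Hypothetical sample: FU-A, start fragment, nal_unit_type 1 (non-IDR slice)
    print(parse_fu_a(bytes([0x7C, 0x81])))
```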

What if the Wireshark Protocol only shows UDP?

Right-click the packet, select Decode As, and choose RTP.

What if it only shows RTP instead of H264?

Open Preferences, then Protocols, then H264, and set the dynamic payload type to the PT value used by the stream.
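The PT value is the low 7 bits of the second byte of the RTP header. H264 uses a dynamic payload type, so the number differs between deployments and Wireshark has to be told which one to treat as H264. The sketch below reads the 12-byte fixed RTP header from a UDP payload; the function name parse_rtp_header and the sample values (including PT 96) are assumptions for illustration.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed RTP header at the start of a UDP payload."""
    if len(packet) < 12:
        raise ValueError("an RTP packet needs at least 12 bytes")
    # Network byte order: 2 single bytes, 16-bit sequence, 32-bit timestamp, 32-bit SSRC
    b0, b1, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,              # should be 2 for RTP
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,       # the PT value to enter in Wireshark
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


if __name__ == "__main__":
    # Hypothetical header: version 2, PT 96, sequence 1, timestamp 3000
    sample = struct.pack("!BBHII", 0x80, 96, 1, 3000, 0x1234)
    print(parse_rtp_header(sample)["payload_type"])
```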

Although Wireshark provides parsing for each field, we cannot intuitively see the video content. Next, I’ll share a useful tip.

First, download the rtp_h264_extractor.lua script and place it in the Wireshark installation directory.

Next, edit init.lua and add dofile(DATA_DIR.."rtp_h264_extractor.lua") at the end. Also make sure that enable_lua is true or disable_lua is false, whichever your init.lua defines.

Then filter for the H264 packets you want to analyze, and make sure the Protocol column shows H264. It is crucial to exclude all other calls so that unrelated streams do not interfere. You can filter by UDP port, for example udp.srcport==1000 && udp.dstport==2000, or open Telephony > RTP > RTP Streams, select the stream you want, and click Prepare Filter, as shown below:

[Screenshot: Wireshark RTP Streams window with the Prepare Filter button]

Finally, click the “Extract h264 stream from RTP” item in the Tools menu. A dump.h264 file will be generated in the same directory as the capture file.
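For readers curious about what such an extraction involves, the following Python sketch shows the general idea of turning RTP payloads back into an Annex B bitstream, where each NAL unit is prefixed with a start code. It is not the Lua script's actual implementation. The function name rtp_payloads_to_annexb is made up, and the sketch assumes the payloads arrive in sequence order with no loss and with the RTP header already removed.

```python
START_CODE = b"\x00\x00\x00\x01"


def rtp_payloads_to_annexb(payloads):
    """Rebuild an Annex B byte stream from in-order H264 RTP payloads."""
    out = bytearray()
    for p in payloads:
        if not p:
            continue
        packet_type = p[0] & 0x1F
        if 1 <= packet_type <= 23:
            # Single NAL unit packet: the payload is the NAL unit itself
            out += START_CODE + p
        elif packet_type == 24:
            # STAP-A: repeated (16-bit size, NAL unit) pairs after a 1-byte header
            pos = 1
            while pos + 2 <= len(p):
                size = int.from_bytes(p[pos:pos + 2], "big")
                pos += 2
                if size == 0 or pos + size > len(p):
                    break  # malformed aggregation unit, stop here
                out += START_CODE + p[pos:pos + size]
                pos += size
        elif packet_type == 28 and len(p) >= 2:
            # FU-A: rebuild the NAL header on the first fragment only
            fu_header = p[1]
            if fu_header & 0x80:
                nal_header = (p[0] & 0xE0) | (fu_header & 0x1F)
                out += START_CODE + bytes([nal_header])
            out += p[2:]
        # Other packet types are ignored in this sketch
    return bytes(out)
```

A real extractor also has to cope with reordered or missing packets, which is exactly where problem frames tend to come from.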

At this point, players like VLC can directly play the bitstream file. However, when encountering problem frames, VLC will often close unexpectedly. So, how should you analyze this?

Here are some recommended analysis tools, with some screenshots shown below:

All the above tools can provide detailed viewing of each frame. Interested readers can download them for further exploration of various windows and tools.
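When a player crashes and no analyzer is at hand, a quick first check is to list the NAL units in the extracted file. The sketch below scans an Annex B byte stream for start codes and reports the type of each NAL unit. The function name list_nal_units is made up for illustration, and the name table covers only a few common types.

```python
import re

# A few common nal_unit_type values; anything else is reported as "other"
NAL_NAMES = {
    1: "non-IDR slice",
    5: "IDR slice",
    6: "SEI",
    7: "SPS",
    8: "PPS",
    9: "AUD",
}


def list_nal_units(data: bytes):
    """Return (offset, nal_unit_type, name) for each NAL unit in an Annex B stream."""
    units = []
    # A 4-byte start code (00 00 00 01) ends with the 3-byte one, so one pattern finds both
    for match in re.finditer(b"\x00\x00\x01", data):
        pos = match.end()                 # first byte after the start code
        if pos < len(data):
            nal_type = data[pos] & 0x1F   # low 5 bits of the NAL header
            units.append((pos, nal_type, NAL_NAMES.get(nal_type, "other")))
    return units
```

To run it against the extracted stream, read the file in binary mode and pass the bytes in, for example list_nal_units(open("dump.h264", "rb").read()). A stream that never contains an SPS, PPS, or IDR slice, or that has a long gap between IDR slices, is a likely candidate for playback problems.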

Additionally, we have our own setups, making it convenient to see video frames from the packet itself while working.