Today's topic covers two HTTP techniques used in web performance optimization:
- gzip compression transfer;
- chunked transfer encoding;
1. What is gzip?
gzip is short for GNU zip, a widely used compression format and tool built on the DEFLATE algorithm. It is commonly used to compress CSS, JS, HTML, and other plain-text content, which can save a significant amount of network bandwidth;
2. How good is gzip?
… Let the data speak …
Before enabling gzip compression:
After enabling gzip compression (Tomcat):
Note: For how to enable gzip compression on Tomcat and WebLogic, see: HTTP: Compression Transfer, Chunked Transfer;
3. gzip file format
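A gzip file consists of one or more "blocks" (RFC 1952 calls them members), but typically contains only one. Each block consists of a header, the compressed data, and a trailer.
Remember: a gzip file starts with 0x1F 0x8B 0x08, that is, the two-byte gzip identifier 0x1F8B followed by 0x08, the code for the DEFLATE compression method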
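To make the identifier concrete, here is a minimal sketch that checks whether a byte array starts with those three bytes. The class name `GzipHeaderCheck` and its methods are made up for illustration; only `GZIPOutputStream` comes from the JDK.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, for illustration only.
final class GzipHeaderCheck {

    /** Returns true if data starts with 0x1F 0x8B (gzip identifier) and 0x08 (DEFLATE). */
    static boolean looksLikeGzip(byte[] data) {
        return data != null
                && data.length >= 3
                && (data[0] & 0xFF) == 0x1F   // ID1
                && (data[1] & 0xFF) == 0x8B   // ID2
                && (data[2] & 0xFF) == 0x08;  // CM = 8, DEFLATE
    }

    /** Compresses raw bytes with the JDK's GZIPOutputStream. */
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] compressed = gzip("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(looksLikeGzip(compressed));                           // true
        System.out.println(looksLikeGzip(new byte[] {0x50, 0x4B, 0x03, 0x04}));  // false (zip archive signature)
    }
}
```

Note that this only inspects the first three bytes; it does not validate the rest of the header or the trailer.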
4. How to implement gzip compression/decompression in Java
Java's java.util.zip package provides GZIPOutputStream and GZIPInputStream for gzip compression and decompression, respectively;
The code is not rigorous
For reference only
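As a rough sketch of how the two stream classes fit together, the following round-trips a byte array through gzip in memory. The class name `GzipRoundTrip` and its method names are my own; error handling is kept to a bare minimum.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical utility, for illustration only.
final class GzipRoundTrip {

    /** Compresses raw bytes into a single gzip block. */
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the stream is what writes the gzip trailer (CRC32 + size).
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Decompresses gzip data back into the original bytes. */
    static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gzip.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Build a repetitive text payload, the kind gzip handles well.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("<div class=\"item\">hello gzip</div>\n");
        }
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(raw);
        byte[] unpacked = decompress(packed);
        System.out.println("raw bytes:        " + raw.length);
        System.out.println("compressed bytes: " + packed.length);
        System.out.println("round trip ok:    " + java.util.Arrays.equals(raw, unpacked));
    }
}
```

The compressed bytes are read only after the `GZIPOutputStream` is closed; reading the buffer earlier would miss the trailer and produce a truncated gzip block.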
5. Chunked transfer in HTTP
In general, HTTP uses Content-Length to indicate the length of the response body. A browser (IE, for example) waits until it has received Content-Length bytes before it starts parsing the page; if fewer bytes arrive than declared, it keeps waiting indefinitely and the screen stays white;
Look!
IE just hangs the request!
Screen went white!
(Stay away from IE, treasure life!)
Chunked transfer encoding in HTTP/1.1 is a data transfer mechanism that allows the server to break the response data into multiple chunks and send them to the browser in batches. The browser does not need to wait for all the content bytes to be downloaded and can begin parsing the page as soon as it receives a chunk.
6. Chunked protocol details
Reference: https://tools.ietf.org/html/rfc2616#section-3.6.1
… Illustrating with message examples …
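Per the RFC section above, each chunk on the wire is the chunk size in hexadecimal, a CRLF, the chunk data, and another CRLF; a zero-size chunk followed by an empty line ends the body. Here is a minimal sketch of an encoder for that framing. `ChunkedEncoder` and `encode` are hypothetical names, and chunk extensions and trailer headers are left out.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical encoder, for illustration only (no chunk extensions, no trailers).
final class ChunkedEncoder {

    /** Frames body as an HTTP/1.1 chunked message body with chunks of at most chunkSize bytes. */
    static byte[] encode(byte[] body, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int off = 0; off < body.length; off += chunkSize) {
            int len = Math.min(chunkSize, body.length - off);
            writeAscii(out, Integer.toHexString(len) + "\r\n"); // chunk size in hex
            out.write(body, off, len);                          // chunk data
            writeAscii(out, "\r\n");
        }
        writeAscii(out, "0\r\n\r\n"); // last chunk plus the terminating empty line
        return out.toByteArray();
    }

    private static void writeAscii(ByteArrayOutputStream out, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        out.write(bytes, 0, bytes.length);
    }

    public static void main(String[] args) {
        byte[] body = "abcdefghij".getBytes(StandardCharsets.US_ASCII);
        String wire = new String(encode(body, 4), StandardCharsets.US_ASCII);
        // Show the CR and LF characters explicitly.
        System.out.println(wire.replace("\r", "\\r").replace("\n", "\\n"));
        // prints: 4\r\nabcd\r\n4\r\nefgh\r\n2\r\nij\r\n0\r\n\r\n
    }
}
```

In a real response these bytes follow the headers, which would carry `Transfer-Encoding: chunked` instead of `Content-Length`.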
7. What happens when chunked and gzip are combined?
Image 1: Enabling gzip on Tomcat
Image 2: Observing TCP stream with Wireshark
Several points to note:
- Tomcat uses gzip to compress the response data, while also utilizing the chunked transfer mechanism;
- In chunked + gzip mode, each chunk is not an independently decompressible gzip stream; rather, the entire message is compressed with gzip and then divided into chunks (see below);
- The first chunk contains only 10 bytes, which matches the length of the basic gzip file header (starting with 0x1f 8b 08);
Reference: https://en.wikipedia.org/wiki/Chunked_transfer_encoding
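The "compress first, then cut into chunks" point can be reproduced in memory. The sketch below gzips a payload, cuts the result into pieces (the first piece is 10 bytes, mimicking the capture; that split point is my assumption about how the server flushes), and shows that no single piece decompresses on its own while the rejoined pieces do. All names in `GzipThenChunk` are hypothetical, and the HTTP chunk framing itself is omitted.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo, for illustration only.
final class GzipThenChunk {
    static final int GZIP_HEADER_SIZE = 10; // fixed gzip header without optional fields

    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Cuts data into a header-sized first piece, then pieces of at most chunkSize bytes. */
    static List<byte[]> split(byte[] data, int chunkSize) {
        List<byte[]> pieces = new ArrayList<>();
        int first = Math.min(GZIP_HEADER_SIZE, data.length);
        pieces.add(Arrays.copyOfRange(data, 0, first));
        for (int off = first; off < data.length; off += chunkSize) {
            pieces.add(Arrays.copyOfRange(data, off, Math.min(off + chunkSize, data.length)));
        }
        return pieces;
    }

    static byte[] join(List<byte[]> pieces) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] p : pieces) {
            out.write(p, 0, p.length);
        }
        return out.toByteArray();
    }

    /** True only if the piece can be fully decompressed by itself. */
    static boolean isStandaloneGzip(byte[] piece) {
        try {
            gunzip(piece);
            return true;
        } catch (UncheckedIOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] original = "hello chunked gzip, hello chunked gzip".getBytes(StandardCharsets.UTF_8);
        List<byte[]> pieces = split(gzip(original), 16);
        for (int i = 0; i < pieces.size(); i++) {
            System.out.println("piece " + i + ": " + pieces.get(i).length
                    + " bytes, standalone gzip? " + isStandaloneGzip(pieces.get(i)));
        }
        System.out.println("rejoined ok: " + Arrays.equals(original, gunzip(join(pieces))));
    }
}
```

This is why a client has to strip the chunk framing and reassemble the body before handing it to the gzip decoder.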
8. Implementing a GZIPFilter by yourself
Below is a simplified version of a self-implemented GZIPFilter;
Note: This GZIPFilter is based on tk-filters-1.0.1, simplified and corrected; it is for learning and discussion purposes only; https://sourceforge.net/projects/filterlib/