Mastering DNS Optimization for HTTPS: How Meitu App Achieved a 50% Reduction in Request Time

 

1. Introduction

In the era of mobile internet, competition among app manufacturers is incredibly fierce, and providing an excellent user experience must be prioritized. Meitu products are renowned for their high aesthetic appeal and place significant emphasis on user experience. From a technical perspective, DNS optimization is a crucial aspect of enhancing client-side experience. Reducing DNS response time and minimizing domain hijacking are key research and development challenges that need to be addressed.

This article describes how the Meitu App optimized DNS on mobile, focusing primarily on the HTTPS protocol. It covers the DNS problems encountered, the underlying principles, and the final optimization results, and is intended as a practical reference.

3. Content Overview


DNS resolution happens before a network connection is established: it translates a domain name into the IP addresses that the subsequent steps connect to. (For detailed principles, see “TCP/IP Illustrated, Volume 1: The Protocols”, Chapter 14, “DNS: The Domain Name System”.)

During a DNS lookup, the system will first attempt to find the information in the local cache. If it doesn’t exist or if the records have expired, it will continue to initiate a recursive query to the DNS server, which is typically the ISP’s DNS server.

During this process, some uncontrollable issues may arise.

In real-world user environments, Meitu’s mobile products may face issues such as DNS hijacking and fluctuating resolution times (see “Comprehensive Understanding of Mobile DNS Hijacking and Other Issues: Principles, Root Causes, HttpDNS Solutions, and More”). These unstable factors in the DNS chain cause subsequent network requests to be hijacked or to fail outright, hurting the user experience of the product.

To address this, we explored ways to optimize DNS resolution for our mobile products and developed a corresponding SDK. Along the way, we drew on mainstream industry solutions and added some considerations from our own practice.

The rest of this article mainly uses the Android platform as the example.

4. LocalDNS vs. HTTP DNS

In long-term practice, internet companies have discovered the following issues with LocalDNS:

1) Domain name caching: the ISP’s DNS caches domain name resolution results and redirects users to a cache server inside its own network;

2) Resolution forwarding and outbound NAT: the ISP’s DNS forwards query requests to other servers, or outbound NAT is applied, which causes traffic scheduling policies to fail.

What is LocalDNS? Generally speaking, LocalDNS refers to the DNS of the local ISP operator:


▲ The “local DNS server” in the figure is LocalDNS

To address these LocalDNS issues, the industry has proposed the concept of HTTP DNS. Note: if you are not yet familiar with the concepts of LocalDNS and HTTP DNS, please first read “Comprehensive Understanding of Mobile DNS Domain Hijacking and Related Issues: Principles, Root Causes, and HttpDNS Solutions”.

The basic principle of HTTP DNS is as follows:

Originally, users perform DNS resolution by sending UDP packets to the ISP’s DNS server for queries. However, under HTTP DNS, we modify this process so that the user includes the domain name to be queried and the local IP address directly in an HTTP request to the HTTP web server. This HTTP web server will return the resolved IP address of the domain.
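The request and response described above can be sketched roughly as follows. This is only an illustration: the endpoint address, the `dn` and `ip` parameter names, and the semicolon-separated response format are all assumptions, not the API of any particular HTTP DNS provider.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical HTTP DNS helpers. Endpoint, parameters and response format
// are assumptions made for illustration only.
class HttpDnsSketch {
    // Assumed HTTP DNS server, addressed by IP so that reaching it needs no DNS lookup.
    static final String ENDPOINT = "http://203.0.113.1/resolve";

    // Builds the query URL carrying the domain to resolve and the client's own IP.
    static String buildQueryUrl(String domain, String clientIp) {
        return ENDPOINT + "?dn=" + encode(domain) + "&ip=" + encode(clientIp);
    }

    // Parses an assumed response body of the form "ip1;ip2;ip3".
    static List<String> parseResponse(String body) {
        List<String> ips = new ArrayList<>();
        if (body == null) {
            return ips;
        }
        for (String part : body.trim().split(";")) {
            String ip = part.trim();
            if (!ip.isEmpty()) {
                ips.add(ip);
            }
        }
        return ips;
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

The URL returned by `buildQueryUrl` would be fetched with an ordinary HTTP client, and the body handed to `parseResponse`.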

For example, the implementation principle of DNSPod is as follows:

Compared to LocalDNS, HTTP DNS offers the following advantages: 

1) Eliminates DNS resolution anomalies at the root: queries bypass the ISP’s DNS and go to an HTTP web server that has DNS resolution capabilities;

2) Precise Scheduling: HTTP DNS can directly obtain the user’s IP address, thus enabling accurate traffic direction;

3) High Scalability: Based on the HTTP protocol, it can achieve more powerful functionality extensions.

So, should we switch entirely to HTTP DNS then?

5. Exploration of DNS Optimization Strategy for Meitu APP

HTTP DNS offers several advantages over LocalDNS; however, HTTP DNS also comes with certain cost issues.

Meitu’s product line is diverse and involves a wide range of domain names. To accommodate the actual scenarios of various products, we have developed a relatively flexible strategy control in practice.

First, we have not completely abandoned LocalDNS.

An app involves numerous domains. We can configure its core API domains to use HTTP DNS, while non-core requests still try LocalDNS first and are upgraded to HTTP DNS only when LocalDNS behaves abnormally.

How do we determine whether LocalDNS is behaving abnormally?

We have selected several metrics to assess the quality of a DNS server:

1) TTL of the IP records: when DNS is hijacked, the returned TTL may be abnormally large;

2) Resolution time: a DNS server that takes too long to answer queries is not what we want;

3) Connectivity of the returned IP: we run a quality test against the returned IP. If the connection is poor, the DNS server may be suspected of hijacking.

On the Android platform, the resolution result exposed by the system methods contains very little information, and some of the above metrics cannot be obtained from it. Therefore, in practice, we construct DNS query packets ourselves and send queries to multiple ISP DNS servers.
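As a rough illustration of constructing a query packet by hand, the sketch below builds an A-record query in the standard DNS wire format, sends it over UDP, and reads the TTL of the first answer. The class and method names are hypothetical, it assumes a single question in the response, and it omits the bounds checks and response validation that production code would need.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not the SDK's actual code.
class DnsPacketSketch {
    // Builds a recursive query for the A record of `domain`.
    static byte[] buildAQuery(int id, String domain) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeU16(out, id);      // transaction ID
        writeU16(out, 0x0100);  // flags: standard query, recursion desired
        writeU16(out, 1);       // QDCOUNT: one question
        writeU16(out, 0);       // ANCOUNT
        writeU16(out, 0);       // NSCOUNT
        writeU16(out, 0);       // ARCOUNT
        for (String label : domain.split("\\.")) {
            byte[] bytes = label.getBytes(StandardCharsets.US_ASCII);
            out.write(bytes.length);            // length-prefixed label
            out.write(bytes, 0, bytes.length);
        }
        out.write(0);           // root label terminates the name
        writeU16(out, 1);       // QTYPE = A
        writeU16(out, 1);       // QCLASS = IN
        return out.toByteArray();
    }

    // Sends the query to one DNS server over UDP port 53 and returns the raw reply.
    static byte[] exchange(InetAddress server, byte[] query, int timeoutMillis) throws IOException {
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.setSoTimeout(timeoutMillis);
            socket.send(new DatagramPacket(query, query.length, server, 53));
            byte[] buf = new byte[512];
            DatagramPacket reply = new DatagramPacket(buf, buf.length);
            socket.receive(reply);
            return Arrays.copyOf(buf, reply.getLength());
        }
    }

    // Returns the TTL (seconds) of the first answer record, or -1 if there is no answer.
    static long firstAnswerTtl(byte[] response) {
        if (u16(response, 6) == 0) {
            return -1;                              // ANCOUNT == 0
        }
        int pos = skipName(response, 12) + 4;       // question name, QTYPE, QCLASS
        pos = skipName(response, pos) + 4;          // answer name, TYPE, CLASS
        return ((long) (response[pos] & 0xFF) << 24)
                | ((response[pos + 1] & 0xFF) << 16)
                | ((response[pos + 2] & 0xFF) << 8)
                | (response[pos + 3] & 0xFF);
    }

    // Skips an encoded name, which ends with a zero byte or a 2-byte compression pointer.
    private static int skipName(byte[] buf, int pos) {
        while (true) {
            int len = buf[pos] & 0xFF;
            if (len == 0) return pos + 1;
            if ((len & 0xC0) == 0xC0) return pos + 2;
            pos += len + 1;
        }
    }

    private static int u16(byte[] buf, int pos) {
        return ((buf[pos] & 0xFF) << 8) | (buf[pos + 1] & 0xFF);
    }

    private static void writeU16(ByteArrayOutputStream out, int value) {
        out.write((value >> 8) & 0xFF);
        out.write(value & 0xFF);
    }
}
```

Timing the call to `exchange` gives the resolution time, and `firstAnswerTtl` gives the TTL, two of the metrics that the system resolver does not expose.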

Based on the comprehensive evaluation of the above-mentioned indicators, when LocalDNS performance is subpar, our strategy is to upgrade to HTTP DNS in an attempt to provide users with a better DNS resolution experience.
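A minimal sketch of such an evaluation is shown below. The thresholds and the simple any-metric-fails rule are assumptions for illustration; the article does not state the actual values or how the metrics are weighted.

```java
// Hypothetical decision helper, not the SDK's actual policy.
class LocalDnsHealthSketch {
    // Assumed thresholds; real values would likely come from online configuration.
    static final long MAX_PLAUSIBLE_TTL_SECONDS = 86_400; // one day
    static final long MAX_RESOLVE_MILLIS = 1_000;

    // Returns true when any of the three metrics suggests LocalDNS is unhealthy.
    static boolean shouldUpgradeToHttpDns(long ttlSeconds, long resolveMillis, boolean ipReachable) {
        boolean suspiciousTtl = ttlSeconds > MAX_PLAUSIBLE_TTL_SECONDS; // possible hijacking
        boolean tooSlow = resolveMillis > MAX_RESOLVE_MILLIS;           // resolution took too long
        return suspiciousTtl || tooSlow || !ipReachable;                // or the returned IP failed the connectivity test
    }
}
```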

There is another metric in the DNS resolution phase that we care about in particular: the DNS resolution time.

1) When a LocalDNS record expires, LocalDNS initiates a recursive query. This process is outside our control and can take several seconds in some cases.

2) HTTP DNS is somewhat better, but even under normal circumstances it still adds a latency of about 200ms.

Can this time be reduced further?

Our SDK locally constructs its own record cache pool, storing each record obtained through LocalDNS or HTTP DNS resolution in the cache pool.

Of course, this is common practice; the system’s underlying netdb library works the same way.

The difference is that we made a small modification: for expired records, we use a lazy update strategy. When an expired cache record is detected, we first return the expired record to the caller and asynchronously initiate a DNS query to update the cache record.

This small modification ensures that every resolution after the first hits the local cache, which significantly reduces DNS resolution time. However, it also introduces a certain degree of risk, because the caller may receive an outdated record.

In practice, we also added an asynchronous, periodic scan of the DNS record cache pool that promptly identifies and updates expired records, reducing how often the App hits an expired record.
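The lazy update strategy and the periodic scan can be sketched as follows. This is a simplified illustration under assumptions: the class and method names are hypothetical, a single fixed TTL is used for every record, and the current time is passed in explicitly to keep the sketch easy to test.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Function;

// Hypothetical cache pool, not the SDK's actual implementation.
class LazyDnsCacheSketch {
    private static final class Entry {
        final List<String> ips;
        final long expiresAtMillis;

        Entry(List<String> ips, long expiresAtMillis) {
            this.ips = ips;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final Function<String, List<String>> resolver; // LocalDNS or HTTP DNS lookup
    private final Executor background;                      // runs asynchronous refreshes
    private final long ttlMillis;                           // assumed fixed TTL for all records

    LazyDnsCacheSketch(Function<String, List<String>> resolver, Executor background, long ttlMillis) {
        this.resolver = resolver;
        this.background = background;
        this.ttlMillis = ttlMillis;
    }

    List<String> lookup(String host, long nowMillis) {
        Entry entry = cache.get(host);
        if (entry == null) {
            return refresh(host, nowMillis); // first lookup: nothing cached, resolve synchronously
        }
        if (entry.expiresAtMillis <= nowMillis) {
            // Lazy update: hand back the expired record now, refresh it in the background.
            background.execute(() -> refresh(host, nowMillis));
        }
        return entry.ips;
    }

    // Intended to be called periodically so that fewer lookups hit expired records.
    void refreshExpired(long nowMillis) {
        for (Map.Entry<String, Entry> item : cache.entrySet()) {
            if (item.getValue().expiresAtMillis <= nowMillis) {
                String host = item.getKey();
                background.execute(() -> refresh(host, nowMillis));
            }
        }
    }

    private List<String> refresh(String host, long nowMillis) {
        List<String> ips = resolver.apply(host);
        cache.put(host, new Entry(ips, nowMillis + ttlMillis));
        return ips;
    }
}
```

In a real app the `background` executor would be a thread pool and `refreshExpired` would be driven by a scheduled task; the trade-off is that a caller can briefly receive a stale record.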

5. Exploring Non-Invasive SDK Integration Methods for the Meitu App

In the practice of DNS optimization, the biggest challenge we encounter is not the strategic design aspect, but rather the way our DNS SDK is applied to actual app product operations.

5.1 IP Direct Connection Scheme and Various Pitfalls

In the industry, HTTP DNS is often integrated through direct IP connection: where the app originally requested http://www.meitu.com directly, it now first uses the SDK to resolve the domain name to an IP address, for example 1.1.1.1, and then replaces the domain name in the URL with that IP, giving http://1.1.1.1/.

After performing this operation, since the HOST in the URL is already an IP address, the network request library will skip the domain name resolution phase and directly initiate an HTTP request to the 1.1.1.1 server.

In practice, we encountered several challenges with the direct IP connection solution.

First, for HTTP requests, even after adopting the direct IP connection scheme, we still need to set the HOST header manually.

import java.net.HttpURLConnection;
import java.net.URL;
// Connect to the resolved IP, then restore the real domain in the Host header.
URL htmlUrl = new URL("http://1.1.1.1/");
HttpURLConnection connection = (HttpURLConnection) htmlUrl.openConnection();
connection.setRequestProperty("Host", "www.meitu.com");

The HTTP protocol is relatively straightforward, as it only requires handling the HOST. But what about HTTPS?

Initiating an HTTPS request first requires an SSL/TLS handshake, which follows this process:

1) The client sends a “Client Hello” carrying a random number, the encryption algorithms it supports, and other information;

2) The server receives the request, selects a suitable encryption algorithm, and returns it to the client together with its public-key certificate, a random number, and other information;

3) The client verifies the legitimacy of the server’s certificate, generates a random number, encrypts it with the certificate’s public key, and sends it to the server;

4) The server recovers the random number with its private key, computes the negotiated key from the information exchanged earlier, and sends its handshake confirmation to the client;

5) The client verifies the data sent by the server; once verification succeeds, both parties have completed the handshake and begin encrypted communication.

After we adopt a direct IP connection, the third step of the aforementioned HTTPS process will encounter problems.

The client verification of the certificate issued by the server involves two steps: 

1) The client uses its locally stored root certificates to validate the certificate chain, confirming that the server’s certificate was issued by a trusted authority;

2) The client verifies that the certificate’s domain field or its extension domain (Subject Alternative Name) field covers the HOST of the current request.

Both steps must succeed before the process can continue; otherwise, the SSL/TLS handshake fails at this point.

With a direct IP connection, the host part of the URL handed to the network request library has already been replaced with an IP address. So in the second step of certificate verification, under the default configuration, the “HOST of the current request” is an IP address, which causes the domain check to fail and ultimately makes the SSL/TLS handshake fail.

So how do we solve this problem?

The solution to the domain name verification issue in the SSL/TLS handshake lies in reconfiguring the `HostnameVerifier`, allowing the request library to use the actual domain name for verification.

Here is a code example: 

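import java.net.URL;
import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLSession;
final URL htmlUrl = new URL("https://1.1.1.1/");
HttpsURLConnection connection = (HttpsURLConnection) htmlUrl.openConnection();
connection.setRequestProperty("Host", "www.meipai.com");
connection.setHostnameVerifier(new HostnameVerifier() {
    @Override
    public boolean verify(String hostname, SSLSession session) {
        // Verify the certificate against the real domain, not the IP in the URL.
        return HttpsURLConnection.getDefaultHostnameVerifier()
                .verify("www.meipai.com", session);
    }
});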

We’ve resolved another issue, so does that mean all problems with direct IP connections and HTTPS are sorted out?

No, HTTPS and SNI scenarios require special handling.

SNI (Server Name Indication) is an SSL/TLS extension designed to resolve the issue of a single server using multiple domain names and certificates.

Its basic working principle is as follows:

1) The server is configured with multiple domains and corresponding certificates. When the client establishes an SSL connection with the server, it first sends the domain name of the site it wants to access;

2) The server returns an appropriate certificate based on this domain name.

As with the domain verification problem above, the default behavior of the network request library is to send our substituted IP address to the server as the “domain name of the site to be accessed”.

A server that receives a “domain name” in IP address form cannot find a matching certificate, so it has no choice but to return the certificate of its default domain.

The client then fails to validate the domain field of the certificate, because the certificate supplied by the server does not correspond to the expected domain name.

In the end, the SSL/TLS handshake fails.

In the aforementioned SNI scenario, do we have a way to resolve the issue?

It is possible to solve this by customizing the SSLSocketFactory on the client side. However, the code modifications are relatively extensive, so they are not listed here.
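For orientation only, one possible shape of such a factory is sketched below. It is not the code the authors used: it assumes the standard JSSE `SSLParameters.setServerNames` API is available (Java 8 and later, and only newer Android API levels), and the class name is hypothetical. It wraps an existing factory and sets the SNI extension to the real domain on every socket before the handshake starts.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.Socket;
import java.util.Collections;
import javax.net.ssl.SNIHostName;
import javax.net.ssl.SNIServerName;
import javax.net.ssl.SSLParameters;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

// Hypothetical wrapper: delegates socket creation, then forces SNI to the real domain.
class SniSocketFactorySketch extends SSLSocketFactory {
    private final SSLSocketFactory delegate;
    private final String realHost;

    SniSocketFactorySketch(SSLSocketFactory delegate, String realHost) {
        this.delegate = delegate;
        this.realHost = realHost;
    }

    // Sets the server_name extension so the server can pick the right certificate.
    private Socket withSni(Socket socket) {
        if (socket instanceof SSLSocket) {
            SSLSocket ssl = (SSLSocket) socket;
            SSLParameters params = ssl.getSSLParameters();
            params.setServerNames(
                    Collections.<SNIServerName>singletonList(new SNIHostName(realHost)));
            ssl.setSSLParameters(params);
        }
        return socket;
    }

    @Override public String[] getDefaultCipherSuites() { return delegate.getDefaultCipherSuites(); }

    @Override public String[] getSupportedCipherSuites() { return delegate.getSupportedCipherSuites(); }

    @Override public Socket createSocket() throws IOException {
        return withSni(delegate.createSocket());
    }

    @Override public Socket createSocket(Socket s, String host, int port, boolean autoClose)
            throws IOException {
        return withSni(delegate.createSocket(s, host, port, autoClose));
    }

    @Override public Socket createSocket(String host, int port) throws IOException {
        return withSni(delegate.createSocket(host, port));
    }

    @Override public Socket createSocket(String host, int port, InetAddress localHost, int localPort)
            throws IOException {
        return withSni(delegate.createSocket(host, port, localHost, localPort));
    }

    @Override public Socket createSocket(InetAddress host, int port) throws IOException {
        return withSni(delegate.createSocket(host, port));
    }

    @Override public Socket createSocket(InetAddress address, int port,
                                         InetAddress localAddress, int localPort) throws IOException {
        return withSni(delegate.createSocket(address, port, localAddress, localPort));
    }
}
```

Such a factory would be installed with `connection.setSSLSocketFactory(...)`, and the request would still need the Host header and a HostnameVerifier that checks against the real domain. Real integrations typically have to handle more cases than this sketch covers.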

If our SDK needs to be integrated into the actual business of an App, many people might feel overwhelmed by the time they reach the processing of HTTPS SNI scenarios. The workload for integration is not trivial either.

In many cases, teams compromise and use the SDK only with OkHttp, because OkHttp itself supports replacing the DNS implementation and therefore avoids the issues described above.

At Meitu, we did not want to limit this DNS optimization to OkHttp requests; we also wanted it to apply to scenarios such as in-app H5 page loading and media playback.

In scenarios like this, the workload brought by an IP direct connection access solution is actually not low, and it may even require modifications to some components.

In our initial practice, we indeed attempted to implement direct IP connections to various modules. However, even if we overcome the workload challenges of modifications, there are still numerous pitfalls during actual operation.

5.2 The Non-Intrusive DNS SDK Integration Solution Ultimately Used by Meitu

So, is there a more suitable technical solution that can reduce the integration workload of our DNS SDK while also accommodating various use cases, such as HTTPS and RTMP protocols?

Based on this objective, we explored in practice a non-intrusive DNS SDK integration solution that is friendly to business integration. Below, we will illustrate using the Android platform.

We know that the basic approach to perform DNS resolution at the Java level is by invoking the following method:

InetAddress.getAllByName("www.meipai.com");

Common network request libraries on the Android platform, such as OkHttp and HttpURLConnection, rely on this form of DNS resolution.

We conducted an in-depth analysis of the InetAddress execution process, which is roughly as follows:

In the above process, we can understand that InetAddress will attempt to retrieve cached records from AddressCache, and here AddressCache is a static map structure variable.

Therefore, let’s do a little tweak here:

1) Create your own AddressCache structure by emulating the system’s AddressCache. However, replace its get method with one that retrieves resolution records from our SDK.

2) Using reflection, replace the system’s AddressCache variable with our modified AddressCache.

After this substitution, when Java-layer network requests such as HttpsURLConnection perform DNS resolution, the process is as follows:

Using this approach, we can perfectly resolve the issue of DNS SDK integration at the Java layer. For the business side, they do not need to perform any URL replacement operations, and the corresponding issues in HTTPS scenarios are also eliminated.
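The reflection-based replacement can be illustrated with the self-contained sketch below. Every class in it is a hypothetical stand-in written for this example: `FakeInetAddress` and `AddressCache` only imitate the general shape of a class that holds a static cache, and they are not Android's real internals, whose field names, types, and accessibility vary between platform versions.

```java
import java.lang.reflect.Field;

// Stand-in for the system cache type (hypothetical, not the platform class).
class AddressCache {
    Object get(String hostname) {
        return null; // pretend the system cache has no record
    }
}

// Stand-in for a resolver class that keeps its cache in a static field.
class FakeInetAddress {
    private static AddressCache addressCache = new AddressCache();

    static Object lookup(String hostname) {
        Object cached = addressCache.get(hostname);
        // On a cache miss the real class would fall through to the system resolver.
        return cached != null ? cached : "system-resolved:" + hostname;
    }
}

// Our replacement: same type, but get() is answered by the DNS SDK.
class SdkAddressCache extends AddressCache {
    @Override
    Object get(String hostname) {
        return "sdk-resolved:" + hostname; // placeholder for a lookup in the SDK's cache pool
    }
}

class AddressCacheHookSketch {
    // Swaps the static cache field for our subclass using reflection.
    static void install() throws ReflectiveOperationException {
        Field field = FakeInetAddress.class.getDeclaredField("addressCache");
        field.setAccessible(true);
        field.set(null, new SdkAddressCache()); // null target because the field is static
    }
}
```

After `AddressCacheHookSketch.install()` runs, every `FakeInetAddress.lookup` call is answered by `SdkAddressCache` without the caller changing anything, which is the property that makes the approach non-intrusive. The stand-in field is deliberately non-final; replacing a final static field through reflection is restricted on recent JVMs.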

The integration at the Java layer is resolved, so what about the Native layer?

We know that on the Android platform, modules like WebView and media players handle network connections at the native layer, and do not invoke the InetAddress methods in the Java layer.

Firstly, at the C/C++ layer, we know that DNS resolution is performed using either the `getaddrinfo` or `gethostbyname2` functions.

Additionally, we also know that on Android and other Linux systems, .so files, which are shared object files, use the ELF file format.

From this, consider the following scenario: `a.so` in our app directly calls the `getaddrinfo` function from the system `libc.so`. According to the ELF file specification, the `.rel.plt` table of `a.so` then contains a mapping such as: getaddrinfo ==> 0xFFFFFF.

The mapping relationship in the .rel.plt table indicates the absolute address of the external symbol getaddrinfo in the current memory space when running a.so.

Under normal circumstances, the function flow in a.so reaches getaddrinfo like this:

So, if we manually modify this mapping and replace the memory address of `getaddrinfo` with the address of our own `my_getaddrinfo`, will a.so be redirected to `my_getaddrinfo` at run time?

In reality, it is indeed feasible. We attempted to modify the .rel.plt table of a.so after the SDK starts in order to take over the DNS of a.so.

The modified process flow of a.so is as follows:

By utilizing the aforementioned method, we can seamlessly intercept the DNS process in both the Java layer and the Native layer of the app, thereby enabling the business side to apply the optimization effects of our DNS SDK without any additional modifications.

6. Performance After SDK Launch

In production, the results were good. Because the DNS SDK’s caching strategy raises the local cache hit rate, our mobile products spend less time in the DNS resolution phase of each network request.

Based on real monitoring data, the time for a complete network request is also reduced by approximately 100ms:

With HTTP DNS introduced and LocalDNS upgraded by policy when it misbehaves, our network request success rate has improved, and specific errors such as unknown host show a declining trend.

Because the SDK’s strategy is flexibly configurable, our online monitoring and configuration also let each product strike the best balance between efficiency and cost.