1. Introduction to HttpDNS
For the internet, the domain name is the first hop in accessing content, and often this hop can “misstep” (especially on mobile networks), leading to errors in accessing content, failed connections, and more, causing the smooth experience of surfing the web to vanish.
Major Chinese internet companies, Tencent among them, have long studied how to make this critical first hop reliable. This article summarizes the Tencent team’s technical practice in this area in depth, in the hope that it offers some inspiration.
Study Exchange:
– Instant Messaging/Push Technology Development Discussion Group 4: 101279154 [Recommended]
– Introductory article on mobile IM development: 《Beginner’s Guide: Developing a Mobile IM from Scratch》
(This post is also published at: http://www.52im.net/thread-2121-1-1.html)
2. Articles Related to HttpDNS
《Network Programming for Lazy Beginners (Part 1): Quickly Understanding Network Communication Protocols (Part One)》
《Network Programming for Beginners (Part Two): Quickly Understanding Network Communication Protocols (Conclusion)》
《Introduction to Network Programming for Beginners (Part Six): The Most Straightforward Explanation of Hub, Switch, and Router Functionality and Principles》
《Network Programming for Beginners (Part Seven): A Thorough yet Straightforward Understanding of the HTTP Protocol》
《The Lazy Guide to Network Programming (Part Nine): In Simple Terms, Why Do We Use MAC Addresses When We Already Have IP Addresses?》
《Technical Primer: An In-depth Explanation of the New Generation Low-Latency Network Transport Layer Protocol Based on UDP—QUIC》
《Optimization Techniques for Modern Mobile Network Short Connections: Request Speed, Weak Network Adaptability, Security Assurance》
《Mobile IM Developers Must Read (Part One): Easy-to-Understand Guide to Comprehending the “Weak” and “Slow” Nature of Mobile Networks》
《Mobile IM Developer Must-Read (Part Two): The Most Comprehensive Summary of Mobile Weak Network Optimization Methods in History》
《Intro to Mindless Web Programming (Part Five): The Ping Command We Use Every Day – What Exactly Is It?》
《Brain-Dead Introduction to Network Programming (Part Six): What Are Public and Private IPs? What on Earth Is NAT Translation?》
3. Main Text Overview
Any internet enterprise using domain names to deliver services to users will more or less encounter issues like domain caching and slow cross-network access in the unique Chinese internet environment. So, for an internet company like Tencent, which has domain names in the range of 100,000, just how serious is the issue of domain resolution anomalies?
Every day, Tencent’s distributed Domain Name System (DNS) monitoring system continuously probes all major LocalDNS services (referring to ISPs’ DNS services) across the country. The daily DNS resolution anomalies for Tencent domains nationwide have exceeded 800,000 records (with mobile-originated anomalies being especially prominent in this aspect). This has caused significant losses to Tencent’s business operations. To address this, Tencent has established a specialized team to engage in in-depth communications with various service providers. However, due to multiple factors, the processing efficiency and effectiveness have not met the requirements of Tencent’s various business departments.
Aside from communicating with service providers, is there a technical solution that can fundamentally resolve issues with domain name resolution anomalies and cross-network user access? This is a question that many technical teams in major domestic internet companies, including Tencent, have been contemplating.
4. First, what is DNS?
To understand the various DNS issues that will be discussed in this article, we first need to review the basic principles and related knowledge of DNS.
4.1 How DNS Works
DNS (Domain Name System) is a service used during network requests to translate domain names into IP addresses. This allows users to access the internet more conveniently without having to remember sequences of IP numbers that can be directly read by machines.
The basic principles of DNS are illustrated in the figure below:
Traditional public DNS services based on the UDP protocol are highly susceptible to DNS hijacking, resulting in security issues.
4.2 DNS Domain Name System Structure
As illustrated in the diagram above, the structure of a typical DNS domain name system is as follows:
1) Root Domain: When using DNS domain names, it is specified by a trailing dot to indicate the name is located at the root or a higher level of the domain hierarchy;
2) Top Level Domain (TLD): Used to indicate the type of name used by a specific country, region, or organization, such as .net;
3)Second-Level Domain: The registered name used by individuals or organizations on the Internet. For example, 52im.net;
4) Third-Level Domain: Domains derived from a registered second-level domain, such as docs.52im.net.
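The hierarchy above can be read directly off a fully qualified name by walking its labels from right to left. The following is only a minimal sketch; the function name `domain_levels` is made up for illustration and is not part of any DNS library.

```python
def domain_levels(fqdn: str) -> list[str]:
    """Return the chain of domains from the root down to the full name."""
    labels = [label for label in fqdn.rstrip(".").split(".") if label]
    levels = ["."]  # the root is written as a single trailing dot
    # Walk from the rightmost label (the TLD) to the leftmost one.
    for i in range(len(labels) - 1, -1, -1):
        levels.append(".".join(labels[i:]))
    return levels


# Root, top-level, second-level, and third-level domain, in that order.
print(domain_levels("docs.52im.net."))
```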
4.3 DNS Resolution Process
As depicted in the image above, this is a typical domain name resolution process:
1) The user enters www.52im.net in the browser, which sends a resolution request;
2) The local domain name resolver program queries the local cache and the host file for any mapping relationships for the domain name. If such mappings exist, it uses the corresponding IP address mapping to complete the resolution.
3) If neither the hosts file nor the local resolver cache contains a mapping for the domain, the local resolver sends a recursive query to the preferred DNS server configured in the TCP/IP settings (referred to here as the Local DNS server);
4) When the Local DNS server receives the query, if the requested domain belongs to a zone that the server itself hosts, it returns the result to the client and resolution is complete. This answer is authoritative. If the server does not host the zone but has cached the mapping, it answers from the cache. This answer is non-authoritative.
5) If neither the Local DNS server’s zone files nor its cache can answer, what happens next depends on its configuration (whether forwarding is enabled). Without forwarding, the Local DNS server sends the request to one of the 13 root DNS servers. With forwarding, it passes the request to an upstream DNS server; if that server cannot resolve the name, it either queries the root DNS or passes the request further upstream, and so on.
6) Once the root DNS server receives the request, it determines which entity is authorized to manage the domain name and will return the IP address of a server responsible for that top-level domain.
7) Once the Local DNS server receives this address, it contacts the server responsible for the .net domain;
8) If the server responsible for the .net domain cannot resolve the name itself, it gives the Local DNS server the address of the next-level DNS server, the one that manages the 52im.net domain;
9) When the Local DNS server receives this address, it will look for the 52im.net domain server. Steps 10 and 11 repeat the above actions to perform the query;
10) Finally, the IP address of www.52im.net is returned to the Local DNS server;
11) The local DNS server caches this resolution result (it will also cache the results returned in steps 6, 8, and 10);
12) The Local DNS server also returns the result to the local resolver;
13) The local resolver caches the result;
14) The local resolver returns the results to the browser;
15) The browser initiates a request using the returned IP address.
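The referral chain in steps 5 to 11 can be simulated in a few lines. This is a toy model under stated assumptions, not a real resolver: the class `ToyLocalDNS`, the server names, and the address `192.0.2.10` (a documentation-range placeholder) are all invented for illustration, and no network traffic is involved.

```python
class ToyLocalDNS:
    """In-memory model of a Local DNS server following referrals from the root."""

    def __init__(self, referrals, answers):
        self.referrals = referrals  # server -> {zone: next server to ask}
        self.answers = answers      # server -> {name: IP address}
        self.cache = {}             # results remembered by the Local DNS server
        self.contacted = []         # servers queried, in order

    def resolve(self, name, max_hops=8):
        if name in self.cache:      # cached answer, no upstream query needed
            return self.cache[name]
        server = "root"
        for _ in range(max_hops):
            self.contacted.append(server)
            ip = self.answers.get(server, {}).get(name)
            if ip is not None:
                self.cache[name] = ip  # cache the final result
                return ip
            # No answer here: follow the referral for the closest matching zone.
            server = next(
                (nxt for zone, nxt in self.referrals.get(server, {}).items()
                 if name == zone or name.endswith("." + zone)),
                None,
            )
            if server is None:
                return None
        return None


ldns = ToyLocalDNS(
    referrals={"root": {"net": "net-tld"}, "net-tld": {"52im.net": "52im-auth"}},
    answers={"52im-auth": {"www.52im.net": "192.0.2.10"}},
)
print(ldns.resolve("www.52im.net"), ldns.contacted)
```

A second call to `resolve` for the same name is served from `cache` and contacts no server, which is the caching behavior described in step 11.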
4.4 Recursive and Iterative DNS Queries
Recursive Query: If the local domain name server queried by the host does not know the IP address of the queried domain name, the local domain name server acts as a DNS client itself and continues to send query packets to the root domain name servers and other servers, rather than leaving the host to carry out the next query.
Iterative Query: When the root name server receives an iterative query from the local name server, it either returns the requested IP address or tells the local name server which name server to query next. The local name server then performs the subsequent query itself; the root server does not query on its behalf.
In the resolution process above, the queries from the client to the Local DNS server, and from the Local DNS server to an upstream DNS server, are recursive. The queries from DNS servers to the root DNS servers are iterative.
In actual environments, using a recursive model can result in high traffic for DNS servers, so most DNS systems today operate in iterative mode.
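At the packet level, the difference between the two modes comes down to one bit: a client sets the RD (Recursion Desired) flag in the DNS header when it wants the server to do the full lookup, and leaves it clear for an iterative query. The sketch below builds a standard A-record query with the standard library only; the function name `build_dns_query` and the fixed query ID are assumptions for illustration, and nothing is sent over the network.

```python
import struct


def build_dns_query(name: str, recursion_desired: bool, query_id: int = 0x1234) -> bytes:
    """Build a DNS query message for an A record (header + one question)."""
    flags = 0x0100 if recursion_desired else 0x0000  # RD bit of the flags word
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


recursive = build_dns_query("www.52im.net", recursion_desired=True)
iterative = build_dns_query("www.52im.net", recursion_desired=False)
print(recursive.hex())
print(iterative.hex())
```

The two messages differ only in the flags field; sending such a packet to a chosen server over UDP port 53 is the "standard DNS request" that section 7 describes as costly to maintain on mobile platforms.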
5. Challenges Faced by Mobile Networks in China Regarding Various DNS Issues
In summary, DNS problems show up in three main ways, each of which HttpDNS (introduced later in this article) addresses:
1) LocalDNS hijacking;
2) High average access latency;
3) High user connection failure rate.
LocalDNS hijacking: HttpDNS requests the server’s A record over HTTP from a fixed IP address and never asks the local ISP to resolve the domain, so it avoids hijacking at the root. (For hijacking of HTTP content at the TCP/IP layer, data integrity can be protected with verification factors or data encryption.)
Lower average access latency: Accessing the server directly by IP skips one domain resolution (even when the system has a cached result, this is still slightly faster, on the order of milliseconds), and an algorithm sorts the candidate IPs so that the fastest node is used.
Lower user connection failure rate: The ranking algorithm demotes servers that have had high failure rates, promotes servers that were accessed recently, and promotes servers with a history of successful access. If IP(a) fails, the next attempt returns the next-ranked record, IP(b) or IP(c).
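The ranking idea above can be sketched as a sort over per-server statistics. The article does not give the actual algorithm or its weights, so everything here is an assumption: the `ServerStats` fields, the lexicographic ordering (failure rate first, then recency, then success count), and the placeholder addresses.

```python
from dataclasses import dataclass


@dataclass
class ServerStats:
    ip: str
    successes: int = 0
    failures: int = 0
    last_success: float = 0.0  # epoch seconds of the last successful access

    @property
    def failure_rate(self) -> float:
        total = self.successes + self.failures
        return self.failures / total if total else 0.0


def rank(servers, now):
    """Best candidate first: low failure rate, recent success, many successes."""
    return sorted(
        servers,
        key=lambda s: (s.failure_rate, now - s.last_success, -s.successes),
    )


def next_candidate(ranked, failed_ips):
    """Return the highest-ranked IP that has not failed in this attempt."""
    for server in ranked:
        if server.ip not in failed_ips:
            return server.ip
    return None


servers = [
    ServerStats("192.0.2.1", successes=90, failures=10, last_success=900.0),
    ServerStats("192.0.2.2", successes=50, failures=0, last_success=950.0),
    ServerStats("192.0.2.3", successes=5, failures=5, last_success=990.0),
]
ranked = rank(servers, now=1000.0)
print([s.ip for s in ranked])
print(next_candidate(ranked, failed_ips={ranked[0].ip}))
```

Passing the set of IPs that already failed into `next_candidate` models the fallback from IP(a) to IP(b) or IP(c) on the next attempt.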
So, getting to the root of the problem, why exactly do these issues exist? This is the topic we’ll discuss in the next section.
6. Tracing the Source: What is the Root Cause of Domestic DNS Issues?
First, we need to understand the basic situation of Local DNS for various ISPs within the country.
The problems caused by LocalDNS of domestic operators can be attributed to the following three reasons:
1) Domain name caching;
2) Resolution forwarding;
3) LocalDNS recursive outbound NAT.
Below, let’s analyze each one step by step.
6.1 Domain Caching
The concept of domain caching is easy to understand. It means that the LocalDNS has cached the resolved results of Tencent’s domain and does not initiate recursion to Tencent’s authoritative DNS.
Diagram is as follows:
Why does Local DNS need to cache domain name resolution results? The reasons are as follows:
1) Ensuring that user access traffic is processed domestically: There are significant differences among domestic internet access providers in terms of bandwidth resources, inter-network settlement fees, IDC facility distribution, and the distribution of ICP resources within the network. To ensure access quality for users within the network and reduce cross-network settlements, providers have set up content caching servers within the network. By forcefully directing domain names to the IP addresses of these content caching servers, they effectively achieve the goal of keeping local network traffic entirely local.
2) Ad Injection: Certain LocalDNS may cache the resolution results of some domain names and replace the intended content with advertisements from third-party ad networks.
The type of behavior described above is what we commonly refer to as domain caching. Domain caching can lead to the following access anomalies for users:
A. Caching is only implemented for HTTP services on port 80. If the domain provides services via HTTPS protocol or other ports, users will encounter failures. This is particularly relevant for services like payment processing or games that connect to a server through a specified port.
B. The operational level of cache servers varies, leading to occasional issues where users experience access anomalies due to cache server failures.
6.2 Resolution Forwarding
In addition to domain caching, ISPs’ LocalDNS servers also exhibit resolution forwarding: the ISP does not perform recursive resolution itself but forwards resolution requests to another ISP’s recursive DNS.
The normal LocalDNS recursive resolution process is as follows:
Some small operators, however, forward resolution requests directly to another operator’s recursive LocalDNS in order to save resources:
The direct consequence of this is that the authoritative DNS requests received by Tencent seem to originate from IP addresses of other operators, ultimately resulting in user traffic being directed to the wrong IDC, causing slower access for users.
6.3 LocalDNS Recursive Egress NAT
The concept of LocalDNS recursive egress NAT refers to a situation where an Internet Service Provider’s (ISP’s) LocalDNS performs recursion according to the standard DNS protocol. However, due to multiple network egress points and the configuration of destination NAT routing, there is a chance that the egress IP used by LocalDNS for the final recursive resolution is not an IP address from the local network.
As shown in the figure below:
The direct consequence of this is that the source IP of the domain name resolution requests received by the GSLB DNS becomes the IP of other ISPs, ultimately causing user traffic to be directed to the wrong IDC, resulting in slower user access.
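Both forwarding and egress NAT cause the same failure: the authoritative DNS schedules by the source IP it sees, and that IP belongs to the wrong ISP. The sketch below illustrates this with a hypothetical two-ISP setup; the prefix table, ISP names, and IDC names are invented, and the addresses come from documentation ranges.

```python
import ipaddress

# Hypothetical prefix-to-ISP table; a real GSLB IP database is far larger.
ISP_PREFIXES = {
    "isp_a": ipaddress.ip_network("192.0.2.0/24"),
    "isp_b": ipaddress.ip_network("198.51.100.0/24"),
}
# Hypothetical mapping from ISP to the IDC closest to its users.
IDC_BY_ISP = {"isp_a": "idc-a", "isp_b": "idc-b"}


def isp_of(ip: str):
    """Look up which ISP an address belongs to, or None if unknown."""
    addr = ipaddress.ip_address(ip)
    for isp, network in ISP_PREFIXES.items():
        if addr in network:
            return isp
    return None


def schedule(source_ip: str) -> str:
    """Pick an IDC from the source IP that the authoritative DNS sees."""
    return IDC_BY_ISP.get(isp_of(source_ip), "idc-default")


user_ip = "192.0.2.55"          # the user is on ISP A
resolver_seen = "198.51.100.7"  # but the query arrives from ISP B's resolver
print(schedule(user_ip))        # the IDC the user should get
print(schedule(resolver_seen))  # the IDC the user actually gets
```

Scheduling on the user's own address instead of the resolver's address is exactly what HttpDNS makes possible, as section 8 explains.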
7. It’s essential to address these issues, but conventional solutions have too many problems.
The abnormal domain resolution by the local DNS of the operator has caused significant damage to the user experience when accessing Tencent’s internet services.
So how did we handle these domain name resolution anomalies in the past?
1) Real-time monitoring + pushing through business channels:
This is the approach Tencent’s operations team currently uses. However, pushing carriers to fix problems through administrative channels takes a long time. In addition, big data analysis showed that the top three groups of affected users are all mobile internet users. What technical means are available to solve these problems for them?
2) Bypass automatic DNS assignment by using 114 DNS or Google Public DNS:
This solution appears quite promising. 114 DNS is the largest neutral caching DNS in China, while Google adheres to a “don’t be evil” philosophy, acting as a giant in the internet engineering sphere. Furthermore, Tencent’s authoritative DNS supports the edns-client-subnet feature, which can directly recognize the IP addresses of users resolving Tencent domain names through Google Public DNS, ensuring traffic distribution remains effective.
But here’s the problem:
a. How to construct domain name requests on the client: On a PC client, building a standard DNS request packet is not difficult. On mobile devices, sending a standard DNS request packet to a specified LocalDNS is technically feasible, but staying compatible with the many iOS and Android versions would be costly.
b. High cost of getting users to change configuration: Asking users to change the DNS configuration by hand is somewhat feasible on PCs and on phones connected over WiFi. On mobile data networks, though, the difficulty of getting users to change DNS settings speaks for itself.
3) Completely discard the domain name and build a self-hosted connect center for traffic scheduling:
To take this approach, you first need an accurate IP address database to determine which region and ISP each user belongs to. Then you must define a protocol and build a connect center for scheduling, and finally modify the access layer to support that scheduling. Like the second solution, this is not impossible, but the cost is high, especially for a company on Tencent’s scale.
Given the problems with these traditional solutions, is there a domain-based traffic scheduling system that offers precise control, low cost, and easy configuration? The answer is yes, and the next section describes it.
8. The current mainstream solution: HttpDNS has emerged.
8.1 What is HttpDNS?
HttpDNS uses the HTTP protocol to talk to the DNS server, replacing the traditional UDP-based DNS exchange. This bypasses the ISP’s Local DNS, which effectively prevents domain hijacking and improves resolution efficiency. In addition, because the DNS server sees the real client IP rather than the Local DNS IP, it can accurately determine the client’s geographic location and ISP, which makes scheduling more precise.
The principle of HttpDNS is illustrated in the diagram below:
8.2 The Main Issues Addressed by HttpDNS
Local DNS hijacking: HttpDNS sends an HTTP request straight to an IP address to obtain a server’s A record and never goes through the local ISP’s resolution, so it avoids hijacking at the root.
Lower average access latency: Accessing the server directly by IP skips one domain resolution, and algorithmic sorting picks the fastest node.
Lower user connection failure rate: The ranking algorithm demotes servers that have had high failure rates, promotes servers that were accessed recently, and promotes servers with a record of successful access.
8.3 Tencent’s HttpDNS Approach
After much effort, the GSLB team at Tencent launched HttpDNS, a traffic scheduling solution tailored specifically for mobile clients. It is based on the HTTP protocol and domain name resolution, and it effectively addresses issues with LocalDNS resolution anomalies and inaccurate traffic scheduling.
Detailed introduction is as follows.
The basic principle of Tencent’s HttpDNS:
The principle of HttpDNS is very straightforward, consisting mainly of two steps:
A. The client directly accesses the HttpDNS interface to obtain the optimal IP with the least access delay, as configured in the domain name management system. (For disaster recovery considerations, the alternative method of using the ISP’s LocalDNS to resolve the domain name is still retained.)
B. After obtaining the IP, the client directly sends a business protocol request to this IP. Taking an HTTP request as an example, by specifying the host field in the header, you can send a standard HTTP request to the IP returned by HttpDNS.
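The two steps can be sketched with Python's standard library. This is only an illustration under stated assumptions: `pick_ip` and `build_request` are made-up helper names, the two lookup functions are passed in as plain callables rather than real HttpDNS or LocalDNS clients, and the addresses are placeholders. The request object is built but not sent.

```python
from urllib.request import Request


def pick_ip(host, httpdns_lookup, localdns_lookup):
    """Step A: prefer HttpDNS; keep LocalDNS as the disaster-recovery path."""
    try:
        ips = httpdns_lookup(host)
        if ips:
            return ips[0]
    except OSError:
        pass  # HttpDNS unreachable: fall through to LocalDNS
    return localdns_lookup(host)


def build_request(host, ip, path="/"):
    """Step B: address the request to the IP, name the site in the Host header."""
    return Request(f"http://{ip}{path}", headers={"Host": host})


ip = pick_ip(
    "www.52im.net",
    httpdns_lookup=lambda host: ["192.0.2.10"],  # stand-in for an HttpDNS call
    localdns_lookup=lambda host: "192.0.2.99",   # stand-in for LocalDNS
)
req = build_request("www.52im.net", ip, "/index.html")
print(req.full_url, req.get_header("Host"))
```

Passing `req` to `urllib.request.urlopen` would send it. Note that this sketch covers plain HTTP only; with HTTPS, certificate verification is normally tied to the name in the URL, so connecting by IP needs additional handling that is outside the scope of this example.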
Advantages Brought by HttpDNS:
In principle, HttpDNS merely changes the protocol for domain name resolution from DNS to HTTP, which isn’t complex.
However, this minor change has brought countless benefits:
A. Resolving Domain Name Resolution Anomalies: By bypassing the LocalDNS of the operator, the user’s domain name resolution requests are directly transmitted to Tencent’s HttpDNS server IP via the HTTP protocol. This ensures that users on the client side will not experience issues related to domain name resolution anomalies.
B. Precision in Routing: HttpDNS can directly acquire the user’s IP address. By integrating Tencent’s proprietary IP address database and speed measurement system, it ensures that users are directed to the fastest IDC node for access.
C. Low implementation cost: Integrating HttpDNS into the business requires only minimal modifications at the client-side access layer, with no need for rooting or jailbreaking user mobile phones. Additionally, since the HTTP protocol request construction is very simple, compatibility with various versions of mobile operating systems is not an issue. Moreover, the backend configuration of HttpDNS fully reuses the existing authoritative DNS configuration, keeping the management cost very low. In summary, it solves the issue of domain name resolution anomalies affecting the business with minimal modification costs, while also meeting the need for precise traffic scheduling.
D. Highly Scalable: HttpDNS offers reliable domain name resolution services, allowing businesses to integrate their own scheduling logic with the results returned by HttpDNS to achieve more refined traffic scheduling. For example, it can specify the IP address for client connection requests based on a particular version, or specify the IP address for users of a specific network type.
Of course, you may be asking: once users switch their preferred resolution method to HttpDNS, how is the high availability of HttpDNS itself ensured? And when users on different carriers access the same HttpDNS service IP, how is their access latency kept low?
To ensure high availability and enhance user experience, HttpDNS utilizes the BGP Anycast network by integrating with Tencent’s Public Network Exchange Platform and establishing BGP interconnections with several major operators nationwide. This ensures that users from these operators can quickly access HttpDNS services. Additionally, HttpDNS is deployed across multiple data centers, enabling seamless failover to backup nodes in the event of a node failure, thus ensuring normal user resolution.
Integration Results and Future Prospects:
Tencent’s current HttpDNS solution has been integrated into numerous internal services, covering hundreds of millions of users, and it has been running stably for over a year. The services that have implemented HttpDNS have significantly improved the user access experience.
Take one business that integrated HttpDNS as an example: with HttpDNS alone and no other optimization, average user access latency fell by more than 10% and the access failure rate fell by more than one-fifth, a significant improvement in user experience. Beyond its wide use inside Tencent, the HttpDNS service has also been recognized by industry peers: inspired by Tencent’s DNS work, 114DNS, China’s largest public DNS provider, has also launched an HttpDNS service.
Going forward, the Tencent GSLB team will further promote the HttpDNS service within Tencent and upgrade it based on actual business needs. This may include providing more general, secure, and simple access protocols to further improve the network access experience for users. We hope HttpDNS offers you a straightforward and feasible approach to resolving domain name resolution anomalies and global traffic scheduling failures.
9. As a startup team, how do you transform an app to support HttpDNS?
As a startup team, you might find yourselves lacking the human resources, financial means, and technical strength to address this area, yet your mobile app genuinely faces the various DNS problems mentioned in the text. So, what should we do?
Using Third-Party Cloud Service Provider’s HttpDNS Interface
Some Chinese vendors already offer this resolution service, so a third-party service can be used directly.
The number of third-party providers offering HttpDNS resolution keeps growing, for instance: Aliyun HttpDNS, Tencent Cloud HttpDNS, Huawei Cloud HttpDNS, and others.
Take Alibaba Cloud’s HttpDNS as an example. Its API is quite standardized: send a GET request with the query parameters, and the result comes back as JSON:
http://203.107.1.1/d?host=www.52im.net
When the request is successful, the returned result is as follows:
{
  "host": "www.linkedkeeper.com",
  "ips": [
    "115.238.23.241",
    "115.238.23.251"
  ],
  "ttl": 57
}
On mobile devices, the IP address obtained from HttpDns is used to replace the domain name in the original URL, and then a new URL is created to initiate the HTTP request.
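The parsing and URL rewriting step can be sketched as follows. The response body is the sample shown above; the helper names `parse_httpdns_response` and `replace_host_with_ip` and the example URL are assumptions for illustration, and a production client would also handle an empty `ips` list and error responses.

```python
import json
from urllib.parse import urlsplit, urlunsplit


def parse_httpdns_response(body: str):
    """Extract the host, the candidate IPs, and the TTL from the JSON result."""
    data = json.loads(body)
    return data["host"], data["ips"], data["ttl"]


def replace_host_with_ip(url: str, ip: str):
    """Return the URL with its domain replaced by ip, plus the original host."""
    parts = urlsplit(url)
    netloc = ip if parts.port is None else f"{ip}:{parts.port}"
    new_url = urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
    return new_url, parts.hostname


body = '{"host": "www.linkedkeeper.com", "ips": ["115.238.23.241", "115.238.23.251"], "ttl": 57}'
host, ips, ttl = parse_httpdns_response(body)
new_url, original_host = replace_host_with_ip("http://www.linkedkeeper.com:8080/a?b=1", ips[0])
print(new_url, original_host, ttl)
```

The original host name is returned alongside the new URL because the request still needs it for the Host header, and the TTL tells the client how long the result may be cached.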
GitHub address for HTTPDNSLib: https://github.com/CNSRE/HTTPDNSLib
Implementation of the HttpDNS interaction process principle in HTTPDNSLib:
From the image above, you can see the entire interaction process of the business. The user inputs a URL address into the query module, which then checks if the cache exists. If not, it queries the httpdnsapi interface and returns through the evaluation module. Once the user’s URL request process is complete, the result of this request needs to be fed back to the evaluation module of the lib library, where it will be recorded for quality data assessment.
Detailed Interaction Process of the HttpDns Lib Library:
The detailed flowchart in the image delves further into the working principle of the library. Two vertical lines divide the image into three sections: the left part, the middle part, and the right part.
The left section involves tasks performed by the app’s main thread, the middle section deals with handling library event logic in the app caller’s thread, and the right section involves logic for independently processing events in a new thread.
The process begins at the client-side caller, which passes in a URL to obtain domain information. The query module then looks up the domain record: it first queries the memory cache layer; if nothing is found there, it queries the database; if the database also lacks the data, it asks LocalDNS. As soon as any of these three stages returns data, the process moves on to the next step. If none of them returns data, the process ends and returns null. Next, the evaluation module ranks the data using five plugins and returns the sorted data to the client.
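The lookup order described above (memory cache, then database, then LocalDNS, then ranking) can be sketched like this. It does not reproduce HTTPDNSLib's real classes: the function names are invented, the cache and database are plain dictionaries, and the plugins are modeled as simple scoring functions whose scores are summed, which is an assumption about how the five plugins combine.

```python
def rank_ips(ips, plugins):
    """Sort IPs by the sum of the scores given by each plugin, best first."""
    return sorted(ips, key=lambda ip: sum(plugin(ip) for plugin in plugins), reverse=True)


def query_domain(domain, memory_cache, database, localdns_lookup, plugins):
    """Try each layer in order; rank and return the first non-empty result."""
    for lookup in (memory_cache.get, database.get, localdns_lookup):
        ips = lookup(domain)
        if ips:
            return rank_ips(ips, plugins)
    return None  # no layer had data for this domain


memory_cache = {}
database = {"www.52im.net": ["192.0.2.1", "192.0.2.2"]}
plugins = [lambda ip: 1.0 if ip.endswith(".2") else 0.0]  # hypothetical scorer
print(query_domain("www.52im.net", memory_cache, database, lambda d: None, plugins))
```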
The `lib` module sets a timer to check whether a domain needs updating based on the TTL expiration time. The timer operates on a separate thread, so it doesn’t affect the app’s main thread. The `httpdns` API requests data, first attempting to retrieve it from a configured `httpdns` API interface. If it fails, it will then attempt to get the data from the DNSPod API interface. Should that also fail, data is retrieved directly from the local DNS (which in future will be replaced by sending UDP packets encapsulating DNS protocols to fetch data from public DNS servers like 114DNS). The DNS server address can be set as preferred. Once data is acquired, it enters the speed test module. The latest version of the speed test module can be configured in two ways: one is through an HTTP empty request. This involves two HTTP header interactions, similar to the time it takes for an initial TCP handshake, used to test the fastest link. The other method is the `ping` command (ICMP protocol) to minimize traffic consumption. If some servers block pings, using an empty HTTP speed test is recommended. After testing, the data is inserted into the local cache.
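The refresh logic in this paragraph has two parts: deciding which cached records have outlived their TTL, and trying the data sources in a fixed order until one answers. The sketch below models both; it is not HTTPDNSLib's code. `DomainRecord`, `fetch_with_fallback`, and `refresh_expired` are invented names, the sources are passed in as callables, and for simplicity a refreshed record keeps its old TTL value.

```python
import time
from dataclasses import dataclass


@dataclass
class DomainRecord:
    domain: str
    ips: list
    ttl: int           # lifetime in seconds
    fetched_at: float  # epoch seconds when the record was obtained

    def expired(self, now: float) -> bool:
        return now >= self.fetched_at + self.ttl


def fetch_with_fallback(domain, sources):
    """Try each (name, fetch) source in order and return the first that answers."""
    for name, fetch in sources:
        try:
            ips = fetch(domain)
        except OSError:
            continue  # this source is unavailable, try the next one
        if ips:
            return name, ips
    return None, []


def refresh_expired(cache, sources, now=None):
    """Refresh every expired record in place and return the refreshed domains."""
    now = time.time() if now is None else now
    refreshed = []
    for domain, record in cache.items():
        if record.expired(now):
            _, ips = fetch_with_fallback(domain, sources)
            if ips:
                record.ips, record.fetched_at = ips, now
                refreshed.append(domain)
    return refreshed
```

In an app, `refresh_expired` would be called periodically from a background thread, for example via `threading.Timer`, so that the main thread is never blocked; the speed test step would then run on the refreshed IPs before they are written back to the cache.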
Below is a screenshot of the test demo:
For a detailed introduction to the HttpDns Lib library, please refer to: 《App Domain Name Hijacking and DNS High Availability – Detailed Explanation of the Open Source HttpDNS Solution》.