How to Troubleshoot a DNS Zone Transfer Issue


In this scenario, a company has a central headquarters and newly deployed remote branch offices. Most of the company’s IT infrastructure sits in the central office and runs on a Windows Server-based domain, and a secondary domain controller serves the branch office. That domain controller handles DNS and authentication requests for users at the branch office. It acts as a secondary DNS server and should receive its resource record information from the upstream DNS servers at the corporate headquarters.

The deployment team is rolling out the new infrastructure to the branch office when it finds that no one can access the intranet web application servers on the network. These servers are located at the main office and are accessed through the wide area network (WAN). This issue affects all users at the branch office, and is limited to just these internal servers. All users can access the Internet and other resources within the branch.

Figure 8-33 shows the components to consider in this scenario, which involves multiple sites.

Figure 8-33: The relevant components for the stranded branch office issue

Tapping into the Wire

Because the problem lies in communication between the main and branch offices, there are a couple of places we could collect data to start tracking down the problem. The problem could be with the clients inside the branch office, so we’ll start by port mirroring one of those computers to check what it sees on the wire. Once we’ve collected that information, we can use it to point toward other collection locations that might help solve the problem. The initial capture file obtained from one of the clients is stranded_clientside.pcap.

Analysis

As shown in Figure 8-34, our first capture file begins when the user at the workstation address 172.16.16.101 attempts to access an application hosted on the headquarters app server, 172.16.16.200. This capture contains only two packets. The first packet is a DNS query sent to 172.16.16.251 for the A record for appserver, which is the DNS name of the server at 172.16.16.200 in the central office.

As you can see in Figure 8-35, the response to this packet is a server failure, which indicates that something is preventing the DNS query from completing successfully. Because the response is an error, it contains no answer to the query.

We now know that the communication problem is related to some DNS issue. Because the DNS queries at the branch office are resolved by the DNS server at 172.16.16.251, that’s our next stop.
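To make the server failure concrete, here is a minimal sketch of how the response code can be read out of the fixed 12-byte DNS header. The sample bytes are hand-built for illustration (the transaction ID 0x1234 is an assumption, not a value from the capture file), and the function name is hypothetical.

```python
import struct

# Common DNS response codes (the low 4 bits of the header flags field).
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
          3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}

def parse_dns_header(message: bytes) -> dict:
    """Decode the fixed 12-byte header at the start of a DNS message."""
    if len(message) < 12:
        raise ValueError("a DNS message needs at least a 12-byte header")
    txid, flags, qdcount, ancount, nscount, arcount = struct.unpack(
        "!6H", message[:12])
    rcode = flags & 0x000F
    return {
        "id": txid,
        "is_response": bool(flags & 0x8000),   # QR bit
        "rcode": RCODES.get(rcode, str(rcode)),
        "questions": qdcount,
        "answers": ancount,
    }

# Hand-built example header: a response (QR=1) with RD and RA set,
# RCODE 2 (server failure), one question, and zero answers.
sample_header = struct.pack("!6H", 0x1234, 0x8182, 1, 0, 0, 0)
summary = parse_dns_header(sample_header)
print(summary["rcode"], summary["answers"])
```

A response with an error code such as SERVFAIL and an answer count of zero matches what the second packet in the client-side capture shows.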

Figure 8-34: Communication begins with a DNS query for the appserver A record.

Figure 8-35: The query response indicates a problem upstream.

In order to capture the appropriate traffic from the branch DNS server, we’ll leave our sniffer in place and simply change the port-mirroring assignment so that the server’s traffic, rather than the workstation’s traffic, is now mirrored to our sniffer. The result is the file stranded_branchdns.pcap.

As shown in Figure 8-36, this capture begins with the query and response we saw earlier, along with one additional packet. This additional packet looks a bit odd because it is attempting to communicate with the primary DNS server at the central office (172.16.16.250) on the standard DNS server port 53, but it is not the UDP traffic we’re used to seeing.

In order to figure out the purpose of this packet, recall our discussion of DNS in Chapter 7. DNS usually uses UDP, but it uses TCP when the response to a query exceeds a certain size. In that case, we’ll see some initial UDP traffic that triggers the TCP traffic. TCP is also used for DNS during a zone transfer, when resource records are transferred between DNS servers, which is likely the case here.
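As a rough illustration of why a zone transfer shows up as TCP traffic, the sketch below builds a zone transfer (AXFR, query type 252) request the way it is framed on a TCP connection: the DNS message is preceded by a two-byte length field. The zone name example.internal and the transaction ID are assumptions made for the example, since the scenario does not name the zone, and the helper names are hypothetical.

```python
import struct

QTYPE_AXFR = 252   # query type for a full zone transfer
QCLASS_IN = 1      # Internet class

def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    encoded = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        encoded += bytes([len(raw)]) + raw
    return encoded + b"\x00"

def build_axfr_query(zone: str, txid: int = 0x1234) -> bytes:
    """Return an AXFR query framed for TCP (2-byte length + DNS message)."""
    # Header: ID, flags (standard query), 1 question, no other records.
    header = struct.pack("!6H", txid, 0x0000, 1, 0, 0, 0)
    question = encode_name(zone) + struct.pack("!2H", QTYPE_AXFR, QCLASS_IN)
    message = header + question
    # DNS over TCP prefixes every message with its length in bytes.
    return struct.pack("!H", len(message)) + message

framed = build_axfr_query("example.internal")  # hypothetical zone name
print(len(framed), framed.hex())
```

These bytes can only be sent once the TCP handshake completes, which is why the first sign of a zone transfer attempt in a capture is a SYN packet to port 53.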

Figure 8-36: This SYN packet uses port 53 but is not UDP.

The DNS server at the branch office is a secondary (slave) to the DNS server at the central office, meaning that it relies on the central office server for its resource records. The application server that users in the branch office are trying to access is located inside the central office, which means that the central office DNS server is authoritative for that server. In order for the branch office server to be able to resolve a DNS request for the application server, the DNS resource record for that server must be transferred from the central office DNS server to the branch office DNS server. This is likely the source of the SYN packet in this capture file.

The lack of response to this SYN packet tells us that the DNS problem here is the result of a failed zone transfer between the branch and central office DNS servers. Now we can go one step further by figuring out why the zone transfer is failing. The possible culprits for the issue can be narrowed down to the routers between the offices or the central office DNS server itself. In order to figure this out, we can sniff the traffic of the central office DNS server to see if the SYN packet is making it to the server.
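The reasoning about the unanswered SYN can be sketched in code. The example below takes a simplified list of TCP packet summaries (not a real capture parser) and reports connection attempts that never received a SYN/ACK or a RST. The TcpPacket type, the function name, and the source port 49152 are assumptions for illustration; only the two server addresses and port 53 come from the scenario.

```python
from typing import NamedTuple

class TcpPacket(NamedTuple):
    src: str
    dst: str
    sport: int
    dport: int
    flags: frozenset   # for example frozenset({"SYN"})

def unanswered_syns(packets):
    """Return SYN packets that never got a SYN/ACK or RST in reply."""
    pending = {}
    for pkt in packets:
        if pkt.flags == {"SYN"}:
            # Remember the attempt; retransmitted SYNs share the same key.
            pending[(pkt.src, pkt.sport, pkt.dst, pkt.dport)] = pkt
        elif {"SYN", "ACK"} <= pkt.flags or "RST" in pkt.flags:
            # A reply travels in the opposite direction, so swap endpoints.
            pending.pop((pkt.dst, pkt.dport, pkt.src, pkt.sport), None)
    return list(pending.values())

# Simplified stand-in for the branch DNS server capture (port is hypothetical).
capture = [
    TcpPacket("172.16.16.251", "172.16.16.250", 49152, 53, frozenset({"SYN"})),
]
for pkt in unanswered_syns(capture):
    print(f"no reply to SYN from {pkt.src} to {pkt.dst}:{pkt.dport}")
```

An attempt that stays in this list means either the SYN never arrived or the reply never made it back, which is the question the next capture point is meant to answer.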

I have not included a capture file for the central office DNS server traffic because there was none. The SYN packet never reached the server. When technicians were dispatched to review the configuration of the routers connecting the two offices, they found that the central office router was configured to allow inbound UDP traffic on port 53 but block inbound TCP traffic on port 53. This simple misconfiguration prevented zone transfers from occurring between servers, which prevented clients within the branch office from resolving queries for devices in the central office.
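A quick way to check for this kind of filtering from the branch side is to attempt a TCP connection to port 53 on the primary DNS server. The following is a minimal sketch, not a full diagnostic tool: the function name and the result labels are hypothetical, and the address in the usage note is the central office DNS server from this scenario.

```python
import socket

def probe_tcp(host: str, port: int = 53, timeout: float = 3.0) -> str:
    """Attempt a TCP handshake and describe the outcome in one word or two."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"            # handshake completed
    except ConnectionRefusedError:
        return "refused"             # host answered with a RST
    except (socket.timeout, TimeoutError):
        return "no response"         # SYN was likely dropped along the path
    except OSError:
        return "error"               # unreachable network, bad address, etc.
```

Calling `probe_tcp("172.16.16.250")` from the branch DNS server would be expected to return "no response" while the router drops inbound TCP traffic on port 53, and "open" once the rule is corrected, assuming the DNS service is listening on TCP.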

Lessons Learned

You can learn a lot about investigating network communications issues by watching crime dramas. When a crime occurs, the detectives begin by interviewing those most affected. Leads that result from that examination are pursued, and the process continues until a culprit is found.

In this scenario, we began by examining the victim (the workstation) and established leads by finding the DNS communication issue. Our leads led us to the branch DNS server, then to the central DNS server, and finally to the router, which was the source of the problem.

When performing analysis, try thinking of packets as clues. The clues don’t always tell you who committed the crime, but they often take you to the culprit eventually.
