Mastering AWS CloudFront: A Guide to Configuration, CDN Principles, and Performance Analysis

As more development work moves onto cloud products, knowing how to use them and understanding how they work underneath has become an essential part of the job. This series walks through the basic configuration of AWS CloudFront and analyzes how it behaves in use.

Too Long; Didn’t Read

  • What is CloudFront
  • CDN principles and the problems it solves
  • CloudFront basic configuration process
  • tcpdump packet capture and analysis
  • Conclusion

01/What is CloudFront

Here is an excerpt from the official website. CloudFront is a web service that speeds up the distribution of your static and dynamic web content, such as HTML, CSS, JS, and image files, to users. It delivers your content through a global network of data centers, referred to as edge locations.

When a user requests content that you serve with CloudFront, the request is routed to the edge location that provides the lowest latency (time delay) so that content is delivered with the best performance.

Read literally, the name CloudFront means “cloud front end”; in practice, it is AWS’s CDN (content delivery network) product.

02/CDN Principles and Solutions

Principles
The diagram below (from an external reference) illustrates how a CDN serves a request:

[Figure: CDN request flow diagram]

The flow shown in the diagram is:

  • User Initiation: The user requests the resource via http://www.test.com/1.jpg.
  • DNS Resolution: The browser asks its DNS resolver to resolve www.test.com. The resolver finds a CNAME record pointing to the CDN’s scheduling domain and continues the recursive lookup against the CDN’s scheduling system.
  • Scheduling Decision: The scheduling service returns the IP of the edge location best suited to serve this user.
  • Cache Check: The browser requests 1.jpg from that IP. The request reaches the CDN edge location, which checks whether the content is in its cache.
  • Cache Hit: If the content is in the cache, the edge location directly returns it to the user.
  • Cache Miss: If the content is not in the cache, the edge location requests it from the origin server (possibly going through multiple intermediary nodes to reduce origin server load).
  • Cache and Transfer Content: The edge location stores the content from the origin server in the cache and delivers it to the user.
  • Browser Rendering: Upon receiving content, the user’s device’s browser begins rendering the page.
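
The cache hit and cache miss branches above can be sketched as a toy shell model. This is only an illustration under stated assumptions, not how CloudFront is implemented: two temporary directories stand in for the origin server and a single edge cache, and the function name edge_get is hypothetical.

```shell
#!/bin/sh
# Toy model of a single CDN edge location (illustrative only).
# ORIGIN_DIR stands in for the origin server, CACHE_DIR for the edge cache.
ORIGIN_DIR=$(mktemp -d)
CACHE_DIR=$(mktemp -d)
echo "hello from origin" > "$ORIGIN_DIR/1.txt"

# edge_get NAME: print HIT or MISS on the first line, then the object body.
# Prints NOT_FOUND and returns 1 if the origin does not have the object.
edge_get() {
    name=$1
    if [ -f "$CACHE_DIR/$name" ]; then
        echo "HIT"                                   # served from the edge cache
    elif [ -f "$ORIGIN_DIR/$name" ]; then
        cp "$ORIGIN_DIR/$name" "$CACHE_DIR/$name"    # fetch from origin, store in cache
        echo "MISS"
    else
        echo "NOT_FOUND"
        return 1
    fi
    cat "$CACHE_DIR/$name"
}
```

Calling edge_get 1.txt twice in a row reports MISS for the first request and HIT for the second, because the first call populates the cache. Real CDNs also deal with expiry, cache keys, and intermediate cache tiers, which this sketch leaves out.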

Problems Solved
In summary, a CDN aims to lower the service provider’s costs (both resource and operational) and to improve the user experience. It does this through a large network of edge nodes close to users, which respond to requests quickly and cache resources. On top of that, it offers value-added capabilities such as access control, edge computing, and security.

Cost Example:

For example, at 100 GB of traffic per month, the cost after discounts is approximately 96.89 RMB, which is usually cheaper than paying for bandwidth directly. It also costs less than provisioning and operating your own servers and bandwidth without a CDN, and onboarding requires only some configuration.
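
As a quick sanity check on the figure above, the effective unit price can be derived from the two numbers given. This sketch assumes the traffic figure means 100 GB of monthly data transfer; the function name per_gb is hypothetical.

```shell
#!/bin/sh
# per_gb TOTAL_COST TRAFFIC_GB: print the effective cost per GB (4 decimal places).
per_gb() {
    awk -v total="$1" -v gb="$2" 'BEGIN { printf "%.4f\n", total / gb }'
}

per_gb 96.89 100   # effective RMB per GB for the example above
```

This prints 0.9689, a little under 1 RMB per GB, which can then be compared with the per-GB cost of serving the same traffic from your own bandwidth.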

03/CloudFront Basic Configuration Process

Create Distribution

Origin Configuration

Cache Configuration

Functions, WAF, Alternate Domain Name, etc. (left at their defaults, not configured)

Once the configuration is complete, CloudFront assigns the domain d37z7ecg72nt7t.cloudfront.net. This chapter uses that domain directly; configuring a custom domain will be covered in later chapters.

04/tcpdump Packet Capture and Analysis

Log into the server hosting the origin, sg.lukachen.work, and capture packets to a file named test.pcap. The command below captures all incoming and outgoing packets on the network interface; filtering and stream reassembly are done afterwards in Wireshark. Note: packet capture can consume substantial CPU and disk resources. On a production server, run it during low-load periods, or only after evaluating the impact and setting reasonable filter parameters.

tcpdump -i eth0 -w test.pcap
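
One way to apply the “reasonable filter parameters” mentioned above is to restrict the capture to origin traffic and cap the number of packets. A minimal sketch, assuming CloudFront reaches the origin over plain HTTP on port 80 (an assumption, since the origin protocol is not stated above); the helper name capture_http and the DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# capture_http [IFACE] [OUTFILE] [MAX_PACKETS]
# Capture only TCP port 80 traffic and stop after MAX_PACKETS packets,
# which bounds CPU and disk usage compared with an unfiltered capture.
# Assumption: the origin serves CloudFront over plain HTTP on port 80.
capture_http() {
    iface=${1:-eth0}
    outfile=${2:-test.pcap}
    max_packets=${3:-10000}
    # -c: stop after N packets; -w: write raw packets to a file
    set -- tcpdump -i "$iface" -c "$max_packets" -w "$outfile" tcp port 80
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"        # only show the command that would run
    else
        "$@"             # run the capture (usually needs root)
    fi
}
```

For example, DRY_RUN=1 capture_http eth0 test.pcap 5000 prints the tcpdump command without running it; without DRY_RUN the capture starts and ends on its own after 5000 packets.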

Access the resource from a local browser (or with curl). The response is a 404, so something is wrong and the cause needs troubleshooting.

curl https://d37z7ecg72nt7t.cloudfront.net/1.txt
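
When troubleshooting, the response headers are often more informative than the body. CloudFront adds an X-Cache header to its responses (for example “Hit from cloudfront” or “Miss from cloudfront”), so a small filter can pull it out of curl’s header dump. A sketch, in which the helper name cache_status is hypothetical:

```shell
#!/bin/sh
# cache_status: read HTTP response headers on stdin and print the X-Cache value.
# Header names are compared case-insensitively because HTTP/2 responses
# show them in lowercase.
cache_status() {
    tr -d '\r' | awk -F': ' 'tolower($1) == "x-cache" { print $2 }'
}

# Usage (needs network access):
#   curl -s -o /dev/null -D - https://d37z7ecg72nt7t.cloudfront.net/1.txt | cache_status
```

Error responses such as the 404 seen here are typically reported as “Error from cloudfront”, which suggests that the error came back from the origin side and was not served from a cached copy.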

Back on the server, stop the packet capture and download the capture file to the local machine for analysis in Wireshark. (sz sends the file over ZMODEM, so the terminal client must support it.)

sz test.pcap -y

In Wireshark, locate the relevant packets by filtering on the keyword 1.txt, then use Follow TCP Stream to reassemble the conversation.

In the reassembled stream, inspect the request headers. The cause turns out to be that the Host value in the request header is not one my server is configured to serve.
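
The same check can also be done directly on the server by printing the capture as ASCII and picking out the Host headers. A sketch, assuming the origin traffic is unencrypted HTTP (otherwise the headers are not readable in the capture); extract_hosts is a hypothetical helper name.

```shell
#!/bin/sh
# extract_hosts: read ASCII packet dumps or raw HTTP text on stdin and
# print each distinct Host request-header value once.
extract_hosts() {
    tr -d '\r' | awk 'tolower($1) == "host:" { print $2 }' | sort -u
}

# Usage on the capture file from this chapter:
#   tcpdump -nn -A -r test.pcap 'tcp port 80' | extract_hosts
```

Any value printed here that does not appear in the web server’s site configuration is a candidate for the mismatch described above.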

Add that Host to the nginx configuration and reload nginx.

Request again, and this time it succeeds!

Repeat the packet capture steps above and examine the related header information again.

05/Conclusion

That wraps up this opening chapter, which also covered capturing packets on the server with tcpdump and analyzing the reassembled streams in Wireshark.

A CDN is a crucial component of modern internet infrastructure. For large websites and services that distribute content globally, it is key to improving performance and user satisfaction.

In this chapter, we explored the basic concepts, principles, and basic configuration of CloudFront as a CDN. In subsequent chapters, we will look at how each configuration item is used, again with packet capture analysis, and use test cases to explore optimizations for different business needs.