More and more web applications are starting to adopt a Content Delivery Network (CDN). We will share some tips and tricks about how we adjust our web applications to get the most out of the CDN that is serving them.
What are the advantages of using a CDN?
- Your server load will decrease (as more content is served from the CDN cache), which will save you money.
- Content delivery will be faster, since a round-trip to the webserver is not necessary for cached content.
- The availability of your website will increase, because even if your webserver is down, the CDN can serve cached content.
- And last but certainly not least, the security of your web application will increase. Most CDNs offer DDoS protection, and some also protect you against other attacks, such as XSS and SQL injection.
Cache as Much Content as You Can
Generally, we want to cache as much content as possible. After all, when 100% of our content is served from the CDN cache, not a single request would cause load on our servers and our application would be as performant as it can get.
Unfortunately, an offload percentage of 100% is not realistic for applications that serve dynamic content, but it is a nice goal to strive for.
For dynamic content, things are a little trickier. But since our goal is to serve as close to 100% of the requests from the CDN as we can, there are techniques for dynamic content as well.
When it comes to dynamic content, I think most of it falls into two categories. The first category is what I call, somewhat ironically, “dynamic content that is actually static content”. Imagine the homepage of a newspaper site. One might say that the content on that page is highly dynamic. After all, when something happens in the world, an article is written and the homepage will show a link, a summary and an image of that article to tempt people into reading the full article. I argue, however, that even this should be considered static content, because it will not change until something else happens in the world. For a certain period (which may be only a minute), this version of the homepage is what everyone will see. Therefore we can cache it in the CDN. Even if we only cache it for a minute, we’ll significantly reduce the number of requests to our origin servers, because the page is fetched from the origin roughly once per minute instead of once for every visitor. We can even decide to cache it for longer and invalidate it using the aforementioned API call.
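To put a number on “significantly”, here is a back-of-the-envelope sketch in Python. It assumes each CDN edge location refetches the page at most once per TTL and ignores eviction, revalidation and tiered caching; the traffic figures and the number of edge locations are hypothetical.

```python
import math


def origin_requests(total_requests: int, duration_s: int, ttl_s: int, edge_locations: int = 1) -> int:
    """Rough upper bound on origin fetches for a single URL.

    Assumes each CDN edge location refetches the page at most once per TTL
    window; eviction, revalidation and tiered caching are ignored.
    """
    if ttl_s <= 0:
        # Not cacheable: every request goes to the origin.
        return total_requests
    refreshes_per_edge = math.ceil(duration_s / ttl_s)
    return min(total_requests, refreshes_per_edge * edge_locations)


def offload_percentage(total_requests: int, origin: int) -> float:
    """Share of requests answered from the CDN cache, in percent."""
    if total_requests == 0:
        return 0.0
    return 100.0 * (total_requests - origin) / total_requests


# Hypothetical homepage: 100 requests/s for one hour, 60 s TTL, 20 edge locations.
total = 100 * 3600
origin = origin_requests(total, duration_s=3600, ttl_s=60, edge_locations=20)
print(origin, round(offload_percentage(total, origin), 2))
```

With these made-up numbers the script prints 1200 99.67: 1,200 origin requests instead of 360,000, an offload of roughly 99.7%.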
The second category consists of pages where at least part of the content depends on the user who is viewing it. Imagine the homepage of that newspaper site again. At the top of the page, some people will see a “Login” button, while others will see their name. Since the content of the same page differs from user to user, it seems we can no longer treat this page as static content, right?
Not necessarily. The idea is to treat all parts of the page that are the same for all users as static content, and to fetch only the parts that differ from the origin servers. There are two techniques for this:
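- Enrich the page on the frontend: serve the shared page from the CDN and let JavaScript in the browser fetch the user-specific parts (such as the login button or the user’s name) from the origin servers.
- Split the static content and the dynamic content into separate webpages and include the dynamic part using ESI (Edge Side Includes). This technique is explained in more detail in our blog post on improving caching with ESI tags.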
The same techniques can be used on pages that are highly user-specific, for example a profile page where the user can see and change their own personal information. On one side of the page we might show a list of the most recent articles. While the page itself is completely user-specific, we can treat the list of recent articles as static content using the techniques mentioned above (enrichment on the frontend and ESI).
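To make the ESI idea concrete, the following Python sketch simulates what an ESI-capable CDN edge does with a cached page: it replaces each esi:include tag with a fragment fetched from the origin. The page, the /fragments/user-box path and the fetch_from_origin function are hypothetical, and real ESI processors support much more (alternate sources, error handling, fragment caching) than this simplified version.

```python
import re
from typing import Callable, Optional

# Matches the simple form <esi:include src="..." />; real ESI also supports
# attributes such as alt and onerror, which this sketch ignores.
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def assemble(page: str, fetch_fragment: Callable[[str], str]) -> str:
    """Replace every ESI include tag with the fragment fetched for its src."""
    return ESI_INCLUDE.sub(lambda m: fetch_fragment(m.group(1)), page)


# Hypothetical cached page: identical for every visitor, so the CDN can cache it.
CACHED_HOMEPAGE = (
    '<header><esi:include src="/fragments/user-box" /></header>'
    "<main>Top stories</main>"
)


def fetch_from_origin(path: str, user: Optional[str]) -> str:
    """Hypothetical origin endpoint for the user-specific fragment (never cached)."""
    if path == "/fragments/user-box":
        return f"Hello, {user}" if user else '<a href="/login">Login</a>'
    raise KeyError(path)


print(assemble(CACHED_HOMEPAGE, lambda p: fetch_from_origin(p, "Alice")))
print(assemble(CACHED_HOMEPAGE, lambda p: fetch_from_origin(p, None)))
```

Only the small user-box fragment hits the origin for every visitor; the rest of the page comes from the CDN cache.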
Bust Your Cache When Necessary
The trick is that every time we deploy, we serve the new static files under another URL (a different path, a different filename, or even a different query parameter). In most of our applications, we include a build ID in the path of our static content that uniquely identifies the deploy. After each deploy, every static file is fetched from our origin once and then cached indefinitely in our CDN, and browsers never receive stale files, because each deploy references new URLs.
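A minimal sketch of the build-ID approach in Python, assuming static files live under a hypothetical /static/ prefix and the build ID is something like the commit hash of the deploy. Because every URL changes with each deploy, the files can be sent with a long-lived Cache-Control header; the one-year max-age with immutable shown here is a common convention, not a requirement.

```python
# One year, a common ceiling for "cache forever"; immutable tells browsers
# not to revalidate the file while it is fresh (RFC 8246).
STATIC_CACHE_CONTROL = "public, max-age=31536000, immutable"


def static_url(build_id: str, asset_path: str) -> str:
    """Put the build ID in the path so every deploy gets fresh URLs."""
    return f"/static/{build_id}/{asset_path.lstrip('/')}"


def static_headers() -> dict:
    """Headers for any file served under a build-ID path."""
    return {"Cache-Control": STATIC_CACHE_CONTROL}


# Hypothetical build ID, e.g. the commit hash of the deploy.
print(static_url("3f9c2ab", "/css/site.css"))  # /static/3f9c2ab/css/site.css
```

The HTML that references these URLs is what changes on each deploy, so it should be cached for a much shorter time (or invalidated) than the static files themselves.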
Cache-Control or Edge-Control?
In most CDNs you can control how long content is cached with HTTP headers such as Cache-Control. Some CDNs also support the Edge-Control header. For example, this header caches the page for 60 seconds:
Edge-Control: !no-store, cache-maxage=60s
A special feature of this header is that the CDN uses it (it takes precedence over other headers that control caching), but does not pass it on to the end user. This is an important distinction that we often see overlooked. If you send a Cache-Control header, the CDN will cache the page if it is configured to do so, and the end user’s browser will cache it as well. If you use an Edge-Control header instead, the browser never sees it and therefore will not cache the page based on it.
If you want to be able to invalidate certain pages, you should not rely on the browser cache: you can purge a page from the CDN, but you cannot purge it from your visitors’ browsers.
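One way to combine the two headers, sketched in Python under the assumption that your CDN supports Edge-Control and gives it precedence as described above: the CDN caches the page for a configurable time, while Cache-Control: no-cache lets browsers store the page but forces them to revalidate it before every reuse. The page_headers function is hypothetical, and the exact Edge-Control syntax differs per CDN, so check your provider's documentation.

```python
def page_headers(edge_ttl_s: int) -> dict:
    """Response headers for a page that the CDN caches but browsers must revalidate."""
    return {
        # Used by CDNs that support Edge-Control and not passed on to the
        # browser; the syntax mirrors the example above and is CDN-specific.
        "Edge-Control": f"!no-store, cache-maxage={edge_ttl_s}s",
        # Reaches the browser: it may store the page but must check before
        # reusing it, so after a CDN purge users get the new page on their
        # next request.
        "Cache-Control": "no-cache",
    }


print(page_headers(60))
```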
Don’t Forget the ETag & Last-Modified Headers
Your application can return an ETag header, a Last-Modified header, or both. The first contains a unique identifier of the current version of the content; the second contains the date and time the content was last modified.
When the CDN or the user’s browser holds a cached copy that may be outdated (for example, because it has expired), it can send a conditional request to the webserver: an If-None-Match header carrying the stored ETag, or an If-Modified-Since header carrying the stored Last-Modified date. If the content has not changed, the server answers with 304 Not Modified and no body, which is much cheaper than sending the whole page again, and the cached copy can keep being served. Otherwise the server returns the new version of the page with a regular 200 response.
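The following Python sketch shows how an origin could answer such conditional requests. It is framework-agnostic and hypothetical: respond takes the page body, its modification time and the request headers, and returns a status code, headers and body. Hashing the body into an ETag is one common choice; any identifier that changes whenever the content changes works as well.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def make_etag(body: bytes) -> str:
    """Strong ETag derived from a hash of the response body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, last_modified: datetime, request_headers: dict) -> tuple:
    """Return (status, headers, body), answering conditional requests with 304.

    last_modified must be a timezone-aware UTC datetime.
    """
    etag = make_etag(body)
    headers = {
        "ETag": etag,
        "Last-Modified": format_datetime(last_modified, usegmt=True),
    }

    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # When both are sent, If-None-Match wins over If-Modified-Since.
        # Weak comparison: ignore a W/ prefix that a CDN may have added.
        tags = [t.strip().removeprefix("W/") for t in if_none_match.split(",")]
        if etag in tags or "*" in tags:
            return 304, headers, b""
        return 200, headers, body

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable date: ignore the header
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            # HTTP dates have one-second resolution.
            if last_modified.replace(microsecond=0) <= since:
                return 304, headers, b""

    return 200, headers, body


page = b"<html>Top stories</html>"
modified = datetime(2024, 1, 1, tzinfo=timezone.utc)
status, headers, _ = respond(page, modified, {})
status_again, _, body_again = respond(page, modified, {"If-None-Match": headers["ETag"]})
print(status, status_again, len(body_again))  # 200 304 0
```

Hashing the rendered body saves bandwidth but not rendering time; if building the page is expensive, an ETag based on a version number from your data store avoids rendering it just to answer a 304.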
Hopefully you now understand that a Content Delivery Network can be an awesome tool for any web application.
A CDN makes our work a lot easier, and all web applications should be served using one. This post contains many tips and tricks that will help you get the most out of your CDN without getting into trouble.