Well, it's been quite some time since I wrote Part 1 of this series.
Let's revisit the integration from my last post and see where things stand now that some time has passed. This is an overview of 30 days of traffic, taken earlier this month:
As you can see, the site still has bursting traffic patterns with large peaks (not quite as large as the initial launch, but still significant). Offload is still performing well for this customer, and updated content is being delivered to end users within the expected time frames. Let's take a look at the overall response codes the origin is sending to Akamai, to verify how often content is changing.
An obvious split: some 200s and mainly HTTP 304s. This is the graph I would expect for a site like this, but let's dig deeper, filter out static assets such as images, and see how much of the actual HTML content is changing in cache. Remember: a 200/OK from the origin means 'here, Akamai, this content has changed', while a 304 means 'Not Modified'. When we filter out the static assets we see that, actually, the majority of this site's HTML is changing on a daily basis. There are significantly more 200 responses from origin to Akamai than there are HTTP 304/Not Modified responses.
This, together with the offload figures, validates that our configuration is performing as intended. So, as promised in the last post, let's take a look at how a configuration like this is set up. With Akamai's Luna Control Property Manager, it's a lot easier than you might think.
Before beginning, ensure you have provisioned two CP codes in Luna Control so you can easily separate static asset traffic and HTML traffic in reports (let's call them the 'static CP code' and the 'HTML CP code'). Create the basic configuration in your preferred manner using Luna Property Manager or the Property Manager API (how to add an origin or build an initial site configuration is out of scope for this post, but there are plenty of docs and videos in the Akamai Luna Control support section). I recommend making sure things like Enhanced Akamai Protocol, Logging and Akamai RUM are enabled in your default rule. You should also set your CP code in the default rule – this should be the 'HTML CP code'.
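If you drive your configuration through the Property Manager API (PAPI) rather than the UI, the default rule's CP code assignment is simply a behavior in the JSON rule tree. A minimal sketch – the CP code ID and name here are placeholders for your own 'HTML CP code':

```json
{
  "name": "default",
  "behaviors": [
    {
      "name": "cpCode",
      "options": {
        "value": { "id": 12345, "name": "HTML CP code" }
      }
    }
  ],
  "children": []
}
```

The static CP code is then applied by a child rule matching your static asset extensions, overriding this default.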
Here's what my rule tree looks like:
In your "Content-Type" match within the compression rule, don't forget to add additional MIME types such as "application/xml\*" and "application/json\*" – many WordPress plugins (and some core functionality) use these MIME types, so we should compress them for performance. Throughout this post, keep in mind that child rules inherit behaviors from their parent rules.
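In PAPI terms, the compression rule's Content-Type match and gzip behavior look roughly like the fragment below. Wildcard matching is enabled so that "application/json\*" also catches variants carrying a charset parameter; treat this as a sketch rather than a complete rule:

```json
{
  "name": "Compression",
  "criteria": [
    {
      "name": "contentType",
      "options": {
        "matchOperator": "IS_ONE_OF",
        "matchWildcard": true,
        "matchCaseSensitive": false,
        "values": [
          "text/html*",
          "text/css*",
          "application/x-javascript*",
          "application/xml*",
          "application/json*"
        ]
      }
    }
  ],
  "behaviors": [
    { "name": "gzipResponse", "options": { "behavior": "ALWAYS" } }
  ]
}
```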
In the "Caching" parent rule, turn on Tiered Distribution using a tiered distribution map that matches your target audience's location – in this example our audience is spread across the globe, so we select the global "ch2" map. Also switch on "Cache HTTP Error Responses" for a reasonable amount of time (10 to 30 seconds is fine). Counter to the rule's name, set the caching behavior to "no-store" in this section. As a caching strategy, I generally recommend caching nothing by default and building rules on top of that to cache specific items – it avoids all kinds of problems, especially when users are logging in or commenting (the exception is if you manage your own caching strategy at origin using Cache-Control or Expires headers). Here are the behaviors I apply in the parent rule called "Caching":
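Expressed as a PAPI rule fragment, the three behaviors described above (a no-store default, tiered distribution on the global ch2 map, and short-lived error caching) would look something like this sketch – the 30s error TTL is just the upper end of the range suggested above:

```json
{
  "name": "Caching",
  "behaviors": [
    { "name": "caching", "options": { "behavior": "NO_STORE" } },
    {
      "name": "tieredDistribution",
      "options": { "enabled": true, "tieredDistributionMap": "CH2" }
    },
    {
      "name": "cacheError",
      "options": { "enabled": true, "ttl": "30s", "preserveStale": true }
    }
  ]
}
```

Child rules beneath this one then re-enable caching for the specific content types you actually want cached.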
You will notice there is also an "Advanced" box containing the XML tag "enforce-single-refresh". This functionality ensures refreshed content is requested only once from the origin per edge/parent server, irrespective of how many end users are asking for that content. Those of you familiar with this tag may wonder why it is needed when – per my previous post – we have asynchronous refresh/cache prefresh functionality elsewhere in the configuration. Surely asynchronous refresh/cache prefresh negates the need for "enforce-single-refresh", which only applies to synchronous refreshes?
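In the rule tree, that "Advanced" box is represented by the `advanced` behavior, whose `xml` option carries raw metadata. A sketch of the shape is below – the tag name comes from the screenshot above, but the exact metadata namespace and syntax are Akamai-internal (my assumption here), and only Akamai PS can add or edit this behavior on your behalf:

```json
{
  "name": "advanced",
  "options": {
    "description": "Single refresh from origin per edge/parent",
    "xml": "<cache:enforce-single-refresh>on</cache:enforce-single-refresh>"
  }
}
```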
The answer is that generally you would not need it (and the risk of using it is the magnification of origin errors I discussed in Part 1 of this series). However, when we are talking about high-volume, bursting traffic patterns, there is an additional factor to take into account. The Akamai network shares requests across (at last count) nearly 200,000 edge servers using DNS balancing. It goes without saying that we try to ensure users hit cached content where possible, and we employ internal mechanisms to make this happen. But in bursting, high-volume scenarios we may need to pull in additional edge servers to cope with the demand for your site at that time. It's those additional edge servers that "might be used" – as opposed to edge servers that are "highly likely to be used" – where this tag comes into play. Imagine there is a burst of traffic and those "might be used" servers are pulled in to handle the additional end-user requests. Those servers request the assets and cache them. Then the bursting traffic dies down, so the "might be used" servers no longer have end-user traffic routed to them. Content on those servers will eventually expire, as they have had no recent requests for it. We only refresh expired content on servers that are actively being used by your end users and, importantly, those same end-user requests are also what trigger the asynchronous prefresh mechanism. In this scenario, the content could expire from – but may still remain in – cache on those "might be used" machines.
Now imagine there is another burst of users and we start routing traffic back to those "might be used" machines (which now hold expired content in cache). Let's say 1,500 users hit those machines in a second, and requests go to origin to refresh the content. These would all be synchronous refreshes, so the origin could receive up to 1,500 requests in that second (I covered exactly what a synchronous refresh is in my last post when discussing prefresh). We want bursting traffic in this scenario to request content from the origin as little as possible, which is why enforce-single-refresh is a useful tag to include. It's optional, and you might get away without it – but the trade-off would be a higher origin request volume in this specific scenario under bursting load.
If you are not an Akamai employee, you will need to engage Akamai Professional Services to add this line of metadata to your configuration. For the sake of a small amount of PS time, I recommend you do it if you expect bursting traffic patterns of many tens (if not hundreds) of thousands of unique users per hour. I have also raised an internal request to see if we can gain approval to expose this tag in Akamai Property Manager on a self-service basis for customers – fingers crossed (for internal Akamai viewers, the request is PMCATMGT-86 if you wish to watch it).
If you recall, I mentioned in Part 1 of this series that using a 'single refresh' has the capacity to magnify any errors the origin might return. That would, of course, be bad. The way I negate this is with Aqua ION's 'Site Failover' module – I simply tell Property Manager that, in the event the origin returns a 5xx error, it's OK to serve the expired content we already have instead of returning the origin error to all of those users. It's better for an end user to receive expired content than an HTTP error just because the origin had some transient problem that prevented Akamai from refreshing the cached content. I'll cover how the failover module is set up in a later part of this series.
In the next part of this series, I'm going to cover the actual caching of HTML and static assets, and also how to configure rules to ensure administrators can still use the system.