Intermittent 503 errors even after increasing the Keep-Alive timeout to 301 s as recommended

Blog Post created by B-3-1AITCS3 Employee on Jun 29, 2015

Without Akamai, a customer's origin generally handles TCP connections from many individual users. To manage these connections, the customer has to set a very low Keep-Alive timeout so that the server does not run out of TCP connections by hitting its maximum number of open file handles.


With Akamai, the customer does not need such a low Keep-Alive value, because far fewer servers (Akamai edge servers rather than individual end users) fetch content from the origin. To improve performance and avoid the overhead of opening a new TCP connection to the origin for every request, we recommend a Keep-Alive timeout of 301 s on the origin, 1 second more than Akamai's default Keep-Alive timeout of 300 s. This lets the Akamai edge server keep a persistent connection open for a long time and fetch content over the same TCP connection, which saves up to 3 round trips of connection setup per object.
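To see why a longer origin timeout reduces connection setups, here is a simplified Python model; the function names are hypothetical. It assumes requests from one edge server arrive one at a time on a single connection, treats each request as instantaneous, and opens a new connection whenever the idle gap between requests reaches the origin's Keep-Alive timeout. It uses the figure of 3 round trips per connection setup from the paragraph above and ignores the edge's own 300 s limit, so the example gaps stay below that.

```python
def count_connections(request_times, keepalive_timeout):
    """Count TCP connections needed for a sequence of requests (in seconds)
    on one edge-to-origin path, assuming a connection is reused whenever
    the idle gap since the previous request is below the origin timeout."""
    if not request_times:
        return 0
    connections = 1
    for prev, cur in zip(request_times, request_times[1:]):
        if cur - prev >= keepalive_timeout:
            # The origin closed the idle connection, so a new one is needed.
            connections += 1
    return connections


def setup_round_trips(request_times, keepalive_timeout, rtts_per_setup=3):
    """Round trips spent on connection setup (3 per new connection)."""
    return count_connections(request_times, keepalive_timeout) * rtts_per_setup


# Hypothetical request arrival times (seconds) from one edge server.
requests = [0, 20, 90, 200, 250]
print(setup_round_trips(requests, 5))    # → 15 (short origin timeout)
print(setup_round_trips(requests, 301))  # → 3 (recommended timeout)
```

With these arrival times, a 5 s origin timeout forces a new connection for every request, while 301 s lets all five requests share one connection.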


If the origin's Keep-Alive timeout is 300 s or less, intermittent 5xx errors may occur. They happen when the origin closes an idle connection at the same moment the edge server sends a new request over that same TCP connection. When the close and the request cross each other, the request fails with a 5xx error. For a GET request, the Akamai edge server retries over a new TCP connection and fetches the content without impacting the end user. For a POST request, the edge server does not resend the request, so the end user receives a "Zero Size Object" error. To make sure this does not impact your end users, set your origin Keep-Alive timeout to 301 s.
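The race can be illustrated with a simplified Python model. `request_outcome` and its parameters are hypothetical: the model assumes the edge reuses a pooled connection while its idle time is below the edge's own 300 s timeout, that the request reaches the origin after a small one-way network delay, and that the origin closes any connection that has been idle for its Keep-Alive timeout. The retry rule (GET retried, POST not) follows the behaviour described above; it is not Akamai's actual implementation.

```python
EDGE_KEEPALIVE_TIMEOUT = 300  # seconds; Akamai default per this post

# Hypothetical retry policy mirroring the post: only GET is resent.
RETRYABLE_METHODS = {"GET"}


def request_outcome(method, idle_gap, origin_timeout,
                    edge_timeout=EDGE_KEEPALIVE_TIMEOUT, one_way_delay=0.05):
    """Classify a request sent after `idle_gap` seconds of idleness on a
    pooled edge-to-origin connection (simplified model)."""
    if idle_gap >= edge_timeout:
        # The edge has already retired the connection and opens a new one.
        return "new connection"
    # The edge reuses the connection; the request reaches the origin
    # one_way_delay seconds later. If the origin's timer has expired by
    # then, the origin's close and the request cross each other.
    if idle_gap + one_way_delay >= origin_timeout:
        if method in RETRYABLE_METHODS:
            return "retried on new connection"
        return "error"
    return "reused connection"


print(request_outcome("POST", 299.98, origin_timeout=300))  # → error
print(request_outcome("GET", 299.98, origin_timeout=300))   # → retried on new connection
print(request_outcome("POST", 299.98, origin_timeout=301))  # → reused connection
```

With an origin timeout of 300 s, a request sent just before the edge's 300 s limit arrives after the origin's timer has expired. With 301 s, the extra second absorbs the delay, so the edge always stops reusing the connection before the origin closes it.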


Today's application architectures often involve multiple layers of load balancers (LBs) and web servers before a request reaches the web server that actually serves the content. The connection can be dropped not only between Akamai and the origin but also internally, for example between your LB and your Apache servers. In a typical setup, the request first hits your LB, then goes to frontend web servers, which distribute it to backend web servers based on the context root. In that case, you need to increase the Keep-Alive timeout at each layer, making each layer's timeout longer than that of the layer in front of it.

For example: LB (301 s) -> Frontend web server (302 s) -> Backend web server (303 s).


With this setup, the LB never sends a request to a frontend web server over a TCP connection that the frontend has already closed, and likewise the frontend web server never sends a request to a backend web server over a closed connection.
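The rule behind this layering is that each server must keep idle connections open longer than the client in front of it. A small Python check can flag any hop that breaks the rule. The function name and tier list are hypothetical, and the check assumes every tier keeps pooled keep-alive connections to the next one.

```python
def find_timeout_violations(chain):
    """Return (client, server) pairs where the server's keep-alive timeout
    is not longer than the client's.

    `chain` lists (name, keepalive_timeout_seconds) from the outermost hop
    (the Akamai edge) to the innermost server.
    """
    violations = []
    for (client, client_timeout), (server, server_timeout) in zip(chain, chain[1:]):
        if server_timeout <= client_timeout:
            # The server could close an idle connection the client still reuses.
            violations.append((client, server))
    return violations


chain = [
    ("Akamai edge", 300),
    ("LB", 301),
    ("Frontend web server", 302),
    ("Backend web server", 303),
]
print(find_timeout_violations(chain))  # → []

misconfigured = [("Akamai edge", 300), ("LB", 301), ("Frontend web server", 60)]
print(find_timeout_violations(misconfigured))  # → [('LB', 'Frontend web server')]
```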


Useful link: How to test origin server persistent connection timeout
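One way to check a server's idle timeout yourself is to send one request over a keep-alive connection and time how long the server waits after its response before closing the connection. The Python sketch below does this with the standard `socket` module; `measure_idle_timeout` is a hypothetical helper, not an Akamai tool. It assumes a plain-HTTP endpoint (an HTTPS origin would need the socket wrapped with `ssl`). You can point it at each tier (LB, frontend, backend) in turn.

```python
import socket
import time


def measure_idle_timeout(host, port, path="/", max_wait=400.0):
    """Estimate how long a server keeps an idle keep-alive connection open.

    Sends one HTTP/1.1 GET, reads until the server closes the connection,
    and returns the seconds between the last byte received and the close.
    Returns None if the server was still silent after max_wait seconds.
    """
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: keep-alive\r\n"
        "\r\n"
    ).encode("ascii")
    # The timeout applies to each blocking call, including every recv().
    with socket.create_connection((host, port), timeout=max_wait) as sock:
        sock.sendall(request)
        last_data = time.monotonic()
        while True:
            try:
                chunk = sock.recv(65536)
            except socket.timeout:
                return None  # connection still open after max_wait seconds
            except ConnectionResetError:
                return time.monotonic() - last_data  # server reset the connection
            if not chunk:
                # An empty read means the server closed the connection.
                return time.monotonic() - last_data
            last_data = time.monotonic()
```

For an origin configured as recommended, a call such as `measure_idle_timeout("origin.example.com", 80)` (a placeholder host) would be expected to return roughly 301.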




Harsh Dhandhukia