
What are Failover Delay and Failback Delay in a GTM failover property?

Blog Post created by B-3-1AITCS3 Employee on Jun 11, 2015

A GTM property in failover mode has two settings: Failover Delay and Failback Delay. The example below shows how the two work together to switch traffic between your primary and secondary data centers (DCs), so that you can tune this feature to your requirements.

 

I have created a GTM property named "test" under the GTM domain akamai.akadns.com, and configured two DCs under it as follows:

Primary: dc1.xyz.com

Secondary: dc2.xyz.com

 

Under normal conditions, dc1.xyz.com is primary. Suppose something goes wrong in DC1 and users start getting errors while browsing the site. The ideal fix is to move traffic away from DC1. Without GTM, I would need a 24x7 team watching the site, or some kind of monitoring in place, so that whenever this happens the team can follow its standard operating procedure to troubleshoot and move traffic away from DC1. That may take at least 30 minutes. GTM does the same within a very short time if Failover Delay and Failback Delay are set to 0, which helps you react quickly and reduces the overall duration of the impact.

 

At the moment, my Failover Delay and Failback Delay settings are as follows:

 

Failover Delay: 300s

Failback Delay: 300s

 

Depending on the interval of your liveness tests, GTM detects the failure as soon as the liveness tests start failing for DC1. GTM internally computes scores from the liveness test results and, based on these scores, decides on a cutoff score. If the aggregated score of the liveness tests from all GTM test agents crosses this cutoff, DC1 qualifies to be marked down. Because my Failover Delay is set to 5 minutes (300s), GTM does not mark DC1 down immediately upon detecting the failure. Instead, it schedules a time 5 minutes in the future at which to mark DC1 down. After 5 minutes, GTM evaluates the score again to check whether the failure has persisted. If it has, DC1 is marked down and all traffic moves to your secondary DC.

 

Failback Delay works the same way, but in reverse. Your traffic is now on the secondary DC. Suppose you find that DC1 went down because of a power outage, you switch to the backup power link, and you manually verify that DC1 is working again. As soon as the liveness tests start returning successful responses, the liveness score improves and falls below the cutoff score. At that moment, GTM schedules a time 5 minutes in the future (the Failback Delay) at which to mark DC1 up. After 5 minutes, GTM checks whether the recovery has persisted. If it has, DC1 is marked up and all traffic moves back to DC1.
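
The two delays can be pictured as a small hold timer in front of the up/down status. The Python sketch below is only a toy model of that idea, not GTM's actual implementation: the class name DelayedStatus, the observe method, and the rule that a result agreeing with the current status cancels a scheduled change are all assumptions made for illustration. It also takes a simple pass/fail result as input instead of GTM's aggregated liveness score.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DelayedStatus:
    """Toy model (hypothetical) of one DC's status with failover/failback delays."""
    failover_delay: int = 300         # seconds a failure must persist before marking down
    failback_delay: int = 300         # seconds a recovery must persist before marking up
    is_up: bool = True                # current status of the DC
    pending_at: Optional[int] = None  # time at which a scheduled change is re-checked

    def observe(self, now: int, test_passing: bool) -> bool:
        """Feed one liveness result taken at `now` (seconds); return the DC status."""
        if test_passing == self.is_up:
            # Assumption: a result that agrees with the current status
            # cancels any scheduled change.
            self.pending_at = None
            return self.is_up
        if self.pending_at is None:
            # First disagreeing result: schedule the change instead of applying it.
            delay = self.failover_delay if self.is_up else self.failback_delay
            self.pending_at = now + delay
        if now >= self.pending_at:
            # The condition has persisted for the whole delay: apply the change.
            self.is_up = test_passing
            self.pending_at = None
        return self.is_up


dc1 = DelayedStatus(failover_delay=300, failback_delay=300)
print(dc1.observe(0, False))    # True: failure seen, DC1 still up, change scheduled
print(dc1.observe(300, False))  # False: failure persisted for 300s, DC1 marked down
print(dc1.observe(400, True))   # False: recovery seen, DC1 still down
print(dc1.observe(700, True))   # True: recovery persisted for 300s, DC1 marked up
```

With both delays set to 0, the same model flips the status on the first disagreeing result, which matches the fast-switch case described earlier.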

 

If liveness tests are failing for both DCs, traffic remains on the primary DC even though it is down.
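
The resulting choice of DC, including the case where both are down, can be summarized in a few lines. This is a minimal sketch of the behavior described in this post. The function name pick_datacenter is hypothetical, and the hostnames are the example ones used above.

```python
def pick_datacenter(primary_up: bool, secondary_up: bool) -> str:
    """Return the DC that receives traffic, given each DC's up/down status."""
    if primary_up:
        return "dc1.xyz.com"   # normal case: traffic stays on the primary
    if secondary_up:
        return "dc2.xyz.com"   # primary marked down: fail over to the secondary
    return "dc1.xyz.com"       # both down: traffic remains on the primary


print(pick_datacenter(primary_up=False, secondary_up=True))   # dc2.xyz.com
print(pick_datacenter(primary_up=False, secondary_up=False))  # dc1.xyz.com
```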
