Performance and load testing has traditionally been an exercise for the end of a development cycle. The testing was conducted in a performance lab and often the testing was only done at a fraction of the load for cost and logistical reasons. The results were then extrapolated to try to understand the ability of the production infrastructure to handle expected load.
There are many challenges with this approach, particularly as the nature of applications has changed in the last few years. Many of the nuts and bolts of load testing (accurate reproduction of business processes, developing appropriate coverage models, identifying and measuring key performance metrics) remain the same. There are, however, a number of factors driving change in how and when we test applications for performance and the ability to handle load.
To begin with, consumer-facing websites have created a need for testing at far greater scale than had ever been done in the past. Many sites get hundreds of thousands of users a day, some peaking at a million or more concurrent users based on promotions or other events. Testing these sites not only implies scale, but it also means testing the entire infrastructure, including load balancers, firewalls, and externally provided services.
In addition, companies are increasingly focused on driving faster development cycles, leveraging Agile Development tools and processes. This has driven a need to integrate testing, including performance testing, throughout the software development lifecycle. Continuous integration systems are playing a key role in enabling more frequent testing as it ‘shifts left’.
The scale and scope of today’s architectures and applications create increasing complexity in testing as well. Uncovering issues is becoming more like finding a needle in a haystack. In addition, new technologies, such as rich user interfaces, duplex protocols, and the increasing use of mobile technologies, need to be incorporated into test plans.
Akamai has taken advantage of the cloud, the ability to handle big data in real-time, and modern approaches in UI to create a methodology that reflects all of the best practices associated with load and performance testing over the years, combined with an approach that addresses the new challenges presented by today’s technologies.
CloudTest Performance Methodology
The graphic below illustrates the elements of Akamai's CloudTest Methodology, which leverages existing best practices to extend traditional approaches and address the new opportunities and challenges presented by cloud testing.
The following is a description of each pillar and step in this highly iterative methodology (shown above). The fundamental approach from a strategic standpoint is to assess the current people, processes, and tools used by an organization and map those to the Akamai methodology. There are a number of methodology assets associated with each pillar of the methodology that Akamai makes available to its customers. These assets include templates, test plans, scenario-definition documents, and best-practice documents for using CDNs and for building and sizing tests.
A clear and generally accepted strategy is core to the Akamai performance methodology. The broader performance and application engineering teams need to understand and buy into the overall test strategy, the goals, and the benefits achieved as a result of implementing a robust performance test strategy. All stakeholders should have a voice in the overall strategy and stand to benefit from its implementation.
For testing projects it's important to identify goals and establish the means to reach those goals. Akamai has developed a set of assets to provide testers with templates, guidance, and best practices to create a strategy for success when testing modern applications.
|Testing in Production||This white paper describes why you would test in production and discusses the most frequently asked questions around security, test data and potential live customer impact.|
|Data Retention and Security Policy||This document describes Akamai's policy for retaining customer data.|
|Akamai Services||A high-level overview of the services provided by Akamai, many of which span multiple pillars in this methodology.|
This pillar describes how to make performance engineering a part of your overall software development life cycle and integrate it with other tools and processes. An organization and infrastructure for software development, release, and operations already exist, so testing needs to integrate as seamlessly as possible. In some cases, new opportunities for testing may drive changes in other tool choices or processes.
|Continuous Integration||How to integrate CloudTest into Jenkins.|
|Performance in the SDLC||A PowerPoint that discusses performance testing with CloudTest in the SDLC, and an associated document describing how to integrate CloudTest for performance projects.|
In order to implement a performance strategy it is critical to understand the people and processes that need to be in place. This is the link between the strategy and execution. In some cases it is one or two people who do everything and the processes are rather simple. In other cases, such as the complete overhaul of a website, there will be multiple people for many of the responsibilities and a comprehensive test plan that is part of a much larger process.
The engagement profile is highly dependent on the size of the company and the complexity and length of the testing. There are a number of responsibilities and the extent to which we take any of them on is dependent on the sophistication of the client. Most commonly there will be multiple people involved in executing discrete tasks, such as scripting, monitoring setup, test execution, and test analysis and reporting. For some customers, Akamai will fill many of these roles.
|Roles and Responsibilities||The engagement profile is highly dependent on the size of the company and the complexity and length of the testing. There are a number of responsibilities and the extent to which we take any of them on is dependent on the sophistication of the client. Most commonly we will have multiple people involved for executing discrete tasks, such as scripting, monitoring setup, test execution and test analysis and reporting. For some customers we'll also play a project management role. The following is a set of definitions for these responsibilities. In some cases it is one or two people who do everything. In other cases, such as the complete reconstruction of a website, there will be multiple people for many of the responsibilities.|
|Launching SOASTAStore||SOASTAStore is a WordPress-based application that Akamai uses for demonstrations and training. Because many of the training materials use this site, you can either use www.soastastore.com or launch your own version as described in this article.|
Processes are the set of guidelines used to implement the test strategy. It is the combination of these processes and the people involved which form the basis for executing the test strategy. This often includes understanding and addressing the software development lifecycle, test execution, remediation, tuning, and reporting processes.
|Project/Engagement Process||A recommended engagement process that is appropriate for both Akamai on-demand and self-service testing.|
|Implementation and Installation||Steps required for most common CloudTest Installation options, along with criteria to determine the best deployment option.|
Once we've established and documented the strategy for testing, as well as identified the participants and their roles, we're ready to start testing. Depending on the test requirements and complexity, the execution phase may be compressed or extended. Many of the assets used with this methodology are designed to enable rapid testing in response to unexpected issues. In those cases, the established process may not be followed and instead the focus will be on identifying the scenario, the goals of the test, and quick execution.
This methodology uses assets such as engagement guides, monitoring worksheets, and sizing and calibration guides to develop the scope of individual test engagements. They are used to help identify scenarios and prepare for the tactical test sessions.
|Engagement Guide||A spreadsheet template for capturing the overall requirements for an engagement. This includes the same data we capture in an engagement/test record as well as a VU calculator and data burn calculator.|
|Scenario Guides||Most often you'll use the browser scenario guide but please be sure to send the 'headless' scenario guide if we are going to be testing web services. These are useful to capture business processes.|
|Virtual User Calculator||Many testers don't speak in the language of "virtual users" (VUs). Instead, they think in terms of HTTP requests/second, page loads/second, or number of visitors over time. This document converts those numbers to an estimate of the number of CloudTest virtual users needed to generate the desired load.|
|Load Server Hours Definition||Load server plans base a 'server hour' on an Amazon Web Services Large instance. This describes the relative values of using different options.|
|Monitoring is the ability to capture resource metrics about target servers and network devices involved in CloudTest-generated tests. When you monitor, you can get information about how servers and other network devices (load balancers, routers, etc.) are handling the load being generated against the web application. Real-time monitored metrics are included in the Akamai CloudTest dashboards to help identify possible bottlenecks in the web application being tested. Monitored information is available at several different levels, including system resources (e.g., CPU, network, disk, and memory utilization), processes (e.g., Apache thread counts), and server-specific information (e.g., application server JVM information, database commits/rollbacks). CloudTest also integrates tightly with a number of third-party APM systems.|
|Load Generator Calibration||Load server calibration is an iterative process to determine the appropriate number of virtual users to run from each load server. This document will help you calibrate your load servers and sizing table. The latter classifies scenarios by type and provides a high level indicator for both the development effort and potential impact on VUs per server.|
|Bandwidth Usage Calculator||An example for calculating bandwidth usage|
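The conversions that the Virtual User Calculator, Load Generator Calibration, and Bandwidth Usage Calculator assets describe can be approximated with simple arithmetic. The following is a minimal sketch, not the Akamai calculator itself: it assumes Little's Law (concurrency equals arrival rate multiplied by time spent per page or per session), and every function and parameter name is hypothetical. The VUs-per-server figure is assumed to come from your own calibration runs.

```python
import math


def vus_from_page_rate(pages_per_sec: float, response_time_s: float,
                       think_time_s: float) -> int:
    """Estimate concurrent virtual users from a page-load rate (Little's Law).

    Each virtual user completes one page every (response + think) seconds,
    so concurrency = rate x time per page.
    """
    return math.ceil(pages_per_sec * (response_time_s + think_time_s))


def vus_from_visitors(visitors_per_hour: float, session_length_s: float) -> int:
    """Estimate concurrent virtual users from hourly visitors and session length."""
    return math.ceil(visitors_per_hour * session_length_s / 3600)


def load_servers_needed(virtual_users: int, vus_per_server: int) -> int:
    """Number of load servers, given a calibrated VUs-per-server figure."""
    if vus_per_server <= 0:
        raise ValueError("vus_per_server must be positive")
    return math.ceil(virtual_users / vus_per_server)


def bandwidth_mbps(pages_per_sec: float, page_weight_kb: float) -> float:
    """Approximate bandwidth in megabits/second (assumes 1 KB = 1000 bytes)."""
    return pages_per_sec * page_weight_kb * 8 / 1000


if __name__ == "__main__":
    # Illustrative inputs only: 50 pages/s, 2 s response time, 8 s think time.
    vus = vus_from_page_rate(pages_per_sec=50, response_time_s=2.0, think_time_s=8.0)
    print(vus, load_servers_needed(vus, vus_per_server=400))
```

Treat the result as a starting point for calibration rather than a final number, since real scenarios vary in think time and page weight.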
As the test is built there are a number of considerations, best practices, and assets that need to be used. These include test creation checklists, script samples, and third-party best practices.
|Test Clip Creation Checklist||Common things to keep in mind when creating a test clip.|
|Test Composition Creation Checklist||Common things to keep in mind when creating a test composition.|
|Akamai Best Practices||Best practices when tests include assets delivered by Akamai.|
|Script Guide||Document of pre-defined scripts for edge case requirements, pacing and more.|
|Sample Scripts||58 Scripts created by Akamai Performance Engineers ready to be imported into CloudTest (V54 and above)|
For the test itself there are a number of factors that need to be taken into account for how the test is managed. Understanding the goals is key to a successful test. Depending on the goals, the participation of many stakeholders may need to be coordinated. The infrastructure, types of tests, and load-generating locations will also have an impact on the test.
|Grid Deployment||Load servers are usually managed automatically by CloudTest. The CloudTest user is only expected to deploy and tear down the grid. There can be corner cases, however, where grid deployment is not successful: it may not finish, or not all listed servers may actually be working properly. This document provides best practices for ensuring grids are working and for troubleshooting when they are not.|
|Continuous Integration||Using Jenkins with CloudTest for automation and continuous integration.|
|Manual Server Deployment||This document describes the process to successfully incorporate one or more previously provisioned CloudTest Load Generators into the main instance of a CloudTest environment. It is assumed that these servers have been already provisioned and are successfully running and were not provisioned through the API.|
|If you need to whitelist IP addresses, the links for the various vendors can help you determine the ranges. If you don't want to whitelist a full range, you can reserve IP addresses from some of the cloud providers. For AWS, you can download the published JSON and pull the data by region by running curl "https://ip-ranges.amazonaws.com/ip-ranges.json" 2>/dev/null | jq -r '.prefixes[] | .region + "," + .ip_prefix' | sort >> Amazon_AWS_IP_Ranges_$TODAY.txt (this assumes the TODAY shell variable has already been set, for example to the current date). We can also help you with techniques for launching servers and pulling the IP addresses from the CloudTest server list, assuming you can whitelist on demand. Here is a helpful tool for converting an IP range to CIDR: http://ip2cidr.com/|
|ELB Best Practices||Describes how to configure a test to generate the most realistic load possible against an Amazon ELB-based application|
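The region-by-prefix extraction described in the whitelisting row above can also be done in Python. The following is a rough sketch under stated assumptions: the function names are hypothetical, the sample prefixes are documentation-range placeholders rather than real AWS ranges, and the only structure relied upon is the `prefixes` list with `region` and `ip_prefix` fields in the published AWS JSON. The range-to-CIDR helper uses the standard library's `ipaddress.summarize_address_range` as a local alternative to an online converter.

```python
import ipaddress
import json
import urllib.request

AWS_IP_RANGES_URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"


def fetch_ip_ranges(url: str = AWS_IP_RANGES_URL) -> dict:
    """Download and parse the published ranges document (requires network access)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def region_prefix_lines(ip_ranges: dict) -> list[str]:
    """Return sorted 'region,ip_prefix' lines from a parsed ranges document."""
    return sorted(
        f"{p['region']},{p['ip_prefix']}" for p in ip_ranges.get("prefixes", [])
    )


def range_to_cidrs(first_ip: str, last_ip: str) -> list[str]:
    """Convert an inclusive IP range into the minimal list of CIDR blocks."""
    networks = ipaddress.summarize_address_range(
        ipaddress.ip_address(first_ip), ipaddress.ip_address(last_ip)
    )
    return [str(n) for n in networks]


# Placeholder data in the same shape as the AWS document (not real AWS ranges).
SAMPLE = {
    "prefixes": [
        {"ip_prefix": "192.0.2.0/24", "region": "us-east-1", "service": "EC2"},
        {"ip_prefix": "198.51.100.0/24", "region": "eu-west-1", "service": "EC2"},
    ]
}

if __name__ == "__main__":
    for line in region_prefix_lines(SAMPLE):
        print(line)
    print(range_to_cidrs("192.0.2.0", "192.0.2.130"))
```

Swapping `SAMPLE` for `fetch_ip_ranges()` would run the same extraction against the live document.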
A key component of this methodology is the ability to assess test results and application performance in real-time. This offers the ability to identify and potentially remediate performance issues during test sessions by dynamically adjusting load profiles to represent various scenarios.
|CloudTest Widgets||The list of widgets/metrics that can be displayed out of the box on the dashboard. They are divided into logical sections to provide an easy reference.|
With CloudTest, the ability to surface and drill into metrics while the test is running can dramatically shorten testing cycles. Break/fix activities become much more interactive. Nonetheless, there’s still a need to review results over time, analyze the data, and make adjustments by tuning the application/infrastructure and updating the tests.
All data from the real-time dashboards is available for post-test reporting. You can report on data from CloudTest by reviewing integrated dashboards, exporting raw results, or using the integrated report generation. This allows for deep post-test analysis, management reporting, and trend analysis.
Ultimately, testing is about reacting to what is learned; most often by addressing issues identified during the test. It is also important to assess the efficacy of the testing itself and adjust as needed.
Testing in the Lab
Cloud testing does not obviate the need for, or eliminate the benefits of, testing in a lab as well as in the production environment, and it’s important to have continuity between the two. Ongoing testing in a lab allows engineering teams to assess performance over time, and helps catch any show-stopping performance bugs before they reach production.
The lab provides a place to performance test code and configuration changes for regressions, before these changes are released to production. This could include things like a quick bug fix in a page, or a seemingly minor configuration change that could have a performance impact. Often, these kinds of changes are deployed with little to no testing and come back later to cause performance issues.
CloudTest uses continuous integration systems such as Jenkins and Bamboo and service virtualization tools such as CA LISA to move performance testing earlier in the software development life cycle. This helps find many application and database-oriented issues.
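One way to picture a regression check in a continuous integration job is a simple gate that compares response times from the current build against a stored baseline. The sketch below is purely illustrative and does not use any CloudTest, Jenkins, or Bamboo API: the function names, the choice of the 95th percentile, and the 10% tolerance are all assumptions, and the response-time samples are assumed to have been exported from a test run by some other step.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Multiply before dividing so whole-number percentiles stay exact.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def is_regression(baseline_ms: list[float], current_ms: list[float],
                  pct: float = 95.0, tolerance: float = 0.10) -> bool:
    """True if the current percentile exceeds the baseline by more than tolerance."""
    allowed = percentile(baseline_ms, pct) * (1 + tolerance)
    return percentile(current_ms, pct) > allowed


if __name__ == "__main__":
    # Illustrative response times in milliseconds, not measured data.
    baseline = [float(x) for x in range(101, 201)]
    current = [x + 30 for x in baseline]
    # A CI job could fail the build by exiting non-zero on a regression.
    raise SystemExit(1 if is_regression(baseline, current) else 0)
```

A gate like this only flags a change in one metric. Deciding which percentile and tolerance are meaningful depends on the success criteria agreed in the test strategy.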
Testing in Production
Testing in production is the best way to get a true picture of capacity and performance in the real world. There are many issues that Akamai's production testing approach will catch that cannot be found with traditional lab-based test methods. These include:
- Batch jobs not present in the lab (log rotations, backups, etc.) or the impact of other online systems affecting performance
- Load balancer performance issues, such as mis-configured algorithm settings
- Network configuration problems with switches and routers
- Insufficient network bandwidth
- Latency between systems
Common concerns around production testing include security, test data, and real user impact. This methodology is based on Akamai's extensive experience testing the production infrastructure of the world’s largest consumer-facing web sites. It incorporates risk-mitigating techniques such as slower ramp times, integrated monitoring, various approaches to test data management, real-time visibility into site responsiveness, interactive load adjustment, and the ability to stop a test instantly if necessary.
Akamai's production testing approach helps identify the invisible walls that show up in architectures after they move out of the lab. Traditionally, testers have been limited to making extrapolations over time about whether small tests on a few servers in a lab can support exponentially higher amounts of load in production. Without proper testing, these types of assumptions result in hitting unexpected barriers after multiple years of consistent traffic growth. We have seen that successful companies are using production testing to learn things about the performance of their sites that they could have never learned in a lab.
Test Execution Strategy
Having a well-defined strategy, with explicit test plans geared toward individual objectives, such as holiday readiness, a major architectural change, or the release of a major version of code, provides business and engineering leaders with a high degree of confidence in operational readiness. This approach yields greater insight into an application’s performance and readiness. There are common test types included in a plan, which, when taken together, make for a well-rounded view of application performance and reliability. The exact number, order, and frequency are guided by several factors, including those mentioned above. These test types include:
- Baseline: the most common type of performance test. Its purpose is to achieve a certain level of peak load on a pre-defined ramp-up and sustain it while meeting a set of success criteria such as acceptable response times with no errors.
- Spike/overdrive: simulates steeper ramps of load, and is critical to ensuring that an application can withstand unplanned surges in traffic, such as users flooding into a site after a commercial or email campaign. A spike test might ramp to the baseline peak load in half of the time, or a spike may be initiated in the middle of a steady state of load.
- Endurance/soak: helps ensure that there are no memory leaks or stability problems over time. These types of tests typically ramp up to baseline load levels, and then run for anywhere from 2 to 72 hours to assess stability over time.
- Failure: ramps up to peak load while the team simulates the failure of critical components such as the web, application, and database tiers. A typical failure scenario would be to ramp up to a certain load level, and while at steady state the team would pull a network cable out of a database server to simulate one node failing over to the other. This would ensure that failover took place, and would measure the customer experience during the event.
- Stress: finds the breaking point for each individual tier of the application or for isolated pieces of functionality. A stress test may focus on hitting only the home page until the breaking point is observed, or it may focus on having concurrent users logging in as often as possible to discover the tipping point of the login code.
- Diagnostic: designed to troubleshoot a specific issue or code change. These tests typically use a specially designed scenario outside of the normal library of test scripts to hit an area of the application under load and to reproduce an issue or verify issue resolution.
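The baseline, spike, and soak test types above differ mainly in how quickly load ramps and how long it is sustained. The following sketch models that as a linear ramp followed by a hold period. It is an illustration only: the class and function names are hypothetical, the numbers are placeholders, and it does not represent how CloudTest defines load profiles. A spike injected in the middle of a steady state is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadProfile:
    name: str
    peak_vus: int  # virtual users at peak load
    ramp_s: int    # seconds to ramp linearly from 0 to peak
    hold_s: int    # seconds to sustain peak load

    def target_vus(self, t_s: int) -> int:
        """Target number of virtual users t_s seconds into the test."""
        if t_s < 0 or t_s > self.ramp_s + self.hold_s:
            return 0
        if t_s < self.ramp_s:
            return self.peak_vus * t_s // self.ramp_s
        return self.peak_vus


def baseline(peak_vus: int, ramp_s: int, hold_s: int) -> LoadProfile:
    """Ramp to peak on a pre-defined schedule and sustain it."""
    return LoadProfile("baseline", peak_vus, ramp_s, hold_s)


def spike(base: LoadProfile) -> LoadProfile:
    """Reach the baseline peak in half the baseline ramp time."""
    return LoadProfile("spike", base.peak_vus, base.ramp_s // 2, base.hold_s)


def soak(base: LoadProfile, hours: int) -> LoadProfile:
    """Ramp like the baseline, then hold for an extended period."""
    return LoadProfile("soak", base.peak_vus, base.ramp_s, hours * 3600)


if __name__ == "__main__":
    base = baseline(peak_vus=10000, ramp_s=1800, hold_s=3600)
    for profile in (base, spike(base), soak(base, hours=24)):
        print(profile.name, profile.target_vus(900), profile.hold_s)
```

Deriving the spike and soak profiles from the baseline keeps the peak load consistent across test types, so results from the different runs can be compared directly.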
Using an iterative process within test plans provides the agility needed for continuous improvement in the applications being tested. A cycle that starts with test definition and ends with obtaining actionable intelligence results in a continuous cycle of improvement.