Every year we welcome customers to our annual festive sale. And with each passing year, as our user base grows, it becomes increasingly challenging to ensure a frictionless experience for all our users during the sale days.

Last year, for example, we saw a 750% increase in users compared to 2021.

This year, we are expecting the numbers to be even higher. In fact, our Business team is forecasting 200K requests per second during our “Mega Blockbuster festive Sale” of 2022!

What this means for us is "to handle and serve every request that comes in while ensuring that the app does not crash." 😅

That’s why we have it written in stone - “Load test before any sale”.

Why, you ask? Imagine a piece of code you wrote in your local environment. You tested it out in your staging environment as well, and everything works like a charm. But what happens when it goes live?

It may work, it may not. But we know we cannot take any chances.

That’s why load testing comes in handy.

So, how do you load test for a “Mega Blockbuster” Sale day?

With the massive increase in predicted traffic compared to our previous sale, we needed to load test our servers even before we started preparing for it. But how do you test the servers without the actual traffic?

Simple: the idea is to generate synthetic requests to simulate real-world traffic and test the app beforehand. But to simulate the actual traffic of the sale days, a couple of load generators would not suffice; we needed a massive 50+ of them.

To do so, we needed to orchestrate these 50+ load generators. And that’s where JMeter helped us prepare for D-Day.

The long list of hurdles

Anyone who uses JMeter knows that its master-slave architecture needs manual intervention to get up and running, because it involves:

  1. Setting up and starting the Elastic Compute Cloud (EC2) machines for master and slave deployment.
  2. Logging in to the master server and editing JMeter’s property file, changing the remote_hosts parameter based on the slaves that need to be started (a sketch of steps 1-2 follows this list).
  3. Ensuring each slave uses unique data for the scenario by splitting the test data across the slaves.
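For a sense of what that manual intervention looks like, here is a rough sketch of steps 1-2; the JMeter path and slave IPs are placeholders, not our actual setup:

```bash
# On every slave: start the JMeter server process so the master can drive it
./apache-jmeter/bin/jmeter-server

# On the master: register the slaves in bin/jmeter.properties
#   remote_hosts=10.0.1.11,10.0.1.12,10.0.1.13
# (the same list can also be passed at run time with the -R option)
```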

But when dealing with 50+ slaves, the setup becomes extremely time-consuming, taking up to 2-3 hours. That’s why we decided to automate the process, so that we no longer had to manually configure each and every slave while running the tests.

Telling the code what to do - Automation

If something is repeatable, it should be done by a program. That’s one of our core principles. And here’s how we automated the master-slave setup of JMeter:

[Architecture diagram: jmeter-perf-framework.png]

The Main Ingredients

  • A central GitHub repository containing Apache JMeter and all our in-house scripts for spinning up the lab infra, with the ability to support customisations
  • A complete, functional setup on AWS with the required IAM roles and permissions for creating and destroying EC2 resources, along with adding and fetching objects from S3 buckets

The Preparations

  • Create an EC2 instance and install the Ansible and boto3 dependencies on this Linux machine
  • Clone the git repository on the EC2 instance. We are using ‘/home/ubuntu’ as our base directory (see the commands below)
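A minimal sketch of these two preparation steps on an Ubuntu instance (the repository URL is a placeholder):

```bash
# Install Ansible and the boto3 AWS SDK that the setup scripts rely on
sudo apt-get update
sudo apt-get install -y ansible python3-pip
pip3 install boto3

# Clone the framework repository into the base directory
cd /home/ubuntu
git clone https://github.com/<your-org>/jmeter-perf-framework.git
```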

You will find the following scripts in the cloned repository:

  1. setup-jmeter-lab.sh - Script to create and orchestrate the slaves for JMeter
  2. spinup-slaves.sh - Script to create slaves as per user requirement
  3. get-asg-ip.sh - Script to get IP address of the slaves for orchestration
  4. jmeter-slaves.yml - Ansible playbook for downloading the updated repository, starting the JMeter server, and copying data files onto each slave (a sample manual invocation is sketched below)
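For reference, once the slave IPs are known (via get-asg-ip.sh), a playbook like jmeter-slaves.yml could be run manually along these lines; the IPs and key path are hypothetical, and in practice the framework scripts take care of this step:

```bash
# Run the slave-preparation playbook against two hypothetical slave IPs
# (the trailing comma tells Ansible this is an inline host list, not a file)
ansible-playbook -i 10.0.1.11,10.0.1.12, \
  -u ubuntu --private-key ~/.ssh/perf-key.pem \
  jmeter-slaves.yml
```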
  • Create an AMI from that EC2 instance. This AMI will be used to spin up the EC2 instances that serve as both master and slaves
  • Create a launch config named “perf-jmeter-slaves” using the AMI (see the example below)
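Assuming the AWS CLI is configured, creating such a launch configuration could look like this (the AMI ID, instance type, key, and security group are placeholders):

```bash
# Launch configuration that the slaves' Auto Scaling Group will be built from
aws autoscaling create-launch-configuration \
  --launch-configuration-name perf-jmeter-slaves \
  --image-id ami-0123456789abcdef0 \
  --instance-type c5.xlarge \
  --key-name perf-key \
  --security-groups sg-0123456789abcdef0
```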

Juggling with the Test Data Files

  • We used an S3 bucket since we have huge data files, and it is not recommended to keep such files inside the GitHub repository. Create a subfolder inside the S3 bucket based on your scenario
  • You will be prompted for the S3 bucket path as part of the internal execution of the script test-data-distribute.sh
  • The script “test-data-distribute.sh” helps split the data file and sends the pieces across to the configured slaves to ensure unique data usage while running the tests (the idea is sketched after this list)
  • The bucket name needs to be provided as input only during execution of the “setup-jmeter-lab.sh” script, as covered in the README section of the GitHub repository
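Conceptually, the splitting and distribution that test-data-distribute.sh performs boils down to something like this (the bucket, file names, slave count, and IPs are placeholders):

```bash
# Fetch the scenario's data file from its subfolder in the S3 bucket
aws s3 cp s3://perf-test-data/checkout-scenario/users.csv .

# Split it into one equal, line-aligned chunk per slave (3 slaves here)
split -n l/3 --numeric-suffixes=1 users.csv users_part_

# Ship one chunk to each slave so no two slaves replay the same rows
scp users_part_01 ubuntu@10.0.1.11:/home/ubuntu/apache-jmeter/bin/users.csv
scp users_part_02 ubuntu@10.0.1.12:/home/ubuntu/apache-jmeter/bin/users.csv
scp users_part_03 ubuntu@10.0.1.13:/home/ubuntu/apache-jmeter/bin/users.csv
```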


Refer to the README section of the GitHub repository for the steps to execute the tests. We generally skip writing the results to JTL files when using a huge number of slaves, to avoid high resource utilisation on the master server.
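In practice, that simply means starting the distributed run without a results-file argument, along these lines (the test plan name is a placeholder; the exact steps are in the README):

```bash
# Non-GUI mode (-n), run on all slaves listed in remote_hosts (-r);
# omitting the -l option means no .jtl results file is written on the master
./apache-jmeter/bin/jmeter -n -t mega-sale-test-plan.jmx -r
```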

For a visual representation of JMeter metrics, we use InfluxDB and Grafana: JMeter’s InfluxDB Backend Listener pushes test metrics to InfluxDB, and the data is then visualised through a Grafana dashboard.

For monitoring the backend servers, we use VictoriaMetrics with a Prometheus setup.

Once the test is done, delete the ASG and terminate the master machine.
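A sketch of that teardown with the AWS CLI (the ASG name and instance ID are placeholders):

```bash
# Delete the slaves' Auto Scaling Group along with its running instances
aws autoscaling delete-auto-scaling-group \
  --auto-scaling-group-name perf-jmeter-slaves \
  --force-delete

# Terminate the master instance
aws ec2 terminate-instances --instance-ids i-0123456789abcdef0
```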

The Report card

Oh boy! With the automation in place, we ended up load testing around 2,00,000 RPS. And here are three things that stood out for us:

  • ~2 hours of manual work is now done in a span of 2 minutes
  • ~40% reduction in load generator infrastructure cost due to automated on-demand setup
  • Configuring, commissioning, and decommissioning 50+ slaves became a piece of cake


Before we go, we wanted to ask: did you know 5% of Indians shop on our app? And the secret to our success is good code that is extensively tested and built with the Engineering Guidelines that we follow here.


ICYMI, the code for automating distributed load testing with JMeter is hosted on GitHub.


Think you have the chops to take up projects like these? Sprint over to our careers website and apply now.

Guidance: Amlan Sekhar (LinkedIn)

Editing 📝  Raman Namboodiripad (Twitter, LinkedIn), Mangala Dilip

Creatives 🧑🏻‍🎨  Ved Sarkar (Portfolio, Linkedin)