August 29, 2021

Cassandra Load Testing with Groovy

Performance Testing

This blog post explains how to performance test Cassandra from JMeter using the Groovy scripting language.

What is Cassandra Performance Testing?

Cassandra performance testing can be done with the Cassandra stress tool (cassandra-stress), a Java-based stress testing utility for basic benchmarking and load testing of a Cassandra cluster. Significant load testing over several trials is the best way to discover issues with a data model in your application.

 

Many test engineers use Apache JMeter™ as a standalone test solution to test database performance. By using the JDBC request sampler you can test many databases like Oracle or MySQL. But what if you need to test NoSQL databases like Cassandra? There is no Cassandra request sampler in default JMeter. No worries, JMeter is a very flexible performance testing tool, which can be expanded with a variety of different plugins and flexible scripting capabilities.

Cassandra is one of the most popular distributed databases from Apache. It is designed to be highly scalable and to manage very large amounts of structured data. But what is the most important aspect of a highly scalable database that holds a huge data set? Right, of course, it’s performance!

A few years ago, DataStax (a data management software vendor, powered by Cassandra) developed an easy-to-run tool for performance testing Cassandra clusters, called cstar_perf. This solution works great if you need to test a Cassandra database separately and concentrate on database testing only.

But what if you want to run your Cassandra performance testing scenarios in parallel with existing load tests, and to orchestrate them all from one place? After all, using separate tools for each database creates an overhead. This is the main reason we love JMeter so much - it allows us to perform automation tests against almost any source, and Cassandra is not an exception.

Cassandra Performance Testing With JMeter

The first step to implementing Cassandra performance testing in JMeter was done by Netflix, who (after detailed benchmarking) decided to use Cassandra as one of their main backend data sources, beginning about 5 years ago.

The Netflix team developed a JMeter plugin for Cassandra, which extends JMeter’s default capabilities and provides new samplers that send requests directly to the Cassandra database. Read more details in this article. This solution worked very well until the Cassandra Thrift API used by the plugin was deprecated (more details in this article). The Netflix plugin received no further commits, so it became unusable immediately.

A new solution was required, and that solution is Groovy.

So let’s begin our test. First of all, let’s prepare our test environment.

Cassandra Installation

For our example, we are going to install Cassandra on a single node (the test machine runs macOS, but the steps should be almost the same on any other UNIX platform).

Let’s create a folder where we can download and unpack one of the latest Cassandra distributions:

mkdir Cassandra
cd Cassandra
curl -O http://www-us.apache.org/dist/cassandra/3.0.14/apache-cassandra-3.0.14-bin.tar.gz
tar xzvf apache-cassandra-3.0.14-bin.tar.gz

After that, we need to add the Cassandra executables to the PATH environment variable so that they can be run from any directory. To do that, add this line to the “~/.bash_profile” file on your system (keep in mind that the path will differ depending on where you unpacked Cassandra):

export PATH="/Users/ybushnev/Cassandra/apache-cassandra-3.0.14/bin:$PATH"

After that, run this command to reload the profile and update your PATH:

source ~/.bash_profile

To verify that Cassandra has been installed successfully, run the command that prints the version of the Cassandra distribution:

cassandra -v

If everything was installed successfully, the command prints the installed version (3.0.14 for the distribution downloaded above).

Now you can start the Cassandra server with the command below. The ‘-f’ flag keeps Cassandra in the foreground, so its logs are printed in the same terminal.

cassandra -f

As a result, you will see many log entries at the INFO level. If there are no entries at the ERROR level, you are good to go.

Database Creation

To proceed with some basic test scenarios, we need to create a simple database with data that the performance tests can retrieve. To create it, we use the Cassandra Query Language (CQL), whose query model is very close to the commonly used SQL. To use CQL, start the Cassandra Query Language shell by typing:

cqlsh

After that, you can just copy and paste the script for database creation and press Enter:

CREATE KEYSPACE test_keyspace WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};
USE test_keyspace;
CREATE TABLE users (user_id INT PRIMARY KEY, first_name TEXT, last_name TEXT, email TEXT);
INSERT INTO users (user_id, first_name, last_name, email) VALUES (1, 'Bob', 'Moore', 'bob@gmail.com');
INSERT INTO users (user_id, first_name, last_name, email) VALUES (2, 'Brian', 'Nelson', 'brian@gmail.com');
INSERT INTO users (user_id, first_name, last_name, email) VALUES (3, 'Anthony', 'Parker', 'anthony@gmail.com');
INSERT INTO users (user_id, first_name, last_name, email) VALUES (4, 'Kevin', 'Collins', 'kevin@gmail.com');
INSERT INTO users (user_id, first_name, last_name, email) VALUES (5, 'Jeff', 'Wilson', 'jeff@gmail.com');
INSERT INTO users (user_id, first_name, last_name, email) VALUES (6, 'Mark', 'Taylor', 'mark@gmail.com');
INSERT INTO users (user_id, first_name, last_name, email) VALUES (7, 'John', 'Baker', 'john@gmail.com');
INSERT INTO users (user_id, first_name, last_name, email) VALUES (8, 'Laura', 'Carter', 'laura@gmail.com');

You shouldn't see any errors, and the test keyspace with the ‘users’ table will be created on your local Cassandra server.

To check, you can run a basic query in the same terminal to retrieve all the data from the newly created ‘users’ table:

SELECT * FROM users;

Now we have a local installation of the Cassandra server and an example database. These can be used as a source for performance tests. So we are ready to go!

Cassandra Performance Testing Using Groovy

One of the most stable options for load testing Cassandra is to use JSR223 Groovy scripts that run CQL commands. To do that, you need a recent version of the Cassandra Java driver, but keep in mind that the driver version must be compatible with your Cassandra server version.

Use this link to download the driver jar file. After that, add the downloaded jar file to JMeter’s classpath, for example by copying it into the ‘lib’ folder of your JMeter installation.

In addition, you need to add a few other dependencies to make the Cassandra driver work. It can be tricky to work out all the dependencies you need (you can get an idea from the Maven repository page of the Cassandra driver), so I put together a list of links where you can download the appropriate jar files separately:

Once you have downloaded all the jars, add them to the JMeter classpath as well.

Now we can start implementing the test sampler. The example is deliberately simple, even though Cassandra is mainly designed to manage terabytes of data, but it will give you a clear picture of how you can test your Cassandra server. In addition, keep in mind that the provided code is simplified so that everything fits together without being divided into lots of small screenshots and code snippets.

Let’s add a Thread Group that will contain our test steps. For script creation, a single user with several loops is enough.

As soon as we are done with test script prerequisites, we can add the Groovy script to our ‘Thread Group’. To do that:

Right click on ‘Thread Group’ -> Add -> Sampler -> JSR223 Sampler

Inside that sampler you need to ensure that you are using ‘Groovy’ as a scripting language. All the other JMeter element parameters can stay the same. The basic Groovy script for execution of a simple Cassandra query looks like this:

 

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
Session session = cluster.connect("test_keyspace");
def results = session.execute("SELECT * FROM users");
session.close();
cluster.close();

In this code snippet, we:

  • Specified dependency imports to use Cassandra driver objects
  • Opened the connection to the local Cassandra server and created a session for the keyspace named ‘test_keyspace’
  • Executed a query that selects all rows from the ‘users’ table
  • Closed the session and the server connection. This is important to avoid too many open connections to your database, which might affect performance as well
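
The script above stores the query result in `results` but never reads it. Below is a minimal sketch of how the rows could be read and attached to the sample. It assumes the same DataStax 3.x driver classes, the ‘users’ table created earlier, and the `SampleResult` variable that JMeter exposes to JSR223 samplers. The one-line-per-user output format is purely illustrative.

```groovy
import com.datastax.driver.core.Cluster
import com.datastax.driver.core.Session

// Assumes a local node and the test_keyspace/users table created earlier.
Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
try {
    Session session = cluster.connect("test_keyspace")
    def results = session.execute("SELECT * FROM users")

    // The result set is iterable: build one line of text per row.
    def lines = results.collect { row ->
        "${row.getInt('user_id')}: ${row.getString('first_name')} ${row.getString('last_name')}"
    }

    // SampleResult is provided by JMeter to every JSR223 sampler script.
    SampleResult.setResponseData(lines.join('\n'), 'UTF-8')
    // Treat an empty table as a failed sample (an illustrative choice).
    SampleResult.setSuccessful(!lines.isEmpty())
} finally {
    // Closing the cluster also closes the sessions created from it.
    cluster.close()
}
```

With this in place, a listener that displays response data, such as ‘View Results Tree’, should show one line per user.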

 

Let’s add the ‘View Results in Table’ listener and verify the test results.

You should see that all the requests completed successfully.

But in the result table, you might notice that each sample takes more than 2 seconds to complete. As we know, Cassandra is one of the most powerful databases and our table with test data has just a few rows! So what is wrong here? The main issue in our code is that the sampler opens and closes a database connection on every iteration, and these unnecessary steps take a lot of time. Ideally, we should open the connection only once for each specific thread and close it right before the test terminates.

The best practice to achieve that is to use a ‘setUp Thread Group’ and a ‘tearDown Thread Group’ to open and close the database connection, respectively. Let’s add a ‘setUp Thread Group’ (Right click on ‘Test Plan’ -> Add -> Threads (Users) -> setUp Thread Group), add a JSR223 Sampler to it, and use this Groovy script to open the database connection and session:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
Session session = cluster.connect("test_keyspace");
props.put("cluster", cluster);
props.put("session", session);

You might notice that we put the connection (“cluster”) and session objects into JMeter properties (“props”). We do this because JMeter properties are shared across all thread groups, so we can use the created connection and session later in the test.
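
Storing driver objects in “props” works because JMeter’s `props` is a plain `java.util.Properties`, which can hold arbitrary objects as well as strings. The standalone sketch below uses a stand-in `Properties` instance and a placeholder object (both hypothetical, not part of the test plan) to show that `get` returns the stored instance, while `getProperty` only returns String values.

```groovy
// Stand-in for JMeter's `props` binding, which is a java.util.Properties.
Properties props = new Properties()

// Hypothetical placeholder for the driver Session stored by the setUp script.
def fakeSession = new Object()
props.put("session", fakeSession)

// get() hands back the very same instance that was stored.
assert props.get("session").is(fakeSession)

// getProperty() only understands String values, so it returns null here.
assert props.getProperty("session") == null
```

This is why the scripts in this post call `props.get(...)` rather than `props.getProperty(...)`. The DataStax driver documents `Session` as thread-safe, so sharing one instance between JMeter threads this way should be fine.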

After that, we can use the “props.get(*)” function to retrieve the required objects from JMeter properties. As the objects are already initialized, you don’t need to add any dependency imports. All we need is to add a teardown step (Right click on ‘Test Plan’ -> Add -> Threads (Users) -> tearDown Thread Group) and use these calls to close the open connections after the tests have ended:

props.get("session").close();
props.get("cluster").close();

As a final step, we need to refactor the existing code of the main sampler where we send our query request to Cassandra:

props.get("session").execute("SELECT * FROM users");

 

Now the main sampler is simplified and doesn’t have any redundant logic.

To verify the execution of all test steps, we can move the ‘View Results in Table’ listener from ‘Thread Group’ to ‘Test Plan’ and execute the test again.

That’s it! We achieved our goal: the actions that open and close the database connection are now separated from the samplers. As you can see, requests to Cassandra take just milliseconds, while the open and close operations take much longer. That’s why it is so important to separate these steps. Now you can add more Groovy scripts to the existing test and perform any operations supported by Cassandra.
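
As one illustration of such an additional script, here is a hedged sketch of a JSR223 sampler that looks up a single user by ID with a prepared statement. It assumes the shared session stored under “session” by the setUp script and the eight sample rows inserted earlier. The “selectUser” property key is a hypothetical name introduced only for this example.

```groovy
import java.util.concurrent.ThreadLocalRandom

// Session stored in JMeter properties by the setUp Thread Group script.
def session = props.get("session")

// Prepare the statement once and cache it under a hypothetical key.
// Concurrent threads may prepare it more than once at startup; that is
// harmless here, the last one stored wins.
def selectUser = props.get("selectUser")
if (selectUser == null) {
    selectUser = session.prepare("SELECT * FROM users WHERE user_id = ?")
    props.put("selectUser", selectUser)
}

// Pick one of the eight sample users inserted earlier (ids 1 to 8).
int userId = ThreadLocalRandom.current().nextInt(1, 9)
def row = session.execute(selectUser.bind(userId)).one()

// Report the outcome through the SampleResult binding provided by JMeter.
String body = (row == null) ? "no row for user_id ${userId}" : row.getString("email")
SampleResult.setResponseData(body, "UTF-8")
SampleResult.setSuccessful(row != null)
```

Preparing the statement once and binding a new value on each iteration keeps the per-sample work small, in the same spirit as opening the connection only once.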

Bottom Line

Cassandra performance testing via JMeter is not so easy but definitely doable, and a reasonable option if you are comfortable with the JMeter framework itself and want to use Cassandra tests in combination with other tests already written in JMeter.

If this is not the case and you want to run extensive low-level Cassandra performance testing in isolation, I would recommend using the “cstar_perf” tool. In addition, you can use system monitoring tools like collectd or Zabbix to verify how your nodes behave while the Cassandra servers are under load. If you need to perform some tests against Cassandra 2.0, there are some other, simpler options available across the web:

Our main goal was to show how you can achieve Cassandra performance testing via JMeter, independently of the Cassandra version you have. By using the Groovy scripting language, you don’t need to care about which Cassandra functionality is supported by JMeter, because you can implement the operations yourself. This requires some basic scripting experience, but on the other hand it gives you a lot of flexibility along with an understanding of the requests happening behind the scenes. And once you feel comfortable with Cassandra scripting, you can become the person who develops a new JMeter plugin for Cassandra testing that simplifies all these steps. The community is looking for such heroes!
