Portworx Tutorial : Demonstrate HA Cassandra Stateful Application

Portworx is a popular Kubernetes persistent storage and Docker storage solution. It’s a clustered block storage solution and provides a Cloud-Native layer from which containerized stateful applications programmatically consume block, file, and object storage services directly through the scheduler.

With Portworx, you can manage any database or stateful service on any infrastructure using any container scheduler. You get a single data management layer for all of your stateful services, no matter where they run.

In this post, we will learn how to deploy Cassandra to Kubernetes and use Portworx Volumes to provide HA capability:

  1. Install and configure Portworx
  2. Use the Portworx Storage Class to create a PVC with 2 replicas of the data
  3. Use a simple YAML file to deploy Cassandra using this storage class
  4. Validate data persistence by deleting the Cassandra pod

First, we will deploy Cassandra in a StatefulSet with a single node (replicas=1) to show the basics of node failover. We will create sample data, force Cassandra to flush the data to disk, and then fail over the Cassandra pod to show that it comes back up with its data intact. Then we will scale the cluster to 3 nodes and dynamically create a volume for each.

Step #1: Validate Kubernetes

Use kubectl get nodes to check if the Kubernetes nodes are ready.
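As a rough sketch, the readiness check can be scripted instead of read by eye. The helper name not_ready_nodes is made up for this example, and it assumes the default column layout of kubectl get nodes (NAME, STATUS, ROLES, AGE, VERSION).

```shell
# Print the name of every node whose STATUS column is not exactly "Ready".
# Note: a cordoned node reports "Ready,SchedulingDisabled" and is listed too.
not_ready_nodes() {
  kubectl get nodes --no-headers | awk '$2 != "Ready" { print $1 }'
}
```

Calling not_ready_nodes should print nothing when every node is ready to take workloads.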

Image – Kubernetes Pods are ready

Step #2: Install Portworx

Portworx requires at least 2 to 3 nodes in the cluster to have dedicated storage for use. It will then carve out virtual volumes from these storage pools. In this example, we use a 20GB block device that exists on each node.

Image – Choose the device to install portworx

Image – Install Portworx

In the install command above, note the following parameters:

  • c=px-demo specifies the cluster name
  • b=true specifies to use internal etcd
  • kbVer=${VER} specifies the Kubernetes version
  • s=/dev/vdb specifies the block device to use
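The parameters above read like URL query parameters, which is how the Portworx spec generator is normally driven. The sketch below assembles such a URL from them. The base URL in PX_BASE and the helper name px_spec_url are assumptions for illustration; use the base URL from your own install command.

```shell
# Base URL of the spec generator (assumption; override PX_BASE to match your install command).
PX_BASE="${PX_BASE:-https://install.portworx.com/}"

# Build the spec URL from: $1 = cluster name, $2 = Kubernetes version, $3 = block device.
# b=true (internal etcd) is fixed here, as in the parameter list above.
px_spec_url() {
  printf '%s?c=%s&b=true&kbVer=%s&s=%s\n' "$PX_BASE" "$1" "$2" "$3"
}
```

With VER set to the cluster's Kubernetes version, the result could be applied with kubectl apply -f "$(px_spec_url px-demo "$VER" /dev/vdb)".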

Use kubectl get pods -n kube-system -l name=portworx -o wide to check that the Portworx pods are ready and in the Running state.

Image – Portworx pods are ready

You can also check the cluster status with the pxctl command.
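One way to run pxctl is from inside a Portworx pod. This is a minimal sketch: the helper names are invented here, the label selector comes from the kubectl command above, and /opt/pwx/bin/pxctl is assumed to be the binary's path inside the pod.

```shell
# Name of the first Portworx pod in kube-system.
px_pod() {
  kubectl get pods -n kube-system -l name=portworx \
    -o jsonpath='{.items[0].metadata.name}'
}

# Run "pxctl status" inside that pod (binary path is an assumption).
px_status() {
  kubectl exec "$(px_pod)" -n kube-system -- /opt/pwx/bin/pxctl status
}
```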

Now that the Portworx cluster is ready, we can proceed to the next step.

Step #3: Create StorageClass

StorageClass provides a way to describe the “classes” of storage. Various classes might map to quality-of-service levels, or to backup policies, or to arbitrary policies determined by the cluster administrators.

Storage classes differ according to the needs of the business application. For our scenario, we define the storage class below with a replication factor of 2 to speed up Cassandra node recovery. We also define a group name for Cassandra so that we can take 3DSnapshots.

Image – Cassandra StorageClass

Refer to the Portworx documentation for the full list of supported volume parameters.

Create the storage class using the kubectl create command.
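For readers without the screenshot, here is a sketch of a manifest matching that description. The name px-storageclass is the one the Cassandra manifest refers to. The group name cassandra_vg, the file name, and the in-tree provisioner kubernetes.io/portworx-volume are assumptions, not values confirmed by the original image.

```shell
# Write the StorageClass manifest to a local file.
cat > px-storageclass.yaml <<'EOF'
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: px-storageclass
provisioner: kubernetes.io/portworx-volume   # assumed in-tree Portworx provisioner
parameters:
  repl: "2"                # replication factor of 2
  group: "cassandra_vg"    # hypothetical volume group name
EOF
```

The file can then be submitted with kubectl create -f px-storageclass.yaml.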

Image – Create the storage class

In production environments, you should also add the "fg=true" parameter to your StorageClass. It ensures that Portworx places each Cassandra volume and its replicas on separate nodes, so that after a node failure a pod never fails over to a node where another Cassandra instance is already running. To enable this feature with a volume group of 3 volumes and 2 replicas each, you need a minimum of 6 worker nodes.
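The node count follows from simple arithmetic, sketched here. The helper name is made up, and the rule "volumes times replicas" is an assumption that happens to match the 3 x 2 = 6 figure quoted above.

```shell
# Minimum worker nodes when every replica of every volume needs its own node.
# $1 = volumes in the group, $2 = replicas per volume.
min_worker_nodes() {
  echo $(( $1 * $2 ))
}

min_worker_nodes 3 2   # prints 6
```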

With the StorageClass ready, let’s deploy Cassandra on the cluster.

Step #4: Deploy Cassandra

In this step, we are going to deploy Cassandra using a StatefulSet, starting with a single node (replicas: 1). A StatefulSet manages stateful applications by maintaining a sticky identity for each of its Pods: Kubernetes assigns each Pod a persistent identifier that it keeps across any rescheduling.

Create the Cassandra StatefulSet below. Its volume claim template requests a Portworx volume through the storage class created in the previous step.

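apiVersion: v1
kind: Service
metadata:
  labels:
    app: cassandra
  name: cassandra
spec:
  clusterIP: None
  ports:
  - port: 9042
  selector:
    app: cassandra
---
# apps/v1beta1 was removed in Kubernetes 1.16; on newer clusters use
# apps/v1 and add a spec.selector that matches the template labels.
apiVersion: "apps/v1beta1"
kind: StatefulSet
metadata:
  name: cassandra
spec:
  serviceName: cassandra
  replicas: 1
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      # Use the stork scheduler to enable more efficient placement of the pods
      schedulerName: stork
      containers:
      - name: cassandra
        # image: not present in this listing; set the Cassandra image to run here
        imagePullPolicy: Always
        ports:
        - containerPort: 7000
          name: intra-node
        - containerPort: 7001
          name: tls-intra-node
        - containerPort: 7199
          name: jmx
        - containerPort: 9042
          name: cql
        resources:
          limits:
            cpu: "500m"
            memory: 1Gi
          requests:
            cpu: "500m"
            memory: 1Gi
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "PID=$(pidof java) && kill $PID && while ps -p $PID > /dev/null; do sleep 1; done"]
        env:
        # The variable names (other than POD_IP) are assumed from the
        # upstream Kubernetes Cassandra example; only the values are given.
        - name: MAX_HEAP_SIZE
          value: 512M
        - name: HEAP_NEWSIZE
          value: 100M
        - name: CASSANDRA_SEEDS
          value: "cassandra-0.cassandra.default.svc.cluster.local"
        - name: CASSANDRA_CLUSTER_NAME
          value: "K8Demo"
        - name: CASSANDRA_DC
          value: "DC1-K8Demo"
        - name: CASSANDRA_RACK
          value: "Rack1-K8Demo"
        - name: CASSANDRA_AUTO_BOOTSTRAP
          value: "false"
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        readinessProbe:
          exec:
            command:
            - /bin/bash
            - -c
            - /   # probe script path is truncated in this listing
          initialDelaySeconds: 15
          timeoutSeconds: 5
        # These volume mounts are persistent. They are like inline claims,
        # but not exactly because the names need to match exactly one of
        # the stateful pod volumes.
        volumeMounts:
        - name: cassandra-data
          mountPath: /cassandra_data
  # These are converted to volume claims by the controller
  # and mounted at the paths mentioned above.
  volumeClaimTemplates:
  - metadata:
      name: cassandra-data
      annotations:
        volume.beta.kubernetes.io/storage-class: px-storageclass
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: cqlsh
spec:
  containers:
  - name: cqlsh
    image: mikewright/cqlsh
    command:
    - sh
    - -c
    - "exec tail -f /dev/null"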

Create the StatefulSet using the kubectl create command.

Image – Create a Cassandra StatefulSet

Use the kubectl get pods command to check that the pod is READY.
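The READY check can be scripted as well. This sketch (helper name invented here) assumes the default kubectl get pod columns, where the second column looks like 1/1.

```shell
# Succeed only when the pod's READY column shows all containers ready (e.g. "1/1").
# $1 = pod name, defaults to the first StatefulSet pod.
pod_ready() {
  kubectl get pod "${1:-cassandra-0}" --no-headers |
    awk '{ split($2, r, "/"); ok = (r[1] == r[2]) } END { exit ok ? 0 : 1 }'
}
```

A simple polling loop such as until pod_ready; do sleep 5; done waits for cassandra-0 to come up.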

Image – Validate if pods are ready

As an optional step, you can use the pxctl command line to inspect the volume underlying the Cassandra pod that we have created.
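A possible way to do this from the command line is sketched below. With the claim template cassandra-data and the pod cassandra-0, the PVC is named cassandra-data-cassandra-0. The helper names and the /opt/pwx/bin/pxctl path inside the Portworx pod are assumptions.

```shell
# Name of the volume bound to the first Cassandra PVC.
cassandra_volume() {
  kubectl get pvc cassandra-data-cassandra-0 -o jsonpath='{.spec.volumeName}'
}

# Inspect that volume with pxctl from inside a Portworx pod.
inspect_cassandra_volume() {
  px=$(kubectl get pods -n kube-system -l name=portworx \
    -o jsonpath='{.items[0].metadata.name}')
  kubectl exec "$px" -n kube-system -- \
    /opt/pwx/bin/pxctl volume inspect "$(cassandra_volume)"
}
```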

Image – Inspect volume using pxctl

From the output, infer the following:

  • State indicates that the volume is attached and shows the node on which it is attached. This is the node where the Kubernetes pod is running.
  • HA shows the number of configured replicas for this volume.
  • Labels show the name of the PVC for this volume.
  • Replica sets on nodes shows the px nodes on which volume is replicated.

Now that we have Cassandra ready, we can create a sample database and populate some data.

Step #5: Create a Cassandra Database

Initialize a sample database on our Cassandra instance using CQL commands.

Image – Connect to CQL Shell session

The next step is to create a keyspace with a replication factor of 3 and insert some sample data:

Image – Create a keyspace and insert sample data

Once the data is inserted, query the table to confirm that the rows exist.
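The CQL statements in the screenshots are not reproduced in the text, so the following is only a sketch of what such a session could look like. The keyspace demo, the table features, and the sample row are hypothetical. The cqlsh pod name and the Cassandra host name come from the manifest above.

```shell
CQL_HOST="cassandra-0.cassandra.default.svc.cluster.local"

# Run one CQL statement through the cqlsh helper pod.
cql() {
  kubectl exec cqlsh -- cqlsh "$CQL_HOST" -e "$1"
}

# Create a keyspace and table, insert one row, then read it back.
seed_sample_data() {
  cql "CREATE KEYSPACE IF NOT EXISTS demo WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 3};"
  cql "CREATE TABLE IF NOT EXISTS demo.features (id varchar PRIMARY KEY, name varchar);"
  cql "INSERT INTO demo.features (id, name) VALUES ('px-1', 'snapshots');"
  cql "SELECT id, name FROM demo.features;"
}
```

If the cqlsh client and the server disagree on the protocol version, cqlsh may need an explicit --cqlversion option.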

Image – Select rows from the keyspace

Now that the records are created, we can check whether failover works properly. Before that, we have to flush the in-memory data to disk with the nodetool flush command, so that when Cassandra starts on another node it has access to the data that was just written. Cassandra keeps recent writes in memory and flushes them to its data files only periodically.
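A minimal sketch of the flush, run inside the Cassandra pod (the helper name is invented for this example):

```shell
# Flush Cassandra's in-memory tables to disk.
# $1 = pod name, defaults to the first StatefulSet pod.
flush_cassandra() {
  kubectl exec "${1:-cassandra-0}" -- nodetool flush
}
```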

Image – Flush data to disk

Step #6: Delete Cassandra Instance

Let us simulate a failure by cordoning the node where Cassandra is running and then deleting the Cassandra pod. Because the node is cordoned, the pod must be rescheduled elsewhere, and the scheduler places it on one of the nodes that hold a replica of the data.
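The two actions can be sketched as one helper. The function and variable names are made up for this example. It looks up the node from the pod's spec.nodeName field, cordons it, and deletes the pod.

```shell
# Cordon the node hosting the pod, then delete the pod so it must restart elsewhere.
# $1 = pod name, defaults to cassandra-0. The node name is kept in CORDONED_NODE.
fail_cassandra_pod() {
  pod="${1:-cassandra-0}"
  CORDONED_NODE=$(kubectl get pod "$pod" -o jsonpath='{.spec.nodeName}')
  kubectl cordon "$CORDONED_NODE"
  kubectl delete pod "$pod"
}
```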

Image – Delete Cassandra instance

Once the Cassandra pod is deleted, Kubernetes starts to create a new Cassandra pod on another node. Use kubectl get pods to verify this. When the pod comes back up, it will be in the RUNNING and READY (1/1) state.

Image – Verify replacement pod starts running

Also, we have to uncordon the node before the next step.
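If you no longer have the node name at hand, one option is to uncordon every node that still reports SchedulingDisabled. This sketch (helper name invented here) assumes the default kubectl get nodes columns.

```shell
# Uncordon every node whose STATUS contains "SchedulingDisabled".
uncordon_all() {
  for n in $(kubectl get nodes --no-headers |
             awk '$2 ~ /SchedulingDisabled/ { print $1 }'); do
    kubectl uncordon "$n"
  done
}
```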

Image – Uncordon node

Now that the new Cassandra pod is running, let’s check whether the database we previously created is still intact.

Step #7: Verify data is still available

Let’s start a CQL Shell session and validate if the data is available.

Image – Verify if data is still available

Congrats! Our data is intact and has survived the node failure.

Step #8: Scale the cluster

We will scale our Cassandra StatefulSet to 3 replicas using the kubectl scale command.
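A sketch of the scale-out, with helper names invented for this example. The second helper runs nodetool status in the first pod, which lists the Cassandra nodes that have joined the ring.

```shell
# Scale the cassandra StatefulSet; $1 = replica count, defaults to 3.
scale_cassandra() {
  kubectl scale statefulset cassandra --replicas="${1:-3}"
}

# Show which Cassandra nodes have joined the ring.
ring_status() {
  kubectl exec cassandra-0 -- nodetool status
}
```

While the new pods start, kubectl get pods -l app=cassandra -w streams their status changes.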

Image – Scale the cluster

You can watch the pods getting added:

Image – Cluster scaled

It will take a minute or two for all three Cassandra nodes to come online and discover each other.
