Announcing Dragonfly Kubernetes Operator General Availability
We are thrilled to announce the general availability of the Dragonfly Kubernetes Operator.
November 6, 2023
Introduction
We are excited to announce that the Kubernetes Operator for Dragonfly is now generally available, making it simple to run and manage Dragonfly on Kubernetes. Dragonfly is a data store built for modern cloud workloads, and Kubernetes is the leading orchestration engine for those workloads. Together they are a natural fit for anyone looking to architect resilient, reliable, and performant applications.
Along with general availability, we are also excited to announce new capabilities such as advanced snapshotting, enterprise-grade security, and performance and reliability enhancements.
To get started immediately, visit our newly updated Dragonfly Operator documentation.
Advanced Snapshotting
Snapshotting has always been a reliable data backup mechanism for Dragonfly. In this latest release, we are taking it to the next level, ensuring that snapshotting is more seamlessly integrated with Kubernetes and cloud storage solutions. With the introduction of the high-level snapshot field in the Dragonfly Custom Resource Definition (CRD), configuring and utilizing snapshotting has never been easier.
By setting up this configuration, you empower Dragonfly to automatically handle data backups during pod terminations as well as restores when a pod comes back up again, minimizing downtime and maintaining the integrity of your operations.
Dragonfly Kubernetes Operator supports snapshotting in two ways: Persistent Volume Claims (PVC) and Cloud Storage. Each option offers unique advantages, catering to different use cases and preferences.
1. Persistent Volume Claims (PVC)
A PersistentVolume (PV) is the Kubernetes abstraction for disk storage provided by underlying cloud or on-premises infrastructure. A PersistentVolumeClaim (PVC) is a request for that storage by your applications. With the snapshot.persistentVolumeClaimSpec field, you can use the standard Kubernetes PVC syntax to configure storage for Dragonfly snapshots.
apiVersion: dragonflydb.io/v1alpha1
kind: Dragonfly
metadata:
  name: dragonfly-instance-snapshotting-to-pvc
spec:
  replicas: 1
  snapshot:
    cron: '*/5 * * * *'
    persistentVolumeClaimSpec: # uses standard Kubernetes PVC API
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 2Gi
More details can be found in the Dragonfly Operator snapshots to PVC documentation.
Video: https://www.youtube.com/embed/chqlgwsPS6M
2. Cloud Storage
Dragonfly has recently added support for snapshotting to S3-compatible cloud storage. This allows snapshot files to be written to and read from an S3 bucket directly, using the --dir s3://<> server argument. To use this feature, the environment must be configured with the necessary credentials.
The same approach works with a Dragonfly instance managed by the operator when the snapshot.dir field is set accordingly. Additionally, for those using managed Kubernetes services such as Amazon EKS, there are tools available to attach an IAM role directly to a Kubernetes service account. This simplifies credential management, automating rotation based on the pod's lifecycle and eliminating the need to handle long-lived credentials.
apiVersion: dragonflydb.io/v1alpha1
kind: Dragonfly
metadata:
  name: dragonfly-instance-snapshotting-to-s3
spec:
  replicas: 1
  serviceAccountName: dragonfly-s3-svc-acc # service account with S3 permissions
  snapshot:
    dir: 's3://dragonfly-snapshots' # S3 bucket name
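The manifest above references a service account named dragonfly-s3-svc-acc. On Amazon EKS, one way to provide it is IAM Roles for Service Accounts, which binds an IAM role to the account through the eks.amazonaws.com/role-arn annotation. The following is a minimal sketch under stated assumptions: the account ID and role name are hypothetical placeholders, and the role is assumed to already exist with access to the bucket and a trust policy for the cluster's OIDC provider.

```yaml
# Sketch only: the role ARN below is a hypothetical placeholder, not a real role.
apiVersion: v1
kind: ServiceAccount
metadata:
  # Must match spec.serviceAccountName in the Dragonfly resource
  name: dragonfly-s3-svc-acc
  annotations:
    # EKS IAM Roles for Service Accounts annotation
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/dragonfly-s3-snapshots
```

The service account should live in the same namespace as the Dragonfly resource so that the pods can use it.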
Enterprise-Grade Security
1. Client Authentication
With the introduction of the authentication field in the Dragonfly Operator configuration, we have streamlined the process of authenticating clients connecting to your Dragonfly instance. Currently, two methods are supported: passwordFromSecret and clientCaCertSecret.
passwordFromSecret utilizes Kubernetes Secrets to store and manage credentials. By specifying a secret in your configuration, Dragonfly will automatically retrieve the value stored under the specified key and use it as the authentication password for clients.
apiVersion: dragonflydb.io/v1alpha1
kind: Dragonfly
metadata:
  name: dragonfly-instance-with-password-auth
spec:
  replicas: 1
  authentication:
    passwordFromSecret:
      name: dragonfly-auth
      key: password
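For the configuration above to resolve, a Secret named dragonfly-auth with a password key has to exist in the same namespace. A minimal sketch of such a Secret, using the standard Kubernetes Secret API, could look like this. The password value is a placeholder and should be replaced with a strong, generated one.

```yaml
# Sketch only: the password value is a placeholder.
apiVersion: v1
kind: Secret
metadata:
  name: dragonfly-auth   # referenced by passwordFromSecret.name
type: Opaque
stringData:
  password: change-me    # referenced by passwordFromSecret.key
```

Using stringData lets you write the value in plain text. Kubernetes stores it base64-encoded under data.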
clientCaCertSecret enhances security with TLS by having Dragonfly verify client certificates. Setting this up requires a few more steps. Detailed instructions for both methods can be found in the Dragonfly Operator authentication documentation.
2. Server TLS
The Dragonfly Kubernetes Operator now supports the integration of TLS certificates. By specifying a Kubernetes Secret in your Dragonfly instance configuration, you can ensure that the certificates are propagated and configured. This results in encrypted communication between clients and the Dragonfly server, safeguarding network traffic from man-in-the-middle attacks.
Instructions for using TLS with cert-manager are available in the Dragonfly Operator server TLS documentation.
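As a rough illustration of how the pieces fit together, the sketch below has cert-manager issue a certificate into a Secret and then points a Dragonfly instance at that Secret. It rests on several assumptions that should be checked against the server TLS documentation: the tlsSecretRef field name reflects our reading of the operator's CRD, the issuer name my-issuer and the DNS name are hypothetical, and password authentication is included on the assumption that Dragonfly expects authentication to be enabled alongside TLS.

```yaml
# Sketch only: issuer name, DNS name, and resource names are hypothetical.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: dragonfly-sample
spec:
  secretName: dragonfly-sample   # Secret that cert-manager creates (tls.crt, tls.key)
  dnsNames:
    - dragonfly-sample.default.svc.cluster.local
  issuerRef:
    name: my-issuer              # assumed to exist already
    kind: Issuer
---
apiVersion: dragonflydb.io/v1alpha1
kind: Dragonfly
metadata:
  name: dragonfly-sample
spec:
  replicas: 2
  authentication:
    passwordFromSecret:
      name: dragonfly-auth
      key: password
  tlsSecretRef:                  # assumed field name, verify against the CRD
    name: dragonfly-sample
```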
Monitoring and Reliability
1. Monitoring with Prometheus & Grafana
Prometheus is the de facto standard for collecting and storing metrics in Kubernetes. We have new documentation on how to install the Prometheus Operator and use it to collect and store Dragonfly metrics.
Grafana can then be used to start visualizing these important metrics. We provide custom dashboards with important metrics that you can directly load and start monitoring your instances.
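With the Prometheus Operator installed, scraping is typically configured through a PodMonitor resource. The sketch below shows the general shape. The label app: dragonfly-sample and the port name admin are assumptions about how the operator labels pods and names the metrics port, so confirm both against the monitoring documentation or with kubectl describe pod before relying on them.

```yaml
# Sketch only: the pod label and port name are assumptions, not confirmed values.
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: dragonfly-sample
spec:
  selector:
    matchLabels:
      app: dragonfly-sample   # assumed label on the Dragonfly pods
  podMetricsEndpoints:
    - port: admin             # assumed name of the port that serves /metrics
```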
2. Reliability - Custom Rollout Strategy
Rather than relying on the default Kubernetes StatefulSet update behavior, the Dragonfly Operator takes a proactive and controlled approach. When any modification is made to the Dragonfly custom resource, the Operator first starts the upgrade with the replicas. It upgrades each replica, pausing to confirm that at least one replica is ready before proceeding. Following this validation, the master is upgraded, with the Operator selecting one of the already-upgraded replicas to assume the master role. The whole rollout is automatic and requires no additional operational input.
3. Reliability - Using the REPLTAKEOVER Command for Upgrades
In previous iterations, upgrading Dragonfly presented certain challenges. Particularly, the abrupt transition from the old version of the master to a new one could result in potential data inconsistencies. Clients were not locked during this transition, meaning that writes to the old master might not have been fully propagated to the new master, leading to data loss.
The REPLTAKEOVER command addresses these challenges by locking the old master and ensuring that all ongoing operations are completed. Only once this steady state is reached does the system proceed to migrate to the new master.
Finally, with the recent change to use a ClusterIP Service instead of a headless Service, failover updates are propagated to clients faster.
Benchmarks
Last but not least, Dragonfly is known for being ultra-performant and extremely reliable. Even with all the abstraction layers in Kubernetes and Dragonfly running in a containerized environment, we are able to achieve 1.3 million QPS with sub-millisecond P99.9 latency on an AWS c6gn.8xlarge instance. The load is generated with the memtier_benchmark tool. Detailed benchmarking steps and results can be found in the accompanying video.
Conclusion
Kubernetes is designed for managing complex production workloads, and Dragonfly is designed to make it easy to scale those same workloads with unparalleled performance. We're excited to see what you build. If you would like a free trial of a fully managed Dragonfly Cloud account, please request one here.
Also, we will be hosting an online Technical Workshop about Dragonfly Operator on Nov 15, 2023. It's a great chance to connect and learn, and if you are interested, please register here.