When writing a piece of software, we are generally thinking about solving a particular problem or adding a specific functionality.
But, we are also thinking about how this piece of software will evolve in the future, because we know that our software will need to change: We carefully modularize our code, we write tests and even go the extra mile of configuring its continuous integration and continuous deployment scripts.
On the other hand, when designing systems, we are more concerned about:
- how it will scale to our user base ?
- how will the response time be affected ?
- will it still fulfil the initial requirements ?
I have been in the software engineering field for more than a decade and a half, within diverse and varied industries, and I can assert, from experience, that answering these questions is not always straightforward.
At the same time, we want to ensure that the systems we build age well at with the company, and that they keep our business nimble and our customers happy.
In this article, we explore how combining Kafka and Kubernetes support architecting for scale by integrating event streaming with microservices to produce scalable event-driven architectures without burning piles of cash in required engineering effort.
A step-by-step guide to deploy both tools as well as setting-up basic monitoring for the clusters is provided at the end of this article.
The Need, and the Problem, for Autonomy
First of all, why do we need such tools? Is it only to keep up with the latest and greatest and buzz-word technologies ? Not really. Let me explain:
As a company grows it forms into teams, and those teams have different responsibilities and need to be able to make progress without extensive interaction with one another.
The larger the company, the more of this autonomy they need, and most importantly, a balance must be struck in terms of the way people, responsibility and communication structures are arranged in a company.
This balance translates inevitably to the software.
Say for instance that when working on a project and someone from another team asks if their application can pull data out of your database, you know that this is probably going to lead to problems down the road. You will keep wondering if your latest release will break the dependency they have on you.
That’s why, among other reasons, we design software systems where components are operated and evolved independently.
When architecting for scale, we are more concerned about how to build software that runs a company while organizing a large engineering effort.
The problem we face has three distinct, but linked, parts: organization, software and data.
What differentiates the good architecture from the bad, is their ability to manage these three factors as they evolve, independently, over time.
Now, let’s explore how combining microservices and event-based architectures can help, in this case, with Kubernetes and Apache Kafka.
Cloud-Native with Kafka and Kubernetes
Apache Kafka and Kubernetes cover complementary concerns — event streaming and container orchestration. Used together, they give you a solid base for distributed, cloud-native systems.
Apache Kafka is a distributed event streaming platform built for high-throughput, low-latency communication between services.
Kubernetes automates the deployment, scaling, and management of containerized applications.
Here’s how they complement each other:
1.Scalability
Kafka is designed to handle high-throughput data streams and scale horizontally. It can manage large volumes of data and a high number of concurrent producers and consumers.
Kubernetes handles the deployment, scaling, and management of containerized applications, scaling them dynamically based on demand.
Kafka’s partitioning and replication mechanisms ensure that data can be distributed and processed efficiently across multiple brokers.
Similarly for containers, Kubernetes manages their lifecycle, automatically restarting failed containers and scaling out additional instances as needed.
When Kafka runs on Kubernetes, brokers can be scaled up or down as data volume or processing demand changes. Kubernetes handles the orchestration, keeping the cluster balanced and operational.
2. Resilience and high availability
Kafka ensures high availability through data replication across multiple brokers. If one broker fails, another broker with the replicated data can take over to support fault tolerance.
Kubernetes provides self-healing capabilities, such as automatic restarts, rescheduling of failed containers, and node health checks. This ensures that applications continue running smoothly even in the event of hardware or software failures.
3. Ease of management
Kafka requires careful management of brokers, topics, partitions, and consumer groups. Kubernetes provides declarative configuration for deploying and managing applications.
Tools like Helm and Kubernetes Operators (e.g., Strimzi for Kafka) simplify the deployment and management of Kafka clusters (Follow the link below for a step-by-step deployment with Strimzi).
Also, Kubernetes’ built-in monitoring and logging capabilities, along with integrations with Prometheus and Grafana, provide deep insights into the health and performance of Kafka clusters.
With Kubernetes, Kafka clusters can be managed declaratively, reducing the operational overhead. Kubernetes Operators like Strimzi automate complex tasks such as broker configuration, rolling upgrades, and scaling.
4. Resource efficiency
Kafka’s performance is highly dependent on resource allocation and efficient use of CPU, memory, and disk I/O.
To help with this, Kubernetes efficiently schedules and allocates resources to containers based on their requirements and the overall cluster resources. It can ensure that no single application monopolizes resources at the expense of others.
The takeaway
Architecting at scale is not an easy task. We are organizing a large engineering effort to build software that is basically running the company.
Kafka and Kubernetes solve different problems, but they pull in the same direction: giving teams the autonomy to build, deploy, and scale services independently without coordinating every change.
If you’re running distributed systems at any meaningful scale, combining them is worth the investment.