KaaS - Kafka as a Service

Introduction

This document walks through the design specification for Kafka as a Service (KaaS).

The key aspect is to offer Kafka as a managed service operated by the Company. KaaS customers would leverage Kafka without the overhead of deployment and total cost of ownership (TCO); the producers and consumers, however, would reside within the customer's boundary.

System Architecture

Kafka Key Attributes

Message

  • A record or unit of data within Kafka. Each message has a key and a value, and optionally headers.

Producer

  • Producers publish messages to Kafka topics. Producers decide which topic partition to publish to, either randomly (round-robin) or using a partitioning algorithm based on a message's key.
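
As an illustration, a minimal producer sketch using the kafka-python library (an assumed client library; the broker address and topic name are placeholders). Messages that share a key always land on the same partition, while messages without a key are spread across partitions:

      # Minimal producer sketch using kafka-python (assumed library);
      # broker address and topic name are placeholders.
      from kafka import KafkaProducer

      producer = KafkaProducer(bootstrap_servers=["broker1:9092"])

      # With a key: the partitioner hashes the key, so every message for
      # "customer-42" goes to the same partition (preserving its order).
      producer.send("foo", key=b"customer-42", value=b"event-1")

      # Without a key: messages are spread across partitions.
      producer.send("foo", value=b"event-2")

      producer.flush()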

Broker

  • Kafka runs in a distributed system or cluster. Each node in the cluster is called a broker.

Topic

  • A topic is a category to which data records — or messages — are published. Consumers subscribe to topics in order to read the data written to them.

Topic partition

  • Topics are divided into partitions, and each message is given an offset. Each partition is typically replicated once or twice (i.e., two or three copies in total). Each partition has a leader and one or more replicas (copies of the data) that exist on followers, providing protection against a broker failure. A broker typically acts as leader for some partitions and follower for others, but it holds at most one replica of any given topic partition. The leader is used for all reads and writes.

Offset

  • Each message within a partition is assigned an offset, a monotonically increasing integer that serves as a unique identifier for the message within the partition.

Consumer

  • Consumers read messages from Kafka topics by subscribing to topic partitions. The consuming application then processes the message to accomplish whatever work is desired.

Consumer group

  • Consumers can be organized into logical consumer groups. Topic partitions are assigned so as to balance the assignments among all consumers in the group. Within a consumer group, all consumers work in a load-balanced mode; in other words, each message is seen by exactly one consumer in the group. If a consumer goes away, its partitions are assigned to another consumer in the group; this is referred to as a rebalance. If there are more consumers in a group than partitions, some consumers will be idle. If there are fewer consumers in a group than partitions, some consumers will consume messages from more than one partition.
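
A minimal consumer-group sketch (kafka-python assumed; the group and topic names are placeholders). Running this script twice with the same group_id causes Kafka to rebalance the partitions of foo between the two instances:

      # Minimal consumer-group sketch using kafka-python (assumed library).
      from kafka import KafkaConsumer

      # All consumers created with the same group_id form one consumer
      # group; Kafka assigns each partition of "foo" to exactly one of them.
      consumer = KafkaConsumer(
          "foo",
          bootstrap_servers=["broker1:9092"],
          group_id="billing-service",      # hypothetical group name
          auto_offset_reset="earliest",
      )

      for message in consumer:
          # Each message is seen by only one consumer in the group.
          print(message.partition, message.offset, message.value)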

Lag

  • A consumer is lagging when it is unable to read from a partition as fast as messages are produced to it. Lag is expressed as the number of offsets the consumer is behind the head of the partition. The time required to recover from lag (to "catch up") depends on how quickly the consumer can consume relative to the production rate; roughly: catch-up time ≈ current lag / (consumption rate - production rate), with both rates in messages per second.

Goals

Multi Tenant KaaS

SLA

  • Availability - 100% availability desired; realistic target of 99.9999%

  • Latency - Low latency

  • Durability - 0% data loss

Distributed Data System (KaaS)

  • High Availability

Fault Tolerance

  • Network Failures

  • Data corruption on disk or Disk failure

  • Bad code resulting in bugs and failure

Durable

    • Partitions

    • Replication factor

Adapt to Change for Distributed components

  • New Software versions need to be deployed to 1000s of machines

  • Capacity planning - expand the cluster with increase in load

  • Data skew across machines - need re-balancing

Monitoring Systems

  • Track the key metrics for system health with accuracy (choosing the right set of metrics comes with time and experience)

Components

Kafka Cluster

    • The Kafka Cluster is a collection of Kafka Broker nodes

  • Disk Based

  • Durable

  • Scalable

  • Low Latency

  • Finite Retention (Size of data or time for retention - e.g. 7 days)
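
Retention could be bounded with the standard broker settings, for example (these go in each broker's server.properties; the values below are illustrative):

      # Illustrative retention settings (server.properties); values are examples
      log.retention.hours=168          # time-based retention: 7 days
      log.retention.bytes=1073741824   # size cap per partition: 1 GB
      log.segment.bytes=268435456      # log segment size: 256 MB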

Zookeeper Ensemble

    • The Zookeeper Ensemble is a collection of Zookeeper nodes. The Kafka cluster is managed by the Zookeeper ensemble.

  • Use SSDs; consumers commit offsets frequently, so Zookeeper needs low-latency storage

Monitoring

    • The monitoring pipeline collects, processes, and visualizes the variety of metrics produced in the system

    • The statsd client would run as a service on each node in the Kafka cluster and Zookeeper ensemble, with custom metrics pushed to statsd

    • The statsd client would also collect system metrics

    • The metrics from statsd would be streamed into and stored in Graphite (the metric DB)

    • Metric threshold alerts would be configured in Graphite, with alert delivery to Pager Duty

    • Pager Duty is an alert management tool that delivers alerts to the on-call team

    • Grafana would be the metric visualization/monitoring tool, presenting metrics as graphs

    • The monitoring components Graphite, Grafana, and Pager Duty would reside outside the Kafka Cluster VPC (a sketch of pushing custom metrics follows this list)
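
A sketch of pushing a custom metric into this pipeline with the Python statsd client (an assumed library; the metric names are illustrative):

      # Sketch: push custom metrics to the local statsd daemon
      # (Python "statsd" package assumed; metric names are illustrative).
      import statsd

      client = statsd.StatsClient("localhost", 8125)

      # Counter: incremented for every message a broker-side agent observes.
      client.incr("kaas.broker1.messages_in")

      # Gauge: current count of under-replicated partitions on this broker.
      client.gauge("kaas.broker1.under_replicated_partitions", 0)

      # Timer: request latency in milliseconds.
      client.timing("kaas.broker1.produce_request_ms", 12)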

Logging

    • The logs generated by the Kafka brokers and Zookeeper would be filtered down to the Error class of logs

    • The logs would be written to files on disk with bounded file size and file count, with logrotate keeping disk usage in a safe state (see the sketch after this list)

    • The logs would also be streamed out to a logging-as-a-service stack (Kibana) to ensure visibility into errors and disasters

    • The logging component (Kibana) would reside outside the Kafka Cluster VPC
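
For a node-local agent, the same bounded-size, bounded-count policy can be sketched with Python's standard logging module (the file path and sizes are illustrative; for the brokers' own log4j files, logrotate enforces the equivalent policy):

      # Sketch: bounded log files (size and count) with Python's stdlib.
      import logging
      from logging.handlers import RotatingFileHandler

      # Keep at most 5 files of 50 MB each; the oldest is then discarded.
      handler = RotatingFileHandler(
          "/var/log/kaas/agent.log",   # hypothetical path
          maxBytes=50 * 1024 * 1024,
          backupCount=5,
      )
      handler.setFormatter(
          logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

      logger = logging.getLogger("kaas")
      logger.addHandler(handler)
      logger.setLevel(logging.ERROR)   # only the Error class of logs

      logger.error("disk check failed on broker1")  # example entry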

DDoS

    • The DDoS component would be the first layer of defense for KaaS

    • DDoS protection would be an external service provided by vendors such as Incapsula (DDoS) and CloudFlare (DDoS, DNSSEC); CloudFlare is picked as the vendor for its feature set

    • Customer connections would hit the DDoS layer first.

API Gateway

    • The API Gateway is the second layer of defense, throttling on burst rate, quota limits, etc. (a token-bucket sketch follows this section)

    • This layer could also facilitate API-key-based auth
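
The throttling could follow a standard token-bucket scheme; a minimal sketch (the burst and refill rates are illustrative, and this is not any specific gateway's API):

      # Sketch: token-bucket throttling as an API gateway might apply it.
      import time

      class TokenBucket:
          def __init__(self, rate_per_sec, burst):
              self.rate = rate_per_sec    # steady-state requests/second
              self.capacity = burst       # maximum burst size
              self.tokens = burst
              self.last = time.monotonic()

          def allow(self):
              now = time.monotonic()
              elapsed = now - self.last
              self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
              self.last = now
              if self.tokens >= 1:
                  self.tokens -= 1
                  return True
              return False   # caller should reject, e.g. with HTTP 429

      bucket = TokenBucket(rate_per_sec=100, burst=200)  # per-customer quota
      print(bucket.allow())  # True until the burst is exhausted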

ELB

    • The Elastic Load Balancer distributes the load to the Kafka brokers

Jumpbox

    • The Jumpbox would be used to access the Kafka cluster and zookeeper nodes from outside the VPC

Orchestration

    • This module needs to evolve based on customer-onboarding and management requirements

    • The key responsibility could be to create customer topics with a partition count and replication factor, plus ACLs (see the sketch below)
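
A sketch of what the topic-creation step could look like with kafka-python's admin client (an assumed library; broker address and topic name are placeholders; ACLs could be attached with the standard kafka-acls.sh tool shown in the Security section):

      # Sketch: orchestration step creating a customer topic
      # (kafka-python admin client assumed; names are placeholders).
      from kafka.admin import KafkaAdminClient, NewTopic

      admin = KafkaAdminClient(bootstrap_servers=["broker1:9092"])

      # Partition count and replication factor as chosen by the customer.
      topic = NewTopic(name="foo", num_partitions=3, replication_factor=3)
      admin.create_topics([topic])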

System Flow

Customer On-boarding

    • Customer on-boarding involves creating ACLs for the customer on the Kafka brokers

    • It includes creating an API Gateway plan for the customer (burst rate, etc.)

    • It would also include license attributes, e.g. the maximum number of topics, partitions, etc.

    • The customer would receive FQDN-Broker for the Kafka brokers and FQDN-Zookeeper for the Zookeeper ensemble

Customer Orchestration Path

    • The customer would create a topic (e.g. foo) with a partition count and replication factor

    • This could be an orchestration service within KaaS that facilitates this functionality for the customer

Customer Control Path

  • The customer creates a producer with FQDN-Broker (listed three times); this list acts as metadata.broker.list

  • The customer sends a message to topic foo using this producer.

  • The producer looks up its local metadata cache to see which brokers are leaders for each partition of topic foo and how many partitions the topic has. As this is the producer's first send, the local cache is empty.

  • The producer sends a TopicMetadataRequest to each broker in metadata.broker.list (3 * FQDN-Broker) sequentially until the first successful response; a single live broker in that list is sufficient.

  • The returned TopicMetadataResponse contains information about the requested topics (here, foo) and the brokers in the cluster. The response contains the following:

  • a list of brokers in the cluster, where each broker has an ID, host, and port. This list may not contain every broker in the cluster, but should contain at least the brokers responsible for servicing the subject topic. The host and port are the public PAT mappings to the Kafka brokers

  • a list of topic metadata, where each entry has the topic name, number of partitions, the leader broker ID for each partition, and the ISR broker IDs for each partition.

  • Based on the TopicMetadataResponse, the producer builds up its local cache and now knows exactly that a request for topic foo partition 0 should go to broker X.

  • Based on the number of partitions in the topic, the producer partitions the message and accumulates it in a batch destined for the appropriate broker.

  • When the batch is full or the linger.ms timeout passes, the producer flushes the batch to the broker. "Flushing" means opening a new connection to the broker, or reusing an existing one, and sending the ProduceRequest.

  • The producer does not need to open unnecessary connections to all brokers, as the topic being produced to may not be serviced by every broker, and the Kafka cluster could be quite large (imagine a 1,000-broker cluster with many topics). A client-side sketch of this flow follows.
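
A minimal client-side sketch of the flow above (kafka-python assumed; the endpoint and the tunables are placeholders). The client library handles the metadata lookup and batching internally:

      # Sketch of the control path above from the client side
      # (kafka-python assumed; endpoint and tunables are placeholders).
      from kafka import KafkaProducer

      producer = KafkaProducer(
          # metadata.broker.list equivalent: FQDN-Broker listed three times
          bootstrap_servers=["kaas.example.com:9092"] * 3,
          linger_ms=50,       # flush a batch after 50 ms even if not full
          batch_size=16384,   # ...or once 16 KB of records accumulate
      )

      # The first send triggers the metadata request described above; the
      # local cache then routes records directly to each partition leader.
      producer.send("foo", key=b"customer-42", value=b"hello")
      producer.flush()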

Security

Kafka Supported Options

    • Plain Text (no encryption or authentication)

    • SSL (Encryption/Authentication)

    • SASL (Kerberos Authentication)

    • SSL + SASL (SSL for Encryption and SASL for Authentication)

Kafka Security has three components

  • Encryption of data in-flight using SSL / TLS: This allows your data to be encrypted between your producers and Kafka and your consumers and Kafka. This is a very common pattern everyone has used when going on the web. That’s the “S” of HTTPS (that beautiful green lock you see everywhere on the web).

  • Authentication using SSL or SASL: This allows your producers and your consumers to authenticate to your Kafka cluster, which verifies their identity. It’s also a secure way to enable your clients to endorse an identity. Why would you want that? Well, for authorization!

  • Authorization using ACLs: Once your clients are authenticated, your Kafka brokers can check them against access control lists (ACLs) to determine whether a particular client is authorised to write to or read from a given topic.

Leverage TLS

    • TLS (Transport Layer Security) encrypts the connection between 2 endpoints for secure data exchange

    • Based on SSL certificates

    • Two ways of using TLS (Kafka can use both):

    • 1-way verification: Encryption

    • 2-way verification: Authentication

Security of connections (data in-flight)

    • TLS 1.2 Encryption + Authentication: between brokers and clients (producers and consumers)

    • TLS 1.2 Encryption: between brokers

    • TLS 1.2 Encryption: between brokers and other tools

    • TLS 1.2 Encryption: between brokers and Zookeeper
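
Client side, TLS is enabled with the standard SSL settings; a kafka-python sketch (the file paths and endpoint are placeholders). Supplying a client certificate and key gives 2-way verification, i.e. mutual authentication:

      # Sketch: TLS 1.2 encryption + client authentication from the
      # client side (kafka-python assumed; paths are placeholders).
      from kafka import KafkaProducer

      producer = KafkaProducer(
          bootstrap_servers=["kaas.example.com:9093"],  # TLS listener
          security_protocol="SSL",
          ssl_cafile="/etc/kaas/ca.pem",        # verify the broker (1-way)
          ssl_certfile="/etc/kaas/client.pem",  # client cert: enables 2-way
          ssl_keyfile="/etc/kaas/client.key",   # verification (authentication)
      )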

Authorization of Read and Write

    • Achieved via ACLs governing reads and writes to topics on a broker (see the example below)
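
For illustration, with Kafka's standard kafka-acls.sh tool (the principal, topic, and Zookeeper host are placeholders):

      # Grant a customer's principal read/write on its own topic only
      # (principal, topic, and Zookeeper host are placeholders).
      kafka-acls.sh --authorizer-properties zookeeper.connect=zk1:2181 \
        --add --allow-principal User:CN=customer1 \
        --operation Read --operation Write --topic foo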

Security of data at-rest

    • Encryption of data on disk

Monitoring Systems

Monitor Kafka Broker

  • Cluster Health and Utilization

  • Under-replicated partitions - the in-sync replica (ISR) count is low (0 or 1); a check sketch follows this list

  • Offline partitions (any value other than 0 is a problem)

  • Broker partition count (number of partitions each broker manages, whether as replica or leader); this should be even across the cluster

  • Data size on disk

  • Leader partition count (the number of leaders per broker should be even, since traffic goes to leaders); ensure traffic is balanced across the board

  • Network Utilization
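
A sketch of one such check, feeding the statsd pipeline described under Monitoring; it shells out to Kafka's standard kafka-topics.sh tool (the Zookeeper host and metric name are placeholders):

      # Sketch: count under-replicated partitions and push the count as
      # a gauge to statsd (Zookeeper host, metric name are placeholders).
      import subprocess
      import statsd

      out = subprocess.run(
          ["kafka-topics.sh", "--zookeeper", "zk1:2181",
           "--describe", "--under-replicated-partitions"],
          capture_output=True, text=True, check=True,
      ).stdout

      # The tool prints one line per under-replicated partition.
      count = len([line for line in out.splitlines() if line.strip()])
      statsd.StatsClient("localhost", 8125).gauge(
          "kaas.under_replicated_partitions", count)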

Monitor Zookeepers

  • Ensemble available

  • Latency (SSDs help Zookeeper here)

  • Outstanding requests

Monitor Mirror Makers and Audit

  • Mirror Maker (creates a mirror of cluster1 onto cluster2)

  • Lag - how far behind cluster2 is in copying messages (should be zero)

  • Dropped messages (only 0 is acceptable)

Monitor Audit Consumer

  • To assure pipelines are healthy all the time

  • The producer emits messages plus a periodic messages_count

  • The audit consumer monitors lag

  • Completeness check - messages_count produced vs. messages present (alert if not 100%); a sketch follows
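
A minimal sketch of the completeness check (the counts and the alert hook are illustrative; in practice the alert would flow through the statsd -> Graphite -> Pager Duty pipeline):

      # Sketch: completeness check comparing the producer's reported
      # messages_count against the audit consumer's own tally.
      def completeness_pct(produced_count, consumed_count):
          if produced_count == 0:
              return 100.0
          return 100.0 * consumed_count / produced_count

      pct = completeness_pct(produced_count=10000, consumed_count=9997)
      if pct < 100.0:
          # illustrative alert hook; real delivery goes via Pager Duty
          print("ALERT: completeness %.2f%% < 100%%" % pct)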

Monitor REST Interfaces

    • Synthetic connection metrics

Synthetic Heartbeat

    • The synthetic heartbeat is achieved by creating a synthetic customer

    • This customer's activity and metrics could be monitored from the client end to capture the client-side experience

    • This could capture any drop in user experience

    • Moreover, this could simulate different customer use-cases to ensure all customer flows are functional (a sketch follows this list)
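
A sketch of the synthetic customer's heartbeat: produce a timestamped message and consume it back, measuring the end-to-end round trip (kafka-python assumed; the topic, group, and endpoint names are placeholders):

      # Sketch: synthetic-customer heartbeat (kafka-python assumed;
      # topic, group, and endpoint names are placeholders).
      import time
      from kafka import KafkaProducer, KafkaConsumer

      consumer = KafkaConsumer(
          "kaas-heartbeat",
          bootstrap_servers=["kaas.example.com:9092"],
          group_id="heartbeat-monitor",
          consumer_timeout_ms=10000,     # give up (and alert) after 10 s
      )
      consumer.poll(timeout_ms=1000)     # force partition assignment first

      producer = KafkaProducer(bootstrap_servers=["kaas.example.com:9092"])
      producer.send("kaas-heartbeat", str(time.time()).encode())
      producer.flush()

      for msg in consumer:
          round_trip_ms = (time.time() - float(msg.value.decode())) * 1000
          print("heartbeat round trip: %.1f ms" % round_trip_ms)
          break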

Scalability

Partitions for Consumer

    • Use more than one partition so consumption can be distributed across consumers

Add a Broker Node

  • Once the node is started and has successfully joined the cluster, it doesn’t automatically receive partitions

  • Redistribute partitions

  • Use the kafka-reassign-partitions.sh tool to generate partition assignments. It takes the topic list and the broker list as input, and produces the assignment plan

  • Once the reassignment is finished, the partitions have been redistributed over the cluster (see the example below)
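
For illustration, the standard generate/execute/verify cycle (the broker IDs, Zookeeper host, and topics file are placeholders; removing a broker uses the same cycle with the departing broker's ID left out of --broker-list):

      # topics.json names the topics to rebalance, e.g.:
      #   {"version": 1, "topics": [{"topic": "foo"}]}

      # 1. Generate a candidate plan spreading partitions over brokers 1-4;
      #    save the "Proposed partition reassignment" output as plan.json
      kafka-reassign-partitions.sh --zookeeper zk1:2181 \
        --topics-to-move-json-file topics.json \
        --broker-list "1,2,3,4" --generate

      # 2. Execute the plan
      kafka-reassign-partitions.sh --zookeeper zk1:2181 \
        --reassignment-json-file plan.json --execute

      # 3. Verify the reassignment completed
      kafka-reassign-partitions.sh --zookeeper zk1:2181 \
        --reassignment-json-file plan.json --verify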

Remove a Broker Node

  • Get the topic list

  • Generate the assignment plan excluding the node to remove

  • Execute the assignment plan and verify that it has been executed

  • Stop the broker and remove it

Ensure Network Bandwidth

  • Throttle the traffic generated by data balancing while real-time data transfers occur (see the example below)
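
Kafka's reassignment tool supports this directly via --throttle (the value, in bytes/second, is illustrative):

      # Cap reassignment traffic at ~50 MB/s per broker so it does not
      # starve live produce/consume traffic (value is illustrative).
      kafka-reassign-partitions.sh --zookeeper zk1:2181 \
        --reassignment-json-file plan.json --execute --throttle 50000000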

Zookeeper

Fault Tolerance

    • More than one replica for each partition for availability

    • Rack Awareness: Starting with version 0.10.0, Kafka supports rack aware replica placement. It means Kafka will try to place replicas in different racks (or availability zones).
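
Rack awareness is enabled per broker in server.properties (the rack/availability-zone label below is a placeholder):

      # server.properties on each broker; label the broker with its
      # rack or availability zone (value is a placeholder).
      broker.rack=us-east-1a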
