Apache Kafka

Overview

Apache Kafka is a distributed streaming platform designed to handle real-time data feeds. It combines the functionality of a publish/subscribe messaging system with the durability of a distributed, replicated commit log, making it well suited to building real-time data pipelines and streaming applications.

Data Model

Core Concepts

graph TB
    subgraph "Kafka Cluster"
        B[Broker 1]
        B2[Broker 2]
        B3[Broker 3]
    end

    subgraph "Topic: user-events"
        P1[Partition 0]
        P2[Partition 1]
        P3[Partition 2]
    end

    subgraph "Consumer Group"
        C1[Consumer 1]
        C2[Consumer 2]
        C3[Consumer 3]
    end

    PROD[Producer] --> B
    B --> P1
    B2 --> P2
    B3 --> P3

    P1 --> C1
    P2 --> C2
    P3 --> C3
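
Each partition is consumed by exactly one member of a consumer group at a time, which is how Kafka parallelizes consumption. A minimal sketch of a group consumer using the Java client (broker address, topic, and group id are illustrative):

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class UserEventsConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");          // illustrative address
        props.put("group.id", "user-events-processors");          // consumer group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Partitions are assigned to group members by the group coordinator
            consumer.subscribe(List.of("user-events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d key=%s%n",
                            record.partition(), record.offset(), record.key());
                }
            }
        }
    }
}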

Data Structure

  • Topics: Categories of messages (e.g., "user-events", "order-updates")
  • Partitions: Ordered sequences of records within a topic
  • Records: Key-value pairs with metadata (timestamp, offset)
  • Offsets: Unique sequential identifiers for each record in a partition
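
These concepts map directly onto the Admin API. A sketch that creates the user-events topic with three partitions using the Java client (the broker address is illustrative; replication factor 1 suits a single-broker setup, 3 an HA cluster):

import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");  // illustrative address

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions, replication factor 1 (single broker)
            NewTopic topic = new NewTopic("user-events", 3, (short) 1);
            admin.createTopics(List.of(topic)).all().get();  // block until created
        }
    }
}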

Message Format

{
  "key": "user-123",
  "value": {
    "userId": "user-123",
    "action": "login",
    "timestamp": "2025-01-11T16:54:15Z",
    "metadata": {
      "source": "web-app",
      "version": "1.2"
    }
  },
  "headers": {
    "correlation-id": "req-456",
    "content-type": "application/json"
  }
}
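
Keys, values, and headers are all set per record when producing. A sketch with the Java client that mirrors the message above (broker address and the abbreviated JSON payload are illustrative):

import java.nio.charset.StandardCharsets;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class UserEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String value = "{\"userId\":\"user-123\",\"action\":\"login\"}";  // abbreviated payload
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("user-events", "user-123", value);
            // Headers travel with the record but are not part of the key/value payload
            record.headers().add("correlation-id", "req-456".getBytes(StandardCharsets.UTF_8));
            record.headers().add("content-type", "application/json".getBytes(StandardCharsets.UTF_8));
            producer.send(record);  // asynchronous; returns a Future<RecordMetadata>
        }
    }
}

Records with the same key always land in the same partition, which is what preserves per-key ordering.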

Architecture Overview

Single Node Architecture

graph TB
    subgraph "Kafka Broker"
        KS[Kafka Server]
        KRAFT[KRaft Controller]

        subgraph "Storage"
            L1[Topic A - Partition 0]
            L2[Topic A - Partition 1]
            L3[Topic B - Partition 0]
        end

        subgraph "Network Layer"
            API[API Requests]
            REP[Replication]
        end
    end

    PROD[Producers] --> API
    API --> KS
    KS --> L1
    KS --> L2
    KS --> L3

    L1 --> CONS1[Consumer Group 1]
    L2 --> CONS2[Consumer Group 2]
    L3 --> CONS3[Consumer Group 3]

    KRAFT --> KS

Distributed Architecture

graph TB
    subgraph "Kafka Cluster"
        subgraph "Broker 1 (Leader)"
            B1[Kafka Server 1]
            S1[Storage 1]
        end

        subgraph "Broker 2 (Follower)"
            B2[Kafka Server 2]
            S2[Storage 2]
        end

        subgraph "Broker 3 (Follower)"
            B3[Kafka Server 3]
            S3[Storage 3]
        end

        subgraph "KRaft Controller Cluster"
            KRAFT1[KRaft Controller 1]
            KRAFT2[KRaft Controller 2]
            KRAFT3[KRaft Controller 3]
        end
    end

    subgraph "External Systems"
        PROD[Producers]
        CONS[Consumers]
        CONN[Kafka Connect]
        STREAMS[Kafka Streams]
    end

    PROD --> B1
    PROD --> B2
    PROD --> B3

    B1 --> S1
    B2 --> S2
    B3 --> S3

    S1 -.-> S2
    S1 -.-> S3

    KRAFT1 --> B1
    KRAFT2 --> B2
    KRAFT3 --> B3

    B1 --> CONS
    B2 --> CONS
    B3 --> CONS

    B1 --> CONN
    B1 --> STREAMS

Target Operating Model (TOM)

Without High Availability

Single Broker Setup

| Component | Specification | Purpose |
|-----------|---------------|---------|
| Kafka Broker | 1 instance | Single point of message handling |
| KRaft Controller | 1 instance | Cluster coordination |
| Storage | Local disk | Message persistence |
| Network | Single NIC | Client communication |

Resource Requirements

| Resource | Minimum | Recommended | Purpose |
|----------|---------|-------------|---------|
| CPU | 2 cores | 4+ cores | Message processing |
| Memory | 4 GB | 8 GB+ | Page cache, JVM heap |
| Storage | 100 GB | 500 GB+ | Message retention |
| Network | 100 Mbps | 1 Gbps+ | Throughput |

Configuration Example

# Single-node KRaft configuration (combined broker and controller roles)
process.roles=broker,controller
node.id=1
controller.quorum.voters=1@localhost:9093
listeners=PLAINTEXT://localhost:9092,CONTROLLER://localhost:9093
controller.listener.names=CONTROLLER
listener.security.protocol.map=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
log.dirs=/var/kafka-logs
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
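
Unlike ZooKeeper-based setups, a KRaft node's log directory must be formatted with a cluster ID before first start. A sketch using the scripts shipped with the Kafka distribution (paths assume the standard tarball layout):

# Generate a cluster ID, format storage, then start the combined-mode node
bin/kafka-storage.sh random-uuid                               # prints a new cluster ID
bin/kafka-storage.sh format -t <cluster-id> -c config/server.properties
bin/kafka-server-start.sh config/server.properties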

With High Availability

Multi-Broker Cluster Setup

| Component | Specification | Purpose |
|-----------|---------------|---------|
| Kafka Brokers | 3+ instances | Fault tolerance |
| KRaft Controllers | 3+ instances | Distributed coordination |
| Storage | Replicated across brokers | Data durability |
| Load Balancer | Optional | Client distribution |

Resource Requirements (Per Broker)

| Resource | Minimum | Recommended | Purpose |
|----------|---------|-------------|---------|
| CPU | 4 cores | 8+ cores | Concurrent processing |
| Memory | 8 GB | 16 GB+ | Caching, replication |
| Storage | 500 GB | 1 TB+ | Message retention |
| Network | 1 Gbps | 10 Gbps+ | Replication traffic |

Deployment Architecture

graph TB
    subgraph "Production Cluster"
        subgraph "Availability Zone 1"
            B1[Broker 1]
            KRAFT1[KRaft Controller 1]
        end

        subgraph "Availability Zone 2"
            B2[Broker 2]
            KRAFT2[KRaft Controller 2]
        end

        subgraph "Availability Zone 3"
            B3[Broker 3]
            KRAFT3[KRaft Controller 3]
        end

        subgraph "Monitoring"
            PROM[Prometheus]
            GRAF[Grafana]
            ALERT[AlertManager]
        end

        subgraph "Management"
            MANAGER[Kafka Manager]
            SCHEMA[Schema Registry]
        end
    end

    subgraph "Load Balancer"
        LB[Load Balancer]
    end

    subgraph "Applications"
        APP1[App 1]
        APP2[App 2]
        APP3[App 3]
    end

    APP1 --> LB
    APP2 --> LB
    APP3 --> LB

    LB --> B1
    LB --> B2
    LB --> B3

    B1 -.-> B2
    B2 -.-> B3
    B3 -.-> B1

    KRAFT1 -.-> KRAFT2
    KRAFT2 -.-> KRAFT3
    KRAFT3 -.-> KRAFT1

    B1 --> PROM
    B2 --> PROM
    B3 --> PROM

    PROM --> GRAF
    PROM --> ALERT

HA Configuration

# High availability configuration (broker 1; change node.id and hostnames per broker)
process.roles=broker,controller
node.id=1
controller.quorum.voters=1@broker1:9093,2@broker2:9093,3@broker3:9093
listeners=PLAINTEXT://broker1:9092,CONTROLLER://broker1:9093
advertised.listeners=PLAINTEXT://broker1:9092
controller.listener.names=CONTROLLER
listener.security.protocol.map=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
log.dirs=/var/kafka-logs
num.network.threads=8
num.io.threads=16
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600

# Replication settings
default.replication.factor=3
min.insync.replicas=2
unclean.leader.election.enable=false
auto.create.topics.enable=false

# Retention settings
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
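
min.insync.replicas=2 takes effect only when producers request acknowledgment from all in-sync replicas. A matching producer-side client configuration (standard producer config keys; values are a reasonable starting point, not a prescription):

# Producer client settings that pair with min.insync.replicas=2
acks=all                     # wait for every in-sync replica
enable.idempotence=true      # prevent duplicates on retry
retries=2147483647           # retry transient failures indefinitely
delivery.timeout.ms=120000   # overall bound on send plus retries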

Pros and Cons

Pros

Performance & Scalability

  • High Throughput: Handles millions of messages per second; batching is the main lever (see the snippet below)
  • Horizontal Scaling: Easy to add brokers and partitions
  • Low Latency: Single-digit-millisecond end-to-end latency is achievable in tuned deployments
  • Efficient Storage: Optimized for sequential disk I/O
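
Producer throughput is largely a function of batching. A hedged starting point using standard producer configs (values are illustrative and should be tuned against your latency budget):

# Producer batching settings
linger.ms=10                 # wait up to 10 ms to fill a batch
batch.size=65536             # 64 KB batches
compression.type=lz4         # compress whole batches on the wire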

Reliability & Durability

  • Fault Tolerance: Built-in replication and failover
  • Data Durability: Configurable persistence guarantees
  • Exactly-Once Semantics: Transactional message processing (see the sketch below)
  • Long-term Storage: Messages can be retained indefinitely
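
Exactly-once production into Kafka is exposed through the transactional producer API. A minimal sketch (the transactional.id and topic are illustrative):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TransactionalSend {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        props.put("transactional.id", "order-pipeline-1");  // illustrative; unique per producer
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            try {
                producer.send(new ProducerRecord<>("order-updates", "order-1", "created"));
                producer.send(new ProducerRecord<>("order-updates", "order-1", "paid"));
                producer.commitTransaction();  // both records become visible atomically
            } catch (Exception e) {
                producer.abortTransaction();   // hides both from read_committed consumers
                throw e;
            }
        }
    }
}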

Ecosystem & Integration

  • Rich Ecosystem: Kafka Connect, Kafka Streams, ksqlDB (formerly KSQL); see the Streams sketch below
  • Protocol Support: Native protocol with many client libraries
  • Cloud Integration: Available on all major cloud platforms
  • Enterprise Features: Security, monitoring, management tools
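
As a taste of the ecosystem, a small Kafka Streams topology that filters login events out of user-events (the output topic and the string-matching predicate are illustrative; a real application would parse the JSON):

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class LoginFilter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "login-filter");  // also the group id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> events = builder.stream("user-events");
        events.filter((key, value) -> value.contains("\"action\":\"login\""))
              .to("login-events");  // illustrative output topic

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}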

Developer Experience

  • Mature Tooling: Extensive monitoring and management tools
  • Strong Community: Large user base and active development
  • Documentation: Comprehensive documentation and examples
  • Flexible APIs: Producer, Consumer, Admin, and Streams APIs

Cons

Complexity

  • Operational Overhead: Requires skilled operations team
  • Complex Configuration: Many interacting tuning parameters
  • KRaft Migration: Migrating existing ZooKeeper-based clusters to KRaft requires careful planning
  • Learning Curve: Steep learning curve for optimal usage

Resource Requirements

  • Memory Intensive: Requires substantial RAM for good performance
  • Storage Costs: Can be expensive for long-term retention
  • Network Bandwidth: High bandwidth requirements for replication
  • JVM Management: Requires JVM tuning expertise

Limitations

  • Message Ordering: Guaranteed only within a partition, not across a topic
  • Small Message Overhead: Per-record overhead is proportionally high for very small messages
  • Consumer Lag: Lagging consumers can be hard to detect and remediate
  • Schema Evolution: Requires careful planning for schema changes

Use Case Constraints

  • Simple Use Cases: Overkill for basic messaging needs
  • Synchronous Processing: Not designed for request-response patterns
  • Complex Routing: Limited routing capabilities compared to message brokers
  • Protocol Limitations: Uses its own binary protocol; no native support for standards such as AMQP or MQTT

Best Practices

Production Deployment

  1. Hardware Sizing
     • Use dedicated hardware for production
     • Ensure adequate disk I/O and network capacity
     • Monitor JVM heap and garbage collection

  2. Monitoring & Alerting
     • Monitor broker metrics (CPU, memory, disk, network)
     • Set up alerts for consumer lag and broker failures (see the lag-check sketch after this list)
     • Use tools like Prometheus, Grafana, and Kafka Manager

  3. Security
     • Enable SSL/SASL authentication
     • Use ACLs for topic-level authorization
     • Encrypt data in transit and at rest

  4. Data Management
     • Plan a topic partitioning strategy
     • Set appropriate retention policies
     • Implement proper backup and disaster recovery
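
Consumer lag can be checked programmatically by comparing a group's committed offsets against the partitions' latest offsets. A sketch using the Admin API (the group id is illustrative):

import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.ListOffsetsResult;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class LagCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Offsets the group has committed, per partition
            Map<TopicPartition, OffsetAndMetadata> committed =
                    admin.listConsumerGroupOffsets("user-events-processors")
                         .partitionsToOffsetAndMetadata().get();
            // Latest (end) offsets for the same partitions
            Map<TopicPartition, ListOffsetsResult.ListOffsetsResultInfo> latest =
                    admin.listOffsets(committed.keySet().stream()
                                 .collect(Collectors.toMap(tp -> tp, tp -> OffsetSpec.latest())))
                         .all().get();
            // Lag is the gap between the end offset and the committed offset
            committed.forEach((tp, meta) -> System.out.printf("%s lag=%d%n",
                    tp, latest.get(tp).offset() - meta.offset()));
        }
    }
}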

Development Guidelines

  1. Producer Best Practices
     • Use appropriate acknowledgment settings
     • Implement retry logic with exponential backoff
     • Batch messages for better throughput

  2. Consumer Best Practices
     • Handle consumer rebalancing gracefully (see the sketch after this list)
     • Implement proper error handling
     • Monitor consumer lag

  3. Schema Management
     • Use Schema Registry for schema evolution
     • Plan for backward/forward compatibility
     • Version your message schemas
When to Choose Kafka

Ideal Use Cases

  • Event Streaming: Real-time event processing
  • Log Aggregation: Centralized log collection
  • Data Integration: ETL pipelines and data lakes
  • Activity Tracking: User behavior analytics
  • Microservices: Service-to-service communication

Consider Alternatives When

  • Simple Messaging: Basic point-to-point or pub/sub
  • Low Volume: Less than 1000 messages/second
  • Request-Response: Synchronous communication patterns
  • Resource Constraints: Limited infrastructure resources
  • Operational Complexity: Lack of operational expertise