BIG DATA: KAFKA
Kafka is often used in real-time streaming data architectures to provide real-time analytics.
Kafka handles real-time streams of data, whether to collect big data, to do real-time analysis, or both. Kafka is used with in-memory microservices to provide durability, and it can feed events to CEP (complex event processing) systems and IoT/IFTTT-style automation systems.
KAFKA FUNDAMENTALS
Messaging System Semantics
Clustering is Core
Durability & Ordering Guarantees
USE CASES
Modern ETL/CDC
Data Pipelines
Big Data Ingest
PRODUCERS & CONSUMERS
Broker = node in the cluster
Producer writes records to a broker
Consumer reads records from broker
Leader/Follower for cluster distribution
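The producer/consumer flow above can be sketched with a minimal in-memory model (a toy `Broker` class, not Kafka's real API; a real client such as the Java `KafkaProducer` talks to brokers over the network):

```python
class Broker:
    """Toy stand-in for a Kafka broker: an append-only log per topic."""
    def __init__(self):
        self.logs = {}

    def append(self, topic, record):
        # Producer path: records are appended to the end of the topic's log.
        log = self.logs.setdefault(topic, [])
        log.append(record)
        return len(log) - 1  # offset of the new record

    def read(self, topic, offset):
        # Consumer path: read all records from a given offset onward.
        return self.logs.get(topic, [])[offset:]

broker = Broker()
broker.append("clicks", {"user": "a"})
broker.append("clicks", {"user": "b"})
records = broker.read("clicks", 0)  # both records, in write order
```

Leader/follower replication would sit behind `append`: the leader takes the write and followers copy its log.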
TOPICS & PARTITIONS
Topic = logical name with 1 or more partitions
Partitions are replicated
Ordering is guaranteed per partition
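A sketch of how key-based partitioning preserves per-partition order (the real Java client's default partitioner uses murmur2 on the key; Python's built-in `hash` is just a stand-in here):

```python
NUM_PARTITIONS = 3

def partition_for(key):
    # Records with the same key always land in the same partition,
    # so per-key ordering is preserved.
    return hash(key) % NUM_PARTITIONS

topic = [[] for _ in range(NUM_PARTITIONS)]  # one append-only log per partition

for seq in range(10):
    key = f"user-{seq % 2}"                  # two keys, interleaved writes
    topic[partition_for(key)].append((key, seq))

# Within each partition, sequence numbers are strictly increasing:
per_partition_seqs = [[seq for _, seq in log] for log in topic]
```

There is no ordering guarantee *across* partitions, which is why keys that must be processed in order should share a partition.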
OFFSETS
Unique sequential ID (PER PARTITION)
Consumers track offsets
Benefits: replay, different-speed consumers, etc.
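Because each consumer group tracks its own committed offset, groups can read at different speeds, and rewinding an offset replays old records. A minimal sketch (plain dicts standing in for Kafka's committed-offsets store):

```python
log = [f"event-{i}" for i in range(5)]          # one partition's records
offsets = {"fast-group": 0, "slow-group": 0}    # committed offset per group

def poll(group, max_records):
    # Read from the group's committed offset, then advance it.
    start = offsets[group]
    batch = log[start:start + max_records]
    offsets[group] = start + len(batch)
    return batch

poll("fast-group", 5)        # fast consumer reads everything
poll("slow-group", 2)        # slow consumer lags behind, independently
offsets["fast-group"] = 0    # replay: rewind the committed offset
replayed = poll("fast-group", 5)
```

The broker never deletes records on read; retention is time/size based, which is what makes replay possible.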
DELIVERY GUARANTEES
Producer
Async (No Guarantee)
Committed to Leader
Committed to Leader & Quorum
Consumer
At-least-once (Default)
At-most-once
Effectively-once
Exactly Once (Maybe)
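The consumer-side guarantees above come down to *when* the offset is committed relative to processing. A toy sketch (the `consume` helper and its crash simulation are illustrative, not a Kafka API):

```python
def consume(records, commit_before_processing, crash_at):
    """Toy consumer loop; crash_at simulates a failure at that offset.
    Returns (processed_records, committed_offset)."""
    processed, committed = [], 0
    for offset, record in enumerate(records):
        if commit_before_processing:
            committed = offset + 1       # at-most-once: commit first
        if offset == crash_at:
            break                        # simulated crash
        processed.append(record)
        if not commit_before_processing:
            committed = offset + 1       # at-least-once: commit after
    return processed, committed

records = ["a", "b", "c"]
# At-least-once: crash before committing "b" -> restart re-reads "b" (possible duplicate).
p1, c1 = consume(records, commit_before_processing=False, crash_at=1)
# At-most-once: "b" was committed before the crash -> restart skips it (possible loss).
p2, c2 = consume(records, commit_before_processing=True, crash_at=1)
```

"Effectively-once" means pairing at-least-once delivery with idempotent or deduplicating processing, so the duplicate redelivery is harmless.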