BIG DATA: KAFKA
Kafka is often used in real-time streaming data architectures to provide real-time analytics.
Kafka handles real-time streams of data, whether to collect big data, to do real-time analysis, or both. Kafka is used with in-memory microservices to provide durability, and it can feed events to CEP (complex event processing) systems and IoT/IFTTT-style automation systems.
KAFKA FUNDAMENTALS
Messaging System Semantics
Clustering is Core
Durability & Ordering Guarantees
USE CASES
Modern ETL/CDC
Data Pipelines
Big Data Ingest
PRODUCERS & CONSUMERS
Broker = node in the cluster
Producer writes records to a broker
Consumer reads records from broker
Leader/Follower for cluster distribution
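The producer/consumer flow above can be sketched with a minimal in-memory model (a toy `Broker` class, not Kafka's real API; a real client such as the Java `KafkaProducer` talks to brokers over the network):

```python
class Broker:
    """Toy stand-in for a Kafka broker: an append-only log per topic."""
    def __init__(self):
        self.logs = {}

    def append(self, topic, record):
        # Producer path: records are appended to the end of the topic's log.
        log = self.logs.setdefault(topic, [])
        log.append(record)
        return len(log) - 1  # offset of the new record

    def read(self, topic, offset):
        # Consumer path: read all records from a given offset onward.
        return self.logs.get(topic, [])[offset:]

broker = Broker()
broker.append("clicks", {"user": "a"})
broker.append("clicks", {"user": "b"})
records = broker.read("clicks", 0)  # both records, in write order
```

Leader/follower replication would sit behind `append`: the leader takes the write and followers copy its log.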
TOPICS & PARTITIONS
Topic = logical name with 1 or more partitions
Partitions are replicated
Ordering is guaranteed per partition
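A sketch of how key-based partitioning preserves per-partition order (the real Java client's default partitioner uses murmur2 on the key; Python's built-in `hash` is just a stand-in here):

```python
NUM_PARTITIONS = 3

def partition_for(key):
    # Records with the same key always land in the same partition,
    # so per-key ordering is preserved.
    return hash(key) % NUM_PARTITIONS

topic = [[] for _ in range(NUM_PARTITIONS)]  # one append-only log per partition

for seq in range(10):
    key = f"user-{seq % 2}"                  # two keys, interleaved writes
    topic[partition_for(key)].append((key, seq))

# Within each partition, sequence numbers are strictly increasing:
per_partition_seqs = [[seq for _, seq in log] for log in topic]
```

There is no ordering guarantee *across* partitions, which is why keys that must be processed in order should share a partition.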
OFFSETS
Unique sequential ID (PER PARTITION)
Consumers track offsets
Benefits: replay, different-speed consumers, etc.
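Because each consumer group tracks its own committed offset, groups can read at different speeds, and rewinding an offset replays old records. A minimal sketch (plain dicts standing in for Kafka's committed-offsets store):

```python
log = [f"event-{i}" for i in range(5)]          # one partition's records
offsets = {"fast-group": 0, "slow-group": 0}    # committed offset per group

def poll(group, max_records):
    # Read from the group's committed offset, then advance it.
    start = offsets[group]
    batch = log[start:start + max_records]
    offsets[group] = start + len(batch)
    return batch

poll("fast-group", 5)        # fast consumer reads everything
poll("slow-group", 2)        # slow consumer lags behind, independently
offsets["fast-group"] = 0    # replay: rewind the committed offset
replayed = poll("fast-group", 5)
```

The broker never deletes records on read; retention is time/size based, which is what makes replay possible.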
DELIVERY GUARANTEES
Producer
Async (No Guarantee)
Committed to Leader
Committed to Leader & Quorum
Consumer
At-least-once (Default)
At-most-once
Effectively-once
Exactly Once (Maybe)
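The consumer-side guarantees above come down to *when* the offset is committed relative to processing. A toy sketch (the `consume` helper and its crash simulation are illustrative, not a Kafka API):

```python
def consume(records, commit_before_processing, crash_at):
    """Toy consumer loop; crash_at simulates a failure at that offset.
    Returns (processed_records, committed_offset)."""
    processed, committed = [], 0
    for offset, record in enumerate(records):
        if commit_before_processing:
            committed = offset + 1       # at-most-once: commit first
        if offset == crash_at:
            break                        # simulated crash
        processed.append(record)
        if not commit_before_processing:
            committed = offset + 1       # at-least-once: commit after
    return processed, committed

records = ["a", "b", "c"]
# At-least-once: crash before committing "b" -> restart re-reads "b" (possible duplicate).
p1, c1 = consume(records, commit_before_processing=False, crash_at=1)
# At-most-once: "b" was committed before the crash -> restart skips it (possible loss).
p2, c2 = consume(records, commit_before_processing=True, crash_at=1)
```

"Effectively-once" means pairing at-least-once delivery with idempotent or deduplicating processing, so the duplicate redelivery is harmless.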