What is Apache Kafka?
Kafka is a distributed data streaming platform. Through Kafka, you can:
- Publish and subscribe to streams of records/messages.
- Process streams of records as they arrive.
- Store streams of records in a fault-tolerant way, when needed.
At the enterprise level, Kafka is used in:
- Data pipelines that reliably transfer data between systems or applications.
- Streaming applications that transform data as it flows through.
- Response systems that take some action on continuously arriving data.
Remember that Kafka runs as a cluster of one or more servers (called brokers). The Kafka cluster stores incoming streams of records in categories called topics. Each record consists of a key, a value, and a timestamp.
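To make the record and topic structure concrete, here is a minimal in-memory sketch. It is a toy model, not the Kafka client API: the names `Record` and `ToyCluster` are hypothetical, keys and values are plain strings for readability, and details such as partitions and replication are left out.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Record:
    """One record: a key, a value, and a timestamp (milliseconds since epoch)."""
    key: Optional[str]
    value: str
    timestamp: int


@dataclass
class ToyCluster:
    """Hypothetical stand-in for a cluster: topic name -> append-only list of records."""
    topics: Dict[str, List[Record]] = field(default_factory=dict)

    def append(self, topic: str, key: Optional[str], value: str,
               timestamp: Optional[int] = None) -> Record:
        # Assumption for this sketch: use the current time when no timestamp is given.
        if timestamp is None:
            timestamp = int(time.time() * 1000)
        record = Record(key, value, timestamp)
        self.topics.setdefault(topic, []).append(record)
        return record


cluster = ToyCluster()
cluster.append("orders", key="order-1", value="created", timestamp=1_700_000_000_000)
cluster.append("orders", key="order-1", value="paid", timestamp=1_700_000_000_500)
cluster.append("payments", key="order-1", value="card")

# Each topic keeps its own ordered stream of records.
for name, records in cluster.topics.items():
    print(name, [(r.key, r.value) for r in records])
```

The point of the sketch is only the shape of the data: records are grouped by topic, and each one carries its key, value, and timestamp together.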
How to develop applications using Kafka:
Kafka provides four core APIs that simplify the application developer's task.
⇒ Producer API: Helps in publishing a stream of records to one or more Kafka topics.
⇒ Consumer API: For subscribing to one or more topics and processing the incoming stream of records/messages.
⇒ Streams API: Meant for stream processing. Helps in consuming an input stream from one or more topics and producing an output stream to one or more output topics.
⇒ Connector API: Connects Kafka topics to existing applications or data systems.
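The division of labour between the four APIs can be sketched with a small in-memory model. This is an illustration of the roles only, under stated assumptions: the functions `produce`, `consume`, `stream`, and `source_connector` are hypothetical names invented here and are not part of any Kafka client library, and the "cluster" is just a dictionary of lists.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterator, List, Tuple

Message = Tuple[str, str]  # (key, value); timestamps omitted for brevity

# Hypothetical in-memory "cluster": topic name -> ordered list of messages.
topics: Dict[str, List[Message]] = defaultdict(list)


def produce(topic: str, key: str, value: str) -> None:
    """Producer role: publish one record to a topic."""
    topics[topic].append((key, value))


def consume(topic_names: List[str],
            offsets: Dict[str, int]) -> Iterator[Tuple[str, Message]]:
    """Consumer role: yield unread records from the subscribed topics.

    `offsets` remembers how far this consumer has read in each topic.
    """
    for name in topic_names:
        log = topics[name]
        while offsets.get(name, 0) < len(log):
            position = offsets.get(name, 0)
            offsets[name] = position + 1
            yield name, log[position]


def stream(source: str, sink: str, transform: Callable[[str], str],
           offsets: Dict[str, int]) -> None:
    """Streams role: read an input topic, transform values, write an output topic."""
    for _, (key, value) in consume([source], offsets):
        produce(sink, key, transform(value))


def source_connector(external_rows: List[Message], topic: str) -> None:
    """Connector role: copy rows from an existing system (here, a list) into a topic."""
    for key, value in external_rows:
        produce(topic, key, value)


source_connector([("u1", "alice"), ("u2", "bob")], "users")
produce("users", "u3", "carol")
stream("users", "users-upper", str.upper, offsets={})

reader_offsets: Dict[str, int] = {}
received = [msg for _, msg in consume(["users-upper"], reader_offsets)]
print(received)
```

In a real application each role would be played by the corresponding Kafka client or framework rather than by local functions, but the flow is the same: records enter topics through producers or connectors, stream processors turn input topics into output topics, and consumers read from wherever they last left off.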