What is Apache Kafka?

Kafka is a distributed data streaming platform. With Kafka, you can:

  • Publish and subscribe to streams of records/messages.
  • Process streams of records as they arrive.
  • Store streams of records in a fault-tolerant way, when needed.

At the enterprise level, Kafka is used in:

  1. Data pipelines that reliably transfer data between systems or applications.
  2. Streaming applications that transform data as it flows through.
  3. Response systems that take action on continuously incoming data.

Remember that Kafka runs on a cluster of one or more servers. The Kafka cluster stores incoming streams of records in categories called topics. Each record consists of a key, a value, and a timestamp.
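
To make the record and topic idea concrete, here is a minimal in-memory sketch. It is not the Kafka client API: the names Record, ToyCluster, append and read are hypothetical, chosen only for illustration, and the sketch ignores real-world details such as partitions, replication, and multiple servers. It only shows that a topic behaves like a named, append-only sequence of records, each carrying a key, a value, and a timestamp.

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for a Kafka record: key, value, and timestamp."""
    key: str
    value: str
    # Assumption: timestamp in milliseconds since the epoch, set at creation time.
    timestamp_ms: int = field(default_factory=lambda: int(time.time() * 1000))


class ToyCluster:
    """In-memory stand-in for a cluster: maps topic names to append-only lists."""

    def __init__(self):
        self._topics = defaultdict(list)

    def append(self, topic, record):
        """Store a record at the end of the topic and return its position."""
        log = self._topics[topic]
        log.append(record)
        return len(log) - 1

    def read(self, topic, position=0):
        """Return a copy of the topic's records from the given position onward."""
        return list(self._topics[topic][position:])


cluster = ToyCluster()
cluster.append("orders", Record(key="order-1", value="created"))
cluster.append("orders", Record(key="order-1", value="paid"))
cluster.append("payments", Record(key="order-1", value="9.99"))

# Records come back in the order they were stored, per topic.
for record in cluster.read("orders"):
    print(record.key, record.value, record.timestamp_ms)
```

Because reading does not remove anything from the list, the same records can be read again later from any position, which loosely mirrors how stored streams can be re-read.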


How to develop applications using Kafka:

Kafka provides four APIs that simplify the application developer's task.

⇒ Producer API: Publishes a stream of records to one or more Kafka topics.

⇒ Consumer API: Subscribes to one or more topics and processes the incoming stream of records/messages.

⇒ Streams API: Intended for stream processing. Consumes an input stream from one or more topics and produces an output stream to one or more output topics.

⇒ Connector API: Connects Kafka topics to existing applications or data systems.
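
The division of labour between the four APIs can be sketched with a toy in-memory model. This is an illustration only and does not use the real Kafka client libraries: the functions produce, consume, stream_uppercase and connect_sink are hypothetical names, the "cluster" is a plain dictionary, and the external system is an ordinary dict standing in for a database.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the cluster: topic name -> list of (key, value).
topics = defaultdict(list)


def produce(topic, key, value):
    """Producer role: publish one record to a topic."""
    topics[topic].append((key, value))


def consume(topic, position=0):
    """Consumer role: read a topic's records starting at a position (returns a copy)."""
    return topics[topic][position:]


def stream_uppercase(source_topic, sink_topic):
    """Streams role: consume an input topic, transform each value, produce to an output topic."""
    for key, value in consume(source_topic):
        produce(sink_topic, key, value.upper())


def connect_sink(topic, external_store):
    """Connector role: copy records from a topic into an external system (a dict here)."""
    for key, value in consume(topic):
        external_store[key] = value


produce("greetings", "user-1", "hello")
produce("greetings", "user-2", "good morning")
stream_uppercase("greetings", "greetings-upper")

database = {}  # assumption: stands in for any existing application or data system
connect_sink("greetings-upper", database)

print(consume("greetings-upper"))
print(database)
```

In a real deployment each role would typically run as a separate process talking to the cluster over the network; the sketch keeps them in one script only to show how data moves from producer to topic, through a transformation, and out to another system.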