Publishing in Kafka is decoupled from consuming. Published messages can live in Kafka for a long time: how long is governed by the topic’s retention policy, not by whether they have been consumed. This makes Kafka a good storage system for in-flight messages.
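For example, how long messages are retained is just a topic-level configuration. A minimal sketch using the AdminClient (the topic name, partition and replica counts, and the 30-day window are placeholder assumptions):

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Map;
import java.util.Properties;
import java.util.Set;

public class CreateRetainedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Retain messages for 30 days, whether or not anyone consumes them.
            NewTopic topic = new NewTopic("orders", 3, (short) 3)
                    .configs(Map.of("retention.ms",
                            String.valueOf(30L * 24 * 60 * 60 * 1000)));
            admin.createTopics(Set.of(topic)).all().get();
        }
    }
}
```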
Data written to Kafka is written to disk and replicated for fault-tolerance. Kafka allows producers to wait on acknowledgement so that a write isn’t considered complete until it is fully replicated and guaranteed to persist even if the server written to fails.
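As a sketch of a fully acknowledged write (the broker address, topic, and record contents are illustrative assumptions), a producer can set acks=all and block on the returned future so the send is not considered complete until the write is replicated:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

import java.util.Properties;

public class DurableWrite {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // acks=all: the broker acknowledges only once all in-sync replicas have the record.
        props.put("acks", "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Blocking on the future makes the send synchronous: it completes
            // only after the broker confirms the fully replicated write.
            RecordMetadata meta = producer
                    .send(new ProducerRecord<>("orders", "order-1", "created"))
                    .get();
            System.out.printf("persisted at partition %d, offset %d%n",
                    meta.partition(), meta.offset());
        }
    }
}
```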
The disk structures used in Kafka scale very well: performance stays essentially the same whether you have 50 KB or 50 TB of persistent data on the server.
Because Kafka takes storage seriously and lets clients control their own read position, you can think of it as a kind of special-purpose distributed filesystem dedicated to high-performance, low-latency commit log storage, replication, and propagation.
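A sketch of that client-controlled read position (the topic, partition, and offset are illustrative): a consumer can assign itself a partition and seek to any retained offset, for example to replay history.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class ReplayReader {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "replay-reader");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("orders", 0);
            consumer.assign(List.of(tp)); // manual assignment, no group rebalancing
            consumer.seek(tp, 42L);       // jump to an arbitrary retained offset
            for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                System.out.printf("offset %d: %s%n", record.offset(), record.value());
            }
        }
    }
}
```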
Kafka for Stream Processing
It isn’t enough to just read, write, and store streams of data; the purpose is to enable real-time processing of streams.
In Kafka a stream processor is anything that takes continual streams of data from input topics, performs some processing on this input, and produces continual streams of data to output topics.
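In its simplest form this is just a consume-transform-produce loop written with the plain consumer and producer clients. A minimal sketch (the topic names and the uppercase transform are illustrative assumptions):

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class UppercaseProcessor {
    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "uppercase-processor");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(List.of("input-topic"));
            while (true) {
                // Continually pull records from the input topic...
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(100))) {
                    // ...apply a transformation and publish the result downstream.
                    producer.send(new ProducerRecord<>("output-topic",
                            record.key(), record.value().toUpperCase()));
                }
            }
        }
    }
}
```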
For example, a retail application might take in input streams of sales and shipments, and output a stream of reorders and price adjustments computed off this data.
It is possible to do simple processing directly using the producer and consumer APIs, as in the loop above. However, for more complex transformations Kafka provides a fully integrated Streams API. This makes it possible to build applications that do non-trivial processing, such as computing aggregations over streams or joining streams together.
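Picking up the retail example above, a minimal Streams sketch (the topic names, serdes, and the assumption that sales records are keyed by product id are illustrative) might maintain a running count of sales per product:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

import java.util.Properties;

public class SalesCounter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "sales-counter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Read the sales topic as a stream keyed by product id.
        KStream<String, String> sales = builder.stream("sales");

        // Stateful aggregation: a running count per product, backed by a
        // local state store whose changelog is replicated to Kafka.
        KTable<String, Long> salesPerProduct = sales.groupByKey().count();

        // Publish the continuously updated counts to an output topic.
        salesPerProduct.toStream()
                .to("sales-counts", Produced.with(Serdes.String(), Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Because the count lives in a state store whose changelog is written back to Kafka, a restarted or replacement instance can rebuild its state, which is the fault-tolerance point made below.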
This facility helps solve the hard problems this type of application faces: handling out-of-order data, reprocessing input as code changes, performing stateful computations, etc.
The Streams API builds on the core primitives Kafka provides: it uses the producer and consumer APIs for input and output, uses Kafka for stateful storage, and uses the same group mechanism for fault tolerance among the stream processor instances.