Kafka producers and consumers
- Producers publish data to the topics of their choice.
- The producer chooses which partition within a topic each record is assigned to.
- This can be done in a round-robin fashion simply to balance load, or according to a semantic partition function, for example one based on a key in the record.
- Consumers label themselves with a consumer group name. Each record published to a topic is delivered to one consumer instance within each subscribing consumer group.
- Consumer instances can be in separate processes or on separate machines.
- When N consumer instances share the same consumer group, records are load-balanced over those N instances.
- When consumer instances all belong to different consumer groups, each record is broadcast to every consumer.
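The producer-side partitioning choice described in the list above can be sketched as a toy model in Python. This is not Kafka's client API: `ToyPartitioner` is a hypothetical class, and `zlib.crc32` is only a stand-in for the key hash (Kafka's Java client uses murmur2 for keyed records by default). The point is the behaviour, assuming the simple round-robin scheme described above: keyless records rotate across partitions, while records with the same key always land in the same partition.

```python
import itertools
import zlib
from typing import Optional


class ToyPartitioner:
    """Hypothetical partitioner: picks a partition for each record, as a producer would.

    Assumption: crc32 stands in for Kafka's real key hash; only the idea
    (same key -> same partition) carries over.
    """

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self._next = itertools.cycle(range(num_partitions))  # round-robin cursor

    def partition(self, key: Optional[bytes]) -> int:
        if key is None:
            # No key: rotate through partitions simply to balance load.
            return next(self._next)
        # Keyed: a deterministic hash keeps all records for one key together.
        return zlib.crc32(key) % self.num_partitions


p = ToyPartitioner(num_partitions=4)
print([p.partition(None) for _ in range(6)])               # [0, 1, 2, 3, 0, 1]
print(p.partition(b"user-42") == p.partition(b"user-42"))  # True
```

Keying by something like a user ID is what makes per-key ordering possible later, since every record for that key goes through the same partition.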
The image above shows a two-server Kafka cluster hosting four partitions (P0-P3) that are read by two consumer groups. Consumer group A has two consumer instances and group B has four.
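A minimal sketch of how the four partitions in that setup could be split across the two groups, assuming a simple round-robin assignment. The consumer names (`A1`, `A2`, `B1` to `B4`) and the `assign` helper are hypothetical; Kafka's actual assignment strategy is configurable and may produce a different mapping.

```python
from typing import Dict, List


def assign(partitions: List[str], consumers: List[str]) -> Dict[str, List[str]]:
    """Deal partitions out to a group's consumers one at a time (round-robin)."""
    owners: Dict[str, List[str]] = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        owners[consumers[i % len(consumers)]].append(p)
    return owners


partitions = ["P0", "P1", "P2", "P3"]
groups = {
    "A": ["A1", "A2"],              # group A: two instances
    "B": ["B1", "B2", "B3", "B4"],  # group B: four instances
}
for name, members in groups.items():
    print(name, assign(partitions, members))
# A {'A1': ['P0', 'P2'], 'A2': ['P1', 'P3']}
# B {'B1': ['P0'], 'B2': ['P1'], 'B3': ['P2'], 'B4': ['P3']}
```

Each group is assigned independently, so both groups cover all four partitions; group A's instances each do twice the work of group B's.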
In most cases, topics have a small number of consumer groups, one for each “logical subscriber”. Each group is composed of many consumer instances for scalability and fault tolerance. This is nothing more than publish-subscribe semantics where the subscriber is a cluster of consumers instead of a single process.
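The publish-subscribe behaviour can be checked with a small simulation: every record reaches exactly one instance in each group, so each group as a whole sees the full stream. The `OWNERS` table and `publish` function are hypothetical, and partition ownership is fixed by hand here rather than negotiated by Kafka.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical ownership: which member of each group reads each partition.
OWNERS: Dict[str, Dict[int, str]] = {
    "A": {0: "A1", 1: "A2", 2: "A1", 3: "A2"},
    "B": {0: "B1", 1: "B2", 2: "B3", 3: "B4"},
}


def publish(records: List[Tuple[int, str]]) -> Dict[str, Counter]:
    """Deliver each (partition, value) record to exactly one consumer per group."""
    received = {group: Counter() for group in OWNERS}
    for partition, _value in records:
        for group, owners in OWNERS.items():
            received[group][owners[partition]] += 1
    return received


records = [(i % 4, f"r{i}") for i in range(8)]  # 8 records spread over P0-P3
result = publish(records)
print(sum(result["A"].values()), sum(result["B"].values()))  # 8 8
print(dict(result["A"]))  # {'A1': 4, 'A2': 4}
```

Within a group the records are shared out like a queue; across groups every record is delivered to each group once, which is the broadcast case.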
Partitions are divided over the consumer instances so that each instance reads a fair share of the partitions. Group membership is maintained dynamically by the Kafka protocol: if new instances join a group, they take over some partitions from other members of the group; if an instance dies, its partitions are redistributed among the remaining instances.
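Rebalancing on join and leave can be sketched by recomputing a round-robin assignment whenever membership changes. This is a simplification: the `assign` helper and member names are hypothetical, and because it reassigns from scratch, some partitions move even when they did not need to. Kafka also offers sticky assignment strategies that try to keep existing ownership stable.

```python
from typing import Dict, List


def assign(partitions: List[str], consumers: List[str]) -> Dict[str, List[str]]:
    """Round-robin partitions over the current members (sorted for determinism)."""
    members = sorted(consumers)
    owners: Dict[str, List[str]] = {c: [] for c in members}
    for i, p in enumerate(partitions):
        owners[members[i % len(members)]].append(p)
    return owners


partitions = ["P0", "P1", "P2", "P3"]
members = ["c1", "c2"]
print(assign(partitions, members))  # {'c1': ['P0', 'P2'], 'c2': ['P1', 'P3']}

members.append("c3")  # a new instance joins and takes over a partition
print(assign(partitions, members))  # {'c1': ['P0', 'P3'], 'c2': ['P1'], 'c3': ['P2']}

members.remove("c1")  # an instance dies; survivors absorb its partitions
print(assign(partitions, members))  # {'c2': ['P0', 'P2'], 'c3': ['P1', 'P3']}
```

In every case each partition has exactly one owner within the group; if a group had more members than partitions, the extra members would receive nothing and sit idle.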
Kafka guarantees ordering only within a partition, not across the partitions of a topic. If a total order over all records is required, the topic should have only one partition, which means only one consumer process per consumer group can consume it.
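A toy model of why a single partition is needed for a total order: each partition is an append-only log read strictly in order, but a consumer reading several partitions may interleave them arbitrarily. The `produce` and `consume` helpers and the random interleaving are assumptions made for illustration, not how the Kafka client actually schedules fetches.

```python
import random
from typing import Dict, List


def produce(values: List[int], num_partitions: int) -> Dict[int, List[int]]:
    """Append values to partition logs round-robin; each log keeps append order."""
    logs: Dict[int, List[int]] = {p: [] for p in range(num_partitions)}
    for i, v in enumerate(values):
        logs[i % num_partitions].append(v)
    return logs


def consume(logs: Dict[int, List[int]], rng: random.Random) -> List[int]:
    """Read partitions in an arbitrary interleaving, but each one strictly in order."""
    cursors = {p: 0 for p in logs}
    out: List[int] = []
    while any(cursors[p] < len(logs[p]) for p in logs):
        ready = [p for p in logs if cursors[p] < len(logs[p])]
        p = rng.choice(ready)  # which partition is read next is not coordinated
        out.append(logs[p][cursors[p]])
        cursors[p] += 1
    return out


values = list(range(10))
one = consume(produce(values, 1), random.Random(7))
print(one == values)  # True: a single partition preserves the total order

logs = produce(values, 3)
many = consume(logs, random.Random(7))
# Across 3 partitions the merged order may differ from `values`,
# but each partition's own records still arrive in append order.
print(all([v for v in many if v in log] == log for log in logs.values()))  # True
```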