Synchronous REST calls couple services in time: the caller waits, and if the callee is down the call fails. Event-driven architecture flips that — services publish facts (“order placed”) to a durable log and other services react on their own schedule. Apache Kafka is the de-facto backbone for this in the enterprise, and Spring Kafka is how Java teams use it. This deep dive covers the patterns that matter in production: partitions and consumer groups, delivery semantics, error handling, the outbox pattern, and exactly-once processing.
Kafka is a distributed, append-only commit log. Producers append records to topics; consumers read them at their own pace; records are retained for a configured time (or compacted) regardless of whether anyone has read them. That durability and replayability is the key difference from a traditional message queue — a new consumer can join later and reprocess history from the beginning.
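To make the replay property concrete, here is a toy in-memory model of a single-partition log. It is a sketch, not the Kafka client API: the class `ToyLog` and its methods are invented for illustration. The point is only that reading does not remove records and that each consumer group keeps its own offset.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model of a single-partition log (hypothetical, not the Kafka client API).
class ToyLog {
    private final List<String> records = new ArrayList<>();
    // Each consumer group tracks its own read position (offset).
    private final Map<String, Integer> offsets = new HashMap<>();

    void append(String record) {
        records.add(record); // records are appended and never removed by a read
    }

    // Returns everything the group has not read yet and advances that group's offset.
    List<String> poll(String groupId) {
        int from = offsets.getOrDefault(groupId, 0); // a new group starts at the beginning
        List<String> batch = new ArrayList<>(records.subList(from, records.size()));
        offsets.put(groupId, records.size());
        return batch;
    }
}
```

A group that polls for the first time after three records were written still receives all three, while a group that was already caught up receives only the new one.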
The architectural payoff is loose coupling: the order service publishes OrderPlaced and doesn’t know or care that inventory, billing, and analytics each consume it. You add a new consumer without touching the producer.
Each topic is split into partitions, and this single concept drives ordering and scaling:
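Keying records by orderId guarantees all events for one order are processed in order, because records with the same key go to the same partition and Kafka preserves order within a partition.
Spring Kafka wraps the native client with a KafkaTemplate for producing and @KafkaListener for consuming. Configuration is mostly YAML.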
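
    spring:
      kafka:
        bootstrap-servers: ${KAFKA_BROKERS}
        producer:
          key-serializer: org.apache.kafka.common.serialization.StringSerializer
          value-serializer: org.springframework.kafka.support.serializer.JsonSerializer
          acks: all # wait for all in-sync replicas (durability)
        consumer:
          group-id: inventory-service
          auto-offset-reset: earliest
          enable-auto-commit: false # commit offsets after processing, not before
          # The consumer needs matching deserializers to receive OrderPlaced as JSON.
          key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
          value-deserializer: org.springframework.kafka.support.serializer.JsonDeserializer
          properties:
            spring.json.trusted.packages: com.example.orders # assumption: the package that holds OrderPlaced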

    import org.springframework.kafka.core.KafkaTemplate;
    import org.springframework.stereotype.Service;

    @Service
    class OrderEventsProducer {
        private final KafkaTemplate<String, OrderPlaced> template;

        OrderEventsProducer(KafkaTemplate<String, OrderPlaced> template) {
            this.template = template;
        }

        void publish(OrderPlaced event) {
            // Key by orderId so all events for one order land in the same partition and stay ordered.
            template.send("orders.placed", event.orderId(), event);
        }
    }
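
    import org.springframework.kafka.annotation.KafkaListener;
    import org.springframework.stereotype.Component;

    @Component
    class InventoryListener {
        private final InventoryService inventory; // InventoryService is assumed; the original does not show it

        InventoryListener(InventoryService inventory) { this.inventory = inventory; }

        @KafkaListener(topics = "orders.placed") // group comes from spring.kafka.consumer.group-id
        void on(OrderPlaced event) {
            inventory.reserve(event.sku(), event.qty()); // do the work; an exception here goes to the error handler
        } // the container commits the offset after the listener returns normally
    }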
Two settings above are load-bearing. acks: all makes the producer wait until all in-sync replicas have the record, trading a little latency for durability. Turning off auto-commit and letting the container commit the offset only after the listener returns successfully is what gives you at-least-once delivery — if the consumer crashes mid-processing, the record is redelivered rather than silently lost.
| Semantic | Behavior | Use when |
|---|---|---|
| At-most-once | Commit before processing; a crash loses the record | Lossy telemetry where speed beats completeness |
| At-least-once | Commit after processing; a crash redelivers (possible duplicates) | The pragmatic default — combine with idempotent consumers |
| Exactly-once | Idempotent producer + transactions; no loss, no duplicates within Kafka | Kafka-to-Kafka stream processing where duplicates are unacceptable |
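The difference between the first two rows comes down to the order of two steps, commit and process, and what a crash between them leaves behind. The following toy simulation is a sketch under that assumption; `DeliverySimulation` and its parameters are invented for illustration and do not use the Kafka client.

```java
import java.util.ArrayList;
import java.util.List;

// Toy simulation of commit ordering (hypothetical, not the Kafka client API).
class DeliverySimulation {

    // Walks the log once, crashing a single time between the two steps at index crashAt,
    // then resumes from the last committed offset. Returns the records whose
    // processing side effect happened, in order.
    static List<String> run(List<String> log, int crashAt, boolean commitBeforeProcessing) {
        List<String> processed = new ArrayList<>();
        int committed = 0;
        int position = 0;
        boolean crashed = false;
        while (position < log.size()) {
            boolean crashNow = !crashed && position == crashAt;
            if (commitBeforeProcessing) {
                committed = position + 1;          // step 1: commit the offset
                if (crashNow) { crashed = true; position = committed; continue; }
                processed.add(log.get(position));  // step 2: do the work
            } else {
                processed.add(log.get(position));  // step 1: do the work
                if (crashNow) { crashed = true; position = committed; continue; }
                committed = position + 1;          // step 2: commit the offset
            }
            position++;
        }
        return processed;
    }
}
```

With a log of `a, b, c` and a crash at index 1, committing first yields `a, c` (record `b` is lost), while committing last yields `a, b, b, c` (record `b` is processed twice). That duplicate is why at-least-once needs idempotent consumers.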
Most services run at-least-once and make the consumer idempotent — processing the same event twice produces the same result. The cheapest way is a dedup table keyed by event ID, or designing the operation to be naturally idempotent (an upsert, a set-to-state rather than an increment).
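A dedup table can be sketched in a few lines. The example below uses in-memory collections as stand-ins for database tables, and the names `IdempotentInventory`, `reserve`, and the event ID parameter are assumptions made for illustration. In a real service the dedup insert and the stock update would need to share one database transaction.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// In-memory stand-ins for a dedup table and a stock table (hypothetical names).
class IdempotentInventory {
    private final Set<String> processedEventIds = new HashSet<>();
    private final Map<String, Integer> stock = new HashMap<>();

    IdempotentInventory(String sku, int initialQty) {
        stock.put(sku, initialQty);
    }

    // Returns true if the event was applied, false if it was a duplicate delivery.
    boolean reserve(String eventId, String sku, int qty) {
        if (!processedEventIds.add(eventId)) {
            return false; // already seen this event ID: skip the side effect
        }
        stock.merge(sku, -qty, Integer::sum); // decrement is not naturally idempotent, hence the guard
        return true;
    }

    int available(String sku) {
        return stock.getOrDefault(sku, 0);
    }
}
```

Delivering the same event twice changes the stock only once, so a redelivery after a crash is harmless.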
A “poison” message that always fails will, without guards, block its partition forever as the consumer retries it endlessly. Spring Kafka’s DefaultErrorHandler applies a backoff and a retry budget, then routes the record to a dead-letter topic (DLT) so the partition keeps moving and a human (or an automated process) can inspect the failures later.
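
    import org.springframework.kafka.listener.DeadLetterPublishingRecoverer;
    import org.springframework.kafka.listener.DefaultErrorHandler;
    import org.springframework.util.backoff.FixedBackOff;

    @Bean
    DefaultErrorHandler errorHandler(KafkaTemplate<Object, Object> template) {
        // Exhausted records go to a DLT named after the source topic (orders.placed.DLT here; check the default suffix for your version).
        var recoverer = new DeadLetterPublishingRecoverer(template);
        return new DefaultErrorHandler(recoverer, new FixedBackOff(1000L, 3)); // 1 s between attempts, 3 retries
    }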
Distinguish transient failures (a downstream timeout — worth retrying) from permanent ones (a malformed payload — send straight to the DLT). Retrying a deserialization error three times just wastes time before the inevitable.
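The retry-or-dead-letter decision can be modeled in plain Java. The sketch below mimics the behavior described above but is not Spring Kafka's DefaultErrorHandler; `RetryOrDeadLetter` and `PermanentFailure` are hypothetical names, and the backoff delay is left out to keep it short. In Spring Kafka itself, the comparable hook is `DefaultErrorHandler.addNotRetryableExceptions(...)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Plain-Java model of "retry transient failures, dead-letter permanent ones" (hypothetical).
class RetryOrDeadLetter {
    // Marker for failures that will never succeed on retry, such as a malformed payload.
    static class PermanentFailure extends RuntimeException {
        PermanentFailure(String message) { super(message); }
    }

    final List<String> deadLetters = new ArrayList<>();
    int attempts = 0; // total handler invocations, kept for illustration

    void handle(String record, Consumer<String> handler, int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            attempts++;
            try {
                handler.accept(record);
                return; // success: nothing to dead-letter
            } catch (PermanentFailure e) {
                break; // retrying cannot help, go straight to the DLT
            } catch (RuntimeException e) {
                // treated as transient: try again (a real handler would back off first)
            }
        }
        deadLetters.add(record); // retry budget exhausted or permanent failure
    }
}
```

A permanent failure costs one attempt, while a transient failure with `maxRetries = 3` costs four attempts before the record is parked.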
A subtle but critical trap: a handler that writes to its database and publishes to Kafka is doing two writes to two systems with no shared transaction. If the DB commit succeeds but the Kafka publish fails (or vice versa), your systems diverge — an order exists with no OrderPlaced event, or an event with no order.
The fix is the transactional outbox: within the same database transaction that saves the order, also insert the event into an outbox table. A separate relay process (or a change-data-capture tool like Debezium) reads the outbox and publishes to Kafka, marking rows sent. Now the business write and the “intent to publish” are atomic, and the relay guarantees the event eventually reaches Kafka at-least-once.
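
    import org.springframework.transaction.annotation.Transactional;

    @Transactional
    public void placeOrder(Order order) {
        orderRepository.save(order);
        // Same database transaction: the order and its outbox row commit or roll back together.
        outboxRepository.save(OutboxEvent.from(
                "orders.placed", order.id(), new OrderPlaced(order)));
    }
    // A relay/CDC process publishes unsent outbox rows to Kafka and marks them sent.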
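The relay half of the pattern can be sketched as a polling loop. This is an in-memory model under stated assumptions: `OutboxRelaySketch`, `OutboxRow`, and the `publisher` callback are invented for illustration, the list stands in for the outbox table, and the callback stands in for a Kafka send that reports success or failure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// In-memory model of an outbox table and a polling relay (hypothetical names).
class OutboxRelaySketch {
    static class OutboxRow {
        final String topic;
        final String key;
        final String payload;
        boolean sent = false;

        OutboxRow(String topic, String key, String payload) {
            this.topic = topic;
            this.key = key;
            this.payload = payload;
        }
    }

    final List<OutboxRow> outbox = new ArrayList<>();

    // One relay pass: publish each unsent row in insertion order and mark it sent
    // only after the publisher reports success. Returns the number of rows published.
    int relayOnce(Predicate<OutboxRow> publisher) {
        int published = 0;
        for (OutboxRow row : outbox) {
            if (row.sent) continue;
            if (!publisher.test(row)) break; // stop here to preserve order; retry on the next pass
            row.sent = true;
            published++;
        }
        return published;
    }
}
```

Because a row is marked sent only after a successful publish, a crash between the two steps republishes the row on the next pass. That is the at-least-once behavior the pattern promises, and it is another reason consumers should be idempotent.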
For pure Kafka-to-Kafka flows (consume → transform → produce), Kafka’s transactions plus the idempotent producer give true exactly-once semantics: the produced records and the consumed offsets commit atomically. Spring Kafka enables this with a transactional producer and read_committed isolation on the consumer. Kafka Streams makes it a one-liner (processing.guarantee=exactly_once_v2). Just remember its boundary: EOS covers Kafka, not your external database — for that, you still need idempotency or the outbox.
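For reference, these are the native Kafka configuration keys behind that paragraph, collected as plain `java.util.Properties`. The class and method names are hypothetical, and the sketch leaves out everything unrelated to exactly-once (bootstrap servers, serializers, group ID). In a Spring Boot application the equivalent settings would normally go through `spring.kafka.producer.transaction-id-prefix` and `spring.kafka.consumer.isolation-level`.

```java
import java.util.Properties;

// Collects the exactly-once related settings only; not a complete client configuration.
class ExactlyOnceConfigSketch {

    static Properties producerProps(String transactionalId) {
        Properties p = new Properties();
        p.setProperty("enable.idempotence", "true");        // broker drops duplicate sends from producer retries
        p.setProperty("acks", "all");                       // required by the idempotent producer
        p.setProperty("transactional.id", transactionalId); // enables transactions for this producer
        return p;
    }

    static Properties consumerProps() {
        Properties p = new Properties();
        p.setProperty("isolation.level", "read_committed"); // hide records from open or aborted transactions
        p.setProperty("enable.auto.commit", "false");       // offsets are committed inside the transaction instead
        return p;
    }

    static Properties streamsProps() {
        Properties p = new Properties();
        p.setProperty("processing.guarantee", "exactly_once_v2"); // the Kafka Streams one-liner
        return p;
    }
}
```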
Events are a contract between teams, and that contract will change. Use a schema registry (Avro, Protobuf, or JSON Schema) to enforce backward/forward compatibility so a producer adding a field can’t break existing consumers. The discipline: only make compatible changes (add optional fields, never remove or repurpose), and version events when you must break compatibility. Skipping this is how an event-driven platform turns into a coordination nightmare.
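The discipline in that paragraph can be written down as a check. The sketch below encodes only the rule stated above (add optional fields, never remove or repurpose) and is not how a schema registry evaluates compatibility; `SchemaChangeCheck`, `Field`, and the field names in the usage are invented for illustration.

```java
import java.util.Map;

// Encodes the article's rule of thumb, not a schema registry's compatibility algorithm.
class SchemaChangeCheck {
    static class Field {
        final String type;
        final boolean optional;

        Field(String type, boolean optional) {
            this.type = type;
            this.optional = optional;
        }
    }

    static boolean isSafeChange(Map<String, Field> oldSchema, Map<String, Field> newSchema) {
        for (Map.Entry<String, Field> e : oldSchema.entrySet()) {
            Field updated = newSchema.get(e.getKey());
            if (updated == null) return false;                         // field removed
            if (!updated.type.equals(e.getValue().type)) return false; // field repurposed
        }
        for (Map.Entry<String, Field> e : newSchema.entrySet()) {
            boolean added = !oldSchema.containsKey(e.getKey());
            if (added && !e.getValue().optional) return false;         // new required field
        }
        return true;
    }
}
```

Adding an optional `note` field to an event passes this check; adding a required `currency` field, dropping `orderId`, or changing its type does not, and would call for a new event version.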
Tune max.poll.records/max.poll.interval.ms so slow processing doesn’t kick a consumer out of the group.
Event-driven Java on Kafka buys you loose coupling, replayability, and independent scaling, but the durability guarantees you actually get depend on choices you make: key for ordering, size partitions for parallelism, run at-least-once with idempotent consumers, dead-letter your poison messages, and use the outbox pattern to keep your database and your events consistent. Get those right and Kafka becomes a reliable nervous system for the whole platform rather than a source of mysterious data drift.
Does Kafka guarantee exactly-once delivery?
Kafka supports exactly-once semantics (EOS) within a Kafka-to-Kafka transactional flow using idempotent producers and transactions. End-to-end exactly-once to an external system (like a database) is not automatic — you achieve effective exactly-once with the transactional outbox pattern plus idempotent consumers.
How many partitions should a Kafka topic have?
Partitions are the unit of parallelism: a consumer group can have at most one active consumer per partition. Size for your target throughput and peak consumer count, allow headroom (you can add partitions but not remove them), and avoid going excessively high since each partition adds broker overhead.
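Both partition rules, key-based placement and the cap on active consumers, fit in a short sketch. Note the assumption: Kafka's default partitioner hashes the serialized key with murmur2, whereas this example uses `String.hashCode()` purely to stay dependency-free, so the partition numbers it produces will not match a real cluster. `PartitionSketch` and its methods are hypothetical names.

```java
// Illustrates key-to-partition mapping and group parallelism (hypothetical, not the Kafka partitioner).
class PartitionSketch {

    // Stand-in for the default partitioner: same key and same partition count
    // always give the same partition, which is what preserves per-key order.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Within one consumer group each partition is read by at most one consumer,
    // so consumers beyond the partition count sit idle.
    static int activeConsumers(int numPartitions, int groupSize) {
        return Math.min(numPartitions, groupSize);
    }
}
```

With 6 partitions, a group of 8 consumers has only 6 doing work. Because the mapping depends on the partition count, adding partitions later can send new records for an existing key to a different partition, which is one more reason to leave headroom up front.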