E · Event-Driven Architecture deep dive

Turn operational change into reliable signals

Define what happened, preserve its identity and time, survive disorder and failure, and prevent retries from becoming duplicate business actions.

Core concepts

An event is a durable statement of fact

Event

An immutable record that something material happened, stated in past tense: RoadClosed or ShipmentDelayed.

Command

A request for an action—RerouteTruck—which may be accepted, denied, retried, or expire and therefore carries authority and idempotency concerns.

Event stream

An ordered-by-key sequence of events retained for multiple consumers, replay, audit, and state reconstruction.

Event time

When the real-world event occurred, distinct from ingestion and processing time. Watermarks bound how long late data is expected.

Delivery semantics

At-most-once, at-least-once, or effectively-once behavior. Business exactly-once usually requires idempotent effects and reconciliation.

Schema and contract

Versioned structure plus meaning, keys, ownership, compatibility, quality, security, retention, and consumer obligations.

Consumer lag

The distance between produced and processed events, measured in offsets or time; it exposes an agent acting on stale operational state.

Replay

Reprocessing retained events for recovery or new logic. Replay must be authorized, observable, bounded, and safe for downstream effects.

Logistics example

RoadClosed event contract

ElementExampleReason
Identityevent_id, closure_id, source, schema version, correlation IDDeduplicate, trace, and evolve safely.
Timeoccurred, observed, published, valid-from, valid-toSeparate reality from pipeline delay and temporal applicability.
Subjectnetwork segment, geometry, jurisdiction, affected vehicle classesRoute only relevant consumers and spatial analysis.
Authoritytransportation agency, signature, confidence, provenancePrevent an untrusted signal from driving action.
Behaviorkey by segment, at-least-once delivery, seven-day retention, idempotent consumersMake ordering, recovery, and duplicates explicit.

Reliability patterns

Design the unhappy path first

Transactional outbox

Commit business state and an outbound event atomically, then relay it, avoiding a database update without its event.

Idempotent consumer

Use stable business keys and processed-event records so redelivery does not repeat a consequential action.

Dead-letter and quarantine

Separate poison messages with reason, owner, retention, correction, replay authority, and alerting.

Saga and compensation

Coordinate multi-step business work with explicit compensating action when a later step fails.

Schema compatibility

Automate backward/forward compatibility checks and consumer impact before publishing a breaking change.

Observability

Propagate correlation and trace context across producer, broker, stream processing, agent, tool, and outcome.

Technology map

Open ecosystem and enterprise platforms

NeedOpen / specializedDatabricksSnowflakeMicrosoft Fabric / Azure
Broker and ingestionApache Kafka/Pulsar, NATS, RabbitMQ; Kafka Connect and Debezium for CDCLakeflow Connect, Auto Loader, Kafka connectorsSnowpipe Streaming, Kafka ConnectorFabric Eventstreams, Azure Event Hubs, IoT Hub
Stream processingApache Flink, Kafka Streams, Apache BeamStructured Streaming and Lakeflow pipelinesStreams, Tasks, Dynamic Tables, SnowparkEventstreams transformations, Eventhouse/KQL, Fabric Activator, Azure Stream Analytics
ContractsAsyncAPI, CloudEvents, Avro/Protobuf/JSON Schema registriesSchema Registry integrations, table/pipeline contracts, Unity Catalog metadataSchema governance, object metadata, producer/consumer testsEventstream schemas, Event Hubs schema registry, Purview metadata
ObservabilityOpenTelemetry, Prometheus/Grafana, broker lag toolingStreaming query metrics, system tables, OpenTelemetry integrationAccount Usage, event/task history, monitoringFabric monitoring hub, KQL dashboards, Azure Monitor/Application Insights
Workflow effectsTemporal/Camunda/Durable Task, API gateways, policy enginesWorkflows/Jobs plus external durable orchestration for business sagasTasks plus external workflow for long-running human/business processesFabric pipelines, Azure Logic Apps, Durable Functions, Power Automate

Do not confuse ingestion with event architecture: a fast pipe without business semantics, keys, compatibility, retention, replay controls, ownership, and failure handling is only transport.

Implementation

How to achieve the E layer

1

Discover

Identify material business changes and the decisions they should trigger; avoid publishing every table mutation as a business event.

2

Contract

Name events in past tense; define identity, key, time, source, schema, semantics, owner, classification, and compatibility.

3

Engineer

Select partitions, retention, delivery, idempotency, ordering scope, watermarks, DLQ, back-pressure, recovery, and replay.

4

Secure

Authenticate workloads; authorize topics and fields; encrypt, minimize, retain, and audit producer/consumer access.

5

Observe

Instrument correlation, latency, lag, loss, duplicates, DLQ, throughput, schema failures, retries, and business reconciliation.

6

Prove

Inject duplicate, malformed, late, reordered, missing, and replayed events; exercise broker/consumer outage and reconcile outcomes.

Evidence

  • Event catalog, AsyncAPI/schema versions, owners, producer/consumer map
  • Compatibility, idempotency, disorder, load, outage, and replay test results
  • Latency/lag/DLQ/loss dashboards and SLO reviews
  • Access reviews, retention, replay approvals, incidents, runbooks

Acceptance

  • A sampled event traces source-to-outcome with one correlation ID.
  • Duplicates and replay do not duplicate business harm.
  • Late and out-of-order events yield deterministic state.
  • Recovery meets RTO/RPO and reconciliation proves completeness.