Apache Pulsar

Apache Pulsar is an open-source, distributed messaging and streaming platform built for the cloud. Its documentation describes it as a multi-tenant, high-performance solution for server-to-server messaging that functions as both a publish/subscribe messaging system and a streaming platform. The project README summarizes it as a distributed pub-sub messaging platform with a flexible messaging model and an intuitive client API, capable of handling millions of independent topics.

Pulsar was originally developed at Yahoo as a unified messaging platform for the company’s many properties. Yahoo open-sourced it in 2016, it entered the Apache Incubator in 2017, and it graduated to a top-level Apache project in 2018; it has been maintained under the stewardship of the Apache Software Foundation since. The motivation was to support very large numbers of tenants and topics with strong durability and predictable latency, requirements that emerged from operating messaging at Yahoo’s scale.

A distinctive aspect of Pulsar’s architecture is the separation of the serving layer from the storage layer. Stateless brokers handle client connections and message dispatch, while message data is persisted in Apache BookKeeper, a separate distributed log storage system. Because brokers hold no permanent state, they can be added, removed, or restarted without rebalancing data, and storage can be scaled independently of serving capacity. This decoupling is the structural feature that sets Pulsar apart from architectures that bind storage and serving together on the same nodes.
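The separation described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class names LogStore and Broker and their methods are hypothetical, invented for illustration, and do not correspond to Pulsar or BookKeeper APIs. It shows why a serving node that holds no message data can be replaced without copying anything.

```python
from dataclasses import dataclass, field


@dataclass
class LogStore:
    """Hypothetical stand-in for the storage layer: it owns all message data."""
    logs: dict[str, list[bytes]] = field(default_factory=dict)

    def append(self, topic: str, payload: bytes) -> int:
        entries = self.logs.setdefault(topic, [])
        entries.append(payload)
        return len(entries) - 1  # position of the new entry in the topic's log

    def read(self, topic: str, start: int = 0) -> list[bytes]:
        return self.logs.get(topic, [])[start:]


class Broker:
    """Hypothetical stand-in for a serving node: holds a store reference, no data."""

    def __init__(self, name: str, store: LogStore) -> None:
        self.name = name
        self.store = store

    def publish(self, topic: str, payload: bytes) -> int:
        return self.store.append(topic, payload)

    def fetch(self, topic: str, start: int = 0) -> list[bytes]:
        return self.store.read(topic, start)


store = LogStore()
broker_a = Broker("broker-a", store)
broker_a.publish("orders", b"created")
broker_a.publish("orders", b"paid")

# Simulate losing broker-a and starting a replacement. Nothing is copied or
# rebalanced, because the replacement reads the same log from the store.
del broker_a
broker_b = Broker("broker-b", store)
broker_b.publish("orders", b"shipped")
recovered = broker_b.fetch("orders")
```

In this model, scaling serving capacity means creating more Broker objects against the same store, while scaling storage would mean growing the store itself. The two concerns vary independently, which is the property the paragraph above attributes to Pulsar’s design.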

Pulsar exposes several subscription modes over the same topic: exclusive, shared, failover, and key-shared. A single stream can therefore be consumed as a strictly ordered log (exclusive or failover), a load-balanced work queue (shared), or a stream spread across consumers with ordering preserved per key (key-shared), depending on the consumer’s needs. Pulsar is multi-tenant by design, with built-in tenants and namespaces, authentication, and resource isolation, and it supports geo-replication so messages can be mirrored across data centers. A complementary feature, tiered storage, offloads older data from BookKeeper to cheaper long-term stores such as Amazon S3 or Google Cloud Storage as it ages, controlling cost while keeping historical data accessible.
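The difference between the subscription modes is easiest to see by tracing which consumer receives which message. The following is a minimal simulation, not Pulsar’s dispatcher: the dispatch function is hypothetical, shared mode is modelled as plain round-robin, and key-shared is modelled as a CRC32 hash of the key modulo the number of consumers, which is an assumption made for brevity rather than a description of how Pulsar assigns keys.

```python
import zlib
from collections import defaultdict


def dispatch(messages, consumers, mode):
    """Assign (key, value) messages to consumers under a simplified model.

    Returns {consumer_name: [values, ...]}. Toy model for illustration only.
    """
    if not consumers:
        raise ValueError("at least one consumer is required")
    received = defaultdict(list)
    for index, (key, value) in enumerate(messages):
        if mode in ("exclusive", "failover"):
            # A single active consumer sees every message in order. Under
            # failover the remaining consumers are standbys.
            target = consumers[0]
        elif mode == "shared":
            # Round-robin: work-queue behaviour, no ordering across consumers.
            target = consumers[index % len(consumers)]
        elif mode == "key_shared":
            # The same key always maps to the same consumer, so order
            # is preserved per key (assumed hashing scheme).
            target = consumers[zlib.crc32(key.encode()) % len(consumers)]
        else:
            raise ValueError(f"unknown mode: {mode}")
        received[target].append(value)
    return dict(received)


messages = [("user-1", "a1"), ("user-1", "a2"), ("user-2", "b1"), ("user-2", "b2")]
consumers = ["c1", "c2"]

exclusive = dispatch(messages, consumers, "exclusive")
shared = dispatch(messages, consumers, "shared")
key_shared = dispatch(messages, consumers, "key_shared")
```

Under this model, exclusive delivery hands all four messages to c1 in order, while shared delivery splits the messages of user-1 between c1 and c2, so neither consumer sees that key’s full history. Key-shared delivery keeps each key on one consumer. With the real Python client, the mode is selected when subscribing through the consumer_type argument of Client.subscribe, for example pulsar.ConsumerType.Shared.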

These properties make Pulsar a frequent choice for organizations that need both queuing and streaming from one platform, that run many independent teams or customers on shared infrastructure, or that must retain very large histories economically. It is often discussed alongside Apache Kafka in the event-streaming space, with Pulsar’s separated storage and native multi-tenancy cited as its main architectural differentiators.