Jay Kreps

Jay Kreps is one of the principal creators of Apache Kafka, listed as the lead author of the 2011 paper “Kafka: a Distributed Messaging System for Log Processing” alongside Neha Narkhede and Jun Rao. The system was built at LinkedIn to move large volumes of activity and log data through a single real-time pipeline rather than a tangle of point-to-point connections.

Kreps is best known for crystallizing the log-centric view of data infrastructure in his 2013 essay “The Log: What every software engineer should know about real-time data’s unifying abstraction,” published on LinkedIn’s engineering blog. In it he defines the log as “an append-only, totally-ordered sequence of records ordered by time” and argues that this single structure underlies three apparently separate problems: data integration across systems, real-time stream processing, and the internal design of distributed databases. The key insight he draws on is what he calls the State Machine Replication Principle: identical, deterministic processes that begin in the same state and receive the same inputs in the same order produce the same outputs and end in the same state.
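The principle can be illustrated with a minimal Python sketch. It is not Kafka's API or Kreps's own code: the names Log, Replica, and catch_up are hypothetical, and the "state machine" is assumed to be a simple key-value store. Two replicas that start empty and apply the same records in the same order end up with identical state, even though one of them joins late and replays the log from the beginning.

```python
from dataclasses import dataclass, field


@dataclass
class Log:
    """Hypothetical append-only, totally ordered sequence of records."""
    records: list = field(default_factory=list)

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # offset assigned to the new record

    def read_from(self, offset):
        return self.records[offset:]


@dataclass
class Replica:
    """Deterministic state machine: a key-value store built by replaying the log."""
    state: dict = field(default_factory=dict)
    next_offset: int = 0  # position of the next record this replica has not applied

    def catch_up(self, log):
        # Apply every unseen record strictly in log order.
        for op, key, value in log.read_from(self.next_offset):
            if op == "set":
                self.state[key] = value
            elif op == "delete":
                self.state.pop(key, None)
            self.next_offset += 1


log = Log()
log.append(("set", "user:1", "alice"))
log.append(("set", "user:2", "bob"))

a = Replica()
a.catch_up(log)  # replica a follows the log as it grows

log.append(("delete", "user:1", None))
log.append(("set", "user:2", "robert"))

a.catch_up(log)
b = Replica()
b.catch_up(log)  # replica b starts late and replays from offset 0

# Same starting state, same inputs, same order: same final state.
assert a.state == b.state == {"user:2": "robert"}
```

In this sketch the only thing the replicas share is the log itself; neither one talks to the other, which is the sense in which the log acts as the unifying structure for replication and integration.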

He later co-founded Confluent, a company built around Kafka, where he served as CEO. In his 2015 Confluent post “Putting Apache Kafka To Use,” he extends the argument to business operations directly, writing that “most of what a business does can be thought of as streams of events,” and presenting Kafka’s structured commit log as the way to capture and process those streams. Through both his writing and his engineering work, Kreps did much to move the industry toward treating the event stream, rather than the static database table, as the primary representation of data.

Sources

Related