Ray: A Distributed Framework for Emerging AI Applications

“Ray: A Distributed Framework for Emerging AI Applications” was submitted to arXiv on December 16, 2017 and appeared at the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2018). Its authors were Philipp Moritz, Robert Nishihara, Stephanie Wang, Alexey Tumanov, Richard Liaw, Eric Liang, Melih Elibol, Zongheng Yang, William Paul, Michael I. Jordan, and Ion Stoica, working at UC Berkeley’s RISELab.

Ray targets AI applications that “continuously interact with the environment and learn from these interactions,” especially reinforcement learning, where a job mixes long-running simulations, training, and serving. The system exposes a single interface supporting both stateless tasks and stateful actors over one dynamic execution engine, backed by a distributed scheduler and a fault-tolerant store for control state. The paper reports the system scaling beyond 1.8 million tasks per second and outperforming specialized systems on reinforcement-learning workloads.
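The distinction between stateless tasks and stateful actors can be illustrated with a minimal sketch. It uses only Python's standard library and is not Ray's API: the names `square`, `CounterActor`, and `run_demo` are hypothetical, and a thread pool stands in for a cluster. Both kinds of call return a future immediately, but a task's result depends only on its arguments, while an actor's method calls run one at a time against state that the actor owns.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    """Stateless task: the result depends only on the argument."""
    return x * x


class CounterActor:
    """Stateful actor (illustrative): owns private state and runs its
    methods one at a time, in the order they were submitted."""

    def __init__(self):
        self._count = 0
        # A single worker thread serializes all method calls on this actor.
        self._mailbox = ThreadPoolExecutor(max_workers=1)

    def increment(self):
        # Returns a future immediately; the update happens on the actor's thread.
        return self._mailbox.submit(self._increment)

    def _increment(self):
        self._count += 1
        return self._count

    def shutdown(self):
        self._mailbox.shutdown(wait=True)


def run_demo():
    # Tasks: independent calls that any worker in the pool may execute.
    with ThreadPoolExecutor(max_workers=4) as pool:
        task_futures = [pool.submit(square, i) for i in range(4)]
        squares = [f.result() for f in task_futures]

    # Actor: every call goes to the same owner of the state.
    counter = CounterActor()
    actor_futures = [counter.increment() for _ in range(3)]
    counts = [f.result() for f in actor_futures]
    counter.shutdown()
    return squares, counts


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the task results can be computed in any order on any worker, whereas the counter values are produced strictly in submission order because a single worker processes the actor's queue.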

Ray grew into widely used open-source infrastructure for distributed Python and AI, with libraries for training, hyperparameter tuning, and serving, and its creators founded the company Anyscale to commercialize it. Large AI organizations adopted Ray to orchestrate training and inference across clusters, making this paper a foundational document for the compute layer beneath modern model development.

