Soft Actor-Critic (SAC) was introduced in “Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor,” posted to arXiv on January 4, 2018 by Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. SAC is an off-policy actor-critic algorithm built on the maximum entropy reinforcement learning framework, in which the agent tries to maximize expected reward while also maximizing the entropy of its policy: in other words, to succeed at the task while acting as randomly as possible.
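For readers who want the notation, the maximum entropy objective is commonly written along the following lines. The symbols are introduced here for illustration and do not appear in the text above: r is the reward, \mathcal{H} is the entropy of the policy \pi at state s_t, \rho_\pi is the distribution of states and actions the policy visits, and \alpha is a temperature weight that sets how much randomness is worth relative to reward.

```latex
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
  \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]
```

Setting \alpha to zero recovers the ordinary expected-reward objective, so the entropy term can be read as an adjustable bonus layered on top of standard reinforcement learning.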
The entropy term is the key idea. By rewarding the policy for keeping its options open, SAC explores more thoroughly and avoids collapsing prematurely onto a single behavior. Because it is off-policy, it can reuse past experience from a replay buffer instead of discarding data after each update, which makes it far more sample-efficient than on-policy methods on continuous control tasks. The paper reported state-of-the-art performance across a range of continuous control benchmarks, along with notable stability across random seeds, an area where earlier off-policy methods had been chronically brittle.
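The effect of the entropy bonus can be seen in a toy calculation. The sketch below is not SAC itself: it assumes a single decision with three discrete actions (SAC targets continuous actions), and the reward values, the temperature `alpha`, and the function names are all hypothetical choices made for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def soft_objective(probs, rewards, alpha):
    """Expected one-step reward plus alpha times the policy's entropy."""
    expected_reward = sum(p * r for p, r in zip(probs, rewards))
    return expected_reward + alpha * entropy(probs)


# Hypothetical setup: two equally good actions and one bad action.
rewards = [1.0, 1.0, 0.0]
alpha = 0.2  # hypothetical temperature; larger values favor randomness more

committed = [1.0, 0.0, 0.0]    # always picks the first good action
open_minded = [0.5, 0.5, 0.0]  # splits evenly between the two good actions

print("committed:  ", soft_objective(committed, rewards, alpha))
print("open-minded:", soft_objective(open_minded, rewards, alpha))
```

Both policies earn the same expected reward of 1.0, but the one that spreads its choices across the two good actions scores higher (roughly 1.14) once the entropy bonus is added. That is the sense in which the objective rewards keeping options open without giving up reward.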
That combination of sample efficiency and robustness made SAC a default choice for continuous control problems, especially in robotics, where collecting real-world experience is expensive. For a business reader, SAC matters because it is one of the algorithms that made reinforcement learning practical for physical systems rather than just simulated games.