Markov chain Monte Carlo, usually abbreviated MCMC, is a set of techniques for sampling from probability distributions that are too complicated to handle directly. Many questions in statistics and machine learning reduce to computing averages over a distribution, but for realistic models that distribution has no usable formula and cannot be sampled by ordinary means. MCMC solves this by constructing a random walk, a Markov chain, that is designed so that the places it visits, in the long run, occur in proportion to the target distribution.
The foundational instance is the Metropolis algorithm from the 1953 Los Alamos paper on equation of state calculations, which proposed accepting or rejecting each step based on how it changes a quantity of interest. W. K. Hastings generalized this in his 1970 Biometrika paper, producing the Metropolis-Hastings algorithm that allows almost any proposal mechanism. Together these works define the core of MCMC.
Once the chain has run long enough to forget its starting point, the sequence of states it produces can be treated as samples from the target distribution, and averages over those samples approximate the averages you actually wanted.
MCMC matters because it is what makes Bayesian analysis computable for real problems. From genetics to economics to the probabilistic models behind machine learning, it lets analysts answer questions about models whose exact solutions are out of reach, trading a closed-form answer for a tractable simulation.