Shannon Entropy

Shannon entropy is the central quantity of information theory, introduced by Claude Shannon in his 1948 paper "A Mathematical Theory of Communication". It measures the average uncertainty in the output of a random source or, equivalently, the average amount of information gained by observing that output. A source that always emits the same symbol carries no surprise and has zero entropy; a source whose outcomes are all equally likely has the highest possible entropy for its number of outcomes.

Shannon defined entropy as the expected value of the logarithm of the inverse probability of each outcome: H(X) = Σ p(x) log(1/p(x)), summed over all outcomes x. Up to the choice of logarithm base, this is the only formula that satisfies a few natural requirements about how a measure of uncertainty ought to behave. When the logarithm is taken in base two, entropy is measured in bits, and it gives, to within one question, the average number of yes-or-no questions that an optimal strategy needs to pin down an outcome.
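The definition can be checked numerically. The following is a minimal Python sketch, not a reference implementation: the function name shannon_entropy and the input format (a list of probabilities that sums to 1) are assumptions made for illustration.

```python
import math

def shannon_entropy(probs):
    """Entropy in bits of a discrete distribution given as a list of probabilities."""
    if any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    # Outcomes with p == 0 are skipped: p * log(1/p) tends to 0 as p tends to 0.
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

print(shannon_entropy([1.0]))        # 0.0: a constant source carries no surprise
print(shannon_entropy([0.5, 0.5]))   # 1.0: a fair coin is worth one yes-or-no question
print(shannon_entropy([0.25] * 4))   # 2.0: four equally likely outcomes need two questions
print(shannon_entropy([0.9, 0.1]))   # about 0.469: a biased coin is more predictable
```

The biased coin shows why entropy is an average: the likely outcome is barely surprising, the rare one is very surprising, and weighting each by its probability gives less than one bit.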

The concept’s practical force comes from Shannon’s source coding theorem, which states that entropy is the hard floor on lossless compression: no lossless code can represent the output of a source using fewer bits per symbol, on average, than its entropy, and codes that come arbitrarily close to that floor exist, typically by encoding long blocks of symbols at a time.
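One way to see the floor in action is to build a prefix code and compare its average length with the entropy. The sketch below uses Huffman coding, which the text above does not name; it is chosen here only as a familiar illustration. The function names are hypothetical, and the code assumes at least two symbols with known probabilities.

```python
import heapq
import math

def huffman_code_lengths(probs):
    """Codeword length in bits that Huffman coding assigns to each symbol."""
    # Heap entries: (subtree probability, unique tie-breaker, symbols in the subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        # Merging the two least likely subtrees pushes their symbols one level deeper.
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

def entropy_bits(probs):
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

def average_length(probs):
    lengths = huffman_code_lengths(probs)
    return sum(p * n for p, n in zip(probs, lengths))

probs = [0.5, 0.25, 0.125, 0.125]
print(huffman_code_lengths(probs))   # [1, 2, 3, 3]
print(average_length(probs))         # 1.75
print(entropy_bits(probs))           # 1.75
```

For this distribution every probability is a power of one half, so the code meets the entropy exactly. For other distributions a symbol-by-symbol Huffman code lands at or above the entropy and less than one bit above it; closing the remaining gap requires coding several symbols together.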

Entropy matters far beyond communications. It underlies the entropy-coding stage used in compressing files and media, it enters machine learning through cross-entropy, a standard loss function for training classifiers, and it gives a rigorous, quantitative meaning to the everyday notion of information that pervades the digital economy.
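The link to classifier training can be made concrete. Cross-entropy H(p, q) = Σ p(x) log(1/q(x)) is the average cost of describing outcomes drawn from p using a model q; it is never smaller than the entropy of p and equals it when q matches p. The sketch below is illustrative only: the function name and the example probabilities are assumptions, the model is assumed to give nonzero probability to every outcome that can occur, and machine learning libraries usually use the natural logarithm rather than base two.

```python
import math

def cross_entropy_bits(p, q):
    """Cross-entropy H(p, q) in bits; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(1.0 / qi) for pi, qi in zip(p, q) if pi > 0)

# A one-hot target: the second of three classes is the correct one.
target = [0.0, 1.0, 0.0]

# Two hypothetical classifier outputs for the same input.
confident = [0.05, 0.90, 0.05]
unsure = [0.30, 0.40, 0.30]

print(cross_entropy_bits(target, confident))  # small loss: most mass on the right class
print(cross_entropy_bits(target, unsure))     # larger loss: the mass is spread out

# When the model equals the true distribution, cross-entropy reduces to entropy.
p = [0.5, 0.25, 0.25]
print(cross_entropy_bits(p, p))               # 1.5
```

Minimising this loss pushes the predicted distribution toward the observed labels, which is why the same quantity that bounds compression also serves as a training objective.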

Sources

Related