In information theory, entropy is a measure of the average amount of information produced by a source, or equivalently of the uncertainty about what the source will produce next. It is usually expressed in bits. The idea comes from Claude Shannon’s 1948 paper “A Mathematical Theory of Communication,” where he introduced it to put the measurement of information on a precise footing.
Shannon defined the entropy of a source from the probabilities of its possible messages. A source whose outcomes are all equally likely has the highest entropy possible for its number of outcomes, because each new symbol is as hard to predict as it can be and so carries the most information. A source that almost always produces the same symbol has low entropy, because its output is nearly certain and tells you little. Shannon denoted the quantity H and observed that its form “will be recognized as that of entropy as defined in certain formulations of statistical mechanics.”
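As a concrete illustration, Shannon's definition for a discrete source with symbol probabilities p_i is H = -Σ p_i log2(p_i), measured in bits when the logarithm is base 2. The sketch below computes it for two sources; the function name `shannon_entropy` and the two example distributions are illustrative assumptions, not taken from Shannon's paper.

```python
import math

def shannon_entropy(probs):
    """Entropy in bits of a discrete distribution given as a list of probabilities."""
    # Symbols with probability 0 are skipped: they contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical four-symbol sources chosen for illustration.
uniform = [0.25, 0.25, 0.25, 0.25]   # every symbol equally likely
skewed = [0.97, 0.01, 0.01, 0.01]    # one symbol dominates

print(shannon_entropy(uniform))  # 2.0 bits, the maximum for four symbols
print(shannon_entropy(skewed))   # roughly 0.24 bits
```

The uniform source reaches log2(4) = 2 bits per symbol, while the skewed source, whose output is nearly certain, carries only about a quarter of a bit per symbol.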
The central practical meaning of entropy is that it sets the limit on lossless compression. No lossless coding scheme can represent a source using fewer bits per symbol, on average, than the source’s entropy. Efficient codes approach this limit by giving short codewords to common symbols and longer codewords to rare ones.
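The limit can be checked numerically. The sketch below uses Huffman coding, which the text above does not name, as one well-known example of a code that gives short codewords to common symbols. It compares the average codeword length with the entropy. The function name `huffman_code_lengths` and both probability lists are assumptions made for illustration.

```python
import heapq
import math

def huffman_code_lengths(probs):
    """Return the codeword length in bits that Huffman coding assigns to each symbol."""
    # Heap entries: (subtree probability, unique tie-breaker, symbols in the subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

def entropy_and_average_length(probs):
    """Return (entropy in bits, average Huffman codeword length in bits)."""
    lengths = huffman_code_lengths(probs)
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    average = sum(p * n for p, n in zip(probs, lengths))
    return entropy, average

# Hypothetical sources: probabilities that are powers of 1/2, and ones that are not.
print(entropy_and_average_length([0.5, 0.25, 0.125, 0.125]))  # (1.75, 1.75)
print(entropy_and_average_length([0.6, 0.3, 0.1]))
```

When every probability is a power of one half, the average length equals the entropy exactly. For the second source the average length is about 1.4 bits against an entropy of about 1.3 bits, so the code comes close to the limit but does not go below it.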
Because it captures exactly how much information a source carries, entropy is a cornerstone of data compression, communication, and cryptography. It explains why some files shrink dramatically while others barely compress, and it provides the yardstick against which coding methods are measured.
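A quick experiment makes the difference in compressibility visible. The sketch below compresses two inputs of the same size with Python's standard `zlib` module. The choice of inputs, a single repeated byte versus bytes from `os.urandom`, is an assumption made for illustration, and a general-purpose compressor is only a rough stand-in for the theoretical limit.

```python
import os
import zlib

# Two hypothetical 100,000-byte inputs.
repetitive = b"A" * 100_000         # low entropy: the same symbol every time
random_bytes = os.urandom(100_000)  # high entropy: close to 8 bits per byte

for name, data in [("repetitive", repetitive), ("random", random_bytes)]:
    compressed = zlib.compress(data, 9)  # 9 is the strongest compression level
    print(name, len(data), "->", len(compressed), "bytes")
```

The repetitive input shrinks to a tiny fraction of its size, while the random input stays at about its original length, since it has almost no redundancy for the compressor to remove.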