When text is set with straight (justified) margins, a typesetter must sometimes break a word across two lines, inserting a hyphen at an allowable point. Choosing those points well is harder than it looks: English hyphenation depends on a word’s structure and pronunciation, not on simple letter rules, so a naive approach either misses good breaks or produces wrong ones. The Knuth-Liang hyphenation algorithm is the method, pattern-based and compact, that TeX and many later systems use to make these decisions automatically.
The algorithm comes from Frank Liang’s Stanford Ph.D. dissertation, “Word Hy-phen-a-tion by Com-put-er,” issued in August 1983 as Stanford Computer Science report STAN-CS-83-977. Liang worked with Donald Knuth, whose TeX78 system had used an earlier, rule-based hyphenation scheme; Liang’s thesis presented a markedly better approach that became part of the TeX everyone uses today. The hyphenated spelling of the title is itself a small joke that advertises what the work is about.
The core idea is to replace a large hyphenation dictionary with a small set of patterns. Each pattern is a short fragment of letters carrying numeric values between the letters; odd numbers encourage a break at that spot and even numbers discourage one, with higher numbers overriding lower. To hyphenate a word, the algorithm slides all matching patterns across it, accumulates the numbers at each inter-letter position, and allows a hyphen wherever the winning value is odd. The patterns are themselves generated by a companion program, PATGEN, trained on a list of correctly hyphenated words.
What makes the method attractive is its efficiency. The patterns are stored in a packed trie — a tree-shaped lookup structure — so matching is fast and the data is tiny compared with a full word list, yet it captures the great majority of correct breaks while making almost no errors. Because the patterns are just data, the same algorithm hyphenates any language for which a pattern set exists, and pattern sets have been built for dozens of languages.
The Knuth-Liang algorithm has long outlived its origin in TeX. The same pattern files, often distributed through the TeX community, are used by office software, web browsers (via CSS hyphenation), and e-book readers to break words at line ends. It is a durable example of trading a big lookup table for a small, clever, pattern-matching data structure, and it remains the standard answer to a problem every justified column of text must solve.