Context-Free Grammar

A context-free grammar is a set of rules for generating strings, where each rule rewrites a single nonterminal symbol into a sequence of symbols, independent of what surrounds it. This is the type 2 class in the Chomsky hierarchy of language descriptions. The “context-free” name captures the key restriction: a nonterminal can be replaced by its right-hand side anywhere it appears, with no regard for neighboring symbols.

The notation made famous for context-free grammars is Backus-Naur Form. The Revised Report on ALGOL 60 introduced it for describing programming-language syntax, stating that “the syntax will be described with the aid of metalinguistic formulae” with nonterminals enclosed in angle brackets and rules written using the ::= and | connectives. Because the right-hand side of a rule can refer back to its own left-hand side, the report observes that the definition of expressions “is necessarily recursive,” which is exactly how grammars describe nested structure.

Context-free grammars are the workhorse of compiler front ends. They are expressive enough to capture the nested and balanced structure of programming languages, such as matched parentheses and arbitrarily deep expressions, that simpler models cannot. At the same time they can be parsed efficiently, which is why the syntax of nearly every programming language is specified as a context-free grammar.

The class sits between weaker and stronger models. It is strictly more powerful than regular languages, which cannot count nesting, and strictly weaker than the unrestricted grammars at the top of the hierarchy. The machine that recognizes exactly the context-free languages is the pushdown automaton.