AWK

AWK is a small programming language built for one job: scanning text files and acting on the lines that match a pattern. It was created at AT&T Bell Laboratories in 1977 by Alfred V. Aho, Peter J. Weinberger, and Brian W. Kernighan, and its name is simply their three initials. The GNU implementation’s manual, the GNU Awk User’s Guide, describes the language as a way “to select particular records in a file and perform operations upon them,” and notes that AWK programs are data-driven: you describe the patterns to find and the actions to take, rather than spelling out a step-by-step procedure.

The shape of an AWK program is its defining idea. A program is a list of pattern { action } rules. For every line of input, AWK tests each pattern in turn; when a pattern matches, it runs the associated action. A pattern can be a regular expression, a comparison, or one of the special patterns BEGIN and END, whose actions run before any input is read and after all of it has been processed. An action is a block of statements with variables, arithmetic, conditionals, and loops. Omit the pattern and the action runs on every line; omit the action and AWK prints each matching line.
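The rule structure can be sketched with a short script. This is a minimal illustration, not a canonical program: the three input lines and the shell variable name out are made up for the example.

```shell
# Hypothetical sample input: three lines of "name value".
out=$(printf 'alpha 5\nbeta 150\ngamma 42\n' | awk '
    BEGIN   { print "start" }        # runs once, before any input is read
    /^b/    { print "regex: " $0 }   # regular-expression pattern
    $2 > 40 { print "big: " $1 }     # comparison pattern
    END     { print "lines: " NR }   # runs once, after all input
')
printf '%s\n' "$out"
```

Every rule is tested against every line, so the line "beta 150" triggers both the regular-expression rule and the comparison rule, in the order they appear in the program.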

What makes AWK feel effortless for tabular text is automatic field splitting. AWK breaks each input line into fields, by default on runs of whitespace, which you reference as $1, $2, and so on, with $0 being the whole line. The number of fields in the current line is NF and the current record number is NR. The canonical one-liner awk '{ print $2 }' prints the second column of every line; adding a pattern, as in awk '$3 > 100 { print $1 }', filters and projects in a single rule. This built-in model of records and fields is why AWK became the default tool for slicing log files and delimited data.
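A small sketch of the field variables at work, assuming whitespace-separated input. The host names, status codes, and byte counts are invented sample data, and fields is just an illustrative variable name.

```shell
# Hypothetical whitespace-separated data: host, status code, bytes sent.
fields=$(printf 'web1 200 512\nweb2 404 90\nweb3 200 2048\n' | awk '
    # NR is the record number, NF the field count, $NF the last field.
    { print NR ": " NF " fields, last = " $NF }
    # Filter on column 3 and project column 1 in one rule.
    $3 > 100 { print "large response from " $1 }
')
printf '%s\n' "$fields"
```

The first rule has no pattern, so it runs on every line; the second runs only where the third field exceeds 100.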

Beyond one-liners, AWK is a real language. It has associative arrays, which let you tally and group with a single rule such as { count[$1]++ }, along with string functions, formatted output via printf, and user-defined functions. That power in a tiny package is why AWK has been used not just for throwaway commands but for substantial report generators and data transformations. Aho, Kernighan, and Weinberger documented the language in “The AWK Programming Language” (1988), the book that remains its definitive description.
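The features listed here combine naturally in a small report. The following is a sketch under stated assumptions: the "user bytes" input is invented, and the function name pct and the array names are illustrative choices, not anything prescribed by the language.

```shell
# Hypothetical input: one "user bytes" pair per line.
report=$(printf 'ann 100\nbob 50\nann 25\nbob 75\nann 5\n' | awk '
    # User-defined function: share of part in whole, as a percentage.
    function pct(part, whole) { return whole ? 100 * part / whole : 0 }

    # Associative arrays keyed by user: tally lines and sum bytes.
    { count[$1]++; bytes[$1] += $2; total += $2 }

    END {
        # The for-in traversal order is unspecified, so the output is sorted afterwards.
        for (user in count)
            printf "%s %d %d %.1f%%\n", user, count[user], bytes[user], pct(bytes[user], total)
    }
' | sort)
printf '%s\n' "$report"
```

Each output line holds the user, the number of lines seen, the byte total, and that user's share of all bytes. Piping through sort is a design choice to make the order predictable, since AWK does not guarantee one for array traversal.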

AWK is a model citizen of the Unix philosophy. It reads standard input and writes standard output, so it drops into pipelines alongside grep, sed, and sort, each tool doing one thing and passing plain text to the next. Its influence runs deep: later structured-text processors follow the same lineage of small, composable, pattern-driven filters. jq, for example, presents itself as being like sed for JSON data.
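A pipeline of this kind might look like the sketch below. The log lines are hypothetical, and the variable names log and top exist only for the example.

```shell
# Hypothetical log lines: a level, then a service name.
log='ERROR db
INFO web
ERROR web
ERROR db
INFO db'

# grep selects the error lines, awk counts them per service,
# and sort orders the counts from highest to lowest.
top=$(printf '%s\n' "$log" | grep '^ERROR' |
    awk '{ n[$2]++ } END { for (s in n) print n[s], s }' |
    sort -rn)
printf '%s\n' "$top"
```

The grep stage could be folded into AWK as a /^ERROR/ pattern; it is kept separate here only to show each tool handling one step and passing plain text to the next.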

The most complete free reference today is the GNU Awk User’s Guide for gawk, the GNU implementation shipped with many Linux distributions. It documents both the original language and the GNU extensions layered on top.
