SGML (Standard Generalized Markup Language)

SGML, the Standard Generalized Markup Language, is a meta-language: rather than being a single fixed markup vocabulary, it is a system for defining markup languages. A document designer uses SGML to declare which elements are allowed, how they may nest, and what attributes they carry; authors then mark up content using those declared elements. The markup describes what each part of a document is rather than how it should look, so the same source can be processed, formatted, and reused in many ways. This separation of descriptive markup from presentation is the central idea SGML contributed to computing.
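As a rough illustration of that separation, the sketch below marks up a small memo descriptively and then produces two renderings from the same source. It uses XML syntax and Python's xml.etree.ElementTree purely for convenience (neither is SGML proper), and the element names memo, title, and para, along with both rendering functions, are hypothetical.

```python
import xml.etree.ElementTree as ET

# Descriptive markup: the tags say what each part is, not how it looks.
# The vocabulary (memo, title, para) is invented for this example.
SOURCE = """<memo>
<title>Server maintenance</title>
<para>The build server will be offline on Friday.</para>
<para>Please commit your work before then.</para>
</memo>"""


def render_plain(root):
    """Render the memo as plain text with an underlined title."""
    title = root.findtext("title")
    lines = [title, "=" * len(title)]
    lines += [para.text for para in root.findall("para")]
    return "\n".join(lines)


def render_html(root):
    """Render the same memo as an HTML fragment."""
    parts = [f"<h1>{root.findtext('title')}</h1>"]
    parts += [f"<p>{para.text}</p>" for para in root.findall("para")]
    return "\n".join(parts)


root = ET.fromstring(SOURCE)
plain = render_plain(root)
html = render_html(root)
print(plain)
print(html)
```

Neither renderer changes the source: presentation decisions live entirely in the processing code, which is the property the paragraph above describes.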

SGML was standardized as ISO 8879, “Information processing — Text and office systems — Standard Generalized Markup Language (SGML),” published by the International Organization for Standardization in October 1986. The official ISO catalogue record lists it under technical committee ISO/IEC JTC 1/SC 34 and notes that the standard was later reviewed and confirmed as current. Because the full ISO text is sold rather than published openly, the W3C and other bodies have maintained introductory materials of their own; the W3C’s SGML overview page states plainly that SGML is “an enabling technology used in applications such as HTML” and that “the HTML specifications assume a working knowledge of SGML.”

The language grew out of work by Charles Goldfarb and colleagues at IBM in the late 1960s and 1970s on GML (Generalized Markup Language), a system for separating document structure from formatting in large publishing workflows. That work fed into the formal standardization process that produced ISO 8879. The result was a general framework rather than a product: SGML defines the notion of a document type and the syntax of markup, but leaves the specific vocabulary to each application.

The mechanism SGML uses to declare a document’s allowed structure is the Document Type Definition, or DTD. A DTD lists the element types, their permitted content, and their attributes, which allows a parser to validate that a given document conforms to its declared type. This idea of a separate, machine-checkable grammar for documents was one of SGML’s most influential legacies, passing directly into both HTML and XML.
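To make the idea concrete, here is a minimal sketch of DTD-style validation under a drastically simplified model: each element type lists the child elements it may contain, whether it may hold text, and which attributes it allows. Real SGML DTDs express richer content models (ordering, occurrence indicators, tag omission), and the documents here are parsed as XML with Python's standard library only for convenience. The names MEMO_DTD and validate, and the memo vocabulary itself, are hypothetical.

```python
import xml.etree.ElementTree as ET

# A toy stand-in for a DTD. In SGML syntax the declarations would look
# roughly like this:
#   <!ELEMENT memo  - - (title, para+)>
#   <!ELEMENT title - - (#PCDATA)>
#   <!ELEMENT para  - - (#PCDATA)>
#   <!ATTLIST memo  status (draft|final) draft>
# This sketch checks only which children and attributes are allowed;
# it ignores ordering, occurrence counts, and attribute values.
MEMO_DTD = {
    "memo": {"children": {"title", "para"}, "text": False, "attrs": {"status"}},
    "title": {"children": set(), "text": True, "attrs": set()},
    "para": {"children": set(), "text": True, "attrs": set()},
}


def validate(element, dtd):
    """Return a list of error messages; an empty list means the tree conforms."""
    decl = dtd.get(element.tag)
    if decl is None:
        # Without a declaration, the element's content cannot be checked.
        return [f"undeclared element <{element.tag}>"]
    errors = []
    for name in element.attrib:
        if name not in decl["attrs"]:
            errors.append(f"<{element.tag}> has undeclared attribute '{name}'")
    if not decl["text"]:
        # Text can sit directly in the element or after any child (its tail).
        chunks = [element.text] + [child.tail for child in element]
        if any((chunk or "").strip() for chunk in chunks):
            errors.append(f"<{element.tag}> may not contain text")
    for child in element:
        if child.tag not in decl["children"]:
            errors.append(f"<{child.tag}> is not allowed inside <{element.tag}>")
        errors.extend(validate(child, dtd))
    return errors


good = ET.fromstring('<memo status="draft"><title>Hi</title><para>Text.</para></memo>')
bad = ET.fromstring('<memo color="red"><title>Hi<para>Nested</para></title></memo>')

good_errors = validate(good, MEMO_DTD)
bad_errors = validate(bad, MEMO_DTD)
print(good_errors)
for message in bad_errors:
    print(message)
```

Keeping the grammar in a data structure separate from the checking code mirrors the DTD's role: the same validator can check any vocabulary once its declarations are supplied.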

SGML’s lasting importance is that it is the common ancestor of the web’s core document formats. HTML was originally defined as an SGML application, and XML was deliberately designed as a simplified subset of SGML so that generic structured documents could be served and processed on the web. The W3C XML Recommendation describes XML in exactly these terms, as “a subset of SGML.” Large technical documentation vocabularies such as DocBook were also defined as SGML applications before migrating to XML. Through these descendants, SGML’s core principle, that structure should be marked up descriptively and validated against a declared grammar, remains foundational to how documents are represented and exchanged.