Berkeley Sockets

The sockets API is the interface through which programs use the network, and it originated in the Berkeley Software Distribution of Unix. When DARPA funded the Computer Systems Research Group at the University of California, Berkeley, to integrate the new TCP/IP protocols into Unix, the group needed a programming model that fit Unix’s existing world of file descriptors. The result, introduced in 4.2BSD in 1983, was the socket: an endpoint of communication represented by a descriptor that programs could read from and write to much like a file, but which could carry data across a network. The model is documented in Berkeley’s own “An Introductory 4.4BSD Interprocess Communication Tutorial.”

The API is built from a small set of system calls. A program calls socket() to create a communication endpoint, specifying a protocol family, a type such as stream or datagram, and optionally a particular protocol within that family. For a stream socket, a server binds the socket to a local address and port with bind(), calls listen() to mark it as accepting incoming connections, and then calls accept() to take each new connection as it arrives; each call to accept() yields a fresh descriptor for that conversation, while the original socket keeps listening. A client instead calls connect() to reach out to a remote address. Once connected, both sides exchange data with ordinary read() and write(), or with the socket-specific send() and recv() calls. The Berkeley tutorial describes how programs “create individual sockets, give them names and send messages between them.”
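The call sequence described above can be sketched with Python’s socket module, which exposes the same calls under the same names. This is a minimal illustration rather than a production server: the helper names echo_once and serve_one_client are hypothetical, the loopback address and the OS-chosen port (port 0) are assumptions made so the example is self-contained, and the server runs in a thread only so that both sides fit in one program.

```python
import socket
import threading


def serve_one_client(listener):
    """Accept a single connection and echo back everything the client sends."""
    # accept(): blocks until a client connects, then returns a NEW socket
    # for that conversation; the listening socket itself is unchanged.
    conn, _peer_address = listener.accept()
    with conn:
        chunks = []
        while True:
            data = conn.recv(1024)      # recv(): read up to 1024 bytes
            if not data:                # empty result: client finished sending
                break
            chunks.append(data)
        conn.sendall(b"".join(chunks))  # send the collected bytes back


def echo_once(message):
    """Run one client/server exchange over TCP on the loopback interface."""
    # socket(): AF_INET + SOCK_STREAM selects a TCP stream over IPv4.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))   # bind(): port 0 lets the OS pick a free port
        listener.listen(1)                # listen(): start accepting connections
        port = listener.getsockname()[1]  # discover which port was assigned

        server = threading.Thread(target=serve_one_client, args=(listener,))
        server.start()

        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            client.connect(("127.0.0.1", port))  # connect(): reach the server
            client.sendall(message)
            client.shutdown(socket.SHUT_WR)      # signal "no more data" to the server
            reply = b""
            while True:
                data = client.recv(1024)
                if not data:                     # server closed its side
                    break
                reply += data

        server.join()
    return reply


if __name__ == "__main__":
    print(echo_once(b"hello, sockets"))
```

The server side follows the socket(), bind(), listen(), accept() order and the client side uses socket() and connect(), after which both exchange bytes with send and receive calls on their respective descriptors.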

This design was quietly revolutionary. By making a network connection look like a file descriptor, it let the enormous body of existing Unix programming knowledge apply directly to networking. The select() call, which arrived in 4.2BSD alongside sockets, could multiplex network connections together with terminal and pipe input through a single interface, and the fork-per-client pattern familiar from Unix process handling carried over naturally to network servers. The abstraction also separated the protocol from the API: the same socket() and connect() calls worked for TCP streams, UDP datagrams, and local interprocess communication, with the protocol chosen by arguments rather than by a different interface.
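The uniformity of descriptors can be illustrated with a short sketch in which one select() call waits on both a pipe and a socket, and the same read call drains each. It assumes a Unix-like system, since select() on Windows accepts only sockets; the function name read_all_sources and the message contents are hypothetical, and socketpair() is used here simply as a convenient way to obtain two connected local sockets.

```python
import os
import select
import socket


def read_all_sources():
    """Wait on a pipe and a socket with one select() call and read from both."""
    pipe_read, pipe_write = os.pipe()        # an ordinary Unix pipe
    sock_a, sock_b = socket.socketpair()     # two connected local sockets

    # Map each raw descriptor number to a label; select() sees only descriptors.
    labels = {pipe_read: "pipe", sock_a.fileno(): "socket"}
    received = {}
    try:
        os.write(pipe_write, b"from pipe")
        sock_b.sendall(b"from socket")

        pending = set(labels)
        while pending:
            # select(): block until at least one descriptor is readable
            # (or give up after a one-second timeout).
            readable, _, _ = select.select(list(pending), [], [], 1.0)
            if not readable:
                break
            for fd in readable:
                # The same read call works for the pipe and for the socket.
                received[labels[fd]] = os.read(fd, 1024)
                pending.discard(fd)
    finally:
        os.close(pipe_read)
        os.close(pipe_write)
        sock_a.close()
        sock_b.close()
    return received


if __name__ == "__main__":
    print(read_all_sources())
```

Nothing in the loop distinguishes the socket from the pipe: both are just descriptors that become readable, which is the property the Berkeley design relied on.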

Because 4.2BSD and its successors were widely distributed and freely modifiable, the sockets API spread far beyond Berkeley. Commercial Unix vendors adopted it, and it became the de facto standard for writing networked programs. When formal standardization came, the interface was incorporated into POSIX, the IEEE standard for portable operating system interfaces, which specified the socket functions so that conforming systems would offer the same calls. Microsoft’s Windows Sockets, or Winsock, was a closely related adaptation that brought the same model to the PC world.

More than four decades after its introduction, the Berkeley sockets API remains the foundation on which nearly all network software is built. Web servers, browsers, databases, and messaging systems ultimately reach the network through socket(), bind(), listen(), accept(), and connect(), often through higher-level libraries that wrap these calls. Few programming interfaces have proven as durable or as universal, and the longevity of the sockets API is a testament to how well the original Berkeley design matched both the Unix philosophy and the needs of network programming.