The Computational Graph

A computational graph represents a calculation as a directed graph: the nodes are operations, and the edges are the tensors that flow from one operation into the next. Instead of executing arithmetic immediately, a framework can record what would be computed - first this multiply, then that addition, then a nonlinearity - as a dataflow structure. The TensorFlow documentation describes graphs as “data structures that contain a set of tf.Operation objects, which represent units of computation; and tf.Tensor objects, which represent the units of data that flow between operations.” This is the central abstraction of modern ML frameworks.

Capturing the computation as a graph unlocks several things at once. Because the graph records the exact sequence of operations and how their outputs feed later inputs, it can be traversed backward to compute derivatives - this is how automatic differentiation works. PyTorch’s autograd builds such a graph during the forward pass and notes that “by tracing this graph from roots to leaves, you can automatically compute the gradients using the chain rule.” The same structure also enables whole-program optimizations like folding constants, fusing operations, and scheduling work across multiple devices.

Frameworks split along an important axis: when the graph is built. In the static, “define-and-run” style, you first construct the entire graph as a separate description and then feed data through it. This was the original TensorFlow 1.x model, and it favors optimization and portability - the TensorFlow guide highlights that a built graph can run “on mobile devices, embedded systems, and servers without Python.” The cost is that the graph is fixed before any data flows, which makes debugging and data-dependent control flow awkward.

In the dynamic, “define-by-run” style, the graph is built on the fly as ordinary code executes, one operation at a time. This is PyTorch’s eager mode and TensorFlow’s eager execution, where, as the TensorFlow documentation puts it, operations are run “operation-by-operation, with results returned immediately to Python.” Dynamic graphs make models feel like normal programs - you can use Python loops and conditionals and inspect intermediate values with a debugger - which is why they came to dominate research. The trade-off is fewer ahead-of-time optimization opportunities.

The two styles have since converged. Frameworks now let you write dynamic, define-by-run code for development and then capture a graph for deployment - TensorFlow with the tf.function decorator, PyTorch with tracing and torch.compile. The computational graph remains the unifying idea underneath: a portable, optimizable, differentiable record of a computation, separate from the act of running it.

Sources

Related