Symbolic regression

Symbolic regression is the task of finding a mathematical formula that explains a set of data. Ordinary regression fits the parameters of a model whose form is fixed in advance, such as the slope and intercept of a straight line. Symbolic regression instead searches over the structure of the formula itself: which operations (addition, multiplication, sine, exponentials and so on) to use and which variables to combine. Its output is a human-readable equation.
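A minimal sketch makes the contrast with ordinary regression concrete. The Python below, written only for illustration, enumerates every formula with up to four leaves in a hypothetical toy grammar (the variable x, the constant 1, and the operators +, -, *, / with division by zero mapped to NaN), scores each candidate by mean squared error against the data, and keeps the best. The grammar and the function names `expressions`, `evaluate`, `mse` and `search` are assumptions made here; real systems also fit numeric constants, use richer operator sets, and penalize complexity.

```python
import itertools
import math

# Hypothetical toy grammar: leaves are the variable x and the constant 1;
# internal nodes apply one of four binary operators.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    # Protected division: a zero denominator yields NaN, scored as infinitely bad.
    "/": lambda a, b: a / b if b != 0 else float("nan"),
}
LEAVES = ["x", "1"]


def expressions(size):
    """Return every expression tree with exactly `size` leaves, as nested tuples."""
    if size == 1:
        return list(LEAVES)
    trees = []
    for left_size in range(1, size):
        pairs = itertools.product(expressions(left_size), expressions(size - left_size))
        for left, right in pairs:
            for op in OPS:
                trees.append((op, left, right))
    return trees


def evaluate(expr, x):
    """Evaluate an expression tree at a single input value x."""
    if expr == "x":
        return x
    if expr == "1":
        return 1.0
    op, left, right = expr
    return OPS[op](evaluate(left, x), evaluate(right, x))


def mse(expr, xs, ys):
    """Mean squared error of a candidate formula; NaN or inf counts as infinitely bad."""
    total = 0.0
    for x, y in zip(xs, ys):
        pred = evaluate(expr, x)
        if math.isnan(pred) or math.isinf(pred):
            return math.inf
        total += (pred - y) ** 2
    return total / len(xs)


def to_string(expr):
    """Render a tree as a fully parenthesized infix string."""
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    return f"({to_string(left)} {op} {to_string(right)})"


def search(xs, ys, max_size=4):
    """Exhaustively score every formula up to `max_size` leaves and keep the best."""
    best, best_err = None, math.inf
    for size in range(1, max_size + 1):
        for expr in expressions(size):
            err = mse(expr, xs, ys)
            if err < best_err:  # strict: the first (simplest) exact fit wins ties
                best, best_err = expr, err
    return best, best_err


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0 / x**2 for x in xs]  # samples of an inverse-square law
best, err = search(xs, ys)
print(to_string(best), err)  # (1 / (x * x)) 0.0
print([len(expressions(n)) for n in range(1, 5)])  # [2, 16, 256, 5120]
```

Where ordinary regression would fix the form y = a*x + b and solve only for a and b, this search returns the structure `(1 / (x * x))` itself. The second line of output (2, 16, 256 and 5120 formulas with one to four leaves) shows how quickly even this tiny grammar grows.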

The appeal is interpretability. A neural network can predict accurately while remaining an opaque box of millions of numbers; a symbolic expression such as an inverse-square law can be read, checked against theory, and reasoned about. The difficulty is that the number of possible formulas grows combinatorially with their size, so exhaustive search quickly becomes infeasible. Modern methods tame this search with genetic programming, neural networks, or physics-inspired tricks such as exploiting symmetry and dimensional analysis, as in the AI Feynman work, which recovered from sampled data all 100 equations in a benchmark drawn from The Feynman Lectures on Physics.
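Genetic programming, the first of those methods, treats candidate formulas as a population of expression trees that evolves under selection. The Python below is a deliberately small illustration, not any particular library's algorithm: the grammar (x, the constants 1 and 2, and the operators +, -, *), the population size, the 0.7 crossover rate, the depth limit of 5, and all function names are assumptions chosen for brevity. Each generation copies the best tree forward unchanged (elitism) and fills the rest of the population with children made by subtree crossover or subtree mutation of parents chosen by tournament selection.

```python
import random

# Toy grammar chosen for illustration (an assumption, not any specific system's):
# leaves are the variable x or the constants 1.0 and 2.0; nodes are +, -, *.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}
LEAVES = ["x", 1.0, 2.0]


def random_tree(rng, limit):
    """Grow a random expression tree at most `limit` levels deep."""
    if limit == 0 or rng.random() < 0.3:
        return rng.choice(LEAVES)
    return (rng.choice(list(OPS)), random_tree(rng, limit - 1), random_tree(rng, limit - 1))


def evaluate(tree, x):
    if tree == "x":
        return x
    if isinstance(tree, float):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))


def depth(tree):
    return 1 + max(depth(tree[1]), depth(tree[2])) if isinstance(tree, tuple) else 0


def error(tree, xs, ys):
    """Mean squared error: the fitness to minimize."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def subtrees(tree, path=()):
    """Yield (path, subtree) pairs; a path lists child indices (1 = left, 2 = right)."""
    yield path, tree
    if isinstance(tree, tuple):
        yield from subtrees(tree[1], path + (1,))
        yield from subtrees(tree[2], path + (2,))


def replace(tree, path, new):
    """Return a copy of `tree` with the subtree at `path` swapped for `new`."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(parts)


def mutate(rng, tree):
    """Subtree mutation: replace a random subtree with a fresh random one."""
    path, _ = rng.choice(list(subtrees(tree)))
    return replace(tree, path, random_tree(rng, 2))


def crossover(rng, mother, father):
    """Subtree crossover: graft a random subtree of `father` into `mother`."""
    path, _ = rng.choice(list(subtrees(mother)))
    _, donor = rng.choice(list(subtrees(father)))
    return replace(mother, path, donor)


def tournament(rng, pop, scores, k=3):
    """Pick k random individuals and return the fittest of them."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[min(contenders, key=lambda i: scores[i])]


def evolve(xs, ys, pop_size=200, generations=30, max_depth=5, seed=0):
    rng = random.Random(seed)
    pop = [random_tree(rng, 3) for _ in range(pop_size)]
    history = []  # best error in each generation
    for _ in range(generations):
        scores = [error(t, xs, ys) for t in pop]
        best_i = min(range(pop_size), key=lambda i: scores[i])
        history.append(scores[best_i])
        next_pop = [pop[best_i]]  # elitism: the current best always survives
        while len(next_pop) < pop_size:
            parent = tournament(rng, pop, scores)
            if rng.random() < 0.7:
                child = crossover(rng, parent, tournament(rng, pop, scores))
            else:
                child = mutate(rng, parent)
            # Bloat control: a child deeper than max_depth is replaced by its parent.
            next_pop.append(child if depth(child) <= max_depth else parent)
        pop = next_pop
    scores = [error(t, xs, ys) for t in pop]
    best_i = min(range(pop_size), key=lambda i: scores[i])
    return pop[best_i], scores[best_i], history


def to_string(tree):
    if not isinstance(tree, tuple):
        return str(tree)
    op, left, right = tree
    return f"({to_string(left)} {op} {to_string(right)})"


xs = [i / 2 for i in range(-4, 5)]  # -2.0, -1.5, ..., 2.0
ys = [x * x + x for x in xs]        # hidden law: y = x^2 + x
best, err, history = evolve(xs, ys)
print(to_string(best), err)
```

Because the elite tree is copied forward unchanged, the best error recorded in `history` never increases from one generation to the next. Whether a given run lands exactly on x^2 + x depends on the seed and the search budget, which is one reason practical tools add constant fitting, complexity penalties, and the structure-exploiting tricks described above.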

For a general reader, symbolic regression captures a distinctive ambition within AI for science: not just to predict, but to discover the underlying law. It sits at the intersection of machine learning and the older symbolic tradition in AI, and it is one of the more direct attempts to have machines help formulate scientific theories rather than only crunch numbers.
