statsmodels

statsmodels is the Python library devoted to statistical modeling and econometrics rather than predictive machine learning. Its documentation describes it as “a Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests, and statistical data exploration.” Where scikit-learn optimizes for prediction, statsmodels optimizes for inference: estimating model parameters, reporting standard errors and confidence intervals, and running the hypothesis tests that practitioners in economics, finance, and the social sciences rely on.

The library was introduced to the wider community in Seabold and Perktold’s paper “Statsmodels: Econometric and Statistical Modeling with Python,” presented at the 9th Python in Science Conference (SciPy 2010). The paper positioned statsmodels as filling a gap in the scientific Python ecosystem, which at the time had strong numerical and plotting tools but lacked a comprehensive statistics package comparable to those available in R or commercial statistical software.

A defining ergonomic feature is its formula API. As the documentation notes, “statsmodels supports specifying models using R-style formulas and pandas DataFrames,” exposed through the statsmodels.formula.api module. A model such as ordinary least squares can be written as a string like “y ~ x1 + x2” against a DataFrame, with the library handling the construction of design matrices, categorical encoding, and interaction terms. The convention is borrowed directly from R, which eases migration for R users and keeps model specification concise and readable.
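As a minimal sketch of the formula interface described above, the following fits an ordinary least squares model from a formula string. The data are synthetic, and the column names (y, x1, x2, group) and the true coefficients are assumptions chosen purely for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data; all column names and coefficients are illustrative assumptions.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
    "group": rng.choice(["a", "b", "c"], size=n),
})
df["y"] = 1.0 + 2.0 * df["x1"] - 0.5 * df["x2"] + rng.normal(scale=0.1, size=n)

# The formula adds an intercept automatically, C(group) requests
# categorical (dummy) encoding, and x1:x2 adds an interaction term.
model = smf.ols("y ~ x1 + x2 + C(group) + x1:x2", data=df)
results = model.fit()

# Fitted coefficients, indexed by the term names derived from the formula.
print(results.params)
```

The design matrix is never built by hand here: the intercept column, the dummy columns for group, and the product column for the interaction all come from the formula string.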

In software terms, statsmodels sits squarely on top of the scientific Python stack. It consumes NumPy arrays and pandas dataframes, returns rich result objects whose summary tables resemble the output of statistical packages, and covers linear and generalized linear models, robust regression, time-series analysis (including ARIMA-family models), and a broad battery of statistical tests. The result objects expose fitted parameters, diagnostics, and prediction methods, making them usable both interactively and inside larger programs.

statsmodels complements rather than competes with scikit-learn. A common workflow uses statsmodels to understand and explain relationships in data through interpretable, well-instrumented statistical models, while turning to scikit-learn or gradient-boosting libraries when the goal shifts to raw predictive accuracy. Together they give Python a two-sided toolkit covering both classical statistics and modern machine learning.