NumPy is, in its own words, “the fundamental package for scientific computing in Python.” Its core contribution is a single data structure, the ndarray, an N-dimensional array of homogeneous, fixed-size elements that is typically stored in one contiguous block of memory (views and slices may instead step through a shared buffer with strides). Around that object NumPy provides a large library of fast routines for mathematical, logical, shape-manipulation, sorting, linear-algebra, and statistical operations, largely implemented in compiled C code so that array operations run at near-C speeds rather than at the speed of interpreted Python loops.
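To make the description of the ndarray concrete, the following is a minimal sketch of creating an array, inspecting its metadata, and calling a few of the routine families mentioned above. The array values and the variable names (`a`, `row_sorted`, `col_means`, `gram`) are illustrative assumptions, not taken from the NumPy documentation.

```python
import numpy as np

# A 2-D ndarray: every element shares one dtype and one fixed byte size.
a = np.array([[3.0, 1.0, 2.0],
              [6.0, 5.0, 4.0]])

print(a.ndim)                    # number of dimensions
print(a.shape)                   # length of each dimension
print(a.dtype, a.itemsize)       # element type and bytes per element
print(a.flags["C_CONTIGUOUS"])   # whether the buffer is one row-major block

# A few of the routine families: sorting, statistics, linear algebra.
row_sorted = np.sort(a, axis=1)  # sort each row independently
col_means = a.mean(axis=0)       # mean of each column
gram = a @ a.T                   # matrix product of a with its transpose

print(row_sorted)
print(col_means)
print(gram)
```

Each call operates on the whole array at once; no Python-level loop over elements appears in the code.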
The project exists because the Python numerical community had fractured. Two competing array packages, the older Numeric and the newer Numarray, each had strengths and a separate user base, which split libraries and forced authors to pick a side. NumPy, led by Travis Oliphant and released as version 1.0 in 2006, merged the best features of both into one package so the ecosystem could standardize on a single array type. That unification is a large part of why downstream scientific and machine-learning libraries in Python can exchange array data with little or no conversion.
Two ideas define the NumPy programming model. The first is vectorization: the documentation describes it as “the absence of any explicit looping, indexing, etc., in the code,” so that a whole-array expression like c = a * b replaces a hand-written Python loop and executes in optimized, precompiled C code behind the scenes. The result is code that is more concise, closer to mathematical notation, and far faster. The second is broadcasting, the set of rules that lets operations combine arrays of different shapes (or arrays with scalars) by implicitly stretching the smaller operand across the larger one, without making needless copies of data.
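The two ideas can be illustrated with a short sketch. It compares a hand-written loop with the vectorized expression c = a * b, then shows broadcasting against a scalar and between arrays of different shapes. The sample values and the names `c_loop`, `scaled`, `col`, `table`, and `stretched` are assumptions chosen for illustration.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

# Explicit Python loop: one interpreted multiplication per element.
c_loop = np.empty_like(a)
for i in range(len(a)):
    c_loop[i] = a[i] * b[i]

# Vectorized form: the same computation as one whole-array expression.
c = a * b
assert np.array_equal(c, c_loop)

# Broadcasting with a scalar: 2.0 is treated as if it had a's shape.
scaled = a * 2.0

# Broadcasting two arrays: shape (3, 1) combined with shape (3,)
# yields shape (3, 3), with table[i, j] == a[i] + b[j].
col = a.reshape(3, 1)
table = col + b
print(table.shape)

# np.broadcast_to exposes the "stretching" as a read-only view:
# the repeated axis has stride 0, so b's data is reused, not copied.
stretched = np.broadcast_to(b, (3, 3))
print(stretched.strides)
print(np.shares_memory(stretched, b))
```

The zero stride on the repeated axis is what lets the smaller operand appear larger without any data being duplicated.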
The ndarray differs from a Python list in ways that make it fast: it is fixed in size at creation, every element shares one data type, and operations dispatch to compiled kernels instead of per-element Python calls. These constraints are what allow NumPy to treat a million-element array as a single object rather than a million boxed Python integers.
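A brief sketch of these constraints, assuming a million-element integer array as in the text. The variable names are hypothetical, and the dtype is set explicitly to `np.int64` because the default integer type can vary by platform and NumPy version.

```python
import numpy as np

n = 1_000_000

# A Python list holds references to n separately allocated int objects.
values = list(range(n))

# An ndarray holds n raw 8-byte integers in a single buffer.
arr = np.arange(n, dtype=np.int64)
print(arr.itemsize)   # bytes per element
print(arr.nbytes)     # total buffer size: n * itemsize

# One data type for every element: mixed input is coerced on creation.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)    # the integers were converted to floating point

# Fixed size: "appending" allocates a new array and leaves the
# original untouched.
small = np.array([1, 2, 3], dtype=np.int64)
grown = np.append(small, 4)
print(small.shape, grown.shape)

# One call into a compiled kernel covers the whole array.
total = arr.sum()
assert total == sum(values)
```

The list and the array hold the same numbers, but only the array exposes them as one typed buffer that a compiled routine can traverse directly.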
NumPy’s role as shared infrastructure was formalized in 2020 with the review paper “Array programming with NumPy” by Charles R. Harris, K. Jarrod Millman, Stéfan J. van der Walt and colleagues, published in Nature (volume 585, pages 357 to 362). The paper, which the project lists as its official citation, documents how NumPy’s array became the common substrate beneath SciPy, pandas, scikit-learn, and the deep-learning frameworks, and how its API shaped a generation of array libraries that followed.