Python NumPy: Array Operations and Scientific Computing

NumPy is the foundation of Python's scientific computing stack — nearly every data science and ML library (pandas, scikit-learn, PyTorch, TensorFlow) builds on NumPy arrays. Its power comes from vectorised operations implemented in C, which run 10–100x faster than equivalent Python loops, and from broadcasting rules that let you operate on arrays of different shapes without explicit looping. Mastering NumPy unlocks efficient data manipulation, numerical algorithms, and the mental model shared across the entire scientific Python ecosystem.

This guide covers array creation, indexing and slicing, broadcasting, universal functions, linear algebra, random number generation, performance optimisation with views versus copies, and practical scientific computing patterns used in machine learning preprocessing.

Array Creation
Indexing, Slicing and Fancy Indexing
Broadcasting Rules
Universal Functions and Aggregations
Linear Algebra
Random Number Generation
Performance: Views, Copies and Memory Layout

Array Creation

NumPy arrays are homogeneous, fixed-size, n-dimensional containers. Unlike Python lists, all elements share the same dtype, enabling SIMD-optimised C operations. Choose the creation function based on whether you have existing data, need a range, or want a filled array.

import numpy as np

# From Python lists
a = np.array([1, 2, 3, 4, 5])                     # 1-D, dtype=int64
b = np.array([[1, 2, 3], [4, 5, 6]], dtype=float)  # 2-D, explicit dtype

# Shape and metadata
print(a.shape, a.dtype, a.ndim, a.size)  # (5,) int64 1 5

# Range arrays
r1 = np.arange(0, 10, 2)          # [0, 2, 4, 6, 8]
r2 = np.linspace(0, 1, 5)         # [0.  , 0.25, 0.5 , 0.75, 1.  ]
r3 = np.logspace(0, 3, 4)         # [1., 10., 100., 1000.]

# Filled arrays
zeros = np.zeros((3, 4))           # 3×4 float64 zeros
ones = np.ones((2, 3), dtype=int)  # 2×3 int ones
eye = np.eye(4)                    # 4×4 identity matrix
full = np.full((2, 2), 7.5)       # [[7.5, 7.5], [7.5, 7.5]]
empty = np.empty((1000, 1000))     # Uninitialised — fastest, use with care

# From existing data shapes
like_zeros = np.zeros_like(b)      # Same shape and dtype as b, filled with 0
like_ones = np.ones_like(a, dtype=float)

# Reshaping
flat = np.arange(12)
matrix = flat.reshape(3, 4)       # 3×4 view (no copy if possible)
row_vec = matrix.reshape(1, -1)   # -1 = infer dimension
col_vec = matrix.reshape(-1, 1)
flat_again = matrix.ravel()       # Returns view when possible
flat_copy = matrix.flatten()      # Always returns copy

Indexing, Slicing and Fancy Indexing

NumPy slices return views — no data is copied. Modifying a slice modifies the original array, which is efficient for large datasets but a common source of subtle bugs. Fancy indexing (using integer arrays or boolean masks) always returns a copy.

import numpy as np

arr = np.arange(24).reshape(4, 6)

# Basic slicing — returns view
row0 = arr[0]           # First row
col2 = arr[:, 2]        # Third column (all rows)
submatrix = arr[1:3, 2:5]  # Rows 1-2, cols 2-4

# Negative indexing
last_row = arr[-1]
last_col = arr[:, -1]

# Step slicing
every_other = arr[::2, ::2]  # Every other row and column

# Boolean masking — always a copy
data = np.array([3, -1, 7, -5, 2, -8, 9])
positive = data[data > 0]              # [3, 7, 2, 9]
data[data < 0] = 0                     # Zero out negatives in-place
mask = (data > 2) & (data < 8)        # Combined condition
print(data[mask])                      # [3, 7]

# Fancy indexing — always a copy
indices = [0, 2, 4]
selected = data[indices]               # Elements at positions 0, 2, 4

# np.where — vectorised conditional
result = np.where(data > 5, data, 0)  # Keep values > 5, else 0
print(result)

# np.nonzero — indices of non-zero elements
idx = np.nonzero(data)
print(idx)

Views vs copies: Use arr.base is None to check if an array owns its data (copy) or is a view. Unexpected view mutation is a frequent NumPy bug — when in doubt, call .copy() explicitly.

Broadcasting Rules

Broadcasting lets NumPy perform element-wise operations on arrays of different shapes by virtually expanding the smaller array to match the larger one — without copying data. The rules: compare shapes from the right; dimensions are compatible if they are equal or one of them is 1; a missing dimension is treated as size 1.

import numpy as np

# Scalar broadcasting — always works
a = np.array([1, 2, 3, 4])
print(a * 3)          # [3, 6, 9, 12] — scalar broadcast to all elements
print(a + 10)         # [11, 12, 13, 14]

# 1-D + 2-D broadcasting
row = np.array([10, 20, 30])          # shape (3,)
matrix = np.ones((4, 3))              # shape (4, 3)
result = matrix + row                 # row broadcast across 4 rows → (4, 3)

# Column vector + row vector → full matrix
col = np.arange(4).reshape(4, 1)     # shape (4, 1)
row2 = np.arange(3)                  # shape (3,) → treated as (1, 3)
outer = col + row2                   # shape (4, 3) — outer sum
print(outer)
# [[0, 1, 2],
#  [1, 2, 3],
#  [2, 3, 4],
#  [3, 4, 5]]

# Practical: normalise each row of a matrix
data = np.random.randn(100, 5)        # 100 samples, 5 features
means = data.mean(axis=0)            # shape (5,) — per-feature mean
stds = data.std(axis=0)              # shape (5,)
normalised = (data - means) / stds   # broadcast across rows

# Practical: pairwise distances
X = np.random.randn(50, 3)           # 50 points in 3-D
# Efficient pairwise L2 distance via broadcasting
diff = X[:, np.newaxis, :] - X[np.newaxis, :, :]  # (50, 50, 3)
distances = np.sqrt((diff ** 2).sum(axis=2))        # (50, 50)

Universal Functions and Aggregations

Universal functions (ufuncs) are vectorised wrappers around C operations that apply element-wise to entire arrays without Python-level loops. Aggregations like sum, mean, and max accept an axis argument for row- or column-wise reductions.

import numpy as np

arr = np.array([[1.0, 4.0, 9.0], [16.0, 25.0, 36.0]])

# Math ufuncs
print(np.sqrt(arr))     # Element-wise square root
print(np.log(arr))      # Natural logarithm
print(np.exp(arr))      # e^x
print(np.abs(arr))      # Absolute value
print(np.sin(arr))      # Sine (radians)

# Aggregations — full array
print(arr.sum())        # 100.0
print(arr.mean())       # 16.67
print(arr.max())        # 36.0
print(arr.std())        # Standard deviation
print(arr.var())        # Variance

# Aggregations — along an axis
print(arr.sum(axis=0))  # Column sums: [17, 29, 45]
print(arr.sum(axis=1))  # Row sums:    [14, 77]
print(arr.max(axis=1))  # Row maxima:  [9, 36]

# Cumulative operations
print(np.cumsum(arr, axis=1))  # Running sum along rows

# Comparison and logical ufuncs
a = np.array([1, 2, 3, 4, 5])
b = np.array([5, 4, 3, 2, 1])
print(np.maximum(a, b))   # Element-wise max: [5, 4, 3, 4, 5]
print(np.minimum(a, b))   # Element-wise min: [1, 2, 3, 2, 1]
print(np.equal(a, b))     # [F, F, T, F, F]

# np.apply_along_axis for custom reductions
def normalise(v):
    return (v - v.min()) / (v.max() - v.min())

result = np.apply_along_axis(normalise, axis=1, arr=arr)
print(result)

Linear Algebra

NumPy's np.linalg module provides production-grade linear algebra built on LAPACK and BLAS. Key operations include matrix multiplication, solving linear systems, eigendecomposition, and singular value decomposition (SVD) — the backbone of PCA, recommender systems, and many ML algorithms.

import numpy as np

A = np.array([[2, 1], [5, 3]])
B = np.array([[1, 0], [0, 1]])

# Matrix multiplication (preferred over np.dot for 2-D+)
C = A @ B                    # Matrix multiply: [[2,1],[5,3]]
C2 = np.matmul(A, B)         # Equivalent
dot = np.dot(A, B)           # Dot product (also works for 2-D)

# Transpose
print(A.T)                   # [[2, 5], [1, 3]]

# Determinant and inverse
det_A = np.linalg.det(A)     # 1.0
A_inv = np.linalg.inv(A)     # Inverse of A
print(A @ A_inv)             # Should be identity matrix

# Solve linear system Ax = b
b_vec = np.array([4, 7])
x = np.linalg.solve(A, b_vec)  # More numerically stable than inv(A) @ b
print(x)  # [5. -6.]

# Eigendecomposition
eigenvalues, eigenvectors = np.linalg.eig(A)
print("Eigenvalues:", eigenvalues)
print("Eigenvectors:\n", eigenvectors)

# SVD — used in PCA, recommender systems
M = np.random.randn(100, 20)
U, S, Vt = np.linalg.svd(M, full_matrices=False)
print(U.shape, S.shape, Vt.shape)  # (100,20) (20,) (20,20)

# Low-rank approximation (keep top k singular values)
k = 5
M_approx = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]
print(f"Approximation error: {np.linalg.norm(M - M_approx):.3f}")

# Norms
print(np.linalg.norm(b_vec))         # L2 norm (Euclidean)
print(np.linalg.norm(b_vec, ord=1))  # L1 norm
print(np.linalg.norm(A, 'fro'))      # Frobenius norm of matrix

Random Number Generation

Use NumPy's new np.random.default_rng() API (introduced in 1.17) over the legacy np.random.* functions. The new Generator API is reproducible via seeds, supports independent parallel streams, and has better statistical properties.

import numpy as np

# New API — reproducible with seed
rng = np.random.default_rng(seed=42)

# Uniform distribution
uniform = rng.uniform(low=0.0, high=1.0, size=(3, 4))

# Normal (Gaussian) distribution
normal = rng.normal(loc=0.0, scale=1.0, size=1000)
print(f"Mean: {normal.mean():.3f}, Std: {normal.std():.3f}")

# Integer sampling
integers = rng.integers(low=0, high=100, size=10)

# Random choice (with or without replacement)
population = np.array(["A", "B", "C", "D", "E"])
sample_no_replace = rng.choice(population, size=3, replace=False)
sample_with_replace = rng.choice(population, size=10, replace=True)

# Shuffle in-place
data = np.arange(10)
rng.shuffle(data)
print(data)

# Permutation (returns shuffled copy)
perm = rng.permutation(10)

# Practical: train/test split
X = np.random.randn(1000, 5)
indices = rng.permutation(len(X))
split = int(0.8 * len(X))
train_idx, test_idx = indices[:split], indices[split:]
X_train, X_test = X[train_idx], X[test_idx]
print(f"Train: {X_train.shape}, Test: {X_test.shape}")

# Other distributions
binomial = rng.binomial(n=10, p=0.3, size=100)
poisson = rng.poisson(lam=5.0, size=100)
exponential = rng.exponential(scale=2.0, size=100)

Performance: Views, Copies and Memory Layout

NumPy performance hinges on avoiding unnecessary copies and aligning memory access patterns with array layout. Row-major (C-order) arrays — NumPy's default — are fastest when iterating row-by-row. Column-major (Fortran-order) is faster for column-wise access. Profile with %timeit in Jupyter or time.perf_counter in scripts.

import numpy as np
import time

# View vs copy — views share memory, copies don't
a = np.arange(1_000_000)
view = a[::2]            # View — no copy, instantaneous
copy = a[::2].copy()     # Copy — allocates new memory

print(view.base is a)    # True — view shares data with a
print(copy.base is None) # True — copy owns its data

# C-order vs Fortran-order access patterns
big = np.random.randn(5000, 5000)

# Fast: row-major access (C-order, default)
t0 = time.perf_counter()
row_sums = big.sum(axis=1)
print(f"Row sums: {time.perf_counter()-t0:.4f}s")

# Also fast with contiguous array
t0 = time.perf_counter()
col_sums = big.sum(axis=0)
print(f"Col sums: {time.perf_counter()-t0:.4f}s")

# Memory layout checks
print(big.flags['C_CONTIGUOUS'])   # True for C-order
print(big.T.flags['C_CONTIGUOUS']) # False for transposed view

# Make contiguous copy for performance
big_f = np.asfortranarray(big)     # Fortran-order
big_c = np.ascontiguousarray(big_f)  # Back to C-order

# Vectorised vs loop comparison
data = np.random.randn(1_000_000)

t0 = time.perf_counter()
result_loop = sum(x**2 for x in data)  # Python loop
loop_time = time.perf_counter() - t0

t0 = time.perf_counter()
result_np = (data**2).sum()             # NumPy vectorised
np_time = time.perf_counter() - t0

print(f"Loop: {loop_time:.3f}s | NumPy: {np_time:.4f}s | Speedup: {loop_time/np_time:.0f}x")

Key rule: Avoid Python-level loops over array elements — they forfeit all of NumPy's speed advantages. Reach for broadcasting, ufuncs, and aggregation functions first. Only fall back to np.apply_along_axis or Numba JIT when vectorisation is impossible.