Numba translates Python functions to optimized machine code at runtime using the industry-standard LLVM compiler library. Numba-compiled numerical algorithms in Python can approach the speeds of C or FORTRAN.
You don’t need to replace the Python interpreter, run a separate compilation step, or even have a C/C++ compiler installed. Just apply one of the Numba decorators to your Python function, and Numba does the rest.
from numba import jit
import random

@jit(nopython=True)
def monte_carlo_pi(nsamples):
    acc = 0
    for i in range(nsamples):
        x = random.random()
        y = random.random()
        if (x ** 2 + y ** 2) < 1.0:
            acc += 1
    return 4.0 * acc / nsamples
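You call the function like any other Python function; the first call with a given argument type triggers compilation, and later calls reuse the generated machine code. A minimal usage sketch (the sample count is arbitrary):

print(monte_carlo_pi(1_000_000))  # first call compiles, subsequent calls run the cached machine code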
Parallelize Your Algorithms
Numba offers a range of options for parallelizing your code for CPUs and GPUs, often with only minor code changes.
Simplified Threading
from numba import jit, prange

@jit(nopython=True, parallel=True)
def simulator(out):
    # iterate the loop in parallel across CPU threads;
    # run_sim is a user-supplied, Numba-compatible simulation step
    for i in prange(out.shape[0]):
        out[i] = run_sim()
Numba can automatically execute NumPy array expressions on multiple CPU cores and makes it easy to write parallel loops.
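As a sketch of the array-expression side (the normalize function and its whole-array body are illustrative, not taken from the Numba docs):

import numpy as np
from numba import jit

@jit(nopython=True, parallel=True)
def normalize(x):
    # Numba spreads this whole-array expression and its reductions
    # (mean, std) across the available CPU cores
    return (x - x.mean()) / x.std()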
SIMD Vectorization
LBB0_8:
vmovups (%rax,%rdx,4), %ymm0
vmovups (%rcx,%rdx,4), %ymm1
vsubps %ymm1, %ymm0, %ymm2
vaddps %ymm2, %ymm2, %ymm2
Numba can automatically translate some loops into vector instructions for 2-4x speed improvements. Numba adapts to your CPU's capabilities, whether it supports SSE, AVX, or AVX-512.
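For instance, an elementwise loop of roughly this shape (the function name and signature are illustrative) can compile down to packed vsubps/vaddps instructions like those shown above when given contiguous float32 arrays:

import numpy as np
from numba import jit

@jit(nopython=True)
def scaled_diff(a, b, out):
    # a simple elementwise loop is a candidate for LLVM's auto-vectorizer
    for i in range(out.shape[0]):
        out[i] = 2.0 * (a[i] - b[i])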
GPU Acceleration
With support for both NVIDIA’s CUDA and AMD’s ROCm drivers, Numba lets you write parallel GPU algorithms entirely from Python.
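A minimal CUDA kernel sketch using the numba.cuda interface (the double_elements kernel and its launch configuration are illustrative):

import numpy as np
from numba import cuda

@cuda.jit
def double_elements(x):
    # each CUDA thread handles one array element
    i = cuda.grid(1)
    if i < x.size:
        x[i] *= 2.0

arr = np.arange(1024, dtype=np.float32)
d_arr = cuda.to_device(arr)                      # copy input to the GPU
threads_per_block = 128
blocks = (arr.size + threads_per_block - 1) // threads_per_block
double_elements[blocks, threads_per_block](d_arr)
result = d_arr.copy_to_host()                    # copy the result back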