Numba

Numba translates Python functions to optimized machine code at runtime using the industry-standard LLVM compiler library. Numba-compiled numerical algorithms in Python can approach the speeds of C or FORTRAN.

You don’t need to replace the Python interpreter, run a separate compilation step, or even have a C/C++ compiler installed. Just apply one of the Numba decorators to your Python function, and Numba does the rest.

from numba import jit
import random

@jit(nopython=True)  # "nopython" mode gives the best performance
def monte_carlo_pi(nsamples):  # compiled to machine code on the first call
    acc = 0
    for i in range(nsamples):
        x = random.random()
        y = random.random()
        if (x ** 2 + y ** 2) < 1.0:
            acc += 1
    return 4.0 * acc / nsamples
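
The first call to the function triggers compilation; later calls reuse the cached machine code. A minimal sketch of observing this (the timing code and sample size are illustrative):

import time

t0 = time.perf_counter()
monte_carlo_pi(1_000_000)   # first call: compiles, then runs
print("first call:", time.perf_counter() - t0)

t0 = time.perf_counter()
monte_carlo_pi(1_000_000)   # later calls: run the compiled code directly
print("second call:", time.perf_counter() - t0)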

Parallelize Your Algorithms

Numba offers a range of options for parallelizing your code for CPUs and GPUs, often with only minor code changes.

Simplified Threading

from numba import jit, prange

@jit(nopython=True, parallel=True)
def simulator(out):
    # prange marks this loop for parallel execution across CPU threads
    for i in prange(out.shape[0]):
        out[i] = run_sim()   # run_sim: a user-supplied, Numba-compatible function

Numba can automatically execute NumPy array expressions on multiple CPU cores and makes it easy to write parallel loops.
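
As a sketch of the array-expression case (the function name and input data are illustrative; assumes a NumPy array as input):

from numba import jit
import numpy as np

@jit(nopython=True, parallel=True)
def normalize(x):
    # whole-array expressions like this can be split across CPU cores automatically
    return (x - x.mean()) / x.std()

normalize(np.random.rand(1_000_000))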


SIMD Vectorization

LBB0_8:
	vmovups	(%rax,%rdx,4), %ymm0
	vmovups	(%rcx,%rdx,4), %ymm1
	vsubps	%ymm1, %ymm0, %ymm2
	vaddps	%ymm2, %ymm2, %ymm2

Numba can automatically translate some loops into vector (SIMD) instructions, often yielding 2-4x speed improvements. It adapts to your CPU's capabilities, generating code for SSE, AVX, or AVX-512 as available.
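
A minimal sketch of inspecting the generated code (the saxpy name and array sizes are illustrative; the vector instructions you see depend on your hardware):

from numba import jit
import numpy as np

@jit(nopython=True)
def saxpy(a, x, y):
    # a simple elementwise loop that LLVM can turn into packed SIMD instructions
    out = np.empty_like(x)
    for i in range(x.shape[0]):
        out[i] = a * x[i] + y[i]
    return out

x = np.ones(1024, dtype=np.float32)
y = np.ones(1024, dtype=np.float32)
saxpy(np.float32(2.0), x, y)                  # compile for float32 inputs
print(list(saxpy.inspect_asm().values())[0])  # look for vmulps/vaddps on AVX hardware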


GPU Acceleration


With support for both NVIDIA’s CUDA and AMD’s ROCm drivers, Numba lets you write parallel GPU algorithms entirely from Python.
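
A minimal CUDA sketch (assumes an NVIDIA GPU with a working driver; the kernel name and launch sizes are illustrative):

from numba import cuda
import numpy as np

@cuda.jit
def add_one(arr):
    i = cuda.grid(1)             # absolute thread index within the launch grid
    if i < arr.shape[0]:
        arr[i] += 1.0

data = np.zeros(256, dtype=np.float32)
threads_per_block = 64
blocks = (data.shape[0] + threads_per_block - 1) // threads_per_block
add_one[blocks, threads_per_block](data)   # the NumPy array is copied to the device and back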
