Earthkit-hydro: Xarray and Array Backend Support + Interplay with GPUs

This notebook shows how to change the array backend of a river network, and illustrates the advantages of this by demonstrating automatic differentiation and speedups from running on GPUs.

import earthkit.hydro as ekh
import numpy as np
import cupy as cp
import torch
import xarray as xr

Switching array backends

By default, earthkit-hydro uses numpy, but it is designed to be array-backend agnostic. The backend of a river network object can be changed via the to_device method, as follows. This operation is in-place.
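To illustrate what backend agnosticism means in practice, here is a toy upstream accumulation (not earthkit-hydro's actual implementation) written against a module object xp, so that any numpy-compatible library (numpy, cupy, ...) could be plugged in:

```python
import numpy as np

def toy_upstream_sum(field, downstream, xp=np):
    # `downstream[i]` is the index of node i's downstream node, -1 at outlets.
    # Nodes are assumed topologically sorted (upstream before downstream).
    # Using only the xp namespace keeps the logic backend-agnostic: passing
    # xp=cupy together with cupy arrays would run the same code on a GPU.
    out = xp.asarray(field, dtype=xp.float64).copy()
    for i in range(len(downstream)):
        if downstream[i] >= 0:
            out[downstream[i]] += out[i]
    return out

# A 3-node chain 0 -> 1 -> 2: each node accumulates everything upstream of it.
print(toy_upstream_sum([1.0, 1.0, 1.0], [1, 2, -1]))  # [1. 2. 3.]
```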

numpy_network = ekh.river_network.load("efas", "5")
print(numpy_network.array_backend)
torch_network = ekh.river_network.load("efas", "5").to_device(array_backend="torch")
print(torch_network.array_backend)
Cache disabled.
numpy
Cache disabled.
torch

All operations are then computed in exactly the same manner. Inputs must match the river network's array backend, i.e. it is not possible to pass a numpy array to an operation on a torch-backed river network.

Many backends, such as jax, tensorflow and cupy, are supported; below we show numpy and torch as examples:

ekh.upstream.array.sum(numpy_network, np.ones(numpy_network.shape))
array([[ 1., 1., 1., ..., nan, nan, nan], [ 1., 2., 4., ..., nan, nan, nan], [ 1., 2., 1., ..., nan, nan, nan], ..., [nan, nan, nan, ..., 1., 2., 1.], [nan, nan, nan, ..., 1., 6., 8.], [nan, nan, nan, ..., 2., 1., 1.]], shape=(2970, 4530))
ekh.upstream.array.sum(torch_network, torch.ones(torch_network.shape))
tensor([[1., 1., 1., ..., nan, nan, nan], [1., 2., 4., ..., nan, nan, nan], [1., 2., 1., ..., nan, nan, nan], ..., [nan, nan, nan, ..., 1., 2., 1.], [nan, nan, nan, ..., 1., 6., 8.], [nan, nan, nan, ..., 2., 1., 1.]])

Differentiability

Operations in earthkit-hydro are differentiable, which means that array backends with autograd behaviour can propagate gradients through operations.

initial_tensor = torch.ones(torch_network.n_nodes, requires_grad=True)

output_tensor = ekh.upstream.array.sum(torch_network, initial_tensor, return_type="masked")

loss = output_tensor.sum()

loss.backward()

initial_tensor.grad
tensor([1., 4., 1., ..., 3., 3., 2.])
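The gradient above has a simple interpretation: since the loss sums every node's upstream sum, each input value contributes once for itself and once for every node downstream of it, so the gradient at node i equals the number of nodes at or below node i. A toy numpy-free check on a small hypothetical network (topology invented for illustration, not an earthkit-hydro call):

```python
# Toy network: nodes 0 and 1 drain into node 2, which drains into outlet 3.
downstream = [2, 2, 3, -1]

def count_at_or_below(downstream, i):
    # Walk the downstream pointers, counting node i and all its descendants.
    n, j = 1, downstream[i]
    while j >= 0:
        n += 1
        j = downstream[j]
    return n

grads = [count_at_or_below(downstream, i) for i in range(len(downstream))]
print(grads)  # [3, 3, 2, 1] -- the gradient of sum(upstream_sum(x)) w.r.t. x
```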

GPU

Many array backend libraries, such as torch, have the concept of a device. By default this is the CPU, but moving computation to a GPU can substantially speed up operations. The to_device method is also used to move a river network between devices.

print(torch_network.device)
torch_network = torch_network.to_device(device="cuda", array_backend="torch")
print(torch_network.device)
cpu
cuda
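Conceptually, device selection in numpy-compatible array libraries can be thought of as a dispatcher that hands back the right module for the target device. A hypothetical sketch of this pattern (not earthkit-hydro's internals), where cupy stands in for numpy on the GPU:

```python
def get_array_module(device="cpu"):
    # Hypothetical dispatcher: return a numpy-compatible module for a device.
    # cupy mirrors (most of) the numpy API, so downstream code is unchanged.
    if device == "cuda":
        import cupy
        return cupy
    import numpy
    return numpy

xp = get_array_module("cpu")  # swap to "cuda" when a GPU is available
print(float(xp.ones(4).sum()))  # 4.0
```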

Operations can then be conducted on the GPU.

ekh.upstream.array.sum(torch_network, torch.ones(torch_network.shape, device=torch_network.device), return_type="masked")
tensor([1., 1., 1., ..., 2., 1., 1.], device='cuda:0')

xarray

xarray does not support array backends such as torch or tensorflow, but it does support both numpy and cupy. It is therefore possible to leverage the GPU for xarray operations by using a cupy array backend.

example_arr_numpy = np.random.rand(10, *numpy_network.shape)
example_arr_cupy = cp.random.rand(10, *numpy_network.shape)

lat = numpy_network.coords['lat']
lon = numpy_network.coords['lon']
step = np.arange(10)

example_da_numpy = xr.DataArray(
    example_arr_numpy,
    dims = ["step", "lat", "lon"],
    coords = {"step": step, "lat": lat, "lon": lon},
    name = "precip",
    attrs={"units": "m", "description": "Sample precipitation data"}
)

example_da_cupy = xr.DataArray(
    example_arr_cupy,
    dims = ["step", "lat", "lon"],
    coords = {"step": step, "lat": lat, "lon": lon},
    name = "precip",
    attrs={"units": "m", "description": "Sample precipitation data"}
)
%timeit ekh.upstream.sum(numpy_network, example_da_numpy)
4.49 s ± 25.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
cupy_network = numpy_network.to_device(device="cuda")
%timeit ekh.upstream.sum(cupy_network, example_da_cupy)
171 ms ± 650 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)

As seen, moving to the GPU gives a large speedup, roughly 25× in this example.