Earthkit-hydro: Xarray and Array Backend Support + Interplay with GPUs

This notebook shows how to change the array backend of a river network, and illustrates the advantages of this by demonstrating automatic differentiation and speedups from running on GPUs.

import earthkit.hydro as ekh
import numpy as np
import cupy as cp
import torch
import xarray as xr

Switching array backends

By default, earthkit-hydro uses numpy, but it is designed to be array-backend agnostic. The backend of a river network object can be changed via the to_device method, as follows. This operation is in-place.
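To illustrate what backend agnosticism means in practice, here is a toy upstream accumulation (not earthkit-hydro's actual implementation) written against a module object xp, so that any numpy-compatible library (numpy, cupy, ...) could be plugged in:

```python
import numpy as np

def toy_upstream_sum(field, downstream, xp=np):
    # `downstream[i]` is the index of node i's downstream node, -1 at outlets.
    # Nodes are assumed topologically sorted (upstream before downstream).
    # Using only the xp namespace keeps the logic backend-agnostic: passing
    # xp=cupy together with cupy arrays would run the same code on a GPU.
    out = xp.asarray(field, dtype=xp.float64).copy()
    for i in range(len(downstream)):
        if downstream[i] >= 0:
            out[downstream[i]] += out[i]
    return out

# A 3-node chain 0 -> 1 -> 2: each node accumulates everything upstream of it.
print(toy_upstream_sum([1.0, 1.0, 1.0], [1, 2, -1]))  # [1. 2. 3.]
```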

numpy_network = ekh.river_network.load("efas", "5")
print(numpy_network.array_backend)
torch_network = ekh.river_network.load("efas", "5").to_device(array_backend="torch")
print(torch_network.array_backend)
Cache disabled.
numpy
Cache disabled.
torch

All operations are then computed in exactly the same manner. Inputs must match the river network's array backend, i.e. it is not possible to pass a numpy array to an operation on a torch-backed river network.

Many backends, such as jax, tensorflow and cupy, are supported; below we show numpy and torch as examples:

ekh.upstream.array.sum(numpy_network, np.ones(numpy_network.shape))
array([[ 1., 1., 1., ..., nan, nan, nan], [ 1., 2., 4., ..., nan, nan, nan], [ 1., 2., 1., ..., nan, nan, nan], ..., [nan, nan, nan, ..., 1., 2., 1.], [nan, nan, nan, ..., 1., 6., 8.], [nan, nan, nan, ..., 2., 1., 1.]], shape=(2970, 4530))
ekh.upstream.array.sum(torch_network, torch.ones(torch_network.shape))
tensor([[1., 1., 1., ..., nan, nan, nan], [1., 2., 4., ..., nan, nan, nan], [1., 2., 1., ..., nan, nan, nan], ..., [nan, nan, nan, ..., 1., 2., 1.], [nan, nan, nan, ..., 1., 6., 8.], [nan, nan, nan, ..., 2., 1., 1.]])

Differentiability

Operations in earthkit-hydro are differentiable, which means that array backends with autograd behaviour can propagate gradients through operations.

initial_tensor = torch.ones(torch_network.n_nodes, requires_grad=True)

output_tensor = ekh.upstream.array.sum(torch_network, initial_tensor, return_type="masked")

loss = output_tensor.sum()

loss.backward()

initial_tensor.grad
tensor([1., 4., 1., ..., 3., 3., 2.])
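The gradient above has a simple interpretation: since the loss sums every node's upstream sum, each input value contributes once for itself and once for every node downstream of it, so the gradient at node i equals the number of nodes at or below node i. A toy numpy-free check on a small hypothetical network (topology invented for illustration, not an earthkit-hydro call):

```python
# Toy network: nodes 0 and 1 drain into node 2, which drains into outlet 3.
downstream = [2, 2, 3, -1]

def count_at_or_below(downstream, i):
    # Walk the downstream pointers, counting node i and all its descendants.
    n, j = 1, downstream[i]
    while j >= 0:
        n += 1
        j = downstream[j]
    return n

grads = [count_at_or_below(downstream, i) for i in range(len(downstream))]
print(grads)  # [3, 3, 2, 1] -- the gradient of sum(upstream_sum(x)) w.r.t. x
```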

GPU

Many array backend libraries, such as torch, have the concept of a device. By default this is the CPU, but moving computation to a GPU can substantially speed up operations. The to_device method is also used to move a river network between devices.

print(torch_network.device)
torch_network = torch_network.to_device(device="cuda", array_backend="torch")
print(torch_network.device)
cpu
cuda
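Conceptually, device selection in numpy-compatible array libraries can be thought of as a dispatcher that hands back the right module for the target device. A hypothetical sketch of this pattern (not earthkit-hydro's internals), where cupy stands in for numpy on the GPU:

```python
def get_array_module(device="cpu"):
    # Hypothetical dispatcher: return a numpy-compatible module for a device.
    # cupy mirrors (most of) the numpy API, so downstream code is unchanged.
    if device == "cuda":
        import cupy
        return cupy
    import numpy
    return numpy

xp = get_array_module("cpu")  # swap to "cuda" when a GPU is available
print(float(xp.ones(4).sum()))  # 4.0
```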

Operations can then be conducted on the GPU.

ekh.upstream.array.sum(torch_network, torch.ones(torch_network.shape, device=torch_network.device), return_type="masked")
tensor([1., 1., 1., ..., 2., 1., 1.], device='cuda:0')

xarray

xarray does not support array backends such as torch or tensorflow, but it does support both numpy and cupy. It is therefore possible to leverage the GPU for xarray operations by using a cupy array backend.

example_arr_numpy = np.random.rand(10, *numpy_network.shape)
example_arr_cupy = cp.random.rand(10, *numpy_network.shape)

lat = numpy_network.coords['lat']
lon = numpy_network.coords['lon']
step = np.arange(10)

example_da_numpy = xr.DataArray(
    example_arr_numpy,
    dims = ["step", "lat", "lon"],
    coords = {"step": step, "lat": lat, "lon": lon},
    name = "precip",
    attrs={"units": "m", "description": "Sample precipitation data"}
)

example_da_cupy = xr.DataArray(
    example_arr_cupy,
    dims = ["step", "lat", "lon"],
    coords = {"step": step, "lat": lat, "lon": lon},
    name = "precip",
    attrs={"units": "m", "description": "Sample precipitation data"}
)
%timeit ekh.upstream.sum(numpy_network, example_da_numpy)
4.49 s ± 25.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
cupy_network = numpy_network.to_device(device="cuda")
%timeit ekh.upstream.sum(cupy_network, example_da_cupy)
171 ms ± 650 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)

As seen, moving to the GPU gives a large speedup, roughly 25× in this example.