Learning objectives
By the end of this notebook you will be able to:
Convert meteorological fields between common units
Compute wind speed and direction from U/V components
Apply min-max and z-score normalisation
Compute and persist normalisation statistics for use at inference time
Visualise raw vs normalised fields side-by-side
Why normalisation matters for ML¶
Neural networks use gradient-based optimisation. When input features have very different scales, the loss landscape becomes elongated: gradients in one dimension dwarf those in another, and training either diverges or converges extremely slowly.
Consider ERA5 variables at a single level:
| Variable | Typical range | Units |
|---|---|---|
| 2m temperature | 220 – 320 | K |
| Specific humidity | 0.0001 – 0.03 | kg/kg |
| Mean sea level pressure | 95000 – 105000 | Pa |
| 10m wind speed | 0 – 50 | m/s |
Four orders of magnitude separate temperature in Kelvin from specific humidity. Without normalisation, the model will effectively ignore small-magnitude features.
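The scale imbalance can be sketched numerically. For a linear model trained with an MSE loss, the gradient with respect to each weight scales with the magnitude of its input feature, so the temperature weight receives a vastly larger gradient than the humidity weight (a minimal sketch with synthetic data, not ERA5):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# Two features on wildly different scales, as in the table above
temperature = rng.uniform(220.0, 320.0, n)   # K
humidity = rng.uniform(1e-4, 3e-2, n)        # kg/kg
X = np.stack([temperature, humidity], axis=1)
y = rng.normal(size=n)                       # arbitrary target

# Gradient of the MSE loss for y_hat = X @ w, evaluated at w = 0:
# dL/dw = -2/n * X.T @ y — each component scales with its feature's magnitude
grad = -2.0 / n * X.T @ y
ratio = abs(grad[0]) / abs(grad[1])
print(f"gradient magnitude ratio (temperature / humidity): {ratio:.0f}x")
```

A fixed learning rate cannot suit both components at once, which is exactly the elongated-loss-landscape problem described above.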
Normalisation is not optional — it is a prerequisite for convergence.
The normalisation statistics (mean, standard deviation, min, max) computed on the training set must be stored alongside the Zarr store and reapplied at inference time to denormalise predictions back to physical units.
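As a minimal sketch of that round trip (with made-up training values, not the real statistics), denormalisation is just the inverse affine map of the z-score:

```python
import numpy as np

# Hypothetical "training" sample in physical units (Kelvin)
train = np.array([250.0, 270.0, 290.0, 310.0])
mu, sigma = train.mean(), train.std()   # statistics to store alongside the data

pred_norm = 0.5                         # a model output in normalised space
pred_k = pred_norm * sigma + mu         # denormalise with the stored statistics
print(f"{pred_k:.1f} K")
```

The same `mu` and `sigma` must be used at training and inference time; recomputing them on new data would silently shift the physical meaning of the model's outputs.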
Setup¶
import earthkit.data as ekd
import earthkit.plots as ekp
import xarray as xr
import numpy as np
import json
import os
ekd.settings.set({"cache-policy": "user"})
# Check the Zarr store from notebook 2 is present
assert os.path.exists("data/era5.zarr"), "Run notebook 02 first to create data/era5.zarr"
print("data/era5.zarr found")
Load data¶
We load from the Zarr store produced in notebook 2, and also load the pressure-level sample for wind calculations.
# Load 2t and msl from Zarr
zarr_ds = xr.open_dataset("data/era5.zarr", engine="zarr")
print(zarr_ds)

# DATA: tuv_pl.grib — t, u, v on pressure levels
# Load as FieldList; select wind fields before converting to xarray
wind_fl = ekd.from_source("sample", "tuv_pl.grib").to_fieldlist()
print(f"Fields: {len(wind_fl)}")
wind_fl.head()
# Demonstrate selecting u and v as separate FieldLists, then convert the
# full FieldList (t, u, v) to xarray so temperature is available later too
u_fl = wind_fl.sel(shortName="u")
v_fl = wind_fl.sel(shortName="v")
wind_ds = wind_fl.to_xarray()
print("\nxarray Dataset:")
print(wind_ds)
Unit conversion¶
Temperature: Kelvin to Celsius¶
ERA5 stores temperature in Kelvin. Most users think in Celsius. The conversion is simply subtracting 273.15.
t2m_k = zarr_ds["2t"]
t2m_c = t2m_k - 273.15
t2m_c.attrs["units"] = "degrees_C"
t2m_c.attrs["long_name"] = "2 metre temperature"
print(f"Kelvin : min={float(t2m_k.min()):.1f} max={float(t2m_k.max()):.1f} K")
print(f"Celsius : min={float(t2m_c.min()):.1f} max={float(t2m_c.max()):.1f} °C")

earthkit-meteo¶
earthkit-meteo provides thermodynamic functions for more complex conversions, with pluggable backends (NumPy, PyTorch, CuPy):
from earthkit.meteo import thermo
# Potential temperature requires pressure — use pressure-level data
# theta = T * (p0 / p) ^ (R/cp)
# earthkit-meteo handles this formula cleanly
t_pl = wind_ds["t"].isel(level=0).values # K, shape (lat, lon)
p_pa = float(wind_ds["level"].isel(level=0)) * 100.0 # hPa -> Pa
p_arr = np.full_like(t_pl, p_pa)
theta = thermo.potential_temperature(t_pl, p_arr)
print(f"Potential temperature at {p_pa/100:.0f} hPa:")
print(f" min={theta.min():.1f} K, max={theta.max():.1f} K")

Wind: U/V components to speed and direction¶
ML models sometimes use U and V directly (preserving vector information), and sometimes prefer speed and direction. earthkit-meteo makes both easy.
from earthkit.meteo import wind
# Select a single pressure level
u = wind_ds["u"].isel(level=0).values # m/s
v = wind_ds["v"].isel(level=0).values # m/s
# Wind speed (magnitude)
speed = wind.speed(u, v)
# Wind direction — meteorological convention: direction the wind blows FROM
direction = wind.direction(u, v, convention="meteo")
print(f"Wind speed : min={speed.min():.1f}, max={speed.max():.1f} m/s")
print(f"Wind direction: min={direction.min():.1f}, max={direction.max():.1f} degrees")

Normalisation¶
There are two common normalisation strategies for ML:
Min-max normalisation maps values to [0, 1]:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

Use when the physical bounds are meaningful and you want to preserve the shape of the distribution.
Z-score normalisation maps values to zero mean and unit variance:

$$x' = \frac{x - \mu}{\sigma}$$

Use when the distribution is roughly Gaussian and you want values symmetrically distributed around zero. This is the most common choice for weather ML models.
t2m_vals = zarr_ds["2t"].values.astype(np.float32)
msl_vals = zarr_ds["msl"].values.astype(np.float32)
# --- Min-max normalisation ---
def minmax_normalise(arr):
    lo, hi = arr.min(), arr.max()
    return (arr - lo) / (hi - lo), lo, hi
t2m_mm, t2m_lo, t2m_hi = minmax_normalise(t2m_vals)
print(f"2t min-max: [{t2m_mm.min():.3f}, {t2m_mm.max():.3f}]")
# --- Z-score normalisation ---
def zscore_normalise(arr):
    mu, sigma = arr.mean(), arr.std()
    return (arr - mu) / sigma, mu, sigma
t2m_zs, t2m_mu, t2m_sigma = zscore_normalise(t2m_vals)
msl_zs, msl_mu, msl_sigma = zscore_normalise(msl_vals)
print(f"2t z-score: mean={t2m_zs.mean():.4f}, std={t2m_zs.std():.4f}")
print(f"msl z-score: mean={msl_zs.mean():.4f}, std={msl_zs.std():.4f}")

Storing normalisation statistics¶
Normalisation statistics computed on the training set must travel with the model. At inference time, the model produces normalised outputs; you denormalise using the stored statistics to get predictions in physical units.
We store them as a simple JSON file alongside the Zarr store.
norm_stats = {
"2t": {
"mean": float(t2m_mu),
"std": float(t2m_sigma),
"min": float(t2m_lo),
"max": float(t2m_hi),
"units": "K",
},
"msl": {
"mean": float(msl_mu),
"std": float(msl_sigma),
"min": float(msl_vals.min()),
"max": float(msl_vals.max()),
"units": "Pa",
},
}
stats_path = "data/norm_stats.json"
with open(stats_path, "w") as f:
json.dump(norm_stats, f, indent=2)
print(f"Normalisation statistics saved to {stats_path}")
print(json.dumps(norm_stats, indent=2))

Write normalised data to Zarr¶
# Build a normalised xarray Dataset
norm_ds = xr.Dataset(
{
"2t_norm": xr.DataArray(
t2m_zs,
dims=zarr_ds["2t"].dims,
coords=zarr_ds["2t"].coords,
attrs={"long_name": "2m temperature (z-score)", "units": "1"},
),
"msl_norm": xr.DataArray(
msl_zs,
dims=zarr_ds["msl"].dims,
coords=zarr_ds["msl"].coords,
attrs={"long_name": "Mean sea level pressure (z-score)", "units": "1"},
),
}
)
norm_ds.to_zarr("data/era5_normalised.zarr", mode="w")
print("Normalised store written to data/era5_normalised.zarr")

Visualise raw vs normalised¶
# Raw 2m temperature — earthkit.plots reads CF metadata automatically
fig = ekp.Figure(1, 2, figsize=(12, 4))
fig.add_map(0, 0).quickplot(zarr_ds["2t"])
# Normalised — wrap in a DataArray with updated metadata
t2m_norm_da = xr.DataArray(
t2m_zs,
dims=zarr_ds["2t"].dims,
coords=zarr_ds["2t"].coords,
attrs={"long_name": "2m temperature (z-score normalised)", "units": "1"},
)
fig.add_map(0, 1).quickplot(t2m_norm_da)
fig.legend()
fig.coastlines()
fig.borders()
fig.title("ERA5 2m temperature: raw vs normalised")
fig.show();

The spatial pattern is identical. Only the colour scale changes — the normalised field is dimensionless and centred on zero.
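That claim can be checked directly: z-score normalisation is an affine map, so it cannot alter the spatial structure of a field, and the correlation between the raw and normalised values is exactly 1 (a quick check on a synthetic field, not the ERA5 data):

```python
import numpy as np

rng = np.random.default_rng(1)
field = rng.normal(280.0, 10.0, size=(32, 64))    # synthetic "temperature" field
field_zs = (field - field.mean()) / field.std()   # z-score normalisation

# Affine transforms preserve correlation, hence the identical spatial pattern
corr = np.corrcoef(field.ravel(), field_zs.ravel())[0, 1]
print(f"correlation raw vs normalised: {corr:.6f}")
```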
Summary¶
You have:
Converted temperature from Kelvin to Celsius and computed potential temperature
Derived wind speed and direction from U/V components with earthkit-meteo
Applied min-max and z-score normalisation
Saved normalisation statistics to data/norm_stats.json for use at inference time
Notebook 8 will load norm_stats.json to denormalise model predictions.
Activity
1. Normalise mean sea level pressure (msl) using min-max normalisation and add it to norm_stats.json.
2. Write a denormalise(arr, mu, sigma) function and verify that applying it to t2m_zs recovers the original values.
3. What would happen if you used statistics from only a single month to normalise a full-year dataset? Discuss.
def denormalise(arr, mu, sigma):
    # Your code here
    pass