
3. Unit Conversion and Normalisation

Learning objectives

By the end of this notebook you will be able to:

  • Convert meteorological fields between common units

  • Compute wind speed and direction from U/V components

  • Apply min-max and z-score normalisation

  • Compute and persist normalisation statistics for use at inference time

  • Visualise raw vs normalised fields side-by-side


Why normalisation matters for ML

Neural networks use gradient-based optimisation. When input features have very different scales, the loss landscape becomes elongated: gradients in one dimension dwarf those in another, and training either diverges or converges extremely slowly.

Consider ERA5 variables at a single level:

Variable                   Typical range      Units
2m temperature             220 – 320          K
Specific humidity          0.0001 – 0.03      kg/kg
Mean sea level pressure    95000 – 105000     Pa
10m wind speed             0 – 50             m/s

Four orders of magnitude separate temperature in Kelvin from specific humidity. Without normalisation, the model will effectively ignore small-magnitude features.
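To make the disparity concrete, here is a small self-contained sketch (synthetic values with ERA5-like magnitudes, not real data) showing how much each raw feature contributes to a simple squared-error term:

```python
import numpy as np

# Synthetic values with ERA5-like magnitudes (illustrative only)
temperature = np.array([285.0, 290.0, 295.0])   # K, O(1e2)
humidity = np.array([0.005, 0.010, 0.015])      # kg/kg, O(1e-2)

# The same *relative* spread contributes wildly different amounts
# to a squared-error loss when the raw scales are used
temp_contrib = np.sum((temperature - temperature.mean()) ** 2)
hum_contrib = np.sum((humidity - humidity.mean()) ** 2)

ratio = temp_contrib / hum_contrib
print(f"temperature/humidity loss contribution ratio: {ratio:.0e}")  # -> 1e+06
```

Six orders of magnitude in the loss means the gradient signal from humidity is effectively invisible to the optimiser.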

Normalisation is not optional — it is a prerequisite for convergence.

The normalisation statistics (mean, standard deviation, min, max) computed on the training set must be stored alongside the Zarr store and reapplied at inference time to denormalise predictions back to physical units.

Setup

import earthkit.data as ekd
import earthkit.plots as ekp
import xarray as xr
import numpy as np
import json
import os

ekd.settings.set({"cache-policy": "user"})

# Check the Zarr store from notebook 2 is present
assert os.path.exists("data/era5.zarr"), "Run notebook 02 first to create data/era5.zarr"
print("data/era5.zarr found")

Load data

We load from the Zarr store produced in notebook 2, and also load the pressure-level sample for wind calculations.

# Load 2t and msl from Zarr
zarr_ds = xr.open_dataset("data/era5.zarr", engine="zarr")
print(zarr_ds)
# DATA: tuv_pl.grib — t, u, v on pressure levels
# Load as FieldList; select wind fields before converting to xarray

wind_fl = ekd.from_source("sample", "tuv_pl.grib").to_fieldlist()
print(f"Fields: {len(wind_fl)}")
wind_fl.head()

# Demonstrate field selection with .sel(), then convert the full FieldList
# (t, u and v are all needed later) to xarray
u_fl = wind_fl.sel(shortName="u")
v_fl = wind_fl.sel(shortName="v")
wind_ds = wind_fl.to_xarray()
print("\nxarray Dataset:")
print(wind_ds)

Unit conversion

Temperature: Kelvin to Celsius

ERA5 stores temperature in Kelvin; most users think in Celsius. The conversion is a constant offset: °C = K − 273.15.

t2m_k = zarr_ds["2t"]
t2m_c = t2m_k - 273.15
t2m_c.attrs["units"] = "degrees_C"
t2m_c.attrs["long_name"] = "2 metre temperature"

print(f"Kelvin  : min={float(t2m_k.min()):.1f}  max={float(t2m_k.max()):.1f} K")
print(f"Celsius : min={float(t2m_c.min()):.1f}  max={float(t2m_c.max()):.1f} °C")

earthkit-meteo

earthkit-meteo provides thermodynamic functions for more complex conversions, with pluggable backends (NumPy, PyTorch, CuPy):

from earthkit.meteo import thermo

# Potential temperature requires pressure — use pressure-level data
# theta = T * (p0 / p) ^ (R/cp)
# earthkit-meteo handles this formula cleanly

t_pl = wind_ds["t"].isel(level=0).values        # K, shape (lat, lon)
p_pa = float(wind_ds["level"].isel(level=0)) * 100.0  # hPa -> Pa
p_arr = np.full_like(t_pl, p_pa)

theta = thermo.potential_temperature(t_pl, p_arr)
print(f"Potential temperature at {p_pa/100:.0f} hPa:")
print(f"  min={theta.min():.1f} K, max={theta.max():.1f} K")
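As a cross-check, the formula quoted in the comment can be evaluated directly with NumPy. This is a sketch using standard dry-air constants, not the earthkit-meteo implementation:

```python
import numpy as np

# theta = T * (p0 / p) ** (R / cp), with standard dry-air constants
R_DRY = 287.05    # J kg^-1 K^-1, specific gas constant for dry air
CP_DRY = 1004.7   # J kg^-1 K^-1, specific heat at constant pressure
P0 = 100000.0     # Pa, reference pressure (1000 hPa)

def potential_temperature_np(t_k, p_pa):
    """Potential temperature (K) from temperature (K) and pressure (Pa)."""
    return t_k * (P0 / p_pa) ** (R_DRY / CP_DRY)

# At the reference pressure, theta equals the temperature itself
print(potential_temperature_np(300.0, 100000.0))  # -> 300.0

# Below the reference pressure, the parcel warms adiabatically, so theta > T
print(potential_temperature_np(280.0, 85000.0))
```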

Wind: U/V components to speed and direction

ML models sometimes use U and V directly (preserving vector information), and sometimes prefer speed and direction. earthkit-meteo makes both easy.

from earthkit.meteo import wind

# Select a single pressure level
u = wind_ds["u"].isel(level=0).values  # m/s
v = wind_ds["v"].isel(level=0).values  # m/s

# Wind speed (magnitude)
speed = wind.speed(u, v)

# Wind direction — meteorological convention: direction the wind blows FROM
direction = wind.direction(u, v, convention="meteo")

print(f"Wind speed    : min={speed.min():.1f}, max={speed.max():.1f} m/s")
print(f"Wind direction: min={direction.min():.1f}, max={direction.max():.1f} degrees")
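The meteorological convention can be verified by hand: with u eastward and v northward, the FROM-direction is atan2(−u, −v), measured clockwise from north. A small NumPy sketch, independent of earthkit-meteo:

```python
import numpy as np

def wind_speed_dir(u, v):
    """Speed (m/s) and meteorological direction (degrees the wind blows FROM)."""
    speed = np.hypot(u, v)
    direction = np.degrees(np.arctan2(-u, -v)) % 360.0
    return speed, direction

# A northerly wind (blowing FROM the north) has v < 0
s, d = wind_speed_dir(0.0, -10.0)
print(s, d)   # -> 10.0 0.0

# An easterly wind (blowing FROM the east) has u < 0
s, d = wind_speed_dir(-10.0, 0.0)
print(s, d)   # -> 10.0 90.0
```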

Normalisation

There are two common normalisation strategies for ML:

Min-max normalisation maps values to [0, 1]:

x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}

Use when the physical bounds are meaningful and you want outputs confined to a fixed interval. Because the observed extremes define the range, min-max is sensitive to outliers.

Z-score normalisation maps values to zero mean and unit variance:

x' = \frac{x - \mu}{\sigma}

Use when the distribution is roughly Gaussian and you want values symmetrically distributed around zero. This is the most common choice for weather ML models.
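The practical difference shows up with outliers. A quick comparison on synthetic data (illustrative, not ERA5):

```python
import numpy as np

data = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 100.0])  # one large outlier

# Min-max: the outlier defines the range, squashing the bulk near zero
mm = (data - data.min()) / (data.max() - data.min())
print(mm[:5])   # bulk compressed into [0, 0.04]

# Z-score: values remain comparable around the mean rather than
# pinned to an arbitrary [0, 1] endpoint
zs = (data - data.mean()) / data.std()
print(zs)
```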

t2m_vals = zarr_ds["2t"].values.astype(np.float32)
msl_vals = zarr_ds["msl"].values.astype(np.float32)

# --- Min-max normalisation ---
def minmax_normalise(arr):
    lo, hi = arr.min(), arr.max()
    return (arr - lo) / (hi - lo), lo, hi

t2m_mm, t2m_lo, t2m_hi = minmax_normalise(t2m_vals)
print(f"2t min-max: [{t2m_mm.min():.3f}, {t2m_mm.max():.3f}]")

# --- Z-score normalisation ---
def zscore_normalise(arr):
    mu, sigma = arr.mean(), arr.std()
    return (arr - mu) / sigma, mu, sigma

t2m_zs, t2m_mu, t2m_sigma = zscore_normalise(t2m_vals)
msl_zs, msl_mu, msl_sigma = zscore_normalise(msl_vals)

print(f"2t z-score: mean={t2m_zs.mean():.4f}, std={t2m_zs.std():.4f}")
print(f"msl z-score: mean={msl_zs.mean():.4f}, std={msl_zs.std():.4f}")

Storing normalisation statistics

Normalisation statistics computed on the training set must travel with the model. At inference time, the model produces normalised outputs; you denormalise using the stored statistics to get predictions in physical units.

We store them as a simple JSON file alongside the Zarr store.

norm_stats = {
    "2t": {
        "mean": float(t2m_mu),
        "std":  float(t2m_sigma),
        "min":  float(t2m_lo),
        "max":  float(t2m_hi),
        "units": "K",
    },
    "msl": {
        "mean": float(msl_mu),
        "std":  float(msl_sigma),
        "min":  float(msl_vals.min()),
        "max":  float(msl_vals.max()),
        "units": "Pa",
    },
}

stats_path = "data/norm_stats.json"
with open(stats_path, "w") as f:
    json.dump(norm_stats, f, indent=2)

print(f"Normalisation statistics saved to {stats_path}")
print(json.dumps(norm_stats, indent=2))
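To confirm the statistics survive the round trip, the same write-then-reload pattern can be exercised with a temporary file (hypothetical values; the notebook itself writes to data/norm_stats.json):

```python
import json
import os
import tempfile

# Hypothetical statistics, standing in for the real norm_stats dict
stats = {"2t": {"mean": 278.5, "std": 12.3, "units": "K"}}

# Write, then reload, exactly as an inference-time consumer would
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "norm_stats.json")
    with open(path, "w") as f:
        json.dump(stats, f, indent=2)
    with open(path) as f:
        loaded = json.load(f)

print(loaded == stats)   # -> True: JSON round-trip preserves the values
```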

Write normalised data to Zarr

# Build a normalised xarray Dataset
norm_ds = xr.Dataset(
    {
        "2t_norm": xr.DataArray(
            t2m_zs,
            dims=zarr_ds["2t"].dims,
            coords=zarr_ds["2t"].coords,
            attrs={"long_name": "2m temperature (z-score)", "units": "1"},
        ),
        "msl_norm": xr.DataArray(
            msl_zs,
            dims=zarr_ds["msl"].dims,
            coords=zarr_ds["msl"].coords,
            attrs={"long_name": "Mean sea level pressure (z-score)", "units": "1"},
        ),
    }
)

norm_ds.to_zarr("data/era5_normalised.zarr", mode="w")
print("Normalised store written to data/era5_normalised.zarr")

Visualise raw vs normalised


# Raw 2m temperature — earthkit.plots reads CF metadata automatically
fig = ekp.Figure(1,2, figsize=(12, 4))
fig.add_map(0, 0).quickplot(zarr_ds["2t"])

# Normalised — wrap in a DataArray with updated metadata
t2m_norm_da = xr.DataArray(
    t2m_zs,
    dims=zarr_ds["2t"].dims,
    coords=zarr_ds["2t"].coords,
    attrs={"long_name": "2m temperature (z-score normalised)", "units": "1"},
)
fig.add_map(0, 1).quickplot(t2m_norm_da)

fig.legend()
fig.coastlines()
fig.borders()
fig.title("ERA5 2m temperature: raw vs normalised")
fig.show();

The spatial pattern is identical. Only the colour scale changes — the normalised field is dimensionless and centred on zero.


Summary

You have:

  • Converted temperature from Kelvin to Celsius and computed potential temperature

  • Derived wind speed and direction from U/V components with earthkit-meteo

  • Applied min-max and z-score normalisation

  • Saved normalisation statistics to data/norm_stats.json for use at inference time

Notebook 8 will load norm_stats.json to denormalise model predictions.


Activity

  1. Normalise mean sea level pressure (msl) using min-max normalisation and add it to norm_stats.json.

  2. Write a denormalise(arr, mu, sigma) function and verify that applying it to t2m_zs recovers the original values.

  3. What would happen if you used statistics from only a single month to normalise a full-year dataset? Discuss.

def denormalise(arr, mu, sigma):
    # Your code here
    pass