Learning objectives
By the end of this notebook you will be able to:
Use earthkit.data.from_source() to load meteorological data
Understand the earthkit FieldList: iterate, inspect, select
Read field metadata, values, and lat/lon coordinates
Convert to xarray.Dataset when array operations are needed
Visualise with earthkit.plots directly from a FieldList
What is ERA5?
ERA5 is ECMWF’s fifth-generation global atmospheric reanalysis:
~31 km horizontal resolution (native N320 reduced Gaussian grid)
137 vertical levels, hourly time steps, ~240 variables, 1940–present
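The ~31 km figure follows from the grid definition. As a rough sketch (assuming a spherical Earth with a mean radius of 6371 km, and treating the Gaussian latitudes as evenly spaced, which they only nearly are), the N320 grid has 320 latitude lines between each pole and the equator, so the meridional spacing can be estimated like this:

```python
import math

N = 320                    # Gaussian number: latitude lines between a pole and the equator
n_lat = 2 * N              # latitude lines from pole to pole
dlat_deg = 180.0 / n_lat   # approximate spacing; Gaussian latitudes are only nearly uniform

EARTH_RADIUS_KM = 6371.0   # assumption: spherical Earth, mean radius
km_per_degree = 2 * math.pi * EARTH_RADIUS_KM / 360.0
spacing_km = dlat_deg * km_per_degree

print(f"{n_lat} latitude lines, ~{dlat_deg:.3f} degrees apart, ~{spacing_km:.1f} km")
```

The estimate lands at roughly 31 km, in line with the resolution quoted above.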
Almost every modern ML weather model — GraphCast, Pangu-Weather, FourCastNet, AIFS — was trained on ERA5. Before any pipeline, the first step is: look at your data.
Setup
import earthkit.data as ekd
import earthkit.plots as ekp
import os
ekd.settings.set({"cache-policy": "user"})
os.makedirs("data", exist_ok=True)
print("earthkit.data version:", ekd.__version__)
Loading data with from_source()
The central abstraction in earthkit-data is from_source(source, ...). The source says where; the rest says what. All sources return the same FieldList type.
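The "where versus what" split can be pictured with a toy dispatcher. This is only an illustration of the pattern, not how earthkit-data is implemented: toy_from_source, the loader registry, and the dict-based stand-in for a FieldList are all hypothetical names invented for this sketch.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a FieldList: a plain list of metadata dicts.
ToyFieldList = List[dict]

# Registry mapping a source name ("where") to a loader function.
_LOADERS: Dict[str, Callable[..., ToyFieldList]] = {}


def register(name: str):
    """Decorator that registers a loader under a source name."""
    def wrap(fn):
        _LOADERS[name] = fn
        return fn
    return wrap


@register("file")
def _from_file(path: str) -> ToyFieldList:
    return [{"source": "file", "location": path}]


@register("url")
def _from_url(url: str) -> ToyFieldList:
    return [{"source": "url", "location": url}]


def toy_from_source(source: str, *args, **kwargs) -> ToyFieldList:
    """The first argument picks the backend; the rest describes the data."""
    return _LOADERS[source](*args, **kwargs)


local = toy_from_source("file", "/path/to/data.grib")
remote = toy_from_source("url", "https://example.com/data.grib")
print(type(local) is type(remote))  # both backends hand back the same type
```

Because every loader returns the same container type, code written against that container does not need to know which backend produced it.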
Option A: sample data (no credentials)
# DATA: era5-N320-2t-msl-20200101.grib — 2m temperature (2t) and mean sea level pressure (msl)
ds = ekd.from_source("sample", "era5-N320-2t-msl-20200101.grib").to_fieldlist()
print(type(ds))
Option B: CDS (requires a CDS API key)
# Uncomment once ~/.cdsapirc is in place.
# ds = ekd.from_source(
#     "cds",
#     "reanalysis-era5-single-levels",
#     request=dict(
#         variable=["2m_temperature", "mean_sea_level_pressure"],
#         product_type="reanalysis",
#         date="2012-05-10",
#         time="12:00",
#         area=[73, -27, 33, 45],  # N, W, S, E
#         format="grib",
#     ),
# )
print("CDS cell skipped; using sample data.")
FieldLists: earthkit’s native GRIB representation
Loading GRIB data returns a FieldList: an ordered collection of fields, each carrying its GRIB metadata alongside its values.
A FieldList is the right place to stay until you need array operations:
Every field knows its variable, level, time, units, and grid type
Values are lazy — not decoded until you ask
earthkit.plots accepts FieldLists directly, so no conversion is needed to visualise
earthkit.geo.regrid regrids FieldLists directly
Convert to xarray.Dataset when you need mathematics, aggregation, or Zarr output.
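The "values are lazy" point can be pictured with a toy class. This is a sketch of the idea only: ToyField, its packed integers, and the scale/reference metadata keys are hypothetical and much simpler than real GRIB packing, and the sketch does not try to model whether a real field caches its decoded values.

```python
class ToyField:
    """Hypothetical stand-in for a GRIB field; not an earthkit class."""

    def __init__(self, metadata: dict, packed: list):
        self.metadata = metadata      # cheap to read, always available
        self._packed = packed         # raw payload, left undecoded
        self.decode_count = 0         # bookkeeping for the demonstration

    @property
    def values(self) -> list:
        # Decoding happens here, on access, not when the field is created.
        self.decode_count += 1
        ref = self.metadata["reference"]
        scale = self.metadata["scale"]
        return [ref + scale * p for p in self._packed]


field = ToyField({"variable": "2t", "reference": 250.0, "scale": 0.5}, [0, 10, 20])

print(field.metadata["variable"])   # metadata is readable without decoding
print(field.decode_count)           # nothing decoded yet
print(field.values)                 # decoding is triggered by this access
```

Inspecting or filtering on metadata therefore costs very little, and the decoding work is paid only for the fields whose values are actually read.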
# A FieldList is iterable — each element is one GRIB field
print(f"Fields: {len(ds)}")
ds.ls()
Inspecting a single field
f0 = ds[0] # index into a FieldList like a list
f0.describe()
# Values: always a 1-D array over all grid points
vals = f0.values
print(f"Shape: {vals.shape}")
print(f"Range: {vals.min():.2f} – {vals.max():.2f} {f0.metadata('units')}")
# Lat/lon — one pair per grid point, same length as values
lats, lons = f0.geography.latlons()
print(f"Lat range : {lats.min():.1f}° – {lats.max():.1f}°")
print(f"Lon range : {lons.min():.1f}° – {lons.max():.1f}°")
Selecting fields
When a FieldList contains multiple variables or time steps, .sel() filters by metadata and returns a new FieldList.
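Conceptually, this kind of selection is a filter over per-field metadata. The sketch below mimics it with plain dictionaries; toy_sel and the variable and level keys are hypothetical names for illustration and are not the earthkit implementation or its metadata keys.

```python
def toy_sel(fields: list, **criteria) -> list:
    """Return a new list with the fields whose metadata matches every criterion."""
    return [
        f for f in fields
        if all(f.get(key) == wanted for key, wanted in criteria.items())
    ]


# Six made-up fields: three variables on two pressure levels.
toy_fields = [
    {"variable": var, "level": lev}
    for var in ("t", "u", "v")
    for lev in (500, 850)
]

toy_t = toy_sel(toy_fields, variable="t")
toy_t500 = toy_sel(toy_fields, variable="t", level=500)

print(len(toy_fields), len(toy_t), len(toy_t500))
```

Two properties carry over to the real call: criteria combine with a logical AND, and the original collection is left untouched because the result is a new one.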
# DATA: tuv_pl.grib — temperature (t), u-wind, v-wind on multiple pressure levels
fl_multi = ekd.from_source("sample", "tuv_pl.grib").to_fieldlist()
print(f"Total fields: {len(fl_multi)}")
fl_multi.ls()
# Select all temperature fields
fl_t = fl_multi.sel({'parameter.variable': 't'})
print(f"Temperature fields: {len(fl_t)}")
# Select temperature at a single level
fl_t500 = fl_multi.sel({'parameter.variable': 't', 'vertical.level': 500})
print(f"T at 500 hPa: range {fl_t500[0].values.min():.1f} – {fl_t500[0].values.max():.1f} K")
Converting to xarray
Convert to xarray.Dataset for array operations, normalisation, aggregation, or Zarr output.
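Normalisation is a typical reason to leave the FieldList. As a minimal sketch of what that step means, the snippet below standardises a handful of made-up temperature values to zero mean and unit standard deviation using only the standard library; the numbers are illustrative and are not taken from the sample file.

```python
from statistics import fmean, pstdev

# Made-up 2 m temperatures in kelvin (illustrative values only).
temps_k = [271.0, 273.0, 275.0, 277.0]

mean_k = fmean(temps_k)
std_k = pstdev(temps_k)  # population standard deviation (divides by n)

# Standardise: subtract the mean, divide by the standard deviation.
temps_norm = [(t - mean_k) / std_k for t in temps_k]

print(f"mean={mean_k:.1f} K, std={std_k:.3f} K")
print([round(z, 3) for z in temps_norm])
```

On an xarray.DataArray named da, the same arithmetic would usually be written as (da - da.mean()) / da.std(), applied to the whole array at once.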
xr_ds = ds.to_xarray()
print(xr_ds)
print()
for var in xr_ds.data_vars:
    da = xr_ds[var]
    print(f" {var}: shape={da.shape} "
          f"range [{float(da.min()):.1f}, {float(da.max()):.1f}] "
          f"units={da.attrs.get('units', '?')}")
Visualisation with earthkit.plots
earthkit.plots reads CF metadata for automatic titles, units, and colour scales. It accepts both FieldLists and xarray DataArrays.
# Plot directly from the FieldList
ekp.quickplot(ds, mode='overlay');
Other data sources
All backends return a FieldList. The rest of your pipeline never changes.
ekd.from_source("file", "/path/to/data.grib") # local file
ekd.from_source("url", "https://example.com/data.grib") # public URL
ekd.from_source("s3", "s3://my-bucket/data.grib") # object store
ekd.from_source("fdb", request={...}) # ECMWF FDB
ekd.from_source("polytope", request={...}, address="https://...") # DestinE
Notebook 7 covers Polytope in depth.
Activity
Select u-wind fields from fl_multi with .sel(). Print their levels and value ranges.
What gridType does tuv_pl.grib have? Is it the same as test.grib?
Visualise temperature at 500 hPa with ekp.Figure.
fl_u = fl_multi.sel(shortName="u")
for f in fl_u:
    print(f.metadata("level"), f.values.min(), f.values.max())