Comparing TROPOMI NO2 columns with the CAMS regional air quality ensemble product

Comparing TROPOMI NO2 columns with the CAMS regional air quality ensemble product#

This notebook can be run on free online platforms, such as Binder, Kaggle and Colab, or they can be accessed from GitHub. The links to run this notebook in these environments are provided here, but please note they are not supported by ECMWF.

In this tutorial you will compare TROPOMI NO2 tropospheric column observations with CAMS regional ensemble data.

Authors: Felipe Cifuentes and Henk Eskes, KNMI.
Date: September 2024, latest update September 2025.
Contact: henk.eskes@knmi.nl

Before starting the tutorial you may want to select “Restart Kernel and Clear Outputs of All Cells” from the menu.

Execute the following tutorial one block at the time, and advance to the next block once you have understood what has been said or computed.

A detailed explanation on the TROPOMI-CAMS comparison approach can be found in the following paper
Douros et al., 2023, https://doi.org/10.5194/gmd-16-509-2023

All TROPOMI operational products (including NO₂) come with three documents:

The Product Readme File (PRF): This provides an overview of code changes and versions and some high level recommendations for use.
The Product User Manual (PUM): Describing the fields in the data file and data use.
The Algorithm Theoretical Baseline Document (ATBD): Describing the retrieval algorithm in detail.

These documents are provided by Copernicus/ESA on the Sentinel-5P Document Library page:
https://sentiwiki.copernicus.eu/web/s5p-products

The v2 NO₂ retrieval and validation are described in the following paper:
Jos van Geffen et al., Sentinel-5P TROPOMI NO2 retrieval: impact of version v2.2 improvements and comparisons with OMI and ground-based data
https://doi.org/10.5194/amt-15-2037-2022

In this notebook we will focus on Europe, and will pick a day with a relatively low cloud cover over central-western Europe:
26 July 2018. The same day was shown in the Douros paper.

For this tutorial we will use the TROPOMI NO₂ reprocessing based on processor version 2.4.
This dataset is available on the Copernicus Dataspace, and individual orbits can be downloaded here:
https://dataspace.copernicus.eu

This tutorial is based on the following file, available from the dataspace:
S5P_RPRO_L2__NO2____20180726T113922_20180726T132051_04059_03_020400_20221105T081652.nc

The NO2 retrieval files for individual orbits are large, about 600 Mb.
For this tutorial we have prepared a file with a reduced volume.
In particular:

We removed several fields from the file which are not needed for the use of the data.
We cropped the orbit keeping only data which overlaps the CAMS European domain (a lat-lon region covering Iceland down to Turkey).

A command line tool called “ncks” was used for this cropping (“NetCDF Kitchen Sink). In practice this tool is very handy to avoid your computer storage to fill up quickly.

The “reduced_eu” datafile we obtained has a size of 81 Mb.

As explained in Douros et al., there are several ways to compare model output with the TROPOMI data.

Convert the three-dimensional model output to model subcolumns (unit mol/m², as in TROPOMI). Then sum up all layers in the troposphere to obtain a modelled tropospheric column. This can be compared with the tropospheric column in the TROPOMI file.
Make use of the kernels in the retrieval. Multiply the model subcolumns with the averaging kernel values and then sum over the layers. The compare with the TROPOMI column in the L2 datafile.
Replace the a-priori that was used in the retrieval by the profiles from the CAMS ensemble, and recompute the TROPOMI retrieved tropospheric column.

Comparison 1 (orange arrow in the figure, taken from Douros et al.) is sub-optimal because the relative (model-observation)/observation difference is depending on the a-priori which was used in the retrieval. This approach should be avoided.

Comparisons 2 and 3 are both optimal (green arrows in the figure). In both these cases the relative comparison no longer depends on the retrieval a-priori.

Advantages and limitations in using the CAMS regional ensemble data

The CAMS global fields have a resolution of about 0.4 degree, or 40 km. TROPOMI has a resolution of about 4.5 km at nadir (looking down), and about 6 km on average (all viewing angles). Therefore a single CAMS global gridcell corresponds about 40-80 measurements of TROPOMI.

The comparisons with the CAMS regional ensemble is more interesting. This ensemble has a resolution of 0.1 degree, and a gridcell corresponds to about 4 satellite measurements. CAMS regional much better resolves the sources, like large and medium size cities, major industrial sites, power plants, major highways, airports and ship tracks. Therefore we will demonstrate how the CAMS regional air-quality ensemble can be compared with TROPOMI.

It is important to realise that the CAMS regional 3-dimensional output, as provided on the ADS, has limitations. As a result interpolation errors may be considerable (see also Douros et al., 2023):

The number of vertical levels in CAMS is limited to 8. The heights of the levels are 0 50 250 500 1000 2000 3000 5000 meter above the surface.
The NO\(_2\) concentration is provided at half-levels by CAMS. Below, a simple linear interpolation is used in the vertical to obtain concentration values at full-levels. But this may cause substantial differences in the profile shape compared to the original model data.
Only altitude data is provided. Below, the pressure of the levels was computed approximately using a constant lapse rate.
The data is available only up to 5000 meters. For a fair comparison with TROPOMI, data up to the tropopause (around 10 km) would have been required. This has a non-negligible impact on the comparisons (See Douros, 2023)

A more accurate comparion would require other outputs from the CAMS regional forecasts and anaylses, including pressure levels, a larger number of levels and model concentrations up to the tropopause.

The CAMS global output provides NO₂ and half and full level pressures on a large number of vertical layers (137) up to the mesosphere. Interpolation errors are much smaller for these global comparisons as compared to the regional comparisons.

Douros et al. (2023) use CAMS global NO₂ profiles above 3km as upper part of the tropospheric profile. This improves the comparison. In this notebook we restrict ourselves to comparisons between the satellite data and CAMS regional only.

1. Python packages and directories#

Import the needed python packages.

We will also load some utility routines. We keep this notebook short, and the hard work is done by the routines in “utils_no2”. If you are interested in the details you are welcome to investigate these utility routines.

# Uncomment the line below if you are using Google Colab or Kaggle
#!pip install netCDF4

# Uncomment the line below if you are using Google Colab
#!pip install cartopy

import os
import xarray as xr
from netCDF4 import Dataset
import numpy as np

import matplotlib.pyplot as plt
import matplotlib.colors as colors
from matplotlib.colors import LinearSegmentedColormap

import cartopy.crs as ccrs
import cartopy.feature as cfeature
from cartopy.feature import ShapelyFeature
from cartopy.io.shapereader import Reader

import scipy.constants as sc

import requests

from zipfile import ZipFile

print("done")

done

Specify the path of your data directory

DATADIR = '/content' # note that '/content' is the default directory in Colab. Other platforms may require a different path.

Download the data from a remote repository

data_url = 'https://sites.ecmwf.int/training/2025-cams-act7-training/averaging-kernel-notebook-3-data.zip'
r = requests.get(data_url)
file_path = f'averaging-kernel-notebook-3-data.zip'
with open(file_path, 'wb') as f:
    f.write(r.content)

Download also a Python script with helper functions

script_url = 'https://raw.githubusercontent.com/ecmwf-training/2025-cams-act7-training/refs/heads/main/06-averaging-kernels/utils_no2_regional_bilin.py'
r = requests.get(script_url)
file_path = f'utils_no2_regional_bilin.py'
with open(file_path, 'wb') as f:
    f.write(r.content)

We now import the helper functions. Note that if you have problems accessing these, particularly if using online platforms such as Kaggle or Colab, you can simply paste into a cell the code here: https://raw.githubusercontent.com/ecmwf-training/2025-cams-act7-training/refs/heads/main/06-averaging-kernels/utils_no2_regional_bilin.py

from utils_no2_regional_bilin import *

Extract the zip file

with ZipFile(f'{DATADIR}/averaging-kernel-notebook-3-data.zip', 'r') as zipObj:
   # Extract all the contents of zip file in current directory
   zipObj.extractall(path=f'{DATADIR}')

Define the directories with the CAMS regional ensemble data, the TROPOMI NO2 data.
For plotting we also included a shape file for Europe.

Optionally you can also generate output image files.

path_s5p = f'{DATADIR}/S5P_RPRO_L2__NO2____20180726T113922_20180726T132051_04059_03_020400_20221105T081652_reduced_eu.nc'
path_cams = f'{DATADIR}/cams_regional_20180726.nc'
shp_file = f'{DATADIR}/Europe_shp/Europe.shp'
# outputs = './Outputs_NO2/'

4. Regrid CAMS regional data onto the TROPOMI grid#

Bilinear interpolation is used to extract model profiles at the TROPOMI measurement locations, to keep things simple and compatible.

Alternatively, the xesmf package may be used to perform the horizontal interpolation:
this involves a more accurate conservative regridding from the CAMS grid onto the TROPOMI footprints (see ecmwf-training/2025-cams-act7-training). (xesmf is available for Linux and MacOS)
The code for this is contained in the “no2_tropomi_cams_regional_xesmf.zip” file, which can be used after the package has been installed.

# Simple bilinear regridding
regrid_no2 = bilin_regrid_cams_s5p(no2_cams, lat_cams, lon_cams, lat_s5p, lon_s5p)

print("done")

done

Please wait until “done” appears. May take a while, because the bilinear routine is not optimised.

The more advanced xesmf gridding approach is in fact faster because the computations have been speed optimised.

5. Estimate full and half levels pressures, altitudes and NO2 concentrations for CAMS#

A simple aproximation is used for the vertical grid. Assumed surface presure and constant lapse rate to derive pressure at the full and half levels. A simple linear interpolation is applied for NO₂ at the center of the grid.

pres_fl_cams, z_fl_cams, pres_hl_cams, z_hl_cams, no2_cams_fl = CAMS_reg_levels(regrid_no2, alt_cams, sp_tropomi)

print("done")

done

6. Compute the subcolumn and column values, and update TROPOMI apriori#

Find only pixels with valid model data (this saves time for the loops later):

valindex = np.argwhere(~np.isnan(no2_cams_fl[0,:,:])) 

Estimate total columns and layer subcolumns, with and without multiplying with the AK:

no2_col, no2_subcol = no2_column_cams_simple(no2_cams_fl, z_hl_cams)
no2_col_aks, no2_subcol_aks = no2_column_cams_interp(no2_cams_fl, z_hl_cams, pres_fl_cams, AK_trop, fl_fields, valindex)

print("done")

C:\Users\cxcs\act7_Oct2025_AK_notebooks\utils_no2_regional_bilin.py:170: RuntimeWarning: invalid value encountered in float_scalars
  m = (AK_trop.values[index_p1,y,z] - AK_trop.values[index_p2,y,z]) / (tm5_press_levels[index_p1,y,z] - tm5_press_levels[index_p2,y,z])

done

(At a few points the inputs are not defined, and these points create a warning.)

Now the retrieved TROPOMI tropospheric column is updated by replacing the TROPOMI a-priori by the CAMS regional profile.
As explained in the PUM, this calculation uses the two summations that were computed in the previous block, and becomes a simple expression:

column_ratio_cams= no2_col / no2_col_aks
no2_tropomi_cams = column_ratio_cams * no2_tropomi

print("done")

done

C:\Users\cxcs\AppData\Local\Temp\ipykernel_2852\4032144624.py:1: RuntimeWarning: divide by zero encountered in divide
  column_ratio_cams= no2_col / no2_col_aks

7. Filter all the datasets using observations with a qa_value larger than 0.75#

As mentioned before, we will only compare good quality cloud-free pixels and pixels with a small cloud fraction.
This means applying a filter: the qa_value should be bigger-equal than 0.75.

The lines below filter the quantities we computed.

no2_col_75 = np.where(qavalue<0.75, np.nan, no2_col)
no2_col_aks_75 = np.where(np.isnan(no2_col_75), np.nan, no2_col_aks)
no2_tropomi_75 = np.where(np.isnan(no2_col_75), np.nan, no2_tropomi)
no2_tropomi_cams_75 = np.where(np.isnan(no2_col_75), np.nan, no2_tropomi_cams)

print("done")

done

7. Plotting the data#

A four-panel plot is created, showing the two retrieval versions and the two model versions after applying the observation operator.

colors = ["mediumblue", "dodgerblue", "cyan", "limegreen", "yellow", "orange", "red", "purple", "indigo"]
colormap = LinearSegmentedColormap.from_list("mycmap", colors)

fig, axs = plt.subplots(2, 2, figsize=(10,9),subplot_kw={'projection': ccrs.PlateCarree()}, constrained_layout=True)

shape_feature = ShapelyFeature(Reader(shp_file).geometries(), ccrs.PlateCarree(), facecolor='none', edgecolor='black')
axs[0,0].add_feature(shape_feature, linewidth=0.5)
axs[0,1].add_feature(shape_feature, linewidth=0.5)
axs[1,0].add_feature(shape_feature, linewidth=0.5)
axs[1,1].add_feature(shape_feature, linewidth=0.5)

c0=axs[0,0].pcolormesh(lon_s5p, lat_s5p, no2_tropomi_75, vmin=0, vmax=10, transform=ccrs.PlateCarree(), cmap=colormap)
c0=axs[0,1].pcolormesh(lon_s5p, lat_s5p, no2_col_aks_75, vmin=0, vmax=10, transform=ccrs.PlateCarree(), cmap=colormap)
c0=axs[1,0].pcolormesh(lon_s5p, lat_s5p, no2_tropomi_cams_75, vmin=0, vmax=10, transform=ccrs.PlateCarree(), cmap=colormap)
c0=axs[1,1].pcolormesh(lon_s5p, lat_s5p, no2_col_75, vmin=0, vmax=10, transform=ccrs.PlateCarree(), cmap=colormap)

cbar0=fig.colorbar(c0, ax=axs[:,:], location='bottom', shrink=0.85, aspect=50, pad=0.02,fraction=0.1, format = '%.1f')
cbar0.set_label('NO$_2$ VCD [Pmolec cm$^-$$^2$]')

axs[0,0].set_ylim(37,61)
axs[0,0].set_xlim(-10,22)
axs[0,1].set_ylim(37,61)
axs[0,1].set_xlim(-10,22)
axs[1,0].set_ylim(37,61)
axs[1,0].set_xlim(-10,22)
axs[1,1].set_ylim(37,61)
axs[1,1].set_xlim(-10,22)

axs[0,0].set_title('TROPOMI - TM5')
axs[0,1].set_title('CAMS - AKS')
axs[1,0].set_title('TROPOMI - CAMS')
axs[1,1].set_title('CAMS')

plt.suptitle('NO$_2$ VCD - 2018/07/26', fontsize=14, fontweight='bold')

# saveplot = outputs + 'TROPOMI_vs_CAMS_NO2_20180726.png'
# plt.savefig(saveplot, dpi = 300, bbox_inches='tight')

print("done")

done

_images/acdd8dac2210fa0c206915b27a9ce9c93d18d4f53b97e6cb68ada1a7040432c1.png

The figure shows the comparison for 26 July 2018:

top-left: The TROPOMI tropospheric NO2 column retrieval
bottom-left: The TROPOMI NO2 retrieval but with CAMS as a-priori profile
top-right: The CAMS NO2 column with averaging kernel applied
bottom-right: The CAMS vertical column

The white areas correspond to cloud cover. The observation operator applied to the model has exactly the same gaps due to clouds, because the model is interpolated to the selected TROPOMI observations and is also filtered with the same qa_value >= 0.75.

First note the high degree of consistency between the modelled and observed NO2 fields.
All major cities show up as emission hotspots, but looking in more detail we can also identify signals from industry, power generation, roads (for instance the “route du soleil” highway in the south-east of France), airports.

The two good ways of comparing are the top-left with top-right, or bottom-left with bottom-right.
The relative comparison is the same for these two, even though absolute values differ quite a bit.

The suboptimal way is comparing top-left with bottom-right.
This comparison is depending on the retrieval a-priori.
The difference (model-TROPOMI)/TROPOMI is found to be higher in this case.

As explained in the paper by Douros the coarse 1 degree global scale a-priori does not resolve the profiles expected over the emission hotspots. As a result of replacing the a-priori with CAMS, the NO₂ in emission regions has increased substantially.

Background values are lower in CAMS. This may be attributed (partly) to the free troposphere.
The retrieval shows higher free tropospheric concentrations, and CAMS-regional ensemble values above 5km are missing.