CAMS Regional Air Quality Forecast Practical#

Run the tutorial via free cloud platforms: binder kaggle colab

Learning objectives#

In this practical exercise you will download CAMS Regional Air Quality Forecast data. You will then calculate the daily max and mean values from a 96 hour forecast of surface ozone concentration and plot these as maps. Finally, you will create a chart of the full 96 hour time series over a specific location.

Initial setup#

Before we begin we must prepare our environment. This includes installing the Application Programming Interface (API) of the Atmosphere Data Store (ADS), intalling any other packages not already installed, setting up our ADS API credentials and importing the various Python libraries that we will need.

# Ensure that the cdsapi package is installed
!pip install -q cdsapi
# If you are running this notebook in Colab, uncomment the line below and run this cell.
#!pip install cartopy

Add your ADS API credentials#

To set up your ADS API credentials, please login/register on the ADS, then you will see your unique API key here.

You can add this API key to your current session by replacing ######### in the code cell below with your API key.

import os
os.environ['CDSAPI_URL'] = 'https://ads.atmosphere.copernicus.eu/api'
os.environ['CDSAPI_KEY'] = '###########################################'

Import libraries#

# CDS API
import cdsapi

# Libraries for working with multidimensional arrays
import numpy as np
import pandas as pd
import xarray as xr

# Manipulate dates and times
import datetime

# Libraries for plotting and visualising data
import matplotlib.pyplot as plt
import cartopy
import cartopy.crs as ccrs

# Disable warnings for data download via API
import urllib3 
urllib3.disable_warnings()

Here you specify a data directory in which you will download your data and all output files that you will generate:

DATADIR = '.'

Explore and download data#

The download form for the CAMS regional forecast data can be found here https://ads.atmosphere.copernicus.eu/datasets/cams-europe-air-quality-forecasts?tab=download. However, we will download the data programmatically, using the parameters specified in the following code cells.

Please remember to accept the terms and conditions at the bottom of the download form.

Specify parameters of data request#

cams_dataset = 'cams-europe-air-quality-forecasts'
start_date = '2025-03-04'
end_date = '2025-03-04'
time = '00:00'
lead_time_start = 0
lead_time_stop = 96
step_hours = 1
leadtime_hours = list(range(lead_time_start, lead_time_stop + lead_time_start, step_hours))
variables = ['dust']
models = ['ensemble']
levels = [0]

Create dictionary of request#

request = {
        'variable': variables,
		'date': f'{start_date}/{end_date}',
        'time': f'{time}',
        'leadtime_hour': leadtime_hours,
        'type': 'forecast',
        'model': models,
        'level': levels,
        'data_format': 'netcdf'
    }

Download data#

target = f'{DATADIR}/dust-regional-forecast-20250604.nc'
c = cdsapi.Client()
c.retrieve(cams_dataset, request).download(target)

Inspect data#

# Create Xarray Dataset
ds = xr.open_dataset(target)

# view the dataset
ds

Process data#

Here you will calculate the daily forecast maximum and mean values. Before doing so, we will carry-out some processing to more easily manage the time dimension.

Convert time dimension to datetime#

delta_forecast = 3600000000000 # delta between one lead time and the next one
num_forecasts = len(leadtime_hours) 
start_day = pd.to_datetime(start_date)
forecast_index = start_day + pd.to_timedelta(np.arange(num_forecasts), 'H')
ds_dt = ds.assign_coords(time=forecast_index)
ds_dt
# create xarray data array object (single variable)
da = ds_dt['dust']
da

Calculate daily max and mean concentrations#

dust_daily_max = da.resample(time='D').max()
dust_daily_mean = da.resample(time='D').mean()

Plot forecast maps#

def plot_forecasts(xarray_da, num_forecasts, plot_title):
    '''
    Creates a plot of a data array
    '''
    fig, axs = plt.subplots(4, 1, figsize = (10, 40), 
                            subplot_kw={'projection': ccrs.Orthographic(central_latitude=60)})
    
    day_forecasts = xarray_da.time.values
    for i in range(num_forecasts):
        timestamp = day_forecasts[i]
        day = str(timestamp)[:10]
        data = xarray_da.sel(time=day, level=0.0)
        cs = axs[i].pcolormesh(xarray_da.longitude, 
                               xarray_da.latitude, 
                               data,  
                               cmap='YlGnBu', #'YlOrRd' 
                               vmin=0,
                               vmax=150,
                               transform=ccrs.PlateCarree())
    
        cbar = plt.colorbar(cs, fraction=0.046, pad=0.05, orientation='vertical', shrink=0.75)
        cbar.set_label(' $\mu g \cdot m^{-3}$')
        axs[i].set_title(plot_title + day)
        axs[i].coastlines(color='black', alpha=0.7, resolution='50m') 
        axs[i].gridlines(draw_labels=False, linestyle='--')
        axs[i].margins(0.5)
plot_forecasts(dust_daily_mean, 4, 'Dust daily max concentration forecast ')

Plot time series for given latitude and longitude#

Convert longitude to [-180, 180] grid#

Notice that the longitude variables in the Xarray Dataset and Data Array objects are in the range of [0, 359.75]. By default, ECMWF data are on a [0, 360] grid. Should you wish to, there are two options to bring the longitude coordinates to a [-180, 180] grid. The first option, in case you already have the data downloaded, is to assign values to coordinates with the xarray function assign_coords(). The code below shifts your longitude coordinates from [0, 359.75] to [-180, 179.75].

The second option is to specify the area keyword argument right when you request data with the CDS API. The area keyword then automatically reprojects the requested data onto a [-180, 180] grid.

ds_180 = ds_dt.assign_coords(longitude=(((ds_dt.longitude + 180) % 360) - 180)).sortby('longitude')
da = ds_180['dust']
da

Select location#

# CUT-TEPAK Aeronet site
site_lat = 34.67
site_lon = 33.04
site_da = da.sel(latitude = site_lat, longitude = site_lon, method='nearest', level=0.0)
site_da

Plot time series#

fig = plt.figure(figsize=(10, 5))
site_da.plot(marker='o')
plt.suptitle("Dust concentration forecast 96 hours- site")
plt.grid(True)

Suggested further steps#

How do CAMS regional ensemble analyses compare with the forecasts?

The example is for surface dust concentrations - how does the dust distribution change with altitude? Are the flow patterns, or dust source, the same at different altitudes?

How does surface dust concentrations compare with dust AOD from the CAMS global system?