logo


Analysis of September 2020 European Heatwave using ERA5 Climate Reanalysis Data from C3S#

This notebook can be run on free online platforms, such as Binder, Kaggle and Colab, or they can be accessed from GitHub. The links to run this notebook in these environments are provided here, but please note they are not supported by ECMWF.

binder kaggle colab github

logo

Learning Objectives#

In September 2020, a record-breaking heatwave occured in large parts of western Europe, (see a description here). The city of Lille in northern France for example experienced its hottest day in September 2020 since records began in 1945. In this tutorial we will analyse this event with data from the Climate Data Store (CDS) of the Copernicus Climate Change Service (C3S).

In this tutorial, we will:

  • View daily maximum 2m temperature for September 2020

  • Compare maximum temperatures with climatology


Prepare your environment#

Set up the CDSAPI and your credentials#

The code below will ensure that the cdsapi package is installed. If you have not setup your ~/.cdsapirc file with your credenials, you can replace None with your credentials that can be found on the how to api page (you will need to log in to see your credentials).

!pip install -q cdsapi
# If you have already setup your .cdsapirc file you can leave this as None
cdsapi_key = None
cdsapi_url = None
Requirement already satisfied: cdsapi in c:\users\cxcs\anaconda3\lib\site-packages (0.3.1)
Requirement already satisfied: tqdm in c:\users\cxcs\anaconda3\lib\site-packages (from cdsapi) (4.47.0)
Requirement already satisfied: requests>=2.5.0 in c:\users\cxcs\anaconda3\lib\site-packages (from cdsapi) (2.24.0)
Requirement already satisfied: idna<3,>=2.5 in c:\users\cxcs\anaconda3\lib\site-packages (from requests>=2.5.0->cdsapi) (2.10)
Requirement already satisfied: chardet<4,>=3.0.2 in c:\users\cxcs\anaconda3\lib\site-packages (from requests>=2.5.0->cdsapi) (3.0.4)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in c:\users\cxcs\anaconda3\lib\site-packages (from requests>=2.5.0->cdsapi) (1.25.9)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\cxcs\anaconda3\lib\site-packages (from requests>=2.5.0->cdsapi) (2020.6.20)

(Install and) Import libraries#

We will be working with data in NetCDF format. To best handle this data we will use libraries for working with multidimensional arrays, in particular Xarray. We will also need libraries for plotting and viewing data, in this case we will use Matplotlib and Cartopy.

# CDS API
import cdsapi

# Libraries for working with multidimensional arrays
import numpy as np
import xarray as xr

# Libraries for plotting and visualising data
import matplotlib.path as mpath
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
from cartopy.mpl.gridliner import LONGITUDE_FORMATTER, LATITUDE_FORMATTER
import cartopy.feature as cfeature

# Disable warnings for data download via API
import urllib3 
urllib3.disable_warnings()

Specify data directory#

# Directory to store data
DATADIR = './data_dir/'
# Create this directory if it doesn't exist
os.makedirs(DATADIR, exist_ok=True)

Explore Data#

To search for data, visit the CDS website: https://cds.climate.copernicus.eu. To facilitate your search you can use keywords, or apply various filters. The data we are going to use in this exercise is the ERA5 reanalysis data on single levels from 1979 to present.

Search for the data#

Having selected the correct dataset, we now need to specify what product type, variables, temporal and geographic coverage we are interested in. These can all be selected in the “Download data” tab. In this tab a form appears in which we will select the following parameters to download. We will choose a subset area of 1x1 degrees, corresponding to a region of around 111km North/South and 72km East/West in Belgium and Northern France, around the city of Lille:

Parameters of data to download - Product type: `Reanalysis` - Variable: `2m temperature` - Year: `all` - Month: `September` - Day: `all` - Time: `all` - Geographical area: `North: 51`, `East: 4`, `South: 50`, `West: 3` - Format: `NetCDF`

logo

At the end of the download forms, select “Show API request”. This will reveal a block of code, which you can simply copy and paste into a cell of your Jupyter Notebook (see cell below). Having copied the API request into the cell below, running this will retrieve and download the data you requested into your local directory.

Please note that text below will appear in a warning box when rendered on the JupyterBook html pages

:warning: Please remember to accept the terms and conditions of the dataset, at the bottom of the CDS download form!

Download the Data#

With the API request copied into the cell below, running this cell will retrieve and download the data you requested into your local directory.

c = cdsapi.Client()
c.retrieve(
    'reanalysis-era5-single-levels',
    {
        'product_type': 'reanalysis',
        'data_format': 'netcdf_legacy',
        'variable': '2m_temperature',
        'year': [
            '1979', '1980', '1981',
            '1982', '1983', '1984',
            '1985', '1986', '1987',
            '1988', '1989', '1990',
            '1991', '1992', '1993',
            '1994', '1995', '1996',
            '1997', '1998', '1999',
            '2000', '2001', '2002',
            '2003', '2004', '2005',
            '2006', '2007', '2008',
            '2009', '2010', '2011',
            '2012', '2013', '2014',
            '2015', '2016', '2017',
            '2018', '2019', '2020',
        ],
        'month': '09',
        'day': [
            '01', '02', '03',
            '04', '05', '06',
            '07', '08', '09',
            '10', '11', '12',
            '13', '14', '15',
            '16', '17', '18',
            '19', '20', '21',
            '22', '23', '24',
            '25', '26', '27',
            '28', '29', '30',
        ],
        'time': [
            '00:00', '01:00', '02:00',
            '03:00', '04:00', '05:00',
            '06:00', '07:00', '08:00',
            '09:00', '10:00', '11:00',
            '12:00', '13:00', '14:00',
            '15:00', '16:00', '17:00',
            '18:00', '19:00', '20:00',
            '21:00', '22:00', '23:00',
        ],
        'area': [
            51, 3, 50,
            4,
        ],
    },
    f'{DATADIR}NFrance_hourly_Sep.nc')
2022-02-03 08:30:09,321 INFO Welcome to the CDS
2022-02-03 08:30:09,326 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/reanalysis-era5-single-levels
2022-02-03 08:30:09,507 INFO Request is queued
2022-02-03 08:30:10,564 INFO Request is running
2022-02-03 09:10:26,261 INFO Request is completed
2022-02-03 09:10:26,263 INFO Downloading https://download-0009.copernicus-climate.eu/cache-compute-0009/cache/data0/adaptor.mars.internal-1643879144.3714578-9573-17-990f3656-2699-457b-9cf4-ff7a0baba8af.nc to ./records/NFrance_hourly_Sep.nc (1.6M)
2022-02-03 09:10:26,719 INFO Download rate 3.4M/s                                                                      
Result(content_length=1634068,content_type=application/x-netcdf,location=https://download-0009.copernicus-climate.eu/cache-compute-0009/cache/data0/adaptor.mars.internal-1643879144.3714578-9573-17-990f3656-2699-457b-9cf4-ff7a0baba8af.nc)

Inspect Data#

We have requested the data in NetCDF format. This is a commonly used format for array-oriented scientific data. To read and process this data we will make use of the Xarray library. Xarray is an open source project and Python package that makes working with labelled multi-dimensional arrays simple and efficient. We will read the data from our NetCDF file into an xarray.Dataset.

filename = f'{DATADIR}NFrance_hourly_Sep.nc'
# Create Xarray Dataset
ds = xr.open_dataset(filename)

Now we can query our newly created Xarray dataset …

ds
<xarray.Dataset>
Dimensions:    (longitude: 5, latitude: 5, time: 30240)
Coordinates:
  * longitude  (longitude) float32 3.0 3.25 3.5 3.75 4.0
  * latitude   (latitude) float32 51.0 50.75 50.5 50.25 50.0
  * time       (time) datetime64[ns] 1979-09-01 ... 2020-09-30T23:00:00
Data variables:
    t2m        (time, latitude, longitude) float32 ...
Attributes:
    Conventions:  CF-1.6
    history:      2022-02-03 09:10:11 GMT by grib_to_netcdf-2.23.0: /opt/ecmw...

We see that the dataset has one variable called “t2m”, which stands for “2 metre temperature”, and three coordinates of longitude, latitude and time.

Select the icons to the right of the table above to expand the attributes of the coordinates and data variables. What are the units of the temperature data?

While an Xarray dataset may contain multiple variables, an Xarray data array holds a single variable (which may still be multi-dimensional) and its coordinates. To make the processing of the t2m data easier, we convert in into an Xarray data array:

da = ds['t2m']

Let’s convert the units of the 2m temperature data from Kelvin to degrees Celsius. The formula for this is simple: degrees Celsius = Kelvin - 273.15

t2m_C = da - 273.15

View daily maximum 2m temperature for September 2020#

As a next step, let us visualize the daily maximum 2m air temperature for September 2020. From the graph, we should be able to identify which day in September was hottest in the area around Lille.

First we average over the subset area:

Note: The size covered by each data point varies as a function of latitude. We need to take this into account when averaging. One way to do this is to use the cosine of the latitude as a proxy for the varying sizes.

weights = np.cos(np.deg2rad(t2m_C.latitude))
weights.name = "weights"
t2m_C_weighted = t2m_C.weighted(weights)
Lille_t2m = t2m_C_weighted.mean(["longitude", "latitude"])

Now we select only the data for 2020:

Lille_2020 = Lille_t2m.sel(time='2020')

We can now calculate the max daily 2m temperature for each day in September 2020:

Lille_2020_max = Lille_2020.groupby('time.day').max('time')

Let’s plot the results in a chart:

fig, ax = plt.subplots(1, 1, figsize = (12, 6))

ax.plot(Lille_2020_max.day, Lille_2020_max)
ax.set_title('Max daily t2m for Sep 2020 in Lille region')
ax.set_ylabel('° C')
ax.set_xlabel('day')
ax.grid(linestyle='--')
for i,j in zip(Lille_2020_max.day, np.around(Lille_2020_max.values, 0).astype(int)):
    ax.annotate(str(j),xy=(i,j))
../../_images/5adb026a96af8ee742d55d5f4cb7dc775fd26641e14d812d84668e9a1f63f80c.png
print('The maximum temperature in September 2020 in this area was', 
      np.around(Lille_2020_max.max().values, 1), 'degrees Celsius.')
The maximum temperature in September 2020 in this area was 32.1 degrees Celsius.

Which day in September had the highest maximum temperature?

Is this typical for Northern France? How does this compare with the long term average? We will seek to answer these questions in the next section.


Compare maximum temperatures with climatology#

We will now seek to discover just how high the temperature for Lille in mid September 2020 was when compared with typical values exptected in this region at this time of year. To do that we will calculate the climatology of maximum daily 2m temperature for each day in September for the period of 1979 to 2019, and compare these with our values for 2020.

First we select all data prior to 2020:

Lille_past = Lille_t2m.loc['1979':'2019']

Now we calculate the climatology for this data, i.e. the average values for each of the days in September for a period of several decades (from 1979 to 2019).

To do this, we first have to extract the maximum daily value for each day in the time series:

Lille_max = Lille_past.resample(time='D').max().dropna('time')

We will then calculate various quantiles of the maximum daily 2m temperatures for the 40 year time series for each day in September:

Lille_max_max = Lille_max.groupby('time.day').max()
Lille_max_min = Lille_max.groupby('time.day').min()
Lille_max_mid = Lille_max.groupby('time.day').quantile(0.5)

Let’s plot this data. We will plot the, maximum, minimum and 50th quantile of the maximum daily temperature to have an idea of the expected range in this part of France in September, and compare this range with the values for 2020:

fig = plt.figure(figsize=(16,8))
ax = plt.subplot()

ax.plot(Lille_2020_max.day, Lille_max_mid, color='green', label='Daily max t2m 50th quantile')
ax.plot(Lille_2020_max.day, Lille_2020_max, 'bo-', color='darkred', label='Daily max t2m Sep 2020')
ax.fill_between(Lille_2020_max.day, Lille_max_max, Lille_max_min, alpha=0.1, 
                label='Max and min values of max t2m from 1979 to 2019')

ax.set_xlim(1,30)
ax.set_ylim(10,33)
ax.set_title('Daily max t2m for Sep 2020 compared with climatology for Sep from 1979 to 2019')
ax.set_ylabel('t2m (Celsius)')
ax.set_xlabel('day')
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles, labels)
ax.grid(linestyle='--')

fig.savefig(f'{DATADIR}Max_t2m_clim_Sep_Lille.png')
C:\Users\cxcs\AppData\Local\Temp/ipykernel_5360/21074062.py:5: UserWarning: color is redundantly defined by the 'color' keyword argument and the fmt string "bo-" (-> color='b'). The keyword argument will take precedence.
  ax.plot(Lille_2020_max.day, Lille_2020_max, 'bo-', color='darkred', label='Daily max t2m Sep 2020')
../../_images/6b14167f1193ad73c1594bbfa3859808ae288057cb28478453f0d6e6f2ef00f6.png

Interestingly, we see from this plot that while the temperatures from 14 to 16 Sep 2020 were the highest in the ERA5 dataset, on 25 September 2020, the lowest of the maximum temperatures was recorded for this dataset.

We will now look more closely at the probability distribution of maximum temperatures for 15 September in this time period. To do this, we will first select only the max daily temperature for 15 September, for each year in the time series:

Lille_max = Lille_max.dropna('time', how='all')
Lille_15 = Lille_max[14::30]

We will then plot the histogram of this:

Lille_15.plot.hist()
(array([ 2.,  3., 11.,  6.,  7.,  5.,  1.,  2.,  3.,  1.]),
 array([12.54313564, 13.95039225, 15.35764885, 16.76490545, 18.17216206,
        19.57941866, 20.98667526, 22.39393187, 23.80118847, 25.20844507,
        26.61570168]),
 <BarContainer object of 10 artists>)
../../_images/57b823d232114fcb4740c491fd5688ee2263526f06f161c3db442de003e40ca2.png

Look at the range of maximum temperatures for 15 September in the period from 1979 to 2019. Has the temperature in this period ever exceeded that of 15 September 2020?

The histogram shows the distribution of maximum temperature of one day in each year of the time series, which corresponds to 41 samples. In order to increase the number of samples, let’s plot the histogram of maximum temperatures on 15 September, plus or minus three days. This would increase our number of samples by a factor of seven.

To do this, we first need to produce an index that takes the maximum 2m air temperature values from 12 to 18 September (15 September +/- three days) from every year in the time series. The first step is to initiate three numpy arrays:

  • years: with the number of years [0:40]

  • days_in_sep: index values of day range [11:17]

  • index: empty numpy array with 287 (41 years * 7) entries

years = np.arange(41)
days_in_sep = np.arange(11,18)
index = np.zeros(287)

In a next step, we then loop through each entry of the years array and fill the empty index array year by year with the correct indices of the day ranges for each year. The resulting array contains the index values of interest.

for i in years:
    index[i*7:(i*7)+7] = days_in_sep + (i*30)
index = index.astype(int)

We then apply this index to filter the array of max daily temperature from 1979 to 2019. The resulting object is an array of values representing the maximum 2m air temperature in Lille between 12 and 18 September for each year from 1979 to 2019:

Lille_7days = Lille_max.values[index]

Now we can plot the histogram of maximum daily temperatures in the days 12-18 September from 1979-2019:

fig, ax = plt.subplots(1, 1, figsize = (12, 6))

ax.hist(Lille_7days, bins = np.arange(10,32,1), color='lightgrey', ec='grey')
ax.set_title('Histogram of maximum 2m temperature in the days from 12-18 Sep in the period 1979-2019')
ax.set_xticks(np.arange(10,32,1))
ax.set_ylabel('Accumulated days')
ax.set_xlabel('Maximum 2m temperature (° C)')

fig.savefig(f'{DATADIR}Hist_max_t2m_mid-Sep_1979-2019.png')
../../_images/8791f8bc0c3987422b463945d984ed65c691b820567b469cfe1a1ad66704d989.png

In the histogram above, you see that even if we take an increased sample covering a wider temporal range, the maximum daily temperature still never reached that of 15 September 2020. To increase the sample even further, you could include data from a longer time period. The C3S reanalysis dataset now extends back to 1940 and is accessible here ERA5 hourly data on single levels from 1940 to present.