
Tutorial on July 2023 record-breaking global surface temperatures using climate data from C3S#

This notebook can be run on free online platforms, such as Binder, Kaggle and Colab, or it can be accessed from GitHub. The links to run this notebook in these environments are provided here, but please note they are not supported by ECMWF.


Learning Objectives#

In this tutorial we will access data from the Climate Data Store (CDS) of the Copernicus Climate Change Service (C3S) and analyse air and sea surface temperatures, comparing the record-breaking July 2023 values with climatologies. We will:

  • Calculate a global surface temperature climatology

  • Compute and visualise anomalies with respect to the climatology

  • View time series and rank global surface temperature records

  • Analyse North Atlantic sea surface temperature trends


Prepare your environment#

Set up the CDSAPI and your credentials#

The code below will ensure that the cdsapi package is installed. If you have not set up your ~/.cdsapirc file with your credentials, you can replace None with the credentials that can be found on the how to api page (you will need to log in to see your credentials).

!pip install -q cdsapi
# If you have already setup your .cdsapirc file you can leave this as None
cdsapi_key = None
cdsapi_url = None

(Install and) Import libraries#

We will be working with data in NetCDF format. To best handle this data we will use libraries for working with multidimensional arrays, in particular Xarray. We will also need libraries for plotting and viewing data, in this case we will use Matplotlib and Cartopy.

import urllib3
urllib3.disable_warnings()

import numpy as np
import xarray as xr
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cdsapi

plt.style.use('bmh')

Specify data directory#

import os

# Directory to store data
DATADIR = './data_dir/'
# Create this directory if it doesn't exist
os.makedirs(DATADIR, exist_ok=True)

Explore Data#

To search for data, visit the CDS website: https://cds.climate.copernicus.eu. Here you can search for ERA5 data using the search bar. The data we need for this tutorial is the ERA5 monthly averaged data on single levels from 1940 to present. ERA5 is the fifth generation of the ECMWF reanalysis. Reanalysis uses a state-of-the-art forecast model and data assimilation system to create a consistent “map without gaps” of observed and modelled climate variables over the past decades.

Search for the data#

Having selected the correct dataset, we now need to specify the product type, variable, and temporal and geographic coverage we are interested in. These can all be selected in the “Download data” tab. In this tab a form appears in which we will select the following parameters to download:

Parameters of data to download
  • Product type: Monthly averaged reanalysis

  • Variable: 2m temperature

  • Year: 1940 to present year

  • Month: July

  • Time: 00:00 (default)

  • Geographical area: Sub-region extraction, keeping the default values 'area': [90, -180, -90, 180], # maxlat, minlon, minlat, maxlon (this will be explained further below)

  • Format: NetCDF

At the end of the download form, select “Show API request”. This will reveal a block of code which you can simply copy and paste into a cell of your Jupyter Notebook (see the cell below).


:warning: Please remember to accept the terms and conditions of the dataset, at the bottom of the CDS download form!

Download the Data#

With the API request copied into the cell below, running this cell will retrieve and download the data you requested into your local directory.

c = cdsapi.Client(url=cdsapi_url, key=cdsapi_key)
c.retrieve(
    'reanalysis-era5-single-levels-monthly-means',
    {
        'data_format': 'netcdf_legacy',
        'product_type': 'monthly_averaged_reanalysis',
        'variable': '2m_temperature',
        # All available Julys, from 1940 up to and including 2023
        'year': [str(year) for year in range(1940, 2024)],
        'month': '07',
        'time': '00:00',
        'area': [90, -180, -90, 180], # maxlat, minlon, minlat, maxlon
    },
    f'{DATADIR}t2m_global_july_1940-2023.nc'
)
2024-06-16 21:46:19,158 INFO Welcome to the CDS
2024-06-16 21:46:19,158 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/reanalysis-era5-single-levels-monthly-means
2024-06-16 21:46:19,282 INFO Request is queued
2024-06-16 21:46:20,347 INFO Request is running
2024-06-16 21:46:40,413 INFO Request is completed
2024-06-16 21:46:40,413 INFO Downloading https://download-0000-clone.copernicus-climate.eu/cache-compute-0000/cache/data5/adaptor.mars.internal-1718567197.247748-30483-9-b46e2bfe-4461-43da-a87b-bd7e7a603f2a.nc to ./DATA-2024-06-16/t2m_global_july_1940-2023.nc (166.4M)
2024-06-16 21:47:36,309 INFO Download rate 3M/s                                                                        
Result(content_length=174434368,content_type=application/x-netcdf,location=https://download-0000-clone.copernicus-climate.eu/cache-compute-0000/cache/data5/adaptor.mars.internal-1718567197.247748-30483-9-b46e2bfe-4461-43da-a87b-bd7e7a603f2a.nc)

Inspect data#

Now that we have downloaded the data, we can inspect it. We have requested the data in NetCDF format. This is a commonly used format for array-oriented scientific data. To read and process this data we will make use of the Xarray library. Xarray is an open source project and Python package that makes working with labelled multi-dimensional arrays simple and efficient. We will read the data from our NetCDF file into an xarray.Dataset.

ds = xr.open_dataset(f'{DATADIR}t2m_global_july_1940-2023.nc')

Now we can query our newly created Xarray dataset… Let’s have a look at ds.

ds
<xarray.Dataset>
Dimensions:    (longitude: 1440, latitude: 721, time: 84)
Coordinates:
  * longitude  (longitude) float32 -180.0 -179.8 -179.5 ... 179.2 179.5 179.8
  * latitude   (latitude) float32 90.0 89.75 89.5 89.25 ... -89.5 -89.75 -90.0
  * time       (time) datetime64[ns] 1940-07-01 1941-07-01 ... 2023-07-01
Data variables:
    t2m        (time, latitude, longitude) float32 ...
Attributes:
    Conventions:  CF-1.6
    history:      2024-06-16 19:46:38 GMT by grib_to_netcdf-2.28.1: /opt/ecmw...

We see that the dataset has one variable called t2m, which stands for “2 metre temperature”, and three coordinates of longitude, latitude and time.

Earlier we chose the sub-region extraction option but requested the whole globe in the CDSAPI request, 'area': [90, -180, -90, 180], # maxlat, minlon, minlat, maxlon. If we had instead chosen the whole available region option, the longitudes in the Xarray Dataset would be on a [0, 360] grid instead of [-180, 180], and we would need to correct this with some extra steps.

If your longitudes are on a [0, 360] grid you could do the following to bring the longitude coordinates to a [-180, 180] grid:

ds_180 = ds.assign_coords(
    longitude=(((ds.longitude + 180) % 360) - 180)
).sortby('longitude')
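You can verify this conversion on a small synthetic grid (the four longitude values below are made up for illustration):

```python
import numpy as np
import xarray as xr

# Synthetic dataset on a [0, 360] longitude grid
ds360 = xr.Dataset(coords={'longitude': np.array([0.0, 90.0, 180.0, 270.0])})

# Same conversion as above: wrap to [-180, 180] and re-sort
ds180 = ds360.assign_coords(
    longitude=(((ds360.longitude + 180) % 360) - 180)
).sortby('longitude')

print(ds180.longitude.values)  # [-180.  -90.    0.   90.]
```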

Depending on when you download the data, there may also be an expver coordinate. More on this later.

Select the icons to the right of the table above to expand the attributes of the coordinates and data variables. What are the units of the temperature data?
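You can also answer this programmatically: a variable's units are stored in its attrs dictionary. The sketch below uses a minimal synthetic Dataset (named ds_demo here, so it runs without the downloaded file); on the real dataset the same lookup would be ds['t2m'].attrs['units']:

```python
import numpy as np
import xarray as xr

# Minimal stand-in for the real dataset: one variable with CF-style attributes
ds_demo = xr.Dataset(
    {'t2m': xr.DataArray(np.zeros((2, 2)), dims=('latitude', 'longitude'),
                         attrs={'units': 'K', 'long_name': '2 metre temperature'})}
)

# The units live in the variable's attrs dictionary
print(ds_demo['t2m'].attrs['units'])  # K
```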

While an Xarray dataset may contain multiple variables, an Xarray data array holds a single multi-dimensional variable and its coordinates. To make the processing of the t2m data easier, we convert it into an Xarray data array. We will call it da_tmp (a temporary data array) because we will transform the data in some ways.

da_tmp = ds['t2m']

Let’s view this data:

da_tmp
<xarray.DataArray 't2m' (time: 84, latitude: 721, longitude: 1440)>
[87212160 values with dtype=float32]
Coordinates:
  * longitude  (longitude) float32 -180.0 -179.8 -179.5 ... 179.2 179.5 179.8
  * latitude   (latitude) float32 90.0 89.75 89.5 89.25 ... -89.5 -89.75 -90.0
  * time       (time) datetime64[ns] 1940-07-01 1941-07-01 ... 2023-07-01
Attributes:
    units:      K
    long_name:  2 metre temperature

From the result of the cell above you can see that we now have an xarray.DataArray.

Merge the two ERA5 experiments (1 and 5, expver = [1,5])#

The hourly data from ERA5 are made available in the Climate Data Store with a delay of about 5 days behind real-time. These data are only preliminary, also called initial release. If a serious problem is detected by the ERA5 team, ERA5 may be rerun. This happens extremely rarely. This preliminary dataset is called ERA5T (‘T’ for temporary) and the corresponding ERA5 experiment version (or expver) is expver=5. The final ERA5 data are released in the CDS with a 3-month delay. They correspond to the ERA5 experiment version expver=1 and overwrite the data from expver=5. The same distinction also applies to the monthly means. See ERA5 documentation and the announcement of release of ERA5T for more information.

When we use near real time ERA5 data from the CDS, the requests may return a mix of ERA5 and ERA5T data. In order to differentiate between the two, a new coordinate, expver, is added to the NetCDF file. If we work only with ERA5 or only with ERA5T, not a mix of both, then the expver coordinate will not be present. To take this into account we can add code that checks for it:

if 'expver' in da_tmp.coords:
    da_tmp = da_tmp.sum(dim='expver')

Let’s check the da_tmp data array again. If there was an expver coordinate, we reduced that dimension by performing a nansum operation, i.e. a sum of the array elements over that axis, treating Not a Number values (NaNs) as zero. The result is a new xarray.DataArray with the data merged along the expver dimension:

da_tmp
<xarray.DataArray 't2m' (time: 84, latitude: 721, longitude: 1440)>
[87212160 values with dtype=float32]
Coordinates:
  * longitude  (longitude) float32 -180.0 -179.8 -179.5 ... 179.2 179.5 179.8
  * latitude   (latitude) float32 90.0 89.75 89.5 89.25 ... -89.5 -89.75 -90.0
  * time       (time) datetime64[ns] 1940-07-01 1941-07-01 ... 2023-07-01
Attributes:
    units:      K
    long_name:  2 metre temperature

Now the data array contains the three expected dimensions: time, latitude and longitude.
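Summing over expver only works because, for any given time step, exactly one of the two experiments holds data and the other holds NaNs. The sketch below demonstrates this on a small synthetic array (the temperature values and the merged_cf name are made up for illustration), and also shows an equivalent, more explicit merge using combine_first, which prefers the final ERA5 values and falls back to ERA5T:

```python
import numpy as np
import xarray as xr

# Synthetic example: 3 time steps; expver=1 (final ERA5) covers the first two,
# expver=5 (preliminary ERA5T) covers the most recent one
t = np.array([[280.0, 281.0, np.nan],   # expver=1
              [np.nan, np.nan, 282.0]]) # expver=5
da = xr.DataArray(t, dims=('expver', 'time'),
                  coords={'expver': [1, 5], 'time': [0, 1, 2]})

# Approach used in this tutorial: NaN-aware sum over expver
merged_sum = da.sum(dim='expver')

# Equivalent explicit merge: take expver=1 where present, else expver=5
merged_cf = da.sel(expver=1, drop=True).combine_first(da.sel(expver=5, drop=True))

print(merged_sum.values)  # [280. 281. 282.]
print(merged_cf.values)   # [280. 281. 282.]
```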

Change temperature units from Kelvin to Celsius#

Notice that the ERA5 temperature data are in units of Kelvin, the base unit for temperature in the International System of Units (SI). If you want to convert the values from Kelvin to degrees Celsius, you have to subtract 273.15.

# Subtract 273.15 to convert to Celsius
da_celsius = da_tmp - 273.15
# Update the attributes of the DataArray to add Celsius units
da_celsius = da_celsius.assign_attrs(da_tmp.attrs)
da_celsius.attrs['units'] = '°C'
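The reason the attributes are copied back above is that arithmetic in xarray drops them by default. An alternative is to ask xarray to keep them via the keep_attrs option, here used as a context manager on a small made-up array (da_k and da_c are illustrative names):

```python
import numpy as np
import xarray as xr

# A tiny temperature array in Kelvin with CF-style attributes
da_k = xr.DataArray(np.array([273.15, 300.0]), dims='x',
                    attrs={'units': 'K', 'long_name': '2 metre temperature'})

# Within this context, arithmetic preserves attrs instead of dropping them
with xr.set_options(keep_attrs=True):
    da_c = da_k - 273.15

# The values changed units, so that one attribute still needs updating
da_c.attrs['units'] = '°C'
print(da_c.attrs['long_name'])  # 2 metre temperature
```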

Data to be used#

The da_celsius data array will be used in the rest of the surface temperature exercise. Let’s check what we have:

da_celsius
<xarray.DataArray 't2m' (time: 84, latitude: 721, longitude: 1440)>
array([[[  0.85357666,   0.85357666,   0.85357666, ...,   0.85357666,
           0.85357666,   0.85357666],
        [  0.8590088 ,   0.8590088 ,   0.8590088 , ...,   0.8590088 ,
           0.8590088 ,   0.8590088 ],
        [  0.85357666,   0.85357666,   0.85357666, ...,   0.85357666,
           0.85357666,   0.85357666],
        ...,
        [-55.322693  , -55.320877  , -55.320877  , ..., -55.326324  ,
         -55.326324  , -55.324493  ],
        [-55.233887  , -55.233887  , -55.23207   , ..., -55.23752   ,
         -55.23752   , -55.233887  ],
        [-54.980164  , -54.980164  , -54.980164  , ..., -54.980164  ,
         -54.980164  , -54.980164  ]],

       [[  0.89889526,   0.89889526,   0.89889526, ...,   0.89889526,
           0.89889526,   0.89889526],
        [  0.8952637 ,   0.8952637 ,   0.8952637 , ...,   0.8952637 ,
           0.8952637 ,   0.8952637 ],
        [  0.8807678 ,   0.8807678 ,   0.8807678 , ...,   0.8825989 ,
           0.8807678 ,   0.8807678 ],
...
        [-56.488007  , -56.482574  , -56.478943  , ..., -56.498886  ,
         -56.495255  , -56.49164   ],
        [-56.727234  , -56.725418  , -56.723602  , ..., -56.732666  ,
         -56.730865  , -56.72905   ],
        [-56.506134  , -56.506134  , -56.506134  , ..., -56.506134  ,
         -56.506134  , -56.506134  ]],

       [[  0.73760986,   0.73760986,   0.73760986, ...,   0.73760986,
           0.73760986,   0.73760986],
        [  0.7321472 ,   0.7321472 ,   0.7321472 , ...,   0.7321472 ,
           0.7321472 ,   0.7321472 ],
        [  0.72128296,   0.72128296,   0.72128296, ...,   0.72128296,
           0.72128296,   0.72128296],
        ...,
        [-53.12979   , -53.12616   , -53.122543  , ..., -53.14067   ,
         -53.135223  , -53.131607  ],
        [-53.238525  , -53.236725  , -53.23491   , ..., -53.243973  ,
         -53.243973  , -53.242157  ],
        [-53.29471   , -53.29471   , -53.29471   , ..., -53.29471   ,
         -53.29471   , -53.29471   ]]], dtype=float32)
Coordinates:
  * longitude  (longitude) float32 -180.0 -179.8 -179.5 ... 179.2 179.5 179.8
  * latitude   (latitude) float32 90.0 89.75 89.5 89.25 ... -89.5 -89.75 -90.0
  * time       (time) datetime64[ns] 1940-07-01 1941-07-01 ... 2023-07-01
Attributes:
    units:      °C
    long_name:  2 metre temperature

Now we can see the updated values in Celsius and the units attribute updated accordingly.

Plotting one timestep#

Just to check what we have so far, let’s plot a map of 2m temperature for the first (July 1940) and the last (July 2023) timesteps. We will plot these maps using the convenience method plot available for xarray.DataArray. This allows the creation of simple plots using one line of code. Also, with the xarray method sel(), you can select a data array based on coordinate labels.

da_celsius.sel(time='1940').plot()
[Figure: map of 2m temperature for July 1940]
da_celsius.sel(time='2023').plot()
[Figure: map of 2m temperature for July 2023]

Calculate a surface temperature climatology: reference period 1991-2020#

Standard reference periods and climatologies#

Anthropogenic activities and natural variations from years to decades shape the Earth’s climate. In order to evaluate anomalous conditions for a specific month or year, the World Meteorological Organization (WMO) defines standard reference periods used to create climatologies, also known as climate normals. Climatologies can be considered as the typical climate for the period they are based on.

Until 2020, the most current and widely used standard reference period was the 30-year range of 1981-2010. With the start of 2021, the WMO recommended updating the climate normal reference period to the range 1991-2020.

First, let us calculate the temperature climatology for July during the reference period 1991-2020. For this, we will select the reference period and then average along the time dimension. The resulting object contains the average July mean surface air temperature on each grid point.

t2m_ref_per = da_celsius.sel(time=slice('1991-01-01', '2020-12-31')).mean(dim='time')

If we have a look at this data object, we will see that it now has only two coordinates, latitude and longitude.

t2m_ref_per
<xarray.DataArray 't2m' (latitude: 721, longitude: 1440)>
array([[  0.74816996,   0.74816996,   0.74816996, ...,   0.74816996,
          0.74816996,   0.74816996],
       [  0.7403127 ,   0.74037373,   0.7404948 , ...,   0.73977053,
          0.74001056,   0.74019164],
       [  0.72756857,   0.7276896 ,   0.7279307 , ...,   0.726238  ,
          0.7267222 ,   0.7272064 ],
       ...,
       [-55.41742   , -55.414043  , -55.41071   , ..., -55.427814  ,
        -55.42436   , -55.420856  ],
       [-55.408604  , -55.40709   , -55.40564   , ..., -55.413795  ,
        -55.41198   , -55.410347  ],
       [-55.04577   , -55.04577   , -55.04577   , ..., -55.04577   ,
        -55.04577   , -55.04577   ]], dtype=float32)
Coordinates:
  * longitude  (longitude) float32 -180.0 -179.8 -179.5 ... 179.2 179.5 179.8
  * latitude   (latitude) float32 90.0 89.75 89.5 89.25 ... -89.5 -89.75 -90.0

We can also make a quick plot to have an exploratory view of this new xarray.DataArray:

t2m_ref_per.plot()
[Figure: map of the July 1991-2020 mean 2m temperature climatology]

Visualise surface temperature anomalies#

The next step is now to calculate the anomaly for July 2023 with respect to the climatology (1991-2020). The term anomaly refers to the deviation of a value from the long-term average. Positive or negative anomalies indicate that the average temperatures of a particular month were respectively warmer or cooler than the reference value for the same month.

Let us calculate the temperature anomaly for the year 2023. In a first step, we select the average near-surface temperature values for July 2023 from the xarray.DataArray object da_celsius. As mentioned before, with the xarray method sel() you can select a data array based on coordinate labels; the coordinate label of interest here is time='2023'. Since we are not performing an aggregation operation over the time dimension (such as a mean), the resulting data array retains a time dimension of length one. To remove this dimension we can use the squeeze method.

t2m_july2023 = da_celsius.sel(time='2023')
t2m_july2023
<xarray.DataArray 't2m' (time: 1, latitude: 721, longitude: 1440)>
array([[[  0.73760986,   0.73760986,   0.73760986, ...,   0.73760986,
           0.73760986,   0.73760986],
        [  0.7321472 ,   0.7321472 ,   0.7321472 , ...,   0.7321472 ,
           0.7321472 ,   0.7321472 ],
        [  0.72128296,   0.72128296,   0.72128296, ...,   0.72128296,
           0.72128296,   0.72128296],
        ...,
        [-53.12979   , -53.12616   , -53.122543  , ..., -53.14067   ,
         -53.135223  , -53.131607  ],
        [-53.238525  , -53.236725  , -53.23491   , ..., -53.243973  ,
         -53.243973  , -53.242157  ],
        [-53.29471   , -53.29471   , -53.29471   , ..., -53.29471   ,
         -53.29471   , -53.29471   ]]], dtype=float32)
Coordinates:
  * longitude  (longitude) float32 -180.0 -179.8 -179.5 ... 179.2 179.5 179.8
  * latitude   (latitude) float32 90.0 89.75 89.5 89.25 ... -89.5 -89.75 -90.0
  * time       (time) datetime64[ns] 2023-07-01
Attributes:
    units:      °C
    long_name:  2 metre temperature
t2m_july2023 = t2m_july2023.squeeze('time')
t2m_july2023
<xarray.DataArray 't2m' (latitude: 721, longitude: 1440)>
array([[  0.73760986,   0.73760986,   0.73760986, ...,   0.73760986,
          0.73760986,   0.73760986],
       [  0.7321472 ,   0.7321472 ,   0.7321472 , ...,   0.7321472 ,
          0.7321472 ,   0.7321472 ],
       [  0.72128296,   0.72128296,   0.72128296, ...,   0.72128296,
          0.72128296,   0.72128296],
       ...,
       [-53.12979   , -53.12616   , -53.122543  , ..., -53.14067   ,
        -53.135223  , -53.131607  ],
       [-53.238525  , -53.236725  , -53.23491   , ..., -53.243973  ,
        -53.243973  , -53.242157  ],
       [-53.29471   , -53.29471   , -53.29471   , ..., -53.29471   ,
        -53.29471   , -53.29471   ]], dtype=float32)
Coordinates:
  * longitude  (longitude) float32 -180.0 -179.8 -179.5 ... 179.2 179.5 179.8
  * latitude   (latitude) float32 90.0 89.75 89.5 89.25 ... -89.5 -89.75 -90.0
    time       datetime64[ns] 2023-07-01
Attributes:
    units:      °C
    long_name:  2 metre temperature

The anomaly will be the difference between t2m_july2023 and t2m_ref_per. A positive value means July 2023 is above the expected mean:

anom = t2m_july2023 - t2m_ref_per

The previous operation results in the anomaly on each longitude and latitude location stored in the anom data array. We can plot this in a map to check where the anomaly was positive (July 2023 warmer than the climatology) or negative (July 2023 colder than the climatology). This time we will create the plot using the matplotlib and cartopy libraries.

The code below is briefly commented in the code cell. The key part is plt.pcolormesh, which is used to plot the 2D data array on each (longitude, latitude) grid cell. The colours are defined using the RdBu_r colormap over the range -12 to 12 (°C). This is the diverging RdBu colormap, which goes from red to blue, applied in reverse order, which is why it has the _r suffix. So, in this specific case, -12 °C is shown in dark blue, 12 °C in dark red and 0, equal to the climate normal, in white.

# create the figure panel and the map using the Cartopy PlateCarree projection
fig, ax = plt.subplots(1, 1, figsize = (16, 8), subplot_kw={'projection': ccrs.PlateCarree()})

# Plot the data
im = plt.pcolormesh(
    anom.longitude, 
    anom.latitude, 
    anom, 
    cmap='RdBu_r',
    vmin=-12, 
    vmax=12
) 

# Set the figure title, add lat/lon grid and coastlines
ax.set_title('Temperature anomaly for July 2023 (with respect to 1991-2020 July mean)', fontsize=16)
ax.gridlines(
    draw_labels=True, 
    linewidth=1, 
    color='gray', 
    alpha=0.5, 
    linestyle='--'
) 
ax.coastlines(color='black')

# Specify the colorbar and set a label for the colorbar
cbar = plt.colorbar(im, fraction=0.05, pad=0.04)
cbar.set_label('Temperature anomaly') 

# Show or save the figure, uncomment the line/s in case of need
#fig.show() # not needed in a notebook inline context
#fig.savefig('near_sfc_t2m_anomaly_july2023.png') # not needed in a notebook inline context
[Figure: global map of the temperature anomaly for July 2023 with respect to the 1991-2020 July mean]