A More Complicated Example: Calculating Climatological Averages

The previous example showed one of the most basic GeoCAT-comp functions. This example showcases how some functions in GeoCAT-comp have a variety of options that can be selected using keyword arguments. The particular function we’ll be looking at is climatology_average.

The datafile we’re using contains surface temperature data at one location generated by numerous weather models in a model ensemble. The temporal resolution is 6 hours from January 1st 1990 to January 1st 1996. We want to take the data from each model and summarize it. We can do this in a few different ways with climatology_average. We can find the daily, monthly, or seasonal climatological averages.

Dependencies

  • geocat-comp

  • geocat-datafiles

  • xarray

  • nc-time-axis

Imports

[1]:
from geocat.comp import climatology_average

import geocat.datafiles as gdf
import xarray as xr
import nc_time_axis
Downloading file 'registry.txt' from 'https://github.com/NCAR/GeoCAT-datafiles/raw/main/registry.txt' to '/home/docs/.cache/geocat'.

Retrieving sample data from GeoCAT-datafiles

GeoCAT-datafiles is a software package that makes it simple to download datafiles from the pacakge’s GitHub repository. The repository holds data files of different types such as NetCDF and GRIB. They are used for GeoCAT tutorials, demonstrations, and test cases. You won’t need to use GeoCAT-datafiles when working with your own data.

[2]:
nc_file = gdf.get('netcdf_files/atm.20C.hourly6-1990-1995-TS.nc')
nc_file
Downloading file 'netcdf_files/atm.20C.hourly6-1990-1995-TS.nc' from 'https://github.com/NCAR/GeoCAT-datafiles/raw/main/netcdf_files/atm.20C.hourly6-1990-1995-TS.nc' to '/home/docs/.cache/geocat'.
[2]:
'/home/docs/.cache/geocat/netcdf_files/atm.20C.hourly6-1990-1995-TS.nc'

Parsing the data with xarray

GeoCAT-comp is designed to work with `xarray <https://xarray.dev/>`__ and `numpy <https://numpy.org/>`__ data structures. Here we will be using xarray to import the netCDF data.

[3]:
ds = xr.open_dataset(nc_file)
ds
[3]:
<xarray.Dataset>
Dimensions:    (member_id: 40, time: 8761, nbnd: 2)
Coordinates:
    lat        float64 ...
    lon        float64 ...
  * member_id  (member_id) int64 1 2 3 4 5 6 7 8 ... 34 35 101 102 103 104 105
  * time       (time) object 1990-01-01 00:00:00 ... 1996-01-01 00:00:00
    time_bnds  (time, nbnd) object ...
Dimensions without coordinates: nbnd
Data variables:
    TS         (member_id, time) float32 ...
Attributes: (12/13)
    Conventions:               CF-1.0
    NCO:                       4.3.4
    Version:                   $Name$
    important_note:            This data is part of the project 'Blind Evalua...
    initial_file:              b.e11.B20TRC5CNBDRD.f09_g16.001.cam.i.1920-01-...
    logname:                   mudryk
    ...                        ...
    revision_Id:               $Id$
    source:                    CAM
    title:                     UNSET
    topography_file:           /scratch/p/pjk/mudryk/cesm1_1_2_LENS/inputdata...
    intake_esm_varname:        TS
    intake_esm_dataset_key:    atm.20C.hourly6-1990-2005

We can see here that the datafile has surface temperature data from 40 model ensemble members. Let’s simplify things and just look at one model for now. We use xarray’s DataArray.isel method to pick out the data for the first member_id at index 0.

[4]:
ds = ds.isel(member_id=0)  # select one model from the ensemble
temp = ds.TS  # surface temperature data
temp
[4]:
<xarray.DataArray 'TS' (time: 8761)>
[8761 values with dtype=float32]
Coordinates:
    lat        float64 ...
    lon        float64 ...
    member_id  int64 1
  * time       (time) object 1990-01-01 00:00:00 ... 1996-01-01 00:00:00
Attributes:
    long_name:  Surface temperature (radiative)
    units:      K

Now that we have the temperature data for the first ensemble member, let’s plot it. xarray has built in methods to quickly plot data. The generated plot isn’t publication quality, but it’s a quick and dirty way to see what your dataset contains.

[5]:
temp.plot()
[5]:
[<matplotlib.lines.Line2D at 0x7f04d29ffe80>]
../../_images/notebooks_geocat-comp_02-more_advanced_functionality_11_1.png

This plot shows that the temperature doesn’t change much throughout the year. This makes sense since the data is for the location 0.4712N 180E, which is in the Pacific Ocean just north of the equator. There is some variation though, and it’d be interesting to know what the average temperature is in each month. We can calculate this using climatology_average.

Using climatology_average

We can see in the documentation that climatology_average calculates long term hourly, daily, monthly, or seasonal averages across all years in a given dataset. This is a good function to quickly summarize our data, so let’s use it.

Adjusting Parameters

Monthly climate averages are great, but what else can climatology_average do? We can also calculate hourly, daily, and seasonal climate averages. For this dataset, we don’t have a fine enough temporal resolution to compute an hourly average, but we can compute daily averages. The only change to the function call is passing in day instead of month.

[8]:
daily_temp = climatology_average(temp, freq='day')
daily_temp
[8]:
<xarray.DataArray 'TS' (time: 365)>
array([299.90625, 299.9027 , 299.9038 , 299.89267, 299.89645, 299.90247,
       299.88416, 299.86523, 299.8602 , 299.84595, 299.8206 , 299.79782,
       299.7791 , 299.7557 , 299.72598, 299.70053, 299.66797, 299.6409 ,
       299.63803, 299.62735, 299.60635, 299.598  , 299.58813, 299.57175,
       299.55392, 299.5314 , 299.5034 , 299.48175, 299.47153, 299.47998,
       299.4739 , 299.4499 , 299.42944, 299.4248 , 299.43323, 299.44882,
       299.45566, 299.45822, 299.47192, 299.4838 , 299.48218, 299.4673 ,
       299.44363, 299.43234, 299.41302, 299.386  , 299.36398, 299.34192,
       299.32126, 299.32187, 299.33716, 299.321  , 299.29294, 299.27682,
       299.26843, 299.27237, 299.28223, 299.2999 , 299.31555, 299.32324,
       299.32336, 299.29977, 299.27838, 299.28983, 299.305  , 299.32205,
       299.34653, 299.3856 , 299.42453, 299.4302 , 299.41373, 299.40256,
       299.4077 , 299.4224 , 299.4483 , 299.48758, 299.52597, 299.54443,
       299.54855, 299.55234, 299.53928, 299.5068 , 299.4921 , 299.5035 ,
       299.52048, 299.54846, 299.5812 , 299.60837, 299.65668, 299.71072,
       299.75012, 299.77316, 299.77704, 299.77573, 299.76965, 299.76822,
       299.7783 , 299.79297, 299.81876, 299.8422 , 299.8571 , 299.8822 ,
       299.89114, 299.8904 , 299.88116, 299.87808, 299.89078, 299.90726,
       299.89944, 299.88193, 299.86453, 299.84775, 299.86725, 299.9086 ,
       299.92896, 299.94403, 299.97934, 300.02603, 300.08334, 300.14673,
...
       300.21384, 300.14963, 300.09665, 300.0553 , 300.02277, 299.99667,
       299.9679 , 299.9452 , 299.91772, 299.90115, 299.9091 , 299.92722,
       299.9534 , 299.9951 , 300.02258, 300.00687, 299.9744 , 299.9299 ,
       299.8682 , 299.8135 , 299.78452, 299.7746 , 299.7549 , 299.7144 ,
       299.68152, 299.6678 , 299.66446, 299.66776, 299.6966 , 299.7579 ,
       299.82526, 299.87814, 299.91672, 299.94186, 299.9529 , 299.973  ,
       300.01187, 300.03363, 300.0538 , 300.08664, 300.11258, 300.12372,
       300.13623, 300.14963, 300.1493 , 300.14633, 300.13452, 300.11118,
       300.09152, 300.06946, 300.05896, 300.04553, 300.0262 , 300.0137 ,
       300.00223, 300.00314, 300.02576, 300.0657 , 300.12503, 300.19614,
       300.25937, 300.2955 , 300.2975 , 300.2842 , 300.26498, 300.23477,
       300.20242, 300.202  , 300.23282, 300.25613, 300.2426 , 300.20947,
       300.17593, 300.1535 , 300.14764, 300.15042, 300.14536, 300.12527,
       300.1005 , 300.0703 , 300.0401 , 300.01175, 299.9951 , 299.99915,
       300.005  , 300.01852, 300.03638, 300.03754, 300.02664, 300.02902,
       300.04208, 300.05743, 300.0775 , 300.10675, 300.13696, 300.14713,
       300.14026, 300.12463, 300.10367, 300.08554, 300.05856, 300.02856,
       300.01175, 300.0094 , 299.98972, 299.941  , 299.89905, 299.87585,
       299.86655, 299.87582, 299.89938, 299.92636, 299.94885],
      dtype=float32)
Coordinates:
    lat        float64 ...
    lon        float64 ...
    member_id  int64 1
  * time       (time) object 1993-01-01 12:00:00 ... 1993-12-31 12:00:00
Attributes:
    long_name:  Surface temperature (radiative)
    units:      K
[9]:
daily_temp.plot()
[9]:
[<matplotlib.lines.Line2D at 0x7f04ca6ae020>]
../../_images/notebooks_geocat-comp_02-more_advanced_functionality_22_1.png