A More Complicated Example: Calculating Climatological Averages¶
The previous example showed one of the most basic GeoCAT-comp functions. This example showcases how some functions in GeoCAT-comp have a variety of options that can be selected using keyword arguments. The particular function we’ll be looking at is climatology_average
.
The datafile we’re using contains surface temperature data at one location generated by numerous weather models in a model ensemble. The temporal resolution is 6 hours from January 1st 1990 to January 1st 1996. We want to take the data from each model and summarize it. We can do this in a few different ways with climatology_average
. We can find the daily, monthly, or seasonal climatological averages.
Dependencies¶
geocat-comp
geocat-datafiles
xarray
nc-time-axis
Imports¶
[1]:
from geocat.comp import climatology_average
import geocat.datafiles as gdf
import xarray as xr
import nc_time_axis
Downloading file 'registry.txt' from 'https://github.com/NCAR/GeoCAT-datafiles/raw/main/registry.txt' to '/home/docs/.cache/geocat'.
Retrieving sample data from GeoCAT-datafiles¶
GeoCAT-datafiles is a software package that makes it simple to download datafiles from the pacakge’s GitHub repository. The repository holds data files of different types such as NetCDF and GRIB. They are used for GeoCAT tutorials, demonstrations, and test cases. You won’t need to use GeoCAT-datafiles when working with your own data.
[2]:
nc_file = gdf.get('netcdf_files/atm.20C.hourly6-1990-1995-TS.nc')
nc_file
Downloading file 'netcdf_files/atm.20C.hourly6-1990-1995-TS.nc' from 'https://github.com/NCAR/GeoCAT-datafiles/raw/main/netcdf_files/atm.20C.hourly6-1990-1995-TS.nc' to '/home/docs/.cache/geocat'.
[2]:
'/home/docs/.cache/geocat/netcdf_files/atm.20C.hourly6-1990-1995-TS.nc'
Parsing the data with xarray¶
GeoCAT-comp is designed to work with `xarray
<https://xarray.dev/>`__ and `numpy
<https://numpy.org/>`__ data structures. Here we will be using xarray
to import the netCDF data.
[3]:
ds = xr.open_dataset(nc_file)
ds
[3]:
<xarray.Dataset> Dimensions: (member_id: 40, time: 8761, nbnd: 2) Coordinates: lat float64 ... lon float64 ... * member_id (member_id) int64 1 2 3 4 5 6 7 8 ... 34 35 101 102 103 104 105 * time (time) object 1990-01-01 00:00:00 ... 1996-01-01 00:00:00 time_bnds (time, nbnd) object ... Dimensions without coordinates: nbnd Data variables: TS (member_id, time) float32 ... Attributes: (12/13) Conventions: CF-1.0 NCO: 4.3.4 Version: $Name$ important_note: This data is part of the project 'Blind Evalua... initial_file: b.e11.B20TRC5CNBDRD.f09_g16.001.cam.i.1920-01-... logname: mudryk ... ... revision_Id: $Id$ source: CAM title: UNSET topography_file: /scratch/p/pjk/mudryk/cesm1_1_2_LENS/inputdata... intake_esm_varname: TS intake_esm_dataset_key: atm.20C.hourly6-1990-2005
We can see here that the datafile has surface temperature data from 40 model ensemble members. Let’s simplify things and just look at one model for now. We use xarray
’s DataArray.isel
method to pick out the data for the first member_id at index 0.
[4]:
ds = ds.isel(member_id=0) # select one model from the ensemble
temp = ds.TS # surface temperature data
temp
[4]:
<xarray.DataArray 'TS' (time: 8761)> [8761 values with dtype=float32] Coordinates: lat float64 ... lon float64 ... member_id int64 1 * time (time) object 1990-01-01 00:00:00 ... 1996-01-01 00:00:00 Attributes: long_name: Surface temperature (radiative) units: K
Now that we have the temperature data for the first ensemble member, let’s plot it. xarray
has built in methods to quickly plot data. The generated plot isn’t publication quality, but it’s a quick and dirty way to see what your dataset contains.
[5]:
temp.plot()
[5]:
[<matplotlib.lines.Line2D at 0x7f04d29ffe80>]
This plot shows that the temperature doesn’t change much throughout the year. This makes sense since the data is for the location 0.4712N 180E, which is in the Pacific Ocean just north of the equator. There is some variation though, and it’d be interesting to know what the average temperature is in each month. We can calculate this using climatology_average
.
Using climatology_average¶
We can see in the documentation that climatology_average
calculates long term hourly, daily, monthly, or seasonal averages across all years in a given dataset. This is a good function to quickly summarize our data, so let’s use it.
Related NCL Functions¶
Notice at the bottom of the documentation page there is a box with links to related NCL and GeoCAT functions. This indicates all of the NCL functions that climatology_average
is replicating through it’s keyword arguments. We want something similar to the clmMon
family of functions to summarize our data using monthly climate averages. We can see that our chosen GeoCAT function replicates that NCL capability.
Let’s summarize our data using monthly climate averages. Using the keyword argument freq
, we can specify the time span for the average. The accepted options are hour
, day
, month
, and season
.
[6]:
monthly_temp = climatology_average(temp, freq='month')
monthly_temp
[6]:
<xarray.DataArray 'TS' (time: 12)> array([299.70563, 299.382 , 299.45645, 299.87674, 300.33078, 300.91946, 301.19077, 300.82486, 300.08514, 299.94785, 300.152 , 300.01614], dtype=float32) Coordinates: lat float64 ... lon float64 ... member_id int64 1 * time (time) object 1993-01-16 12:00:00 ... 1993-12-16 12:00:00 Attributes: long_name: Surface temperature (radiative) units: K
Notice how much smaller the time dimension is. Before, the time dimension had a length of 8761 points, one for every 6 hour interval over 5 years. Now, the time dimension has a length of 12, one for every month of the year. Notice that the value of the time coordinate is for the middle of each month in 1993. Why 1993? It’s approximatly the middle year in the dataset.
[7]:
monthly_temp.plot()
[7]:
[<matplotlib.lines.Line2D at 0x7f04d28e08e0>]
When we plot this, the data is more granular and has even less variation in temperature from month to month. But we can now see which months are coldest and warmest on average for this location.
Adjusting Parameters¶
Monthly climate averages are great, but what else can climatology_average
do? We can also calculate hourly, daily, and seasonal climate averages. For this dataset, we don’t have a fine enough temporal resolution to compute an hourly average, but we can compute daily averages. The only change to the function call is passing in day
instead of month
.
[8]:
daily_temp = climatology_average(temp, freq='day')
daily_temp
[8]:
<xarray.DataArray 'TS' (time: 365)> array([299.90625, 299.9027 , 299.9038 , 299.89267, 299.89645, 299.90247, 299.88416, 299.86523, 299.8602 , 299.84595, 299.8206 , 299.79782, 299.7791 , 299.7557 , 299.72598, 299.70053, 299.66797, 299.6409 , 299.63803, 299.62735, 299.60635, 299.598 , 299.58813, 299.57175, 299.55392, 299.5314 , 299.5034 , 299.48175, 299.47153, 299.47998, 299.4739 , 299.4499 , 299.42944, 299.4248 , 299.43323, 299.44882, 299.45566, 299.45822, 299.47192, 299.4838 , 299.48218, 299.4673 , 299.44363, 299.43234, 299.41302, 299.386 , 299.36398, 299.34192, 299.32126, 299.32187, 299.33716, 299.321 , 299.29294, 299.27682, 299.26843, 299.27237, 299.28223, 299.2999 , 299.31555, 299.32324, 299.32336, 299.29977, 299.27838, 299.28983, 299.305 , 299.32205, 299.34653, 299.3856 , 299.42453, 299.4302 , 299.41373, 299.40256, 299.4077 , 299.4224 , 299.4483 , 299.48758, 299.52597, 299.54443, 299.54855, 299.55234, 299.53928, 299.5068 , 299.4921 , 299.5035 , 299.52048, 299.54846, 299.5812 , 299.60837, 299.65668, 299.71072, 299.75012, 299.77316, 299.77704, 299.77573, 299.76965, 299.76822, 299.7783 , 299.79297, 299.81876, 299.8422 , 299.8571 , 299.8822 , 299.89114, 299.8904 , 299.88116, 299.87808, 299.89078, 299.90726, 299.89944, 299.88193, 299.86453, 299.84775, 299.86725, 299.9086 , 299.92896, 299.94403, 299.97934, 300.02603, 300.08334, 300.14673, ... 300.21384, 300.14963, 300.09665, 300.0553 , 300.02277, 299.99667, 299.9679 , 299.9452 , 299.91772, 299.90115, 299.9091 , 299.92722, 299.9534 , 299.9951 , 300.02258, 300.00687, 299.9744 , 299.9299 , 299.8682 , 299.8135 , 299.78452, 299.7746 , 299.7549 , 299.7144 , 299.68152, 299.6678 , 299.66446, 299.66776, 299.6966 , 299.7579 , 299.82526, 299.87814, 299.91672, 299.94186, 299.9529 , 299.973 , 300.01187, 300.03363, 300.0538 , 300.08664, 300.11258, 300.12372, 300.13623, 300.14963, 300.1493 , 300.14633, 300.13452, 300.11118, 300.09152, 300.06946, 300.05896, 300.04553, 300.0262 , 300.0137 , 300.00223, 300.00314, 300.02576, 300.0657 , 300.12503, 300.19614, 300.25937, 300.2955 , 300.2975 , 300.2842 , 300.26498, 300.23477, 300.20242, 300.202 , 300.23282, 300.25613, 300.2426 , 300.20947, 300.17593, 300.1535 , 300.14764, 300.15042, 300.14536, 300.12527, 300.1005 , 300.0703 , 300.0401 , 300.01175, 299.9951 , 299.99915, 300.005 , 300.01852, 300.03638, 300.03754, 300.02664, 300.02902, 300.04208, 300.05743, 300.0775 , 300.10675, 300.13696, 300.14713, 300.14026, 300.12463, 300.10367, 300.08554, 300.05856, 300.02856, 300.01175, 300.0094 , 299.98972, 299.941 , 299.89905, 299.87585, 299.86655, 299.87582, 299.89938, 299.92636, 299.94885], dtype=float32) Coordinates: lat float64 ... lon float64 ... member_id int64 1 * time (time) object 1993-01-01 12:00:00 ... 1993-12-31 12:00:00 Attributes: long_name: Surface temperature (radiative) units: K
[9]:
daily_temp.plot()
[9]:
[<matplotlib.lines.Line2D at 0x7f04ca6ae020>]