GriddingMachine Tutorial 3

How to use the Indexer module of GriddingMachine.jl?
Tutorial
Author

Yujie Wang

Published

January 30, 2024

Modified

October 29, 2024

The Indexer module of GriddingMachine.jl reads all or part of the data from GriddingMachine datasets, which are stored as NetCDF files.

By default, each dataset contains a data field and a std field, which hold the actual data and its standard deviation, respectively. The primary function reads data and std at the same time and returns the two as a tuple, even if std is filled with NaNs. There are only three functions in the Indexer module at the moment, but each function can have several methods thanks to Julia’s multiple dispatch. I will cover the functions in the tutorial below.

lat_ind and lon_ind

Every data point in a GriddingMachine dataset represents the mean or sum over an entire grid cell (not the value on a cell edge). Therefore, when we read the data by latitude and longitude, the Indexer by default first determines which grid cell contains the target site and then returns the data and std of that cell. lat_ind and lon_ind compute the indices of that cell in the latitudinal and longitudinal directions (with latitude running from 90S to 90N, and longitude from 180W to 180E). For example, one can use the following command to get the latitudinal index:

julia> ilat = lat_ind(32.58; res = 0.25)

Here, 32.58 is the latitude and the keyword argument res is the spatial resolution (0.25 degree) in the latitudinal direction. Similarly, the longitudinal index can be computed with

julia> ilon = lon_ind(115.33; res = 1/12)

where 115.33 is the longitude and 1/12 is the spatial resolution (1/12 degree) in the longitudinal direction.

Note that lat_ind and lon_ind are meant to be used by the read_LUT function below, but users can import them for their own research. They are simple functions to write, but they save you the time of working out the rounding yourself.
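The index arithmetic behind such helpers can be sketched in a few lines. The functions below, sketch_lat_ind and sketch_lon_ind, are hypothetical stand-ins written for this tutorial and are not the GriddingMachine.jl source. They assume that cell 1 starts at 90S (or 180W), that indices increase northward (or eastward), and that a point exactly on the upper boundary belongs to the last cell.

```julia
# Hypothetical re-implementation for illustration only (not the package source).
# Assumes cell 1 starts at 90S / 180W and indices increase northward / eastward.

function sketch_lat_ind(lat::Number; res::Number = 1)
    @assert -90 <= lat <= 90 "latitude must be within [-90, 90]"
    n = round(Int, 180 / res)    # number of cells in the latitudinal direction
    return clamp(floor(Int, (lat + 90) / res) + 1, 1, n)
end

function sketch_lon_ind(lon::Number; res::Number = 1)
    @assert -180 <= lon <= 180 "longitude must be within [-180, 180]"
    n = round(Int, 360 / res)    # number of cells in the longitudinal direction
    return clamp(floor(Int, (lon + 180) / res) + 1, 1, n)
end

ilat = sketch_lat_ind(32.58; res = 0.25)    # (32.58 + 90) / 0.25 = 490.32, so cell 491
ilon = sketch_lon_ind(115.33; res = 1/12)   # (115.33 + 180) * 12 = 3543.96, so cell 3544
```

The clamp call only matters at the upper boundary (90N or 180E), where the plain floor-plus-one rule would point one cell past the end of the grid.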

read_LUT (v0.3)

Version 0.3 differs from v0.2 as follows:

  • The FT argument (output floating-point type) is removed
  • fn can be a file path or a GriddingMachine tag (v0.2 accepts only a file path)
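Without the FT argument, a different element type can presumably still be obtained after reading by using ordinary Julia broadcasting. The sketch below uses a small made-up array in place of the output of read_LUT, whose actual element type is not stated in this tutorial.

```julia
# Stand-in for an array returned by a reader; the values are made up for illustration.
ds = Float64[55.1 NaN; 60.3 48.7]

# Broadcast conversion to single precision; NaN entries stay NaN.
ds32 = Float32.(ds)
```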

The name read_LUT stands for "read look-up table": the dataset is a latitude-longitude(-time) table rather than a table indexed by plant functional type. This function supports several methods:

  • read_LUT(fn::String) to read the entire dataset (2D or 3D dataset)
  • read_LUT(fn::String, cyc::Int) to read a slice of data in time for a 3D dataset
  • read_LUT(fn::String, lat::Number, lon::Number; interpolation::Bool = false) to read the data at a site (2D dataset) or in a column (3D dataset)
  • read_LUT(fn::String, lat::Number, lon::Number, cyc::Int; interpolation::Bool = false) to read the data at a site and time for a 3D dataset

Here,

  • fn is the file name of the dataset or GriddingMachine tag
  • cyc is the index in time
  • lat is the latitude
  • lon is the longitude
  • interpolation is an optional Boolean keyword argument (default false) that determines whether to interpolate the data based on the latitude and longitude instead of returning the value of the grid cell that contains the site
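How one function name can serve all four call patterns is easiest to see on a toy example. The sketch below is not the read_LUT implementation: ToyLUT, cell, and toy_read are hypothetical names, the dataset lives in memory instead of a NetCDF file, the arrays are assumed to be ordered as (lon, lat, time), and the interpolation keyword is left out.

```julia
# Toy stand-in for a dataset; arrays are assumed to be ordered (lon, lat, time).
struct ToyLUT
    data::Array{Float64,3}
    std::Array{Float64,3}
end

# Locate the grid cell that contains (lat, lon); resolution is inferred from the array size.
function cell(lut::ToyLUT, lat::Number, lon::Number)
    nlon, nlat, _ = size(lut.data)
    ilon = clamp(floor(Int, (lon + 180) / (360 / nlon)) + 1, 1, nlon)
    ilat = clamp(floor(Int, (lat + 90) / (180 / nlat)) + 1, 1, nlat)
    return ilon, ilat
end

# One function name, four methods; Julia picks the method from the argument list.
toy_read(lut::ToyLUT) = (lut.data, lut.std)
toy_read(lut::ToyLUT, cyc::Int) = (lut.data[:, :, cyc], lut.std[:, :, cyc])
function toy_read(lut::ToyLUT, lat::Number, lon::Number)
    ilon, ilat = cell(lut, lat, lon)
    return lut.data[ilon, ilat, :], lut.std[ilon, ilat, :]
end
function toy_read(lut::ToyLUT, lat::Number, lon::Number, cyc::Int)
    ilon, ilat = cell(lut, lat, lon)
    return lut.data[ilon, ilat, cyc], lut.std[ilon, ilat, cyc]
end

# A 2 degree grid with 12 time steps; std is all NaN, which the tutorial says can happen.
lut = ToyLUT(rand(180, 90, 12), fill(NaN, 180, 90, 12))

whole, _  = toy_read(lut)                     # entire 3D array
slice, _  = toy_read(lut, 3)                  # one slice in time
column, _ = toy_read(lut, 32.33, 118.03)      # all time steps at one site
value, sd = toy_read(lut, 32.33, 118.03, 3)   # one site at one time step
```

Every method returns a (data, std) tuple, mirroring the convention described at the top of this tutorial.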

For example,

julia> using GriddingMachine.Collector: download_artifact!
julia> using GriddingMachine.Indexer: read_LUT
julia> ds, ds_std = read_LUT(download_artifact!("VCMAX_2X_1Y_V2"));
julia> ds, ds_std = read_LUT("VCMAX_2X_1Y_V2");
julia> mean_site, std_site = read_LUT(download_artifact!("VCMAX_2X_1Y_V2"), 32.33, 118.03);
julia> mean_site, std_site = read_LUT("VCMAX_2X_1Y_V2", 32.33, 118.03);

Because the methods are fairly self-explanatory given the argument names, I will not go through each of them. See the CI tests for more examples.

Note that all of these functions are private, so users need to import them explicitly before using them (I designed it this way to avoid name clashes with functions from other modules). The same design logic applies to most of the modules I wrote, such as CliMA Land and Emerald.
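The import pattern can be tried on a toy module. The sketch below assumes that "private" here means "not exported"; ToyIndexer and toy_lat_ind are hypothetical names, not part of GriddingMachine.jl.

```julia
# Hypothetical module that, like the Indexer described here, exports nothing.
module ToyIndexer
    toy_lat_ind(lat; res = 1) = floor(Int, (lat + 90) / res) + 1
end

using .ToyIndexer    # brings only the module name into scope
# toy_lat_ind(32.58) # would throw an UndefVarError because the name is not exported

# Qualified access works without importing the function name.
a = ToyIndexer.toy_lat_ind(32.58; res = 0.25)

# Explicit import of a single name, the same pattern used in this tutorial.
using .ToyIndexer: toy_lat_ind
b = toy_lat_ind(32.58; res = 0.25)
```

Because nothing is exported, two modules can each define a function with the same name without colliding; the caller decides which names to bring into scope.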

read_LUT (v0.2)

As in v0.3, read_LUT reads a latitude-longitude(-time) look-up table rather than a table indexed by plant functional type. In v0.2, fn must be a file path and each method takes an additional optional FT argument:

  • read_LUT(fn::String, FT::DataType = Float32) to read the entire dataset (2D or 3D dataset)
  • read_LUT(fn::String, cyc::Int, FT::DataType = Float32) to read a slice of data in time for a 3D dataset
  • read_LUT(fn::String, lat::Number, lon::Number, FT::DataType = Float32; interpolation::Bool = false) to read the data at a site (2D dataset) or in a column (3D dataset)
  • read_LUT(fn::String, lat::Number, lon::Number, cyc::Int, FT::DataType = Float32; interpolation::Bool = false) to read the data at a site and time for a 3D dataset

Here,

  • FT is the floating-point type of the output (default Float32)

For example,

julia> using GriddingMachine.Collector: query_collection, vcmax_collection
julia> using GriddingMachine.Indexer: read_LUT
julia> ds, ds_std = read_LUT(query_collection(vcmax_collection()));
julia> mean_site, std_site = read_LUT(query_collection(vcmax_collection()), 32.33, 118.03);