Site
The Upper Colorado River Basin (UCRB) is a snowmelt-dominated system that covers about 280,000 km2. It extends from headwaters in the Rockies in Colorado and Wyoming to Lees Ferry in Northern Arizona with elevation ranges between 3300m and 900m. During the winter season, from October to the end of April, the snow cover area (SCA) for the UCRB ranges from 50,000 km2 to 280,000 km2 which plays a crucial role in energy28 and hydrological cycles29.
The hydrologic simulation of the UCRB was conducted using the integrated hydrologic model, ParFlow-CLM30,31,32. ParFlow computes both the surface and subsurface fluxes by solving the Richards equation33 in three spatial dimensions together with the kinematic wave equation over a terrain following grid. Furthermore, ParFlow is coupled to a land surface model (Common Land Model; CLM), ParFlow-CLM, to resolve the energy and water balances from the canopy to the ground surface.
The technical details of ParFlow-CLM are well-documented in Maxwell and Miller30, Kollet and Maxwell31,34, Kollet et al.35, Maxwell et al.36, Jefferson and Maxwell37, Maxwell and Condon38 and Kuffour et al.39. ParFlow integrates groundwater and surface water systems using a free surface overland flow boundary condition31. In other words, the surface water and variably saturated groundwater flow equations directly exchange fluxes without a conductance layer. In ParFlow, streams are formed by either Hortonian (excess infiltration40) or Dunne (excess saturation41) runoff without the need of a priori embedded rivers.
CLM is the land surface component of the model. CLM solves the terrestrial energy balance (e.g. net radiation, sensible, latent and ground heat fluxes) in addition to a multi-layer snow model42 and a complete canopy water balance. Sensible and latent heat are solved through a resistance scheme including soil, vegetation and atmospheric resistances43. The ground heat is calculated based on the one-dimensional heat conduction equation35. Ground and sensible heat fluxes are directly dependent on the water content in soil layers which is solved by ParFlow34. Conversely, soil moisture is also dependent on infiltration and plant uptake which is passed back to ParFlow by CLM34,37,44.
The main inputs in this study can be divided into two groups: dynamic atmospheric forcing and static model parameters. The first group of inputs includes a subset from the North American Land Data Assimilation System (NLDAS) project. The second group of inputs includes two types of model parameters: surface information (i.e. topographic slopes and land cover) and subsurface information (i.e. soil, geology and bedrock types and their characteristics).
The NLDAS project is a collaboration between NASA, NOAA and a group of universities to provide high accuracy and consistent datasets for a wide variety of hydrologic studies. Studies modeling streamflow45,46, soil moisture47,48 and snow49,50 have been using NLDAS as inputs. Thus, we decided to use a subset of the NLDAS dataset for this simulation which includes eight variables, namely, precipitation, air temperature, short-wave radiation, long-wave radiation, east-west wind speed, south-north wind speed, atmospheric pressure and specific humidity. The NLDAS has two versions which were used in this study: NLDAS-151,52 which spans from 1983 to 2002 and NLDAS-245,53 which spans from 2003 to 2019. Major improvements from NLDAS-1 to NLDAS-2 include additional measurement sources of precipitation such as gauge (Climate Prediction Center - CPC product), radar (National Centers for Environmental Prediction-NCEP 4-km hourly Doppler radar Stage II) and satellite (CPC MORPHing technique CMORPH)45.
The model parameters consist of two types: surface and subsurface. The surface parameters, topographic slopes and land cover, were computed as follows. Topographic slopes were calculated using the Priority Flow toolbox54 with an elevation input from the Hydrological data and maps based on Shuttle Elevation Derivatives at multiple Scales (HydroSHEDS). Land cover information was obtained from the National Land Cover Database (NLCD) at 30-m resolution. The obtained land cover dataset was then upscaled to model resolution at 1-km. Land cover values are based on the International Geosphere-Biosphere Program (IGBP) classifications.
The subsurface of the ParFlow domain consists of four soil layers at the top and one geology layer at the bottom. Categories for the soil units were obtained from the Soil Survey Geographic Database (SSURGO; https://websoilsurvey.sc.egov.usda.gov) and hydrogeologic categories were obtained from a global permeability map developed by Gleeson et al.55. Parameters such as saturated hydraulic conductivity and van Genuchten relationships of those soil and hydrogeology layers were obtained from Schaap and Leij56. More details about the subsurface parameters and configurations can be found in Condon and Maxwell57, and Maxwell et al.36.
A model spinup is the initialization process used to bring the system into a more realistic set of initial conditions when the true starting point of the model (for example, the pressures everywhere in the UCRB) is unknown. This starting point is particularly important for groundwater systems which take longer time to evolve than the surface systems.
In preparation for the 37-year simulation, we completed a model spinup in two steps. First, potential recharge (calculated as Precipitation Minus Evapotranspiration (PME)) was applied to the model until the change in subsurface storage was less than 3% of the total storage. The potential recharge PME was derived from the average precipitation and evapotranspiration products for the period between 1950 and 2000 by Maurer et al.58. For the second step, the hourly atmospheric forcing for the initial water year (1983) was repeatedly applied to bring the model into quasi-equilibrium.
The spinup process described above provided an initial pressure model of the UCRB for the 37-year simulation. To do this, we simulated each year for a time period spanning from October to the end of September next year, often known as the Water Year (WY) which better matches with the precipitation cycle that occurs in late autumn. All simulations were executed on the Cheyenne supercomputer operated by the National Center for Atmospheric Research (NCAR). On average, one WY simulation used about 6,100 cores hours, which resulted in about a day of wall-clock time given parallel computing and batch submission processes. The entire 37-year simulation used approximately 220,000 core hours of computing time, spanning about 1.2 months of wall-clock time.
A comprehensive comparison between model simulation results, observations and remotely sensed products was conducted. A summary of each dataset is provided below.
Streamflow observations were compiled from the USGS Water Data web service. Since this was a pre-development simulation (i.e. excluding surface water management and groundwater pumping), we filtered out observations from stations that are clearly affected by anthropogenic activities. Although small drainage area basins can have water withdrawals and irrigation ditches, the effect of anthropogenic activities on these basins are much less compared to larger drainage area basins, especially in monthly or annual scales59,60. Thus, we defined a drainage area threshold of 500 km2; stations whose drainage areas are larger than the threshold were then manually inspected. For example, we removed the station at Lees Ferry (drainage area: 289,560 km2) located right after the Glen Canyon Dam.
In total, there were a total of 602 UGSG stream stations in the UCRB with observations from 1983 to 2019 (shown as blue stars in Fig.1). Eight stations situated at the outlet of watersheds that represent medium to large drainage areas were used for comparison demonstration in Fig.2. These stations were: Green River at Green River (116,160 km2), Colorado River near Cisco (62,419 km2), San Juan River near Bluff (59,570 km2), Yampa River at Deerlodge (20,541 km2), Gunnison River near Grand Junction (20,520 km2), Colorado River below Glenwood Springs (15,576 km2), San Juan River at Four Corners (37,813 km2), and East River at Almont (749 km2).
(a) Location and type of observations used to compare observations and data products to model simulations, (b) Locations of the UCRB and its major sub-basins.
Plots of simulated and observed streamflow for eight gages within the UCRB. Streamflow predicted by ParFlow is shown using the red line while streamflow predicted by the natural flow model is shown in blue.
In addition to the USGS stream observations, we also used the Bureau of Reclamation natural flow dataset61,62,63,64,65,66 which is available for 20 stations in the UCRB from 1906 to 2020. The natural flow was constructed by combining history gauge flow with consumptive uses and losses64 and reservoir regulation66. The dataset has been used in several other studies including drought analysis in the UCRB67,68,69.
The next observational set was the USGS groundwater database (https://waterdata.usgs.gov/nwis/gw). All data from wells that have at least two months of observations during the period between 1983 and 2019 were used for comparison here. Measurements that did not pass the USGS quality control (i.e. flagged for potential measurement inconsistency or negative outlier values) were filtered out. Also, wells with water table depths (WTD) below 52m (i.e. below the depth for the center of the bottom grid cell in the model domain) were removed. A total of 36 wells were used to compare water table levels after this filtering process (shown as blue hexagons in Fig.1).
In addition to these temporal groundwater observations, there are a total of 3,865 well locations in the UCRB from the Fan et al.70 water table observations. Fan et al.70 compiled this water table observational dataset by calculating the average WTD for USGS sites between 1927 and 2009. While Fan et al.70 noted that about 90% of the wells have only one observation at different times, they found that wells whose WTD were above 20m aligned well with their global simulated WTD. Based on these findings, their dataset was determined to be an appropriate resource to validate model performance.
We also employed a derived snow cover extent from MODIS for comparison to simulations. The daily cloud-free snow cover dataset71 was developed via a series of mitigated cloud filters and the Variational Interpolation algorithm to the MODIS-Snow Cover Area (SCA) Daily (MOD10C1 and MYD10C1) version 6 product12,72. The product has been proved to effectively capture the dynamic changes of snow from 2000 to 2017 with the average of Probability Of Detection and False Alarm Ratio are 0.955 and 0.179, respectively71. The cloud-free products spatial and temporal resolutions are 0.05 and daily, respectively.
The Snow Water Equivalent (SWE) data was obtained from the Snow Telemetry (SNOTEL) network. There was a total of 133 SNOTEL stations used for comparison. SNOTEL stations have an average elevation of nearly 2,900m with the station in the highest elevation of more than 3,500m at the Italian Creek, CO.
Total water storage (TWS) change measured by the Gravity Recovery and Climate Experiment (GRACE) mission was used to compare with simulated TWS. Launched in 2002, GRACE estimates monthly changes in terrestrial water storages globally based satellite location (http://www2.csr.utexas.edu/grace/RL05_mascons.html). The data used in this study, CSR Release-06 GRACE Mascon Solutions, was released from the Center for Space Research (CSR), the University of Texas at Austin. Mass fluxes (measured in terms of mass concentrationmascon) derived directly from raw GRACE data often have north-south stripes due to modeling errors, measurement noise and observability issues73. To decrease the uncertainty in these mass fluxes, a series of filters were applied to GRACE gravity information in a 1 geodesic grid domain. Those filters include (1) mascon geodesic grid correction, (2) Glacial Isostatic Adjustment (GIA) correction, (3) Degree-1 coefficients (Geocenter) corrections and (4) C20 (degree 2 order 0) replacement. The final total water storage change is obtained by subtracting the mean from 2003 to 2009. Please note that GRACE measures the storage anomaly at approximately monthly intervals, but it does not measure total quantity of water stored. GRACE storage anomalies were available monthly from April 2002 to June 2017 at the time of this analysis with a spatial resolution of 1. Given the relatively low spatial resolution, Scalon et al.19 suggested to use GRACE only for watersheds which have areas greater than 100,000 km2 (The area of the UCRB is approximately 280,000 km2). Uncertainty analysis for CSR RL06 is not available yet, however, uncertainty value suggested for the RL05 version is roughly 2 cm73.
Four stations from the Community Collaborative Rain, Hail and Snow Network (CoCoRaHS) provide potential ET estimates. Additionally, the AmeriFlux station at Niwot Ridge, CO (US-NR174,75) provides latent heat observations (which can be translated directly to ET by dividing to a unit of latent heat of evaporation of water 2256kJ/kg). While ET stations are scarce in the UCRB, because of the diversity in their locations and temporal coverage, we feel that those observations still play a crucial role in the simulation evaluation. First is the range of elevation and land cover that those stations represent: two high elevation stations located at Niwot Ridge, CO (3050m) and Crested Butte, CO (2912m), one moderate elevation station located at Carbondale, CO (1887 m) and two low altitude stations located at Grand Junction, CO (1428m) and Castle Valley, UT (1464m). With respect to land cover, stations at Carbondale, CO and Crested Butte, CO are located in evergreen forest; stations at Niwot Ridge, CO and Grand Junction, CO are located in deciduous forest; the station at Castle Valley, UT is located in shrubland. We compared ParFlow simulations with CoCoRaHs data from June 2012 to present and AmeriFlux data from 1999 to present, respectively.
In addition to ET estimated from stations, we also used remotely sensed ET from Simplified Energy Balance (SSEBop) MODIS product to compare with simulated ET. The SSEBop model provides daily and 1-km ET estimations for the whole UCRB from 2000 to the end of the validation period and has been shown to be reliable in various regions76,77. Senay et al.78 simulates ET in SSEBop by using pre-defined hot and cold boundary conditions. Each pixel is assigned a hot and cold boundary values based on maximum air temperature and differential temperature. Based on land surface temperature (K) obtained from MODIS images, ET fraction is computed and then multiplied with a short grass reference (mm.d1) and a scaling coefficient to produce final ET79.
Lastly, ground temperature data from the National Oceanic and Atmospheric Administration (NOAA) Regional Climate Center (RCC) was used for comparison. The NOAA RCC data consist of observations compiled from the Global Historical Climatology Network (GHCN80) database, and other federal and regional agencies. There are a total of 490 stations that monitor temperature. These stations are well distributed over the UCRB (green diamonds in Fig.1).
Two of the remotely sensed products used for comparison, namely, MODIS-SCA and GRACE, are downscaled to match with the datasets spatial resolution of 1-km and geographic projection (specified at Table1). Specifically, MODIS-SCA and GRACE data were downscaled from 5-km and 100-km, respectively, to 1-km using the Nearest Neighbor algorithm.
For timeseries data, we primarily used two metrics to evaluate model performance, Spearmans Rho and Total Absolute Relative Bias. As explained in Maxwell and Condon38, plotting these two metrics against one another produces a figure that will concisely describe a models ability to reproduce appropriate timing and magnitude of flows. We hereafter refer to this type of figure as a Condon Diagram. Spearmans Rho was used to assess the differences in the simulated and observed variables timing while the relative bias measures differences in their volumes. If simulations are closed to observation, we expect high Speamans Rho value and low relative bias value. Spearmans Rho is computed as:
$$srho=1-frac{6{sum }_{i=1}^{n}{d}_{i}^{2}}{n({n}^{2}-1)}$$
(1)
where di is the difference in the independent ranking for the simulated and observed values at i time step, n is the number of values in each time series. The Total Absolute Relative Bias is calculated as:
$$bias=frac{left|{sum }_{i=1}^{n}{S}_{i}-{sum }_{i=1}^{n}{O}_{i}right|}{{sum }_{i=1}^{n}{O}_{i}}$$
(2)
where S and O are simulated and observed timeseries, respectively, and n is the number of values in each time series.
Additionally, we used the Kling-Gupta Efficiency (KGE81,82) to evaluate the streamflow performance. The KGE coefficient is proposed by Gupta et al.81 to achieve a more balanced evaluation of simulated mean flow, flow variability and daily correlation than the traditional Nash-Sutcliffe efficiency (NSE83)84,85.
For spatial data, we used two categorical validation indices, namely, Probability of Detection and False Alarm Ratio:
$$POD=frac{Hit}{Hit+Miss}$$
(3)
$$FAR=frac{False}{Hit+False}$$
(4)
where Hit is grid where both simulated and observed events occurred; Miss is grid cell where the observed event occurred but the simulated one did not; False is a grid cell where the simulated event occurred but the observed one did not.
Originally posted here:
A hydrological simulation dataset of the Upper Colorado River Basin from 1983 to 2019 | Scientific Data - Nature.com