Methods for improving stochastic groundwater modelling computational times

Stochastic modelling is one of the most important techniques for studying model uncertainty. However, these types of models require considerably longer computational times than deterministic models, affecting their use in commercial projects. This is a challenge not only in hydrogeology, but in many other disciplines, making it an active area of research.

This article explains different techniques for improving stochastic groundwater modelling computational times, some developed decades ago and some rather recently: (1) use of parallel computing, (2) trade-off between model complexity/size and running times, (3) selection of an adequate parameter sampling method, and (4) use of metamodels. Thanks are due to Dr. Sergei Kucherenko from the Department of Chemical Engineering at Imperial College London for his recommendations on sampling methods and the use of metamodels.

This article focuses on stochastic groundwater modelling with Bayesian probability quantification, a powerful uncertainty analysis technique outlined in Figure 1. In this method, selected model parameters (K, S, recharge…) are characterized by user-defined probability distributions (known as prior distributions), which are automatically sampled and evaluated in the numerical model over many runs (realizations), leading to a set of stochastic results that can be classified according to user-defined calibration criteria (e.g. NRMS limits). The properly calibrated realizations are used for predictive purposes and for estimating the probability distributions of the parameter values that properly calibrate the model (known as posterior distributions), as shown in Figure 2.


Figure 1: Example of stochastic modelling with Bayesian probability quantification, performed using GSMOD


Figure 2: Example of prior and posterior probability distributions for a given stochastic parameter considering an NRMS < 5% for the properly calibrated models, calculated using GSMOD
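The sample-evaluate-classify workflow described above can be sketched in a few lines of Python. This is a minimal illustration, not GSMOD itself: the run_model function, the prior parameters, and the NRMS threshold are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def run_model(k, sy):
    # Placeholder for one numerical model run: a real workflow would
    # launch e.g. MODFLOW here and compute the NRMS (%) against
    # observed heads. This synthetic stand-in is minimized near
    # K = 10 m/d and Sy = 0.10.
    return abs(k - 10.0) + 100.0 * abs(sy - 0.10)

n_realizations = 1000

# Prior distributions (hypothetical): log-normal K, normal Sy
k_prior = rng.lognormal(mean=np.log(10.0), sigma=0.5, size=n_realizations)
sy_prior = rng.normal(loc=0.10, scale=0.02, size=n_realizations)

# Evaluate every realization and classify by the calibration criterion
nrms = np.array([run_model(k, sy) for k, sy in zip(k_prior, sy_prior)])
calibrated = nrms < 5.0  # e.g. NRMS < 5%

# Posterior samples: parameter values of the properly calibrated runs
k_posterior = k_prior[calibrated]
sy_posterior = sy_prior[calibrated]
```

Histograms of `k_posterior` and `sy_posterior` against the priors would reproduce the kind of comparison shown in Figure 2.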


Stochastic models with Bayesian probability quantification rely on fewer approximations than other stochastic modelling approaches, which makes them an easy to understand and extremely valuable uncertainty analysis tool; however, they are computationally demanding given the total number of realizations (model runs) that they require. The following numerical and computational recommendations are considered to be highly valuable for reducing the associated simulation times. The definition of an adequate stochastic modelling workflow is also important, topic discussed in a previous article (link).

1) Use of parallel computing

Stochastic modelling with Bayesian probability quantification can be run using parallel and distributed computing (i.e. parallel runs within one or several computers), leading to drastic reductions in computational times. More specifically, these types of models are considered “embarrassingly parallel”, with computational times that tend to be inversely proportional to the number of parallel runs (i.e. two parallel runs halve the computational time, and so on).

2) Trade-off between model complexity/size and running times

Complex numerical models tend to be slow and often unstable when run across a wide range of parameter values, which makes them difficult to work with in stochastic simulations. That said, complex but stable numerical models can still be used for stochastic simulation, especially if several computers are available for distributed computing. Fast and stable models are nevertheless the better option for performing stochastic simulations, and they should be implemented whenever the opportunity exists.

If the only available model leads to unmanageable computational times, a convenient option is to coarsen its grid as much as possible without considerably affecting the numerical results of interest. Coarsening a grid and verifying that the associated negative effects are acceptable is a time-consuming process; however, an algorithm that automatically resamples MODFLOW and MT3D packages onto a new user-defined grid will be included in future releases of GSMOD. An example is presented in Figure 3, in which various grids were evaluated for a wellfield MODFLOW model located near a salar in northern Chile. It is concluded that a much coarser grid could have been used for this particular model without considerably affecting the results of interest (e.g. NRMS, head and balance predictions), leading to a decrease in computational time of more than 90%. It is important to note that doubling the cell dimensions (i.e. 2*dx, 2*dy) leads to a 75% decrease in the number of cells, which very likely produces more than a 75% decrease in computational time due to the relatively poor scaling behaviour of the most common numerical solvers. An alternative approach is to trim the model domain outside of the area of interest/influence; however, this option requires considerable work and might not always be suitable.


Figure 3: Decrease of computational time and associated effect on the resulting NRMS for various grids generated by a future release of GSMOD
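The cell-count arithmetic behind the doubling argument is straightforward; a short sketch with hypothetical structured-grid dimensions:

```python
# Hypothetical structured-grid dimensions for illustration
nx, ny, nlay = 400, 300, 5
cells_fine = nx * ny * nlay          # 600,000 cells

# Doubling dx and dy halves the cell count along each horizontal axis
cells_coarse = (nx // 2) * (ny // 2) * nlay

reduction = 1.0 - cells_coarse / cells_fine
# 75% fewer cells; solver run time typically drops by more than 75%
# because common solvers scale worse than linearly with problem size
```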


3) Selection of an adequate parameter sampling method

One of the main ideas behind a rigorous stochastic modelling with Bayesian probability quantification workflow is to generate a set of stochastic realizations that is representative of the user-defined probability distributions (prior distributions) of all selected parameters. In other words, if the modeller claims that a certain parameter (e.g. Sy) follows a certain distribution (e.g. a normal distribution with a mean of 10% and a standard deviation of 2%), the final set of stochastic realizations has to include model runs with values of this parameter that are representative of the defined distribution, as shown in Figure 4a. As more realizations are added (or as more samples are taken), more representative results are obtained.

The problem is more complicated for models in which several stochastic parameters are considered, which is the case in almost all hydrogeological projects. The total number of realizations has to be representative not only of each individual probability distribution, but also of the different combinations of parameter values: lower Sy values run with lower, mid, and upper K values; mid Sy values run with lower, mid, and upper K values; and so on for all combinations of the defined parameters, as shown in Figure 4b for a pair of independent, normally distributed parameters (Sy HU1, K HU2). In mathematical terms, the realizations need to be representative of the multivariate probability distribution. It follows that evaluating unnecessary (e.g. non-influential) parameters hinders the stochastic modelling process.

Several parameter sampling methods have been developed over the years to reduce the number of realizations (or samples) needed to obtain results representative of the user-defined probability distributions. Figure 5 compares Latin Hypercube Sampling (LHS, a near-random method) and Sobol sequence sampling (a quasi-random method) against purely random sampling. It is clear that the LHS and Sobol approaches lead to more representative results than the purely random method for the same number of realizations.
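The effect of stratification can be illustrated with a minimal one-dimensional Latin Hypercube implementation. This is a sketch for illustration only; production work would typically use a library implementation such as scipy.stats.qmc. Representativeness is measured here as the Kolmogorov-Smirnov-style distance between the sample's empirical CDF and the target uniform CDF.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def latin_hypercube_1d(n, rng):
    # One sample drawn inside each of n equal-probability bins,
    # then shuffled so the sample order carries no structure
    u = (np.arange(n) + rng.random(n)) / n
    rng.shuffle(u)
    return u

def max_cdf_error(u):
    # Maximum distance between the empirical CDF of the sample
    # and the target uniform CDF on [0, 1]
    s = np.sort(u)
    ecdf = np.arange(1, s.size + 1) / s.size
    return np.abs(ecdf - s).max()

n = 100
err_random = max_cdf_error(rng.random(n))          # purely random
err_lhs = max_cdf_error(latin_hypercube_1d(n, rng))  # stratified

# The stratified sample tracks the target distribution more closely
# for the same number of realizations
```

Uniform samples from either method can then be mapped onto any prior distribution through its inverse CDF.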

All of this said, depending on the model, the stochastic results of interest, and the selected stochastic parameters, it might not be necessary to strictly represent the entire multivariate probability distribution. A common practice to check that no additional realizations are required is to verify that adding new stochastic model runs has a negligible effect on the stochastic results of interest. It is worth mentioning that other stochastic methods exist to further reduce the number of required realizations, such as the Null Space Monte Carlo approach included in PEST, and the Markov Chain Monte Carlo method.
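The "no additional realizations required" check can be sketched as monitoring the running mean of a result of interest. The results below are synthetic stand-ins; in practice any stochastic output, such as predicted drawdown, would be monitored in the same way.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Synthetic stand-in for a stochastic result of interest
# (e.g. predicted drawdown in m) over 2000 realizations
results = rng.normal(loc=12.0, scale=1.5, size=2000)

# Running mean after each added realization
running_mean = np.cumsum(results) / np.arange(1, results.size + 1)

# Relative change contributed by the last 100 realizations; a small
# value suggests further realizations would have a negligible effect
rel_change = abs(running_mean[-1] - running_mean[-101]) / abs(running_mean[-1])
```

The same check can be applied to other statistics of interest (percentiles, exceedance probabilities) rather than the mean alone.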


Figure 4: (4a) Comparison of the user-defined probability distribution for Sy against resulting Sy values for 100 and 500 realizations (or samples), and (4b) comparison of the bivariate distribution Sy-K for 500 realizations (or samples) using a Sobol sequence


Figure 5: Comparison of fully random, Latin Hypercube (near-random) and Sobol sequence (quasi-random) sampling methods for 100 realizations (or samples), using GSMOD


4) Use of metamodels

Unmanageable stochastic computational times have led to the use of metamodels in many disciplines. Metamodels (also known as surrogate models) are defined as “models of a model” and have the advantage of being considerably faster and more stable than the original model, allowing uncertainty and sensitivity analyses to be performed in shorter periods of time.

Metamodels are, however, approximations of the original model, which is why they need to be developed and applied with caution. Additionally, implementing a metamodel requires time and several runs of the original model, which needs to be considered when defining the best stochastic modelling approach. All of this said, metamodels are an extremely useful tool that has already been applied in stochastic groundwater modelling projects.
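As a minimal illustration of the idea (not a production surrogate; the model function and parameter range are hypothetical), a handful of runs of a slow model can be used to fit a cheap polynomial approximation that is then sampled intensively:

```python
import numpy as np

rng = np.random.default_rng(seed=3)

def original_model(k):
    # Placeholder for a slow numerical model: here, a smooth nonlinear
    # response (e.g. drawdown in m) as a function of K (m/d)
    return 50.0 / (1.0 + k)

# A small number of runs of the original (slow) model...
k_train = np.linspace(5.0, 20.0, 15)
y_train = original_model(k_train)

# ...used to fit a cheap surrogate (here, a cubic polynomial)
surrogate = np.poly1d(np.polyfit(k_train, y_train, deg=3))

# The surrogate is then evaluated thousands of times at negligible cost
k_samples = rng.uniform(5.0, 20.0, size=10000)
y_samples = surrogate(k_samples)
```

More flexible surrogates (Gaussian processes, radial basis functions, neural networks) follow the same pattern of training on few runs and evaluating cheaply; the caution about the surrogate being an approximation of the original model applies regardless of its form.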