Calibration-constrained Monte Carlo and additional methods

Groundwater model predictions can be affected by different sources of uncertainty, potentially leading to important repercussions in the decision-making process that follows them. Modelling workflows that rely exclusively on calibration (meaning history matching) and local sensitivity analysis for data assimilation usually give limited insight into the effect that these sources of uncertainty have on the model predictions. This is particularly important when (1) the future behaviour of the system is expected to be considerably different than the existing and past behaviour, (2) the observation data is not directly related to the predictive variables of interest, and (3) the amount, quality, distribution, and type of observation data leads to a strong non-unique response for specific results of interest.

Among the most important sources of uncertainty, it is worth mentioning:

the uncertainty associated to hydraulic and transport parameters due to the heterogeneity of geological materials and potential lack of representative field information at the required scales,
the structural uncertainty associated to the hydrogeological units, basement and faults/dykes locations and geometries,
the uncertainty associated to the recharge and discharge processes, including climate,
the errors and inadequate representativity of field observations, considering the scales of measurement, heterogeneity, and interest of the hydrogeological study,
and the numerical assumptions and limitations of the modelling software: is the assumption of (equivalent) porous media adequate in my particular study? is the modelling code adding numerical dispersion to my transport models? is the assumption of constant total stress (Δσ = 0) of the traditional groundwater flow equations adequate in models that include excavation and/or deposition of material?

Different methods exist for studying and estimating predictive uncertainty, many of them relying on some form of Bayesian analysis: there is prior knowledge of the system expressed through probability distributions (e.g. hydraulic conductivities estimated from literature and field tests, that vary between Kmin and Kmax with a mode of Km, following a triangular distribution), which is improved by adjusting the model to match existing observations (e.g. heads, concentrations, flows...), leading to posterior knowledge of the system (i.e. a subset of the original K values that reproduces the historic behaviour of the system while being consistent with the literature values and field data).

The most direct implementation of Bayesian analysis in groundwater modelling is usually called calibration-constrained Monte Carlo (or ABC rejection), with an example presented in Figure 1. In this method, selected model parameters (K, S…) are characterized by probability distributions (prior knowledge; purple histogram), which are sampled and evaluated in the model for several runs (realizations), leading to a set of results that can be classified based on user-defined calibration criteria (posterior knowledge; green histogram). The accepted realizations are used for predictive purposes. In the example of Figure 1, the hydraulic conductivity was originally estimated to follow a PERT distribution with a minimum of 2 m/d, a mode of 5 m/d, and a maximum of 12 m/d (prior knowledge; purple histogram), ending up with values that vary between 3 m/d and 11 m/d with a mode of 7.5 m/d (posterior knowledge; green histogram).

GSMOD, Monte Carlo, stochastic, mining, hydrogeology, groundwater, software, calibration, free, simulation, transport, MODFLOW, MT3D, FEFLOW, slope stability, finite difference, USG, USGS, hydrograph, probability, histogram, python, matplotlib, water resources, climate change, geology, geosciences, probability, concentrations, chemistry, geochemistry, demo, predictions, model, modeling, modelling, flopy, leapfrog, system, risk, evaluation, null space, uncertainty, analysis, heterogeneity, hydrology, dynamic, coupling, numpy, assessment, interactive, parameter, distribution, thomas, booth, pest, modeling

Figure 1: Results from a calibration-constrained Monte Carlo analysis using GSMOD (10.000 model runs).

Calibration-constrained Monte Carlo methods offer many advantages as an uncertainty analysis tool, including:

the use of any combination of probability distribution functions to represent the prior knowledge of the system (e.g. uniform, normal, triangular, log-normal, etc.),
no assumptions with respect to the resulting posterior distribution, which can end up having any shape, such as uniform, with one peak (unimodal) or several peaks (multimodal),
no artificial tendency to underestimate regions of extreme parameter values, or relatively low (but still relevant) posterior probabilities (left and right margins of the histograms),
and no relationship between model runs, which means that all realizations can be run independently from each other using parallel computing.

However, the same characteristics that make calibration-constrained Monte Carlo appealing, lead to an important disadvantage: the method tends to generate a low amount of realizations that fit the historic behaviour of the system (e.g. acceptable NRMS) in relation to the total number of model runs, requiring important computational times. This happens because the method does not make any particular effort to find the realizations that best fit the historic observations. As an example, only 3% of the model runs in Figure 1 comply with the defined NRMS criteria. Although several approaches exist for reducing the associated computational times (e.g. use of parallel computing, grid coarsening, narrowing the prior distributions and use of metamodels), calibration-constrained Monte Carlo methods could be unfeasible in many projects.

Different uncertainty analysis methods try to improve the efficiency of calibration-constrained Monte Carlo approaches while aiming to address a similar objective, such as the popular Null Space Monte Carlo method available in PEST (Doherty, 2015) and the also popular Iterative Ensemble Smoother available in PEST++ (White et al., 2020). More information about these methods can be found in the series of webinars Decision-Support Modelling Principles and Practice from the Groundwater Modelling Decision Support Initiative (link). It is worth mentioning at this point that GSMOD is in the process of adding compatibility with PEST++, to offer users the wide variety of tools associated to this popular software package.

Additional to this, a new uncertainty analysis method is being developed specifically for GSMOD. Results can be seen in Figure 2, which shows its application to the same model of Figure 1, increasing the original success rate of 3% to a value of 51%, maintaining good sampling of the parameter space, as seen in Figure 3. It is important to mention that no method can achieve a success rate of 100%, unless the acceptance criteria is considerably high (e.g. high NRMS limit), which would make the method a full Monte Carlo. This new methodology aims to offer a good compromise between Monte Carlo and mathematical optimization approaches, with the following specific advantages:

the use of any probability distribution functions to represent the prior knowledge of the system, and no assumptions with respect to the resulting posterior distribution,
sampling extreme parameter values is not necessarily a problem,
algorithm tries to even out over and underestimated regions of the parameter space as the realizations progress,
and no realizations spent on the calculation of a Jacobian, which tends to make it compatible with models that rely on an important number of parameters.

Figure 2: Comparison between the calibration-constrained Monte Carlo results (top chart) and the new uncertainty analysis method (bottom chart), for the same number of total runs (1000); accepted runs in dark blue, rejected runs in light blue. Right plot indicates NRMS of each model run from 1 to 1000 with the same colour legend as the hydrographs.

Figure 3: Comparison of the posterior probability distribution for K, generated by the calibration-constrained Monte Carlo method (top chart) and the new uncertainty analysis method (bottom chart), for the same number of total runs (1000).

Independent of the chosen uncertainty analysis method, what it is important to understand is that trying to estimate predictive uncertainty is key for the modelling process. A rigorous quantification must be done with more or less sophisticated methods, such as the ones mentioned throughout the article; however, even these methods are meaningless without a good (and very likely multidisciplinary) hydrogeological understanding of the system, which must rely on adequate field information. They are also meaningless if unethical practices are present throughout the modelling process. Needless to say, a good hydrogeological understanding of the system in conjunction with appropriate modelling practices can many times lead to successful results even without the use of advanced uncertainty analysis methods, as long as the modelling objectives are clear and the concepts of uncertainty and risk-benefit are present throughout the development of the project, such as in tier-based risk assessments.

References

Doherty JE (2015). Calibration and Uncertainty Analysis for Complex Environmental Models. Watermark Numerical Computing, Brisbane, Australia.

White JT, Hunt RJ, Doherty JE, and Fienen MN (2020). PEST++ version 5, a parameter estimation and uncertainty analysis software suite optimized for large environmental models. U.S. Geological Survey Techniques and Methods Report, in press.

Hydrogeology and groundwater modelling

Calibration-constrained Monte Carlo and additional methods

Contact form

Follow by email