Monte Carlo Simulations
Posted by D Nathan Meehan November 21, 2011

The concept of using simulations to estimate variability and risk has evolved in acceptance and popularity over my career. An understanding of the concepts involved is so fundamental for reservoir engineers that I have chose to illustrate those concepts very simply. A number of excellent references are given for further study. Commercial software is available that can be used to assist in performing the calculations; these tools can generally be adapted to complex problems relatively easily. The most difficult and most important part of the process is developing credible, realistic distributions incorporating as much data as possible. Practicing Monte Carlo simulations while ignoring available data (or failing to make the effort to obtain and use such data) is not just sloppy but potentially costly and misleading.

Monte Carlo simulations refer to computational methods that rely on random (or nearly random) distribution samplings of independent variables to repeatedly solve complex equations. A repeated series of these calculations results in a distribution of answers that reflects the range of solutions possible if the input distributions accurately reflect the variability of the independent variables. If one or more of the independent variables is actually correlated (positively or negatively) to another variable, modified techniques must be employed to handle such partial dependencies.

Let’s consider a simple example in which two six-sided die are rolled simultaneously and the numbers are added together. An integer ranging from 1 through 6 appear on each face of each roll of each die and occur randomly and independently of each other. It is obvious that the integer results 2 through 12 are the only possible rolls. We know that a “2” occurs only when each die rolls a “1.” The probability of a 2 is then (1/6)*(1/6)= 0.027778. If we rolled these dice 300 times, then (on average) we would expect the following results:

Obviously there can only be an integer number of occurrences of a specific roll. Although we know this situation analytically, let’s use Monte Carlo simulations to do the same exercise. While this can be done many ways, entering the EXCEL function =int(rand()*6+1) will result in a pseudo-random integer that closely approximates the roll of one die. Doing this in two columns and adding the results is very straightforward EXCEL coding but will generate new distributions of answers with every recalculation. Thus if the reader repeats this experiment, the answers obtained should be approximately similar but will not precisely reproduce the results that follow. In this case we “rolled the dice” 300 times for two independent variables and added them together. The input distribution for each variable was a probability density function with the probability (1/6) associated with each outcome (1 through 6). The output was the simple sum of the two sampled distributions. The following figure shows the results of three trials of 300 rolls each along with the theoretical distribution. While the three trials cluster about the theoretical answers, some anomalies are noticed. In the first 300 rolls there were significantly more rolls of “9” than “8” which is (a little) surprising.

The solution to this problem is simply to do more rolls. If instead of 300 rolls we chose to do 5000 rolls, the results show very good agreement with the theoretical solution.

This agreement would further improve if we did 10,000 or more simulations. The mean, mode and median of the resulting distribution is 7 and any reader familiar with the game of “craps” and the house odds in gambling establishments can show that the entire value of each bet results in a negative EMV for the participants.  This does not mean that all participants lose money any more than does our prior illustration with the roulette wheel. But for repeated play, the EMV is negative, independent of strategy.

This exercise has been very simple but extending it slightly can illustrate another important point. What if instead of adding the two die we multiply them together? We know that the output will vary between integers 1 and 36 and will not contain each of the integers in between. The probability of 1 and 36 will remain 0.027778 as in the extreme values. The resulting analytic solution and a 5000 roll simulation results in the following:

In this case we have used the cumulative density function rather than the probability density function. It is also common to display the above graph not in number of realizations, but in the cumulative probability that it is less than or equal to a certain value as follows.

This graph has the advantage that the ordinate can be understood as probabilities and that we can easily note that the median value of this exercise is between 9 and 10, the mean is approximately 12.25 and the distribution is bi-modal. This characteristic in which the mean is larger than the median is common in Monte Carlo simulations of hydrocarbon recoveries and is also noted in distributions of reserve sizes. While some authors like to invoke mathematical explanations, it is clear enough that many things are not normally distributed, but are distributed in a way that is skewed. Income, wealth and the height of adults are distributed in this way; the mean exceeds the median. The mean can be heavily influenced by a relatively small number of large positive values.

In the oil and gas cases, Monte Carlo simulations can be used to estimate hydrocarbons in place, recoverable hydrocarbons, number of wells required and future rates, capital and operating expenses and future net cash flows. They form part of the process in developing bid strategies and can be used in almost any decision under uncertain conditions. In the simplest case, we can calculate the OIP in a potential prospect. Oil in place (N) is calculated (in oilfield units) as:

Examining the variables used to calculate the oil-in-place, the area and net thickness for a potential prospect are obviously the average over the field or drainage area and the product of the two could be replaced with an estimate of the bulk volume of net pay. The oil saturation and porosity must also be averages, but in many cases these are not independent as lower porosities tend to be correlated with lower oil saturations. How do we get the appropriate distributions for these variables? This turns out to be nontrivial and simply guessing a minimum, maximum and most likely value and using a triangular distribution may or may not be better than a single value estimate. Best practice is the careful analysis of porosity data and the distribution of declustered data.

Declustering is absolutely essential due to sample bias. Offset and trend production may be disproportionately from wells in the most productive fields. These data tend to skew the distribution to the most heavily sampled data (this example may be the highest porosity data). As this text is not meant to be a primer on Monte Carlo simulations, the details of the methodologies employed in developing probability distributions, their relative merits, means of handling partial dependencies, etc. are not included here. Similarly, there are excellent commercial software tools available to handle such evaluations.  At the Elsevier website we have added the spreadsheet output for an evaluation of a single prospect production sharing contract that uses the following sets of assumptions. Note that only the spreadsheet software is used in this particular example. This example is not limited to the calculation of volumes but includes the number of required wells, their rates and well performance over time and the economic calculations required.

## Responses

Greg Hedger says:

“Declustering is absolutely essential due to sample bias. Offset and trend production may be disproportionately from wells in the most productive fields. These data tend to skew the distribution to the most heavily sampled data (this example may be the highest porosity data).”

This is easily proven by filtering out the top 10% well producers from the input dataset.

It’s a very small point, but I would caution you that pseudo-random algorithms, such as that applied by the Excel rand() fn, carry innate weaknesses. In point of fact, the sequence of random numbers will eventually repeat after a very long cycle. Sample size factors naturaally heavily into the accuracy of the output (regression toward the mean as per your 5000 vs. 300 rolls example).

Thank you for this informative blog post!