# Subset simulation

Subset simulation[1] is a method used in reliability engineering to compute small (i.e., rare event) failure probabilities encountered in engineering systems. The basic idea is to express a small failure probability as a product of larger conditional probabilities by introducing intermediate failure events. This conceptually converts the original rare event problem into a series of frequent event problems that are easier to solve. In the actual implementation, samples conditional on intermediate failure events are generated adaptively, so that the sample population moves gradually from the frequent event region to the rare event region. These 'conditional samples' provide information for estimating the complementary cumulative distribution function (CCDF) of the quantity of interest (that governs failure), covering the high as well as the low probability regions. They can also be used for investigating the cause and consequence of failure events. The generation of conditional samples is not trivial but can be performed efficiently using Markov chain Monte Carlo (MCMC).

Subset simulation treats the relationship between the (input) random variables and the (output) response quantity of interest as a 'black box'. This can be attractive for complex systems where it is difficult to use other variance reduction or rare event sampling techniques that require prior information about the system behaviour. For problems where it is possible to incorporate prior information into the reliability algorithm, it is often more efficient to use other variance reduction techniques such as importance sampling. When applied to a fracture mechanics test problem, subset simulation has been shown to be more efficient than traditional Monte Carlo simulation, but less efficient than line sampling.[2]

## Basic idea

Let X be a vector of random variables and Y = h(X) be a scalar (output) response quantity of interest for which the failure probability ${\displaystyle P(F)=P(Y>b)}$ is to be determined. Each evaluation of h(·) is expensive, so the number of evaluations should be kept as small as possible. Using direct Monte Carlo methods, one can generate i.i.d. (independent and identically distributed) samples of X and then estimate P(F) simply as the fraction of samples with Y > b. However, this is not efficient when P(F) is small, because most samples will not fail (i.e., they have Y ≤ b) and in many cases the resulting estimate is 0. As a rule of thumb, for small P(F) one requires about 10 failed samples to estimate P(F) with a coefficient of variation of 30% (a moderate requirement). For example, 10000 i.i.d. samples, and hence 10000 evaluations of h(·), would be required for such an estimate if P(F) = 0.001.
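The direct Monte Carlo estimate and the rule of thumb can be illustrated with a minimal sketch. The toy response `h_toy`, the sampler `sample_std_normal` and the threshold value are assumptions chosen for illustration so that P(F) is about 0.001; they do not come from any particular application.

```python
import numpy as np

def direct_monte_carlo(h, sample_x, b, n, rng):
    """Estimate P(h(X) > b) as the fraction of n i.i.d. samples that fail."""
    x = sample_x(n, rng)              # n i.i.d. samples of X, one per row
    y = h(x)                          # one evaluation of h per sample
    n_fail = int(np.sum(y > b))
    p_hat = n_fail / n
    # Sample-based coefficient of variation: sqrt((1 - p) / (p * n)).
    cov = np.sqrt((1.0 - p_hat) / (p_hat * n)) if n_fail > 0 else np.inf
    return p_hat, cov, n_fail

# Hypothetical toy problem: X is standard normal in 2 dimensions and
# Y = (X1 + X2) / sqrt(2), so Y is itself standard normal.
def h_toy(x):
    return x.sum(axis=1) / np.sqrt(x.shape[1])

def sample_std_normal(n, rng, dim=2):
    return rng.standard_normal((n, dim))

rng = np.random.default_rng(0)
b = 3.0902                            # P(Y > b) is about 0.001 for standard normal Y
p_hat, cov, n_fail = direct_monte_carlo(h_toy, sample_std_normal, b, 10_000, rng)
print(p_hat, cov, n_fail)
```

With P(F) = 0.001 and 10000 samples the expected number of failed samples is 10, and the theoretical coefficient of variation is about 0.32, in line with the rule of thumb. An individual run will scatter around these values.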

Subset simulation attempts to convert a rare event problem into more frequent ones. Let ${\displaystyle b_{1}<b_{2}<\cdots <b_{m}=b}$ be an increasing sequence of intermediate threshold levels. From the basic property of conditional probability,

$${\displaystyle {\begin{aligned}P(Y>b)&=P(Y>b_{m}\mid Y>b_{m-1})P(Y>b_{m-1})\\&=P(Y>b_{m}\mid Y>b_{m-1})P(Y>b_{m-1}\mid Y>b_{m-2})P(Y>b_{m-2})\\&=\cdots \\&=P(Y>b_{m}\mid Y>b_{m-1})P(Y>b_{m-1}\mid Y>b_{m-2})\cdots P(Y>b_{2}\mid Y>b_{1})P(Y>b_{1})\end{aligned}}}$$
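As an illustrative instance of this factorization (the numbers are assumed for the example, not taken from a specific problem), take ${\displaystyle m=3}$ levels chosen so that ${\displaystyle P(Y>b_{1})}$ and each conditional probability equal 0.1:

$${\displaystyle P(Y>b)=P(Y>b_{3}\mid Y>b_{2})\,P(Y>b_{2}\mid Y>b_{1})\,P(Y>b_{1})=0.1\times 0.1\times 0.1=10^{-3}}$$

Each factor is then a frequent event probability, even though their product is a rare event probability.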

The 'raw idea' of subset simulation is to estimate P(F) by estimating ${\displaystyle P(Y>b_{1})}$ and the conditional probabilities ${\displaystyle P(Y>b_{i}\mid Y>b_{i-1})}$ for ${\displaystyle i=2,\ldots ,m}$, anticipating an efficiency gain when these probabilities are not small. To implement this idea there are two basic issues:

1. Estimating the conditional probabilities by means of simulation requires the efficient generation of samples of X conditional on the intermediate failure events, i.e., the conditional samples. This is generally non-trivial.
2. The intermediate threshold levels ${\displaystyle b_{i}}$ should be chosen so that the intermediate probabilities are not too small (otherwise one ends up with a rare event problem again) but not too large (otherwise too many levels are required to reach the target event). However, this requires information about the CCDF, which is the target to be estimated.

In the standard algorithm of subset simulation, the first issue is resolved by using Markov chain Monte Carlo.[3] More generic and flexible versions of the simulation algorithm that are not based on Markov chain Monte Carlo have recently been developed.[4] The second issue is resolved by choosing the intermediate threshold levels ${\displaystyle \{b_{i}\}}$ adaptively using samples from the last simulation level. As a result, subset simulation in fact produces a set of estimates for b that correspond to different fixed values of p = P(Y > b), rather than estimates of probabilities for fixed threshold values.
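The combination of adaptive thresholds and MCMC can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the inputs are assumed to be independent standard normal variables, the conditional samples are generated with a component-wise Metropolis move using a uniform proposal, and the names `subset_simulation`, `_mcmc_step`, `h_toy`, the level probability `p0 = 0.1` and the proposal width `step` are all choices made for this example. It also assumes that `p0 * n` and `1 / p0` are integers.

```python
import numpy as np

def _mcmc_step(h, cur_x, cur_y, b_i, step, rng):
    """One component-wise Metropolis move that stays inside {Y > b_i}."""
    prop = cur_x + step * rng.uniform(-1.0, 1.0, size=cur_x.shape)
    # Accept each component with probability min(1, phi(prop) / phi(cur)),
    # where phi is the standard normal density (assumed input distribution).
    accept = np.log(rng.uniform(size=cur_x.shape)) < 0.5 * (cur_x**2 - prop**2)
    if not accept.any():
        return cur_x, cur_y              # nothing moved: no need to call h
    cand = np.where(accept, prop, cur_x)
    cand_y = h(cand)
    if cand_y > b_i:                     # candidate stays in the intermediate failure region
        return cand, cand_y
    return cur_x, cur_y                  # otherwise the chain repeats its current state

def subset_simulation(h, dim, b, n=1000, p0=0.1, step=1.0, max_levels=20, rng=None):
    """Return (estimate of P(h(X) > b), list of adaptive thresholds b_i)."""
    rng = np.random.default_rng() if rng is None else rng
    n_seeds = int(round(p0 * n))         # samples kept as chain seeds at each level
    chain_len = n // n_seeds             # states per chain, seed included
    x = rng.standard_normal((n, dim))    # level 0: direct Monte Carlo
    y = np.array([h(xi) for xi in x])
    prob, thresholds = 1.0, []
    for _ in range(max_levels):
        order = np.argsort(y)[::-1]      # sort by response, largest first
        x, y = x[order], y[order]
        # Adaptive threshold: a fraction p0 of the current samples exceeds it.
        # (Repeated chain states can cause ties; this sketch ignores that subtlety.)
        b_i = 0.5 * (y[n_seeds - 1] + y[n_seeds])
        if b_i >= b:
            break                        # the target threshold is reached at this level
        thresholds.append(b_i)
        prob *= p0
        new_x, new_y = [], []
        for cur_x, cur_y in zip(x[:n_seeds], y[:n_seeds]):
            new_x.append(cur_x)
            new_y.append(cur_y)
            for _ in range(chain_len - 1):
                cur_x, cur_y = _mcmc_step(h, cur_x, cur_y, b_i, step, rng)
                new_x.append(cur_x)
                new_y.append(cur_y)
        x, y = np.array(new_x), np.array(new_y)
    # Product of level probabilities times the fraction of last-level samples beyond b.
    return prob * float(np.mean(y > b)), thresholds

# Hypothetical toy response: Y is standard normal, so P(Y > 3.5) is about 2.3e-4.
def h_toy(x):
    return x.sum() / np.sqrt(x.size)

p_hat, levels = subset_simulation(h_toy, dim=2, b=3.5, rng=np.random.default_rng(1))
print(p_hat, levels)
```

In this sketch, `levels[i]` is an estimate of the threshold whose exceedance probability is `p0 ** (i + 1)`, which reflects the remark above that the method yields threshold estimates for fixed probability values. With the default settings the toy run uses a few thousand evaluations of `h_toy`, and the estimate varies from run to run.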

There are a number of variations of subset simulation used in different contexts in applied probability and stochastic operations research.[5][6] For example, in some variations the simulation effort to estimate each conditional probability ${\displaystyle P(Y>b_{i}\mid Y>b_{i-1})}$ (${\displaystyle i=2,\ldots ,m}$) may not be fixed prior to the simulation, but may be random, similar to the splitting method in rare-event probability estimation.[7] These versions of subset simulation can also be used to approximately sample from the distribution of X given the failure of the system (that is, conditional on the event ${\displaystyle \{Y>b\}}$). In that case, the relative variance of the (random) number of particles in the final level ${\displaystyle m}$ can be used to bound the sampling error as measured by the total variation distance of probability measures.[8]

## Notes

• See Au & Wang[9] for an introductory coverage of subset simulation and its application to engineering risk analysis.
• Schuëller & Pradlwarter[10] report the performance of subset simulation (and other variance reduction techniques) in a set of stochastic mechanics benchmark problems.
• Chapter 4 of Phoon[11] discusses the application of subset simulation (and other Monte Carlo methods) to geotechnical engineering problems.
• Zio & Pedroni[12] discuss the application of subset simulation (and other methods) to a problem in nuclear engineering.

## References

1. ^ Au, S.K.; Beck, James L. (October 2001). "Estimation of small failure probabilities in high dimensions by subset simulation". Probabilistic Engineering Mechanics. 16 (4): 263–277. CiteSeerX 10.1.1.131.1941. doi:10.1016/S0266-8920(01)00019-4.
2. ^ Zio, E; Pedroni, N (2009). "Subset simulation and line sampling for advanced Monte Carlo reliability analysis". Reliability, Risk, and Safety (PDF). doi:10.1201/9780203859759.ch94. ISBN 978-0-415-55509-8. S2CID 9845287.
3. ^ Au, Siu-Kui (2016). "On MCMC algorithm for Subset Simulation". Probabilistic Engineering Mechanics. 43: 117–120. doi:10.1016/j.probengmech.2015.12.003.
4. ^ Au, Siu-Kui; Patelli, Edoardo (2016). "Rare event simulation in finite-infinite dimensional space" (PDF). Reliability Engineering & System Safety. 148: 67–77. doi:10.1016/j.ress.2015.11.012.
5. ^ Villén-Altamirano, Manuel; Villén-Altamirano, José (1994). "Restart: a straightforward method for fast simulation of rare events". Written at San Diego, CA, USA. Proceedings of the 26th Winter simulation conference. WSC '94. Orlando, Florida, United States: Society for Computer Simulation International. pp. 282–289. ISBN 0-7803-2109-X. acmid 194044.
6. ^ Botev, Z. I.; Kroese, D. P. (2008). "An Efficient Algorithm for Rare-event Probability Estimation, Combinatorial Optimization, and Counting". Methodology and Computing in Applied Probability. 10 (4): 471–505. CiteSeerX 10.1.1.399.7912. doi:10.1007/s11009-008-9073-7. S2CID 1147040.
7. ^ Botev, Z. I.; Kroese, D. P. (2012). "Efficient Monte Carlo simulation via the generalized splitting method". Statistics and Computing. 22 (1): 1–16. doi:10.1007/s11222-010-9201-4. S2CID 14970946.
8. ^ Botev, Z. I.; L’Ecuyer, P. (2020). "Sampling Conditionally on a Rare Event via Generalized Splitting". INFORMS Journal on Computing. arXiv:1909.03566. doi:10.1287/ijoc.2019.0936. S2CID 202540190.
9. ^ Au, S.K.; Wang, Y. (2014). Engineering Risk Assessment with Subset Simulation. Singapore: John Wiley & Sons. ISBN 978-1-118-39804-3.
10. ^ Schuëller, G.I.; Pradlwarter, H.J. (2007). "Benchmark study on reliability estimation in higher dimensions of structural systems – An overview". Structural Safety. 29 (3): 167–182. doi:10.1016/j.strusafe.2006.07.010.
11. ^ Phoon, K.K. (2008). Reliability-Based Design in Geotechnical Engineering: Computations and Applications. Singapore: Taylor & Francis. ISBN 978-0-415-39630-1.
12. ^ Zio, E.; Pedroni, N. (2011). "How to effectively compute the reliability of a thermal–hydraulic nuclear passive system". Nuclear Engineering and Design. 241: 310–327. CiteSeerX 10.1.1.636.2126. doi:10.1016/j.nucengdes.2010.10.029.