# Subset simulation

Subset simulation is a method used in reliability engineering to compute small (i.e., rare event) failure probabilities encountered in engineering systems. The basic idea is to express a small failure probability as a product of larger conditional probabilities by introducing intermediate failure events. This conceptually converts the original rare event problem into a series of frequent event problems that are easier to solve. In the actual implementation, samples conditional on intermediate failure events are adaptively generated to gradually populate from the frequent to rare event region. These 'conditional samples' provide information for estimating the complementary cumulative distribution function (CCDF) of the quantity of interest (that governs failure), covering the high as well as the low probability regions. They can also be used for investigating the cause and consequence of failure events. The generation of conditional samples is not trivial but can be performed efficiently using Markov chain Monte Carlo (MCMC).

Subset simulation treats the relationship between the (input) random variables and the (output) response quantity of interest as a 'black box'. This can be attractive for complex systems where it is difficult to use other variance reduction or rare event sampling techniques that require prior information about the system behaviour. For problems where prior information can be incorporated into the reliability algorithm, other variance reduction techniques such as importance sampling are often more efficient. It has been shown that subset simulation is more efficient than traditional Monte Carlo simulation, but less efficient than line sampling, when applied to a fracture mechanics test problem.

## Basic idea

Let X be a vector of random variables and Y = h(X) be a scalar (output) response quantity of interest for which the failure probability $P(F)=P(Y>b)$  is to be determined. Each evaluation of h(·) is expensive and so it should be avoided if possible. Using direct Monte Carlo methods one can generate i.i.d. (independent and identically distributed) samples of X and then estimate P(F) simply as the fraction of samples with Y > b. However this is not efficient when P(F) is small because most samples will not fail (i.e., with Y ≤ b) and in many cases an estimate of 0 results. As a rule of thumb for small P(F) one requires 10 failed samples to estimate P(F) with a coefficient of variation of 30% (a moderate requirement). For example, 10000 i.i.d. samples, and hence evaluations of h(·), would be required for such an estimate if P(F) = 0.001.
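The rule of thumb above follows from the coefficient of variation of the direct Monte Carlo estimator, $\sqrt{(1-P(F))/(N\,P(F))}$, which is close to $1/\sqrt{N\,P(F)}$ when P(F) is small. The following is a minimal sketch of both the estimator and that formula. The function names and the toy problem (Y = X with X standard normal, threshold 3.09) are illustrative assumptions, not part of the method's description.

```python
import math
import random

def direct_mc_estimate(h, sample_x, b, n, rng):
    """Fraction of n i.i.d. samples whose response h(X) exceeds the threshold b."""
    n_fail = sum(1 for _ in range(n) if h(sample_x(rng)) > b)
    return n_fail / n

def direct_mc_cov(p_fail, n):
    """Coefficient of variation of the direct estimator: sqrt((1 - p) / (n p))."""
    return math.sqrt((1.0 - p_fail) / (n * p_fail))

# Hypothetical toy problem: Y = X with X standard normal,
# for which P(Y > 3.09) is about 0.001.
rng = random.Random(0)
p_hat = direct_mc_estimate(lambda x: x, lambda r: r.gauss(0.0, 1.0), 3.09, 10_000, rng)

# With P(F) = 0.001 and N = 10000 the expected number of failed samples is 10,
# and the coefficient of variation is roughly 1 / sqrt(10), i.e. about 30%.
cov = direct_mc_cov(0.001, 10_000)
print(p_hat, cov)
```

Every one of the 10000 samples costs one evaluation of h(·), although only about ten of them are expected to fall in the failure region.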

Subset simulation attempts to convert a rare event problem into a sequence of more frequent ones. Let $b_{1}<b_{2}<\cdots <b_{m}=b$ be an increasing sequence of intermediate threshold levels. From the basic property of conditional probability,

$$
\begin{aligned}P(Y>b)&=P(Y>b_{m}\mid Y>b_{m-1})P(Y>b_{m-1})\\&=P(Y>b_{m}\mid Y>b_{m-1})P(Y>b_{m-1}\mid Y>b_{m-2})P(Y>b_{m-2})\\&=\cdots \\&=P(Y>b_{m}\mid Y>b_{m-1})P(Y>b_{m-1}\mid Y>b_{m-2})\cdots P(Y>b_{2}\mid Y>b_{1})P(Y>b_{1})\end{aligned}
$$
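As a quick numerical check of this product, consider a case where the tail probabilities are known in closed form. The sketch below assumes, purely for illustration, that Y is standard normal and that the thresholds are 1, 2 and 3. Neither choice comes from the method itself.

```python
import math

def normal_tail(b):
    """P(Y > b) for a standard normal Y, computed from the complementary error function."""
    return 0.5 * math.erfc(b / math.sqrt(2.0))

# Illustrative thresholds b_1 < b_2 < b_3 = b (an assumption for this example).
thresholds = [1.0, 2.0, 3.0]

# P(Y > b_1), followed by P(Y > b_i | Y > b_{i-1}) = P(Y > b_i) / P(Y > b_{i-1}).
p_first = normal_tail(thresholds[0])
conditionals = [
    normal_tail(hi) / normal_tail(lo)
    for lo, hi in zip(thresholds, thresholds[1:])
]

product = p_first * math.prod(conditionals)
print(p_first, conditionals, product, normal_tail(thresholds[-1]))
```

Here the target probability is on the order of $10^{-3}$, while the three factors are roughly 0.16, 0.14 and 0.06, each of which is far easier to estimate by sampling than the product itself.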

The 'raw idea' of subset simulation is to estimate P(F) by estimating $P(Y>b_{1})$  and the conditional probabilities $P(Y>b_{i}\mid Y>b_{i-1})$  for $i=2,\ldots ,m$ , anticipating efficiency gain when these probabilities are not small. To implement this idea there are two basic issues:

1. Estimating the conditional probabilities by means of simulation requires the efficient generation of samples of X conditional on the intermediate failure events, i.e., the conditional samples. This is generally non-trivial.
2. The intermediate threshold levels $b_{i}$  should be chosen so that the intermediate probabilities are not too small (otherwise one ends up with a rare event problem again) but not too large (otherwise too many levels are required to reach the target event). However, this requires information about the CCDF, which is the very quantity to be estimated.

In the standard algorithm of subset simulation the first issue is resolved by using Markov chain Monte Carlo. More generic and flexible versions of the simulation algorithm that are not based on Markov chain Monte Carlo have recently been developed. The second issue is resolved by choosing the intermediate threshold levels $\{b_{i}\}$ adaptively using samples from the preceding simulation level. As a result, subset simulation in fact produces a set of estimates for b that correspond to different fixed values of $p=P(Y>b)$, rather than estimates of probabilities for fixed threshold values.
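The two ingredients just described, MCMC for the conditional samples and adaptive thresholds, can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation. It assumes the components of X are independent standard normal, fixes the level probability at `p0 = 0.1`, and uses a component-wise Metropolis step with a uniform proposal of half-width `width`. The function name and all parameter names are hypothetical.

```python
import math
import random

def subset_simulation(h, dim, b, n=1000, p0=0.1, width=1.0, max_levels=20, seed=0):
    """Estimate P(h(X) > b), assuming X has i.i.d. standard normal components."""
    rng = random.Random(seed)
    n_seeds = round(n * p0)        # samples kept as chain seeds at each level
    chain_len = n // n_seeds       # states per chain, seed included
    # Level 0: direct Monte Carlo samples.
    xs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    ys = [h(x) for x in xs]
    prob, thresholds = 1.0, []
    for _ in range(max_levels):
        order = sorted(range(len(ys)), key=ys.__getitem__, reverse=True)
        # Adaptive threshold: a fraction p0 of the current samples lies above b_i.
        # (Ties caused by repeated MCMC states are ignored in this sketch.)
        b_i = 0.5 * (ys[order[n_seeds - 1]] + ys[order[n_seeds]])
        if b_i >= b:
            # Target level reached: count failures among the current samples.
            return prob * sum(y > b for y in ys) / len(ys), thresholds
        thresholds.append(b_i)
        prob *= p0
        new_xs, new_ys = [], []
        for i in order[:n_seeds]:
            x, y = xs[i], ys[i]
            new_xs.append(x)
            new_ys.append(y)
            for _ in range(chain_len - 1):
                # Component-wise Metropolis step targeting the standard normal.
                cand = list(x)
                for j in range(dim):
                    xi = x[j] + rng.uniform(-width, width)
                    if rng.random() < math.exp(0.5 * (x[j] ** 2 - xi ** 2)):
                        cand[j] = xi
                # Keep the candidate only if it stays in the intermediate event.
                y_cand = h(cand)
                if y_cand > b_i:
                    x, y = cand, y_cand
                new_xs.append(x)
                new_ys.append(y)
        xs, ys = new_xs, new_ys
    raise RuntimeError("target threshold not reached within max_levels")

# Hypothetical test problem: Y = (X_1 + X_2) / sqrt(2) is standard normal,
# so P(Y > 3) is about 0.00135.
p_hat, levels = subset_simulation(lambda x: (x[0] + x[1]) / math.sqrt(2.0), dim=2, b=3.0)
print(p_hat, levels)
```

The returned `levels` are the adaptively chosen thresholds, which correspond to the fixed probabilities `p0`, `p0**2`, and so on. Each level costs about `n` evaluations of h(·), so the total cost grows with the number of levels rather than with 1/P(F).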

There are a number of variations of subset simulation used in different contexts in applied probability and stochastic operations research. For example, in some variations the simulation effort to estimate each conditional probability $P(Y>b_{i}\mid Y>b_{i-1})$ for $i=2,\ldots ,m$ may not be fixed prior to the simulation, but may be random, similar to the splitting method in rare-event probability estimation. These versions of subset simulation can also be used to approximately sample from the distribution of X given the failure of the system (that is, conditional on the event $\{Y>b\}$). In that case, the relative variance of the (random) number of particles in the final level $m$ can be used to bound the sampling error as measured by the total variation distance of probability measures.