- Importance sampling
In
statistics , importance sampling is a general technique for estimating the properties of a particular distribution, while only having samples generated from a different distribution rather than the distribution of interest. Depending on the application, the term may refer to the process of sampling from this alternative distribution, the process of inference, or both.Basic theory
More formally, let be a
random variable in . Let be aprobability measure on , and some function on . Then the expectation of under can be written as:
If we have random samples , generated according to, then an empirical estimate of is
:
where for and 0 otherwise.
In that case, we can easily obtain the Monte-Carlo empirical estimate of
:
The basic idea of importance sampling is to draw from a distribution other than , say and modify the above formula to still get a consistent estimate of . A main reason for such a procedure is the potential to reduce the variance of by an appropriate choice of , hence the name importance sampling, as samples from can be more "important" for the estimation of the integral. Other reasons include difficulties to draw samples from distribution or efficiency considerations.
More formally, consideranother probability measure, , with the same support as. From the definition of the expectation given above, we have
:
where , is known asthe "importance weight" and the distribution is frequently referred to as the "sampling" or "proposal" distribution. Then, if we have random samples , generatedaccording to , a Monte Carlo estimate of follows fromthe above equation by viewing the problem as that ofestimating the expectations and .
:
where are the "normalised importance weights".
The technique is completely general and the above analysis can be repeated essentially exactly also for other choices of , for example when it represents a conditional distribution. Note that when is the uniform distribution, we are just estimating the (scaled) integral of over , so the method can also be used for estimating simple integrals.
There are two main applications of importance sampling methods which, naturally, are interrelated. While the aim of both applications is to estimate statistics of random variables, the field of probabilistic inference focuses more on the estimation of or related statistics, while the field of simulation focuses more on the choice of the distribution . Nevertheless, the basic theory and tools are identical.
Application to probabilistic inference
Such methods are frequently used to estimate posterior densities or expectations in state and/or parameter estimation problems in probabilistic models that are too hard to treat analytically, for example in Bayesian networks.
Application to simulation
Importance sampling (IS) is a
variance reduction technique that can be used in theMonte Carlo method . The idea behind IS is that certain values of the inputrandom variables in asimulation have more impact on the parameter being estimated than others. If these "important" values are emphasized by sampling more frequently, then theestimator variance can be reduced. Hence, the basic methodology in IS is to choose a distribution which "encourages" the important values. This use of "biased" distributions will result in a biased estimator if it is applied directly in the simulation. However, the simulation outputs are weighted to correct for the use of the biased distribution, and this ensures that the new IS estimator is unbiased. The weight is given by thelikelihood ratio , that is, theRadon-Nikodym derivative of the true underlying distribution with respect to the biased simulation distribution.The fundamental issue in implementing IS simulation is the choice of the biased distribution which encourages the important regions of the input variables. Choosing or designing a good biased distribution is the "art" of IS. The rewards for a good distribution can be huge run-time savings; the penalty for a bad distribution can be longer run times than for a general Monte Carlo simulation without importance sampling.
Mathematical approach
Consider estimating by simulation the probability of an event , where is a random variable with distribution and
probability density function , where prime denotesderivative . A -lengthindependent and identically distributed (i.i.d.) sequence is generated from the distribution , and the number of random variables that lie above the threshold are counted. The random variable is characterized by theBinomial distribution :
Importance sampling is concerned with the determination and use of an alternate density function (for X), usually referred to as a biasing density, for the simulation experiment. This density allows the event to occur more frequently, so the sequence lengths gets smaller for a given
estimator variance. Alternatively, for a given , use of the biasing density results in a variance smaller than that of the conventional Monte Carlo estimate. From the definition of , we can introduce as below.:
where
:
is a likelihood ratio and is referred to as the weighting function. The last equality in the above equation motivates the estimator
:
This is the IS estimator of and is unbiased. That is, the estimation procedure is to generate i.i.d. samples from and for each sample which exceeds , the estimate is incremented by the weight evaluated at the sample value. The results are averaged over trials. The variance of the IS estimator is easily shown to be
:
Now, the IS problem then focuses on finding a biasing density such that the variance of the IS estimator is less than the variance of the general Monte Carlo estimate. For some biasing density function, which minimizes the variance, and under certain conditions reduces it to zero, it is called an optimal biasing density function.
Conventional biasing methods
Although there are many kinds of biasing methods, the following two methods are most widely used in the applications of IS.
Scaling
Shifting probability mass into the event region by positive scaling of the random variable with a number greater than unity has the effect of increasing the variance (mean also) of the density function. This results in a heavier tail of the density, leading to an increase in the event probability. Scaling is probably one of the earliest biasing methods known and has been extensively used in practice. It is simple to implement and usually provides conservative simulation gains as compared to other methods.
In IS by scaling, the simulation density is chosen as the density function of the scaled random variable , where usually for tail probability estimation. By transformation,
:
and the weighting function is
:
While scaling shifts probability mass into the desired event region, it also pushes mass into the complementary region
Wikimedia Foundation. 2010.