Ziggurat algorithm

Ziggurat algorithm

The ziggurat algorithm is an algorithm to generate random numbers from a non-uniform distribution. It belongs to the class of rejection sampling algorithms and can be used for choosing values from a monotone decreasing probability distribution. It can also be applied to symmetric unimodal distributions such as the normal distribution by choosing a point from one half and then randomly choosing a half. It was developed by George Marsaglia and Wai Wan Tsang.

Like most such algorithms, it assumes an underlying source of uniformly-distributed random numbers, typically a pseudo-random number generator. In the common case (97.5% of the time for normal and exponential distributions with typical table sizes), the algorithm consists of generating a random floating-point value, a random table index, performing one table lookup, one multiply and one comparison. This is considerably faster than the two more commonly used methods to generate normally distributed random numbers, the Marsaglia polar method or the Box-Muller transform, which require at least a logarithm and a square root.

On the other hand, the ziggurat algorithm is more complex to implement and requires precomputed tables, so it is best used when large quantities of random numbers are required.

The ziggurat algorithm is so named because it covers the probability distribution with rectangular segments stacked in decreasing size order, resembling a ziggurat.

[
thumb|300px|The_Ziggurat_algorithm_used_to_generate_sample_values_with_a_normal distribution. (Only positive values are shown for simplicity.) The pink dots are initially uniform-distributed random numbers. The desired distribution function is first segmented into equal areas "A". One layer "i" is selected at random by the uniform source at the left. Then a random value from the top source is multipled by the width of the chosen layer, and the result is "x" tested to see which region of the slice it falls into with 3 possible outcomes: 1) (left, solid black region) the sample clearly under the curve and is passed directly to output, 2) (right, vertically striped region) the sample value may lie under the curve, and must be tested further. In that case, a random "y" value within the chosen layer is generated and compared to "f(x)". If less, the point is under the curve and the value "x" is output. If not, (the third case), the chosen point "x" is rejected and the algorithm is restarted from the beginning.]

Theory of operation

The ziggurat algorithm is a rejection sampling algorithm; it randomly generates a point in a distribution slightly larger than the desired distribution, then tests whether the generated point is inside the desired distribution. If not, it tries again. Given a random point underneath a probability distribution curve, its "x" coordinate is a random number with the desired distribution.

The distribution the ziggurat algorithm chooses from is made up of "n" equal-area regions; "n"−1 rectangles that cover the bulk of the desired distribution, on top of a non-rectangular base that includes the tail of the distribution.

Given a monotone decreasing probability distribution function "f"("x"), defined for all "x"≥0, the base of the ziggurat is defined as all points inside the distribution and below "y"1 = "f"("x"1). This consists of a rectangular region from (0, 0) to ("x"1, "y"1), and the (typically infinite) tail of the distribution, where "x" > "x"1 (and "y" < "y"1).

This layer has area "A". On top of this, add a rectangular layer of width "x"1 and height "A"/"x"1, so it also has area "A". The top of this layer is at height "y"2 = "y"1 + "A"/"x"1, and intersects the distribution function at a point ("x"2, "y"2), where "y"2 = "f"("x"2). This layer includes every point in the distibution function between "y"1 and "y"2, but (unlike the base layer) also includes points such as ("x"1, "y"2) which are not in the desired distribution.

Further layers are then stacked on top. To use a precomputed table of size "n" ("n"=256 is typical), one chooses "x"1 such that "x""n"=0, meaning that the top box, box "n", reaches the distribution's peak at (0, "f"(0)) exactly.

Layer "i" extends vertically from "y""i" to "y""i"+1, and can be divided into two regions horizontally: the (generally larger) portion from 0 to "x""i"+1 which is entirely contained within the desired distribution, and the (small) portion from "x""i"+1 to "x""i", which is only partially contained.

Ignoring for a moment the problem of layer 0, and given uniform random variables "U"0 and "U"1 ∈ [0,1), the ziggurat algorithm can be described as:

# Choose a random layer 0 ≤ "i" < "n".
# Let "x" = "U"0"x""i".
# If "x" &lt; "x""i"+1, return "x".
# Let "y" = "y""i" + "U"1("y""i"+1−"y""i").
# Compute "f"("x"). If "y" &lt; "f"("x"), return "x".
# Otherwise, choose new random numbers and go back to step 1.

Step 1 amounts to choosing a low-resolution "y" coordinate. Step 3 tests if the "x" coordinate is clearly within the desired distribution function without knowing more about the y coordinate. If it is not, step 4 chooses a high-resolution y coordinate and step 5 does the rejection test.

With closely spaced layers, the algorithm terminates at step 3 a very large fraction of the time. Note that for the top layer "n"−1, however, this test always fails, because "x""n" = 0.

Layer 0 can also be divided into a central region and an edge, but the edge is an infinite tail. To use the same algorithm to check if the point is in the central region, generate a fictitious "x"0 = "A"/"y""1". This will generate points with "x" < "x""1" with the correct frequency, and in the rare case that layer 0 is selected and "x" ≥ "x""1", use a special fallback algorithm to select a point at random from the tail. Because the fallback algorithm is used less than one time in a thousand, speed is not essential.

Thus, the full ziggurat algorithm for one-sided distributions is:

# Choose a random layer 0 ≤ "i" < "n".
# Let "x" = "U"0"x""i"
# If "x" &lt; "x""i"+1, return "x".
# If "i"=0, generate a point from the tail using the fallback algorithm.
# Let "y" = "y""i" + "U"1("y""i"+1−"y""i").
# Compute "f"("x"). If "y" &lt; "f"("x"), return "x".
# Otherwise, choose new random numbers and go back to step 1.

For a two-sided distribution, of course, the result must be negated 50% of the time. This can often be done conveniently by choosing "U"0 ∈ (−1,1) and, in step 3, testing if |"x"| &lt; "x""i"+1.

Fallback algorithms for the tail

Because the ziggurat algorithm only generates "most" outputs very rapidly, and requires a fallback algorithm for outliers, it is always more complex than a more direct implementation. The fallback algorithm, of course, depends on the distribution.

For an exponential distribution, the tail looks just like the body of the distribution. One way is to fall back to the most elementary algorithm "E"=−ln("U"1) and let "x"="x""1"−ln("U"1). Another is to call the ziggurat algorithm recursively.

For a normal distribution, Marsaglia suggests a compact algorithm:

# Let "x" = −ln("U"1)/"x"1
# Let "y" = −ln("U"2)
# If 2"y" > x², return "x"+"x"1
# Otherwise, go back to step 1.

Since "x""1" ≈ 3.5 for typical table sizes, the test in step 3 is almost always successful. Note also that −ln("U") is just a simple way to generate an exponentially distributed random number; if you have a ziggurat exponential distribution generator available, you can use it instead.

Optimizations

The algorithm can be performed efficiently with precomputed tables of "x""i" and "y""i" = "f"("x""i"), but there are some modifications to make it even faster:
* Nothing in the ziggurat algorithm depends on the probability distribution function being normalized (integral under the curve equal to 1); removing normalizing constants can speed up the computation of "f"("x").
* Most uniform random number generators are based on integer random number generators which return an integer in the range [0, 232−1] . A table of 2−32"x"i lets you use such numbers directly for "U"0.
* When computing two-sided distributions using a two-sided "U"0 as described earlier, the random integer can be interpreted as a signed number in the range [−231, 231−1] and a scale factor of 2−31 can be used.
* Rather than comparing the result of the multiplication in step 3, it is possible to precompute "x""i"+1/"x""i" and compare "U"0 with that directly. If "U"0 is an integer random number generator, these limits may be premultiplied by 232 (or 231, as appropriate) so an integer comparison can be used.
* With the above two changes, the table of unmodified "x""i" values is no longer needed and may be deleted.
* When generating IEEE 754 single-precision floating point values, which only have a 24-bit mantissa (including the implicit leading 1), the least-significant bits of a 32-bit integer random number are not used. These bits may be used to select the layer number. (See the references below for a detailed discussion of this.)
* The first three steps may be put into an inline function, which can call an out-of-line implementation of the less frequently needed steps.

Generating the tables

It is possible to store the entire table precomputed, or just include the values "n", "y""1", "A", and an implementation of "f"&thinsp;−1("x") in the source code, and compute the remaining values when initializing the random number generator.

As previously described, you can find "x""i" = "f"&thinsp;−1("y""i") and "y""i+1" = "y""i"−"A"/"x""i". Repeat "n"−1 times for the layers of the ziggurat. At the end, you should have "y""n"="f"(0). There will, of course, be some round-off error, but it is a useful sanity test to see that it is acceptably small.

When actually filling in the table values, just assume that "x""n"=0 and "y""n"="f"(0), and accept the slight difference in layer "n"−1's area as rounding error.

Finding "x"1 and "A"

Given an initial (guess at) "x"1, you need a way to compute the area "t" of the tail for which "x">"x"1. For the exponential distribution, this is just "e"−"x"1, while for the normal distribution, assuming you are using the unnormalized "f"("x") = "e"−"x"²/2, this is √(π/2)erfc("x"/√2). For more awkward distributions, numerical integration may be required.

With this in hand, from "x"1, you can find "y""1" = "f"("x"1), the area "t" in the tail, and the area of the base layer "A" = "x"1"y"1 + t.

The compute the series "y""i" and "x""i" as above. If "y""i" > "f"(0) for any "i" &lt; "n", then the initial estimate "x"0 was too low, leading to too large an area "A". If "y""n" &lt; "f"(0), then the initial estimate "x"0 was too high.

Given this, use a root-finding algorithm (such as the bisection method) to find the value "x"1 which produces "y""n"−1 as close to "f"(0) as possible. Alternatively, look for the value which makes the area of the topmost layer, "x""n"−1("f"(0)−"y""n"−1), as close to the desired value "A" as possible. This saves one evaluation of "f"&thinsp;−1("x") and is actually the condition of greatest interest.

ee also

* Box-Muller transform
* Random number generator

References

* Note that this paper numbers the layers from 1 starting at the top, and makes layer 0 at the bottom a special case, while the explanation above numbers layers from 0 at the bottom.
* [http://www.jstatsoft.org/v05/i08/supp/1 C implementation of the ziggurat method for the normal density function and the exponential density function] , that is essentially a copy of the code in the paper.
* Describes the hazards of using the least-significant bits of the integer random number generator to choose the layer number.
* [http://www.mathworks.com/company/newsletters/news_notes/clevescorner/spring01_cleve.html Ziggurat algorithm generates normally distributed random numbers] describing the ziggurat algorithm introduced in MATLAB version 5.
* Tests many algorithms for generating normal (a.k.a. Gaussian) random numbers, and concludes that "when maintaining extremely high noise quality is the first priority, and subject to that constraint, speed is also desired, the Ziggurat method will often be the most appropriate choice."


Wikimedia Foundation. 2010.

Игры ⚽ Поможем написать реферат

Look at other dictionaries:

  • Ziggurat 8 — A character/reference from the Xenosaga series. Ziggurat 8, commonly called nihongo| Ziggy |ジギー |Jigī by MOMO and friends, is a combat cyborg assigned the task of recovering and protecting MOMO, the prototype 100 Series Observational Realian. He… …   Wikipedia

  • Ziggurat (disambiguation) — A ziggurat is a pyramidal structure that first appeared during ancient times in the Middle East. It may also refer to:*a ziggurat algorithm, a number generating algorithm *The Ziggurat, an office building in California *Ziggurats (album), a 2007… …   Wikipedia

  • Normal distribution — This article is about the univariate normal distribution. For normally distributed vectors, see Multivariate normal distribution. Probability density function The red line is the standard normal distribution Cumulative distribution function …   Wikipedia

  • List of numerical analysis topics — This is a list of numerical analysis topics, by Wikipedia page. Contents 1 General 2 Error 3 Elementary and special functions 4 Numerical linear algebra …   Wikipedia

  • Pseudorandom number generator — A pseudorandom number generator (PRNG), also known as a deterministic random bit generator (DRBG),[1] is an algorithm for generating a sequence of numbers that approximates the properties of random numbers. The sequence is not truly random in… …   Wikipedia

  • Rejection sampling — In mathematics, rejection sampling is a technique used to generate observations from a distribution. It is also commonly called the acceptance rejection method or accept reject algorithm .It generates sampling values from an arbitrary probability …   Wikipedia

  • Box-Muller transform — A Box Muller transform (by George Edward Pelham Box and Mervin Edgar Muller 1958) [ [http://projecteuclid.org/DPubS/Repository/1.0/Disseminate?view=body id=pdf 1 handle=euclid.aoms/1177706645 G. E. P. Box and Mervin E. Muller, A Note on the… …   Wikipedia

  • IMSL — International Mathematics and Statistics Library Développeur Visual Numerics Type Analyse Numérique et Statistique IMSL C …   Wikipédia en Français

  • IMSL Bibliothèques Numériques — International Mathematics and Statistics Library Développeur Visual Numerics Type Analyse Numérique et Statistique IMSL C …   Wikipédia en Français

  • International Mathematics and Statistics Library — Développeur Visual Numerics Type Analyse Numérique et Statistique IMSL C …   Wikipédia en Français

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”