Bregman divergence

In mathematics, the Bregman divergence or Bregman distance is similar to a metric, but satisfies neither the triangle inequality nor symmetry. Bregman divergences are important in two ways. First, they generalize squared Euclidean distance to a class of distances that all share similar properties. Second, they bear a strong connection to exponential families of distributions: as shown by Banerjee et al. (2005), there is a bijection between regular exponential families and regular Bregman divergences.

Bregman divergences are named after L. M. Bregman, who introduced the concept in 1967. More recently, researchers in geometric algorithms have shown that many important algorithms can be generalized from Euclidean metrics to distances defined by a Bregman divergence (Banerjee et al. 2005; Nielsen and Nock 2006; Nielsen et al. 2007).

Definition

Let F : \Delta \to \mathbb{R} be a continuously differentiable, real-valued, strictly convex function defined on a closed convex set \Delta.

The Bregman distance associated with F for points p, q \in \Delta is:

B_F(p \Vert q) = F(p) - F(q) - \langle \nabla F(q), p - q \rangle

Intuitively, this can be thought of as the difference between the value of F at point p and the value of the first-order Taylor expansion of F around point q, evaluated at point p.
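This first-order-remainder view translates directly into code. Below is a minimal numerical sketch using NumPy (the names bregman, F, and grad_F are illustrative, not from any particular library), shown for the squared Euclidean generator:

```python
import numpy as np

def bregman(F, grad_F, p, q):
    """B_F(p || q) = F(p) - F(q) - <grad F(q), p - q>."""
    return F(p) - F(q) - np.dot(grad_F(q), p - q)

# Example generator: F(x) = ||x||^2, with gradient 2x
F      = lambda x: np.dot(x, x)
grad_F = lambda x: 2 * x

p = np.array([1.0, 2.0])
q = np.array([3.0, 0.0])
print(bregman(F, grad_F, p, q))  # 8.0, which equals ||p - q||^2
```

With this generator the first-order Taylor remainder is exactly the squared Euclidean distance, matching the first example below under "Examples".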

Properties

* Non-negativity: B_F(p \Vert q) \ge 0 for all p, q. This is a consequence of the convexity of F.
* Convexity: B_F(p \Vert q) is convex in its first argument, but not necessarily in the second argument.
* Linearity: If we think of the Bregman distance as an operator on the function F, then it is linear with respect to non-negative coefficients. In other words, for F_1, F_2 strictly convex and differentiable, and \lambda > 0,

:B_{F_1 + \lambda F_2}(p \Vert q) = B_{F_1}(p \Vert q) + \lambda B_{F_2}(p \Vert q)

* Duality: The function F has a convex conjugate F^*. The Bregman distance defined with respect to F^* has an interesting relationship to B_F(p \Vert q):

:B_{F^*}(p^* \Vert q^*) = B_F(q \Vert p)

Here, p^* = \nabla F(p) is the dual point corresponding to p.
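The duality identity can be checked numerically. A small sketch assuming NumPy, using the negative-entropy generator F(p) = \sum_i p_i \log p_i on the positive orthant, whose convex conjugate is F^*(y) = \sum_i e^{y_i - 1} (all function names here are illustrative):

```python
import numpy as np

def bregman(F, grad_F, a, b):
    return F(a) - F(b) - np.dot(grad_F(b), a - b)

# Generator: negative entropy, strictly convex on the positive orthant
F      = lambda p: np.sum(p * np.log(p))
grad_F = lambda p: np.log(p) + 1
# Its convex conjugate and the conjugate's gradient
F_star      = lambda y: np.sum(np.exp(y - 1))
grad_F_star = lambda y: np.exp(y - 1)

p = np.array([0.2, 0.8])
q = np.array([0.5, 0.5])
p_star, q_star = grad_F(p), grad_F(q)  # dual points p* = grad F(p)

# B_{F*}(p* || q*) equals B_F(q || p) -- note the swapped arguments
lhs = bregman(F_star, grad_F_star, p_star, q_star)
rhs = bregman(F, grad_F, q, p)
print(np.isclose(lhs, rhs))  # True
```

Note that the arguments swap order on the primal side, reflecting the asymmetry of the divergence.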

Examples

* Squared Euclidean distance B_F(x \Vert y) = \|x - y\|^2 is the canonical example of a Bregman distance, generated by the convex function F(x) = \|x\|^2.
* The generalized Kullback–Leibler divergence B_F(p \Vert q) = \sum_i p(i) \log \frac{p(i)}{q(i)} - \sum_i p(i) + \sum_i q(i) is generated by the convex function F(p) = \sum_i p(i) \log p(i) - \sum_i p(i).
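The second example can be verified directly: plugging F(p) = \sum_i p(i) \log p(i) - \sum_i p(i), whose gradient is \log p, into the Bregman formula recovers the generalized KL divergence. A small sketch assuming NumPy (function names are illustrative):

```python
import numpy as np

def bregman(F, grad_F, p, q):
    return F(p) - F(q) - np.dot(grad_F(q), p - q)

# Generator of generalized KL: F(p) = sum_i p_i log p_i - sum_i p_i
F      = lambda p: np.sum(p * np.log(p)) - np.sum(p)
grad_F = lambda p: np.log(p)

def generalized_kl(p, q):
    return np.sum(p * np.log(p / q)) - np.sum(p) + np.sum(q)

p = np.array([0.3, 0.7])
q = np.array([0.6, 0.4])
print(np.isclose(bregman(F, grad_F, p, q), generalized_kl(p, q)))  # True
```

When p and q are probability distributions the last two sums cancel, leaving the familiar KL divergence.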

Generalizing projective duality

A key tool in computational geometry is the idea of projective duality, which maps points to hyperplanes and vice versa, while preserving incidence and above-below relationships. There are numerous analytical forms of the projective dual: one common form maps the point p = (p_1, \ldots, p_d) to the hyperplane x_{d+1} = \sum_{i=1}^d 2 p_i x_i. This mapping can be interpreted (identifying the hyperplane with its normal) as the convex conjugate mapping that takes the point p to its dual point p^* = \nabla F(p), where F defines the d-dimensional paraboloid x_{d+1} = \sum_i x_i^2.

If we now replace the paraboloid by an arbitrary convex function, we obtain a different dual mapping that retains the incidence and above-below properties of the standard projective dual. This implies that natural dual concepts in computational geometry like Voronoi diagrams and Delaunay triangulations retain their meaning in distance spaces defined by an arbitrary Bregman divergence. Thus, algorithms from "normal" geometry extend directly to these spaces (Nielsen, Boissonnat and Nock, 2007).
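The connection between the dual hyperplane and the Bregman distance can be made concrete: the hyperplane dual to a point q is the tangent plane to the graph of F at q, and the vertical gap between the lifted point (p, F(p)) and that plane is exactly B_F(p \Vert q). A minimal sketch assuming NumPy, shown for the paraboloid lift (names illustrative):

```python
import numpy as np

F      = lambda x: np.dot(x, x)  # paraboloid lift: p -> (p, ||p||^2)
grad_F = lambda x: 2 * x

def tangent_height(q, x):
    """Height at x of the hyperplane tangent to the graph of F at q."""
    return F(q) + np.dot(grad_F(q), x - q)

p = np.array([1.0, -2.0])
q = np.array([0.5, 3.0])

# Vertical gap between the lifted point (p, F(p)) and q's tangent plane
gap = F(p) - tangent_height(q, p)
print(np.isclose(gap, np.dot(p - q, p - q)))  # True: gap = B_F(p || q) = ||p - q||^2
```

Because p always lies on or above every tangent plane of a convex F, the gap is non-negative, which is the above-below property the dual mapping preserves.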

References

* Banerjee, Arindam; Merugu, Srujana; Dhillon, Inderjit S.; Ghosh, Joydeep (2005). "Clustering with Bregman divergences". Journal of Machine Learning Research 6: 1705–1749. http://portal.acm.org/citation.cfm?id=1194902

* Bregman, L. M. (1967). "The relaxation method of finding the common points of convex sets and its application to the solution of problems in convex programming". USSR Computational Mathematics and Mathematical Physics 7: 200–217. doi:10.1016/0041-5553(67)90040-7

* Nielsen, Frank; Boissonnat, Jean-Daniel; Nock, Richard (2007). "On Visualizing Bregman Voronoi Diagrams". Proc. 23rd ACM Conference on Computational Geometry (video track). http://www.csl.sony.co.jp/person/nielsen/PT/SoCG07/

* Nielsen, Frank; Boissonnat, Jean-Daniel; Nock, Richard (2007). "On Bregman Voronoi Diagrams". Proc. 18th ACM-SIAM Symposium on Discrete Algorithms (SODA). http://www.csl.sony.co.jp/person/nielsen/BregmanVoronoi/

* Nielsen, Frank; Nock, Richard (2006). "On approximating the smallest enclosing Bregman balls". Proc. 22nd Annual Symposium on Computational Geometry, pp. 485–486. doi:10.1145/1137856.1137931

External links

* [http://www.csl.sony.co.jp/person/nielsen/BregmanDivergence/index.html Bregman divergence interactive applet]
* [http://www.csl.sony.co.jp/person/nielsen/BVDapplet/index.html Bregman Voronoi diagram applet]
* [http://www.sonycsl.co.jp/person/nielsen/BregmanBall/MINIBALL/ Exact Smallest Enclosing Bregman Ball applet]
* [http://www.sonycsl.co.jp/person/nielsen/BregmanBall/BBC/index.html Approximating smallest enclosing Bregman ball applet]

