Information geometry

In mathematics, and especially in statistical inference, information geometry is the study of probability and information by way of differential geometry. It reached maturity through the work of Shun'ichi Amari in the 1980s, whose book "Differential-Geometrical Methods in Statistics" is currently the canonical reference.

Introduction

The main tenet of information geometry is that many important structures in probability theory, information theory and statistics can be treated as structures in differential geometry by regarding a space of probabilities as a differential manifold endowed with a Riemannian metric and a family of affine connections. For example,
* The Fisher information metric is a Riemannian metric.
* The Kullback-Leibler divergence is one of a family of divergences related to dual affine connections.
* An exponential family is a flat submanifold under the e-affine connection (a worked form appears after this list).
* The maximum likelihood estimate is a projection under the m-affine connection.
* The existence and uniqueness of the maximum likelihood estimate on exponential families is a consequence of the e- and m-connections being dual affine connections.
* The EM algorithm is, under broad conditions, an iterative dual projection method under the e-connection and m-connection.
* The accuracy of estimators, in particular the first- and third-order efficiency of estimators, can be represented in terms of embedding curvatures of the manifold representing the statistical model and the manifold representing the estimator (the second-order term always vanishes after bias correction).
* The higher-order asymptotic power of a statistical test can be represented using geometric quantities.
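
As a concrete illustration of the flatness of exponential families, write such a family in its natural parameters θ,

:p(x; \theta) = h(x) \exp\Big( \sum_i \theta_i T_i(x) - \psi(\theta) \Big),

where \psi is the cumulant (log-partition) function. In these coordinates the family is a flat submanifold with respect to the e-connection, the Fisher metric is the Hessian g_{ij} = \partial^2 \psi / \partial \theta_i \partial \theta_j, and the Kullback-Leibler divergence between members reduces to the Bregman divergence of \psi:

:D_{KL}(p_\theta \,\|\, p_{\theta'}) = \psi(\theta') - \psi(\theta) - \langle \theta' - \theta, \nabla \psi(\theta) \rangle.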

The importance of studying statistical structures as geometrical structures lies in the fact that geometric structures are invariant under coordinate transformations. For example, a family of probability distributions, such as the Gaussian distributions, may be transformed into another family, such as the log-normal distributions, by a change of variables. However, being an exponential family is unchanged, since it is a geometric property, and the distance between two distributions in the family, defined through the Fisher metric, is also preserved.
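
Concretely, under a smooth change of parameters \theta \to \eta the Fisher metric transforms as a tensor,

:\tilde g_{kl}(\eta) = \sum_{i,j} g_{ij}(\theta(\eta))\, \frac{\partial \theta_i}{\partial \eta_k}\, \frac{\partial \theta_j}{\partial \eta_l},

so the line element \sum_{i,j} g_{ij}\, d\theta_i\, d\theta_j, and with it the distance between two distributions, does not depend on the parametrization chosen.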

The statistician Ronald Fisher recognized in the 1920s that there is an intrinsic measure of the amount of information carried by statistical estimators. The Fisher information matrix was shown by Cramér and Rao to be a Riemannian metric on the space of probabilities, and it became known as the Fisher information metric.

The mathematician Cencov (Chentsov) proved in the 1960s and 1970s that on the space of probability distributions on a sample space containing at least three points,
* There exists a unique intrinsic metric: the Fisher information metric.
* There exists a unique one-parameter family of affine connections: the family of alpha-affine connections later popularized by Amari.

Both uniqueness results hold, of course, up to multiplication by a constant.

Amari and Nagaoka's work in the 1980s brought all these results together through the introduction of the concept of dual affine connections and the study of the interplay among metric, affine connection and divergence. In particular,
* Given a Riemannian metric "g" and a family of dual affine connections Γ_alpha, there exists a unique set of dual divergences D_alpha defined by them.
* Given the family of dual divergences D_alpha, the metric and the affine connections can be determined uniquely by second-order and third-order differentiation (see the formulas below).

Also, Amari and Kumon showed that the asymptotic efficiency of estimators and tests can be represented by geometrical quantities.
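
The recovery follows a standard recipe due to Eguchi: differentiating the divergence at the diagonal, once in each argument, yields the metric, while a third derivative yields the connection. Writing \partial'_k for differentiation in the second argument,

:g_{ij}(\theta) = -\,\partial_i \partial'_j\, D(\theta \,\|\, \theta') \big|_{\theta'=\theta}, \qquad \Gamma_{ij,k}(\theta) = -\,\partial_i \partial_j \partial'_k\, D(\theta \,\|\, \theta') \big|_{\theta'=\theta},

and the dual connection is obtained by exchanging the roles of the two arguments.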

Basic concepts

* Statistical manifold: a space of probability distributions (a statistical model).
* Point on the manifold: a probability distribution.
* Coordinates: the parameters of the statistical model.
* Tangent vector: the Fisher score function.
* Riemannian metric: the Fisher information metric.
* Affine connections.
* Curvatures: associated with information loss.
* Information divergence.

Fisher information metric as a Riemannian metric

Information geometry is based primarily on the Fisher information metric:

:g_{ij} = \int \frac{\partial \log p(x,\theta)}{\partial \theta_i}\, \frac{\partial \log p(x,\theta)}{\partial \theta_j}\, p(x,\theta)\, dx.

Substituting "i" = −log("p") from information theory, the formula becomes:

:g_{ij} = \int \frac{\partial i(x,\theta)}{\partial \theta_i}\, \frac{\partial i(x,\theta)}{\partial \theta_j}\, p(x,\theta)\, dx.

Intuitively, this says that the infinitesimal distance between two nearby points on a statistical differential manifold measures the informational difference between the corresponding distributions.
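
As a sanity check on the definition, the metric can be estimated numerically as the expected outer product of the score. The following minimal Python sketch (the variable names and Monte Carlo setup are illustrative, not from the original text) does this for the univariate Gaussian family p(x; μ, σ), whose Fisher metric is known in closed form to be diag(1/σ², 2/σ²):

import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 2.0
x = rng.normal(mu, sigma, size=1_000_000)

# Score components: partial derivatives of log p(x; mu, sigma)
score_mu = (x - mu) / sigma**2
score_sigma = (x - mu)**2 / sigma**3 - 1.0 / sigma

# Empirical Fisher metric: the expectation E[score_i * score_j]
scores = np.stack([score_mu, score_sigma])
g_mc = scores @ scores.T / x.size

print(np.round(g_mc, 3))                       # approx [[0.25, 0], [0, 0.5]]
print(np.diag([1 / sigma**2, 2 / sigma**2]))   # exact  [[0.25, 0], [0, 0.5]]

The agreement between the Monte Carlo estimate and the closed form illustrates that the metric is simply the covariance of the score.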

Thus, if a point in information space represents the state of a system, then under random perturbations its trajectory will, on average, be a random walk through information space, i.e. it will diffuse as a Brownian motion.

With this in mind, the information space can be thought of as a fitness landscape, with a trajectory through this space being an "evolution". In this heuristic picture, the Brownian motion of evolutionary trajectories corresponds to the "no free lunch" phenomenon discussed by Stuart Kauffman.

History

The history of information geometry is associated with the discoveries of, among many others, the following people:
* Sir Ronald Aylmer Fisher
* Harald Cramér
* Calyampudi Radhakrishna Rao
* Solomon Kullback
* Richard Leibler
* Claude Shannon
* Imre Csiszár
* Nikolai Chentsov (Cencov)
* Bradley Efron
* Paul Vos
* Shun-ichi Amari
* Hiroshi Nagaoka
* Robert Kass
* Shinto Eguchi
* Ole Barndorff-Nielsen

Some applications

Natural gradient

An important concept in information geometry is the natural gradient. It adjusts the update direction of a gradient-based learning rule to take into account the curvature of the underlying statistical manifold, by way of the Fisher information metric: the ordinary gradient is premultiplied by the inverse Fisher information matrix.
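
In symbols, if L(θ) is the loss being minimized and F(θ) is the Fisher information matrix of the model, the natural gradient update is

:\theta_{t+1} = \theta_t - \eta\, F(\theta_t)^{-1} \nabla_\theta L(\theta_t).

A minimal Python sketch follows, assuming a one-parameter Bernoulli model fitted by maximum likelihood (the model choice and all names are illustrative); here F is a scalar, and the natural gradient step points exactly toward the maximum likelihood estimate:

import numpy as np

def nll_grad(theta, data):
    # Gradient of the negative log-likelihood of Bernoulli(theta) data
    k, n = data.sum(), len(data)
    return -(k / theta - (n - k) / (1.0 - theta))

def natural_gradient_step(theta, data, lr=0.1):
    # Fisher information of n i.i.d. Bernoulli(theta) samples
    fisher = len(data) / (theta * (1.0 - theta))
    return theta - lr * nll_grad(theta, data) / fisher

rng = np.random.default_rng(0)
data = (rng.random(500) < 0.3).astype(float)   # draws from Bernoulli(0.3)

theta = 0.5
for _ in range(50):
    theta = natural_gradient_step(theta, data)
print(round(theta, 3))                         # close to the sample mean, about 0.3

Premultiplying by F^{-1} is what makes the update invariant under reparametrization of θ, which is the property that distinguishes the natural gradient from the ordinary gradient.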

This concept has many important applications in blind signal separation, neural networks, artificial intelligence, and other engineering problems that deal with information. Experimental results have shown that application of the concept leads to substantial performance gains.

References

* Shun'ichi Amari - "Differential-Geometrical Methods in Statistics", Lecture Notes in Statistics, Springer-Verlag, Berlin, 1985.
* Shun'ichi Amari and Hiroshi Nagaoka - "Methods of Information Geometry", Translations of Mathematical Monographs, v. 191, American Mathematical Society, 2000.
* M. Murray and J. Rice - "Differential Geometry and Statistics", Monographs on Statistics and Applied Probability 48, Chapman and Hall, 1993.
* R. E. Kass and P. W. Vos - "Geometrical Foundations of Asymptotic Inference", Series in Probability and Statistics, Wiley, 1997.
* N. N. Cencov - "Statistical Decision Rules and Optimal Inference", Translations of Mathematical Monographs, v. 53, American Mathematical Society, 1982.

