Geometric median

Geometric median

The geometric median of a discrete set of sample points in a Euclidean space is the point minimizing the sum of distances to the sample points. This generalizes the median, which has the property of minimizing the sum of distances for one-dimensional data, and provides a central tendency in higher dimensions. It is also known as the Fermat–Weber point or 1-median. [The more general "k"-median problem asks for the location of "k" cluster centers minimizing the sum of distances from each sample point to its nearest center.]

The geometric median is an important estimator of location in statistics. It is also a standard problem in facility location, where it models the problem of locating a facility to minimize the cost of transportation.

The special case of the problem for three points in the plane (that is, "m" = 3 and "n" = 2) is sometimes also known as Fermat's problem; it arises in the construction of minimal Steiner trees, and was originally posed as a problem by Pierre de Fermat to Evangelista Torricelli, who solved it. Its solution is now known as the Fermat point of the triangle formed by the three sample points. Alfred Weber's name is associated with the more general Fermat–Weber problem due to a discussion of the problem in his 1909 book on facility location.

Wesolowsky (1993) provides a survey of the problem. See Fekete, Mitchell, and Beurer (2003) for generalizations of the problem to non-discrete point sets.

Definition

Formally, for given a set of "m" points x_1, x_2, dots, x_m, with each x_i in mathbb{R}^n, the geometric median is defined as

:Geometric Median =underset{y in mathbb{R}^n}{operatorname{argmin sum_{i=1}^m left | x_i-y ight |

Note that "argmin" means the argument for which the sum is minimized. In this case, it is the point y from where the sum of all Euclidean distances to the x_i's is minimum.

Properties

* For the 1-dimensional case, the geometric median coincides with the median. This is because the univariate median also minimizes the sum of distances from the points.
* The geometric median is unique whenever the points are not collinear.
* The geometric median is equivariant for Euclidean similarity transformations, including translation and rotation. This means that one would get the same result either by transforming the geometric median, or by applying the same transformation to the sample data and finding the geometric median of the transformed data. This property follows from the fact that the geometric median is defined only from pairwise distances, and doesn't depend on the system of orthogonal Cartesian coordinates by which the sample data is represented. In contrast, the component-wise median for a multivariate data set is not in general rotation invariant, nor is it independent of the choice of coordinates.
* The geometric median has a breakdown point of 0.5. [Lopuhaä and Rousseeuw (1991).] That is, up to half of the sample data may be arbitrarily corrupted, and the median of the samples will still provide a robust estimator for the location of the uncorrupted data.

pecial cases

*For 3 points, if any angle of the triangle is more than 120° then the geometric median is the point making that angle. If all the angles are less than 120°, the geometric median is the point inside the triangle which subtends an angle of 120° to all three pairs of points. This is also known as the Fermat point of the triangle formed by the three points.
*For 4 coplanar points, if one of the four points is inside the triangle formed by the other three points, then the geometric median is that point. Otherwise, the points form a convex quadrilateral and the geometric median is the crossing point of the diagonals of the quadrilateral. The geometric median of four points is also known as the Radon point of the four points.

Computation

Despite being an easy to understand concept, computing the geometric median poses a challenge. The centroid or center of mass, defined similarly to the geometric median as minimizing the sum of the squares of the distances to each sample, can be found by a simple formula — its coordinates are the averages of the coordinates of the samples — but no such formula is known for the geometric median, and it has been shown that no formula involving only arithmetic operations and "k"th roots can exist in general. [Cockayne and Melzak (1969); Bajaj (1988).]

However, it is straightforward to calculate an approximation to the geometric median using an iterative procedure in which each step produces a more accurate approximation. Procedures of this type can be derived from the fact that the sum of distances is a convex function, since the distance to each sample point is convex and the sum of convex functions remains convex. Therefore, procedures that decrease the sum of distances at each step cannot get trapped in a local optimum.

One common approach of this type, called Weiszfeld's algorithm [Weiszfeld (1937); Kuhn (1973); Chandrasekaran and Tamir (1989).] , is a form of iteratively re-weighted least squares. This algorithm defines a set of weights that are inversely proportional to the distances from the current estimate to the samples, and creates a new estimate that is the weighted average of the samples according to these weights. That is,:left. y_{i+1}=left( sum_{j=1}^m frac{x_j}{| x_j - y_i ight) ight/ left( sum_{j=1}^m frac{1}{| x_j - y_i ight).

Bose et al (2003) describe more sophisticated geometric optimization procedures for finding approximately optimal solutions to this problem.

Implicit formula

If "y" is distinct from all the given points, "x""j", then "y" is the geometric median if and only if it satisfies::0 = sum_{j=1}^m frac {x_j - y} {left | x_j - y ight .

This is equivalent to::left. y = left( sum_{j=1}^m frac{x_j}{| x_j - y ight) ight/ left( sum_{j=1}^m frac{1}{| x_j - y ight)

which is closely related to Weiszfeld's algorithm.

If "y" is equal to some of the given points, then "y" is the geometric median if and only if there are vectors "u""j" such that::0 = sum_{j=1}^m u_j

where for "x""j" ≠ "y", :u_j = frac {x_j - y} {left | x_j - y ight

and for "x""j" = "y", :| u_j | leq 1 .

ee also

* Central tendency
* Centroid, which minimizes the sum of "squares" of Euclidean distance

Notes

References

*cite journal
author = Bajaj, C.
title = The algebraic degree of geometric optimization problems
journal = Discrete and Computational Geometry
year = 1988
volume = 3
pages = 177–191
doi = 10.1007/BF02187906

*cite journal
title = Fast approximations for sums of distances, clustering and the Fermat–Weber problem
author = Bose, Prosenjit; Maheshwari, Anil; Morin, Pat
journal = Computational Geometry: Theory and Applications
volume = 24
issue = 3
pages = 135–146
year = 2003
doi = 10.1016/S0925-7721(02)00102-5
url = http://www.scs.carleton.ca/~jit/publications/papers/bmm01.ps

*cite journal
author = Chandrasekaran, R.; Tamir, A.
title = Open questions concerning Weiszfeld's algorithm for the Fermat-Weber location problem
journal = Mathematical Programming, Series A
volume = 44
year = 1989
pages = 293–295
doi = 10.1007/BF01587094

*cite journal
author = Cockayne, E. J.; Melzak, Z. A.
title = Euclidean constructability in graph minimization problems.
journal = Mathematics Magazine
volume = 42
pages = 206–208
year = 1969

*cite journal
author = Fekete, Sándor P.; Mitchell, Joseph S. B.; Beurer, Karin
title = On the continuous Fermat-Weber problem
year = 2003
id = arxiv | archive = cs.CG | id = 0310027

*cite journal
author = Kuhn, Harold W.
title = A note on Fermat's problem
journal = Mathematical Programming
year = 1973
volume = 4
issue = 1
pages = 98–107
doi = 10.1007/BF01584648

*cite journal
author = Lopuhaä, Hendrick P.; Rousseeuw, Peter J.
title = Breakdown points of affine equivariant estimators of multivariate location and covariance matrices
year = 1991
journal = Annals of Statistics
volume = 19
pages = 229–248
url = http://links.jstor.org/sici?sici=0090-5364(199103)19%3A1%3C229%3ABPOAEE%3E2.0.CO%3B2-1
issue = 1
doi = 10.1214/aos/1176347978

*cite book
author = Weber, Alfred
title = Über den Standort der Industrien, Erster Teil: Reine Theorie des Standortes
location = Tübingen
publisher = Mohr
year = 1909

*cite journal
author = Wesolowsky, G.
title = The Weber problem: History and perspective
journal = Location Science
volume = 1
pages = 5–23
year = 1993

*cite journal
author = Weiszfeld, E.
title = Sur le point pour lequel la somme des distances de "n" points donnes est minimum
journal = Tohoku Math. Journal
volume = 43
year = 1937
pages = 355–386


Wikimedia Foundation. 2010.

Игры ⚽ Нужен реферат?

Look at other dictionaries:

  • Median (disambiguation) — Median has different meanings in different contexts: Median, in statistics, a number that separates the lowest value half and the highest value half Median (geometry), in geometry, a line joining a vertex of a triangle to the midpoint of the… …   Wikipedia

  • Median — This article is about the statistical concept. For other uses, see Median (disambiguation). In probability theory and statistics, a median is described as the numerical value separating the higher half of a sample, a population, or a probability… …   Wikipedia

  • Median search — A median search is an algorithm that finds the median value of list. It belongs to the divide and conquer algorithm category and more precisely it is a selection algorithm. A selection algorithm finds the k th element of a list with N elements.… …   Wikipedia

  • Median (geometry) — For another use of the term median in geometry, see Geometric median. The triangle medians and the centroid. In geometry, a median of a triangle is a line segment joining a vertex to the midpoint of the opposing side. Every triangle has exactly… …   Wikipedia

  • Geometric stable distribution — Geometric Stable parameters: α ∈ (0,2] stability parameter β ∈ [−1,1] skewness parameter (note that skewness is undefined) λ ∈ (0, ∞) scale parameter μ ∈ (−∞, ∞) location parameter support: x ∈ R, or x ∈ [μ, +∞) if α < 1 and β = 1, or x ∈… …   Wikipedia

  • Geometric distribution — Probability distribution two name =Geometric type =mass pdf cdf | parameters =0< p leq 1 success probability (real) support =k in {1,2,3,dots}! pdf =(1 p)^{k 1},p! cdf =1 (1 p)^k! mean =frac{1}{p}! median =leftlceil frac{ log(2)}{log(1 p)} ight… …   Wikipedia

  • Geometric graph theory — In mathematics, a geometric graph is a graph in which the vertices or edges are associated with geometric objects or configurations. Geometric graph theory is a specialization of graph theory that studies geometric graphs. Notable geometric… …   Wikipedia

  • Median absolute deviation — In statistics, the median absolute deviation (MAD) is a robust measure of the variability of a univariate sample of quantitative data. It can also refer to the population parameter that is estimated by the MAD calculated from a sample. For a… …   Wikipedia

  • Median polish — The median polish is an exploratory data analysis procedure proposed by the statistician John Tukey. It finds an additively fit model for data in a two way layout table (usually, results from a factorial experiment) of the form row effect +… …   Wikipedia

  • median — A typical value that is identified by arranging a set of numbers in an ascending or descending scale and selecting the middle number (if there are an odd number in the set) or the arithmetic mean of the middle two numbers (if there are an even… …   Big dictionary of business and management

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”