Kendall tau distance

The Kendall tau distance is a metric that counts the number of pairwise disagreements between two lists. The larger the distance, the more dissimilar the two lists are. Kendall tau distance is also called bubble-sort distance since it is equivalent to the number of swaps that the bubble sort algorithm would make to place one list in the same order as the other list. The Kendall tau distance was created by Maurice Kendall.
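The bubble-sort interpretation can be illustrated with a minimal Python sketch. It assumes both lists contain the same distinct items; the function name `bubble_sort_swaps` is hypothetical and chosen only for this illustration.

```python
def bubble_sort_swaps(list_a, list_b):
    """Count the adjacent swaps bubble sort makes to put list_a into list_b's order.

    Assumes both lists hold the same distinct, hashable items.
    """
    # Position of every item in the target ordering (list_b).
    target_pos = {item: k for k, item in enumerate(list_b)}
    # Re-express list_a as the target positions of its items.
    seq = [target_pos[item] for item in list_a]

    swaps = 0
    # Plain bubble sort: each pass moves the largest remaining value to the end.
    for end in range(len(seq) - 1, 0, -1):
        for k in range(end):
            if seq[k] > seq[k + 1]:
                seq[k], seq[k + 1] = seq[k + 1], seq[k]
                swaps += 1
    return swaps


print(bubble_sort_swaps(["a", "b", "c"], ["c", "b", "a"]))
```

Each swap removes exactly one out-of-order pair, so the swap count equals the number of pairwise disagreements between the two lists.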

Definition

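The Kendall tau distance between two lists \tau_1 and \tau_2, where \tau_1(i) and \tau_2(i) denote the rank of element i in each list, is

: K(\tau_1, \tau_2) = |\{(i,j) : i < j, (\tau_1(i) < \tau_1(j) \wedge \tau_2(i) > \tau_2(j)) \vee (\tau_1(i) > \tau_1(j) \wedge \tau_2(i) < \tau_2(j))\}|.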

K(\tau_1, \tau_2) will be equal to 0 if the two lists are identical and n(n-1)/2 (where n is the list size) if one list is the reverse of the other. Often Kendall tau distance is normalized by dividing by n(n-1)/2 so a value of 1 indicates maximum disagreement. The normalized Kendall tau distance therefore lies in the interval [0,1].
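The definition and its normalization can be sketched in Python as follows. This is an illustrative implementation, not a reference one: it assumes each list is a sequence of ranks indexed by element, with no ties, and the function names are hypothetical.

```python
from itertools import combinations


def kendall_tau_distance(tau_1, tau_2):
    """Count index pairs i < j that the two rank lists order oppositely."""
    if len(tau_1) != len(tau_2):
        raise ValueError("both lists must have the same length")
    return sum(
        1
        for i, j in combinations(range(len(tau_1)), 2)
        if (tau_1[i] < tau_1[j] and tau_2[i] > tau_2[j])
        or (tau_1[i] > tau_1[j] and tau_2[i] < tau_2[j])
    )


def normalized_kendall_tau_distance(tau_1, tau_2):
    """Divide the distance by n(n-1)/2 so the result lies in [0, 1]."""
    n = len(tau_1)
    if n < 2:
        raise ValueError("normalization needs at least two elements")
    return kendall_tau_distance(tau_1, tau_2) / (n * (n - 1) / 2)


# Identical lists give 0; a reversed list gives the maximum n(n-1)/2.
print(kendall_tau_distance([1, 2, 3, 4], [1, 2, 3, 4]))
print(kendall_tau_distance([1, 2, 3, 4], [4, 3, 2, 1]))
print(normalized_kendall_tau_distance([1, 2, 3, 4], [4, 3, 2, 1]))
```

The double loop over pairs takes O(n^2) time, which mirrors the definition directly; faster merge-sort-based inversion counting exists but is not shown here.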

Kendall tau distance may also be defined as

: K(\tau_1, \tau_2) = \sum_{\{i,j\} \in P} \bar{K}_{i,j}(\tau_1, \tau_2)

where

* P is the set of unordered pairs of distinct elements in \tau_1 and \tau_2
* \bar{K}_{i,j}(\tau_1, \tau_2) = 0 if i and j are in the same order in \tau_1 and \tau_2
* \bar{K}_{i,j}(\tau_1, \tau_2) = 1 if i and j are in the opposite order in \tau_1 and \tau_2

Kendall tau distance can also be defined as the total number of discordant pairs.
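The pair-based formulation can be written almost literally in Python. The sketch below assumes each ranking is a dictionary mapping an element to its rank, with no ties; the names `k_bar` and `kendall_tau_distance_pairs` are hypothetical.

```python
from itertools import combinations


def k_bar(i, j, tau_1, tau_2):
    """Return 0 if elements i and j are in the same order in both rankings, else 1."""
    same_order = (tau_1[i] < tau_1[j]) == (tau_2[i] < tau_2[j])
    return 0 if same_order else 1


def kendall_tau_distance_pairs(tau_1, tau_2):
    """Sum k_bar over the set P of unordered pairs of distinct elements."""
    if tau_1.keys() != tau_2.keys():
        raise ValueError("both rankings must cover the same elements")
    pairs = combinations(tau_1, 2)  # the set P
    return sum(k_bar(i, j, tau_1, tau_2) for i, j in pairs)


ranking_1 = {"x": 1, "y": 2, "z": 3}
ranking_2 = {"x": 3, "y": 2, "z": 1}
print(kendall_tau_distance_pairs(ranking_1, ranking_2))
```

Every pair counted with value 1 is a discordant pair, so the sum is the total number of discordant pairs.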

Example

Suppose we rank a group of five people by height and by weight:

Here person A is tallest and third-heaviest, and so on.

In order to calculate the Kendall tau distance, pair each person with every other person and count the number of times the values in list 1 are in the opposite order of the values in list 2.

Since there are 4 pairs whose values are in opposite order, the Kendall tau distance is 4. The normalized Kendall tau distance is

: \frac{4}{5(5 - 1)/2} = 0.4.

A value of 0.4 indicates that 40% of the pairs are ordered differently in the two rankings.
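The calculation can be reproduced with a short Python sketch. The rank values below are an assumption: they are illustrative numbers chosen to be consistent with the description above (person A is tallest and third-heaviest, and four pairs are in opposite order), not data taken from the original table.

```python
from itertools import combinations

# Assumed, illustrative ranks (1 = tallest / heaviest); not the original table.
height_rank = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}
weight_rank = {"A": 3, "B": 4, "C": 1, "D": 2, "E": 5}

# A pair is discordant when the two rank differences have opposite signs.
discordant = [
    (p, q)
    for p, q in combinations(height_rank, 2)
    if (height_rank[p] - height_rank[q]) * (weight_rank[p] - weight_rank[q]) < 0
]

distance = len(discordant)
n = len(height_rank)
normalized = distance / (n * (n - 1) / 2)

print(discordant)
print(distance, normalized)
```

With these assumed ranks the discordant pairs are (A, C), (A, D), (B, C) and (B, D), giving a distance of 4 out of 10 possible pairs.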

See also

* Bubble sort
* Kendall's tau
* Spearman's rank correlation coefficient

References

* Kendall, M. (1948) "Rank Correlation Methods", Charles Griffin & Company Limited

* Kendall, M. (1938) "A New Measure of Rank Correlation", Biometrika, 30, 81-89.

External links

* [http://rsscse.org.uk/ts/bts/noether/text.html Why Kendall tau?]
* [http://www.wessa.net/rwasp_kendall.wasp Online software: computes Kendall's tau rank correlation]


Wikimedia Foundation. 2010.
