Comparison sort

Comparison sort
Sorting a set of unlabelled weights by weight using only a balance scale requires a comparison sort algorithm

A comparison sort is a type of sorting algorithm that only reads the list elements through a single abstract comparison operation (often a "less than or equal to" operator) that determines which of two elements should occur first in the final sorted list. The only requirement is that the operator obey two of the properties of a total order:

  1. if ab and bc then ac (transitivity)
  2. for all a and b, either ab or b < a (totalness or trichotomy).

It is possible that both ab and ba; in this case either may come first in the sorted list. In a stable sort, the input order determines the sorted order in this case.

A metaphor for thinking about comparison sorts is that someone has a set of unlabelled weights and a balance scale. Their goal is to line up the weights in order by their weight without any information except that obtained by placing two weights on the scale and seeing which one is heavier (or if they weigh the same).

Contents

Examples

A bubble sort, a sorting algorithm that continuously steps through a list, swapping items until they appear in the correct order.

Some of the most well-known comparison sorts include:

There are many integer sorting algorithms that are not comparison sorts; they include:

Performance limits and advantages of different sorting techniques

There are fundamental limits on the performance of comparison sorts. A comparison sort must have a lower bound of Ω(n log n) comparison operations[1]. This is a consequence of the limited information available through comparisons alone — or, to put it differently, of the vague algebraic structure of totally ordered sets. In this sense, mergesort, heapsort, and introsort are asymptotically optimal in terms of the number of comparisons they must perform, although this metric neglects other operations. The three non-comparison sorts above achieve O(n) performance by using operations other than comparisons, allowing them to sidestep this lower bound (assuming elements are constant-sized).

Nevertheless, comparison sorts offer the notable advantage that control over the comparison function allows sorting of many different datatypes and fine control over how the list is sorted. For example, reversing the result of the comparison function allows the list to be sorted in reverse; for instance, one can sort a list of tuples in lexicographic order by just creating a comparison function that compares each part in sequence:

function tupleCompare((lefta, leftb, leftc), (righta, rightb, rightc))
    if lefta ≠ righta
        return compare(lefta, righta)
    else if leftb ≠ rightb
        return compare(leftb, rightb)
    else
        return compare(leftc, rightc)

Balanced ternary notation allows comparisons to be made in one step, whose result will be one of "less than", "greater than" or "equal to".

Comparison sorts generally adapt more easily to complex orders such as the order of floating-point numbers. Additionally, once a comparison function is written, any comparison sort can be used without modification; non-comparison sorts typically require specialized versions for each datatype.

This flexibility, together with the efficiency of the above comparison sorting algorithms on modern computers, has led to widespread preference for comparison sorts in most practical work.

Number of comparisons required to sort a list

The number of comparisons that a comparison sort algorithm requires increases in proportion to nlog(n), where n is the number of elements to sort. This bound is asymptotically tight:

Given a list of distinct numbers (we can assume this because this is a worst-case analysis), there are n factorial permutations exactly one of which is the list in sorted order. The sort algorithm must gain enough information from the comparisons to identify the correct permutation. If the algorithm always completes after at most f(n) steps, it cannot distinguish more than 2f(n) cases because the keys are distinct and each comparison has only two possible outcomes. Therefore,

2^{f(n)}\geq n!, or equivalently f(n)\geq\log_2(n!).

From Stirling's approximation we know that log 2(n!) is Ω(nlog 2n). This provides the lower-bound part of the claim.

An identical upper bound follows from the existence of the algorithms that attain this bound in the worst case.

The above argument provides an absolute, rather than only asymptotic lower bound on the number of comparisons, namely \lceil\log_2(n!)\rceil comparisons. This lower bound is fairly good (it can be approached within a linear tolerance by a simple merge sort), but it is known to be inexact. For example, \lceil\log_2(13!)\rceil=33, but the minimal number of comparisons to sort 13 elements has been proved to be 34 [2].

Determining the exact number of comparisons needed to sort a given number of entries is a computationally hard problem even for small n, and no simple formula for the solution is known. For some of the few concrete values that have been computed, see OEISA036604.

Lower bound for the average number of comparisons

A similar bound applies to the average number of comparisons. Assuming that

  • all keys are distinct, i.e. every comparison will give either a>b or a<b, and
  • the input is a random permutation, chosen uniformly from the set of all possible permutations of n elements,

it is impossible to determine which order the input is in with fewer than log2(n!) comparisons on average.

This can be most easily seen using concepts from information theory. The Shannon entropy of such a random permutation is log2(n!) bits. Since a comparison can give only two results, the maximum amount of information it provides is 1 bit. Therefore after k comparisons the remaining entropy of the permutation, given the results of those comparisons, is at least log2(n!) - k bits on average. To perform the sort, complete information is needed, so the remaining entropy must be 0. It follows that k must be at least log2(n!).

Note that this differs from the worst case argument given above, in that it does not allow rounding up to the nearest integer. For example, for n = 3, the lower bound for the worst case is 3, the lower bound for the average case as shown above is approximately 2.58, while the highest lower bound for the average case is 8/3, approximately 2.67.

In the case that multiple items may have the same key, there is no obvious statistical interpretation for the term "average case", so an argument like the above cannot be applied without making specific assumptions about the distribution of keys.

Notes

  1. ^ http://planetmath.org/encyclopedia/LowerBoundForSorting.html
  2. ^ Marcin Peczarski: The Ford-Johnson algorithm is still unbeaten for less than 47 elements. Inf. Process. Lett. 101(3): 126-128 (2007) doi:10.1016/j.ipl.2006.09.001

References


Wikimedia Foundation. 2010.

Игры ⚽ Нужна курсовая?

Look at other dictionaries:

  • Sort (C++) — sort is a function in C++ Standard Template Library that takes two random access iterators, the start and the end, as arguments and performs a comparison sort on the range of elements between the two iterators, front inclusive and end exclusive.… …   Wikipedia

  • Comparison — For comparisons within Wikipedia, see Category:Comparisons. Contents 1 Computer science 2 Language 3 Mathemat …   Wikipedia

  • Comparison of browser synchronizers — The following tables compare general and technical information for a number of browser synchronizers. Please see the individual products articles for further information. This article is not all inclusive or necessarily up to date. Unless… …   Wikipedia

  • Comparison of image viewers — This article presents a comparison of image viewers and image organizers which can be used for image viewing. Contents 1 General information 2 Supported file formats 3 Supported desktop environments …   Wikipedia

  • Comparison of statistics journals — This is a comparison of peer reviewed scientific journals published in the field of statistics. Contents 1 General information 2 Impact, indexing, abstracting and reviewing 3 Notes 4 …   Wikipedia

  • Comparison of web browsers — September 2011, web browser usage share. Source: Median values from summary table …   Wikipedia

  • Comparison of Pascal and C — Programming language comparisons General comparison Basic syntax Basic instructions Arrays Associative arrays String operations …   Wikipedia

  • Comparison (computer programming) — Ifeq redirects here. For the MediaWiki feature, see Wikipedia:Ifeq In computer programming, comparison of two data items is effected by the comparison operators typically written as: > (greater than) < (less than) >= (greater than or… …   Wikipedia

  • Comparison of the health care systems in Canada and the United States — Health spending per capita, in U.S. dollars PPP adjusted, with the U.S. and Canada compared amongst other first world nations. Comparison of the health care systems in Canada and the United States are often made by government, public health and… …   Wikipedia

  • Comparison of Android e-book reader software — Contents 1 Navigation features 2 Display features 3 Edit/tool features 4 Book source management features …   Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”