Interpolation search

Interpolation search is an algorithm for searching for a given key value in an indexed array that has been ordered by the values of the key. It parallels how humans search through a telephone book for a particular name, the key value by which the book's entries are ordered. In each search step it calculates where in the remaining search space the sought item might be, based on the key values at the bounds of the search space and the value of the sought key, usually via a linear interpolation. The key value actually found at this estimated position is then compared to the key value being sought. If they are not equal, then depending on the comparison, the remaining search space is reduced to the part before or after the estimated position. The method works only if calculations on the size of differences between key values are sensible. Humans have a rough idea of the distribution of surnames in an alphabetical listing, and can guess that "Baker" is closer to "Abadeeah" than it is to "Zyteca" and interpolate accordingly. In the absence of information about the popularity of surnames, computer representations of text strings do not facilitate such arithmetic directly.

By comparison, binary search always chooses the middle of the remaining search space, discarding one half or the other depending on the comparison between the key value found at that position and the key value sought; the remaining search space is again reduced to the part before or after the probed position. Linear search tests only for equality, comparing elements one by one from the start and ignoring any ordering.

On average, interpolation search makes about log(log(n)) comparisons (if the elements are uniformly distributed), where n is the number of elements to be searched. In the worst case (for instance, where the numerical values of the keys increase exponentially) it can make up to O(n) comparisons.
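The contrast between the two bounds can be demonstrated with a small probe-counting sketch. The names below (ProbeCountDemo, search) and the probe counter are illustrative assumptions, and the pathological array (consecutive values followed by one huge outlier) is one standard way to force the near-linear worst case.

```java
// Hypothetical demo: count the array probes made by interpolation search on a
// uniformly spaced array versus a pathologically skewed one.
public class ProbeCountDemo {
    static int probes; // number of a[mid] probes made by the last call

    // Interpolation search returning the index of key in sorted array a, or -1.
    static int search(int[] a, int key) {
        probes = 0;
        int low = 0, high = a.length - 1;
        while (low <= high && key >= a[low] && key <= a[high]) {
            if (a[low] == a[high]) break; // divisor would be zero
            int mid = low + (int) ((long) (key - a[low]) * (high - low)
                                   / (a[high] - a[low]));
            probes++;
            if (a[mid] < key) low = mid + 1;
            else if (a[mid] > key) high = mid - 1;
            else return mid;
        }
        return (low <= high && a[low] == key) ? low : -1;
    }

    public static void main(String[] args) {
        int n = 1000;
        int[] uniform = new int[n];
        for (int i = 0; i < n; i++) uniform[i] = 2 * i;   // evenly spaced keys
        search(uniform, 246);
        System.out.println("uniform probes: " + probes);  // prints 1: exact hit

        int[] skewed = new int[n];
        for (int i = 0; i < n - 1; i++) skewed[i] = i;    // 0..998
        skewed[n - 1] = Integer.MAX_VALUE;                // one huge outlier
        search(skewed, n - 2);
        System.out.println("skewed probes: " + probes);   // prints 999: each probe advances by one
    }
}
```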

Performance

In practice, an interpolation search is often no faster than a binary search because of the cost of the arithmetic needed to estimate the next probe position. The key may not be easily regarded as a number, as when it is text such as names in a telephone book. Yet it is always possible to regard a bit pattern as a number, and text could be treated as a number in base twenty-six (or however many distinct symbols are allowable) that could then be used in the interpolation arithmetic, but each such conversion costs more than a few probes of a binary chop. Nevertheless, a suitable numeric value could be pre-computed for every name and searching would employ those numerical values. Further, it may be easy enough to form a cumulative histogram of those values whose shape can be approximated by a simple function (such as a straight line) or a few such curves over segments of the range. If so, a suitable inverse interpolation is easily computed and the desired index found with very few probes. The particular advantage is that a probe of a particular index's value may involve access to slow storage (on disc, or not in on-chip memory), whereas the interpolation calculation involves a small local collection of data which, when many searches are performed, is likely to remain in fast storage. This approach shades into a special case of hash table search.
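As a concrete illustration of the pre-computed numeric value mentioned above, one might pack a name's leading letters into an integer whose numeric order matches alphabetical order. This is a hedged sketch: the helper name nameToKey, the base of 27 (26 letters plus a padding symbol), and the choice of four letters are assumptions made for the example, not prescriptions.

```java
// Hypothetical sketch: derive a number from a name so that alphabetical order
// becomes numeric order, making interpolation arithmetic possible on names.
public class NameKeyDemo {
    // Pack up to 4 leading letters into one int, base 27 (digit 0 = padding,
    // so shorter names sort before longer names with the same prefix).
    static int nameToKey(String name) {
        String s = name.toUpperCase();
        int key = 0;
        for (int i = 0; i < 4; i++) {
            int digit = (i < s.length()) ? (s.charAt(i) - 'A' + 1) : 0;
            key = key * 27 + digit;
        }
        return key; // at most 27^4 - 1, comfortably within int range
    }

    public static void main(String[] args) {
        // Numeric order matches alphabetical order, so these keys can drive
        // the interpolation formula directly.
        System.out.println(nameToKey("Abadeeah") < nameToKey("Baker")); // true
        System.out.println(nameToKey("Baker") < nameToKey("Zyteca"));   // true
    }
}
```

As the article notes, such keys are far from uniformly distributed over their range, so the straight-line interpolation assumption holds only roughly for real name lists.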

Such analysis should also detect the troublesome case of equal key values: if a run of equal key values exists, the search will not necessarily select the first (or last) element of such a run, and, if not carefully written, risks attempting a division by zero during its interpolation calculation.
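A minimal sketch of the two precautions just described, under illustrative names (searchFirst): the interpolation divisor is guarded against equal boundary keys, and on a hit the index is walked back so a run of duplicates always yields its first element.

```java
// Hypothetical duplicate-safe variant: guards the divisor and returns the
// FIRST element of a run of equal keys.
public class DuplicateSafeSearch {
    // Returns the index of the first occurrence of key in sorted array a, or -1.
    static int searchFirst(int[] a, int key) {
        int low = 0, high = a.length - 1;
        while (low <= high && key >= a[low] && key <= a[high]) {
            int mid;
            if (a[low] == a[high]) {
                mid = low; // divisor would be zero: every remaining key is equal
            } else {
                mid = low + (int) ((long) (key - a[low]) * (high - low)
                                   / (a[high] - a[low]));
            }
            if (a[mid] < key) low = mid + 1;
            else if (a[mid] > key) high = mid - 1;
            else {
                while (mid > 0 && a[mid - 1] == key) mid--; // back up over the run
                return mid;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(searchFirst(new int[]{1, 2, 2, 2, 3}, 2)); // prints 1
    }
}
```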

In the simple case, no analysis of the values is prepared, still less any cumulative histogram approximation. With a sorted list, the minimum value is clearly at index one and the maximum value at index n. Assuming a uniform distribution of values amounts to assuming that the cumulative histogram's shape is a straight line from the minimum to the maximum, so linear interpolation will find the index whose associated value should be the sought value.
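Under that straight-line assumption, the probe position is given by the standard linear-interpolation formula. A minimal sketch (0-based indices; the helper name probe is illustrative, and the long cast is a precaution against overflow in the multiplication):

```java
// Hypothetical helper showing the linear-interpolation probe formula.
public class ProbeDemo {
    // Estimate where key would fall if values rose linearly from a[low] to a[high].
    static int probe(int[] a, int low, int high, int key) {
        return low + (int) ((long) (key - a[low]) * (high - low)
                            / (a[high] - a[low]));
    }

    public static void main(String[] args) {
        int[] a = {0, 10, 20, 30, 40, 50, 60, 70, 80, 90};
        // On perfectly uniform data the very first probe is an exact hit.
        System.out.println(probe(a, 0, 9, 70)); // prints 7
    }
}
```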

Sample implementation

The following code example in the Java programming language is a simple implementation. At each stage it computes a probe position and then, as with binary search, moves either the upper or the lower bound inward to define a smaller interval containing the sought value. Unlike binary search, which guarantees a halving of the interval's size with each stage, a misled interpolation may reduce it by only one, hence the O(n) worst case.

public int interpolationSearch(int[] sortedArray, int toFind) {
    // Returns index of toFind in sortedArray, or -1 if not found.
    int low = 0;
    int high = sortedArray.length - 1;
    int mid;
    while (sortedArray[low] < toFind && sortedArray[high] >= toFind) {
        mid = low + ((toFind - sortedArray[low]) * (high - low))
                  / (sortedArray[high] - sortedArray[low]);
        if (sortedArray[mid] < toFind)
            low = mid + 1;
        else if (sortedArray[mid] > toFind)
            // Repetition of the comparison code is forced by syntax limitations.
            high = mid - 1;
        else
            return mid;
    }
    if (sortedArray[low] == toFind)
        return low;
    else
        return -1; // Not found
}

Notice that having probed the list at index mid, for reasons of loop control, this code sets either high or low not to mid but to an adjacent index, which is then probed during the next iteration. Since an adjacent entry's value will not differ by much, the interpolation calculation gains little from this one-step adjustment, at the cost of an additional reference to distant memory such as disc.

Each iteration of the above code requires between five and six comparisons (the extra one is due to the repetitions needed to distinguish the three states of <, >, and = via binary comparisons in the absence of a three-way comparison), plus some messy arithmetic, while a binary search can be written with one comparison per iteration and uses only trivial integer arithmetic. Binary search would thereby search an array of a million elements with no more than twenty comparisons (each involving an access to slow memory where the array elements are stored); to beat that, the interpolation search as written above would be allowed no more than three iterations.
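The one-comparison-per-iteration binary search alluded to above can be obtained by deferring the equality test until the interval has shrunk to a single candidate; each loop pass then costs exactly one key comparison. A sketch (the class and method names are illustrative):

```java
// Hypothetical sketch of binary search with one key comparison per iteration:
// equality is tested only once, after the loop.
public class DeferredBinarySearch {
    // Returns an index of key in sorted array a, or -1 if absent.
    static int search(int[] a, int key) {
        int low = 0, high = a.length;      // half-open interval [low, high)
        while (high - low > 1) {
            int mid = (low + high) >>> 1;  // overflow-safe midpoint
            if (a[mid] <= key) low = mid;  // the single comparison per pass
            else high = mid;
        }
        return (high > low && a[low] == key) ? low : -1;
    }

    public static void main(String[] args) {
        int[] a = {1, 3, 5, 7};
        System.out.println(search(a, 5)); // prints 2
    }
}
```

With duplicates this variant lands on the last element of a run, since the <= comparison keeps moving the lower bound forward.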

Book-based searching

The conversion of names in a telephone book to some sort of number clearly will not provide numbers having a uniform distribution (except via immense effort such as sorting the names and calling them name #1, name #2, and so on), and further, it is well known that some names are much more common than others (Smith, Jones, and so forth). Similarly with dictionaries, where there are many more words starting with some letters than others. Some publishers go to the effort of preparing marginal annotations or even cutting into the side of the pages to show markers for each letter, so that at a glance a segmented interpolation can be performed.
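Those letter markers can be mimicked in software by precomputing, for each initial letter, where that letter's run of entries begins; a search then interpolates only within one letter's segment. A hedged sketch (the name buildTabs and the 27-entry table are assumptions for the example, and the input is assumed sorted with uppercase initial letters):

```java
// Hypothetical "thumb tab" index over a sorted list of names.
public class ThumbTabDemo {
    // starts[c] = index of the first entry beginning with ('A' + c);
    // starts[26] = names.length, so segment c is [starts[c], starts[c + 1]).
    static int[] buildTabs(String[] names) {
        int[] starts = new int[27];
        int c = 0;
        for (int i = 0; i < names.length; i++) {
            int letter = names[i].charAt(0) - 'A';
            while (c <= letter) starts[c++] = i; // empty letters point past their gap
        }
        while (c <= 26) starts[c++] = names.length;
        return starts;
    }

    public static void main(String[] args) {
        String[] book = {"Abadeeah", "Baker", "Barnes", "Jones", "Smith", "Zyteca"};
        int[] tabs = buildTabs(book);
        // Only the 'B' segment, indices [1, 3), needs probing for "Baker".
        System.out.println(tabs['B' - 'A'] + ".." + tabs['B' - 'A' + 1]); // prints 1..3
    }
}
```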

See also

*Linear search
*Binary search
*Ternary search
*Hash table

External links

* [http://www.dcc.uchile.cl/~rbaeza/handbook/algs/3/322.srch.p.html Interpolation search]
* [http://www.nist.gov/dads/HTML/interpolationSearch.html National Institute of Standards and Technology]


Wikimedia Foundation. 2010.
