Boyer–Moore string search algorithm

Boyer–Moore string search algorithm

The Boyer–Moore string search algorithm is a particularly efficient string searching algorithm, and it has been the standard benchmark for the practical string search literature. [Hume and Sunday (1991) " [Fast String Searching] " SOFTWARE—PRACTICE AND EXPERIENCE, VOL. 21(11), 1221–1248 (NOVEMBER 1991)] It was developed by Bob Boyer and J Strother Moore in 1977. The algorithm preprocesses the target string (key) that is being searched for, but not the string being searched (unlike some algorithms that preprocess the string to be searched and can then amortize the expense of the preprocessing by searching repeatedly). The execution time of the Boyer-Moore algorithm can be sub-linear: it doesn't need to check every character of the string to be searched, but rather skips over some of them. Generally the algorithm gets faster as the key being searched for becomes longer. Its efficiency derives from the fact that with each unsuccessful attempt to find a match between the search string and the text it's searching, it uses the information gained from that attempt to rule out as many positions of the text as possible where the string cannot match.

How the algorithm works

The mismatch "A" in position 5 (3 back from the last letter of the needle) excludes the first 6 of the possible starting positions shown.

The second table is slightly more difficult to calculate: for each value of "i" less than the length of the search string, we must first calculate the pattern consisting of the last "i" characters of the search string, preceded by a "mis"-match for the character before it; then weinitially line it up with the search pattern and determine the least number of characters the partial pattern must be shifted left before the two patterns match. For instance, for the search string ANPANMAN, the table would be as follows: (N signifies any character that is not N)

The amount of shift calculated by the second table is sometimes called the "good suffix shift" [http://www.movsd.com/bm.htm] or "(strong) good suffix rule". The original published Boyer-Moore algorithm [.

Uses

Wikipedia uses a patched version of PHP which uses an implementation of Boyer-Moore to speed up page loading. Wikipedia database developer Domas Mituzas discussed Wikipedia's use of the Boyer-Moore algorithm in a 2007 presentation:

"Fast String Search" is a replacement module for PHP’s strtr() function, which uses a Commentz-Walter–style algorithm for multiple search terms, or the Boyer-Moore algorithm for single search terms. License collisions (GPL code was used for it) do not allow its inclusion in PHP. Using a proper algorithm instead of foreach loops is an incredible boost for some applications. [D. Mituzas, "Wikipedia: Site internals, configuration, code examples and management issues", http://dammit.lt/uc/workbook2007.pdf]

ee also

* Knuth–Morris–Pratt algorithm

References

External links

* [http://www.cs.utexas.edu/~moore/publications/fstrpos.pdf Original article]
* [http://www-sr.informatik.uni-tuebingen.de/~buehler/BM/BM1.html Animation] of the Boyer-Moore algorithm
* [http://www.cs.utexas.edu/users/moore/best-ideas/string-searching/fstrpos-example.html An example of the Boyer-Moore algorithm] from the homepage of J Strother Moore, co-inventor of the algorithm
* [http://www-igm.univ-mlv.fr/%7Elecroq/string/node14.html An explanation of the algorithm (with sample C code)]
* [http://www.cs.nyu.edu/cs/faculty/cole/papers/CHPZ95.ps Cole et al, Tighter lower bounds on the exact complexity of string matching]


Wikimedia Foundation. 2010.

Игры ⚽ Нужно сделать НИР?

Look at other dictionaries:

  • Rabin-Karp string search algorithm — The Rabin Karp algorithm is a string searching algorithm created by Michael O. Rabin and Richard M. Karp in 1987 that uses hashing to find a substring in a text. It is used for multiple pattern matching rather than single pattern matching. For… …   Wikipedia

  • String searching algorithm — String searching algorithms, sometimes called string matching algorithms, are an important class of string algorithms that try to find a place where one or several strings (also called patterns) are found within a larger string or text. Let Σ be… …   Wikipedia

  • Boyer–Moore–Horspool algorithm — In computer science, the Boyer–Moore–Horspool algorithm or Horspool s algorithm is an algorithm for finding substrings in strings. It is a simplification of the Boyer Moore algorithm which is related to the Knuth Morris Pratt algorithm. The… …   Wikipedia

  • Boyer — is a French, English, German and even Turkish last name. Boyer in English comes from bowyer, meaning bow maker or bow seller. In French, it means ox leader , the one leading the ox while plowing. In Turkish, it comes from boy er, boy meaning size …   Wikipedia

  • Moore — Contents 1 People 2 Places 3 Science 4 Programming …   Wikipedia

  • Algoritmo de búsqueda de cadenas Boyer-Moore — El algoritmo de búsqueda de cadenas Boyer Moore es un particularmente eficiente algoritmo de búsqueda de cadenas, y ha sido el punto de referencia estándar para la literatura de búsqueda de cadenas práctica.[1] Fue desarrollado por Bob Boyer y J… …   Wikipedia Español

  • J Strother Moore — (his first name is the alphabetic character J ndash; not an abbreviated J. ) is a computer scientist, and he is a co developer of the Boyer Moore string search algorithm and the Boyer Moore automated theorem prover, Nqthm. A good example of the… …   Wikipedia

  • Knuth–Morris–Pratt algorithm — The Knuth–Morris–Pratt string searching algorithm (or KMP algorithm) searches for occurrences of a word W within a main text string S by employing the observation that when a mismatch occurs, the word itself embodies sufficient information to… …   Wikipedia

  • Robert S. Boyer — Robert Stephen Boyer, aka Bob Boyer, is a professor of computer science, mathematics, and philosophy at The University of Texas at Austin. He and J Strother Moore invented the Boyer Moore string search algorithm, a particularly efficient string… …   Wikipedia

  • Algorithme De Boyer-Moore — L algorithme de Boyer Moore est un algorithme de recherche de sous chaîne particulièrement efficace. Il a été développé par Bob Boyer et J. Strother Moore en 1977. Sommaire 1 Présentation 2 Fonctionnement de l algorithme 2.1 Pré traitement …   Wikipédia en Français

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”