Statistical parsing

Statistical parsing is a group of parsing methods within natural language processing in which grammar rules are associated with probabilities. Grammar rules are traditionally viewed in computational linguistics as defining the valid sentences in a language. Within this mindset, associating each rule with a probability provides the relative frequency of that rule and, by deduction, the probability of a complete parse for a sentence. (The probability of a grammar rule may be induced from data, but applying that rule within a parse tree and computing the tree's probability from its component rules is a form of deduction.) Using this concept, a statistical parser searches over the space of candidate parses, computing each candidate's probability, to derive the most probable parse of a sentence. The rule probabilities themselves are commonly estimated with the expectation-maximization algorithm, while the most probable parse is found by a dynamic-programming search such as the Viterbi algorithm.
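
As a minimal sketch of this computation (the toy grammar, rule probabilities, and helper function below are invented for illustration rather than taken from any particular parser), a probabilistic grammar assigns a parse tree the product of the probabilities of the rules used in its derivation:

    # A toy PCFG: each rule maps to P(rule | left-hand side).
    # Probabilities for the same left-hand side sum to 1.
    RULE_PROBS = {
        ("S", ("NP", "VP")): 1.0,
        ("NP", ("Det", "N")): 0.7,
        ("NP", ("N",)): 0.3,
        ("VP", ("V", "NP")): 1.0,
        ("Det", ("the",)): 1.0,
        ("N", ("can",)): 0.6,
        ("N", ("water",)): 0.4,
        ("V", ("holds",)): 1.0,
    }

    def parse_probability(tree):
        """Probability of a parse: the product of the probabilities of
        every rule in its derivation.  A tree is (label, children);
        a leaf is a plain string."""
        label, children = tree
        child_labels = tuple(c if isinstance(c, str) else c[0] for c in children)
        prob = RULE_PROBS[(label, child_labels)]
        for child in children:
            if not isinstance(child, str):
                prob *= parse_probability(child)
        return prob

    # "the can holds water" as S -> NP VP, NP -> Det N, VP -> V NP:
    tree = ("S", [("NP", [("Det", ["the"]), ("N", ["can"])]),
                  ("VP", [("V", ["holds"]), ("NP", [("N", ["water"])])])])
    print(parse_probability(tree))  # 1.0*0.7*1.0*0.6*1.0*1.0*0.3*0.4 ≈ 0.0504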

"Search" in this context is an application of the very useful search algorithm in artificial intelligence.

By way of example, consider the sentence "The can can hold water." A reader instantly sees an object called "the can", which performs the action 'can' (i.e. is able to); the thing it is able to do is "hold", and the thing it is able to hold is "water". In more linguistic terminology, "The can" is a noun phrase composed of a determiner followed by a noun, and "can hold water" is a verb phrase which is itself composed of a verb followed by a verb phrase. But is this the only interpretation of the sentence? Certainly "The can can" is a perfectly valid noun phrase referring to a type of dance, and "hold water" is also a valid verb phrase, although the coerced meaning of the combined sentence is non-obvious. This lack of meaning is not seen as a problem by most linguists (for a discussion on this point, see Colorless green ideas sleep furiously), but from a pragmatic point of view it is desirable to obtain the first interpretation rather than the second, and statistical parsers achieve this by ranking the interpretations by probability.

(This example makes various assumptions about the grammar, such as a simple left-to-right derivation rather than a head-driven one, the use of noun phrases rather than the currently fashionable determiner phrases, and no type check preventing a concrete noun from being combined with an abstract verb phrase. None of these assumptions affects the argument, and a comparable argument can be made using any other grammatical formalism.)
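
The ranking can be sketched with NLTK's ViterbiParser. The toy grammar below and its rule probabilities are invented for illustration; a practical parser would induce them from a treebank:

    from nltk import PCFG
    from nltk.parse import ViterbiParser

    # Hypothetical toy grammar; each left-hand side's probabilities sum to 1.
    grammar = PCFG.fromstring("""
        S   -> NP VP    [1.0]
        NP  -> Det N    [0.4]
        NP  -> Det N N  [0.2]
        NP  -> N        [0.4]
        VP  -> Mod VP   [0.5]
        VP  -> V NP     [0.5]
        Det -> 'The'    [1.0]
        N   -> 'can'    [0.6]
        N   -> 'water'  [0.4]
        Mod -> 'can'    [1.0]
        V   -> 'hold'   [1.0]
    """)

    parser = ViterbiParser(grammar)
    for tree in parser.parse("The can can hold water".split()):
        print(tree)  # the "[The can] [can hold water]" reading, prob 0.0096,
                     # outranks the "can-can" reading, whose prob is 0.00576

Under these (invented) numbers the desired interpretation wins simply because its derivation uses more probable rules; with probabilities estimated from real text, the same mechanism prefers common constructions over rare ones.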

There are a number of methods that statistical parsing algorithms frequently use. While few algorithms use all of them, they give a good overview of the field. Most statistical parsing algorithms are based on a modified form of chart parsing. The modifications are necessary to cope with an extremely large number of grammar rules, and therefore an extremely large search space, and essentially involve applying classical artificial-intelligence search techniques to what would otherwise be an exhaustive search. Examples of these optimisations include searching only a likely subset of the search space (beam or stack search), estimating rule probabilities by expectation maximization (the inside-outside algorithm, the grammar analogue of the Baum-Welch algorithm for hidden Markov models), and keeping only the single best analysis of each substring so that equivalent sub-parses are not explored separately (the Viterbi algorithm).
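
As a sketch of the dynamic-programming core that such chart parsers share, the following minimal probabilistic CKY parser (a Viterbi-style search over a hypothetical grammar in Chomsky normal form) records only the best-scoring analysis of each span:

    from collections import defaultdict

    # Hypothetical CNF grammar: binary rules A -> B C and lexical rules
    # A -> word, each with probability P(rule | A).
    BINARY = {                      # (B, C) -> [(A, prob), ...]
        ("NP", "VP"): [("S", 1.0)],
        ("Det", "N"): [("NP", 0.7)],
        ("V", "NP"): [("VP", 1.0)],
    }
    LEXICAL = {                     # word -> [(A, prob), ...]
        "the": [("Det", 1.0)],
        "can": [("N", 0.6)],
        "holds": [("V", 1.0)],
        "water": [("N", 0.4), ("NP", 0.3)],
    }

    def cky_parse(words):
        n = len(words)
        # chart[i][j] maps a nonterminal to its best (probability, backpointer)
        # over the span words[i:j]; lower-scoring analyses are discarded.
        chart = [[defaultdict(lambda: (0.0, None)) for _ in range(n + 1)]
                 for _ in range(n + 1)]
        for i, w in enumerate(words):                 # single-word spans
            for lhs, p in LEXICAL.get(w, []):
                chart[i][i + 1][lhs] = (p, w)
        for width in range(2, n + 1):                 # combine adjacent spans
            for i in range(n - width + 1):
                j = i + width
                for k in range(i + 1, j):
                    for B, (pb, _) in list(chart[i][k].items()):
                        for C, (pc, _) in list(chart[k][j].items()):
                            for A, pr in BINARY.get((B, C), []):
                                p = pr * pb * pc
                                if p > chart[i][j][A][0]:
                                    chart[i][j][A] = (p, (k, B, C))
        return chart[0][n].get("S")   # best analysis of the whole sentence

    best = cky_parse("the can holds water".split())
    if best:
        print(best[0])  # 1.0 * 0.7 * 1.0 * 0.6 * 1.0 * 1.0 * 0.3 ≈ 0.126

A beam-search variant would additionally prune each chart cell to its few most probable entries before combining spans, trading exactness for speed on large grammars.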

Notable people in statistical parsing

* Eugene Charniak: author of "Statistical Techniques for Natural Language Parsing", among many other contributions [http://www.cs.brown.edu/people/ec/home.html]
* Fred Jelinek: applied and developed numerous techniques from information theory to build the field [http://www.clsp.jhu.edu/people/jelinek/]
* David Magerman: a major contributor to turning the field from theoretical to practical through the management of data [http://www-cs-students.stanford.edu/~magerman/]
* James Curran: application of the maximum-entropy (MaxEnt) algorithm, word representations, and other contributions [http://www.it.usyd.edu.au/about/people/staff/james.shtml]
* Michael Collins (computational linguist): built the first very high-performance statistical parser [http://people.csail.mit.edu/mcollins/]
* Joshua Goodman: hypergraphs and other generalisations across different methods [http://research.microsoft.com/~joshuago/]

See also

*Stochastic context-free grammar


