Collocation extraction


Collocation extraction is the task of using a computer to extract collocations automatically from a corpus.

Within corpus linguistics, a collocation is defined as a sequence of words or terms that co-occur more often than would be expected by chance. 'Crystal clear', 'middle management', 'nuclear family', and 'cosmetic surgery' are examples of collocated pairs of words. Some words are often found together because they make up a compound noun, for example 'riding boots' or 'motor cyclist'.

The traditional method of performing collocation extraction is to score every word pair with a formula based on statistical quantities of the words involved, such as their individual and joint frequencies in the corpus. Proposed formulas include mutual information, the t-test, the z-test, the chi-square test and the likelihood ratio.[1]
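
As an illustration of the scoring approach, the sketch below ranks adjacent word pairs by pointwise mutual information, one common way of applying the mutual information measure to word pairs. It is a minimal example under stated assumptions, not a reference implementation: the function name pmi_scores, the min_count threshold and the toy corpus are all hypothetical, and probabilities are estimated by simple relative frequency.

```python
import math
from collections import Counter


def pmi_scores(tokens, min_count=2):
    """Score each adjacent word pair by pointwise mutual information (PMI).

    PMI(w1, w2) = log2( P(w1, w2) / (P(w1) * P(w2)) ), with probabilities
    estimated as relative frequencies. Pairs seen fewer than `min_count`
    times are skipped (an assumed threshold), because PMI is unreliable
    for rare events.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_tokens = len(tokens)
    n_bigrams = max(n_tokens - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bigrams
        p_w1 = unigrams[w1] / n_tokens
        p_w2 = unigrams[w2] / n_tokens
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Toy corpus (hypothetical); a real corpus would be far larger.
tokens = (
    "the nuclear family met the middle management team and "
    "the nuclear family left the middle management office"
).split()

scores = pmi_scores(tokens)
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for pair, score in ranked:
    print(pair, round(score, 2))
```

On this toy corpus, pairs such as ('nuclear', 'family') score higher than pairs that begin with the frequent word 'the', which matches the idea that a collocation co-occurs more often than chance would predict. The other measures named above could be substituted for the PMI line without changing the surrounding counting logic.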

References

  1. Manning, C. D.; Schütze, H. (1999). Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press. ISBN 978-0262133609. http://nlp.stanford.edu/fsnlp/

Wikimedia Foundation. 2010.
