Brill tagger


The Brill tagger is a method for doing part-of-speech tagging. It was described by Eric Brill in his 1993 PhD thesis [http://www.cs.jhu.edu/~brill/dissertation.ps]. It can be summarized as an "error-driven transformation-based tagger". It is:
* error-driven in the sense that it relies on supervised learning: candidate rules are judged by comparing the tagger's output with a correctly tagged training corpus
* transformation-based in the sense that each word is first assigned a tag, which is then changed by applying a set of transformation rules learned during training. If the word is known, the tagger initially assigns it its most frequent tag; if the word is unknown, it naively assigns it the tag "noun". By applying these rules repeatedly to correct wrong tags, the tagger reaches quite high accuracy.
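The initial annotation described above can be sketched in a few lines of Python. This is a minimal illustration, not Brill's code: the Penn Treebank-style tags (`DT`, `MD`, `NN`, `NNP`, ...) and the helper names `train_lexicon`, `guess_unknown` and `initial_tags` are assumptions made for the example. `guess_unknown` follows the 1992 heuristic for unknown words (proper noun if capitalised, common noun otherwise).

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def train_lexicon(tagged_corpus: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each known word form to the tag it carries most often in the corpus."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def guess_unknown(word: str) -> str:
    """Naive guess for an out-of-vocabulary word (hypothetical tag names)."""
    return "NNP" if word[:1].isupper() else "NN"


def initial_tags(words: List[str], lexicon: Dict[str, str]) -> List[str]:
    """Initial-state annotation: lexicon lookup, falling back to a noun tag."""
    return [lexicon.get(w, guess_unknown(w)) for w in words]


corpus = [("the", "DT"), ("can", "MD"), ("can", "MD"), ("can", "NN"),
          ("rusted", "VBD")]
lexicon = train_lexicon(corpus)
print(initial_tags(["the", "can", "rusts", "Paris"], lexicon))
# ['DT', 'MD', 'NN', 'NNP']
```

Here "can" receives its most frequent tag `MD` even though it follows a determiner; correcting errors of this kind is the job of the learned contextual rules.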

Algorithm

The algorithm goes as follows:
* Initialisation:
** Known words (in vocabulary): assign the most frequent tag associated with the word form
** Unknown words (out of vocabulary):
*** Proper noun if capitalised, common noun otherwise (1992)
*** Tags assigned by learned lexical ("guessing") rules, induced in the same way as the contextual rules (1994)
* Learning phase:
** Iteratively compute the score of each candidate rule: the number of errors before applying the rule minus the number of errors after applying it
** Select the best (highest-scoring) rule.
** Add it to the rule set and apply it to the text.
** Repeat until no rule has a score above a given threshold, that is, until applying new rules no longer improves the tagging; the resulting state is taken as the final tagging.
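The learning phase is a greedy loop, sketched below under simplifying assumptions: a single hypothetical rule template ("change `from_tag` to `to_tag` if the previous tag is `prev_tag`"), one flat training sequence, and rule conditions checked against the tags as they stood before each pass. Brill's tagger uses a richer set of templates. The score is the number of errors before a rule is applied minus the number after, and learning stops once no candidate scores above `threshold`.

```python
from typing import List, NamedTuple, Set, Tuple


class Rule(NamedTuple):
    """Hypothetical template: from_tag -> to_tag IF previous tag is prev_tag."""
    from_tag: str
    to_tag: str
    prev_tag: str


def apply_rule(tags: List[str], rule: Rule) -> List[str]:
    # Conditions are tested on the tags as they were before this pass.
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == rule.from_tag and tags[i - 1] == rule.prev_tag:
            out[i] = rule.to_tag
    return out


def count_errors(tags: List[str], gold: List[str]) -> int:
    return sum(t != g for t, g in zip(tags, gold))


def candidate_rules(tags: List[str], gold: List[str]) -> Set[Rule]:
    # Instantiate the template only at positions that are currently wrong.
    return {Rule(tags[i], gold[i], tags[i - 1])
            for i in range(1, len(tags)) if tags[i] != gold[i]}


def learn(tags: List[str], gold: List[str],
          threshold: int = 0) -> Tuple[List[Rule], List[str]]:
    learned: List[Rule] = []
    while True:
        before = count_errors(tags, gold)
        scored = [(before - count_errors(apply_rule(tags, r), gold), r)
                  for r in candidate_rules(tags, gold)]
        if not scored:
            break
        # Highest score wins; equal scores are broken by comparing the rules.
        score, best = max(scored)
        if score <= threshold:
            break
        learned.append(best)
        tags = apply_rule(tags, best)
    return learned, tags


initial = ["DT", "MD", "VBD", "DT", "MD"]  # e.g. output of the initial tagger
gold = ["DT", "NN", "VBD", "DT", "NN"]     # reference tagging
rules, final = learn(initial, gold)
print(rules)  # [Rule(from_tag='MD', to_tag='NN', prev_tag='DT')]
print(final)  # ['DT', 'NN', 'VBD', 'DT', 'NN']
```

Because every accepted rule must score above a non-negative threshold, each iteration strictly reduces the error count, so the loop always terminates.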

Rules

Lexical rules are used for the initialisation, and contextual rules are used to correct the tags.
* Lexical rules: "word" → "tag" IF "condition" (example: the word has a suffix such as "-tion")
* Contextual rules: "tag1" → "tag2" IF "condition" (examples: the preceding/following tag is "X"; the preceding/following word is "w")
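The two rule shapes can be represented as small data types. In this sketch the lexical rule is reduced to a single suffix condition and the contextual rule to two optional conditions (preceding tag, following word); the class and field names and the Penn Treebank-style tags are assumptions for illustration, not Brill's rule file format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LexicalRule:
    """Assign `tag` to a word ending with `suffix` (hypothetical form)."""
    suffix: str
    tag: str

    def apply(self, word: str, current_tag: str) -> str:
        # Suffix test is case-insensitive in this sketch.
        return self.tag if word.lower().endswith(self.suffix) else current_tag


@dataclass(frozen=True)
class ContextualRule:
    """tag1 -> tag2 IF every condition that is set holds."""
    tag1: str
    tag2: str
    prev_tag: Optional[str] = None   # preceding tag must equal this
    next_word: Optional[str] = None  # following word must equal this

    def matches(self, words: List[str], tags: List[str], i: int) -> bool:
        if tags[i] != self.tag1:
            return False
        if self.prev_tag is not None and (i == 0 or tags[i - 1] != self.prev_tag):
            return False
        if self.next_word is not None and (
                i + 1 >= len(words) or words[i + 1] != self.next_word):
            return False
        return True

    def apply(self, words: List[str], tags: List[str]) -> List[str]:
        # Conditions are checked against the input tags, not the rewritten ones.
        return [self.tag2 if self.matches(words, tags, i) else t
                for i, t in enumerate(tags)]


# Unknown capitalised word first guessed as a proper noun, fixed by its suffix.
print(LexicalRule("tion", "NN").apply("Construction", "NNP"))  # NN
# Noun after "to" retagged as a base-form verb.
print(ContextualRule("NN", "VB", prev_tag="TO").apply(["to", "race"], ["TO", "NN"]))
# ['TO', 'VB']
```

Leaving a condition as `None` means "not tested", so one class covers both the preceding-tag and the following-word examples above.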

Code

Brill's code pages at Johns Hopkins University are no longer on the web. A mirror of the latest version of the Brill tagger is available at Plymouth Tech [http://www.tech.plym.ac.uk/soc/staff/guidbugm/software/RULE_BASED_TAGGER_V.1.14.tar.Z].


Wikimedia Foundation. 2010.
