Text simplification

Text simplification is an operation used in natural language processing to modify, enhance, classify, or otherwise process an existing corpus of human-readable text so that the grammar and structure of the prose are greatly simplified while the underlying meaning and information remain the same. Text simplification is an important area of research because natural human languages ordinarily contain complex compound constructions that are not easily processed automatically.

Example

The following pair of passages illustrates text simplification. The first is a single sentence containing two relative clauses and one conjoined verb phrase. A text simplification system aims to rewrite it as the second passage, a sequence of shorter sentences that carry the same information.

* "Also contributing to the firmness in copper, the analyst noted, was a report by Chicago purchasing agents, which precedes the full purchasing agents report that is due out today and gives an indication of what the full report might hold."

* "Also contributing to the firmness in copper, the analyst noted, was a report by Chicago purchasing agents. The Chicago report precedes the full purchasing agents report. The Chicago report gives an indication of what the full report might hold. The full report is due out today."
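The kind of rewrite shown above can be approximated, for one narrow pattern, by a hand-written rule. The following is a minimal sketch, not the method of any system linked from this article: it assumes the sentence has the shape "MAIN, which VP1 and VP2.", that the caller supplies the noun phrase that replaces "which", and that the verb phrases are joined by a plain " and ". The function name `split_relative_clause` and the parameter `referent` are hypothetical. The sketch does not handle restrictive clauses such as "that is due out today", so it covers only part of the example; a real system would rely on a syntactic parser rather than a regular expression.

```python
import re
from typing import List


def split_relative_clause(sentence: str, referent: str) -> List[str]:
    """Split 'MAIN, which VP1 and VP2.' into several simple sentences.

    `referent` is the noun phrase substituted for 'which'. It is supplied
    by the caller here (an assumption); a parser would normally find it.
    """
    text = sentence.strip()
    # Non-greedy match: everything before the first ', which ' is the main clause.
    match = re.match(r"^(?P<main>.+?), which (?P<clause>.+?)\.?$", text)
    if match is None:
        # The pattern does not apply, so return the sentence unchanged.
        return [text]
    main = match.group("main") + "."
    # Naive assumption: every ' and ' in the clause joins two verb phrases.
    verb_phrases = match.group("clause").split(" and ")
    return [main] + [f"{referent} {vp.strip()}." for vp in verb_phrases]


if __name__ == "__main__":
    original = (
        "The agents issued a report, which precedes the full report "
        "and gives an indication of its contents."
    )
    for simple in split_relative_clause(original, "The report"):
        print(simple)
```

Splitting on " and " is deliberately crude: it would also break apart coordinated noun phrases, which is one reason practical systems work from parse trees instead of surface strings.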

See also

* Controlled natural language
* Simplified English

External links

Resources

* [http://www1.cs.columbia.edu/~as372/LEC02.pdf An Architecture for a Text Simplification System]
* [http://repository.upenn.edu/cgi/viewcontent.cgi?article=1110&context=ircs_reports Automatic Induction of Rules for Text Simplification]
* [http://www.isi.edu/~marcu/papers/factoids04.pdf Text Simplification for Information-Seeking Applications]


Wikimedia Foundation. 2010.
