Definite clause grammar

A definite clause grammar (DCG) is a way of expressing a grammar, for either natural or formal languages, in a logic programming language such as Prolog. DCGs are usually associated with Prolog, but similar languages such as Mercury also include DCGs. They are called definite clause grammars because they represent a grammar as a set of definite clauses in first-order logic.

The term DCG refers to the specific type of expression in Prolog and other similar languages; not all ways of expressing grammars using definite clauses are considered DCGs. However, all of the capabilities or properties of DCGs will be the same for any grammar that is represented with definite clauses in essentially the same way as in Prolog.

The definite clauses of a DCG can be considered a set of axioms; the validity of a sentence, and the fact that it has a certain parse tree, can then be considered theorems that follow from these axioms[1]. The advantage is that recognizing and parsing expressions in a language becomes a general matter of proving statements, such as statements in a logic programming language.

History

The history of DCGs is closely tied to the history of Prolog, and the history of Prolog revolves around several researchers in both Marseille, France, and Edinburgh, Scotland. According to Robert Kowalski, an early developer of Prolog, the first Prolog system was developed in 1972 by Alain Colmerauer and Philippe Roussel.[2] The first program written in the language was a large natural-language processing system. Fernando Pereira and David Warren at the University of Edinburgh were also involved in the early development of Prolog.

Colmerauer had previously worked on a language processing system called Q-systems that was used to translate between English and French[3]. In 1978, Colmerauer wrote a paper about a way of representing grammars called metamorphosis grammars, which were part of the early version of Prolog called Marseille Prolog. In this paper, he gave a formal description of metamorphosis grammars and some examples of programs that use them.

Fernando Pereira and David Warren, two other early architects of Prolog, coined the term definite clause grammar and created the notation for DCGs that is used in Prolog today. They gave credit for the idea to Colmerauer and Kowalski, and they note that DCGs are a special case of Colmerauer's metamorphosis grammars. They introduced the idea in an article called "Definite Clause Grammars for Language Analysis", where they describe DCGs as a "formalism ... in which grammars are expressed [as] clauses of first-order predicate logic" that "constitute effective programs of the programming language Prolog"[4].

Pereira, Warren, and other pioneers of Prolog later wrote about several other aspects of DCGs. Pereira and Warren wrote an article called "Parsing as Deduction", describing, among other things, how the Earley deduction proof procedure is used for parsing[5]. Pereira also collaborated with Stuart Shieber on a book called "Prolog and Natural Language Analysis", which was intended as a general introduction to computational linguistics using logic programming[6].

Extensions

Since DCGs were introduced by Pereira and Warren, several extensions have been proposed. Pereira himself proposed an extension called extraposition grammars (XGs)[7]. This formalism was intended, in part, to make it easier to express certain grammatical phenomena, such as left-extraposition. Pereira states, "The difference between XG rules and DCG rules is then that the left-hand side of an XG rule may contain several symbols." This makes it easier to express rules for context-sensitive grammars.

Another, more recent extension, called Multi-Modal Definite Clause Grammars (MM-DCGs), was proposed by researchers at NEC Corporation in 1995. It was intended to allow recognizing and parsing expressions that include non-textual parts such as pictures.[8]

Another extension, called definite clause translation grammars (DCTGs), was described by Abramson in 1984[9]. DCTG notation looks very similar to DCG notation; the major difference is that one uses ::= instead of --> in the rules. It was devised to handle grammatical attributes conveniently[10]. The translation of DCTGs into normal Prolog clauses is like that of DCGs, but three arguments are added instead of two.

Example

A basic example of DCGs helps to illustrate what they are and what they look like.

sentence --> noun_phrase, verb_phrase.
noun_phrase --> det, noun.
verb_phrase --> verb, noun_phrase.
det --> [the].
det --> [a].
noun --> [cat].
noun --> [bat].
verb --> [eats].

This generates sentences such as "the cat eats the bat" and "a bat eats the cat". One can generate all of the valid expressions in the language generated by this grammar at a Prolog interpreter by typing sentence(X,[]). Similarly, one can test whether a sentence is valid in the language by typing something like sentence([the,bat,eats,the,bat],[]).
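
Many Prolog systems also provide the built-in predicates phrase/2 and phrase/3 for calling a DCG, which avoids depending on how the rules happen to be translated. The following is a minimal sketch, assuming a system in which phrase/2 is available (for example SWI-Prolog or GNU Prolog); the helper all_sentences/1 is a name introduced here for illustration only.

```prolog
% The grammar from above, repeated so that the example is self-contained.
sentence --> noun_phrase, verb_phrase.
noun_phrase --> det, noun.
verb_phrase --> verb, noun_phrase.
det --> [the].
det --> [a].
noun --> [cat].
noun --> [bat].
verb --> [eats].

% Recognition: succeeds if the whole list is a sentence.
% ?- phrase(sentence, [the,cat,eats,the,bat]).

% Generation: enumerates sentences one by one on backtracking.
% ?- phrase(sentence, S).

% all_sentences(-Sentences): hypothetical helper that collects every
% sentence of this (finite) language into a list.
all_sentences(Sentences) :-
    findall(S, phrase(sentence, S), Sentences).
```

Since there are four possible noun phrases and one verb, all_sentences/1 returns a list of 16 sentences for this grammar.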

Translation into definite clauses

DCG notation is just syntactic sugar for normal definite clauses in Prolog. For example, the previous example could be translated into the following:

sentence(S1,S3) :- noun_phrase(S1,S2), verb_phrase(S2,S3).
noun_phrase(S1,S3) :- det(S1,S2), noun(S2,S3).
verb_phrase(S1,S3) :- verb(S1,S2), noun_phrase(S2,S3).
det([the|X], X).
det([a|X], X).
noun([cat|X], X).
noun([bat|X], X).
verb([eats|X], X).
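
Because each translated predicate relates a list to whatever remains after a phrase has been consumed, the second argument does not have to be the empty list. The sketch below reuses the hand-translated noun phrase clauses from above; actual Prolog systems may translate terminals slightly differently, so it mirrors the translation shown here rather than the output of any particular system.

```prolog
% Hand-translated clauses for noun phrases, copied from the translation above.
noun_phrase(S1,S3) :- det(S1,S2), noun(S2,S3).
det([the|X], X).
det([a|X], X).
noun([cat|X], X).
noun([bat|X], X).

% Consume a noun phrase from the front of a longer list; Rest is what is left.
% ?- noun_phrase([the,cat,eats,a,bat], Rest).
% Rest = [eats,a,bat].
```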

Difference lists

The arguments to each functor, such as (S1,S3) and (S1,S2), are difference lists; a difference list represents a list as the difference between two lists. Using Prolog's notation for lists, the list [a,b,c] can be represented with the pair ([a,b,c|X],X), where X is the as-yet-unbound remainder.

Difference lists are used to represent lists with DCGs for reasons of efficiency. Where they can be used, concatenating difference lists is much more efficient, because the concatenation of (S1,S2) and (S2,S3) is just (S1,S3).[11]
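
The concatenation property can be sketched directly in Prolog. The predicate names dl_append/3 and dl_to_list/2, and the convention of writing a difference list as a Front-Back pair, are assumptions made for this illustration and are not part of the DCG translation itself.

```prolog
% dl_append(+A, +B, -C): concatenate two difference lists with a single
% unification, without traversing either list.
% A difference list is written here as Front-Back, where Back is the
% unbound tail of Front.
dl_append(A-B, B-C, A-C).

% dl_to_list(+DL, -List): close the open tail to obtain an ordinary list.
dl_to_list(List-[], List).

% ?- dl_append([a,b|T1]-T1, [c,d|T2]-T2, DL), dl_to_list(DL, L).
% L = [a,b,c,d].
```

Ordinary list concatenation with append/3 has to walk the whole first list, whereas dl_append/3 only binds the tail variable of the first list to the front of the second.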

Non-context-free grammars

In pure Prolog, normal DCG rules with no extra arguments on the functors, such as the previous example, can only express context-free grammars; there is only one nonterminal on the left side of each production. However, context-sensitive grammars can also be expressed with DCGs by providing extra arguments, as in the following example:

s --> symbols(Sem,a), symbols(Sem,b), symbols(Sem,c).
symbols(end,_) --> [].
symbols(s(Sem),S) --> [S], symbols(Sem,S).

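This set of DCG rules describes the grammar that generates the language consisting of strings of the form a^n b^n c^n, by structurally representing n: the first argument of symbols counts the repetitions as a nested term (end, s(end), s(s(end)), and so on), and sharing the variable Sem forces all three counts to be equal.[12]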
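
The behaviour of this grammar can be checked with a few queries. This is a sketch that assumes the built-in phrase/2 is available; the rules are repeated so that the example is self-contained.

```prolog
% Sem counts the repetitions: end for zero, s(end) for one, and so on.
s --> symbols(Sem,a), symbols(Sem,b), symbols(Sem,c).
symbols(end,_) --> [].
symbols(s(Sem),S) --> [S], symbols(Sem,S).

% ?- phrase(s, [a,a,b,b,c,c]).   % succeeds (n = 2)
% ?- phrase(s, [a,a,b,c,c]).     % fails: only one b
% ?- phrase(s, []).              % succeeds (n = 0)
```

Calling phrase(s, S) with S unbound enumerates the strings of the language in order of increasing n, starting with the empty list.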

Representing features

Various linguistic features can also be represented fairly concisely with DCGs by providing extra arguments to the functors.[13] For example, consider the following set of DCG rules:

sentence --> pronoun(subject), verb_phrase.
verb_phrase --> verb, pronoun(object).
pronoun(subject) --> [he].
pronoun(subject) --> [she].
pronoun(object) --> [him].
pronoun(object) --> [her].
verb --> [likes].

This grammar allows sentences like "he likes her" and "he likes him", but not "her likes he" or "him likes him".
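
These claims can be verified with queries such as the following, again assuming that phrase/2 is available; the rules are repeated so that the example is self-contained.

```prolog
% The argument of pronoun//1 records grammatical case.
sentence --> pronoun(subject), verb_phrase.
verb_phrase --> verb, pronoun(object).
pronoun(subject) --> [he].
pronoun(subject) --> [she].
pronoun(object) --> [him].
pronoun(object) --> [her].
verb --> [likes].

% ?- phrase(sentence, [he,likes,her]).    % succeeds
% ?- phrase(sentence, [her,likes,he]).    % fails: wrong case in both positions
% ?- phrase(sentence, [him,likes,him]).   % fails: object pronoun as subject
```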

Parsing with DCGs

[Figure: an example parse tree for this grammar.]

The main practical use of a DCG is to parse sentences of the given grammar, i.e. to construct a parse tree. This can be done by providing extra arguments to the functors in the DCG, as in the following rules:

sentence(s(NP,VP)) --> noun_phrase(NP), verb_phrase(VP).
noun_phrase(np(D,N)) --> det(D), noun(N).
verb_phrase(vp(V,NP)) --> verb(V), noun_phrase(NP).
det(d(the)) --> [the].
det(d(a)) --> [a].
noun(n(bat)) --> [bat].
noun(n(cat)) --> [cat].
verb(v(eats)) --> [eats].

One can now query the interpreter to yield a parse tree of any given sentence:

| ?- sentence(Parse_tree, [the,bat,eats,a,cat], []).
Parse_tree = s(np(d(the),n(bat)),vp(v(eats),np(d(a),n(cat)))) ? ;
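
The same rules can also be run in the other direction, producing the sentence that corresponds to a given parse tree. The sketch below assumes the built-in phrase/2 and repeats the rules so that it is self-contained.

```prolog
% Each nonterminal carries an extra argument holding its subtree.
sentence(s(NP,VP)) --> noun_phrase(NP), verb_phrase(VP).
noun_phrase(np(D,N)) --> det(D), noun(N).
verb_phrase(vp(V,NP)) --> verb(V), noun_phrase(NP).
det(d(the)) --> [the].
det(d(a)) --> [a].
noun(n(bat)) --> [bat].
noun(n(cat)) --> [cat].
verb(v(eats)) --> [eats].

% Parsing: from a sentence to its tree.
% ?- phrase(sentence(T), [the,bat,eats,a,cat]).
% T = s(np(d(the),n(bat)),vp(v(eats),np(d(a),n(cat)))).

% Generation: from a tree back to its sentence.
% ?- phrase(sentence(s(np(d(a),n(cat)),vp(v(eats),np(d(the),n(bat))))), S).
% S = [a,cat,eats,the,bat].
```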

Other uses

DCGs can also serve as convenient syntactic sugar for hiding certain parameters in code outside of parsing applications. In the programming language Mercury, which borrows DCG syntax from Prolog, for example, DCGs are used to hide io__state arguments in I/O code.[14] They are also used in other, similar situations in Mercury.
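
The same idea can be sketched in Prolog itself: the two hidden arguments need not be lists, and can instead carry any state that is passed from one goal to the next. In the following illustration the state is a counter; the predicate names next_id and label are hypothetical, and the translated predicate is called directly with its two extra arguments because some systems require the arguments of phrase/3 to be lists.

```prolog
% next_id(-Id, +Counter0, -Counter): return the current counter value and
% advance it by one. The last two arguments are the hidden DCG state.
next_id(Id, Id, Next) :- Next is Id + 1.

% label(+Items, -Labelled): pair each item with a fresh number.
% The counter is threaded through implicitly by the DCG notation.
label([], []) --> [].
label([X|Xs], [N-X|Ys]) --> next_id(N), label(Xs, Ys).

% ?- label([a,b,c], L, 0, Final).
% L = [0-a,1-b,2-c], Final = 3.
```

Written without DCG notation, every clause of label would need two extra arguments for the counter before and after, which is the bookkeeping the notation hides.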

Notes

  1. Johnson, M. (1994). "Two ways of formalizing grammars". Linguistics and Philosophy 17 (3): 221–240. doi:10.1007/BF00985036.
  2. Kowalski, R. A. The early years of logic programming.
  3. Colmerauer, A. (1978). "Metamorphosis grammars". Natural Language Communication with Computers: 133–189.
  4. Pereira, F.; D. Warren (1980). Definite clause grammars for language analysis.
  5. Pereira, F. C. N.; D. H. D. Warren (1983). "Parsing as deduction". Proceedings of the 21st annual meeting on Association for Computational Linguistics. Association for Computational Linguistics Morristown, NJ, USA. pp. 137–144.
  6. Pereira, F. C. N.; S. M. Shieber (2002). Prolog and natural-language analysis. Microtome Publishing.
  7. Pereira, F. (1981). "Extraposition grammars". Computational Linguistics 7 (4): 243–256.
  8. Shimazu, H.; Y. Takashima (1995). "Multimodal definite clause grammar". Systems and Computers in Japan 26 (3).
  9. Abramson, H. (1984). Definite clause translation grammars.
  10. Sperberg-McQueen, C. M. "A brief introduction to definite clause grammars and definite clause translation grammars". http://www.w3.org/People/cmsmcq/2004/lgintro.html#id2628117. Retrieved 2009-04-21.
  11. Fleck, Arthur. "Definite Clause Grammar Translation". http://www.cs.uiowa.edu/~fleck/dcgTrans.htm. Retrieved 2009-04-16.
  12. Fisher, J. R. "Prolog Tutorial -- 7.1". http://www.csupomona.edu/~jrfisher/www/prolog_tutorial/7_1.html. Retrieved 2009-04-16.
  13. "DCGs give us a Natural Notation for Features". http://www.coli.uni-saarland.de/projects/milca/courses/coal/xhtml/CFG.SEC.DGSWITHFEATURES.xhtml. Retrieved 2009-04-21.
  14. "Mercury Tutorial: DCG Notation". http://ftp.mercury.cs.mu.oz.au/tutorial/dcgs.html. Retrieved 2009-04-21.

Wikimedia Foundation. 2010.
