Lemma (linguistics)

In linguistics, a lemma (plural "lemmas" or "lemmata") has two distinct senses:
# morphology / lexicography: the canonical form or citation form of a set of forms (the headword); e.g. in English, "run", "runs", "ran" and "running" are forms of the same lexeme, with "run" as the lemma.
# psycholinguistics: an abstract conceptual form of a word that has been mentally selected for utterance in the early stages of speech production, but before any sounds are attached to it.

A lemma in morphology is the canonical form of a lexeme. "Lexeme", in this context, refers to the set of all the inflected forms of a single word, which share the same basic meaning, and "lemma" refers to the particular form that is chosen by convention to represent the lexeme. In lexicography, this unit is usually also the "citation form" or headword under which the lexeme is indexed. Lemmas have special significance in highly inflected languages such as Czech. The process of determining the lemma for a given word is called lemmatisation.
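
Lemmatisation can be pictured as a lookup from each inflected form to its lemma, with the lexeme recoverable as the set of forms that share a lemma. The following Python sketch assumes a small hand-written table; the table and the names `lemmatise` and `lexeme` are hypothetical, and practical lemmatisers for highly inflected languages generally rely on full morphological analysers rather than a fixed list.

```python
# Hypothetical form-to-lemma table; a real lemmatiser would derive this
# from a morphological lexicon rather than hard-coding it.
FORM_TO_LEMMA = {
    "run": "run", "runs": "run", "ran": "run", "running": "run",
    "go": "go", "goes": "go", "going": "go", "went": "go", "gone": "go",
}


def lemmatise(form: str) -> str:
    """Return the lemma for an inflected form.

    Unknown forms are returned unchanged (lower-cased), a simple
    fallback for words missing from the table.
    """
    key = form.lower()
    return FORM_TO_LEMMA.get(key, key)


def lexeme(lemma: str) -> set:
    """Return every known form that shares the given lemma."""
    return {form for form, lem in FORM_TO_LEMMA.items() if lem == lemma}


if __name__ == "__main__":
    print(lemmatise("Ran"))        # run
    print(sorted(lexeme("go")))    # ['go', 'goes', 'going', 'gone', 'went']
```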

The psycholinguistic sense comes from one of the more widely accepted psycholinguistic models of speech production, where it refers to an early stage in the mental preparation of an utterance. Here, a "lemma" is the abstract form of a word that arises after the word has been selected mentally, but before any information about its sounds has been accessed (and thus before the word can be pronounced). It therefore contains information only about meaning and about the word's relation to other words in the sentence. This notion of lemma is similar to the Sanskrit sphota (6th c.), an invariant mental word of which the sound is a feature.

Morphology / Lexicography

In a dictionary, the lemma "go" represents the inflected forms "go", "goes", "going", "went", and "gone". The relationship between an inflected form and its lemma is usually denoted by an angle bracket, e.g. "went" < "go". The disadvantage of this simplification is that a declined or conjugated form of the word cannot be looked up directly, although some dictionaries, such as Webster's, do list "went". Multilingual dictionaries vary in how they deal with this issue: the Langenscheidt dictionary of German does not list "ging" (< "gehen"); the Cassell does.
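
The trade-off between listing only headwords and also listing inflected forms can be sketched as two lookup strategies over the same data. In this hypothetical Python example (the entries and function names are illustrative assumptions, not the actual structure of any dictionary), a headword-only lookup cannot find "went", whereas an index that maps every listed form back to its headword recovers the relation "went" < "go".

```python
from typing import Dict, List, Optional

# Hypothetical miniature dictionary: entries are keyed by lemma (headword),
# and each entry records the inflected forms it covers.
ENTRIES: Dict[str, List[str]] = {
    "go": ["go", "goes", "going", "went", "gone"],
    "gehen": ["gehen", "geht", "ging", "gegangen"],
}


def lookup_headword_only(word: str) -> Optional[str]:
    """Succeed only if the word is itself a headword."""
    return word if word in ENTRIES else None


def build_cross_references(entries: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every listed form back to its headword, as a dictionary that
    also lists irregular forms such as "went" effectively does."""
    return {form: lemma for lemma, forms in entries.items() for form in forms}


def lookup_with_cross_references(word: str, xref: Dict[str, str]) -> Optional[str]:
    """Return the headword for any listed form, or None if unknown."""
    return xref.get(word)


def format_relation(form: str, lemma: str) -> str:
    """Render the form-lemma relation in angle-bracket notation."""
    return f'"{form}" < "{lemma}"'


if __name__ == "__main__":
    xref = build_cross_references(ENTRIES)
    print(lookup_headword_only("went"))  # None
    lemma = lookup_with_cross_references("went", xref)
    if lemma is not None:
        print(format_relation("went", lemma))  # "went" < "go"
```

The reverse index costs one extra entry per listed form, which is roughly the space a printed dictionary spends when it chooses to include forms like "went" or "ging".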

The form that is chosen to be the lemma is usually the least marked form, though there are occasional exceptions; e.g. in Finnish, dictionaries list verbs not under the verb root, but under the first infinitive marked with "-(t)a", "-(t)ä".

Lemmas or word stems are often used in corpus linguistics for determining word frequency. In such usage, the specific definition of "lemma" is flexible, depending on the task it is being used for.
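
Counting by lemma rather than by word form pools all inflected forms of a lexeme, which can change frequencies considerably. The sketch below assumes a naive whitespace tokeniser and a tiny hypothetical form-to-lemma table in place of a real lemmatiser; what counts as "the same lemma" is encoded entirely in that table, which is where a task-dependent definition would be expressed.

```python
from collections import Counter

# Hypothetical form-to-lemma table standing in for a real lemmatiser.
# Forms not listed are treated as their own lemma.
FORM_TO_LEMMA = {
    "runs": "run", "ran": "run", "running": "run",
    "mice": "mouse",
}


def tokenise(text: str) -> list:
    """Naive tokeniser: lower-case, split on whitespace, and strip
    surrounding punctuation from each token."""
    return [tok.strip(".,;:!?\"'") for tok in text.lower().split()]


def form_frequencies(text: str) -> Counter:
    """Count each distinct word form separately."""
    return Counter(tok for tok in tokenise(text) if tok)


def lemma_frequencies(text: str) -> Counter:
    """Count tokens after mapping each form to its lemma."""
    return Counter(FORM_TO_LEMMA.get(tok, tok) for tok in tokenise(text) if tok)


if __name__ == "__main__":
    sample = "The mouse runs. Two mice ran, and one mouse is running."
    print(form_frequencies(sample)["mouse"])   # 2
    print(lemma_frequencies(sample)["mouse"])  # 3
    print(lemma_frequencies(sample)["run"])    # 3
```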

Lemmas in different languages

In English, the citation form of a noun is the singular: e.g. "mouse" rather than "mice". For multi-word lexemes which contain possessive adjectives or reflexive pronouns, the citation form uses a form of the indefinite pronoun "one": e.g. "do one's best", "perjure oneself". In languages with grammatical gender, the citation form of regular adjectives and nouns is usually the masculine singular. If the language additionally has cases, the citation form is often the masculine singular nominative.

In many languages, the citation form of a verb is the infinitive: French "aller", German "gehen". In English it is usually the full infinitive ("to go"), but the bare infinitive for some defective verbs ("must"). In Latin and Greek, however, the first person singular present tense is normally used, though occasionally the infinitive may also be seen, and in Japanese the non-past (present and future) tense is used. (For contracted verbs in Greek, an uncontracted first person singular present tense is used to reveal the contract vowel, e.g. φιλέω "philéō" for φιλῶ "philō" "I love" [implying affection]; αγαπάω "agapáō" for αγαπῶ "agapō" "I love" [implying regard].)

In Arabic, which has no infinitives, the third person singular masculine of the past tense is the least-marked form, and is used for entries in modern dictionaries. In older dictionaries, which are still commonly used today, entries are arranged under the triliteral root of the word, whether verb or noun. Hebrew often uses the third person masculine singular "qal" perfect, e.g. ברא "bara' " create, כפר "kaphar" deny. In Korean, "-da" is attached to the stem.

Some phrases are cited in a sort of lemma form, e.g. "Carthago delenda est" (literally, "Carthage must be destroyed") is a common way of citing Cato, although his actual words were closer to "Ceterum censeo Carthaginem esse delendam" ("As to the rest, I hold that Carthage must be destroyed").

Psycholinguistics

When we produce a word, we are essentially turning our thoughts into sounds (a process known as lexicalisation). In many psycholinguistic models this is considered to be at least a two-stage process: first a lemma is selected to express the intended meaning, and then its phonological form is retrieved. The lemma is thus intermediate between the semantic level (where meaning is specified) and the phonological level (where the sounds of the word are specified). It is an abstract form containing syntactic information (about how the word can be used in a sentence), but no information about the pronunciation of the word. In this context, the lexeme is the phonologically specified form that is selected after the lemma.
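
The two stages can be illustrated with a toy data model in which a lemma carries only syntactic features and a separate lexeme record carries the phonological form. This Python sketch is a simplification for illustration, not an implementation of any published model; the class and function names, the German example words (Tisch, Haus) and their rough IPA transcriptions are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Lemma:
    """Stage-1 representation: a link to the concept plus syntactic
    features, deliberately with no information about pronunciation."""
    concept: str
    category: str                 # word class, e.g. "noun" or "verb"
    gender: Optional[str] = None  # grammatical gender, if the language has one


@dataclass(frozen=True)
class Lexeme:
    """Stage-2 representation: the phonologically specified form."""
    lemma: Lemma
    phonology: str


# Hypothetical German mini-lexicon; transcriptions are rough and illustrative.
LEMMAS: Dict[str, Lemma] = {
    "TABLE": Lemma("TABLE", "noun", "masculine"),  # Tisch
    "HOUSE": Lemma("HOUSE", "noun", "neuter"),     # Haus
}
FORMS: Dict[Lemma, Lexeme] = {
    LEMMAS["TABLE"]: Lexeme(LEMMAS["TABLE"], "tɪʃ"),
    LEMMAS["HOUSE"]: Lexeme(LEMMAS["HOUSE"], "haʊs"),
}


def select_lemma(concept: str) -> Lemma:
    """Stage 1: choose the lemma that expresses the intended concept."""
    return LEMMAS[concept]


def encode_form(lemma: Lemma) -> Lexeme:
    """Stage 2: retrieve the sounds for the already-selected lemma."""
    return FORMS[lemma]


def produce(concept: str) -> Lexeme:
    """Run both stages in order: concept -> lemma -> lexeme."""
    return encode_form(select_lemma(concept))


if __name__ == "__main__":
    lemma = select_lemma("TABLE")
    print(lemma.category, lemma.gender)  # noun masculine
    print(produce("TABLE").phonology)    # tɪʃ
```

Because gender is stored on `Lemma` here, the sketch commits to the lemma-based view, in which such syntactic features are available before any sounds are retrieved.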

This two-stage model is the most widely supported theory of speech production in psycholinguistics [Harley, T. (2005) "The Psychology of Language." Hove; New York: Psychology Press: 359], although it has recently been challenged. [e.g. Caramazza, A. (1997) How many levels of processing are there in lexical access? "Cognitive Neuropsychology", 14, 177-208.] For example, there is some evidence that the grammatical gender of a noun is retrieved from the word's phonological form (the lexeme) rather than from the lemma. [e.g. Starreveld, P. A. and La Heij, W. (2004) Phonological facilitation of grammatical gender retrieval. "Language and Cognitive Processes", 19 (6), 677-711.] This finding is readily explained by Caramazza's Independent Network model, which does not assume a distinct level between the semantic and the phonological stages (so there is no lemma representation); in this model, syntactic information about the word is activated at the semantic or the phonological level (so gender would be activated at the latter). [Caramazza (1997)]

See also

* Linguistics
* Corpus linguistics
* Morphology
* Psycholinguistics
* Markedness
* Principal parts
* Root (linguistics)
* Null morpheme
* Lemmatisation
* Lexeme
* Uninflected word
* Lexical Markup Framework

External links

* [http://torvald.aksis.uib.no/corpora/1999-4/0038.html Lemma vs lexeme]


Wikimedia Foundation. 2010.
