Lexical similarity

In linguistics, lexical similarity is a measure of the degree to which the word sets of two given languages are similar. A lexical similarity of 1 (or 100%) would mean a total overlap between vocabularies, whereas 0 means there are no common words.

Lexical similarity can be defined in different ways, and the results vary accordingly. For example, Ethnologue's method of calculation compares a standardized set of wordlists and counts the forms that show similarity in both form and meaning. Using this method, English was evaluated to have a lexical similarity of 60% with German and 27% with French.
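The counting approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Ethnologue's actual procedure: the function names, the six-word toy wordlists, and the 0.6 threshold are all hypothetical, and the string ratio from `difflib` is a crude stand-in for a linguist's judgment of whether two forms are similar.

```python
from difflib import SequenceMatcher
from typing import Mapping


def forms_similar(a: str, b: str, threshold: float = 0.6) -> bool:
    """Crude stand-in for judging similarity in form (assumed threshold)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def lexical_similarity(list_a: Mapping[str, str],
                       list_b: Mapping[str, str],
                       threshold: float = 0.6) -> float:
    """Fraction of shared meanings whose forms count as similar (0.0 to 1.0)."""
    shared = list_a.keys() & list_b.keys()
    if not shared:
        raise ValueError("the wordlists have no meanings in common")
    hits = sum(forms_similar(list_a[m], list_b[m], threshold) for m in shared)
    return hits / len(shared)


# Toy wordlists keyed by meaning; far too small to say anything about real languages.
english = {"water": "water", "hand": "hand", "house": "house",
           "dog": "dog", "tree": "tree", "book": "book"}
german = {"water": "wasser", "hand": "hand", "house": "haus",
          "dog": "hund", "tree": "baum", "book": "buch"}

score = lexical_similarity(english, german)
print(f"{score:.0%}")  # prints 50%
```

The toy result is not comparable to the published figures, which rest on much longer standardized wordlists and on expert judgments of similarity rather than a string-matching ratio. Changing the wordlist or the threshold changes the score, which is one reason results vary between methods.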

Lexical similarity can be used to evaluate the degree of genetic relationship between two languages. Percentages higher than 85% usually indicate that the two languages being compared are likely to be related "dialects".

Lexical similarity is only one indication of the mutual intelligibility of two languages, since the latter also depends on the degree of phonetic, morphological, and syntactic similarity. The choice of wordlist also affects the result: for example, lexical similarity between French and English is considerable in lexical fields relating to culture, whereas it is smaller as far as basic (function) words are concerned. Unlike mutual intelligibility, which can be asymmetric, lexical similarity is always symmetric.

Indo-European languages

The table below shows some lexical similarity values for pairs of selected Romance, Germanic, and Slavic languages, as collected and published by Ethnologue.

Notes:
*Language codes are from the ISO 639-3 standard.
*Ethnologue does not specify the Sardinian variety for which the lexical similarity was calculated.

References

* [http://www.ethnologue.com/web.asp "Ethnologue.com"] (lexical similarity values available at some of the individual language entries)
* [http://www.ethnologue.com/ethno_docs/introduction.asp Definition of lexical similarity at "Ethnologue.com"]
* Rensch, Calvin R. 1992. "Calculating lexical similarity." In Eugene H. Casad (ed.), "Windows on bilingualism", 13-15. (Summer Institute of Linguistics and the University of Texas at Arlington Publications in Linguistics, 110). Dallas: Summer Institute of Linguistics and the University of Texas at Arlington.

See also

*Lexis (linguistics)
*Vocabulary
*Language family
*Dialect


Wikimedia Foundation. 2010.
