Speaker recognition

"Voice recognition" redirects here. For software that converts speech to text, see Speech recognition.

Speaker recognition is the computing task of validating a user's claimed identity using characteristics extracted from their voice.

There is a difference between "speaker recognition" (recognizing who is speaking) and "speech recognition" (recognizing what is being said). These two terms are frequently confused, as is "voice recognition". Voice recognition is a synonym for speaker recognition, not speech recognition. In addition, there is a difference between the act of authentication (commonly referred to as speaker verification or speaker authentication) and identification.

"Speaker recognition" has a history dating back some four decades and uses the acoustic features of speech that have been found to differ between individuals. These acoustic patterns reflect both anatomy (e.g., size and shape of the throat and mouth) and learned behavioral patterns (e.g., voice pitch, speaking style). Speaker verification has earned speaker recognition its classification as a "behavioral biometric".

Verification versus identification

There are two major applications of "speaker recognition" technologies and methodologies. If the speaker claims to be of a certain identity and the voice is used to verify this claim, this is called "verification" or "authentication". On the other hand, "identification" is the task of determining an unknown speaker's identity. In a sense, "speaker verification" is a 1:1 match where one speaker's voice is matched to one template (also called a "voice print" or "voice model"), whereas "speaker identification" is a 1:N match where the voice is compared against N templates.

From a security perspective, identification is different from verification. For example, presenting your passport at border control is a verification process - the agent compares your face to the picture in the document. Conversely, a police officer comparing a sketch of an assailant against a database of previously documented criminals to find the closest match(es) is an identification process.

"Speaker verification" is usually employed as a "gatekeeper" in order to provide access to a secure system (e.g., telephone banking). These systems operate with the user's knowledge and typically require their cooperation. "Speaker identification" systems can also be implemented covertly without the user's knowledge to identify talkers in a discussion, alert automated systems of speaker changes, check if a user is already enrolled in a system, etc.

In forensic applications, it is common to first perform a speaker identification process to create a list of "best matches" and then perform a series of verification processes to determine a conclusive match.

Variants of speaker recognition

Each "speaker recognition" system has two phases: enrollment and verification. During enrollment, the speaker's voice is recorded and typically a number of features are extracted to form a "voice print", "template", or "model". In the verification phase, a speech sample or "utterance" is compared against a previously created voice print. For identification systems, the utterance is compared against multiple voice prints in order to determine the best match(es), while verification systems compare an utterance against a single voice print. Because verification needs only one comparison while identification may need up to N, verification is generally faster than identification.
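The two phases, and the 1:1 versus 1:N distinction, can be illustrated with a minimal sketch. It assumes that each utterance has already been reduced to a fixed-length feature vector, that a voice print is simply the average of the enrollment vectors, and that cosine similarity is the comparison measure. These choices, the function names and the threshold are illustrative assumptions, not a description of any particular system.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def enroll(feature_vectors):
    """Enrollment: average several feature vectors into one voice print."""
    n = len(feature_vectors)
    return [sum(column) / n for column in zip(*feature_vectors)]

def verify(utterance, voice_print, threshold=0.9):
    """Verification (1:1): compare against the claimed speaker's print only."""
    return cosine(utterance, voice_print) >= threshold

def identify(utterance, voice_prints):
    """Identification (1:N): compare against every print, return the best."""
    scored = ((name, cosine(utterance, vp)) for name, vp in voice_prints.items())
    return max(scored, key=lambda pair: pair[1])

# Toy data: hypothetical three-dimensional features for two speakers.
prints = {
    "alice": enroll([[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]]),
    "bob": enroll([[0.0, 1.0, 0.9], [0.1, 0.9, 1.0]]),
}
sample = [0.95, 0.15, 0.05]

print(verify(sample, prints["alice"]))  # claim "alice": one comparison
print(verify(sample, prints["bob"]))    # claim "bob": one comparison
print(identify(sample, prints)[0])      # no claim: N comparisons
```

In this sketch verification performs a single comparison regardless of how many speakers are enrolled, whereas identification loops over every stored voice print.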

"Speaker recognition" systems fall into two categories: text-dependent and text-independent.

If the text must be the same for enrollment and verification, this is called text-dependent recognition. In a text-dependent system, prompts can either be common across all speakers (e.g., a common pass phrase) or unique. In addition, shared secrets (e.g., passwords and PINs) or knowledge-based information can be employed in order to create a multi-factor authentication scenario.

Text-independent systems are most often used for speaker identification, as they require very little if any cooperation by the speaker. In this case the text during enrollment and test is different. In fact, the enrollment may happen without the user's knowledge, as is the case in many forensic applications. As text-independent technologies do not compare what was said at enrollment and verification, verification applications tend to also employ speech recognition to determine what the user is saying at the point of authentication.

Technology

The various technologies used to process and store "voice prints" include frequency estimation, hidden Markov models, Gaussian mixture models, pattern matching algorithms, neural networks, matrix representation and decision trees. Some systems also use "anti-speaker" techniques, such as cohort models and world models.
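One common way to use a world model is to score an utterance by how much better the claimed speaker's model explains it than a general-population model does. The following is a minimal sketch of that idea. As a simplifying assumption it uses a single diagonal Gaussian per model rather than a full Gaussian mixture, and all names and numbers are hypothetical.

```python
import math

def log_likelihood(frames, mean, var):
    """Average per-frame log-likelihood under a diagonal Gaussian."""
    total = 0.0
    for frame in frames:
        for x, m, v in zip(frame, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)

def llr_score(frames, speaker_model, world_model):
    """Log-likelihood ratio: speaker model versus world ("anti-speaker") model.

    A positive score means the speaker model explains the frames better
    than the world model does.
    """
    return (log_likelihood(frames, *speaker_model)
            - log_likelihood(frames, *world_model))

# Each model is a (mean, variance) pair; the values are made up.
speaker_model = ([1.0, 2.0], [0.5, 0.5])
world_model = ([0.0, 0.0], [2.0, 2.0])

genuine = [[1.1, 1.9], [0.9, 2.2]]     # frames near the speaker's mean
impostor = [[-0.5, 0.3], [0.2, -0.4]]  # frames near the world mean

print(llr_score(genuine, speaker_model, world_model) > 0)
print(llr_score(impostor, speaker_model, world_model) > 0)
```

Scoring against a second model makes the decision threshold less sensitive to factors that affect both likelihoods equally, such as the overall quality of the recording.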

Ambient noise levels can impede collection of both the initial and subsequent voice samples. Noise reduction algorithms can be employed to improve accuracy, but incorrect application can have the opposite effect. Performance degradation can result from changes in behavioral attributes of the voice and from enrollment using one telephone and verification on another telephone ("cross channel"). Integration with two-factor authentication products is expected to increase. Voice changes due to aging may impact system performance over time. Some systems adapt the speaker models after each successful verification to capture such long-term changes in the voice, though there is debate regarding the overall security impact of automated adaptation.
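Model adaptation after a successful verification can be sketched as a small update that nudges the stored voice print toward the accepted utterance. The exponential moving average, the distance-based score, the threshold and the adaptation rate below are assumptions chosen for illustration, not a description of how any deployed system adapts its models.

```python
import math

def score(utterance, voice_print):
    """Similarity as negative Euclidean distance (higher is more similar)."""
    return -math.sqrt(sum((u - p) ** 2 for u, p in zip(utterance, voice_print)))

def adapt(voice_print, utterance, rate=0.1):
    """Move the voice print a fraction `rate` of the way toward the utterance."""
    return [(1 - rate) * p + rate * u for p, u in zip(voice_print, utterance)]

def verify_and_adapt(voice_print, utterance, threshold=-0.5, rate=0.1):
    """Verify, and update the print only when the utterance is accepted."""
    accepted = score(utterance, voice_print) >= threshold
    if accepted:
        voice_print = adapt(voice_print, utterance, rate)
    return accepted, voice_print

template = [1.0, 2.0]

# A slightly drifted voice is accepted and pulls the template along.
ok, template_after = verify_and_adapt(template, [1.2, 2.0])
print(ok, template_after)

# A rejected utterance leaves the template untouched.
ok2, unchanged = verify_and_adapt(template, [3.0, 0.0])
print(ok2, unchanged)
```

The sketch also shows where the security concern comes from: any utterance that is wrongly accepted shifts the stored print toward the impostor, so each false acceptance makes the next one slightly easier.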

Capture of the biometric is seen as non-invasive. The technology traditionally uses existing microphones and voice transmission technology, allowing recognition over long distances via ordinary telephones (wired or wireless).

Technology vendors

* Agnitio - text-independent and text-dependent technologies for forensic, homeland security and corporate customers
* Nuance - text-dependent technologies for corporate customers
* Persay - text-dependent and text-independent technologies for police and corporate customers
* Voice Vault - hosted text-dependent authentication solutions
* IBM - text-independent technology for enterprise authentication
* Trade Harbor - hosted text-dependent authentication solutions
* VOICE.TRUST - Common Criteria certified text-dependent authentication solutions
* RecoMadeEasy(TM) - a text-independent, language-independent speaker verification, identification, segmentation, classification and tracking engine by Recognition Technologies, Inc.

Source

* [http://www.itl.nist.gov/div893/biometrics/Biometricsfromthemovies.pdf National Institute of Standards and Technology]
* Elisabeth Zetterholm, Voice Imitation. A Phonetic Study of Perceptual Illusions and Acoustic Success. PhD thesis, Lund University. (2003)

External links

* [http://www.SpokenProof.com SpokenProof.com is dedicated to Voice Biometrics and contains extensive technical coverage]
* [http://ditelo.itc.it/people/falavi/IdVe.html Speaker Identification and Verification]
* [http://www.cs.bgu.ac.il/~orlovm/storage/speaker-recognition.pdf Speaker Recognition: A Tutorial] from IEEE, complex
* [http://www.phon.ucl.ac.uk/resource/sfs/wasp.htm Free Voice analyzer and Biometrics voice print displaying software from University College London]
* [http://www.phonelosers.org/pla-radio-episode-17-voice-authentication/ Circumventing Voice Authentication] The PLA Radio podcast recently featured a simple way to fool rudimentary voice authentication systems.


Wikimedia Foundation. 2010.
