CMU Sphinx

CMU Sphinx, sometimes simply known as Sphinx, is the general term for a group of speech recognition systems developed at Carnegie Mellon University. These include a series of speech recognizers (Sphinx 2 to 4) and an acoustic model trainer (SphinxTrain).

In 2000, the Sphinx group at Carnegie Mellon committed to releasing several speech recognizer components as open source, including Sphinx 2 and later Sphinx 3 (in 2001). More generally, CMU Sphinx refers to a group of open source systems related to speech recognition. The speech decoders are probably the best known, since they come with acoustic models and sample applications. The available resources also include software for acoustic model training, language model compilation, and a public-domain pronunciation dictionary, cmudict (see the project [http://cmusphinx.org/ home page] for details).

CMU Sphinx is perhaps the only open source, large-vocabulary, continuous speech recognition project that consistently releases its work under the liberal BSD license. Sphinx encompasses a number of distinct software systems, described below.

Sphinx

Sphinx was the first continuous-speech, speaker-independent recognition system making use of hidden Markov acoustic models (HMMs) and an n-gram statistical language model. It was developed by Kai-Fu Lee (who went on to establish speech recognition efforts at Apple and at Microsoft). Sphinx was significant in that it demonstrated the feasibility of continuous-speech, speaker-independent large-vocabulary recognition, the possibility of which was in dispute at the time (1986). Sphinx is of historical interest only; it has been superseded in performance by subsequent versions. An [http://www.ri.cmu.edu/pub_files/pub2/lee_k_f_1990_1/lee_k_f_1990_1.pdf archival article] describes the system in detail.
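To make the n-gram idea concrete, here is a minimal bigram (n = 2) language model in Python. It is a generic textbook illustration, not the language model used by the original Sphinx: the toy corpus, the function names and the choice of add-one (Laplace) smoothing are all assumptions made for brevity.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count histories and bigrams; each sentence is padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        unigrams.update(tokens[:-1])                  # counts of each history word
        bigrams.update(zip(tokens[:-1], tokens[1:]))  # counts of (previous, next) pairs
    # Words that can be predicted: every training word plus the end marker.
    vocab = {w for s in sentences for w in s} | {"</s>"}
    return unigrams, bigrams, vocab

def bigram_prob(w_prev, w, unigrams, bigrams, vocab):
    """P(w | w_prev) with add-one (Laplace) smoothing, so unseen pairs get non-zero mass."""
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + len(vocab))

def sentence_logprob(words, unigrams, bigrams, vocab):
    """Natural-log probability of a whole sentence under the bigram model."""
    tokens = ["<s>"] + words + ["</s>"]
    return sum(math.log(bigram_prob(a, b, unigrams, bigrams, vocab))
               for a, b in zip(tokens[:-1], tokens[1:]))

if __name__ == "__main__":
    corpus = [["show", "me", "flights"], ["show", "me", "fares"]]
    model = train_bigram(corpus)
    print(bigram_prob("show", "me", *model))  # 3/7
```

With this toy corpus, a recognizer using the model would prefer the word order "show me flights" over "flights me show", because the former is made of bigrams seen in training.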

Sphinx 2

Sphinx 2 is a fast, performance-oriented recognizer, originally developed by X-D Huang at Carnegie Mellon and released as open source with a BSD-style license on SourceForge by Kevin Lenzo at LinuxWorld in 2000. Sphinx 2 focuses on real-time recognition suitable for spoken language applications. As such, it incorporates functionality such as end-pointing, partial hypothesis generation and dynamic language model switching. It is widely used in dialog systems and language learning systems, and it can be used in computer-based PBX systems such as Asterisk. Sphinx 2 code has also been incorporated into a number of commercial products. It is no longer under active development (other than for routine maintenance); current real-time decoder development is taking place in the PocketSphinx project. An [http://citeseer.ist.psu.edu/cache/papers/cs/8141/http:zSzzSzwww.dcs.shef.ac.ukzSz~ljupcozSzpaperszSzCMU-CS-92-112.pdf/huang92sphinxii.pdf archival article] describes the system.

Sphinx 3

Sphinx 2 used a "semi-continuous" representation for acoustic modeling (i.e., a single set of Gaussians is used for all models, with individual models represented as a weight vector over these Gaussians). Sphinx 3 adopted the prevalent "continuous" HMM representation and has been used primarily for high-accuracy, non-real-time recognition. Recent developments (in algorithms and in hardware) have made Sphinx 3 "near" real-time, although it is not yet suitable for critical interactive applications. Sphinx 3 is under active development and, in conjunction with SphinxTrain, provides access to a number of modern modeling techniques, such as LDA/MLLT, MLLR and VTLN, that improve recognition accuracy (see the article on Speech Recognition for descriptions of these techniques).
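The difference between the two acoustic-model representations can be sketched in Python. This is a simplified illustration under stated assumptions, not Sphinx code: two-dimensional feature vectors, diagonal-covariance Gaussians, two made-up states ("s1", "s2") and hand-picked parameters. In the semi-continuous case every state scores a frame against the same shared codebook and differs only in its weight vector; in the continuous case each state carries its own Gaussians.

```python
import math

def diag_gaussian_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def mixture_loglik(x, weights, gaussians):
    """log sum_k w_k N(x; mean_k, var_k); assumes at least one weight is positive."""
    terms = [math.log(w) + diag_gaussian_logpdf(x, mean, var)
             for w, (mean, var) in zip(weights, gaussians) if w > 0]
    return logsumexp(terms)

# Semi-continuous (hypothetical parameters): one codebook shared by all states,
# and each state is only a weight vector over that codebook.
CODEBOOK = [([0.0, 0.0], [1.0, 1.0]),
            ([3.0, 3.0], [1.0, 1.0]),
            ([-3.0, 3.0], [1.0, 1.0])]
SEMI_STATES = {"s1": [0.8, 0.1, 0.1], "s2": [0.1, 0.1, 0.8]}

def semi_continuous_loglik(x, state):
    return mixture_loglik(x, SEMI_STATES[state], CODEBOOK)

# Continuous (hypothetical parameters): each state owns its weights AND Gaussians.
CONT_STATES = {
    "s1": ([0.6, 0.4], [([0.0, 0.0], [1.0, 1.0]), ([0.5, -0.5], [2.0, 2.0])]),
    "s2": ([1.0], [([-3.0, 3.0], [0.5, 0.5])]),
}

def continuous_loglik(x, state):
    weights, gaussians = CONT_STATES[state]
    return mixture_loglik(x, weights, gaussians)
```

One practical consequence of the semi-continuous design is that the shared codebook densities need to be evaluated only once per frame and can then be reused by every state, whereas a continuous model must evaluate each state's own Gaussians. The sketch recomputes them on every call for simplicity.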

Sphinx 4

Sphinx 4 is a complete rewrite of the Sphinx engine, written entirely in the Java programming language, with the goal of providing a more flexible framework for speech recognition research. Sun Microsystems supported the development of Sphinx 4 and contributed software engineering expertise to the project. Participants included individuals at MERL, MIT and CMU.

Current development goals include:

* developing a new (acoustic model) trainer
* implementing speaker adaptation (e.g., MLLR)
* improving configuration management
* creating a graph-based UI for graphical system design

PocketSphinx

PocketSphinx is a version of Sphinx that can be used in embedded systems (e.g., those based on an ARM processor). It is under active development and incorporates features such as fixed-point arithmetic and efficient algorithms for GMM (Gaussian mixture model) computation.
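Fixed-point arithmetic matters on processors without a fast floating-point unit. The sketch below shows the general idea applied to a single diagonal Gaussian: parameters are quantized offline, and the per-frame log-likelihood is then computed with integer multiplies, adds and shifts only. It is a hypothetical illustration, not PocketSphinx's implementation; the Q-format with 8 fractional bits and all function names are assumptions.

```python
import math

FRAC_BITS = 8              # assumed Q-format: 8 fractional bits
SCALE = 1 << FRAC_BITS

def to_fixed(value):
    """Quantize a real number to a Q8 integer."""
    return int(round(value * SCALE))

def precompute_gaussian(mean, var):
    """Offline (floating-point) step: quantize mean, 1/(2*var) and the log normalizer."""
    mean_q = [to_fixed(m) for m in mean]
    half_inv_var_q = [to_fixed(1.0 / (2.0 * v)) for v in var]
    log_norm_q = to_fixed(-0.5 * sum(math.log(2 * math.pi * v) for v in var))
    return mean_q, half_inv_var_q, log_norm_q

def fixed_point_loglik(x_q, gaussian):
    """Runtime step using integer operations only; the result is in Q8."""
    mean_q, half_inv_var_q, log_norm_q = gaussian
    acc = 0
    for xi, mi, hv in zip(x_q, mean_q, half_inv_var_q):
        diff = xi - mi                    # Q8
        acc += diff * diff * hv           # Q8 * Q8 * Q8 = Q24
    return log_norm_q - (acc >> (2 * FRAC_BITS))   # shift Q24 back to Q8

def float_loglik(x, mean, var):
    """Floating-point reference for comparison."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

if __name__ == "__main__":
    mean, var, x = [1.0, -0.5, 2.0], [0.5, 1.5, 2.0], [1.2, 0.3, 1.0]
    approx = fixed_point_loglik([to_fixed(v) for v in x], precompute_gaussian(mean, var))
    print(approx / SCALE, float_loglik(x, mean, var))
```

On real hardware the bit widths would also have to be chosen so that the intermediate Q24 products fit in the processor's integer registers; Python's unbounded integers hide that concern.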

See also

* List of speech recognition software

External links

* [http://cmusphinx.org/ CMU Sphinx homepage]
* [http://sphinx.subwiki.com Sphinx subwiki] - getting-started tutorials and Python integration information.
* [http://sourceforge.net/projects/cmusphinx SourceForge] hosts Sphinx software and should be considered the definitive source for code.
* [http://ftp.nice.ch/peanuts/GeneralData/Documents/user-groups/OnCampus/NOCFall90/NOCFall90Text.ps.gz NeXT on Campus Fall 1990] (This document is in PostScript format, compressed with gzip.) "Carnegie Mellon University - Breakthroughs in speech recognition and document management", pp. 12-13


Wikimedia Foundation. 2010.
