Path: senator-bedfellow.mit.edu!faqserv
From: [email protected] (Andrew Hunt)
Newsgroups: comp.speech,comp.answers,news.answers
Subject: comp.speech Frequently Asked Questions - part 3/3
Supersedes: <comp-speech-faq/[email protected]>
Followup-To: comp.speech
Date: 12 Jul 1998 12:00:30 GMT
Organization: Speech Applications Group, Sun Microsystems Laboratories
Lines: 4577
Approved: [email protected]
Expires: 23 Aug 1998 12:00:04 GMT
Message-ID: <comp-speech-faq/[email protected]>
References: <comp-speech-faq/[email protected]>
Reply-To: [email protected] (Andrew Hunt)
NNTP-Posting-Host: penguin-lust.mit.edu
Summary: Information on Speech Technology
X-Last-Updated: 1998/07/08
Originator: [email protected]
Xref: senator-bedfellow.mit.edu comp.speech:18457 comp.answers:32123 news.answers:134644

Archive-name: comp-speech-faq/part3
Last-modified: 1998/07/06
URL: http://www.speech.su.oz.au/comp.speech/

                  COMP.SPEECH FAQ POSTING - PART 3/3


[Note: this document has been automatically extracted from a WWW site:
       http://www.speech.su.oz.au/comp.speech/
This may introduce some formatting errors.]


                             Speech Synthesis

                        comp.speech FAQ Section 5

         * SpeechLinks: Speech Synthesis
         * Q5.1: What is speech synthesis?
         * Q5.2: How can speech synthesis be performed?
         * Q5.3: References/Books on Synthesis
         * Q5.4: Speech Synthesis on the WWW
         * Q5.5: Speech Synthesis Software/Hardware


___________________________________________________________________________

                       Q5.1: What is speech synthesis?

  Speech synthesis programs convert written input to spoken output by
  automatically generating synthetic speech. Speech synthesis is often
  referred to as "Text-to-Speech" (TTS) conversion.


___________________________________________________________________________

                 Q5.2: How can speech synthesis be performed?

  There are several approaches, and the choice depends on the task. The
  simplest is to record a person speaking the desired phrases. This is
  useful when only a restricted set of phrases and sentences is needed,
  e.g. announcements in a train station or schedule information over
  the phone. The quality depends on how the recordings are made.
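
  As a rough illustration of the recorded-phrase approach, the sketch
  below assembles an announcement from a fixed table of recordings. The
  phrase table, the file names and the function name are hypothetical
  and exist only for this example; a real system would also need to
  play the resulting files.

```python
# Hypothetical table mapping each recorded phrase to its audio file.
RECORDINGS = {
    "the train to": "train_to.wav",
    "edinburgh": "edinburgh.wav",
    "departs from platform": "departs_platform.wav",
    "2": "two.wav",
}


def build_playlist(phrases, recordings=RECORDINGS):
    """Return the recordings to play, in order, for a list of phrases."""
    keys = [phrase.lower() for phrase in phrases]
    missing = [key for key in keys if key not in recordings]
    if missing:
        # Only phrases that were recorded in advance can be spoken.
        raise KeyError(f"no recording for: {', '.join(missing)}")
    return [recordings[key] for key in keys]


print(build_playlist(["The train to", "Edinburgh", "departs from platform", "2"]))
```

  The lookup fails for any phrase that was not recorded in advance,
  which is why this method only suits a restricted set of messages.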

  More sophisticated, but lower in quality, are algorithms which split
  the speech into smaller units. The smaller the units are, the fewer of
  them are needed, but the quality also decreases. An often used unit is
  the phoneme, the smallest linguistic unit. Depending on the language
  there are about 35-50 phonemes in western European languages, i.e.
  there are 35-50 single recordings. The problem is combining them, as
  fluent speech requires fluent transitions between the elements. The
  intelligibility is therefore lower, but the memory required is small.

  A solution to this dilemma is using diphones. Instead of splitting at
  the transitions, the cut is made at the center of the phonemes,
  leaving the transitions themselves intact. An inventory of N phonemes
  gives up to N*N elements, e.g. about 400 for 20 phonemes (20*20) or
  roughly 1200-2500 for 35-50 phonemes, and the quality increases.
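
  The diphone idea can be sketched in a few lines. The example below
  turns a phoneme sequence into the diphone units that would cover it
  and computes the upper bound on inventory size. The phoneme symbols,
  the "_" silence marker and the function names are assumptions made
  for this illustration, not the notation of any particular
  synthesiser.

```python
def to_diphones(phonemes, silence="_"):
    """Return the diphone names covering a phoneme sequence.

    Each diphone runs from the middle of one phoneme to the middle of
    the next, so N phonemes padded with silence at both ends need
    N + 1 diphones.
    """
    padded = [silence] + list(phonemes) + [silence]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


def max_inventory(n_phonemes):
    """Upper bound on distinct diphones: every ordered pair of phonemes."""
    return n_phonemes * n_phonemes


print(to_diphones(["h", "e", "l", "ou"]))
print(max_inventory(20))
```

  Every unit boundary now falls in the middle of a phoneme, so the
  transitions between phonemes are stored inside the units.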

  The longer the units become, the more elements there are, but the
  quality increases along with the memory required. Other units which
  are widely used are half-syllables, syllables, words, or combinations
  of them, e.g. word stems and inflectional endings.

  The Museum of Speech Analysis and Synthesis has pictures of artificial
  speech systems going back over 150 years and is worth a visit:
  http://mambo.ucsc.edu/psl/smus/smus.html


___________________________________________________________________________

                     Q5.3: References/Books on Synthesis

 Books and Papers

    * Thierry Dutoit, An Introduction to Text-to-Speech Synthesis,
      Kluwer Academic Publishers (Dordrecht), 1997, ISBN 0-7923-4498-7,
      312 pages. Volume 3 in the series on Text, Speech and Language
      Technology.
    * Douglas O'Shaughnessy, Speech Communication: Human and Machine,
      Addison Wesley series in Electrical Engineering: Digital Signal
      Processing, 1987.
    * T.V. Raman, Auditory User Interfaces -- Toward The Speaking
      Computer, Kluwer Academic Publishers, Boston, ISBN 0-7923-9984-6,
      August 1997, 168 pp.
    * D. H. Klatt, "Review of Text-To-Speech Conversion for English",
      Jnl. of the Acoustical Society of America (JASA), Vol 82, pp
      737-793, 1987.
    * "Talking Machines, Theories, Models and Designs" Eds, G. Bailly &
      C. Benoit (Elsevier: North Holland)
    * I. H. Witten. Principles of Computer Speech, London: Academic
      Press, Inc., 1982.
    * W.B. Kleijn and K.K. Paliwal (Eds.), Speech Coding and Synthesis,
      Elsevier, Amsterdam, 1995.
      Contents, preface etc on the WWW:
      http://www.elsevier.nl/section/engtech/scs/menu.htm
    * John Allen, Sharon Hunnicutt and Dennis H. Klatt, "From Text to
      Speech: The MITalk System", Cambridge University Press, 1987.
    * J.P.H. van Santen, R. W. Sproat, J. P. Olive, and J. Hirschberg,
      "Progress in Speech Synthesis", Springer, 1996.

 On the WWW

    * Survey of the State of the Art in Human Language Technology
      Report edited by Ronald A. Cole et al. with a section on
      Text-to-Speech Technologies.
      http://www.cse.ogi.edu/CSLU/HLTsurvey/ch5node1.html

 Bibliographies and Reference Lists

    * WWW searchable online bibliography for Phonetics and Speech
      Technology with more than 8000 entries. Provided by Institut fur
      Phonetik at Johann Wolfgang Goethe-Universitat Frankfurt.
      http://www.uni-frankfurt.de/~ifb/bib_engl.html
    * Computational Speech Processing: Speech Analysis, Recognition,
      Understanding, Compression, Transmission, Coding, Synthesis ; Text
      to Speech Systems, Speech to Tactile Displays, Speaker
      Identification, Prosody Processing : BIBLIOGRAPHY, by Conrad F.
      Sabourin, 1994, 2 volumes, 1187p, ISBN 2-921173-21-2, INFOLINGUA
      inc., P.O. Box 187 Snowdon, Montreal, H3X 3T4, Canada.
      See also: http://gomer.mlink.net/infolingua.html


___________________________________________________________________________

                  Q5.4: Speech Synthesis on the WWW

  Most of the following are links to WWW pages with demonstrations of
  speech synthesis. Plenty more links are included in the detailed list
  of speech synthesis software/hardware in Q5.5.

  Speech Synthesis "Museum"
         URL: http://www.cs.bham.ac.uk/~jpi/synth/museum.html
         Maintained by Jon Iles ([email protected]) at the
         University of Birmingham.
         Information and speech samples for

         + YorkTalk
         + Loughborough Sound Images
         + University of Birmingham - FDFS
         + Eurovocs
         + DECtalk
         + AT&T Bell Labs Synthesiser
         + S.W.A.Ll.C. - Welsh Synthesis from CSTR
         + All-Prosodic Speech Synthesis - IPOX
         + Orator from Bellcore

  The Festival Speech Synthesis System
         http://www.cstr.ed.ac.uk/projects/festival.html
         Pre-synthesized examples in English, Welsh and Spanish, and
         online demo of English.

  Pavarobotti
         http://www.shc.uiowa.edu/fun/pavarobotti/pavarobotti.html
         WWW demo of the Pavarobotti synthesis technology developed at
         the National Center for Voice and Speech
         (http://www.shc.uiowa.edu/ncvs_home.html).

  Say...
         http://wwwtios.cs.utwente.nl/say
         WWW demo of the rsynth speech synthesis software. The WWW
         capability was implemented by Axel Belinfante.

  Musee sonore de la synthese de la Parole en francais
         http://www.icp.grenet.fr/exemples_synthese/ex.html
         Speech synthesis examples from a series of French language
         speech synthesisers plus links to other speech synthesis demo
         pages.

         + ICP-Grenoble
         + CNET-Lannion (with TD-PSOLA)
         + KTH-Stockholm
         + Universite-Mons - several versions

  Lucent Technologies Bell Labs Text-to-Speech
         http://www.bell-labs.com/project/tts/
         Demos and samples of the latest Lucent Technologies Bell Labs
         Text-to-Speech system.

  WATSON FlexTalk from AT&T Advanced Speech Products Group
         http://www.att.com/aspg/demo.html
         WWW interface to the WATSON FlexTalk speech synthesis
         demonstration.

  AT&T Bell Laboratories Voices
         http://www.research.att.com/cgi-bin/cgiwrap/mjm/voices.cgi
         WWW interface to the AT&T Bell Laboratories text to speech
         (TTS) synthesizer

  Laureate from British Telecom
         http://www.labs.bt.com/innovate/speech/laureate/
         Demo of the Laureate speech synthesis system - not yet
         commercially available.

  ORATOR from Bellcore
         Online demo of the ORATOR system developed at Bellcore.
         http://www.bellcore.com/ORATOR/

  SVOX from TIK, ETH in Zurich
         http://www.tik.ee.ethz.ch/cgi-bin/w3svox
         Demo of German speech synthesis from Institut fur Technische
         Informatik und Kommunikationsnetze.

  Speech Synthesis Research at OGI
         http://www.cse.ogi.edu/CSLU/research/TTS
         Examples of diphone speech corpora and algorithms developed at
         OGI for synthesis of American English and Mexican Spanish using
         the Festival framework.

  Lyricos
         http://www.cse.ogi.edu/CSLU/research/TTS/research/sing.html
         Demos of the Lyricos singing voice synthesis system.
         Concatenation-based synthesis of singing voice from MIDI input.

  Multi-Lingual TTS from Gerhard-Mercator University, Duisburg
         http://www.fb9-ti.uni-duisburg.de/demos/speech.html
         Synthesis in German, English or Japanese.

  TMH: Institutionen for Taloverforing och Musikakustik, Kungliga
         Tekniska Hogskolan
         http://www.speech.kth.se/info/software.html
         Synthesis in Swedish, Finnish, Norwegian, Icelandic, Danish,
         British and American English, French, German, Italian, Spanish,
         LA Spanish and Greek.

  Haskins Laboratory WWW Site
         http://www.haskins.yale.edu/Haskins/MISC/special.html
         Examples of several types of speech synthesis. Articulatory
         Synthesis by HyperASY. SineWave Synthesis. Gestural
         Computational Model. Pattern Playback system of the 1940's!

  BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
         http://www.bestspeech.com/weblang.html

  Eurovocs Multilingual Speech Synthesis
         http://www.elis.rug.ac.be/ELISgroups/speech/research/eurovocs.html
         Based on Lernout and Hauspie technology.

  HADIFIX German Speech Synthesis
         http://asl1.ikp.uni-bonn.de/~tpo/Hadiq.en.html
         Provided by the Instituts fur Kommunikationsforschung und
         Phonetik, Universitat Bonn.

  Centigram's TruVoice Demo
         http://www.centigram.com/centigram/TruVoice/index.html
         Allows control of speech rate, pitch and other prosodic
         characteristics.

  MBROLA: Free Speech Synthesis Project
         http://tcts.fpms.ac.be/synthesis/modelcmp.html
         WWW demo of MBROLA which compares the quality of PSOLA,
         MBR-PSOLA, LPC, and Hybrid Harmonic/Stochastic concatenative
         synthesizers. Provided by the TCTS Lab, Faculte Polytechnique
         de Mons, Belgium.

  Institute of Phonetic Sciences
         http://fonsg3.let.uva.nl/IFA-Features.html
         Links to lots of on-line speech synthesis demonstrations
         provided by the Institute of Phonetic Sciences of the Faculty
         of Arts of the University of Amsterdam.

  Yahoo page on speech generation
         http://www.yahoo.com/Science/Computer_Science/Artificial_Intelligence/Natural_Language_Processing/Speech_Generation/


___________________________________________________________________________

                  Q5.5: Speech Synthesis Software/Hardware

  Please email any updates, corrections or additions to the following
  list. The range of commercially available synthesis software is
  growing rapidly so any help in keeping up to date will be appreciated.

  Other lists of speech synthesis software on the WWW include:

   Kevin Lenzo's list of Macintosh Speech Resources and Apps
         http://www.cs.cmu.edu/~lenzo/mac_speech_apps.html

   Speech Toys Speech Synthesis Information
         http://www.speechtoys.com/spchtoys/spsyn.html

 In the FAQ...

  The following speech synthesis software/hardware is described in the
  comp.speech FAQ.

  _Apple Macintosh_
         * BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
         * Infovox Product Range
         * Macintosh Speech Output Applications
         * Macintosh Speech Synthesis Manager
         * MacYack Pro
         * MBROLA: Free Speech Synthesis Project
         * ProVoice Developer's Speech Toolkit from First Byte
         * SENSYN speech synthesizer
         * Sound Bytes Developer's Kit

  _Windows (including 95, NT, 3.1)_
         * AcuVoice
         * AT&T Watson Speech Synthesis
         * BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
         * Creative TextAssist and TextAssist API
         * DECtalk: Text-to-Speech from Digital
         * ETI-Eloquence
         * HADIFIX
         * Infovox Product Range
         * IPOX: All Prosodic Speech Synthesis Architecture
         * Lernout and Hauspie Text-To-Speech Windows SDK
         * Listen2 Text Reader
         * MBROLA: Free Speech Synthesis Project
         * Monologue for Windows from First Byte
         * PAM - A Text-To-Speech Application
         * ProVerbe Speech Engine from ELAN Informatique
         * ProVoice Developer's Speech Toolkit from First Byte
         * SENSYN speech synthesizer
         * Sound Bytes Developer's Kit
         * Tinytalk
         * TruVoice from Centigram
         * WinSpeech
         * ZMD Speech Synthesis

  _DOS_
         * CSRE: Computerized Speech Research Environment
         * Infovox Product Range
         * MBROLA: Free Speech Synthesis Project
         * ProVoice Developer's Speech Toolkit from First Byte
         * SENSYN speech synthesizer
         * spchsyn.exe
         * Tinytalk
         * ZMD Speech Synthesis

  _OS/2_
         * ProVerbe Speech Engine from ELAN Informatique
         * ProVoice Developer's Speech Toolkit from First Byte
         * Sound Bytes Developer's Kit

  _Unix_
         * AcuVoice
         * AsTeR
         * BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
         * DECtalk: Text-to-Speech from Digital
         * ETI-Eloquence
         * Emacspeak - A Speech Output Subsystem For Emacs
         * Festival Speech Synthesis System
         * JSRU
         * Klatt-style synthesiser
         * KPE80 - A Klatt Synthesiser and Parameter Editor
         * "learph": Trainable text-to-phoneme software by Antonio Lucca

         * Lucent Technologies Bell Labs Text-to-Speech system
         * MBROLA: Free Speech Synthesis Project
         * Orator from Bellcore
         * ProVerbe Speech Engine from ELAN Informatique
         * rsynth
         * SENSYN speech synthesizer
         * SGI Developers Toolbox Synthesiser
         * Speak
         * TrueTalk
         * TruVoice from Centigram

  _Integrated Circuits and Dedicated Hardware_
         * Eurovocs
         * Infovox Product Range
         * ProVerbe Speech Engine from ELAN Informatique
         * RC Systems V8600/V8601 Text to Speech synthesizers

  _Other Platforms_
         * BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
         * TheBigMouth (NeXT)
         * MBROLA: Free Speech Synthesis Project
         * Narrator Translator Library (Amiga)
         * Narrator (Amiga)
         * TextToSpeech Kit (NeXT)
         * Orator from Bellcore
         * SENSYN speech synthesizer
         * WreadFiles: File reader for Commodore Amiga

  _Unknown_
         * Lernout and Hauspie Text-To-Speech (3 products)
         * SIMTEL
         * Text to Phoneme Program 1
         * Text to phoneme program 2
         * Text to phoneme program 3



AcuVoice

    * Platform: Windows, Solaris
    * Description: AcuVoice is a natural sounding text-to-speech system
      built using a concatenative approach. Currently it is available
      for an American English Male Voice. Software Developer Kits are
      available for the Windows Platform (32-Bit) and also for the
      Solaris Platform. More information and samples are available on
      the Acuvoice web site.
    * Contact: AcuVoice, Inc.
      84 W. Santa Clara Street, Suite 720, San Jose, CA 95113-1810
      Ph: 1(408)289-1661, Fax: 1(408)289-1201
      Demo: 1(408)289-1177
      Email: [email protected]
      WWW: http://www.acuvoice.com/



AsTeR

    * Platform: UNIX
    * Description: TTS front-end program which encodes structural
      information about documents in speech synthesis. For more
      information check out:

               http://www.research.digital.com/CRL/personal/raman/aster/aster-toplevel.html

    * Operation requirements: Lisp: Lucid, clisp
    * Contact: T. V. Raman
      WWW: http://www.research.digital.com/CRL/personal/raman/raman.html
      Email: [email protected]



AT&T Watson Speech Synthesis

    * Platform: Windows 95/NT on a Pentium 75 MHz or higher
    * Description: Watson is a software implementation of AT&T Bell
      Laboratories voice processing technology. Watson includes BLASR
      Speech Recognition (see Q6.6) and FlexTalk speech synthesis. It
      requires no special hardware to run other than a standard sound
      card and/or phone card. Technical details for the FlexTalk speech
      synthesis include:
         + Compliant with MS Speech API.
         + Male and Female Voices available
         + 8 kHz and 11 kHz output
         + SoundBlaster compatible sound card and drivers required
         + Context sensitive abbreviation expansion
         + Accurate pronunciation of most proper names
         + Adjustable vocal tract size, speed, volume, pitch, etc.
         + American English only - other languages in development
      The AT&T Advanced Speech Products Group home page provides more
      detailed information including a Frequently Asked Questions list,
      information for application developers on the Independent Software
      Vendor (ISV) Program (including info on the SDK, licensing, and
      the training program).
    * Requirements: Uses 2 MB RAM, 10 MB Disk. Requires a Pentium 75 MHz
      or higher (uses
    * Cost and Availability: WATSON is a software-based speech platform
      with a Software Developers Kit (SDK) that allows application
      developers to use voice processing in their applications. It is
      not available as a stand-alone product.
      Licensing information (inc. price) is provided on the AT&T
      Advanced Speech Products Group home page.
    * See also: Watson BLASR speech recognition in Q6.5, Microsoft
      Speech API, and Advanced Speech API.
    * Contact: AT&T Advanced Speech Products Group
      Suite 700, 44 East Mifflin Street, Madison, WI 53703, USA
      Ph: 1-800-5-WATSON, Fax: 1-608-259-2269
      Email: [email protected]
      WWW: http://www.att.com/aspg/



BeSTspeech from Berkeley Speech Technologies, Inc., (BST)

    * Platform: available for Macintosh, Sun, Silicon Graphics, Windows
      PC and IBM RS/6000 platforms, and can be ported to others.
    * Description: BeSTspeech reads ASCII text with no vocabulary limits.
      Available for Dutch, English (male and female), French, German,
      Italian, Portuguese, Spanish, Arabic, Cantonese, Japanese, Korean,
      Malay, Mandarin and Russian.
    * Availability: Berkeley Speech Technologies, Inc does not sell end
      user toolkits or products.
    * Contact: Berkeley Speech Technologies, Inc.
      2246 Sixth Street, Berkeley, California 94710, USA
      Ph: (510) 841-5083, Fax: (510) 841-5093
      Email: [email protected]
      WWW: http://www.bestspeech.com/index.html



TheBigMouth - a Text to Speech Program

    * Platform: NeXT
    * Description: Text to speech program based on concatenation of
      pre-recorded speech segments.
    * Availability:
      ftp://ftp.cs.keio.ac.jp/pub/NeXT/source/TheBigMouth1.0.tar.Z



Creative TextAssist

    * Platform: Windows
    * Description: Based on DECtalk speech synthesis. A detailed
      description of TextAssist is provided on the Creative WWW pages.
      TextAssist TextReader provides a convenient Windows user interface
      for text reading.
    * Availability: Creative TextAssist is bundled with most (all?)
      Creative Sound Blaster audio cards. TextAssist preview software is
      available from the Creative Labs TextAssist home page.
    * Contact: Creative Labs, Inc.
      Address, phone, email etc unknown
      WWW: http://www.creaf.com/
      http://www.creaf.com/wwwnew/tech/devcnr/tassist.html

Creative TextAssist API

    * Platform: Windows
    * Description: The TextAssist API (TAAPI) is created for Microsoft
      Windows 3.1x and Windows 95 developers who intend to develop
      16-bit Text-to-Speech software applications using Creative's
      TextAssist speech engine. It supports direct control of speech
      output characteristics, concurrent playback of text-to-speech and
      wave files, foreign language support, speech synchronization, and
      exception dictionaries. It also includes a voice editing tool for
      creating new custom voices and a Visual Basic Custom Control for
      high-level support in Visual Basic and other languages.
    * Availability: The TextAssist API is released to registered
      developers at no cost.
    * Contact: WWW: http://www.creaf.com/
      FAQ: http://www.creaf.com/wwwnew/tech/devcnr/tassfaq.html



CSRE: Computerized Speech Research Environment

    * Platform: DOS
    * Description: CSRE is a software system which includes an
      implementation of the Klatt speech synthesizer. See the CSRE entry
      in Q1.9 and the AVAAZ WWW pages for more detail.
    * Contact: AVAAZ Innovations Inc.
      P.O.Box 8040, 1225 Wonderland Rd. N, London, Ontario, CANADA, N6G
      2B0
      Ph: +1-519-472-7944, Fax: +1-519-472-7814
      Email: [email protected]
      WWW: http://www.icis.on.ca/homepages/avaaz/



DECtalk Speech Synthesis

    * Platform: Windows NT, Alpha with Digital UNIX and RS232 ports
    * Description: Converts ordinary text into natural-sounding,
      intelligible speech. Provides personalized voices, and extensive
      user controls. DECtalk technology is available in the following
      packaging options:
         + DECtalk PC card option: An industry-standard ISA/EISA bus
           card implementation that can be integrated with any Intel 486
           processor-based system running DOS or Windows. Applications
           can be interfaced to the bus via a DOS Terminate and Stay
           Resident (TSR) driver or a Windows Dynamic Link Library
           (DLL). This option is available with an external speaker with
           volume control and headphone jack.
         + DECtalk Express external package: An external, portable
           package that you can plug in to any PC or serial port. The
           external package includes a built-in speaker and headphone
           jack, plus combined on/off and volume controls and a
           rechargeable battery pack.
         + DECtalk Software solution: Software-only text to speech for
           Alpha or Intel systems running Windows NT or Alpha systems
           running Digital UNIX. Provides complete speech synthesis
           capabilities so developers can enhance applications with
           DECtalk technology. DECtalk Software output can be directed
           to audio devices, into WAVE files, or into memory buffers.
    * Pricing:
      http://www.systems.digital.com/DIcatalog/html/DECtalk-Speech-Synthesis-oi.html
    * More Information:
      Digital Equipment Corporation WWW pages: http://www.digital.com/
      DECtalk page:
      http://www.systems.digital.com/DIcatalog/html/DECtalk-Software.html
      Ph: 1-800-DIGITAL

DECtalk Software

    * Platform: Digital UNIX and Windows NT
    * Description: DECtalk converts standard ASCII text into natural,
      intelligible speech. Speech output through any audio device is
      supported by Microsoft Video for Windows or Multimedia Services
      for Digital UNIX. An API gives developers direct access to
      text-to-speech functions. Provides nine voice personalities (4
      female, 4 male, 1 child). Provides punctuation and tonal control,
      supports customized pronunciation of trade jargon and acronyms.
      Common programming interface works with both Alpha and Intel
      platforms.
    * More Information:
      Digital Equipment Corporation WWW pages: http://www.digital.com/
      DECtalk Software page:
      http://www.systems.digital.com/DIcatalog/html/DECtalk-Software.html
      WWW:
      http://www.systems.digital.com/DIcatalog/html/DECtalk-Speech-Synthesis.html
      Ph: 1-800-DIGITAL



ETI-Eloquence

    * Platform: MS Windows (Win95,NT,3.1), Solaris, SunOS, SGI, RS/6000
    * Description: ETI-Eloquence is a software based text-to-speech
      system. It generates waveforms completely algorithmically instead
      of by concatenating waveforms, for maximum flexibility and
      naturalness. For instance, when the user requests a deeper voice,
      the software simulates a larger vocal tract, instead of simply
      pitch-shifting samples. It uses high-level linguistic parsing,
      which obviates the need for a huge dictionary. It handles numbers,
      acronyms, currency, etc. It includes a set of annotation symbols,
      for placing stress on particular words, expressing
      excitement/boredom, etc. Also allows phonetic input. Supports MS
      SAPI.
      Produces male and female voices for General American English.
      Dialects under development include Alabama and Brooklyn.
    * Price: Flexible license agreements on application.
    * Availability: Eloquent Technology, Inc.
      2389 North Triphammer Road, Ithaca, NY 14850, USA
      Ph: (607) 266-7025, Fax: (607) 266-7030
      Email: [email protected]
      WWW: http://www.eloq.com/



Emacspeak - A Speech Output Subsystem For Emacs

    * Platform: UNIX, Emacs
    * Description: Emacspeak is a speech output system that will allow
      someone who cannot see to work directly on a UNIX system.
      Emacspeak is built on top of Emacs. With emacspeak loaded, Emacs
      provides spoken feedback for everything you do. Emacspeak
      currently supports the new Dectalk Express speech synthesizer, as
      well as older versions of the Dectalk e.g. the MultiVoice. See the
      Emacspeak WWW page, the Emacspeak FAQ or the Emacspeak
      distribution for additional details.
    * Requirements: Requires GNU FSF Emacs 19 (version 19.23 or later)
      and TCLX 7.3B (Extended TCL) to run Emacspeak.
    * Availability:

       Emacspeak WWW page
               http://www.research.digital.com/CRL/personal/raman/emacspeak/emacspeak.html

       Emacspeak source
               http://www.research.digital.com/CRL/personal/raman/emacspeak/emacspeak.tar.gz

    * Contact: T. V. Raman, [email protected]



Eurovocs

    * Platform: Various - RS232 hardware connection
    * Description: Eurovocs is a stand-alone text-to-speech synthesizer
      which uses the text-to-speech technology of Lernout and Hauspie
      Speech Products. Available for Dutch, French, German and American
      English with other languages planned for release soon. One
      Eurovocs device can support two different languages. Eurovocs can
      be connected to any computer via a standard serial interface
      (RS232). It supports personal dictionaries, generation of DTMF
      tones, and pronunciation of special character sequences such as
      digit strings, telephone-numbers, date and time indications,
      abbreviations, alphanumeric strings etc.
    * Contact: Technologie & Revalidatie
      Postbus 128, B-9000 Gent, Belgium
      Ph: +32-9-264 33 97, Fax: +32-9-264 35 94
      E-mail: [email protected]
      WWW:
      http://www.elis.rug.ac.be/ELISgroups/speech/research/eurovocs.html



Festival Speech Synthesis System

    * Platform: General Unix (including Solaris (2.4,2.5), SunOS, HPUX,
      SGIs, Linux, Dec Alpha, FreeBSD)
    * Description: Festival is a general multi-lingual speech synthesis
      system developed at CSTR, University of Edinburgh. It offers a
      full text to speech system with various APIs, as well as an
      environment for development and research of speech synthesis
      techniques. It is written in C++ with a Scheme-based command
      interpreter for general control. Festival's home page offers
      demos, the full manual and access to the download page. The
      distribution includes full source and documentation, and lexicons
      and speech databases for British English text to speech.
    * Price: Free for non-commercial use
    * Availability: via WWW or anonymous ftp:
      WWW: http://www.cstr.ed.ac.uk/projects/festival/download.html
      ftp: ftp://ftp.cstr.ed.ac.uk/pub/festival/



HADIFIX

    * Platform: Windows
    * Description: German speech synthesis system developed at the
      Institute for Communications Research and Phonetics, University
      of Bonn. Provides conversion of input text to phonemes, automatic
      prediction of stress, phrasing and pitch, and speech generation by
      concatenation of small units of natural speech. Demisyllables and
      similar units are used; they comprise all consonants before the
      vowel and the beginning of the vowel (initial demisyllable) or the
      end of the vowel and the following consonants (final
      demisyllable). For example, the word 'Strolch' is formed by
      concatenating 'Stro' and 'olch'.
    * Demo: Windows demo software available. Limited to synthesis of one
      short text (text.txt) at a time. Speech format limitations too.
      1.3MB file.
      ftp://asl1.ikp.uni-bonn.de/pub/hadifix/hadidemo.zip
      A 1993 version is available with unlimited synthesis from a string
      of phonemic symbols and accent markers. 6MB file.
      ftp://asl1.ikp.uni-bonn.de/pub/hadifix/hadi25.lzh
    * WWW: http://asl1.ikp.uni-bonn.de/~tpo/Hadifix.en.html
    * On-line demo: http://asl1.ikp.uni-bonn.de/~tpo/Hadiq.en.html
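
    The demisyllable split described above can be sketched as follows.
    This is only an illustration that works on the spelling of a single
    syllable; HADIFIX itself operates on recorded speech units, and the
    vowel set and function name here are assumptions made for the
    example.

```python
# Simplified vowel set (umlauts omitted); an assumption for this sketch.
VOWELS = set("aeiou")


def demisyllables(syllable):
    """Split one syllable into (initial, final) demisyllables.

    The initial half keeps the consonants before the vowel plus the
    vowel; the final half keeps the vowel plus the consonants after it.
    Assumes the vowel letters of the syllable are adjacent.
    """
    lower = syllable.lower()
    positions = [i for i, ch in enumerate(lower) if ch in VOWELS]
    if not positions:
        raise ValueError(f"no vowel found in {syllable!r}")
    first, last = positions[0], positions[-1]
    # The vowel nucleus is shared by both halves.
    return syllable[: last + 1], syllable[first:]


print(demisyllables("Strolch"))
```

    For 'Strolch' this reproduces the 'Stro' and 'olch' units given in
    the description.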



Infovox Product Range

    * Description: Multilingual Text-to-speech systems, languages
      available: American English, British English, German, French,
      Spanish, Italian, Swedish, Norwegian, Icelandic, Danish and
      Finnish.
    * Product name: INFOVOX 500, PC BOARD
         + Product description: Half length expansion board for IBM PC,
           XT, AT, PS/2 model 30 or compatible personal computers. The
           board can also be connected via the serial port. Language and
           control program for downloading into RAM or mounted on EPROMs
         + Platform: DOS/Windows with IBM PC, XT, AT, PS/2 model 30 or
           compatible
         + Delivered standard interface: MS DOS I/O driver
    * Product name: INFOVOX 600, OEM BOARD
         + Product description: OEM board built with CMOS IC's. Language
           and control program are stored in on-board fixed memory.
         + Platform: any, hardware interface: 9-pole D-SUB (RS 232-C)
           300-9600 Baud.
         + Delivered standard interfaces: MS DOS I/O driver and
           interface to Apple Speech manager.
    * Product name: INFOVOX 700, DESKTOP UNIT
         + Product description: Desktop unit with built in Infovox 600
           to be connected to any computer or terminal via an RS 232-C
           serial interface. Built in loudspeaker and rechargeable
           battery for 4 hours use, and control knobs for continuous
           control of speech volume and speed.
         + Platform: various through hardware interface
         + Delivered standard interfaces: MS DOS I/O driver and
           interface to Apple Speech manager
    * Product name: INFOVOX 650, OEM BOARD
         + Product description: OEM board built with CMOS ICs. Language
           and control program are stored in on-board memory.
         + Platform: any, hardware interface: 9 pole D-SUB (RS 232-C)
           300-9600 Baud
         + Delivered standard interfaces: MS DOS I/O driver and
           interface to Apple Speech manager
    * Product name: INFOVOX 750, DESKTOP UNIT
         + Product description: Desktop unit with built in Infovox 650
           to be connected to any computer or terminal via an RS 232-C
           serial interface. Built in loudspeaker and rechargeable
           battery for 5 hours use, and a control knob for continuous
           control of speech volume.
         + Platform: various through hardware interface. Delivered
           standard interfaces include MS DOS I/O driver and interface
           to Apple Speech manager
    * Product name: Infovox 210, software for Apple Macintosh
         + Product description: Software based text-to-speech
           conversion. Produces 16 bit and 8 bit sound. Delivered on
           3.5" diskettes with user lexicon and complete
           documentation.
         + Platform: Apple Macintosh with minimum 68030, 33 MHz
           microprocessor.
         + Delivered standard interfaces: Standard interface to Apple
           Speech manager
    * Product name: Infovox 220, software for Microsoft Windows.
         + Product description: Software based text-to-speech
           conversion. Produces 16 bit sound and conforms to the
           Microsoft Windows multimedia standard MCI. Delivered on 3.5"
           diskettes with user lexicon and complete documentation.
         + Platform: Windows on IBM compatible PC with minimum 486/25MHz
           microprocessor.
         + Delivered standard interfaces: Standard interface to
           Microsoft Windows 3.1 and sound boards supporting Microsoft
           Windows multimedia driver for audio.
    * Contact: Telia Promotor Infovox AB
      TTS Sales Division
      P.O. Box 2069, S-171 02 Solna, Sweden
      Ph: +46 8 764 35 00, Fax: +46 8 735 78 76
      Email: [email protected]
      WWW: http://www.promotor.telia.se/NYA/cc/t-s/index.html



IPOX: All Prosodic Speech Synthesis Architecture

    * Platform: Windows
    * Description: IPOX is an experimental, all-prosodic speech
      synthesizer, developed by Arthur Dirksen and John Coleman. IPOX is
      freely available (after registration) for evaluation and
      non-profit research purposes.
    * Requirements: PC (preferably a fast 486) running Windows 3.1 or
      higher. Sound output requires a 16-bit Windows-compatible sound
      card
    * Availability: By WWW from
      http://www.tue.nl/ipo/people/adirksen/ipox/ipox.htm



JSRU

    * Platform: UNIX and PC
    * Cost: 100 pounds sterling (for academic institutions and
      industry)
    * Description: A C version of the JSRU system, Version 2.3 is
      available. It's written in Turbo C but runs on most Unix systems
      with very little modification. A Form of Agreement must be signed
      to say that the software is required for research and development
      only.
    * Contact: Dr. E. Lewis ([email protected])



Klatt-style synthesiser

    * Platform: Unix
    * Cost: Free
    * Description: Software posted to comp.speech in late 1992.
    * Availability: By ftp from the comp.speech ftp site
         + ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/klatt.3.04.tar.gz
         + ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/klatt.3.04.tar.Z
    * See also: KPE80 - A Klatt Synthesiser and Parameter Editor.



KPE80 - A Klatt Synthesiser and Parameter Editor

    * Platform: Unix
    * Description: The KPE80 program provides a graphical interface for
      the implementation of the Klatt 1980 formant synthesiser written
      by Jon Iles and Nick Ing-Simmons. It was inspired by IGE, a piece
      of code written by Rob Fletcher (
      http://www.york.ac.uk/~rpf1/IGE.html).
    * Technical Desc.: It comprises an X-Window interface and
      version 3.03 of the synthesiser code. The interface allows users
      to display and edit Klatt parameters using a graphical display
      which includes the time-amplitude waveform of both the original
      speech and its synthetic copy, and some signal analysis
      facilities. Most of the work in choosing the parameter values to
      produce the synthetic copy has to be done by the user. KPE will
      estimate the fundamental frequency contour from an original token;
      this estimate will need to be amended where errors occur. It is
      possible to specify the formant trajectories with some precision
      by overlaying the appropriate formant frequency parameter tracks
      on the spectrogram of the target waveform. A number of facilities
      exist to help in the refinement of parameter values: original and
      synthetic waveforms can be compared aurally, spectrally, and
      spectrographically using built-in speech analysis facilities.
    * File formats: KPE will read RIFF (.wav) files and SFS files. (SFS
      is a suite of speech-signal processing programs available free
      from Phonetics and Linguistics, UCL.)
    * Availability:

       KPE for SunOs 4.1.3 (statically compiled libraries)
               ftp://pitch.phon.ucl.ac.uk/pub/kpe/kpe80.sun413.tar.Z

       KPE for Linux (statically compiled libraries)
               ftp://pitch.phon.ucl.ac.uk/pub/kpe/kpe80.linux.tar.Z

       The source code (needs gcc and SUIT to compile)
               ftp://pitch.phon.ucl.ac.uk/pub/kpe/kpe80.src.tar.Z

       A postscript overview of KPE
               ftp://pitch.phon.ucl.ac.uk/pub/kpe/OVERVIEW.ps

       The SFS distribution
               ftp://pitch.phon.ucl.ac.uk/pub/sfs/

    * See also: Public domain Klatt-style speech synthesis code.
    * Contact: Andrew Simpson
      Department of Phonetics and Linguistics, University College London
      Wolfson House, 4 Stephenson Way, London NW1 2HE
      Email: [email protected]
      WWW: http://www.phon.ucl.ac.uk/home/andrew/home.html



"learph": Trainable text-to-phoneme software by Antonio Lucca

    * Platform: UNIX
    * Description: Experimental software which learns text to phoneme
      translation from examples using decision-tree-like data
      structures. It is based on the assumption that each letter can
      correspond to different phoneme strings depending on the context.
    * Availability: Examples and source are available on the WWW:
      http://www.silab.dsi.unimi.it/~al367212/ttsdoc.html
    * Contact: Antonio Lucca: [email protected]
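
  The idea of letting the surrounding letters decide which phoneme string
  a letter maps to can be illustrated with a small sketch. This is not
  the learph algorithm itself: it is a simple back-off table standing in
  for a decision tree, and it assumes training words that are already
  aligned letter by letter with phoneme strings. All names (contexts,
  train, transcribe) and the toy phoneme symbols are hypothetical.

```python
from collections import Counter, defaultdict


def contexts(word, i, max_width=2):
    """Yield context keys for letter i, widest window first, bare letter last."""
    padded = "#" * max_width + word + "#" * max_width  # '#' marks word edges
    j = i + max_width
    for width in range(max_width, -1, -1):
        yield (width, padded[j - width:j + width + 1])


def train(examples, max_width=2):
    """examples: (word, phonemes) pairs with one phoneme string per letter.

    An empty string means the letter is silent.
    """
    table = defaultdict(Counter)
    for word, phonemes in examples:
        for i, phoneme in enumerate(phonemes):
            for key in contexts(word, i, max_width):
                table[key][phoneme] += 1
    return table


def transcribe(word, table, max_width=2):
    """Pick, for each letter, the most frequent phoneme of the widest known context."""
    result = []
    for i in range(len(word)):
        for key in contexts(word, i, max_width):
            if key in table:
                result.append(table[key].most_common(1)[0][0])
                break
        else:
            result.append("?")  # letter never seen in training
    return [p for p in result if p]  # drop silent letters


EXAMPLES = [
    ("cat", ["k", "ae", "t"]),
    ("city", ["s", "ih", "t", "iy"]),
    ("ace", ["ey", "s", ""]),
]
TABLE = train(EXAMPLES)
```

  With this toy data the letter "c" comes out as "k" in "cat" but as "s"
  in the unseen string "cit", because the wider context is tried before
  the bare letter.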



Lernout & Hauspie Text-to-Speech (3 products)

  Lernout & Hauspie have three TTS products. The functionality of the
  products is similar; however, they differ in hardware implementation
  and other details, as described below.

    * L&H tts2000/T: TTS for the Telephony and Telecommunications Market
    * L&H tts2000/M: TTS for the Computer and Multimedia Market
    * L&H tts3000/C: TTS for the Business and Consumer Electronics
      Market

    * Description: Text to Speech (TTS) software based on parameterized
      segment concatenation (diphones, triphones and tetraphones)
      algorithms. Available for US English, German, Dutch, French,
      Spanish (Castilian), Italian and Korean. General features include:
         + The control of volume, speech rate and speech pitch.
         + The use of control sequences to customize TTS output (adding
           pauses, using phonetic input, etc.).
         + Switching between languages at run time.
         + A personal vocabulary editor is available for building
           exception dictionaries.
         + Readout modes: letter by letter, word by word or sentence by
           sentence.
         + Input formats: orthographic input, phonetic input, phonetic
           input with prosodic information.
    * tts2000/T
         + Output formats: 8 bit mu-law PCM, 8 bit A-law PCM, 16 bit
           linear PCM.
         + Sampling Frequency: 8kHz
         + Single channel platform examples: SHARP SH7000, ARM6/ARM7,
           Intel i960, TI TMS320C31, AT&T DSP3210
         + Multi-channel platform examples: TI TMS320C31, AT&T DSP3210
    * tts2000/M
         + Output formats: 8/16 bit wave format, 8 bit mu-law PCM, 8 bit
           A-law PCM, 16 bit linear PCM.
         + Sampling Frequency: 8/10/11.025 kHz
         + Single processor platform examples: ARM6/ARM7, Intel
           386/486/Pentium, Motorola 68040
         + Two processor platform examples: {Intel 386/486/Pentium or
           Motorola 68030} and {ADI ADSP21XX or Motorola 5600X or TI
           TMS320C25/20C5X}
    * tts3000/C
         + Output formats: 8 bit mu-law PCM, 8 bit A-law PCM, 16 bit
           linear PCM.
         + Sampling Frequency: 10kHz
         + Single processor platform examples: SHARP SH7000, ARM6/ARM7,
           Intel i960, TI TMS320C31, AT&T DSP3210
         + Two processor platform examples: {SHARP SH7000 or ARM6/ARM7
           or Intel 386EX or Motorola 683XX} and {ADI ADSP21XX or
           Motorola 5600X or TI TMS320C25/C5X or TI TSP50C10}
    * See also: L&H Windows TTS SDK
    * More Information: on the Lernout & Hauspie WWW pages:
      http://www.lhs.com/tts.html
    * Price: Unknown
    * Contact: Lernout and Hauspie Speech Products
      20 Mall Road, 4th Floor
      Burlington, MA 01803, USA
      Ph: +1-617-238-0960, Fax: +1-617-238-0986
      Email: [email protected]
      WWW: http://www.lhs.com/
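
  The 8 bit mu-law output format listed above stores each sample on a
  logarithmic scale so that quiet passages keep more resolution than a
  linear 8 bit code would give them. The sketch below uses the textbook
  continuous mu-law curve (mu = 255) with a uniform 8 bit quantiser. It
  is an illustration only: it is not bit-exact with the segmented
  encoding used in telephony, and it says nothing about how the L&H
  products implement their output stage. The function names are
  hypothetical.

```python
import math

MU = 255.0  # companding constant used for 8 bit mu-law


def mulaw_compress(x):
    """Map a sample in [-1, 1] onto the logarithmic mu-law scale in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mulaw_expand(y):
    """Inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def encode_8bit(sample16):
    """Turn a 16 bit linear PCM sample into an 8 bit code (0..255)."""
    x = max(-1.0, min(1.0, sample16 / 32768.0))
    return int(round((mulaw_compress(x) + 1.0) / 2.0 * 255))


def decode_8bit(code):
    """Turn an 8 bit code back into an approximate 16 bit linear sample."""
    y = code / 255.0 * 2.0 - 1.0
    value = int(round(mulaw_expand(y) * 32768.0))
    return max(-32768, min(32767, value))
```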



Lernout & Hauspie Text-to-Speech Windows SDK

    * Platform: Windows
    * Description: The L&H Text-to-Speech software developer's kit lets
      you integrate text-to-speech technology with your own or
      existing PC applications under Microsoft Windows 3.1. The
      software converts written text into clear, human-sounding
      synthetic speech.
    * Requirements: IBM-compatible PC 386 DX/33 + 8Mb RAM + MS DOS 5.0 +
      MS Windows 3.1 (or higher) + SoundBlaster compatible sound board.
    * See also: L&H TTS Products
    * More Information: on the Lernout & Hauspie WWW pages:
      http://www.lhs.com/tts.html
    * Price: Unknown
    * Contact: Lernout and Hauspie Speech Products
      20 Mall Road, 4th Floor
      Burlington, MA 01803, USA
      Ph: +1-617-238-0960, Fax: +1-617-238-0986
      Email: [email protected]
      WWW: http://www.lhs.com/



Listen2 Text Reader

    * Platform: Windows
    * Description: Listen2 is a multi-voice, multi-language text reader.
      Listen2 comes in two versions, English only that uses high quality
      male and female voices, and the International version that can
      speak up to 5 different languages: English, German, French,
      Spanish or Italian, all in male voices. The basic International
      program comes with built-in English and additional language fonts
      can be purchased separately. The English version comes complete.
      Both programs are dynamically switchable and configurable. This
      means that you can press a hot key to speed up the speech, make it
      louder or quieter, etc., as it is reading a file. You can also
      insert flags in text files to make it switch voices or switch
      languages, depending on what version you have.
      Listen2 has all the features of the JTS Reader shareware program
      plus a few more. It will voice your reminder messages or
      appointment list on start-up. It will also speak a reminder
      message on shutting down.
    * WWW: A more complete description is available on the Listen2 web
      page
    * Contact: Tom Slemko: e-mail: [email protected], or,
      JTS Micro Consulting Ltd
      10931 Lytton Road, RR#4, Ladysmith, B.C., Canada, V0R 2E0
      WWW: http://www.islandnet.com/jts/



Lucent Technologies Bell Labs Text-to-Speech system

    * Platform: UNIX and Win-95/NT
    * Description: Lucent Technologies provides a web site with demos and
      samples of their latest speech synthesis technology. The site has
      interactive demos in American English, German, and Mandarin
      Chinese, and the capability to adjust voice parameters on the fly.
      Pre-synthesized demos for French, Italian, Russian, and Romanian
      are also provided.
      The site includes downloadable papers with detailed system
      descriptions.
    * WWW: http://www.bell-labs.com/project/tts/



Macintosh Speech Output Applications

    * Platform: Macintosh
    * Description: A comprehensive list of Macintosh Speech Applications
      is provided by Kevin Lenzo at CMU:
      http://www.cs.cmu.edu/~lenzo/mac_speech_apps.html
      The Apple Speech WWW Site also has some useful information:
      http://www.speech.apple.com/



Speech Manager and PlainTalk

    * Platform: Macintosh
    * Description: Apple's text-to-speech system extensions that enable
      applications to perform text-to-speech conversion. The Speech
      Manager runs on most Macs, but PlainTalk (and the high quality
      voices) requires a 68020 Mac or better.
    * Availability: By anonymous ftp from:
      ftp://ftp.support.apple.com/pub/apple_sw_updates/US/Macintosh/System/PlainTalk 1.4.1/
      This directory contains subdirectories for recent versions of
      PlainTalk. The current release (PlainTalk 1.4.1) contains the
      English Text-To-Speech with about a dozen voices
      (English_Text-to-Speech.hqx: 5.3 MByte), Mexican Spanish
      (Mexican_Spanish_TTS.hqx: 2.8 MByte), and the English Speech
      Recognition software (English_Speech_Recognition.hqx: 2.3MByte).
    * Cost: Free
    * WWW: The latest information is available from Apple's WWW page for
      speech recognition and synthesis:
      http://www.speech.apple.com/
    * Note 1: Check out Kevin Lenzo's list of Macintosh Speech
      Applications.
    * Note 2: Joshua Baer ([email protected]) runs a mailing list for
      Plaintalk. For subscription and other information visit the
      Plaintalk Discussion List Home page
    * Contact: Apple Computer, Inc.
      1 Infinite Loop, Cupertino, CA 95014, USA
      WWW: http://www.speech.apple.com/
      Email: [email protected]



MacYack Pro

    * Platform: Macintosh
    * Description: MacYack Pro is a commercial speech package for
      Macintosh that uses the PlainTalk Text-to-Speech synthesis
      software. Features include:
         + Add speech to any word processor.
         + Hear notification dialogs and other dialog boxes.
         + See and hear a customized message at startup or shutdown.
         + Hear calculations instantly.
         + Correct pronunciation errors.
         + Create custom double-clickable "speech files."
         + Have speaking alert sounds.
         + Add speech to HyperCard stacks.
         + Use AppleScript to add speech to other programs.
    * Price: $29.95 for a limited time, reduced from the regular price
      of $49.95. 30-day money-back guarantee.
    * Contact: Scantron Quality Computers
      20200 Nine Mile Rd. St. Clair Shores, MI 48080
      Ph: 1-800-777-3642, Fax: 810-774-2698
      E-mail: [email protected]
      WWW: http://www.sqc.com/
      Product Info: http://www.lowtek.com/macyack/



MBROLA: Free Speech Synthesis Project

    * Platform: Sun4, Sun/SunOS5.4, HP, VAX/VMS, DEC Alpha/VMS, PC/DOS,
      PC/Windows 3.1, PC/Windows 95, PC/Solaris2.4, PC/Linux, SGI
      INDY/IRIX, NeXT, and soon for Macintosh.
    * Description: MBROLA is a high-quality, diphone-based speech
      synthesizer which is available for free. It is provided by the
      TCTS Lab of the Faculte Polytechnique de Mons (Belgium), which aims
      to obtain a set of speech synthesizers for as many languages as
      possible, free to use for non-commercial, non-military
      applications.
      MBROLA 2.00 takes a list of phonemes as input, together with
      prosodic information (duration of phonemes and a piecewise linear
      description of pitch), and produces 16bit speech samples at the
      sampling frequency of the diphone database (typically 16kHz). (It
      is therefore NOT a Text-To-Speech (TTS) synthesizer, since it does
      not accept raw text as input.) Databases are now being prepared
      for English, Spanish, Italian, Dutch, and Romanian. Collaborations
      are welcome. More information can be found at the MBROLA project
      homepage.
    * Demonstration: WWW demo of MBROLA which compares the quality of
      PSOLA, MBR-PSOLA, LPC, and Hybrid Harmonic/Stochastic
      concatenative synthesizers is available at
      http://tcts.fpms.ac.be/synthesis/modelcmp.html.
    * Contact: Dr Thierry Dutoit
      Faculte Polytechnique de Mons, TCTS Lab,
      31, bvd Dolez, B-7000 Mons, Belgium.
      Ph: +32-65-374133, Fax: +32-65-374129
      e-mail: [email protected]
      WWW: http://tcts.fpms.ac.be/synthesis/mbrola.html
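
  The kind of input described above (phonemes with durations and a
  piecewise linear pitch description) can be sketched as follows. The
  line layout assumed here is one phoneme per line: its name, its
  duration in milliseconds, then optional pairs of (position in percent
  of the duration, pitch in Hz). Check the MBROLA documentation for the
  exact file format; the phoneme symbols depend on the diphone database
  and the ones used in the usage note are placeholders. The function
  names are hypothetical.

```python
def pho_line(phoneme, duration_ms, pitch_points=()):
    """Format one phoneme with its duration and optional pitch targets."""
    parts = [phoneme, str(duration_ms)]
    for percent, hertz in pitch_points:
        parts += [str(percent), str(hertz)]
    return " ".join(parts)


def pitch_at(points, t_ms):
    """Evaluate a piecewise linear pitch contour at time t_ms.

    points is a list of (time_ms, hertz) pairs with strictly increasing
    times; the contour is held flat before the first and after the last
    point.
    """
    if t_ms <= points[0][0]:
        return points[0][1]
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if t_ms <= t1:
            return f0 + (f1 - f0) * (t_ms - t0) / (t1 - t0)
    return points[-1][1]
```

  For example, pho_line("a", 120, [(0, 110), (100, 130)]) gives the line
  "a 120 0 110 100 130": a 120 ms phoneme whose pitch rises linearly
  from 110 Hz to 130 Hz.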



Monologue for Windows from First Byte

    * Platform: Windows
    * Description: Monologue is a software program that reads text from
      the clipboard in Windows 16 or 32 bit applications. It can be
      found as a bundled product with many sound cards and multimedia
      general purpose computer systems. Monologue can add the element of
      speech to virtually any text oriented application. Any
      pronounceable combination of letters and numbers will be spoken
      clearly. It can be applied to tasks such as eyes-free
      proofreading, data verification (e.g. spreadsheets), reading
      E-mail and more. User-changeable parameters provide control over
      the sound quality by allowing for changes in pitch, and the speed
      of speech. An exception dictionary saves preferred pronunciation
      of words and abbreviations.
      Monologue Win32 now includes support for the Microsoft SAPI.
      Monologue male "SpeechFonts" are available for US English, British
      English, German, French, Latin American Spanish, Italian. A US
      English Female SpeechFont is also available.
      For more detailed information and examples go to the First Byte
      WWW pages.
    * Availability: Currently bundled with many sound cards and
      multimedia general purpose computer systems. For pricing,
      licensing details, and release information see the First Byte WWW
      pages or email [email protected].
    * See also: ProVoice Developer's Speech Toolkit from First Byte
    * Contact: First Byte
      19840 Pioneer Ave., Torrance, CA 90503
      Ph: 310-793-0610 Fax: 310-793-0611
      Email: [email protected]
      WWW: http://www.firstbyte.davd.com/



Narrator Translator Library

    * Platform: Amiga
    * Description: A US English text to phoneme translator, implemented
      as a resident software library, for use with the Amiga Narrator
      Device. This software was supplied as a standard part of the Amiga
      operating system software up to O.S version 2.04. (Translator
      version 37.1, 1991) Approximately 700 translation rules are used
      to create the 'ARPAbet' phonemes. This software is functional on
      all current Amiga systems (O.S. 3.1).
    * Availability: limited to pre-owned system software disks and
      unsold O.S upgrade kits (Pre-O.S. 2.1).

Replacement Library: Translator42

    * Platform: Amiga
    * Description: an independent replacement for the Commodore-supplied
      "translator.library" which is a part of the Narrator speech
      synthesis package. It implements multi-lingual text-to-speech for
      an Amiga. The translation rules for each language are defined in a
      plain text 'Accent' file.
      There is a provision for the selection of unique languages for
      text segments by inserting in-line markup codes in the text: e.g.
      "Hello there! \french{Bonjour} \deutsch{gute morgen}".
      'Accent' files for American English, British English, Swedish,
      Maori, Finnish, German, Icelandic, Klingon, Polish, Italian, and
      Welsh languages included in the archive.
    * Availability: The most current version, 42.4, of the library
      and source are available by anonymous ftp from Aminet:
      ftp://ftp.doc.ic.ac.uk/pub/aminet/util/libs/translator42.lha
      ftp://ftp.doc.ic.ac.uk/pub/aminet/dev/src/tran42src.lha
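
  The in-line language markup in the example above can be split into
  language-tagged segments roughly as shown here. This is a minimal
  Python sketch for illustration, not the Translator42 parser; the
  function name split_by_language and the default language label are
  hypothetical, and nested markup is not handled.

```python
import re

# Matches in-line markup of the form \language{text}, e.g. \french{Bonjour}.
MARKUP = re.compile(r"\\([A-Za-z]+)\{([^{}]*)\}")


def split_by_language(text, default="english"):
    """Return (language, segment) pairs in reading order."""
    segments = []
    pos = 0
    for match in MARKUP.finditer(text):
        plain = text[pos:match.start()].strip()
        if plain:
            segments.append((default, plain))  # untagged text uses the default
        segments.append((match.group(1), match.group(2)))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append((default, tail))
    return segments
```

  Each segment could then be handed to the translation rules for its
  language.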



Narrator

    * Platform: Amiga
    * Description: Formant based speech synthesis. Includes an
      English-to-phoneme translation library, and a SPEAK: pseudo-device
      for speech output.
    * Hardware: Standard Amiga hardware
    * Availability: Part of AmigaOS
    * See Also: The Narrator Translator Library



TextToSpeech Kit

    * Platform: NeXT Computers
    * Description: The TextToSpeech Kit does unrestricted conversion of
      English text to synthesized speech in real-time. The user has
      control over speaking rate, median pitch, stereo balance, volume,
      and intonation type. Text of any length can be spoken, and
      messages can be queued up, from multiple applications if desired.
      Real-time controls such as pause, continue, and erase are
      included. Pronunciations are derived primarily by dictionary
      look-up. The Main Dictionary has nearly 100,000 hand-edited
      pronunciations which can be supplemented or overridden with the
      User and Application dictionaries. A number parser handles numbers
      in any form. A letter-to-sound knowledge base provides
      pronunciations for words not in the Main or customized
      dictionaries. Dictionary search order is under user control.
      Special modes of text input are available for spelling and
      emphasis of words or phrases. The actual conversion of text to
      speech is done by the TextToSpeech Server. The Server runs as an
      independent task in the background, and can handle up to 50 client
      connections.
    * Misc: The TextToSpeech Kit comes in two packages: the Developer
      Kit and the User Kit. The Developer Kit enables developers to
      build and test applications which incorporate text-to-speech. It
      includes the TextToSpeech Server, the TextToSpeech Object, the
      pronunciation editor PrEditor, several example applications,
      phonetic fonts, example source code, and developer documentation.
      The User Kit provides support for applications which incorporate
      text-to-speech. It is a subset of the Developer Kit.
    * Hardware: Uses standard NeXT Computer hardware.
    * Cost:
         + TextToSpeech User Kit: $175 CDN ($145 US)
         + TextToSpeech Developer Kit: $350 CDN ($290 US)
         + Upgrade from User to Developer Kit: $175 CDN ($145 US)
    * Availability: Trillium Sound Research
      1500, 112 - 4th Ave. S.W., Calgary, Alberta, Canada, T2P 0H3
      Tel: (403) 284-9278 Fax: (403) 282-6778
      Order Desk: 1-800-L-ORATOR (US and Canada only)
      Email: [email protected]
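
  The pronunciation look-up described above (several dictionaries
  searched in a user-chosen order, with letter-to-sound rules as the
  fallback) can be sketched as follows. This is an illustration under
  assumptions, not the TextToSpeech Kit API: the function names, the
  dictionary labels and the phoneme strings are hypothetical, and the
  letter-to-sound step is a placeholder that merely spells the word out.

```python
def letter_to_sound(word):
    """Placeholder fallback: spell the word out letter by letter."""
    return " ".join(word.lower())


def pronounce(word, dictionaries, order):
    """Search the dictionaries in the given order, then fall back to rules.

    dictionaries maps a label (e.g. "user") to a word -> pronunciation
    mapping; order is the list of labels to try, first match wins.
    Returns (pronunciation, source).
    """
    key = word.lower()
    for name in order:
        entry = dictionaries.get(name, {}).get(key)
        if entry is not None:
            return entry, name
    return letter_to_sound(key), "letter-to-sound"


DICTIONARIES = {
    "user": {"calgary": "k ae l g er iy"},
    "main": {"calgary": "k ae l g ah r iy", "speech": "s p iy ch"},
}
ORDER = ["user", "application", "main"]
```

  Changing ORDER changes which dictionary overrides which, which is the
  effect of putting the search order under user control.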



Orator Text-to-Speech Synthesizer

    * Platform: SUN SPARC, Decstation 5000. Written in C, and therefore
      portable to other UNIX platforms. Some successful ports: HP,
      RS-6000, PC-Unix [Linux].
    * Description: Sophisticated speech synthesis package. Has text
      preprocessing (for abbreviations, numbers), acronym rules, and
      human-like spelling routines. Natural-sounding synthesis based on
      demisyllable concatenation. Has high accuracy for pronunciation of
      names of people, places and businesses in America; good accuracy
      for English text; rules for stress and intonation marking; various
      methods of user control and customization at most stages of
      processing.
      A new version of the ORATOR system is under development. Both
      ORATOR and this new "ORATOR II" system are capable of general text
      synthesis. The ORATOR II system has a more natural-sounding voice.
    * Hardware: Runs on common SPARC or Decstation workstations, using
      their internal audio output capability. Recommend at least 16M of
      memory.
    * WWW: More detailed information plus examples of ORATOR synthesis
      are available on the ORATOR WWW pages:
      http://www.bellcore.com/ORATOR/
    * Misc 1: A free demo cassette is available.
    * Misc 2: Examples of Orator are also available on the University of
      Birmingham Speech Synthesis "Museum" WWW site (see Q5.4).
    * Availability and Pricing: Contact Bellcore's Licensing Office
      Tel: 1-800-521-CORE (521-2673)
      Fax: 1-908-336-2559
      Email: Anthony Lindsey: [email protected]
      WWW: http://www.bellcore.com/ORATOR/



PAM - A Text-To-Speech Application

    * Platform: Windows
    * Description: PAM is a talking personal assistant and text reader
      application. It uses the ProVoice TTS package. PAM will verbally
      advise about appointments and reminder messages at specified times
      during the day. It can read text files, clipboard text, and text
      sent in DDE messages. Using the full verbal interface, PAM can be
      used by visually challenged individuals. Shareware - thirty day
      free trial.
    * Requirements: Any Windows sound card, speakers or headphones. Min.
      memory - 4 megs, 8 megs recommended.
    * WWW: A more complete description is available on the JTS homepage:
      http://www.islandnet.com/~tslemko/
    * Availability: The shareware can be downloaded by ftp from
      ftp://ftp.islandnet.com/jts/pam_en3c.zip. The file size is approx.
      1 MByte.
    * Price: $US40 for the registered version.
    * Contact: Tom Slemko: e-mail: [email protected], or,
      JTS Micro Consulting Ltd
      10931 Lytton Road, RR#4, Ladysmith, B.C., Canada, V0R 2E0



ProVerbe Speech Engine from ELAN Informatique

    * Platform: Windows 3.x, NT, 95, OS/2, Unix Solaris, Unix SCO and
      hardware
    * Description: The ProVerbe Speech Engine from ELAN Informatique
      produces natural sounding speech from written text. Naturalness is
      achieved by using the TD-PSOLA process from the CNET (France
      Telecom's research lab), which is based on the concatenation of
      elementary speech units (including diphones). Supported languages
      are British English, American English, Russian, German, French and
      Spanish. For multi-channel applications Elan Informatique also
      provides hardware platforms.
      Elan Informatique provides a SDK reference document (sdken.doc:
      WinWord6 format).
    * Demo versions: Telephone demonstration: +33-561 17 67 01
      Sample sound files and demonstration software available.
      A CD-ROM with all these demonstrations is available by
      registration.
    * Contact: Elan Informatique
      4 rue Jean Rodier, 31400 TOULOUSE FRANCE
      Contact person: Pierre Delrat
      Phone: +33-561-36-0777 Fax: +33-61-36-0770
      BBS: +33-561-36-0788
      E-mail: [email protected]
      ftp: ftp://ftp.elan.fr
      WWW: http://www.elan.fr/



ProVoice Developer's Speech Toolkit from First Byte

    * Platform: ProVoice Developer's Toolkits are available for DOS,
      Windows 3.1, Windows 95, Windows NT, OS/2, and Macintosh.
    * Description: ProVoice allows programmers to add synthesized speech
      to their applications. Your program passes text strings to the
      ProVoice speech engine that translates text into audible speech.
      Male and/or female "SpeechFonts" are available for many languages:
      US English, British English, French, German, Italian, and Spanish.

      ProVoice converts text to speech in two phases using a set of
      phonetic translation and pronunciation rules. First, the software
      analyzes and translates text into "sound descriptors", a phonetic
      language with pitch, duration, and amplitude codes which are
      needed to produce stress patterns in phrases and sentences. Rules
      are used to analyze words, numbers, and punctuation. The second
      phase converts the intermediate phonetic language into speech
      signals; algorithms drive distinct speech signals into smooth
      flowing, continuous, clear speech. Real time synchronization of
      mouth movement and word boundaries allows animation of a graphical
      talking character, or highlighting of displayed text as it is
      spoken.
      Necessary tools and examples are provided for programmers to
      manipulate the ProVoice speech technology; including installation
      instructions, extensive sample programs, and complete
      documentation. In addition, sample code is provided on disk to
      illustrate speech programming techniques.
    * Note 1: First Byte will perform custom work for embedded systems.
    * Note 2: ProVoice Windows includes support for the Microsoft SAPI.
      It will speak through any Windows-supported wave audio device.
    * Note 3: Distribution of ProVoice for commercial use is subject to
      execution of a Commercial Product Distribution License Agreement.
    * WWW: For more detailed information and examples go to the First
      Byte WWW page: http://www.firstbyte.davd.com/
    * See also: Monologue for Windows from First Byte
    * Price and Availability: Contact First Byte
    * Contact: First Byte
      19840 Pioneer Ave., Torrance, CA 90503
      Ph: 310-793-0610, Fax: 310-793-0611
      Email: [email protected]
      WWW: http://www.firstbyte.davd.com/



RC Systems V8600/V8601 Text to Speech synthesizers

    * Platform 1: IBM PC: ISA card.
    * Platform 2: Interface to PC/104 standard microcontrollers.
    * Platform 3: Standalone (or embedded) hardware thru RS232 or
      parallel printer port or processor bus.
    * Description: Converts plain ASCII text to speech. Programmable
      voices, pitch, rate, volume, etc. Built-in DTMF and tone
      generators.
    * Price: $151-$299 US (qty 1)
    * Contact: RC Systems
      1609 England Avenue, Everett, WA 98203, USA
      Ph: (206) 355-3800 Fax: (206) 355-1098
      Europe: +44 181 539-0285



rsynth

    * Platform: Various (including Solaris2.3, SunOS4.1.3, HPUX, SGI
      Irix4.x, Linux)
    * Description: Public domain text-to-speech system assembled from a
      variety of sources. It supports CMU and BEEP format dictionaries
      (as described in Q1.10) and now utilises stress marks in the
      dictionary in synthesising intonation.
    * Price: Free
    * Misc: Axel Belinfante has implemented a WWW rsynth demo:
      http://wwwtios.cs.utwente.nl/say.
    * Availability: by anonymous ftp from

               ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/rsynth-2.0.tar.Z

               ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/rsynth-2.0.tar.gz



SENSYN speech synthesizer

    * Platform: PC/DOS/Windows, Macintosh, Sun, and NeXT
    * Rough Cost: $300
    * Description: This formant synthesizer produces speech waveform
      files based on the (Klatt) KLSYN88 synthesizer. It is intended for
      laboratory and research use. Note that this is NOT a
      text-to-speech synthesizer, but creates speech sounds based upon a
      large number of input variables (formant frequencies, bandwidths,
      glottal pulse characteristics, etc.) and would be used as part of
      a TTS system. Includes full source code.
    * Availability: Sensimetrics Corporation
      Sidney Street, Cambridge MA 02139.
      Fax: (617) 225-0470; Tel: (617) 225-2442.
      Email: [email protected]
      WWW: http://www.sens.com/
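
  Formant synthesizers in the Klatt family shape the source signal with
  second-order digital resonators, one per formant, each set by a
  formant frequency and a bandwidth. The sketch below shows such a
  resonator using the coefficient formulas usually quoted from Klatt's
  1980 description. It is an illustration only and not taken from the
  SENSYN or KLSYN88 source; the function names are hypothetical.

```python
import math


def resonator_coefficients(freq_hz, bandwidth_hz, sample_rate_hz):
    """Coefficients for y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / sample_rate_hz
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(
        2.0 * math.pi * freq_hz * t
    )
    a = 1.0 - b - c  # normalises the gain to 1 at 0 Hz
    return a, b, c


def resonate(samples, freq_hz, bandwidth_hz, sample_rate_hz):
    """Filter a list of samples through one formant resonator."""
    a, b, c = resonator_coefficients(freq_hz, bandwidth_hz, sample_rate_hz)
    y1 = y2 = 0.0  # the two previous output samples
    out = []
    for x in samples:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out
```

  Passing a glottal pulse train through several such resonators in
  series, one per formant, gives the basic cascade arrangement.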



SGI Developers Toolbox Synthesiser

    * Platform: SGI
    * Description: The SGI Developer Toolbox 4.0 CDROM contains a
      basic public domain text-to-speech program in the publics/speak
      directory. The directory includes man pages and source.
    * Availability: on the SGI Developer Toolbox 4.0 CDROM



SIMTEL

  A wide range of speech related software, sound-blaster software and
  signal processing software for PCs is available on SimTel and its
  mirror sites. It can be obtained by ftp from:

         ftp://ftp.coast.net/SimTel/msdos/voice/

  and is now on the WWW:

         http://www.acs.oakland.edu/oak/SimTel/win3/sound.html

   Voicemaker

  The archives include the program Voicemaker which synthesises speech
  from phonemes using "concatenation" of phonemes recorded by the user.
  Voicemaker is a freeware program. It requires an IBM or compatible,
  512KB RAM, sound blaster compatible sound card.

         ftp://ftp.coast.net/SimTel/msdos/voice/vm110.zip



Sound Bytes Developer's Kit

    * Platform: Subroutine library for Windows, OS/2 and Macintosh
    * Hardware: Windows - 16 MHz 80386 (minimum) running Windows 3.1; 4
      Mb RAM with at least 1.4 Mb RAM free. Disk space 1.4 Mb.
      OS/2 - 16 MHz 80386 (minimum) running OS/2 2.0 or above; 8 Mb RAM
      with at least 1.4 Mb RAM free.
      Mac - Any Mac with at least 2.5 Mb of RAM running 6.0.4 or higher.
      Telephone compatible. Compatible with commonly used sound cards.
    * Description: SBDK is a software-only sentence-level synthesizer
      that converts unrestricted English text (ASCII) into synthesized
      voice through diphone concatenation. SBDK utilizes parsing to
      incorporate the intonational and rhythmic patterns of normal
      speech. The developer's kit includes two voices, one female and
      one male. The product has a 55,000-word built-in dictionary and a
      tool for creating customized user dictionaries. It converts
      numbers, dates, dollars, phone numbers and times to words, and has
      a SoundOut facility that provides a choice of pronouncing unknown
      words phonetically or spelling them out. Developers can vary voice
      pitch (130-220 Hz) and rate (65-200 wpm), synchronize speech to
      other events, have multiple channels of speech to the same or
      different boards, etc. Speech sampling options: 8-bit linear;
      8-bit companded at 11 kHz (Windows); 8-bit mu-law PCM at 8 or 11
      kHz; 16-bit linear at 11 kHz.
    * Cost: Sound Bytes may be licensed for internal use or resale. Site
      license fee= $3750. Resale or Internal runtime fees= 2% of net
      sales price per runtime sold, OR $150 per telephone port, OR per
      unit pricing for internal use determined case-by-case.
    * Misc: Demo disks are available for Windows and the Mac.
    * Availability: Natural Speech Technologies, Inc.
      Ph: (619) 457-2526.



spchsyn.exe

    * Platform: DOS
    * Availability: By anonymous ftp as a self extracting DOS archive.
      ftp://evans.ee.adfa.oz.au/mirrors/tibbs/applications/spchsyn.exe
    * Requirements: May require special TI product(s), but all source is
      there.



"Speak" - a Text to Speech Program

    * Platform: Sun SPARC
    * Description: Text to speech program based on concatenation of
      pre-recorded speech segments. A function library can be used to
      integrate speech output into other code.
    * Hardware: SPARC audio I/O
    * Availability: by anonymous ftp
      ftp://wilma.cs.brown.edu/pub/speak.tar.Z



Speech Manager and PlainTalk

    * Platform: Macintosh
    * Description: Apple's text-to-speech system extensions that enable
      applications to perform text-to-speech conversion. The Speech
      Manager runs on most Macs, but PlainTalk (and the high quality
      voices) requires a 68020 Mac or better.
    * Availability: By anonymous ftp from:
      ftp://ftp.support.apple.com/pub/apple_sw_updates/US/Macintosh/System/PlainTalk 1.4.1/
      This directory contains subdirectories for recent versions of
      PlainTalk. The current release (PlainTalk 1.4.1) contains the
      English Text-To-Speech with about a dozen voices
      (English_Text-to-Speech.hqx: 5.3 MByte), Mexican Spanish
      (Mexican_Spanish_TTS.hqx: 2.8 MByte), and the English Speech
      Recognition software (English_Speech_Recognition.hqx: 2.3MByte).
    * Cost: Free
    * WWW: The latest information is available from Apple's WWW page for
      speech recognition and synthesis:
      http://www.speech.apple.com/
    * Note 1: Check out Kevin Lenzo's list of Macintosh Speech
      Applications.
    * Note 2: Joshua Baer ([email protected]) runs a mailing list for
      Plaintalk. For subscription and other information visit the
      Plaintalk Discussion List Home page
    * Contact: Apple Computer, Inc.
      1 Infinite Loop, Cupertino, CA 95014, USA
      WWW: http://www.speech.apple.com/
      Email: [email protected]



Text to phoneme program (1)

    * Platform: unknown
    * Description: Text to phoneme program. Based on Naval Research
      Lab's set of text to phoneme rules.
    * Availability: by anonymous ftp
      ftp://shark.cse.fau.edu/pub/src/phon.tar.Z



Text to phoneme program (2)

    * Platform: unknown
    * Description: Text to phoneme program.
    * Availability: by anonymous ftp
      ftp://ftp.doc.ic.ac.uk/packages/unix-c/utils/phoneme.c.gz



Text to phoneme program (3)

    * Description: A public domain version of the same Naval Research
      Lab text to phoneme rules.
    * Availability: By anonymous ftp
      ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/english2phoneme.tar.gz



Tinytalk

    * Platform: DOS / Windows???
    * Description: Shareware speech 'screen reader' package that is
      used by many blind users.
    * Price: Tinytalk is now $150. There are package deals on Tinytalk
      with various speech synthesizers.
    * Availability: Tinytalk is available by anonymous ftp from the
      following site

       Files: ttexe167.zip and ttdoc167.zip (executable and
               documentation)
               ftp://ftp.netcom.com/pub/eb/ebohlman/

      (Note: it is a busy ftp server.)
    * Contact: Eric Bohlman

   OMS Development
   610-B Forest Ave., Wilmette, IL 60091
   Ph: (800)831-0272 Fax: 708-251-5793
   Outside North America: (708)-251-5787
   Email: [email protected]



TrueTalk

    * Platform: Sun Sparcstation 1+/2/LX/5/10/20 with SunOS 4.1.3, or
      SGI Indy/Indigo/Indigo2 with IRIX 5.2. More platforms in
      development.
    * Description: Personal TrueTalk, by Entropic Research Laboratory,
      Inc., is an all-software Text-to-Speech (TTS) system designed to
      voice-enable UNIX X-Windows workstations. It combines a graphical
      interface with a powerful TTS engine based on technology developed
      by AT&T Bell Laboratories. Features include:
         + Intelligible, prosodically natural speech.
         + Text taken from file input, highlighted X selections, the
           interface scratch pad, other programs connected through a
           TCP/IP socket, or Tcl/Tk applications via the Tk "send"
           mechanism.
         + Stop, pause and resume while speech is in progress.
         + Visual indication of corresponding text position when paused.
         + Nine speaking voices, with Male and Female versions of each
           voice.
         + Adjustable speaking rate and volume.
         + Supports drop-in text filters; "email" and "lively" examples
           included.
         + Audio output through workstation headphones or speaker.
         + Complete on-line documentation, including mouse-activated
           help windows.
    * Misc: A more detailed description of TrueTalk is available on the
      Entropic WWW server: http://www.entropic.com/truetalk.com
    * Availability: You can obtain Personal TrueTalk through the
      Internet. For details, see

               ftp://ftp.entropic.com/pub/truetalk/README.ptt

      Personal TrueTalk is available free of charge for evaluation
      purposes. You can fully enable your evaluation copy at any time by
      purchasing a license key from Entropic.
    * Requirements: 12MB disk space, 8MB process size (24MB system RAM
      recommended).
    * Cost: US$495; US$395 academic
    * Contact: Entropic Research Laboratory, Inc.,
      Washington, D.C.
      Voice: 1-800-ENTROPIC (North America), (202) 547 1420
      Fax: (202) 547-6648
      Email: [email protected]
      WWW: http://www.entropic.com/



TruVoice from Centigram

    * Platform: Windows-NT, Windows 95, Windows 3.1 (limited release),
      Sun Solaris 2.x
    * Description: TruVoice, an advanced text-to-speech converter, is
      available for multiple environments. TruVoice converts text into
      spoken language. TruVoice adds intelligible, natural-sounding
      speech to sound enabled platforms.
         + Small, 1.5MB, memory footprint
         + Advanced text pre-processing
         + No vocabulary restrictions
         + User-definable pronunciation dictionary
         + Accurately pronounces surnames and place names
         + Preprocessor provides e-mail and spreadsheet reading
           capabilities and expands abbreviations.
         + Multiple languages available: American English, Latin
           American Spanish, German, French, Italian
         + Flexible pitch, volume and speech rate
         + Intonation support for punctuation
         + Supports navigational capabilities such as pause, resume and
           jump forward / jump back with sentence or word boundaries
      More detailed information is provided in the brochure page on the
      Centigram WWW site.
      A demonstration of TruVoice is available on the Centigram WWW
      pages.
    * Cost:
         + Windows versions are $495 for the SDK
         + Solaris versions are $995
         + Contact Centigram for other pricing.
    * Contact: TruVoice Sales
      Centigram Communications Corporation
      91 East Tasman Drive, San Jose, CA 95134
      Ph: (408) 944-0250 Fax: (408) 428-3732
      Demo: 800-746 1632
      Email: [email protected]
      WWW: http://www.centigram.com/



WinSpeech

    * Platform: Windows
    * Description: WinSpeech is a text-to-speech application that reads
      text and produces speech to the audio output. Features basic text
      editing tools, talk from editing window, DDE server allows other
      Windows applications to send text for talking, coach mode for
      providing audio instructions throughout the program, dictionary
      editing tools for customizing pronunciation.
      WSPLIB text-to-speech DLL is a speech functions library for
      developers. More information available by email.
    * Requirements: System requirements: IBM PC or compatible computer
      with Windows 3.1 or higher. Sound card is recommended but not
      required.
    * Availability: Freeware available through the PC WholeWare WWW
      page.
    * Contact: PC WholeWare
      33 Justin Street, Lexington, MA 02173, U.S.A.
      Email: [email protected]
      WWW: http://www.pcww.com/index.html



WreadFiles: File reader for Commodore Amiga

    * Platform: Commodore Amiga
    * Description: WreadFiles is a vocal text file reader program for
      use on the Commodore Amiga. The text is printed to the screen and
      spoken. Features include:
         + Text is read in sentences rather than lines.
         + Dynamic Speech Correction on over 4000 words or word
           fragments.
         + Pronunciations for many place names, personal names, foreign
           names, foreign expressions and abbreviations.
         + Run from Workbench or CLI.
         + Used with A1000 (OS 1.3), A3000 (OS 2.04-2.1), and A4000 (OS
           3.0)
    * Requirements: Standard Amiga Translator.library and
      Narrator.device required. 2.04 versions recommended. 1 Meg or more
      ram recommended. External speakers required.
    * Availability: No fee requested for non-commercial use. From:
         + GEnie: Page 555,3 File Number 24627
         + Aminet
           ftp://ftp.wustl.edu/pub/aminet/util/misc/WreadFiles47.lha
    * Contact: Written by Michael L. Barlow
      Email: [email protected] or [email protected] or
      [email protected]



ZMD Speech Synthesis

 "Speaky" Speech Synthesis from ZMD

    * Platform: DSP solution for platform independent speech synthesis
      implementation
    * Description: "Speaky" provides German speech synthesis system in a
      DSP solution. It includes pre-processing of input ASCII text with
      unlimited vocabulary, both parametric and non-parametric speech
      synthesis algorithms, and prosody modelling. More detailed
      information and audio samples can be found at the ZMD WWW Site.
    * Contact: Zentrum Mikroelektronik Dresden GmbH
      Grenzstrasse 28, D-01109 Dresden, Germany
      Ph: +49-351-8822-306, Fax: +49-351-8822-337
      Email: [email protected]
      WWW: http://www.zmd-gmbh.de/

 ZMD PCMCIA Speech Synthesis Card

    * Platform: MS-DOS, Windows
    * Description: Complete text-to-speech synthesis system for the
      German language with unlimited vocabulary using VOICE Processor
      "Speaky". The required pre-processing of the input ASCII text is
      performed by a software program that is downloaded automatically
      from the PCMCIA Speech Synthesis Card during the card's
      initialising routine. Headphone or active loudspeaker can be
      connected directly for signal output. More detailed information
      and audio samples can be found at the ZMD WWW Site.
    * Requirements: PC Card slot, Card & Socket Services Software
    * Contact: Zentrum Mikroelektronik Dresden GmbH
      Grenzstrasse 28, D-01109 Dresden, Germany
      Ph: +49-351-8822-306, Fax: +49-351-8822-337
      Email: [email protected]
      WWW: http://www.zmd-gmbh.de/


___________________________________________________________________________

                            Speech Recognition

                        comp.speech FAQ Section 6

         * SpeechLinks: Speech Recognition
         * Q6.1: What is speech recognition?
         * Q6.2: How is speech recognition performed?
         * Q6.3: How can I build a simple speech recogniser?
         * Q6.4: References & books on speech recognition
         * Q6.5: Speech Recognition Hardware/Software
         * Q6.6: Speaker Recognition (Verification and Identification)
         * Q6.7: Integrated Speech Products


___________________________________________________________________________

                  Q6.1: What is speech recognition?

Automatic Speech Recognition

  Automatic speech recognition is the process by which a computer maps
  an acoustic speech signal to text.

  Automatic speech understanding is the process by which a computer maps
  an acoustic speech signal to some form of abstract meaning of the
  speech.

What does speaker dependent / adaptive / independent mean?

  A speaker dependent system is developed to operate for a single
  speaker. These systems are usually easier to develop, cheaper to buy
  and more accurate, but not as flexible as speaker adaptive or speaker
  independent systems.

  A speaker independent system is developed to operate for any speaker
  of a particular type (e.g. American English). These systems are the
  most difficult to develop and the most expensive, and their accuracy
  is lower than that of speaker dependent systems. However, they are
  more flexible.

  A speaker adaptive system is developed to adapt its operation to the
  characteristics of new speakers. Its difficulty lies somewhere
  between speaker independent and speaker dependent systems.

What does small/medium/large/very-large vocabulary mean?

  The size of vocabulary of a speech recognition system affects the
  complexity, processing requirements and the accuracy of the system.
  Some applications only require a few words (e.g. numbers only), others
  require very large dictionaries (e.g. dictation machines). There are
  no established definitions, but the following is a rough guide:

    * small vocabulary - tens of words
    * medium vocabulary - hundreds of words
    * large vocabulary - thousands of words
    * very-large vocabulary - tens of thousands of words.

What does continuous speech or isolated-word mean?

  An isolated-word system operates on one word at a time, requiring a
  pause between words. This is the simplest form of recognition to
  perform because the end points are easier to find and the
  pronunciation of a word tends not to affect others. Because the
  occurrences of words are therefore more consistent, they are easier
  to recognise.

  A continuous speech system operates on speech in which words are
  connected together, i.e. not separated by pauses. Continuous speech is
  more difficult to handle because of a variety of effects. First, it is
  difficult to find the start and end points of words. Another problem
  is "coarticulation". The production of each phoneme is affected by the
  production of surrounding phonemes, and similarly the start and
  end of words are affected by the preceding and following words. The
  recognition of continuous speech is also affected by the rate of
  speech (fast speech tends to be harder).
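
  As an illustration of why end points are easy to find when words are
  separated by pauses, here is a minimal sketch of energy-based
  end-point detection. The function name, the frame length and the
  threshold ratio are assumptions chosen for the example, not part of
  any particular product; practical detectors also have to cope with
  background noise and weak consonants.

```python
def find_endpoints(samples, frame_len=80, threshold_ratio=0.1):
    """Return (start, end) sample indices of the speech region, or None.

    A frame counts as speech when its mean energy exceeds
    threshold_ratio times the energy of the loudest frame.
    Both parameters are illustrative assumptions.
    """
    energies = []
    # Split the signal into non-overlapping frames and measure energy.
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    if not energies or max(energies) == 0:
        return None  # too short, or pure silence
    threshold = threshold_ratio * max(energies)
    active = [k for k, e in enumerate(energies) if e > threshold]
    # First and last active frames bound the utterance.
    return active[0] * frame_len, (active[-1] + 1) * frame_len


# Silence, a burst of "speech", then silence again.
signal = [0] * 400 + [1000, -1000] * 200 + [0] * 400
endpoints = find_endpoints(signal)
```

  In continuous speech there are no such silent gaps between words, so
  a simple energy threshold of this kind finds only the start and end
  of the whole utterance.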


___________________________________________________________________________

              Q6.2: How is speech recognition performed?

  A wide variety of techniques are used to perform speech recognition,
  and there are many types and levels of speech recognition, analysis
  and understanding.

  Typically speech recognition starts with the digital sampling of
  speech. The next stage is acoustic signal processing. Most techniques
  include spectral analysis; e.g. LPC analysis (Linear Predictive
  Coding), MFCC (Mel Frequency Cepstral Coefficients), cochlea modelling
  and many more.
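
  To make one of these analyses concrete, the following is a minimal
  sketch of LPC analysis using the autocorrelation method and the
  Levinson-Durbin recursion. It is written in plain Python for
  readability; the function names are assumptions for this example, and
  a real front end would also window and pre-emphasise each frame.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation of a frame for lags 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(frame, order):
    """Return (coeffs, error) for a linear predictor of the given order.

    coeffs[k-1] is a[k] in the model x[n] ~ sum_k a[k] * x[n-k];
    error is the remaining prediction error energy.
    """
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err == 0:
            break  # signal is perfectly predicted (or silent)
        # Reflection coefficient for this order.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        # Update the lower-order coefficients, then add the new one.
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err


# A decaying exponential is predicted by a single coefficient near 0.9.
coeffs, residual = lpc([0.9 ** n for n in range(200)], 2)
```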

  The next stage is recognition of phonemes, groups of phonemes and
  words. This stage can be achieved by many processes such as DTW
  (Dynamic Time Warping), HMM (hidden Markov modelling), NNs (Neural
  Networks), expert systems and combinations of techniques. HMM-based
  systems are currently the most commonly used and most successful
  approach.
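
  Of the techniques listed, DTW is the simplest to show in a few
  lines. The sketch below computes a DTW distance between two
  sequences of single numbers; this is an assumption made to keep the
  example short, since a recogniser would compare a feature vector per
  frame and often constrain the warping path.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # a advances
                                 cost[i][j - 1],      # b advances
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# The same "word" spoken more slowly still matches exactly.
same = dtw_distance([1, 2, 3], [1, 2, 2, 3])
different = dtw_distance([0, 0], [1, 1])
```

  Because the warping absorbs differences in speaking rate, an unknown
  utterance can be compared directly with stored templates of
  different lengths.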

  Most systems utilise some knowledge of the language to aid the
  recognition process.

  Some systems try to "understand" speech. That is, they try to convert
  the words into a representation of what the speaker intended to mean
  or achieve by what they said.


___________________________________________________________________________

          Q6.3: How can I build a simple speech recogniser?

   QUICKY RECOGNIZER sketch:

  Doug Danforth provides a detailed account in article 253 in the
  comp.speech archives. A summary is provided below. It is also
  available by anonymous ftp

         ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/info/DIY_SpeechRecognition

  This is a simple recognizer that should give you 85%+ recognition
  accuracy. The accuracy is a function of the words you have in your
  vocabulary. Long distinct words are easy. Short similar words are
  hard. You can get 98%+ on the digits with this recognizer.

  Overview:

    * Find the beginning and end of the utterance.
    * Filter the raw signal into frequency bands.
    * Cut the utterance into a fixed number of segments.
    * Average data for each band in each segment.
    * Store this pattern with its name.
    * Collect a training set of about 3 repetitions of each pattern
      (word).
    * Recognize unknown by comparing its pattern against all patterns in
      the training set and returning the name of the pattern closest to
      the unknown.

  Many variations upon the theme can be made to improve the performance.
  Try different filtering of the raw signal and different processing
  methods.
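
  The last five steps of the overview can be sketched as follows. This
  is not Doug Danforth's code. It assumes that end-point detection and
  band filtering have already been done, so each utterance arrives as
  a list of frames, each frame holding one energy value per frequency
  band. The names make_pattern and TemplateRecognizer, the default of 8
  segments and the squared-difference distance are all assumptions for
  the example.

```python
def make_pattern(frames, n_segments=8):
    """Cut an utterance into n_segments and average each band per segment.

    frames: non-empty list of per-frame band energies (equal lengths).
    Returns a flat list of n_segments * n_bands averages.
    """
    n_frames = len(frames)
    n_bands = len(frames[0])
    pattern = []
    for s in range(n_segments):
        lo = s * n_frames // n_segments
        hi = max(lo + 1, (s + 1) * n_frames // n_segments)
        segment = frames[lo:hi]
        for b in range(n_bands):
            pattern.append(sum(f[b] for f in segment) / len(segment))
    return pattern


def distance(p, q):
    """Squared Euclidean distance between two patterns."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


class TemplateRecognizer:
    def __init__(self, n_segments=8):
        self.n_segments = n_segments
        self.templates = []  # list of (name, pattern) pairs

    def train(self, name, frames):
        """Store the pattern of one training utterance with its name."""
        self.templates.append((name, make_pattern(frames, self.n_segments)))

    def recognize(self, frames):
        """Return the name of the stored pattern closest to the unknown."""
        unknown = make_pattern(frames, self.n_segments)
        name, _ = min(self.templates, key=lambda t: distance(unknown, t[1]))
        return name


# Two made-up "words" described by two frequency bands.
rec = TemplateRecognizer(n_segments=4)
rec.train("yes", [[1, 0]] * 10 + [[0, 1]] * 10)
rec.train("no", [[0, 1]] * 10 + [[1, 0]] * 10)
# A slower, slightly different rendition of the first word.
guess = rec.recognize([[0.9, 0.1]] * 15 + [[0.1, 0.9]] * 15)
```

  Cutting every utterance into the same number of segments is what
  makes patterns of different durations directly comparable.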

   Public Domain Recognition Software

  Q6.5 contains information on public domain speech recognition software
  including: Lotec and Myers' Hidden Markov Model software.

   Discrete Hidden Markov Model Demonstration Software

  Hidden Markov Models (HMMs) are widely used in speech recognition
  systems. Joe Picone has put together some demonstration software for
  basic discrete HMMs including Viterbi and Baum-Welch training and
  evaluation, random sequence generation (generating data from a model),
  and model updating (useful for incremental training). There is a
  simple demo program that supports all of these modes from command line
  arguments. This allows experiments to test the classic coin-toss
  examples commonly described in textbooks. The code closely parallels
  the following textbook:

    * J.R. Deller, Jr., J.G. Proakis, and J.H.L. Hansen, Discrete-Time
      Processing of Speech Signals, MacMillan, 1993, ISBN:
      0-02-328301-7.

  The code is written in C++ and is intended to facilitate learning and
  understanding of the algorithms. The code is available on the ISIP web
  site:
  http://www.isip.msstate.edu/software/

  Lecture notes corresponding to the examples are also available:
  http://www.isip.msstate.edu/publications/1996/speech_recognition_short_course
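
  For readers who want to see the idea before downloading the C++
  code, here is a minimal Python sketch of Viterbi decoding for a
  discrete HMM, applied to a two-coin version of the coin-toss example.
  It is independent of the ISIP software, and the state names and all
  probabilities are made-up values for illustration.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_state_path, probability) for a discrete HMM.

    Uses plain probabilities, so very long sequences would underflow;
    practical implementations work with log probabilities instead.
    """
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Most probable predecessor of state s at this step.
            prev, p = max(((r, best[-1][r] * trans_p[r][s]) for r in states),
                          key=lambda pair: pair[1])
            scores[s] = p * emit_p[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the most probable final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical model: a fair coin and a coin biased towards heads.
states = ["fair", "biased"]
start_p = {"fair": 0.5, "biased": 0.5}
trans_p = {"fair": {"fair": 0.9, "biased": 0.1},
           "biased": {"fair": 0.1, "biased": 0.9}}
emit_p = {"fair": {"H": 0.5, "T": 0.5},
          "biased": {"H": 0.9, "T": 0.1}}

path, prob = viterbi("HHHHHH", states, start_p, trans_p, emit_p)
```

  With these values a run of heads is best explained by the biased
  coin throughout, and a run of tails by the fair coin.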


___________________________________________________________________________

            Q6.4: References & books on speech recognition

    * Product Reviews and Comparisons
    * Using Speech Recognition: Health Issues
    * On the WWW
    * Technology: General and Introductory
    * Technical
    * Course Notes
    * Bibliographies and Reference Lists

 Product Reviews and Comparisons

    * "Talk Show", Wayne Rash Jr., PC Magazine (USA), Dec 20, 1994.
    * "Seybold Report on Desktop Publishing" published a nine-page,
      head-to-head comparison of Dragon's DOS software with IBM's OS/2
      software. March 7, 1994; Volume 8, Number 7; Pages 3-11;
      ISSN:0889-9762; Seybold Publications, P.O. Box 644, Media, PA
      19063 USA, phone (610) 565-2480.
    * McGraw-Hill Inc.'s "BYTE, the Magazine of Technology Integration,"
      published a two-page review of IBM's Personal Dictation System
      software. May 1994; Volume ?, Number ?; Pages 145-146;
      ISSN:0360-5280; Editorial, Executive, and Circulation address: One
      Phoenix Mill Lane, Peterborough, NH 03458 USA, phone ?

 Using Speech Recognition: Health Issues

    * The National Center for Voice and Speech provides some basic
      information on preserving "Vocal Health" on their WWW site:
      http://www.shc.uiowa.edu/hygiene/home.html
    * Voice Users Mailing List: details in Q1.4 of the FAQ.
    * Typing Injury FAQ: http://www.cs.princeton.edu:80/~dwallach/tifaq/
      has a range of information on Typing Injuries, avoiding them,
      alternatives and more.
    * Typing Injuries Page:
      http://alumni.caltech.edu/~dank/typing-archive.html has links to
      dozens of useful resources.
    * Voice Problems -- Prevention and Correction: advice on preserving
      your voice with specific hints for using speech recognition.
      ftp://ftp.csua.berkeley.edu/pub/typing-injury/voice-problems
    * "Talking to a PC May Be Hazard To Your Throat", by Julie Chao in
      the Wall Street Journal.
    * "Talking to Computers Has its Hazards", by Gordon Arnaut in The
      Globe and Mail.

 On the WWW

    * Survey of the State of the Art in Human Language Technology:
      Report edited by Ronald A. Cole et al. with a section on Spoken
      Input Technologies.
      http://www.cse.ogi.edu/CSLU/HLTsurvey/ch1node2.html

 Technology: General and Introductory

  Some general introduction books on speech recognition technology:

    * Fundamentals of Speech Recognition; Lawrence Rabiner & Biing-Hwang
      Juang; Englewood Cliffs NJ: PTR Prentice Hall (Signal Processing
      Series), c1993, ISBN 0-13-015157-2
    * Speech recognition by machine; W.A. Ainsworth; London: Peregrinus
      for the Institution of Electrical Engineers, c1988
    * Speech synthesis and recognition; J.N. Holmes; Wokingham: Van
      Nostrand Reinhold, c1988
    * Speech Communication: Human and Machine, Douglas O'Shaughnessy;
      Addison Wesley series in Electrical Engineering: Digital Signal
      Processing, 1987.
    * Electronic speech recognition: techniques, technology and
      applications, edited by Geoff Bristow, London: Collins, 1986
    * Readings in Speech Recognition; edited by Alex Waibel & Kai-Fu
      Lee. San Mateo: Morgan Kaufmann, c1990

 Technical

    * Hidden Markov models for speech recognition; X.D. Huang, Y. Ariki,
      M.A. Jack. Edinburgh: Edinburgh University Press, c1990
    * Speech Recognition: The Complete Practical Reference Guide; T.
      Schalk, P. J. Foster: Telecom Library Inc, New York; ISBN
      0-9366648-39-2; 377 pages; paperback only. Covers speech
      recognition in a telephony environment, for readers who wish to
      use PC-based call processing hardware. Dialogic hardware is used
      as the example throughout.
    * Automatic speech recognition: the development of the SPHINX
      system; by Kai-Fu Lee; Boston; London: Kluwer Academic, c1989
    * An Introduction to the Application of the Theory of Probabilistic
      Functions of a Markov Process to Automatic Speech Recognition, S.
      E. Levinson, L. R. Rabiner and M. M. Sondhi; in Bell Syst. Tech.
      Jnl. v62(4), pp1035--1074, April 1983
    * Review of Neural Networks for Speech Recognition, R. P. Lippmann;
      in Neural Computation, v1(1), pp 1-38, 1989.
    * Automatic Speech and Speaker Recognition: Advanced Topics, C.H.
      Lee, F.K. Soong and K.K. Paliwal (Eds.), Kluwer, Boston, 1996.

 Course Notes

    * Joseph Picone of the Institute for Signal and Information
      Processing (ISIP) at Mississippi State University has put the
      course notes for "Fundamentals of Speech Recognition" on the WWW.
      The course covers background probability and phonetics/acoustics,
      speech signal analysis, dynamic programming, dynamic time warping,
      hidden Markov modelling, language modelling, neural networks, etc.
      The WWW site provides the syllabus and lecture notes.
      WWW: http://www.isip.msstate.edu/publications/1996/ee_8993/

 Bibliographies and Reference Lists

    * WWW searchable online bibliography for Phonetics and Speech
      Technology with more than 8000 entries. Provided by Institut fur
      Phonetik at Johann Wolfgang Goethe-Universitat Frankfurt.
      http://www.uni-frankfurt.de/~ifb/bib_engl.html
    * Computational Speech Processing: Speech Analysis, Recognition,
      Understanding, Compression, Transmission, Coding, Synthesis ; Text
      to Speech Systems, Speech to Tactile Displays, Speaker
      Identification, Prosody Processing : BIBLIOGRAPHY, by Conrad F.
      Sabourin, 1994, 2 volumes, 1187p, ISBN 2-921173-21-2, INFOLINGUA
      inc., P.O. Box 187 Snowdon, Montreal, H3X 3T4, Canada.
      See also: http://gomer.mlink.net/infolingua.html


___________________________________________________________________________

            Q6.5: Speech Recognition Hardware and Software

  The number of speech recognition packages and the information about
  the software are changing rapidly. Any help with keeping this
  information up to date will be appreciated.

    * Products in the FAQ
    * Speech Recognition Processors (ICs)
    * Recognition Information on the WWW
    * Speech Recognition Resellers and Value-Added Services

 In the FAQ:

  The following speech recognition software/hardware is described in the
  comp.speech FAQ.

  _Apple Macintosh_
         * Digital Dreams Speech Recognition Plug-Ins
         * Dragon Dictation Products
         * Macintosh Speech Recognition Manager
         * PowerSecretary

  _Windows (including 95, NT, 3.1)_
         * AT&T Watson Speech Recognition
         * Cambridge Voice for Windows
         * CustomVoice and CustomTelephone: A&G Graphics Interface Inc.
         * DragonDictate for Windows
         * Dragon Dictation Products
         * Dragon Developer Tools
         * Ficomp Interpreter 6000
         * IBM VoiceType Dictation and Control
         * IN CUBE
         * Kurzweil Speech Recognition (2 products)
         * Lernout & Hauspie ASR SDK
         * Listen for Windows 2.0 from Verbex Voice Systems
         * Microsoft Speech Recognition
         * NCC Dictate
         * Phonetic Engine 500 (PE500) from Speech Systems, Inc.
         * Philips Speech Recognition (2 products)
         * ProNotes Voice Tools
         * PureSpeech
         * smARTspeak from Advanced Recognition Technologies, Inc.
         * Visual Voice from Stylus Innovation
         * VoiceAssist for Windows from Creative Labs, Inc.
         * VoiceServer for Windows
         * Whisper
         * WildCard Speech Products

  _DOS_
         * DATAVOX - French
         * Dragon Developer Tools
         * Ficomp Interpreter 6000
         * Jialong He's Speech Recognition Research Tool
         * smARTspeak from Advanced Recognition Technologies, Inc.
         * Votan VPC2100 Voice Card and VSP 1010 Speech Processor

  _OS/2_
         * IBM VoiceType Dictation and Control

  _Unix_
         * AbbotDemo
         * BBN Hark Telephony Recognizer
         * EARS: Single Word Recognition Package
         * Ficomp Interpreter 6000
         * Hidden Markov Model Toolkit (HTK) from Entropic
         * IN CUBE
         * Jialong He's Speech Recognition Research Tool
         * Lotec Speech Recognition Package
         * Myers' Hidden Markov Model software
         * NICO Artificial Neural Network Toolkit
         * Nuance Speech Recognition System
         * PureSpeech
         * recnet

  _Integrated Circuits and Dedicated Hardware_
         * HM2007 - Speech Recognition Chip
         * OKI VRP6679 - Speech Recognition Chip
         * Sensory Inc. Integrated Circuits
         * Speech Commander - Verbex Voice Systems
         * Voice Control Systems Recognition
         * VCS 2030 & 2060 Voice Dialer

  _Other Platforms_
         * Simon Says (NeXT)
         * Voice Command Line Interface (Amiga)
         * Visus SpeechKit

  _Unknown_
         * Berkeley Restaurant Project (BeRP)
         * Lernout & Hauspie ASR (3 products)
         * Voice-Trek 2.0
         * Voicetek Corp.
         * Voice Processing Corporation Speech Recognition Product Line

 Speech Recognition Processors (ICs)

  Jean-Pierre Lereboullet has put together a detailed list of Voice
  Recognition Processors which covers about 15 ICs and pieces of related
  hardware (including D6106, HM2007, MSM6679, RSC-164, TC8860F/64F/65F,
  5A128).
  The document is available on the comp.speech ftp server:
  ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/info/VoiceRecognitionProcessors

 Recognition Information on the WWW

  In addition to the entries on speech recognition in this FAQ, the
  following WWW sites provide information on speech recognition:

   Commercial Speech Recognition: Russ Wilcox of PureSpeech Inc.

         http://www.tiac.net/users/rwilcox/speech.html

   Macintosh Speech Resources and Apps
         http://www.cs.cmu.edu/~lenzo/mac_speech_apps.html

   Speech Recognition Information: 21st Century Eloquence
         http://www.voicerecognition.com/

   Applied Speech Technology Laboratory of CSLI at Stanford
         http://csli-www.stanford.edu/users/bscott/SRTech.html

   Speech Toys Speech Recognition Page
         http://www.speechtoys.com/spchtoys/sprec.html

   Speech recognition product lists: postings to comp.speech
         ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/info/SpeechRecognitionProducts

   Search Alta Vista for Speech Recognition

   Search Lycos for Speech Recognition

   Yahoo pages on Speech Recognition
         http://www.yahoo.com/business/corporations/computers/software/voice_recognition/
         http://www.yahoo.com/Science/Computer_Science/Artificial_Intelligence/Natural_Language_Processing/Speech_Recognition/

 Speech Recognition Resellers and Value-Added Services

   1stVoice
         2470 El Camino Real, Suite 110, Palo Alto CA 94306-1701
         Ph: 415-857-1320, Fax: 415-856-6996
         WWW: http://www.1stvoice.com/
         Email: [email protected]
         Dragon Dictation Products

   21st Century Eloquence
         325-A Royal Poinciana Plaza, Palm Beach, Florida 33480, USA
         Ph: 800-245-2133, Fax: 407-835-4901
         WWW: http://www.voicerecognition.com/
         Kurzweil, IBM VoiceType, Dragon, Kolvox

   Auscript (Australia)
         Suite 2, Level 3, 60-70 Elizabeth St, Sydney, NSW 2000,
         Australia
         Ph: +61-2-238 6565, Fax: +61-2-238 6566
         WWW: http://www.auscript.com.au/
         Dragon Systems

   BRITE
         WWW: http://www.brite.com/
         Computer Telephony Integration & Interactive Voice Response

   DAX Systems, Inc.
         30 Chapin Road, Unit 1201, P.O. Box 778, Pine Brook, NJ/USA
         07058
         Ph: +1-201-227-8111, Fax: +1-201-227-8197
         Email: [email protected]
         WWW: http://www.daxsystems.com/
         Computer Telephony and Integrated Voice Response

   HealthCare Resources
         1444 Aviation Blvd, #103, Redondo Beach, CA 90278, USA
         Ph: +1-310-937-5156, Fax: +1-310-937-5159
         EMail: [email protected]
         Power Secretary & Dragon Dictate. Specializing in:
         Medical/Dental, Motion Picture Industry, Carpal Tunnel related
         and Disabled Persons.

   O'Brien Resources
         Ph: (540) 347-4988 (Address unknown)
         Email: [email protected]
         WWW: http://www.crosslink.net/~obrien/
         Kurzweil Voice Recognition Products

   SCI VoiceAutomated
         215 1/2 Main Street, Huntington Beach, CA 92648, USA
         Ph: 800-597-6600, Ph: +1-714-969-7632, Fax: +1-714-969-0122
         http://www.voiceautomated.com/
         IBM VoiceType, Kurzweil Voice, DragonDictate and Philips
         speech.

   Synapse
         3095 Kerner Blvd., Suite S, San Rafael, CA 94901, USA
         Ph: (415) 455-9700, Fax: (415) 455-9801
         Email: [email protected]
         WWW: http://www.synapseadaptive.com/
         Dragon Systems, Kurzweil and IBM products.

   Talk Technology
         Ph: 1-800-270-1672, Fax: 1-516-360-1213
         Email: [email protected]
         http://www.talktechnology.com/

   Talk Technology, Inc.
         Tel: +1-718-745-9199, Fax: +1-718-499-6480
         Email: [email protected]
         WWW: http://www.usbusiness.com/talk/
         Dragon Dictate and portable (notebook) solutions

   ToppCopy Telecom
         Email: [email protected]
         WWW: http://www.toppcopy.com/
         Philips Digital Dictation

   VoiceWare Systems
         230 California Street, Suite 410, San Francisco, CA 94111
         Ph: (415) 433-2001, Fax: (415) 433-6909
         Email: [email protected]
         WWW: http://www.talk2type.com/home.htm
         IBM, Dragon Systems, Kurzweil Applied Intelligence, WildCard
         Technologies

   WorkLink
         A.D.A. Solutions by WorkLink
         2566-A Telegraph Avenue, Berkeley, California 94704 USA
         Ph: 510-848-8363, Fax:510-848-7322
         WWW: http://www.worklink.net/
         Email: [email protected]
         Dragon Dictation Products



AbbotDemo

    * Platform: SunOS4, IRIX, Linux, HP-UX
    * Description: Large vocabulary, speaker independent, continuous
      automatic speech recognition system. Uses recurrent neural
      networks and hidden Markov models with a 5,000 word vocabulary
      (upgradable) and a trigram word grammar. Includes a front end for
      waveform capture and display (including spectrogram) and a
      graphical display of the phoneme representation as well as a
      rewriting display of the best guess word sequence.
    * Requirements: UN*X, X, 8 Mbyte free RAM, 486DX or faster
      processor, 16 bit soundcard, reasonable quality microphone and a
      copy of the Wall Street Journal newspaper.
    * Price: Free for non-commercial use
    * Availability: By anonymous ftp from

       ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/AbbotDemo

    * Note 1: This is not a complete system for dictation.
    * Note 2: At present there are no sources with this distribution.
      For sources for an earlier version see the recnet entry.
    * Note 3: Not supported.
    * Contact: [email protected]
      Tony Robinson
      Cambridge University Engineering Department
      Trumpington Street, Cambridge, CB2 1PZ, UK
      Tel: +44-1223-332815 Fax: +44-1223-332662
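
  The AbbotDemo entry above mentions a trigram word grammar. As a rough
  illustration of the idea only (this is not AbbotDemo's code, and the
  add-one smoothing, the function names and the toy corpus are all
  assumptions made for this sketch), a trigram model estimates the
  probability of a word from the two words that precede it:

```python
from collections import defaultdict


def train_trigram_counts(sentences):
    """Count trigrams and their two-word histories in tokenised sentences."""
    trigram_counts = defaultdict(int)
    history_counts = defaultdict(int)
    for words in sentences:
        # Pad so that the first real word also has a two-word history.
        padded = ["<s>", "<s>"] + list(words) + ["</s>"]
        for i in range(2, len(padded)):
            history = (padded[i - 2], padded[i - 1])
            trigram_counts[history + (padded[i],)] += 1
            history_counts[history] += 1
    return trigram_counts, history_counts


def trigram_probability(w1, w2, w3, trigram_counts, history_counts, vocab_size):
    """Estimate P(w3 | w1, w2) with add-one smoothing (assumed for this sketch)."""
    numerator = trigram_counts.get((w1, w2, w3), 0) + 1
    denominator = history_counts.get((w1, w2), 0) + vocab_size
    return numerator / denominator


# Hypothetical toy corpus, standing in for real training text.
corpus = [
    ["stocks", "rose", "sharply"],
    ["stocks", "rose", "again"],
    ["stocks", "fell", "sharply"],
]
tri, hist = train_trigram_counts(corpus)
vocab = {w for sentence in corpus for w in sentence} | {"</s>"}

p_seen = trigram_probability("stocks", "rose", "sharply", tri, hist, len(vocab))
p_unseen = trigram_probability("stocks", "rose", "fell", tri, hist, len(vocab))
```

  A recognizer would combine such language model scores with acoustic
  scores when ranking candidate word sequences; word sequences seen in
  training (p_seen) score higher than unseen ones (p_unseen).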



AT&T Watson Speech Recognition

    * Platform: Windows 95/NT on a Pentium 75 MHz or higher
    * Description: Watson is a software implementation of AT&T Bell
      Laboratories voice processing technology. Watson includes BLASR
      Speech Recognition and FlexTalk speech synthesis (see Q5.5). It
      requires no special hardware to run other than a standard sound
      card and/or phone card. Technical details for BLASR Speech
      Recognition include:
         + Compliant with Microsoft Speech API and Telephone API
         + Speaker independent, continuous speech recognition
         + Fast, run-time vocabulary change
         + Open mic and telephone line environments
         + SoundBlaster compatible sound card and drivers required
         + Subword models and whole-word digit models
         + Background, silence, and filler/garbage models
         + 50 word name vocabulary or 100 word phrase real-time
           recognition with 95% accuracy
         + Rejection of out-of-vocabulary words
         + American English only - other languages in development
         + Barge-in speech begin/end notification - requires hardware
           echo cancellation
      The AT&T Advanced Speech Products Group home page provides more
      detailed information including a Frequently Asked Questions list,
      information for application developers on the Independent Software
      Vendor (ISV) Program (including info on the SDK, licensing, and
      the training program).
    * Requirements: Uses 2 MB RAM, 10 MB Disk. Requires a Pentium 75 MHz
      or higher CPU (uses
    * Cost and Availability: WATSON is a software-based speech platform
      with a Software Developers Kit (SDK) that allows application
      developers to use voice processing in their applications. It is
      not available as a stand-alone product.
      Licensing information (inc. price) is provided in the AT&T
      Advanced Speech Products Group home page
    * See also: Watson FlexTalk speech synthesis in Q5.5, Microsoft
      Speech API, and Advanced Speech API.
    * Contact: AT&T Advanced Speech Products Group
      Suite 700, 44 East Mifflin Street, Madison, WI 53703, USA
      Ph: 1-800-5-WATSON, Fax: 1-608-259-2269
      Email: [email protected]
      WWW: http://www.att.com/aspg/



BBN Hark Telephony Recognizer

    * Platform: Available for Unix-based workstation and PC platforms
      including IBM RS6000/AIX and Pentium/SCO Unix.
    * Description: Large vocabulary (2,000+ words), speaker independent,
      continuous ASR software. Specifically designed for large scale
      telephony applications. Using a client/server architecture, all
      features and capabilities are integrated in one software product
      instead of on separate boards. Very memory efficient, the Hark
      Telephony Recognizer runs in as little as 2MB of physical memory.
      Multiple recognizers can be run on a single platform. Uses Hidden
      Markov Model and phoneme-based BBN recognition algorithms. An API
      is provided for integration with existing applications. A
      developer's toolkit is available.
    * Price and availability: Price varies depending on vocabulary size.
      Version 3.0 available immediately.
    * Misc: BBN Hark provides application design and human factors
      consulting services. Regular monthly training classes on
      developing speech-enabled applications are held at BBN Hark's
      Cambridge (Mass) headquarters.
    * WWW: For additional information see BBN Hark's home page.
    * Contact: BBN Hark Systems
      70 Fawcett Street, Cambridge, MA 02138, USA
      Tel: 617-873-4636 Fax: 617-873-2473
      WWW: http://www.bbn.com/bbn_hark/HarkHome.html



Berkeley Restaurant Project (BeRP)

    * Description: BeRP is a test bed for a speech recognition system
      being developed by the International Computer Science Institute in
      Berkeley, CA. BeRP is a medium-vocabulary, speaker-independent
      spontaneous continuous speech understanding system. BeRP functions
      as a knowledge consultant whose domain is the restaurants in the
      city of Berkeley. The system serves as a testbed for several
      research projects, including robust feature extraction,
      connectionist phonetic likelihood estimation, automatic induction
      of multiple pronunciation lexicons, foreign accent detection and
      modeling, advanced language models, and lip-reading.
    * Note: As far as I know the BeRP software is in-house software -
      that is, it is not made available for distribution.
    * More information: http://www.icsi.berkeley.edu/real/berp.html



Cambridge Voice for Windows

    * Platform: Windows
    * Description: Speaker-independent recognition of continuous speech
      in real time. Vocabularies can range from small to very large
      (more than 60,000 word forms). Support is planned for languages
      including English, Danish, Dutch, French, German, Italian,
      Norwegian, Spanish, Swedish, and Japanese. The engine complies
      with the Microsoft Speech API.
    * Contact: Cambridge Group Research, Ltd.
      Box 7290, Buffalo Grove, IL 60089
      Ph: (708) 821-1040, Fax: (708) 821-1041
      E-mail: [email protected]



CustomVoice and CustomTelephone: A&G Graphics Interface Inc.

    * Platform: Windows
    * CustomVoice: Speech recognition custom control for Visual Basic,
      Visual C++, Borland C++, and other development platforms that
      support *.VBX. Provides a development platform for speech
      recognition that is independent of any one proprietary engine.
      Currently supports ICSS, but should soon support other platforms.
      Includes a grammar debugger and parser APIs to parse recognized
      speech into useful data types.
      Requirements: 486/DX or better PC, Windows 3.1 or Windows for
      Workgroups, 8Mb RAM (minimum), SoundBlaster 16, microphone, and
      mouse. Supports Visual Basic, Visual C++, Borland C++, and Delphi.
    * CustomTelephone: Windows-based developers tool that allows
      programmers to build speech enabled "telephony" applications via
      standard custom control properties (VBX). It supports IBM
      VoiceType Application Factory (VTAF), a continuous speech, speaker
      independent speech recognizer, and supports voice response boards
      such as Dialogic. Comes with a VB custom control, pre-built
      grammar sets for common data types, an interactive grammar
      debugger to identify valid speech patterns, and parser API
      functions that convert recognized speech into data types supported
      by VB, C++ and Delphi. Includes sample applications with source
      code, and VBX, VCL and DLLs. Bundled with speech recognition
      engines.
      Requirements: 486/DX or better, Windows 3.1 or Windows for
      Workgroups, 8Mb RAM (minimum), SoundBlaster or compatible sound
      card, Dialogic D2X or D4X board, and mouse. Microphone and speaker
      optional. Supports Visual Basic, Visual C++, Borland C++, and
      Delphi.
    * Contact: A&G Graphics Interface
      51 Gore Street, Cambridge, MA 02141-1213, USA
      Ph: +1-617-492-0120, Fax: +1-617-427-2133
      Email: [email protected]
      CompuServe: 74774,273 CompuServe ( GO SPEECH )
      WWW: http://www.customvoice.com/



DATAVOX - French

    * Platform: PC / DOS
    * Description: Continuous speech - speaker independent or dependent.
    * Requirements: 2 PC format boards (RdF1000 and TdS 96/25) and an
      A/D - D/A module (ASA116)
    * Misc: Application software can communicate with DATAVOX through
      two types of interface:
         + Keyboard overlay: The application software may be used with
           any PC compatible package. No specific adaptation is
           necessary, you only need to define your configuration with
           the application software.
         + C library: Allows a user-written program to drive the
           recognition system.
      DATAVOX is based on the AMADEUS speech recognition software
      developed at LIMSI. It provides
         + Continuous speech recognition with 500 words speaker
           dependent, 50 words speaker independent (custom-made
           vocabulary).
         + Grammar of the application language (syntax acquisition,
           verification and simplification software).
         + Large vocabulary : DATAVOX can recognize vocabularies of
           several thousand words as long as there are no more than 500
           words in the active vocabulary at any given node. It takes
           less than 1 second to change syntax and vocabulary.
         + Training controlled by the system (use of co-articulation
           models).
         + Response time less than 500 ms for any phrase length.
         + Synthesis (ADPCM) can be heard simultaneously while
           recognition is being carried out.
    * Contact: VECSYS
      Le Chene rond, 91570 Bievres, France
      Voice: 33 1 69 41 15 04, Fax: 33 1 69 41 24 30
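
  The DATAVOX entry above describes a total vocabulary of several
  thousand words with at most 500 words active at any grammar node. The
  sketch below shows one way to picture that arrangement. It is not
  based on the DATAVOX C library; the grammar, node names and helper
  functions are all hypothetical.

```python
MAX_ACTIVE_WORDS = 500  # per-node limit quoted in the entry above

# Hypothetical toy grammar: node name -> {word: name of the next node}.
GRAMMAR = {
    "start": {"call": "name", "dial": "digit"},
    "name": {"alice": "end", "bob": "end"},
    "digit": {"zero": "digit", "one": "digit", "stop": "end"},
    "end": {},
}


def active_vocabulary(grammar, node):
    """Words the recognizer has to consider while it is at this node."""
    return sorted(grammar[node])


def check_node_limits(grammar, limit=MAX_ACTIVE_WORDS):
    """Return the nodes whose active vocabulary exceeds the limit."""
    return [node for node, arcs in grammar.items() if len(arcs) > limit]


def total_vocabulary(grammar):
    """All words used anywhere in the grammar."""
    return set().union(*(arcs.keys() for arcs in grammar.values()))


def accepts(grammar, words, start="start", end="end"):
    """Check whether a word sequence follows a path from start to end."""
    node = start
    for word in words:
        if word not in grammar[node]:
            return False
        node = grammar[node][word]
    return node == end
```

  The total vocabulary can grow with the number of nodes while the
  recognizer only ever compares the input against the words active at
  the current node.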



Digital Dreams Speech Recognition Plug-Ins

    * Platform: Apple Macintosh
    * Description (General): A suite of speech plug-ins for the
      interactive multimedia market which enable developers to quickly
      incorporate speech recognition into their titles without having to
      resort to a low-level programming language, such as C. Speech
      plug-ins bridge the gap between a speech recognition API, such as
      Apple's PlainTalk Speech Recognition technology, and
      authoring/development environments, such as Macromedia Director or
      HyperCard. Digital Dreams currently offers Macintosh speech
      plug-ins for Macromedia Director and HyperCard. Support for other
      environments, including AppleScript, Apple Media Tool, Authorware,
      and Windows is being developed. Currently available for North
      American Adult English. More information is available on the
      Digital Dreams WWW site.
    * ShockTalk: is a combination of Netscape, ShockWave and Speech
      Recognition technologies for the Power Macintosh and Quadra AVs
      that enables you to navigate web sites and hyperlinks using spoken
      commands as well as create shockwave movies that respond to spoken
      user interactions.
    * Requirements: Power Macintosh (PowerPC w/ MacOS)
      Microphone (PlainTalk compatible)
      PlainTalk Speech Synthesis and PlainTalk Speech Recognition
      Netscape Navigator
    * Contact: Digital Dreams
      4308 Harbord Drive, Oakland, CA, 94618, USA
      Tel: (510) 547-6929 Fax: (510) 547-6799
      email: [email protected]
      WWW: http://www.surftalk.com/
      FTP: ftp://ftp.surftalk.com/



DragonDictate for Windows

    * Platform: Windows
    * Description: Information moved to the page on Dragon Dictation
      products including DragonDictate for Windows



Dragon Dictation Products

    * Dragon NaturallySpeaking
    * DragonDictate for Windows
    * Dragon PowerSecretary
    * General Information

 Dragon NaturallySpeaking

    * Platform: Windows
    * Description: General purpose, continuous speech dictation system.
      Personal Edition has a 30,000 word active vocabulary and comes
      with a 200,000+ word pronunciation dictionary; users can also add
      their own words or phrases.
      More information on Dragon's NaturallySpeaking web site.
    * Requirements: 133 MHz Pentium, 32 MB RAM (Windows 95) or 48 MB RAM
      (Windows NT 4.0), supported sound card.
    * Price: see Dragon's NaturallySpeaking web site.
    * Related products: see general information below
    * Contact: see general information below

 DragonDictate for Windows

    * Platform: Windows
    * Description: Speech-to-text dictation system. Discrete dictation;
      continuous command/control; speaker-adaptive. Also provides mouse
      movement for hands-free operation of Windows. Comes with a 120,000
      word pronunciation dictionary; users can also add their own words
      or phrases. Dictate directly into any application. Available in US
      and UK English, French, Italian, German, Spanish, and Swedish.
      Add-on vocabularies for medicine, law, business and finance,
      computers and technology, journalism.
      Available as DragonDictate Singles Editions (10,000 words active),
      DragonDictate Personal Edition (10,000 words active),
      DragonDictate Classic Edition (30,000 words active), DragonDictate
      Power Edition (60,000 words active).
      Includes Office97 support.
      More information on the Dragon Systems web site.
    * Requirements: 486/66, 7-10 MB dedicated RAM (depending on
      edition), Windows 3.1x, NT 3.51, or 95.
      Supported sound boards: Creative Labs Sound Blaster 16, Microsoft
      Windows Sound System, IBM M-Audio Capture/Playback Adapter, many
      notebooks with built-in audio.
      See Dragon Systems Compatibility list for details.
    * Price: Check at the Dragon Systems web site.
    * Related products: see general information below
    * Contact: see general information below

 Dragon PowerSecretary

    * Platform: Apple Macintosh
    * Description: Speaker dependent/adaptive system requiring words to
      be separated by short pauses. Available as PowerSecretary Power
      Edition, Personal Edition, PowerSecretary MED for Healthcare
      Professionals.
      Vocabulary: 30,000 - 60,000 at any one time, automatically
      selected from 120,000-word dictionary.
    * Requirements: Power Macintosh 6100, 7100, 8100, Performa 6100
      series, Powerbook 540, 68040 class Macintosh such as Quadra 660AV,
      700, 800, 840AV, 900, 950, Centris 650 and 660AV.
      Hard Disk with at least 25Mb free.
      System 7.5 or greater
      (Some systems require add-on hardware)
    * More information: PowerSecretary home page
    * Related products: see general information below
    * Contact: see general information below

 General Information

   Dragon Developer Products

    * Dragon PhoneQuery
    * DragonXTools
    * Dragon SpeechTool
    * Dragon VoiceTools

   Related Web Sites

    * Simon Crosby's FAQ for DragonDictate

   Contact:

    * Dragon Systems, Inc.
      320 Nevada Street, Newton, MA 02160, USA
      Tel: 1-617-965-5200 or 1-800-TALK-TYP
      Fax: 1-617-527-0372
      Email: [email protected]
      WWW: http://www.dragonsys.com/
      CompuServe: GO DRAGON



Dragon Developer Tools

    * Dragon PhoneQuery
    * DragonXTools
    * Dragon SpeechTool
    * Dragon VoiceTools

 Dragon PhoneQuery

    * Platform: Windows NT
    * Description: Software for building voice response systems. Callers
      are able to do the following: Ask for information using completely
      natural and continuous language. Have a spoken dialog to fine tune
      a request. Request information to be faxed, sent by electronic
      mail, or read over the phone, using text-to-speech.
      More information on the Dragon Systems telephony pages.
    * Requirements: Pentium or Pentium Pro PC running Windows NT 4.0.
      Telephone interconnect requirements vary by application.
    * Related products: see general information below
    * Contact: see general information below

 DragonXTools

    * Platform: Windows
    * Description: VBX and OCX controls that allow an application to
      control DragonDictate's capabilities, ranging from small
      vocabulary command and control to customized large vocabulary
      dictation. More information is available on the Dragon Developer
      pages
    * Related products: see general information below
    * Contact: see general information below

 Dragon SpeechTool

    * Platform: Windows
    * Description: Create small, optimized vocabularies for your
      speech-enabled applications, or supplement DragonDictate's
      extensive built-in vocabularies with specialized terms and names.
      More information is available on the Dragon Developer pages
    * Related products: see general information below
    * Contact: see general information below

 Dragon VoiceTools

    * Platform: Windows, DOS
    * Description: integrate small-vocabulary speech recognition
      directly into your DOS and Windows 3.1x applications. More
      information is available on the Dragon Developer pages
    * Related products: see general information below
    * Contact: see general information below

 General Information

   Dragon Dictation Products

    * Dragon NaturallySpeaking
    * DragonDictate for Windows
    * Dragon PowerSecretary
    * General Information

   Related Web Sites

    * Simon Crosby's FAQ for DragonDictate

   Contact:

    * Dragon Systems, Inc.
      320 Nevada Street, Newton, MA 02160, USA
      Tel: 1-617-965-5200 or 1-800-TALK-TYP
      Fax: 1-617-527-0372
      Email: [email protected]
      WWW: http://www.dragonsys.com/
      CompuServe: GO DRAGON



EARS: Single Word Recognition Package

    * Platform: Linux and other Unix systems with the Voxware sound driver
    * Description: Intended as a limited ready-to-use single word
      recognizer. However, its design aims at being a platform for
      various kinds of methods used in speech recognition (SR). EARS is
      designed to be a flexible environment for recognition system
      components; for example, take this feature extractor and that
      recognizing method, and this list of words. New methods for single
      word recognition can be integrated easily, as EARS uses C++
      abstract base classes. To train the system, you speak the words
      you want to be recognized later. Your utterances can be saved to
      RIFF WAV files so that you can inspect, change or delete them
      before they are processed into the pattern files on which the
      recognizer is finally trained. As of version 0.20, the feature
      extractors are:
      Rasta-PLP, PLP, LPC, Mel-Cepstrum. The implemented recognizers
      are: DTW and non-recurrent neural nets on fixed-size sound
      patterns.
    * Requirements: Soundcard with mic
    * Misc 1: The current version is an Alpha release.
    * Misc 2: For more information subscribe to the EARS mailing list.
      Send email to [email protected] with "subscribe ears-list"
      in the body.
    * Misc 3: Niels Thorwirth ([email protected])
      has made changes to Version 0.14 which support the AF audio server
      software (see Q1.11) and the OGI Speech Tools (see Q1.9) so that
      EARS is more portable to other UNIX platforms. Available by email
      to Niels.
    * Availability: Source and Linux binaries are available by anonymous
      ftp
      ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/ears-0.26.tar.gz
      ftp://sunsite.unc.edu/pub/Linux/apps/sound/speech/ears-0.26.tar.gz
    * Contact: Ralf W. Stephan: [email protected]
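
  The EARS entry above lists DTW (dynamic time warping) among its
  recognizers. The following is a minimal sketch of isolated-word
  recognition by DTW template matching. It is not taken from the EARS
  sources (which are in C++); the function names, the one-dimensional
  toy features and the unconstrained warping path are assumptions made
  for illustration.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])  # Euclidean distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # stretch seq_b
                cost[i][j - 1],      # stretch seq_a
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the stored template closest to the utterance."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


# Hypothetical one-dimensional "features"; real systems use vectors such
# as PLP or mel-cepstrum coefficients, one per analysis frame.
templates = {
    "yes": [(0.0,), (1.0,), (2.0,)],
    "no": [(5.0,), (5.0,), (4.0,)],
}
slow_yes = [(0.0,), (0.0,), (1.0,), (2.0,), (2.0,)]
best = recognize(slow_yes, templates)
```

  Because the warping path may repeat frames, a slowly spoken word still
  matches its template, which is the property that makes DTW suitable
  for single word recognition.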



Ficomp Interpreter 6000

    * Platform: DOS, Windows 3.1, Win95, Win NT, UNIX
    * Description: Ficomp Systems, Inc., is a systems integrator that
      has developed commercial speaker-dependent, continuous-speech
      recognition applications for use in high noise environments on
      several platforms. Applications are specialized in the finance
      industry for exchange floors, banks and brokerage firms.
    * Contact: Ficomp Systems, Inc.
      Ph: (732) 274-2600, Fax: (732) 274-2601
      117 Docks Corner Road, Dayton, NJ 08810
      E-Mail: [email protected]
      WWW: http://www.ficompsystems.com/



HM2007 - Speech Recognition Chip

    * Platform: Integrated circuit.
    * Description: HM2007 is a 48-pin single chip CMOS voice recognition
      LSI circuit with on-chip analog front end, voice analysis,
      recognition process and system control functions. A 40 word
      isolated-word voice recognition system can be composed of an
      external microphone, keyboard, SRAM and a few other components.
      When combined with a microprocessor, an intelligent recognition
      system can be built. A demo board for this chip is being
      distributed by The Summa Group.
    * Cost: Approx US$16 for the HM2007 and US$160 for the demo board.
    * Misc: Jean-Pierre Lereboullet's document on Voice Recognition
      Processors provides additional information on the HM2007.
    * Producer: HUALON Microelectronic Corp. USA
      Tel: (415) 288 0390 Fax: (415) 288-0399
    * Distributor 1: Marywale Engineering Company
      Tel: (602) 247 4451 Fax: (602) 247 6167
      Email: [email protected]
    * Distributor 2: The Summa Group Limited
      One California Street, Suite #1940,
      San Francisco, CA 94111
      Ph: (415) 288-0390
    * Distributor 3: Images Company
      39 Seneca Loop, Staten Island, NY 10314, USA
      Ph: +1-718-698-8305, Fax: +1-718-982-6145
      Sells single-piece quantities of the HM2007 as a 48-pin DIP chip
      and as a 52-pin PLCC chip. Sells HM2007 demo kits (using the
      48-pin DIP chip): $100.00 unassembled, $135.00 assembled.



Entropic's HTK (HMM Toolkit)

    * Platform: Range of Unix platforms.
    * Description: HTK is a software toolkit for building continuous
      density HMM based speech recognisers. It consists of a number of
      library modules and a number of tools. Functions include speech
      analysis, training tools, recognition tools, results analysis, and
      an interactive tool for speech labelling. Many standard forms of
      continuous density HMM are possible. Can perform isolated word or
      connected word speech recognition. It can model whole words or
      sub-word units. Can perform speaker verification and other pattern
      recognition work using HMMs. HTK is now integrated with the
      ESPS/Waves speech research environment which is described in
      Section 1.9.
    * Misc 1: The availability of HTK changed in early 1993 when
      Entropic obtained exclusive marketing rights to HTK from the
      developers at Cambridge.
    * Misc 2: More detailed information on HTK is available from the
      Entropic WWW server: http://www.entropic.com/htk.html
    * Cost: On request.
    * Contact: Entropic Research Laboratory
      600 Pennsylvania Ave, S.E. Suite 202,
      Washington, D.C. 20003, USA
      Phone: (202) 547-1420
      Email: [email protected]
      WWW: http://www.entropic.com/



IBM VoiceType Dictation

    * Platform: OS/2 and Windows
    * Description: IBM VoiceType Dictation supports speech input at
      70-100 words a minute and can be used to control your desktop and
      applications. Isolated-word, speaker-dependent system using a
      speech adapter card. Available for U.S. English, U.K. English,
      French, German, Italian, Spanish and Arabic. Provided with a
      general office vocabulary and support for major OS/2 and Windows
      applications. Additional specialised vocabularies are available:
         + US: Legal, Emergency Medicine, Radiology and Journalism
         + UK: Legal
         + IT: Radiology
    * Requirements: See
      http://www.software.ibm.com/workgroup/voicetyp/vtprod13.html
    * Cost: See
      http://www.software.ibm.com/workgroup/voicetyp/vtordna.html
    * Misc: An IBM VoiceType Dictation FAQ is supported by UltraMedia
      Systems International (a distributor of IBM VoiceType):
      http://www.infi.net/~ums/ibmfaq.htm
    * Demo software: Available on the IBM WWW site:
      http://www.software.ibm.com/workgroup/voicetyp/vtcust1.html
    * Contact: US Ph: 1-800-TALK-2-ME or 1-914-766-1900.
      Email: [email protected]
      WWW: http://www.software.ibm.com/workgroup/voicetyp/vtcust1.html

IBM VoiceType Control (US Only)

    * Platform: OS/2 and Windows
    * Description: VoiceType Control is a speech recognition navigator
      that lets you control programs by speaking. VoiceType Control
      converts voice commands to keystroke macros. The program provides
      speaker independent, continuous speech recognition, so you do not
      have to train the program for your specific speech patterns.
    * Requirements: ?
    * Cost: ?
    * Demo software:
      http://www.software.ibm.com/workgroup/voicetyp/vtcust2.html
    * Contact: US Ph: 1-800-TALK-2-ME or 1-914-766-1900.
      Email: [email protected]
      WWW: http://www.software.ibm.com/workgroup/voicetyp/vtcust2.html



IN CUBE

    * Platform: Three versions for Windows 95, Windows NT and Sun
      SPARCstations
    * IN CUBE for Windows 95: Developed for general purpose Windows 95
      users. It is packaged for online distribution with a full working
      demo and an option to register and unlock the full product. The
      system uses Command Corp's Mark II continuous speech recognition
      engine and handles changeable lexicons of up to 75 commands.
         + Price: $49.95 US
         + Requirements: 386/25MHz processor or better, Microsoft
           Windows 3.1 or later, Windows compatible sound card or
           built-in audio, and microphone.
         + Availability: http://www.commandcorp.com/cci/win95.html
           Demo mode available.
    * IN CUBE Mark II Pro for Windows NT: IN CUBE is a continuous
      realtime speech recognition system developed to provide a fast and
      convenient means of window navigation and voice macro command
      input for command intensive applications like CAD and publishing.
      Speaker-dependent training and ability to add new commands and
      macros.
         + Price: $495 including the PRO 8 microphone. $540 including
           the MT 858 desk microphone.
         + Requirements: Windows NT, Windows NT-compatible audio board
           (16-bit audio recommended).
         + Availability: http://www.commandcorp.com/cci/pront.html
           Demo available.
    * IN CUBE Voice Command for Sun SPARCstations: Provides continuous
      realtime speech recognition system for window navigation and voice
      macro command input to the workstation. Speaker-dependent training
      and ability to add new commands and macros.
      An IN CUBE Application Programming Interface with a library of
      linkable object modules is available for developers.
         + Price: $495 per seat. The developer's API sells for $695.
         + Requirements: SUN OS 4.1.x or Solaris 2.x with OpenWindows
           and Motif. Works with all audio-equipped SPARCs and clones.
           Models range from SPARCStation 1s to SPARCStation 20s.
         + Availability: http://www.commandcorp.com/cci/in3sparc.html
           A free 5 day evaluation license is available.
    * Contact: Command Corp. Inc.,
      3761 Venture Drive, PO Box 956099, Duluth, Georgia, 30136, USA
      Ph: +1-770-813-8030
      Email: [email protected]
      WWW: http://www.commandcorp.com/incube_welcome.html



Jialong He's Speech Recognition Research Tool

    * Platform: SUN SPARC (SunOS), PC (MSDOS)
    * Description: This is a speech recognition research tool. It
      contains a feature extraction program and three speech
      recognizers: a DTW recognizer, a discrete hidden Markov model
      (DHMM) based recognizer, and a continuous density hidden Markov
      model (CHMM) based recognizer with Gaussian mixture functions.
      The utilities are grouped as:
         + feature -- extract feature vectors from a speech signal (MFCC
           etc.)
         + dtwcmp -- dynamic time-warping (DTW) comparison.
         + gensym -- turn vector sequences to discrete observation
           symbols.
           dhmm -- discrete HMM training program.
           dtest -- DHMM companion test program.
         + chmm -- continuous density HMM training program.
           viterbi -- CHMM companion test program.
      Note, this is a research tool not a complete speech recognition
      system.
    * Availability: By anonymous ftp:

       MSDOS Version
               UK:
               ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/spchtool.zip
               Germany:
               ftp://ftp.informatik.uni-ulm.de/pub/NI/jialong/spchtool.zip

       Sun SPARC version, compiled with GNU C
               UK:
               ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/spch_sun_v1.tar.gz
               Germany:
               ftp://ftp.informatik.uni-ulm.de/pub/NI/jialong/speech_sun_v1.tar.gz

    * See also: Jialong He's Speaker Recognition (Identification) Tool
    * Contact: Jialong He
      email: [email protected]
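
  [Note: the dtwcmp utility listed above performs a dynamic
  time-warping comparison. As a rough illustration of the technique
  only, the Python sketch below computes a DTW alignment cost between
  two sequences of feature vectors and picks the closest stored
  template. It is not taken from the tool: the names dtw_distance and
  recognize, the Euclidean local distance and the unconstrained step
  pattern are all assumptions made for this example.]

```python
import math


def dtw_distance(a, b):
    """Return the cumulative DTW cost between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # frame of a repeated against b[j-1]
                cost[i][j - 1],      # frame of b repeated against a[i-1]
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the template with the smallest DTW cost (hypothetical helper)."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


if __name__ == "__main__":
    templates = {
        "yes": [(0.0,), (1.0,)],
        "no": [(5.0,), (5.0,)],
    }
    print(recognize([(0.0,), (0.0,), (1.0,)], templates))
```

  A real recogniser would normally add slope constraints and length
  normalisation, which this sketch leaves out for brevity.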



Kurzweil Voice for Windows

    * Platform: Windows 3.1 or later
    * Description: Kurzweil Voice for Windows is a dictation product
      enabling the user to create text and enter data by speaking to
      Windows-based applications. System is adaptive but requires no
      initial training. Users can choose either 30,000 or 60,000 word
      active vocabulary. Application command translation templates are
      provided for popular Windows applications such as WordPerfect,
      1-2-3, Organizer and Word (30+ applications are listed on the
      Kurzweil WWW pages). More detailed information is available on the
      Kurzweil WWW pages.
    * Requirements: 486DX/33 or higher, 8 or 16 MB dedicated memory
      (depending on vocabulary), 30 MB dedicated disk space, VGA or
      higher, Kurzweil-supplied microphone and DSP board.
    * Contact:
      Kurzweil Applied Intelligence, Inc.
      411 Waverley Oaks Road, Waltham, MA 02154 USA
      Phone: 1-800-380-1234
      Email: [email protected]
      WWW: http://www.kurzweil.com/

Kurzweil Clinical Reporter

    * Platform: Windows 3.1 or later
    * Description: Kurzweil Clinical Reporter is a voice-activated
      clinical reporting system for computer-based patient records. The
      family of products includes:
         + VoiceEM for emergency medicine
         + VoiceEM/TR for triage reporting
         + VoiceRAD for diagnostic imaging and radiology
         + VoicePATH for surgical and anatomical pathology
         + VoiceMED for Primary Care for family medicine, internal
           medicine and pediatrics
         + VoiceORTHO for office-based orthopaedic surgery
         + VoiceCATH for invasive cardiology
         + VoiceReport for general reporting
    * More information: from the Kurzweil WWW pages:
      http://www.kurzweil.com/medical/
    * Contact:
      Kurzweil Applied Intelligence, Inc.
      411 Waverley Oaks Road, Waltham, MA 02154 USA
      Phone: 1-800-380-1234
      Email: [email protected]
      WWW: http://www.kurzweil.com/



Lernout & Hauspie ASR 1000/T and 1000/M

  [Note: L&H asr200/A is described below.]

    * L&H asr1000/T: ASR for the Telephony and Telecommunications Market
    * L&H asr1000/M: ASR for the Computer and Multimedia Market

    * Description: Automatic speech recognition software providing
      continuous speech recognition, isolated word recognition, keyword
      spotting or continuous digits recognition. The engine is speaker
      independent, and phoneme-based with optimization for commonly used
      words. General features include:
         + Languages available: US English, German, French, Spanish
           (Castilian), Dutch.
         + Available vocabulary: >100,000 words.
         + Line adaptation.
         + Rejection of out of vocabulary/grammar words.
         + N-best alternatives for isolated word recognition and keyword
           spotting.
         + Push to talk.
    * asr1000/T
         + Single channel platform examples: Motorola 56156, TI
           TMS320C2X/C3X/C5X
         + Multi-channel platform examples: TI TMS320C3X/C5X, AT&T
           DSP32C/3210, Motorola 96000
         + Input: 8 kHz telephone sampling
    * asr1000/M
         + Single processor platform examples: Intel 486/Pentium
         + Input: 8 kHz telephone or 11 kHz microphone sampling
    * See also: L&H ASR SDK for Windows
    * More Information: on the Lernout & Hauspie WWW pages:
      http://www.lhs.com/asr.html
    * Cost: Unknown
    * Contact: Lernout & Hauspie Speech Products
      800 West Cummings Park, Suite 3100
      Woburn, MA 01801, USA
      Tel: (617) 238 0960
      Fax: (617) 238 0986
      Email: [email protected]
      WWW: http://www.lhs.com/

Lernout & Hauspie ASR 200/A for the Automotive and Industrial Market

    * Description: Automatic speech recognition software providing
      isolated word recognition, keyword spotting and alphabet
      recognition (optional). This engine is robust, speaker independent
      and word based. Other features:
         + Vocabulary: 100 words US English
         + Voice activation detection
         + Response time
         + Platform examples: Analog Devices ADSP2101/5
         + Input: 8 kHz telephone or microphone sampling
    * See also: L&H ASR SDK for Windows
    * More Information: on the Lernout & Hauspie WWW pages:
      http://www.lhs.com/asr.html
    * Cost: Unknown
    * Contact: Lernout and Hauspie Speech Products
      20 Mall Road, 4th Floor
      Burlington, MA 01803, USA
      Ph: +1-617-238-0960, Fax: +1-617-238-0986
      Email: [email protected]
      WWW: http://www.lhs.com/



Lernout & Hauspie ASR SDK

    * Platform: Windows
    * Description: Windows based Software Development Kits are available
      for integrating automatic speech recognition technology with
      Windows based PC applications.
    * Requirements: IBM-compatible 486 DX/33 MHz + 8 MB RAM + MS DOS 5.0
      + MS Windows 3.1 (or higher) + Sound Blaster compatible sound
      board.
    * See also: L&H ASR Products
    * More Information: on the Lernout & Hauspie WWW pages:
      http://www.lhs.com/asr.html
    * Contact: Lernout and Hauspie Speech Products
      20 Mall Road, 4th Floor
      Burlington, MA 01803, USA
      Ph: +1-617-238-0960, Fax: +1-617-238-0986
      Email: [email protected]
      WWW: http://www.lhs.com/



Listen for Windows 2.0 from Verbex Voice Systems

    * Platform: Windows
    * Description: Listen for Windows Version 2.0 is a Speaker
      Independent software product that provides continuous speech
      recognition for Windows applications. The product works with most
      industry standard sound cards and PCs with embedded audio chips.
      Listen for Windows comes with over 16,000 commands in speech
      interfaces for over 40 software applications, such as MS Office,
      Lotus SmartSuite, Quicken, etc. The Listen Command Editor allows a
      user to change or add commands to existing speech interfaces or
      create new speech interfaces for most Windows applications.
      More detailed information is available on the Verbex Listen for
      Windows page.
      Verbex also sells Verbal Advantage Voice Browser for controlling a
      web browser and Verbal Advantage DeskTop for controlling desktop
      applications.
    * Requirements: 486/25SX PC or higher
    * Pricing and Availability: See the Verbex ordering page for pricing.
      Verbex products are available over the web or can be shipped.
      Microphones available from Verbex.
    * Demo: A "Freeware" demo is available from the Verbex WWW site demo
      page.
    * Contact: Verbex Voice Systems
      1090 King Georges Post Rd., Bldg 107, Edison NJ 08837, USA
      Ph: 1-800-ASK-VRBX, (908) 225-5225, Fax: (908) 225-7764
      WWW: http://www.verbex.com/



Lotec Speech Recognition Package

    * Platform: Sun
    * Description: Public domain speech recognition software. Operates
      from input in Sun audio format (.au files) and outputs word
      hypotheses and time labelling data. The software includes programs
      to collect speech samples, a labeller, a "featurizer" which
      parameterises speech files, a word spotter and the recogniser. The
      software can perform real-time recognition on a Sparc 10 for small
      vocabularies.
    * Requirements: Sun SPARC audio input and a "decent" microphone, Sun
      multimedia demo software (in /usr/demo/SOUND), and X.
    * Availability: By anonymous ftp
      ftp://ftp.sanpo.t.u-tokyo.ac.jp/pub/nigel/lotec/lotec.tar.Z
    * Contact: Nigel Ward: [email protected]


Macintosh Speech Recognition Manager

    * Platform: Macintosh
    * Description: supports developers who wish to add speech
      recognition to existing Macintosh applications. Provides speaker
      independent recognition and robustness to noise. Apple's Speech
      home page provides developer information and the complete speech
      recognition and synthesis SDKs. The recognition SDK
      includes sample code, control panels, interfaces, documentation
      and the recognizer.
    * Availability: under licensing conditions from the Macintosh Speech
      Developer's page
      http://www.speech.apple.com/speech/dev/dev.html.
    * Requirements: Power Macintosh with 16-bit sound, System 7.5, and a
      PlainTalk Microphone or equivalent
    * Cost: Free
    * See also: Macintosh Plaintalk and Speech Manager (Q5.5).
    * Note: Check out Kevin Lenzo's list of Macintosh Speech
      Applications.
    * Contact: Apple Computer, Inc.
      1 Infinite Loop, Cupertino, CA 95014, USA
      WWW: http://www.speech.apple.com/
      Email: [email protected]



Microsoft Speech Recognition

   Microsoft Dictation Research Demonstration

    * Platform: Windows 95 or Windows NT 4.0
    * Description: A free demonstration of research technology that
      enables a computer to transcribe what you speak into Windows
      applications such as email and word-processors. Features of the
      demo software include:
         + 60,000 word vocabulary with the ability to add new words
         + High recognition accuracy
         + Works with any Windows application
         + "Dictation Pad" provides enhanced dictation features
         + "IntelliSense" converts spoken numbers and times
           automatically
         + Compatible with the Microsoft Speech API
    * Requirements: Windows 95 or Windows NT 4.0, Pentium 90 or better
      (RISC builds are available), 16 megabytes of RAM on Windows 95,
      Sound card with 16 kHz 16 bit input signals, High quality
      close-talk microphone, Speakers.
    * Availability: Free demo software is available at:
      http://www.research.microsoft.com/research/srg/install.htm
    * More information: http://www.research.microsoft.com/research/srg/

   Microsoft Command and Control Engine

    * Platform: Windows 95
    * Description: Provides command and control speech recognition using
      SAPI (the Microsoft Speech API) and "Whisper", Microsoft's speech
      recognition technology. Features include:
         + Speaker independent, continuous, sub-word modeling, context
           free grammars
         + Has its own letter-to-sound rules, which means it can
           recognize any word in a grammar.
         + North American English
         + PC microphone and telephone speech recognition with high
           performance
         + Word spotting option
         + Results objects containing top-N choices, segmentation, and
           confidence
         + Written to SAPI, the Microsoft Speech API.
    * Requirements: Windows 95 or Windows NT 4.0, Pentium 60 or better
      (RISC builds are available), 1.5 megabyte working set, 16 kHz or 8
      kHz input signals, 6 megabytes on disk. Requires the Microsoft
      Speech SDK to use.
    * Availability: Free demo software is available at:
      http://www.research.microsoft.com/research/srg/install.htm
    * More information: http://www.research.microsoft.com/research/srg/



Myers' Hidden Markov Model software

    * Platform: Unix
    * Description: Hidden Markov model software for automatic speech
      recognition. C++ code that implements a basic left-right hidden
      Markov model and corresponding Baum-Welch (ML) training algorithm.
      It is meant as an example of the HMM algorithms described by
      L.Rabiner and others. The code was built in order to learn how HMM
      systems work and we are now offering it to the net so that others
      can learn how to use HMMs for speech recognition. Keep in mind
      that ease of understanding was our primary concern, not
      efficiency. The code can be used to build experimental speech
      recognition systems using "train_hmm" and "test_hmm", and can be
      used in conjunction with written tutorials on HMMs to understand
      how they work.
    * Availability: By anonymous ftp from the comp.speech archive site.
      There are two files in the directory
         + ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/
      The files are
         + hmm.README
         + hmm-1.03.tar.gz
    * Contact: Richard Myers: [email protected]
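
  [Note: the package above is C++ and implements Baum-Welch training
  for a left-right HMM. As a small companion illustration only, the
  Python sketch below shows the forward algorithm, which computes the
  likelihood of an observation sequence and underlies both Baum-Welch
  re-estimation and model testing. It is not derived from the Myers
  code: the function name forward_likelihood and the two-state model
  parameters START, TRANS and EMIT are invented for this example.]

```python
def forward_likelihood(obs, start, trans, emit):
    """Return P(obs | model) for a discrete HMM using the forward algorithm."""
    n_states = len(start)
    # alpha[s] = probability of the observations so far, ending in state s.
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][symbol]
            for s in range(n_states)
        ]
    return sum(alpha)


# Hypothetical left-right model: state 0 may stay or move to state 1,
# state 1 can only stay (no backward transitions).
START = [1.0, 0.0]
TRANS = [[0.5, 0.5],
         [0.0, 1.0]]
EMIT = [[0.9, 0.1],   # state 0 mostly emits symbol 0
        [0.2, 0.8]]   # state 1 mostly emits symbol 1

if __name__ == "__main__":
    print(forward_likelihood([0, 1], START, TRANS, EMIT))
```

  For long observation sequences a practical implementation would
  scale alpha at each step or work with log probabilities to avoid
  underflow.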



NCC Dictate

    * Platform: Windows
    * Description: NCC Digital Dictate(TM) is an add-on, enhanced
      interface for use with IBM's VoiceType(TM) Dictation for Windows
      and various Windows 3.1 applications (e.g. MS Word, WordPerfect).
      Digital Dictate(TM) provides faster corrections and dictation rates
      and various other features. This version is not a stand-alone
      product; it requires VoiceType(TM) Dictation to provide the speech
      recognition engine and the Windows application. Features include:
         + Direct dictation into Windows applications with access to all
           functions while dictating.
         + Versions for MS Word, WordPerfect, Ami Pro, and other Windows
           applications.
         + Speech enabled editing.
         + Capability to save speaker models and defer corrections.
         + Microphone "pause and restore" functions controlled with
           speech commands.
         + Add-on vocabularies for legal, medical, science and business.
         + SWITCH-IT(TM) foot pedal control or CardSwitch(TM) infrared
           wireless control available which switch between dictation and
           proofing/correction modes.
    * Requirements: IBM's VoiceType(TM) Dictation for Windows; a computer
      system meeting VoiceType(TM) Dictation for Windows requirements;
      VoiceType(TM) Dictation Adapter.
    * Availability: Through computer dealerships.
    * Price: $US295
    * Contact: NCC Incorporated
      5808 E. Turquoise, Scottsdale, AZ 85253
      Ph: (602) 922-6236 Fax: (602) 596-9050



NICO Artificial Neural Network Toolkit

    * Platform: UNIX (ANSI C source code)
    * Description: The NICO Toolkit is an artificial neural network
      toolkit specifically designed and optimized for automatic speech
      recognition applications. Networks with both recurrent connections
      and time-delay windows are easily constructed. The network
      topology is flexible -- any number of layers is allowed and layers
      can be arbitrarily connected. Tools for extracting input-features
      from the speech signal are included as well as tools for computing
      target values from standard phonetic label-files.
    * Availability: Through the NICO homepage
      (http://www.speech.kth.se/NICO/index.html)
      or the download page.
    * Contact: Nikko Strom, [email protected]



Nuance Speech Recognition System

    * Platform: UNIX-based workstations including Sun and SGI.
    * Description: The Nuance Recognizer features client-server
      architecture with multiple recognizers available on a single
      processing platform. Primarily developed for telephony-based
      applications, the system accepts speaker-independent, continuous
      speech and supports very large vocabularies. Included is a
      "template matching" natural language capability for identifying
      the meaning of speech. A toolkit is available for use in
      developing a wide variety of speech recognition applications.
    * Price and availability: Contact Nuance
    * Contact: Nuance Communications
      1380 Willow Road, Menlo Park, CA 94025, USA
      Ph: +1-650-847-0000, Fax: +1-650-847-7979
      WWW: http://www.nuance.com/



OKI VRP6679 - Voice Recognition Processor

    * Platform: Integrated circuit.
    * Description: Speech recognition IC. 25 words max. Speaker
      independent recognition capability. Recognition rate quoted as 97%
      in a noisy environment (e.g. a car).
    * Misc: Alias MSM6679
    * Misc 2: More information is provided in Jean-Pierre Lereboullet's
      document on Voice Recognition Processors.
    * Cost: Approx US$20. Demo board $876
    * Availability: OKI Semiconductor and OKI Distributors
      Corporate Headquarters
      785 North Mary Avenue, Sunnyvale, CA, 94086 2909
      Tel: (408) 720 1900, Fax: (408) 720 1918



Phonetic Engine 500 (PE500) from Speech Systems, Inc.

    * Platform: Windows
    * Description: Speaker independent, 40,000 word vocabulary,
      continuous speech recognition for MS Windows. Grammars with high
      perplexity possible. Includes noise rejection. Uses proprietary
      DSP board.
    * Cost: Prices in US$ - quantity one. The PE500 SDK is $995.00
      including board, microphone, and runtime software. Runtime only is
      $595.00. SpeechWizard(r) adds speech input to existing Windows
      applications, $295.00. Two-day training: $295.00 with purchase,
      $595.00 without.
    * Misc: The user defines the grammar of allowed utterances and must
      write software to invoke the board driver functions that control
      recognition. The user must also write software to
      collect/parse/interpret the ASCII text strings returned when
      recognition succeeds.
    * Misc 2: SSI now offers speech application development services.
    * Contact:

   Speech Systems, Inc.
   2945 Center Green Court South
   Boulder, CO 80301-2275, USA
   Tel: 303.938.1110 Fax: 303.938.1874
   http://www.speechsys.com



Philips Speech Recognition (2 products)

   SpeechMagic: Dictation

    * Platform: Windows 3.1 and higher
    * Description: A continuous speech recognizer providing a 64,000
      word vocabulary, speaker adaptation and multiple languages.
      SpeechMagic is currently available for English and German.
      SpeechMagic acts as a server application, processing speech input
      and providing text output. Uses an add-on ISA compatible
      recognition accelerator board. SpeechMagic provides a correction
      editor, editing and playback of recordings, and a vocabulary
      manager for entering new words, abbreviations, macros and special
      transcriptions (e.g. for foreign words). Windows DDE support and a
      native API are provided for integration.
    * Hardware Requirements: IBM compatible personal computer (486DX/66
      MHz or higher), minimum 16 MB of RAM, hard disk capacity > 500 MB,
      and a Philips LFH 6210 Accelerator Board.
    * More Information: For more information visit the SpeechMagic WWW
      page or the Philips Speech home page.

   Speech Processing System 6000s (Europe only)

    * Description: Dictation of medical findings using continuous speech
      recognition. Designed for German speaking radiologists and
      encompasses the complete radiology vocabulary. The authors use
      dictation stations (PCs) which are fitted with microphones. The
      transcriptionists use editing stations (also PCs) which are
      additionally fitted with headphones and footswitches. The SP6000s
      has a single speech recognition unit serving all users, and it
      offers automatic data transfer as well as the advantages of
      digital dictation functions.
    * More Information: For more information visit the Philips SP6000s
      WWW page or the Philips Speech home page.



Dragon PowerSecretary

    * Platform: Apple
    * Description: Information moved to the page on Dragon Dictation
      products including Dragon PowerSecretary
      (Previously Articulate PowerSecretary.)



ProNotes Voice Tools

    * Platform: Windows
    * Description: ProNotes Voice Tools are designed to bring the speech
      recognition capabilities of the IBM VoiceType(TM) Dictation System
      for Windows into any program without the need for the programmer
      to directly interface with the speech engine at the API level.
      There are five tools, as described below, which are all available
      in three forms: Visual Basic(TM) Custom Controls (known as VBXs),
      16-bit OLE Custom Controls, and 32-bit OLE Custom Controls. The
      tools are intended for use by Windows(TM) developers working with
      Windows 3.1(TM), Windows for Workgroups 3.11(TM), Windows NT 3.51
      Workstation(TM), and Windows 95(TM). The custom controls can be
      utilized with any application development environment which
      supports the use of such controls (e.g. Visual Basic and Visual
      C++).

       Playback and Record
               An object which allows developers to use the IBM Speech
               Engine to record and play back sound files. Can be used
               to add voice prompts and to allow end users to record and
               playback sound files.

       Voice Button
               An object having standard button properties and behavior,
               which can additionally be controlled by voice. The button
               can also be used as a label or a 3D panel.

       Dictation Window
               A text box that allows free dictation, voice macro
               utilization, and correction by voice. Each Dictation
               Window has access to global and context sensitive
               vocabularies for both command and dictation. There are
               three correction modes.

       Voice List Box
               Has standard list box properties and behavior, but can
               additionally be controlled by voice. A user can select
               items by pronouncing the entry's text or the entries can
               be numbered and selected accordingly.

       Voice Navigator
               Provides navigation by voice within an application
               developed with the Voice Tools, between voice-enabled
               objects described above, as well as some standard objects
               found within the application.

    * Requirements: Hardware: 80486/33 DX or higher, 60MB hard disk
      space for IBM VoiceType Dictation software, 10MB hard disk space
      for ProNotes Voice Tools, 3.5" floppy, VGA (or compatible), 16MB
      RAM, IBM VoiceType Dictation adapter, microphone, and speakers.
      Software: DOS version 6.0 or later, with SHARE.EXE running,
      Windows 3.1 or later, IBM VoiceType Dictation software, any
      programming environment or system compatible with Visual Basic or
      OLE Custom Controls.
    * Price: Unknown
    * Contact: Pronotes, Inc.
      1546 Magee Avenue, Philadelphia, PA 19149, USA
      Ph: 800-70-NOTES or +1-215-533-8569, Fax: +1-215-533-1276
      Email: [email protected]
      WWW: http://www.pronotes.com/



PureSpeech 2.0 Recognition Engine

    * Platform: Windows 3.1, Windows 95, Unix, Dialogic Antares DSP
    * Description: Speaker-independent, continuous speech, large active
      vocabulary speech recognition engine for American English, UK
      English, French, German and Spanish. Permits on-the-fly additions
      to the vocabulary using phonetic models and telephone or wideband
      microphone input. Flexible grammar, natural language processing,
      discourse models. Software only with a small RAM/CPU footprint.
      Can be used to build voice user interfaces (VUIs) for PC software
      applications. Can also be used for high-volume call center
      telephony, especially in banks, finance and other specialized
      applications.
      A toolkit for the Dialogic Antares is available.
    * Availability: PureSpeech is not available as a stand-alone
      product. It is available embedded in Windows-based software or as
      a toolkit.
    * Contact: PureSpeech, Inc
      100 Cambridge Park Drive, Cambridge, MA 02140, USA
      Ph: (617) 441-0000 Fax: (617) 441-0001
      Email: [email protected]
      WWW: http://www.speech.com/



recnet

    * Platform: UNIX
    * Description: Speech recognition for the speaker independent TIMIT
      and Resource Management tasks. It uses recurrent networks to
      estimate phone probabilities and Markov models to find the most
      probable sequence of phones or words. The system is a snapshot of
      evolving research code. There is no documentation other than
      published research papers. The components are:
         + A preprocessor which implements many standard and many non-
           standard front end processing techniques.
         + A recurrent net recogniser and parameter files
         + Two Markov model based recognisers, one for phone recognition
           and one for word recognition
         + A dynamic programming scoring package. The complete system
           performs competitively.
    * Cost: Free
    * Requirements: TIMIT and Resource Management databases
    * Contact: Tony Robinson: [email protected]
    * Availability: by anonymous ftp

               ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/recnet-1.3.tar.Z
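
  [Note: the description above says that a network estimates phone
  probabilities per frame and Markov models then find the most
  probable phone sequence. The Python sketch below illustrates that
  second step with a plain Viterbi search. It is not recnet code: the
  function name viterbi and the toy probabilities are invented, and
  the sketch treats the per-frame network outputs directly as emission
  scores, which is a simplifying assumption.]

```python
import math


def _log(p):
    """Natural log that maps zero probability to minus infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(frame_probs, trans, start):
    """Return the most probable state (phone) index for each frame."""
    n = len(start)
    score = [_log(start[s]) + _log(frame_probs[0][s]) for s in range(n)]
    back = []  # back[t][s] = best predecessor of state s at frame t + 1
    for probs in frame_probs[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n):
            best = max(range(n), key=lambda p: prev[p] + _log(trans[p][s]))
            pointers.append(best)
            score.append(prev[best] + _log(trans[best][s]) + _log(probs[s]))
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    frames = [[0.9, 0.1], [0.6, 0.4], [0.1, 0.9]]  # toy per-frame phone probabilities
    trans = [[0.7, 0.3], [0.3, 0.7]]
    start = [0.5, 0.5]
    print(viterbi(frames, trans, start))
```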



Sensory Inc. Integrated Circuits

    * Platform: Integrated circuits
    * Description: Sensory's low cost high quality Interactive Speech
      line of speech recognition IC's are designed for consumer
      telephony products, portable consumer electronics, and other
      consumer applications. Technologies available include speech
      recognition (speaker-independent and speaker-dependent), speaker
      verification, speech/music synthesis, digital record/playback, and
      general product control on one chip. Development tools and
      demonstration units are available. Detailed product information on
      the Interactive Speech chips is available from the Sensory
      Circuits WWW site.
    * Contact: Sensory, Inc.
      521 E. Weddell Drive, Sunnyvale, CA 94089
      Ph: +1-408-744-9000, Fax: +1-408-744-1299
      Email: [email protected]
      WWW: http://www.sensoryinc.com/



Simon Says (NeXT)

    * Platform: NeXT
    * Description: Provides the ability to link commands to spoken
      phrases.
    * Availability: By anonymous ftp.
      Simon Says demo
      ftp://ftp.informatik.uni-muenchen.de/pub/comp/platforms/next/Audio/audio-apps/SimonSaysDemo.1.5.1.N.b.tar.gz
      Readme file
      ftp://ftp.informatik.uni-muenchen.de/pub/comp/platforms/next/Audio/audio-apps/SimonSaysDemo.1.5.1.README
    * Contact: Metrosoft
      710 13th Street, Suite 310 X, San Diego, California 92101
      Ph: 619.488.9411 Fax: 619.488.3045
      Email: [email protected] [NeXTmail welcome]



smARTspeak from Advanced Recognition Technologies, Inc.

    * Platform: Windows, Windows 95, DOS, and General Magic
      It also works on the following processors/microcontrollers: Intel's
      80x86, Intel's 8031, 8051, Motorola's 68000, and Hitachi's SH1,
      SH3, SH8.
    * Description: smARTspeak is suited to voice command and control
      applications, such as voice dialing in cellular and desktop
      telephones, or voice command operation in computers and multimedia
      products. It uses a compact (10KB size on 16 bit machines), fast,
      user dependent recognition engine.
      smARTspeak can recognize any language in any accent.
      ART recently completed a Software Developer Kit (SDK) for
      smARTspeak, running under Windows 3.1 or higher which allows the
      voice recognition engine to be used within Windows Applications.
      More detailed information on smARTspeak and the SDK is available
      on the ART WWW pages.
    * Availability: Currently licensed to original equipment manufacturers
      (OEMs), system integrators, software and application developers,
      and value added resellers (VARs) who port ART's technology into
      their products.
    * Contact: Advanced Recognition Technologies, Inc.
      International Office:
      43 Brodezky Street, POB 39918, 61398 Tel Aviv, Israel
      Ph: 972-3-642-7242, Fax: 972-3-642-5887
      Email: [email protected]
      WWW: http://www.artcomp.com/
      US Office:
      9574 Topanga Canyon Blvd. Chatsworth, CA 91311, USA
      Ph: 818-678-3999, Fax: 818-678-3994
      WWW: http://www.artcomp.com/



Speech Commander - Verbex Voice Systems

    * Platform: Various: external hardware with serial port connection
    * Description: A hand-held (portable) device about the size of a
      paperback book which provides speaker-dependent continuous speech
      recognition. The active vocabulary is dependent on the model
      chosen and can vary from 300 to 10,000 active words. The device
      connects through a serial port, so it can be connected to a wide
      range of computers. It comes with a battery pack.
    * Contact: Verbex Voice Systems
      1090 King Georges Post Rd., Bldg 107,
      Edison NJ 08837, USA
      Ph: (908) 225-5225, Fax: (908) 225-7764
      Email: [email protected]
      WWW: http://www.verbex.com/



'Speech Recognition Expert' Toolkit for Windows

    * Description: Provides an object-oriented development tool designed
      to rapidly build speech enabled applications without writing
      source code. Currently supports IBM's VoiceType Application
      Factory. Future versions to support other platforms. Includes
      BlackBox library and Custom Grammar Tools.
    * Requirements: Layout for Windows from Objects, Inc.
    * Price: $US349 + Shipping/Handling
    * Contact: Speech Technologies, Inc.
      P.O. Box 3905
      Naperville, IL 60567-3905
      CompuServe @102147,3521
      Ph: (708)983-7634



Visual Voice from Stylus Innovation

    * Platform: Microsoft Windows
    * Description: Visual Voice is a toolkit for building Windows-based
      voice processing and telephony applications including interactive
      voice response (e.g. touch-tone banking), fax-on-demand, and voice
      mail. Visual Voice can be used to add voice recognition to your
      telephony applications.
      Voice Recognition (VR) Support for Visual Voice is exposed as a
      standard VBX control and provides one or more voice recognition
      "resources" to your application. Applications can dynamically
      assign resources across several voice lines. Voice recognition is
      either "discrete" or "continuous". Discrete recognition is
      slightly more accurate and requires the speaker to pause briefly
      between words. Continuous recognition provides a natural way to
      enter information by speaking without pauses. Three configurations
      are supported:

       Software-Only Solution
               The software only solution uses Telaccount's SpeechEasy
               technology for discrete recognition using your PC's CPU.
               A vocabulary is included with digits, basic command words
               and more.

       Hardware-Assisted Solution with Dialogic AEB boards
               Discrete voice recognition in over 25 languages using
               Dialogic D/41D voice boards and the Dialogic VR/40 board.
               Vocabularies are included with digits, basic command
               words, voice mail vocabulary and more.

       Hardware-Assisted Solution with Dialogic PEB boards
               Use the VR control with any Dialogic PEB-based voice
               board, such as the D/12x or D/24x, to access voice
               recognition resources from your phone lines. This
               requires a Dialogic VRP board with either 1 to 4 VRM/40
               modules (4 channel discrete voice recognition modules)
               and/or 1 to 4 VRM/2C modules (2 channel continuous voice
               recognition modules). You can have up to 4 modules on
               each VRP: 4 VRM/40s for 16 channels of discrete voice
               recognition; 4 VRM/2Cs for 8 channels of continuous
               recognition; or a combination. Over 25 languages
               supported. Includes vocabularies as described above.

    * Pricing: Unknown
    * Availability: From Stylus Innovation Inc. or from the
      distributors listed on the Stylus WWW pages.
    * Misc: More detailed technical information and slide show
      demonstration software are available on the Stylus home page.
    * Contact: Stylus Innovation Inc.
      One Kendall Square, Building 300, Cambridge, MA 02139
      Ph: (617) 621 9545, Fax: (617) 621 7862
      WWW: http://www.stylus.com/
      Compuserve forum: GO STYLUS
      Email: [email protected]
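
  The module arithmetic in the PEB configuration above can be sketched
  in a few lines of Python. This is an illustrative calculation only:
  the function and constant names are made up for this example and are
  not part of Visual Voice or of any Dialogic software. The channel
  counts and the four-module limit are taken from the description.

```python
# Channels provided by each module type, as stated in the description above.
CHANNELS_PER_MODULE = {"VRM/40": 4, "VRM/2C": 2}
MAX_MODULES_PER_VRP = 4


def vrp_channels(vrm40=0, vrm2c=0):
    """Return (discrete, continuous) channel counts for one VRP board."""
    if vrm40 < 0 or vrm2c < 0:
        raise ValueError("module counts cannot be negative")
    if vrm40 + vrm2c == 0:
        raise ValueError("a VRP needs at least one module")
    if vrm40 + vrm2c > MAX_MODULES_PER_VRP:
        raise ValueError("a VRP holds at most 4 modules")
    discrete = vrm40 * CHANNELS_PER_MODULE["VRM/40"]
    continuous = vrm2c * CHANNELS_PER_MODULE["VRM/2C"]
    return discrete, continuous
```

  For example, a board with two VRM/40s and two VRM/2Cs would give 8
  discrete and 4 continuous channels.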



Voice Command Line Interface

    * Platform: Amiga
    * Description: VCLI will execute CLI commands, ARexx commands, or
      ARexx scripts by voice command through your audio digitizer. VCLI
      allows you to launch multiple applications or control any program
      with an ARexx capability entirely by spoken voice command. VCLI is
      fully multitasking and will run in the background, continuously
      listening for your voice commands even while other programs are
      running. Documentation is provided in AmigaGuide format. VCLI 6.0
      runs under either Amiga DOS 2.0 or 3.0.
    * Requirements: Supports the DSS8, PerfectSound 3, Sound Master,
      Sound Magic, and Generic audio digitizers.
    * Availability: by ftp from wuarchive.wustl.edu in the file
      systems/amiga/incoming/audio/VCLI60.lha and from
      amiga.physik.unizh.ch as the file pub/aminet/util/misc/VCLI60.lha
    * Contact: Author's email is [email protected]



Voice Control Systems Continuous Speech Recognition

    * Description: Voice Control Systems (VCS) continuous speech
      recognition is a proprietary phonetic recognizer based on
      technology developed at VCS over the last 17 years. It is robust
      for applications such as the "hands-free" automotive environment
      or telephone networks, both wireless and wireline. VCS speech
      recognition is used by many developers and manufacturers in
      telecommunications. VCS technology is a software-based capability
      which VCS has currently developed for a limited number of
      processing environments. VCS offers "off-the-shelf" capabilities
      for the TI-C3X and C4X DSPs with other hardware platform support
      planned for the future. As a benchmark, today's VCS continuous
      technology requires about 1/2 of a 33 MHz TMS320C31. VCS continuous
      technology is available in cellular and wireline based libraries
      for continuous digit input in approximately 15 languages. VCS
      continuous recognition is a modified HMM decision strategy built
      upon the foundation of the VCS phonetic "front end".
    * Availability: VCS continuous technology is available today in
      software form from VCS or implemented in hardware or speech
      systems from VCS distributors including Dialogic Corporation,
      Brite Voice, Intervoice, Periphonics, and Syntellect.
    * Cost: Software royalties are volume based and range from per unit
      costs of $500 per recognizer to less than $5 in large quantities.
    * See also: the VCS Phonetic Dictionary Recognizer and VCS Isolated
      Word Speech Recognition below, and the VCS 2030 & 2060 Voice
      Dialers.
    * Contact: Voice Control Systems, Inc.
      14140 Midway Rd., Dallas, Tx. 75244, USA
      Ph: +1-214-386-0300, Fax: +1-214-386-5555
      Email: [email protected]
      WWW: http://www.voicecontrol.com/

Voice Control Systems Phonetic Dictionary Recognizer

    * Description: This recognizer is based upon a HMM type recognition
      strategy coupled with the VCS "front end" (feature extraction
      software). The HMM modeling is based upon the basic phonetic
      building blocks in each language. In American English this is
      approximately 43 units. The recognition vocabulary is built up by
      combining these units into word models. By building the words in
      this way new recognition vocabularies may be constructed. The
      phonetic assembly can also be used for "word spotting" recognition
      libraries.
    * Platform: This VCS recognition software runs on the TI TMS320C30
      DSP. Two recognizers can operate on a single 55 MHz C30. Currently
      the software may be purchased as an Enhanced Technology from VCS
      to run on the Dialogic VR/160p speech recognizer board. The
      hardware is purchased from Dialogic, with the "Enhanced" software
      purchased from VCS. Up to four phonetic recognizers can run on a
      single 160; one per VRM2C (33 MHz C30 DSP) daughtercard.
    * Note: This recognizer is in its late "beta" stage of development
      and is available for U.S. English vocabularies. Other languages
      are presently under development.
    * Price: VCS software is priced at $350 per recognizer for unit
      quantities with volume discounts available.
    * See also: VCS Continuous Recognition above, VCS Isolated Word
      Speech Recognition below, and the VCS 2030 & 2060 Voice Dialers.
    * Contact: Voice Control Systems, Inc.
      14140 Midway Rd., Dallas, Tx. 75244, USA
      Ph: +1-214-386-0300, Fax: +1-214-386-5555
      Email: [email protected]
      WWW: http://www.voicecontrol.com/
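
  The idea in the Description above, building word models by combining
  phonetic units, can be sketched as follows. This is a minimal
  illustration and not VCS code: the lexicon, the unit symbols and the
  function names are hypothetical, and a string label stands in for
  each trained HMM.

```python
# Hypothetical pronunciation lexicon: word -> sequence of phonetic units.
# The unit symbols are illustrative, not the VCS unit inventory.
LEXICON = {
    "call": ["k", "ao", "l"],
    "list": ["l", "ih", "s", "t"],
}

# One trained model per phonetic unit; a string label stands in for each HMM.
UNIT_MODELS = {u: "hmm<%s>" % u for u in ["k", "ao", "l", "ih", "s", "t"]}


def build_word_model(word, lexicon=LEXICON, unit_models=UNIT_MODELS):
    """Concatenate unit models in pronunciation order to form a word model."""
    if word not in lexicon:
        raise KeyError("no pronunciation for %r" % word)
    return [unit_models[u] for u in lexicon[word]]


def add_word(word, units, lexicon=LEXICON, unit_models=UNIT_MODELS):
    """Add a vocabulary word; only a pronunciation is needed, no new training."""
    missing = [u for u in units if u not in unit_models]
    if missing:
        raise ValueError("unknown units: %s" % missing)
    lexicon[word] = list(units)
```

  Because every word is assembled from the same small set of units, a
  new vocabulary only needs new pronunciations, which is why such
  recognizers can be re-targeted without recording each new word.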

Voice Control Systems Isolated Word Speech Recognition

    * Description: Voice Control Systems (VCS) isolated word recognition
      uses VCS phonetic recognizer technology. It is robust in
      demanding environments such as the "hands-free" automotive
      environment, telephone networks, wireless or wireline.
      Capabilities include speaker-independent, speaker-dependent and
      speaker-adaptive recognition. Libraries are available for 45+
      languages and custom vocabulary development services are
      available. The technology is suited for many applications
      including:
         + Desktop computing: such as keyboard accelerators
           or interactive multimedia.
         + Network telephony: such as automating operator functions or
           voice dialing.
         + Computer telephony: such as remote access to a personal
           computer.
         + Automotive accessory control: such as voice activated
           cellular phones or other automotive accessories.
         + Consumer electronics: such as voice controllers for video
           games or VCRs and televisions.
    * Platform: Runs on Intel-X86, TI-C5X, C3X, C4X and C2X, OKI 6679,
      and NEC-V20 and V30, and can operate on 16-bit microcontrollers.
      As a benchmark, 8 recognizers can run on an Intel 486-33 DX.
    * Availability: The technology is available under software licenses
      direct from VCS or by purchasing hardware from an OEM. VCS OEMs
      include: Dialogic, Oki Semiconductor, Intervoice, Periphonics,
      etc.
    * Cost: VCS isolated word recognition software is available under a
      volume pricing license agreement. Small quantity royalties are in
      the $500.00 per recognizer range while large (millions) quantity
      royalties are less than $1.00 per recognizer.
    * See also: VCS Continuous Speech Recognition and VCS Phonetic
      Dictionary Recognizer above, and the VCS 2030 & 2060 Voice
      Dialers.
    * Contact: Voice Control Systems, Inc.
      14140 Midway Rd., Dallas, Tx. 75244, USA
      Ph: +1-214-386-0300, Fax: +1-214-386-5555
      Email: [email protected]
      WWW: http://www.voicecontrol.com/



Visus SpeechKit

    * Platform: NeXT
    * Description: SpeechKit is based on SPHINX, a speaker-independent,
      1000 word or so, continuous speech recognition system which allows
      you to incorporate speech recognition into your applications. You
      can design your vocabulary and grammars.
    * Contact: Visus - no address or phone provided. A possible contact
      is Robert Brennan at Carnegie Mellon University. email:
      [email protected]



VCS 2060 Voice Dialer

VCS 2030 Voice Dialer

    * Platform: Stand-alone hardware, TMS320C5X based with VCS phonetic
      speech recognition and CELP speech compression.
    * Description: The VCS 2060 is a telephone dialing system which
      recognizes 50 names - and speed dials the associated telephone
      number. The VCS 2030 has 20 memories. Users use
      speaker-independent recognition to select the "call", "program",
      or "list" menu, then place a call, enroll a new memory, or listen
      to playback of entries in the phonebook. Enrollment is simple and
      includes a "name tag" enrollment pass so that when one selects an
      entry to call, the selection is confirmed by repeating the
      memory's associated name tag, e.g. "calling Pete". The system uses
      both speaker-independent and speaker-dependent technology from
      Voice Control Systems, Inc.
    * Installation: The VCS 2060 can be installed in series (RJ-11) with
      one phone for single phone operation or installed in parallel
      (RJ-31) to provide voice dialing from every phone in a house.
    * Cost: Standard retail prices:
         + VCS 2030 Voice Dialer - $269.00
         + VCS 2060 Voice Dialer - $299.00
    * Availability: From catalogs or direct from Voice Control Systems.
      Voice Control Systems
      14140 Midway Rd., Dallas, Tx. 75244, USA
      Ph: 800-VCS-7525, Fax: +1-214-386-5555
      Email: [email protected]
      WWW: http://www.voicecontrol.com/



Voice-Trek 2.0

    * Platform: Unknown.
    * Description: VoiceTrek is primarily used by the United States
      Postal Service to sort mail. Tardis Technology Inc. was created to
      develop and market applications that utilize speech recognition.
      They do consulting work as well as turnkey systems.
    * Contact: Tardis Technology Inc., Voice Recognition Div.
      6444 E. Spring St., #286, Long Beach, CA 90815-1500, USA
      Phone: +1-310-497-0077, Fax: +1-310-497-0080



VoiceAssist for Windows from Creative Labs, Inc.

    * Platform: Windows
    * Description: Seeking a description.
    * Availability: VoiceAssist preview software is available from the
      Creative Labs VoiceAssist home page.
    * Contact: Creative Labs, Inc.
      Ph: 1-800-998-1000 (Sales)
      Ph: 1-800-998-5227 (Product info and dealer referrals)
      CompuServe: support forum: GO BLASTER
      WWW: http://www.creaf.com/



VoiceServer for Windows

    * Platform: Windows
    * Description: Speaker dependent; each user has an independent
      directory. Isolated words. Up to 1000 words/user, 300
      words/window. 1 word occupies 2Kb on hard disk. Can be used to
      control Windows applications by issuing voice commands instead of
      menu selection.
    * Rough Cost: 292 Pounds(UK)
    * Requirements: None
    * Misc: Price includes a half-sized AT voice card (including a DSP),
      software, documentation & a microphone (attachable to keyboard or
      speaker). A light-weight high-spec headset is an optional extra.
    * Contact:

   Mark Redwood
   Applied Voice Technologies
   26 Danbury Street, Islington,
   London, UK, N1 8JU
   Ph: + 44 71 454 1224 : Fax: + 44 71 454 1225



Voicetek Corp.

    * Platform: Unknown.
    * Description: Voicetek Corporation provides voice processing
      solutions, training and consulting services and an
      object-oriented, graphical Generations Platform for development of
      integrated computer telephony systems.
    * Contact: Voicetek Corporation
      19 Alpha Road, Chelmsford, MA 01824, USA
      Ph: +1-508-250-9393, Fax: +1-508-250-9378
      WWW: http://www.voicetek.com/



Votan VPC2100 Voice Card and VSP 1010 Speech Processor

    * Platform: DOS
    * VPC2100 Voice Card: a hardware and software system based on the
      TMS320C10, providing continuous speech recognition. The VPC2100
      consists of a circuit board, microphone, speaker, software, and
      documentation. It is designed to add voice I/O and telephone
      management capabilities to the PC/AT and compatibles. Features:
         + Voice store-and-forward at 4- to 16.4-Kb/s speed
         + Speaker-independent speech recognition (0-9, YES, NO)
         + Continuous speaker-dependent speech recognition
         + Telephone interface, pulse or tone dialing, call progress,
           and DTMF
         + Software for development, voice mail, telephone management,
           and VoiceKey
         + High-level applications-generator software
    * Votan VSP 1010 speech-processor board: can service a single voice
      channel, providing recognition, voice output, and telephone
      interfacing. Digital signal processing is performed by a TMS320
      integrated circuit.
    * Costs: Unknown
    * WWW: http://www.ti.com/sc/docs/dsps/develop/3rdparty/vot.htm
    * Contact: Votan Division, MOSCOM Corporation
      6920 Koll Center Parkway, Suite 214, Pleasanton, CA 94566, USA
      Ph: +1-510-426-5600, Fax: +1-510-426-6767



Voice Processing Corporation Speech Recognition Product Line

    * Platform: Unknown.
    * Description: Voice Processing Corporation (VPC) supplies automated
      speech recognition systems. VPC's products are used in the
      telecommunications, cellular and personal computer markets to
      enable computers to understand human speech. The company's VPro
      product line is sold to original equipment manufacturers (OEMs),
      value added resellers (VARs), system integrators and application
      developers. VPC's speech recognition systems are currently used in
      applications such as voice mail, voice activated dialing,
      interactive voice response, and command and control of personal
      computers.
      The following are descriptions of the Voice Processing
      Corporation's VPro Product Line: VProContinuous, VPro/XD, VPro/RT,
      VProCel, VProSpeller, VProPRL, VPro hardware platforms, and the
      application Osprey.
      More information is available on these products at the VPC WWW
      site: http://www.vpro.com/
    * VProContinuous(TM) is a speaker-independent, continuous digit
      recognizer. It recognizes digit strings spoken in a continuous
      manner, by any caller, without unnatural beeps or pauses.
      VProContinuous uses out-of-vocabulary rejection and word spotting
      technologies to reject extraneous words and phrases often spoken
      by callers. The VProContinuous vocabulary consists of the words
      "zero" through "nine," "yes," "no," and "oh." The product is
      language-independent. American English, Australian English,
      Brazilian Portuguese, Canadian French, Castilian Spanish, French,
      German, Italian, Mexican Spanish, Portuguese, Swiss German and
      U.K. English versions are available.
    * VPro/XD(TM) is a discrete or multiword speech recognizer for
      extra-demanding applications and/or vocabularies. This robust
      discrete product recognizes isolated discrete utterances (words or
      very short phrases). VPro/XD utilizes proprietary
      out-of-vocabulary rejection and word-spotting technologies.
      VPro/XD is speaker-independent and includes Talkover capability
      allowing speech-interrupt over prompts. Pre-trained vocabulary
      libraries are available in American English, Australian English,
      Brazilian Portuguese, Canadian French, Castilian Spanish, Central
      American Spanish, German, Italian, Mandarin Chinese, Mexican
      Spanish, Portuguese, Swiss German and UK English. Pre-trained
      vocabularies consisting of voice mail words, voice dialing words,
      call control words, banking, and emergency words are available in
      American English (both cellular and land-line).
    * VPro/RT(TM) is a discrete speech recognizer for rapid training of
      vocabularies in the field. This robust discrete product recognizes
      isolated discrete utterances. Application designers and end-users
      define the vocabulary of their choice and train the system in
      real-time either prior to system start-up, or adapting on-the-fly
      while the system is running live. Vocabularies can be subset, and
      applications involving thousands of words can be developed
      quickly. VPro/RT, which also supports Talkover, is suited to
      speaker-dependent recognition tasks, such as the personal
      directory of names in a voice-activated dialing application.
      VPro/RT is also good for applications that require
      speaker-independent vocabularies to be developed quickly in the
      field or those that require many vocabularies. VPro/RT can also be
      used as a tool for quick prototyping of applications.
    * VProCel consists of speaker-independent VProContinuous, VPro/XD
      and speaker-dependent VPro/RT specifically tuned for the cellular
      environment. The speaker-dependent discrete feature of VProCel
      allows for a user-defined 20-word personal directory, with a
      one-pass enrollment whereby users need only speak their chosen
      commands once. In addition, cellular-ready VPro/XD vocabularies
      consisting of voice-activated dialing command words are also
      available. VProCel is suited to voice-activated dialing
      applications using either digit strings or a listing of words in a
      personal directory.
    * VProSpeller is a recognizer that can determine which name or word
      is being spelled by a caller. Users may spell a string of letters
      (up to 32 letters) in an uninterrupted manner (without prompts or
      beeps between each letter). VProSpeller can recognize confusable
      letters by conducting an automated search of a database of words
      maintained by the application for the best candidates to match.
    * VProPRL: Designed for customers who wish to enable VPC speech
      recognition technologies on platforms other than those supported
      by VPro hardware, the VProPRL is a portable recognizer library of
      VProContinuous, VPro/XD and VPro/RT, which can be embedded into a
      wide variety of hardware platforms. It consists of a library of
      object modules which can be linked with a user application or
      task.
    * VPro Hardware Platforms: VPro-42, VPro-84, VPro-88 : The VPro
      platforms are ISA compliant PC/AT boards. Each supports four to
      eight Virtual Speech Processors (VSPs). Each VSP, depending on
      load factors, can handle multiple telephone lines. Application and
      host computers communicate with each of the VSPs as separate
      autonomous units. VPro platforms use Texas Instruments TMS320C31
      microprocessors which provide up to 133 MFLOPS of compute power.
      The platforms can have up to 8 megabytes of memory shared among
      all processors. In addition, each processor has 512K bytes of
      local memory. Both the PEB and MVIP PCM audio buses are supported
      by all VPro platforms.
    * Osprey is a call management software application that performs the
      kinds of telephone related activities typically done by a personal
      assistant, such as answering the phone, screening callers, routing
      calls, and taking and delivering messages. It is an automated
      phone attendant.
    * Price and availability: Contact Voice Processing Corporation
    * Contact: Kelli V. Smith

   Voice Processing Corporation
   1 Main Street, Cambridge, MA, 02142 USA
   Ph: (617)494-0100 Fax: (617)494-4970
   e-mail: [email protected]
   WWW: http://www.vpro.com/
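
  The VProSpeller entry above describes resolving confusable letters
  by searching a word database for the best candidates. One simple way
  to picture such a search is sketched below. It is not VPC's
  algorithm: the confusable letter groups, the costs and the function
  names are assumptions chosen for illustration, and only words of the
  same length as the spelled string are considered.

```python
# Groups of letters whose spoken names are easily confused (an assumption
# for illustration; a real system would use its own confusion statistics).
CONFUSABLE_SETS = [set("bcdegptvz"), set("mn"), set("fs"), set("ajk")]


def letter_cost(heard, actual):
    """Cost of hearing one letter when another (or the same) was spoken."""
    if heard == actual:
        return 0.0
    for group in CONFUSABLE_SETS:
        if heard in group and actual in group:
            return 0.5  # plausible misrecognition
    return 1.0  # unlikely misrecognition


def best_candidates(heard, words, n=3):
    """Rank same-length database words by total letter cost, lowest first."""
    scored = []
    for word in words:
        if len(word) != len(heard):
            continue
        cost = sum(letter_cost(h, a) for h, a in zip(heard, word))
        scored.append((cost, word))
    scored.sort()
    return [word for cost, word in scored[:n]]
```

  With a database of "smith", "smyth", "stith" and "jones", the
  misheard spelling "snith" ranks "smith" first, since "n" and "m" fall
  in the same confusable group.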



Whisper

  See the new page for Microsoft speech recognition software.
    * Platform: Windows 95 and Windows NT 4.0
    * Description: Command and control recognition.



WildCard Speech Products

    * Platform: Windows 3.1 and Windows 95
    * OfficeTalk for Windows: provides voice commands for dictation,
      navigation, command and control, and formatting for business uses
      of computers. Provides user voice access to a wide variety of
      software applications in office suites from Microsoft,
      Novell/WordPerfect, and Lotus. More information on the WildCard
      OfficeTalk page.
    * LawTalk for Windows: adds features and interfaces that meet the
      specific needs of legal users. More information on the WildCard
      LawTalk page.
    * VoiceCompanion for the Internet: Surf the net using voice
      commands. Controls browsers like Netscape and Microsoft Explorer.
      More information on the VoiceCompanion web page.
    * VoiceCompanion - RemoteAccess: Over the telephone remote access to
      your desktop PC, for voicemail, FAX forwarding and address book
      information. More information on the VoiceCompanion web page.
    * Availability: WildCard Technologies Inc.
      180 West Beaver Creek Road, Richmond Hill, Ontario, Canada L4B 1B4

      Phone: (905) 731-6444, Fax: (905) 731-7017
      Email: [email protected]
      WWW: http://www.wildcardtech.com/


___________________________________________________________________________

     Q6.6: Speaker Recognition (Verification and Identification)

    * Introduction
    * In the FAQ
    * On the WWW

 Introduction

  Speaker recognition is the process of automatically recognizing who is
  speaking on the basis of individual information included in speech
  signals. It can be divided into Speaker Identification and Speaker
  Verification. Speaker identification determines which registered
  speaker provides a given utterance from amongst a set of known
  speakers. Speaker verification accepts or rejects the identity claim
  of a speaker - is the speaker the person they say they are?

  Speaker recognition technology makes it possible to use the speaker's
  voice to control access to restricted services, for example, phone
  access to banking, database services, shopping or voice mail, and
  access to secure equipment.

  Both technologies require users to "enroll" in the system, that is, to
  give examples of their speech to a system so that it can characterise
  (or learn) their voice patterns.
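
  The difference between the two tasks can be shown with a toy sketch.
  It assumes each utterance has already been reduced to a fixed-length
  feature vector and models each enrolled speaker as the mean of their
  enrollment vectors; real systems use far richer models. All names
  and the threshold are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def enroll(samples):
    """Characterise a speaker as the mean of their enrollment vectors."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def identify(utterance, models):
    """Identification: pick the closest speaker among all enrolled ones."""
    return min(models, key=lambda name: distance(utterance, models[name]))


def verify(utterance, claimed, models, threshold):
    """Verification: accept or reject a single claimed identity."""
    return distance(utterance, models[claimed]) <= threshold
```

  Identification always returns one of the enrolled speakers, whereas
  verification compares against only the claimed speaker's model and
  can reject the claim outright.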

 In the FAQ:

         * ImagineNation: Voice Activated UnLock Technology
         * Jialong He's Speaker Recognition (Identification) Tool
         * Keyware Biometric Security Products
         * SpeakerKey Voice Verifier from ITT
         * SpeakEZ Voice Print Speaker Verification
         * Voice Control Systems: Speaker Verification Technology

 On the WWW

   Survey of the State of the Art in Human Language Technology
         Report edited by Ronald A. Cole et al. with a section on
         Speaker Recognition.
         http://www.cse.ogi.edu/CSLU/HLTsurvey/ch1node47.html

   Speaker Identification And Verification: LIMSI Report
         A technical description.
         http://www.limsi.fr/Recherche/TLP/reco/2pg95-sv/2pg95-sv.html

   Long Index of References on Automatic Speaker Verification
         A list of more than 350 papers on speaker verification in text
         or BibTeX format. Provided by G.Matas.
         http://sig.enst.fr/~chollet/ForMehdi/SpRecV1.l_ind.html

   CAVE: Caller Verification in Banking and Telecommunications
         European consortium developing speaker recognition
         technologies.
         http://www.ptt-telecom.nl/cave/

   Hangai Lab demonstrations of speaker verification and speaker
         identification.
         Do it yourself demonstrations:
         http://miya8f05.ee.kagu.sut.ac.jp/study/speech/speech1.html
         http://miya8f05.ee.kagu.sut.ac.jp/study/speech/speech2.html



Voice Activated UnLock Technology (VAULT): ImagineNation

    * Description: Password-based voice verification technology using a
      card to store voice-print data. Introductory information and the
      VAULT FAQ are provided on the ImagineNation WWW pages.
    * Contact: Imagine
      PO Box 212, Swansea, MA 02777, USA
      Ph: +1-508-678-9563
      Fax: 508-678-1470
      Email: [email protected]
      WWW: http://www.ImagineNation.com/



Jialong He's Speaker Recognition (Identification) Tool

    * Platform: SUN SPARC (SunOS), PC (MSDOS)
    * Description: This package contains a set of speaker recognition
      research utilities, including Gaussian mixture models, VQ codebook
      designing program and MLP network. They can also be used as
      general classifiers. The utilities are divided into the following
      categories:
         + Feature extraction and dimensional reduction
           cepstrum -- extract features from speech signals (LPCC, MFCC,
           etc.).
           search -- select effective features (SFS, SBS method).
           randline -- randomize a sequence, auxiliary utility.
           bin2asc -- binary to ASCII, auxiliary utility.
         + MLP network
           mlptrain -- MLP network training program.
           mlptest -- MLP network test program.
         + VQ codebook training and test programs
           lbglvq -- VQ codebook training program.
           nearest -- VQ codebook test program.
         + Gaussian mixture model (GMM)
           gmmtrain -- GMM training program.
           gmmtest -- GMM test program.
      Note: this is a research tool not a true speaker recognition
      system.
    * Availability: By anonymous ftp:

       MSDOS Version
               UK:
               ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/spkrtool.zip
               Germany:
               ftp://ftp.informatik.uni-ulm.de/pub/NI/jialong/spkrtool.zip

       Sun SPARC version, compiled with GNU C
               UK:
               ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/spkr_sun_v1.tar.gz
               Germany:
               ftp://ftp.informatik.uni-ulm.de/pub/NI/jialong/speaker_sun_v1.tar.gz

    * See also: Jialong He's Speech Recognition Research Tool
    * Contact: Jialong He
      email: [email protected]
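
  The VQ codebook approach listed above can be sketched in general
  terms: train one codebook per speaker, then assign a test utterance
  to the speaker whose codebook quantizes it with the least distortion.
  The code below is not taken from the package and does not reproduce
  its lbglvq or nearest programs; it uses plain k-means seeded with the
  first k vectors instead of LBG splitting, and every name in it is
  hypothetical.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, codebook):
    """Index of the codeword closest to vec."""
    return min(range(len(codebook)), key=lambda i: sqdist(vec, codebook[i]))


def train_codebook(vectors, k, iterations=10):
    """Plain k-means, seeded with the first k vectors (a simplification)."""
    codebook = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        cells = [[] for _ in codebook]
        for v in vectors:
            cells[nearest(v, codebook)].append(v)
        for i, cell in enumerate(cells):
            if cell:  # keep the old codeword if its cell is empty
                codebook[i] = [sum(c) / len(cell) for c in zip(*cell)]
    return codebook


def distortion(vectors, codebook):
    """Average squared distance from each vector to its nearest codeword."""
    total = sum(sqdist(v, codebook[nearest(v, codebook)]) for v in vectors)
    return total / len(vectors)


def classify(vectors, codebooks):
    """Pick the speaker whose codebook gives the lowest distortion."""
    return min(codebooks, key=lambda name: distortion(vectors, codebooks[name]))
```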



Keyware Biometric Security Products

    * Description: VoiceGuardian and S2 Security Server provide
      authentication and access control technologies. An online demo of
      Voice Guardian is available.
    * Contact: Keyware Technologies
      _USA_
      Keyware Technologies
      500 West Cummings Park, Suite 3600, Woburn, MA 01801, USA
      Ph: (617) 933 1311, Fax: (617) 933 1554
      _Belgium_
      Keyware Technologies
      Excelsiorlaan 28-30, 1930 Zaventem, Belgium
      Ph: 32 2 721 4574, Fax: 32 2 721 5015
      _Email:_ [email protected]
      _WWW:_ http://www.keywareusa.com/



SpeakerKey Voice Verifier from ITT

    * Platform: Windows/Pentium and Solaris/SPARC
    * Description: SpeakerKey provides over-the-phone voice
      verification. It is configurable for use in a wide range of
      applications.
      SpeakerKey provides a Speaker Verification API (SVAPI).
      SpeakerKey uses two technologies: (1) speaker-independent digit
      recognition using hidden Markov models, (2) speaker verification
      using "Nearest Neighbour Matching with Likelihood Ratio Scoring
      and cohort speakers."
      Dr. Joe Campbell maintains a SpeakerKey FAQ on the WWW. It
      provides a more detailed description of SpeakerKey and discusses
      several speaker verification issues:
      http://www.vitro.bloomington.in.us:8080/~BC/REPORTS/SpeakerKeyFAQ.html
    * Requirements: Minimum 60 MHz Pentium (with sound card) or
      SPARCstation 5, plus phone line interface devices.
    * Price: Evaluation kits available from $75. Developer's kits are
      $1500. Run-time licenses are priced from $600 to $10,000 depending
      upon the number of users and/or verifications per hour. Application
      customization is available.
    * Contact: ITT Industries
      Fort Wayne, IN, USA
      Ph: +1-219-487-6321, Fax: +1-219-487-6126
      Email: [email protected]
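
  The general idea behind scoring a claimant against cohort speakers
  can be sketched as below. This is not ITT's algorithm. It assumes
  utterances are sequences of feature vectors, uses nearest-neighbour
  distances as a stand-in for negative log-likelihoods, and takes the
  difference between the claimant's distance and the average cohort
  distance as a rough analogue of a log likelihood ratio. Function
  names and the threshold are hypothetical.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nn_distance(frames, reference):
    """Mean distance from each test frame to its nearest reference frame."""
    total = sum(min(sqdist(f, r) for r in reference) for f in frames)
    return total / len(frames)


def cohort_score(frames, claimant_ref, cohort_refs):
    """Claimant distance minus the average distance to the cohort speakers.

    A strongly negative score means the utterance is much closer to the
    claimant's enrollment data than to the cohort's.
    """
    d_claim = nn_distance(frames, claimant_ref)
    d_cohort = sum(nn_distance(frames, ref) for ref in cohort_refs)
    return d_claim - d_cohort / len(cohort_refs)


def accept(frames, claimant_ref, cohort_refs, threshold=0.0):
    """Accept the identity claim when the normalized score is below threshold."""
    return cohort_score(frames, claimant_ref, cohort_refs) < threshold
```

  Normalizing by the cohort makes the decision depend on how much
  better the claimant matches than other speakers do, rather than on
  an absolute distance that shifts with channel and noise conditions.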



SpeakEZ Voice Print Speaker Verification

    * Description: Designed to prevent cell phone theft and cloning
      fraud by comparing the cellular caller's statement of a
      pass-phrase to a stored digital "voice print" of the authorized
      subscriber. If the caller's voice patterns do not match the stored
      voice print, service will be denied or the caller will be referred
      to operator assistance for further validation processing. Features
      include:
         + Customer selected password.
         + Vocabulary and language independent.
         + No special hardware required by customer.
         + Multiple delivery options.
    * Contact: T-NETIX, Inc.
      6675 South Kenton Street Englewood, CO 80111 USA
      Phone: (800) 352-8628, (303) 790-9111, Fax: (303) 790-9540
      WWW: http://www.t-netix.com/



Voice Control Systems: Speaker Verification Technology

    * Description: SpeechPrint ID technology provides language
      independent speaker verification. Features:
         + Multiple speech input formats
         + Operates over various microphones or the telephone network
         + Can be used in conjunction with discrete and continuous
           recognition
         + Robust against background noise and spurious telephone
           channel noise
      For more information on features, hardware and software
      requirements, pricing and availability, contact Voice Control
      Systems, Inc. or visit the VCS WWW site or the SpeechPrint
      ID WWW page.
    * See also: VCS speech recognition products in Q6.5.
    * Contact: Voice Control Systems, Inc.
      14140 Midway Rd., Dallas, Tx. 75244, USA
      Ph: +1-214-386-0300, Fax: +1-214-386-5555
      Email: [email protected]
      WWW: http://www.voicecontrol.com/


___________________________________________________________________________

                   Q6.7: Integrated Speech Products

  This section lists those products which integrate different speech
  technologies into a single user package. For example, speech
  recognition and speech synthesis can be combined to provide a dialog
  management system. Strictly speaking, this doesn't really belong in
  Section 6 (Speech Recognition) but since these products all include
  speech recognition, it seems a reasonable place to put it for now!

 In the FAQ...

         * SpeechWorks from Applied Language Technologies, Inc.
         * Nortel Speech Technology Products



SpeechWorks from Applied Language Technologies, Inc.

    * Description: SpeechWorks and companion products provide advanced
      speech recognition technology for the telephony market.
      SpeechWorks can be used by developers to "speech-enable" call
      center, messaging, enhanced services, and other types of
      applications. The three major system modules - SpeechWorks,
      DialogModules and SpeechBuilder - are described below. More
      detailed information is available from the Applied Language
      Technologies home page.
      ALTech develops and markets speech understanding software which
      provides large vocabulary, speaker-independent, phonetic speech
      recognition. ALTech's software contains a comprehensive set of
      features for speech-enabling telephone-based transactions and
      services. SpeechWorks is based on technology licensed from the
      Spoken Language Systems Group at the Massachusetts Institute of
      Technology.
    * SpeechWorks: provides the core speech recognition capabilities.
      Features include:
         + Phonetic segment-based, speaker-independent, large
           vocabulary, continuous speech recognition
         + Real-time vocabulary generation directly from text
         + Database integration
         + "Barge-in" capability
         + Adaptive channel normalization
         + "n-best" output and associated confidence scores
         + Support for multiple languages
         + Software-only or DSP-based implementations
         + Support for multiple platforms and operating systems (e.g.,
           SCO UNIX, WindowsNT, etc.)
    * DialogModules: manage the "conversation" between the system and
      the caller within an application. They provide high-level
      application building blocks which enable developers to quickly and
      easily add speech interfaces to computer telephony applications.
      Each DialogModule accomplishes a particular task within an
      application, ranging from "simple" tasks such as capturing a
      yes/no response or a phone number, to more complex tasks such as
      capturing credit card information or name and address information.

      DialogModules provide "out-of-the-box" functionality. They contain
      pre-built grammars, user-interface design, internal call flow and
      error recovery routines, parameters for customization and a set of
      C++ class libraries and C APIs.
    * SpeechBuilder: provides tools for customizing the DialogModules
      and for developing and maintaining applications. A GUI-based
      Vocabulary Editor provides the ability to generate and maintain
      vocabulary or word lists. Pronunciations can be looked up
      automatically in the built-in dictionary or generated from a set
      of text-to-phoneme rules.
    * Product Bundles: combine SpeechWorks and multiple DialogModules
      into application templates for a set of generic application
      categories.
         + SpeechForms: provides an interactive method for
           entering data over the phone, such as ordering products,
           filling out surveys and completing registration forms.
           Typical applications include: order entry, reservations,
           catalog and literature requests, catalog shopping,
           subscriptions, change of service, claims, credit card
           activation, home banking, stock transactions, and warranty
           reservations.
         + SpeechQuery: is used to deliver information in
           response to voice requests over the phone, such as airline
           information, product delivery status and retirement benefit
           information. Typical applications include: order status,
           product information, account balance, flight status, movie
           listings, job listings, stock quotes, guide
           services, classified ads, claims status, dealer locator
           services, and technical support.
         + SpeechAgent: provides a set of modules for
           automating telephone-based voice messaging applications, such
           as integrated messaging, single-number services and
           voice-dialing. Typical applications include: voice messaging,
           voice dialing, auto attendant, address book access, email
           access, and scheduling.
    * Platform: ALTech's software can
      be deployed on industry-standard hardware platforms and operating
      systems including: Sun SPARC-based systems running SunOS or
      Solaris, IBM RS/6000s running AIX, HP systems running HP-UX, and
      486/Pentium-based PCs and servers running Windows, WindowsNT, SCO
      UNIX, or Solaris. ALTech's systems are designed to run all or some
      of the software on a digital signal processor.
    * Availability: contact ALTech for licensing information.
    * Contact: Applied Language Technologies, Inc.
      215 First Street, Cambridge, MA 02142
      Ph: 617-225-0012, Fax: 617-225-0322
      Email: Alisa Moyer, [email protected]
      WWW: http://www.altech.com/
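
      The SpeechBuilder entry above says pronunciations come either from
      a built-in dictionary or from text-to-phoneme rules. A minimal
      Python sketch of that dictionary-first lookup with a rule-based
      fallback follows. Everything in it is an assumption made for
      illustration: the function name, the dictionary entries, the toy
      letter rules and the phoneme symbols are hypothetical and are not
      taken from ALTech's products.

```python
# Hypothetical pronunciation dictionary (word -> phoneme list).
PRONUNCIATION_DICT = {
    "boston": ["b", "ao", "s", "t", "ax", "n"],
    "phone": ["f", "ow", "n"],
}

# Toy text-to-phoneme rules: two-letter patterns are tried first,
# then single letters. Real rule sets are context-sensitive.
DIGRAPH_RULES = {"ch": ["ch"], "sh": ["sh"], "ph": ["f"], "th": ["th"]}
LETTER_RULES = {
    "a": ["ae"], "b": ["b"], "c": ["k"], "d": ["d"], "e": ["eh"],
    "f": ["f"], "g": ["g"], "h": ["hh"], "i": ["ih"], "j": ["jh"],
    "k": ["k"], "l": ["l"], "m": ["m"], "n": ["n"], "o": ["aa"],
    "p": ["p"], "q": ["k"], "r": ["r"], "s": ["s"], "t": ["t"],
    "u": ["ah"], "v": ["v"], "w": ["w"], "x": ["k", "s"], "y": ["y"],
    "z": ["z"],
}


def pronounce(word):
    """Return (phonemes, source), where source is 'dictionary' or 'rules'."""
    key = word.lower()

    # 1. Prefer the dictionary entry when one exists.
    if key in PRONUNCIATION_DICT:
        return list(PRONUNCIATION_DICT[key]), "dictionary"

    # 2. Otherwise fall back to the letter rules, scanning left to right.
    phonemes = []
    i = 0
    while i < len(key):
        pair = key[i:i + 2]
        if pair in DIGRAPH_RULES:
            phonemes.extend(DIGRAPH_RULES[pair])
            i += 2
        elif key[i] in LETTER_RULES:
            phonemes.extend(LETTER_RULES[key[i]])
            i += 1
        else:
            i += 1  # no rule for this character (digit, hyphen, ...): skip it
    return phonemes, "rules"


if __name__ == "__main__":
    for w in ("Phone", "ship"):
        print(w, pronounce(w))
```

      A production system would use a far larger dictionary and
      context-sensitive rules; the sketch only shows the order of lookup.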



Nortel Speech Technology Products

    * Nortel's AudioGram Delivery Service (ADS):
      When a busy or no answer condition is encountered, an intercept
      message offers ADS, which provides a service to the calling party
      by taking a message automatically. ADS records the caller's
      message and attempts delivery repeatedly if needed until the
      message is delivered. ADS comprises four independent
      services: 0+, 1+ and Local, Intentional, and Millennium AudioGram.
      ADS services use Nortel's Flexible Vocabulary Recognition (FVR)
      voice-processing capabilities. ADS features include:
         + Cost-saving common service platform (NAV)
         + Builds upon existing network investment in toll
           infrastructure capabilities of AABS (Automated Alternate
           Billing Service)
         + Leverages the capabilities of existing TOPS (Traffic Operator
           Position System) attendants.
      More information is available on the Nortel Multimedia Network
      Applications WWW page for AudioGram Delivery Service.
    * Nortel's Voice-Activated Auto Attendant (VAAA):
      Replaces touch-tone menus with an easy-to-use voice interface. Geared
      to businesses and corporations to provide more effective
      management of incoming customer calls. Residing on the Network
      Applications Vehicle (NAV) platform, VAAA uses Flexible Vocabulary
      Recognition (speaker-independent) technology to recognize spoken
      words, and directs calls accordingly. Other features include:
         + Cost-saving common service platform (NAV)
         + Serves DTMF and rotary dial callers.
         + Handles incoming calls for all corporate users (Centrex, PBX,
           or key systems)
      More information is available on the Nortel Multimedia Network
      Applications WWW page for Voice-Activated Auto Attendant.
    * Nortel's Voice-Activated Dialing (VAD):
      Phoneme-based speech dialing capabilities provided through
      speaker-trained and speaker-independent technologies. Residing on
      the Network Applications Vehicle (NAV) platform, VAD enables
      subscribers to dial using speech, as well as to create and
      customize personal telephone directories. Other features include:
         + Cost-saving common service platform (NAV)
         + Speech playback and Text-to-speech synthesis
         + Dual Language capability (optional)
         + Speech Recording
         + Canadian French speechware (optional, prompts and FVR)
         + Spanish speechware (optional, prompts and FVR)
         + 75-name VAD directory size
         + Word-spotting
         + DTMF tone detection
         + Directory sharing
         + Scalable service deployment
         + Talk-through
      More information is available on the Nortel Multimedia Network
      Applications WWW page for Voice-Activated Dialing.
    * Nortel's Voice-Activated Premier Dialing (VAPD):
      Enables businesses to take advantage of the public network
      directories to stimulate customer calls. Residing on the Network
      Applications Vehicle (NAV) platform, VAPD uses Flexible Vocabulary
      Recognition (speaker-independent) technology to recognize business
      names, and routes calls to the appropriate business entity. VAPD
      promotes cost savings by utilizing a common service platform, the
      Network Applications Vehicle (NAV). It services DTMF callers as
      well as rotary dialers, and handles incoming calls for all
      corporate users: Centrex, PBX, and key systems. More information
      is available on the Nortel Multimedia Network Applications WWW
      page for Voice-Activated Premier Dialing.
    * Platform: These speech-based services operate on the Network
      Applications Vehicle (NAV) platform. NAV is a multi-application,
      digital signal processing platform supporting both speech- and
      display-based applications. The NAV platform provides the speech
      recognition capabilities and application logic used by these
      services. NAV features an open, modular hardware architecture and
      flexible software design. Other features include:
         + Scalable hardware - from 24 to over 2000 ports per NAV node;
           1 to 24 independent application shelves per node
         + Powerful speech processing - speaker-independent and
           speaker-trained speech processing support
         + Reliability - N+1, N+M, and 2N redundancy
         + Central Management - access via graphical user interface to
           remote connections
    * See Also: Nortel Feature Planning Guide, reference number
      50004.11; NAV Applications and Planning Guide, reference number
      50118.16.
      Nortel's Multimedia web pages:
      http://www.nortel.com/entprods/multimedia/
    * Contact: NORTEL
      Multimedia Communications Systems Division
      Multimedia Network Applications
      1000 Park Forty Plaza
      Durham, NC 27713 USA
      Ph: 1-800-4NORTEL
      WWW: http://www.nortel.com/entprods/multimedia/


___________________________________________________________________________

  Copyright (c) 1993-6 by Andrew Hunt, all rights reserved.
  This FAQ may be posted to any USENET newsgroup, on-line service, or BBS as
  long as it is posted in its entirety and includes this copyright statement.
  This FAQ may not be distributed for financial gain.
  This FAQ may not be included in any collections or compilations
  without express permission from the author.



---

Andrew Hunt
Speech Applications Group
Sun Microsystems Laboratories       Ph:  (978) 442-2681
2 Elizabeth Drive, MS UCHL03-207    Fax: (978) 250-5067
Chelmsford, MA 01824, USA           Email: [email protected]