Spoken Language Translator: First-Year Report

Karlgren, Jussi and Gambäck, Björn and Rayner, Manny and Samuelsson, Christer (1994) Spoken Language Translator: First-Year Report. [SICS Report]



This document is the first-year report for a project whose long-term goal is the construction of a practically useful system capable of translating continuous spoken language within a restricted domain. The main deliverable resulting from the first year is a prototype, the Spoken Language Translator (SLT), which can translate queries from spoken English to spoken Swedish in the domain of air travel planning. The system was developed by SRI International, the Swedish Institute of Computer Science, and Telia Research AB. Most of it is constructed from previously existing pieces of software, which have been adapted for use in the speech translation task with as few changes as possible. The main components are connected together in a pipelined sequence as follows. The input signal is processed by SRI's DECIPHER(TM), a speaker-independent continuous speech recognition system. It produces a set of speech hypotheses which is passed to the English-language processor, the SRI Core Language Engine (CLE), a general natural- language processing system. The CLE grammar associates each speech hypothesis with a set of possible logical-form-like representations, typically producing 5 to 50 logical forms per hypothesis. A preference component is then used to give each of them a numerical score reflecting its linguistic plausibility. When the preference component has made its choice, the highest-scoring logical form is passed to the transfer component, which uses a set of simple non-deterministic recursive pattern-matching rules to rewrite it into a set of possible corresponding Swedish representations. The preference component is now invoked again, to select the most plausible transferred logical form. The result is fed to a second copy of the CLE, which uses a Swedish- language grammar and lexicon developed at SICS to convert the form into a Swedish string and an associated syntax tree. Finally, the string and tree are passed to the Telia Prophon speech synthesizer, which utilizes polyphone synthesis to produce the spoken Swedish utterance. The system's current performance figures, measured on previously unseen test data, are as follows. For sentences of length 12 words and under, 65% of all utterances are such that the top-scoring speech hypothesis is an acceptable one. If the speech hypothesis is correct, then a translation is produced in 80% of the cases; and 90% of all translations produced are acceptable. Nearly all incorrect translations are incorrect due to their containing errors in grammar or naturalness of expression, with errors due to divergence in meaning between the source and target sentences accounting for less than 1% of all translations. Making fairly conservative extrapolations from the current SLT prototype, we believe that simply continuing the basic development strategy could within three to five years produce an enhanced version, which recognized about 90% of the short sentences (12 words or less) in a specific domain, and produced acceptable translations for about 95-97% of the sentences correctly recognized. Since the greater part of the system's knowledge would reside in domain-independent grammars and lexicons, it would be possible to port it to new domains with a fairly modest expenditure of effort.

Item Type:SICS Report
Additional Information:Also published as report CRC-043, SRI International
Uncontrolled Keywords:speech translation, unification grammars, preferences
ID Code:2125
Deposited By:Vicki Carleson
Deposited On:22 Oct 2007
Last Modified:17 Jan 2013 16:14

Repository Staff Only: item control page