Thursday, April 26, 2012
PHONETICS AND TECHNOLOGY NOWADAYS
Recent advances in computer technology now make it feasible to commercialize products
that perform man-machine communication by voice in real time. This has prompted many
companies to invest in speech technology, creating many jobs during the last few years.
Academic research has also benefited from this growth because companies are
conducting joint projects with universities. Many of these projects are funded by the
European Commission.
Nowadays, there are software packages for personal computers that can perform limited
Automatic Speech Recognition (from here on abbreviated to ASR). After the system has
adapted to the user’s voice, it is able to recognize words separated by pauses with error
rates below 5%. Likewise, there are software-only Text-To-Speech (from here on
abbreviated to TTS) systems that can generate intelligible speech. Modern
microprocessors are powerful enough to perform both TTS and limited ASR in real
time, without the need for additional hardware. Despite these many
accomplishments, current systems still have many limitations. Although the
intelligibility of the best TTS systems is high enough to be useful in certain applications,
speech quality is still low enough that the technology will not be ubiquitous until a major
breakthrough appears. The limitations of ASR systems are even greater: the word error
rate for continuous speech is still too high to be useful except for some special
applications. Even the best systems are too sensitive to new words and
moderately noisy environments. The technology is still in its infancy and the challenges
are large indeed, but momentum is clearly growing and commercially viable spoken
language interfaces will emerge before the year 2000.
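The error rates quoted above are usually reported as a word error rate: the number of word substitutions, deletions and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that calculation, assuming plain whitespace tokenization; the function name word_error_rate and the sample sentences are illustrative and do not come from any particular ASR product.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (one fewer than the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One of five reference words is missing from the recognizer output.
    print(word_error_rate("open the pod bay doors", "open the bay doors"))
```

Under this measure, an error rate below 5% means fewer than one word in twenty needs correcting. Because insertions are counted, the value can exceed 1.0 when the output is much longer than the reference.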
Solving the ultimate problem in speech technology, the development of a
conversational computer, is an extremely difficult task that has eluded researchers for the
last 30 years. While a great deal of progress has been achieved, it could easily be another
30 years until we have a machine that can pass the so-called Turing test (under this test, a
blindfolded human cannot distinguish whether he or she is talking to another human or
to a computer). This means that while both industry and academia are creating many job
opportunities today, they will likely create many more in the years to come. A market
research study conducted in 1992 (Meisel, 1992) forecast that worldwide revenues from
speech technology products in 1995 would approach $2.5 billion, reaching $26 billion in
the year 2000. A total of 137 organizations were listed in this study as suppliers of speech
technology products in 1992, 22 of those being European.
The existence of many different languages in Europe makes it difficult for a speech
product to easily reach a broad coverage. Unlike other computer products such as word
processors, spreadsheets and databases, which are relatively easy to translate from one
language to another, localization of speech technology products is a very labor-intensive
process. This barrier will inevitably slow down the introduction of speech products in
some countries with smaller markets. On the other hand, it also means that a number of
specific jobs will be created to produce a version of the product for each language.
At the same time, advances in speech technology are reducing the
differences between speech systems for different languages by defining more general
frameworks under which more components can be shared. The possibility of helping to
change the way we communicate with machines is a very exciting proposition. Building a
system like HAL (the human-like computer in “2001: A Space Odyssey”) promises to be a
very challenging task, and the road to these systems will be filled with excitement.