Thursday, 26 April 2012

PHONETICS AND TECHNOLOGY NOWADAYS

Recent advances in computer technology now make it feasible to commercialize products that perform man-machine communication by voice in real time. This has spurred many companies to invest in speech technology, creating many jobs over the last few years. Academic research has also benefited from this growth, because companies are conducting joint projects with universities. Many of these projects are funded by the European Commission.

Nowadays, there are software packages for personal computers that can perform limited Automatic Speech Recognition (hereafter ASR). Once the system has adapted to the user's voice, it can recognize words separated by pauses with error rates below 5%. Likewise, there are software-only Text-To-Speech (hereafter TTS) systems that can generate intelligible speech. Modern microprocessors are powerful enough to perform both TTS and limited ASR in real time, without the need for additional hardware.

While acknowledging these accomplishments, we also have to accept the many limitations of current systems. The intelligibility of the best TTS systems is high enough to be useful in certain applications, but speech quality is still so low that the technology will not become ubiquitous until a major breakthrough appears. The limitations of ASR systems are even greater: the word error rate for continuous speech is still too high to be useful except in some special applications. Even the best systems are too sensitive to new words and moderately noisy environments. The technology is still in its infancy and the challenges are large indeed, but momentum is clearly growing and commercially viable spoken language interfaces will emerge before the year 2000.
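Since the word error rate is used above as the yardstick for ASR, a small illustration may help. The sketch below assumes the common definition, which the text itself does not spell out: the number of word substitutions, deletions and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference. The function name `word_error_rate` and the plain whitespace tokenization are choices made for this example only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumes whitespace tokenization and exact string matching of words,
    which is a simplification of what real evaluations do.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # reference word missing from output (deletion)
                curr[j - 1] + 1,     # extra word in output (insertion)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Under this definition, one misrecognized word in a twenty-word reference corresponds to the 5% figure quoted above. Note that the rate can exceed 100% when the recognizer inserts many spurious words.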

A solution to the ultimate problem in speech technology, the development of a conversational computer, is an extremely difficult task that has eluded researchers for the last 30 years. While a great deal of progress has been achieved, it could easily be another 30 years until we have a machine that can pass the so-called Turing test (under this test, a blindfolded human cannot tell whether he or she is talking to another human or to a computer). This means that while both industry and academia are creating many job opportunities today, they will likely create many more in the years to come. A market research study conducted in 1992 (Meisel, 1992) forecast that worldwide revenues from speech technology products would approach $2.5 billion in 1995 and reach $26 billion in the year 2000. A total of 137 organizations were listed in this study as suppliers of speech technology products in 1992, 22 of them European.

The existence of many different languages in Europe makes it difficult for a speech product to reach broad coverage. Unlike other computer products such as word processors, spreadsheets and databases, which are relatively easy to translate from one language to another, speech technology products are very labor-intensive to localize. This barrier will inevitably slow down the introduction of speech products in some countries with smaller markets. On the other hand, it also implies that a number of specific jobs will be created to produce a version of the product for each language. It is also important to note that advances in speech technology are reducing the differences between speech systems for different languages by defining more general frameworks under which more components can be shared. The possibility of helping to change the way we communicate with machines is a very exciting proposition. Building a system like HAL (the human-like computer in "2001: A Space Odyssey") promises to be a very challenging task, and the road to such systems will be filled with excitement.
