Andy Way: "The biggest challenge of machine translation is quality"

The English have a reputation for being rather closed-minded about languages: they expect to go anywhere in the world and communicate in English. Andy Way does not belong to that group. He finds it entirely right that everyone should speak their own language. He greeted us in Basque with a "good morning" and a "good day", but the rest of the conversation necessarily took place in English. Naturally, we could not avoid joking about it: an English-Basque machine translation system would have made things much easier. Then we got to the heart of the conversation.

Expert in machine translation
01/07/2007 | Rementeria Argote, Nagore | Elhuyar Zientzia Komunikazioa
Andy Way is a professor at the University of Dublin and a leading figure in machine translation. (Photo: N. Rementeria)
Let's go back to the origins of machine translation. Presumably it was created so that people who speak different languages could communicate.

It is true. And to a large extent machine translation was also developed for political reasons. At one time the Americans wanted to know what the Russians were saying, for example. And today, the United States is investing heavily in developing machine translation of Arabic. In this sense it is a security issue.

Another level is communication between two people. For example, communication between us would be much simpler, with me in Dublin and you here in Euskal Herria, if you wrote to me in Basque and I received your message in English through a machine translation system. I would answer in English and you would receive my reply in Basque.

And it does not matter if the English is not correct; I will understand it. That is the key: for an English speaker who does not know Basque, bad English is better than perfect Basque, and conversely, for a Basque speaker who does not know English, bad Basque is better than perfect English. At this level the most important thing is to get the main message across.

So the area where machine translation can grow is the individual level: enabling two people who do not share a language to communicate with each other.

The internet has been a great opportunity for communication in general. Has it been one for machine translation too?

No doubt. On the internet there are free systems like Babel Fish. They are not very sophisticated, but thanks to them people can communicate in their own language. As we have said, they are free of charge and there is no need to register anywhere. They are used millions of times every day, even though the quality is not very good. So if the quality of these systems improves, they will be used far more.

At the other end of the quality scale is the European Union. I have heard that it uses machine translation very effectively.

In Europe there has been a big change. In the 1980s the European Union had only nine official languages, and now it has more than twenty. In the eyes of the Union all these languages are equal. Documents therefore have to be translated into all of them, which makes about 400 language pairs.
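The figure of about 400 pairs can be checked with simple arithmetic. The sketch below assumes that each translation direction counts as a separate pair, so n languages give n × (n − 1) pairs; the function name is illustrative and not taken from the interview.

```python
def directed_pairs(n_languages: int) -> int:
    """Number of source-to-target pairs when every language
    must be translated into every other language."""
    return n_languages * (n_languages - 1)


# Nine languages (the 1980s) versus "more than twenty" today.
for n in (9, 20, 21):
    print(n, "languages:", directed_pairs(n), "pairs")
```

Under that assumption, nine languages give 72 pairs, while 20 or 21 languages give 380 or 420, which is in line with the "about 400" mentioned above.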

(Photo: N. Rementeria)

Take Latvian and Greek. How many translators are able to translate between those two languages? Not many. So there are not enough translators, and they have a great many texts to translate. Machine translation is very useful for these translators.

The European Union has a machine translation system called SYSTRAN, specially adapted for internal use. It is not the same SYSTRAN system that is sold in stores; it was designed and adapted for in-house use, and it is used to produce a first draft of the translation. Translators then have to edit and correct that draft, because a document that is going to be sent to clients or to the public must be error-free.

In the end machines are being used, and that has its positive side: they can work 24 hours a day, and translators cannot. The advantage of these tools is that they work much faster, although with lower quality.

So people should not see machine translation as something that comes to replace the work of the human translator; it is a tool like any other, such as the phone, the toaster or the car. It is a tool that helps us.

There are research groups around the world working to improve machine translation. You presumably work with other groups, don't you?

Yes. Among others, we are in contact with the University of the Basque Country. They work with the English-Basque and Spanish-Basque pairs. And we are gradually adding more language pairs. So we work with groups that deal with different languages, such as Arabic, Chinese, Italian, French, German, Spanish, and now Basque as well.

We also have a student working on translation between English and Irish Sign Language for specific settings such as airports. At airports, not all information is shown in writing on the screens. Final boarding calls and similar announcements are made only over the loudspeakers, and deaf people cannot hear them. That is the type of setting we are working on.

If the domain of application is restricted, translation becomes much easier, whatever the language pair. If we limit ourselves to the airport domain, most of the problems that arise in more general translation are avoided.

When it comes to languages, minority languages have greater problems. What is the difference between languages with many speakers and those with few?

Most researchers currently working in machine translation are engaged in corpus-based machine translation (a corpus being a collection of texts and documents). So a corpus is needed, and above all a parallel corpus, that is, one in which each sentence is matched with the corresponding sentence in the other language. For some languages with many speakers there are large parallel corpora. For English and French, for example, there are the proceedings of the Canadian Parliament, and for English and Chinese, those of the Hong Kong Parliament.
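To make the idea of a parallel corpus concrete, here is a minimal sketch in Python. The three English-French sentence pairs are invented for illustration, and the raw co-occurrence counting is a deliberate simplification of the statistical models that real corpus-based systems use; the function names are hypothetical.

```python
from collections import Counter

# Toy parallel corpus: each entry pairs an English sentence with its
# French translation. The sentences are invented for this example.
parallel_corpus = [
    ("the house", "la maison"),
    ("the car", "la voiture"),
    ("a house", "une maison"),
]


def cooccurrence_counts(corpus):
    """Count how often each source word appears in a sentence pair
    together with each target word."""
    counts = Counter()
    for source, target in corpus:
        for s in source.split():
            for t in target.split():
                counts[(s, t)] += 1
    return counts


def best_candidate(word, counts):
    """Return the target word that co-occurs most often with `word`,
    or None if the word never appears in the corpus."""
    candidates = {t: c for (s, t), c in counts.items() if s == word}
    return max(candidates, key=candidates.get) if candidates else None


counts = cooccurrence_counts(parallel_corpus)
print(best_candidate("house", counts))
```

With only three sentence pairs the counts are already enough to link "house" to "maison", but a word seen only once, such as "car", remains ambiguous. That is why large parallel corpora matter, and why their absence is such an obstacle.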

(Photo caption: Way works with the IXA group of the Faculty of Informatics of the UPV, Eleka and Elhuyar Fundazioa, among others. Pictured, Way with Professor Kepa Sarasola of the UPV. Photo: N. Rementeria)
And for the English-Basque pair? Where do you get the corpus?

The machine translation techniques currently in use are valid in principle for any language pair, but in general there are no parallel corpora for minority languages. For languages with many speakers, such as Spanish, English or French, we have much more text and many more translations. That is the biggest problem for minority languages. To translate between Gaelic and Basque, for example, we have no parallel texts. And that is a great difficulty.

So what is the biggest challenge of machine translation today?

The greatest challenge is undoubtedly quality, since in general it is not yet very good. And, of course, as we have just mentioned, for certain language pairs a corpus-based approach is not possible, because there is no corpus. So one of the problems lies in the very first step: getting access to existing resources.

I think another of the big challenges for machine translation is reaching people's homes. At the university we usually try to solve very difficult problems, but there are relatively simple solutions that can help in people's everyday lives, such as the sign language work.

And ordinary people clearly see that machine translation is necessary and very useful. Other problems in computational linguistics are very hard to grasp. But anyone knows what translation is and what a computer is, and that machine translation makes communication easier and is therefore necessary.

Seeing that the conversation was about to end, Andy Way wanted to add something: "At the beginning we talked about communication. I think speech translation will arrive soon. Within a few years, you will ask me questions in your language, in Basque, and I will hear them in English. For that we will use speech recognition, and you will hear me in Basque through your computer. That will come soon, and it will be a breakthrough, since speech is much more natural than written communication."
