Don't break your head, Matxin!
2016/09/01 Iñurrieta Urmeneta, Uxoa - Researcher in the IXA group at EHU | Aduriz Agirre, Itziar - Researcher in the IXA group at EHU | Díaz de Ilarraza Sánchez, Arantza - Researcher in the IXA group at EHU | Labaka Intxauspe, Gorka - Researcher in the IXA group at EHU | Sarasola Gabiola, Kepa - Researcher in the IXA group at EHU Source: Elhuyar magazine
For those of us who live in bilingual societies, mistakes that come from thinking in one language while speaking another are very common. Many of us blushed at some point as children when our parents laughed at us for answering someone's thanks with a word-for-word copy of the other language's reply. Likewise, if someone asked a child who had fallen in the street whether he "has done harm", most older Basque speakers would not be very surprised: the phrase sounds odd, but we would immediately assume that the speaker is probably a new Basque speaker (euskaldun berri). Those who speak more than one language know from experience that what they have learned in one is not always useful in the other: the Basque ez horregatik (literally "not for that") is not "not for that" in Spanish but de nada ("for nothing"), and the Spanish expression for getting hurt, literally "to do oneself harm", is not "to do harm" in Basque but min hartu (literally "to take pain").
In these cases, teachers, parents or friends correct the mistakes so that we learn the right form for next time. Here, too, we will talk about students and teachers, but not ordinary ones. The student is over eleven years old, is called Matxin and is not made of flesh and bone: he is a machine translation system. He uses a large set of rules to learn languages and to translate into Basque what he reads in Spanish, but he is often given sentences that fall outside those rules, and the teacher's job is to help him not to break his head over them and not to get things wrong so often.
Today, Matxin translates from Spanish to Basque. It is based on a set of grammar rules and two bilingual dictionaries, from which it draws the information it needs to move from one language to the other. It works in three phases: analysis, transfer and generation (Mayor et al., 2009).
As can be seen in Figure 1, it first analyzes the text in Spanish, or in English, morphologically and syntactically: the lemma of each word (for example, vi → ver), its category (verb, noun, adverb...), its syntactic function (subject, direct object, modifier...) and other features. Then, in the transfer phase, it finds equivalents for the words of the source sentence and adapts the grammatical information. Finally, in the generation phase, it builds the Basque text from the information obtained in transfer: it produces the inflected form that corresponds to each lemma and adjusts the word order.
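The three phases can be illustrated with a deliberately tiny sketch. It is not Matxin's implementation: the lookup tables, the `Token` class and the function names are all hypothetical, and real analysis, transfer and generation rely on full morphological and syntactic processing rather than hand-written dictionaries.

```python
from dataclasses import dataclass

@dataclass
class Token:
    form: str      # surface form found in the source text
    lemma: str     # dictionary form
    category: str  # "verb", "noun", "det", ...

# Hypothetical toy tables, for illustration only.
ANALYSIS = {                      # Spanish form -> (lemma, category)
    "vi": ("ver", "verb"),
    "una": ("uno", "det"),
    "casa": ("casa", "noun"),
}
BILINGUAL = {"ver": "ikusi", "uno": "bat", "casa": "etxe"}  # Spanish lemma -> Basque lemma
INFLECTED = {"ikusi": "ikusi nuen"}                         # Basque lemma -> inflected form

def analyze(text):
    """Analysis: attach a lemma and a category to every source word."""
    return [Token(w, *ANALYSIS[w]) for w in text.lower().split()]

def transfer(tokens):
    """Transfer: replace each source lemma with its target-language equivalent."""
    return [Token(t.form, BILINGUAL[t.lemma], t.category) for t in tokens]

def generate(tokens):
    """Generation: reorder the words (noun, determiner, verb) and inflect them."""
    rank = {"noun": 0, "det": 1, "verb": 2}
    ordered = sorted(tokens, key=lambda t: rank[t.category])
    return " ".join(INFLECTED.get(t.lemma, t.lemma) for t in ordered)

def translate(text):
    return generate(transfer(analyze(text)))

print(translate("vi una casa"))  # etxe bat ikusi nuen
```

Because every step is a general, word-by-word lookup, a pipeline of this shape has no way of knowing that some word combinations need to be handled as a unit.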
However, not every sentence can be translated correctly with these general rules and dictionaries, and that gives Matxin trouble. See, for instance, the kind of sentence it produces:
(1) EN: Eragin handia izan zuen.
EU (Matxin): It provided a great effect.
EU (correct): It had great influence.
Phraseological units (UF), outside Matxin's general grammar rules
In fact, some word combinations fall outside the general rules of languages, and phraseological units are among them (Gurrutxaga, 2016). Corpas (1997) and Urizar (2011), among others, have classified them into three groups:
1. Phraseological utterances: they can be used on their own, without being embedded in a sentence, and only in certain circumstances. Eskerrik asko ("thank you very much") and ez horregatik ("you're welcome", literally "not for that") belong to this group.
2. Locutions: they are not complete sentences, and the meaning of the combination is not the sum of the meanings of its words. The meaning of esku hartu ("to intervene"), for example, does not correspond to what esku ("hand") and hartu ("to take") normally mean.
3. Collocations: these are not whole sentences either, but at least one of the words keeps its usual meaning and, to express a particular meaning, it is habitually combined with one specific word and not with other equivalent ones. For example, in Basque we "put" attention; we do not "locate" it or anything similar.
In addition, UFs vary widely from language to language (Sanz, 2015), so they are often not easy to translate. Take as an example the ones mentioned in the classification:
• As mentioned earlier, for ez horregatik Spanish uses de nada ("for nothing") and not a literal "not for that".
• Esku hartu is tomar parte or participar in Spanish (hands are not mentioned anywhere).
• For the Basque collocation with "attention", Spanish normally uses prestar atención (literally "to lend attention").
Moreover, if we take this last example into French and English, we will see that the verbs match neither the Basque one nor the Spanish one: French says faire attention and English says pay attention.
So if learning these units is laborious for humans, imagine how hard it is for a computer to translate them automatically, given that the dictionaries it relies on are limited and its grammar rules are very general.
The dictionary used by Matxin contains a number of multi-word entries, and these sometimes lead to correct translations:
(2) EN: Ikasle batzuk irakasle pilota egin zuten.
EU (Matxin): Some students buttered up the teacher.
(3) EN: 13:00 pm
EU (Matxin): I just made the stone.
Unfortunately, there are not many such entries, and they are not always applied correctly. Let's see, for example, what happens if we slightly change the wording of examples 2 and 3:
(4) EN: Irakasle ezin zuen ikasle egiten ari ziren pilota.
EU (Matxin): The teacher could not believe the ball the students were doing.
EU (correct): The teacher could not believe how the students were buttering him up.
(5) EN: I just scrubbed the floor.
EU (Matxin): I just made stone flooring.
EU (correct): I just cleaned the floor.
Matxin has two main difficulties: on the one hand, recognizing UFs in Spanish or English, and on the other, translating them into Basque. At present, only combinations whose words always appear next to each other and in the same order are detected correctly, so if we separate the words of the combination or change their order, they are treated as independent words and not as part of a UF (example 4). In addition, information is often missing when the Basque sentence is generated, because each entry in Matxin's dictionary is assigned a single equivalent. That is why the sentence in example 5 is translated wrongly: the system does not take into account that when the verb "to scrub" is accompanied by the noun phrase "the floor", its equivalent is "to clean" and not the one used in example 3.
Konbitzul, Matxin's new teacher
As the examples so far show, Matxin needs help if it is to translate UFs correctly, and Konbitzul has been created for that purpose: a tool that will teach it to translate noun+verb combinations.
Konbitzul is a public database that gathers information obtained from a linguistic analysis. It contains data on the characteristics of noun+verb combinations and on their equivalents, so far for the Spanish-Basque language pair. The noun+verb combinations in this study were collected from three sources: the Elhuyar bilingual dictionary, very large collections of human translations, and DiCE, a dictionary of Spanish collocations (Alonso, 2004).
Most of the information that has been compiled is available on the Internet, and the rest will be made available to users shortly. The database has a search interface: any user can type what they are looking for and get a list of the combinations that match the query, together with their equivalents. Clicking on an equivalent then shows further linguistic information (Figure 3).
However, as mentioned above, Konbitzul's main task is to help Matxin with two challenges: on the one hand, recognizing UFs in the source language, and on the other, translating them into Basque. Suppose it is given the following sentences:
(6) The subject aroused interest in listeners.
(7) The subject aroused great interest in listeners.
The UF that appears in these three examples, "to arouse interest", is not yet in Matxin's dictionary, so at present the system does not treat this word combination as a UF (Figure 4). Even if it were in the dictionary, however, the method used so far would only recognize it in example 6, where the two words appear in the same order and with nothing between them.
With the help of Konbitzul, by contrast, Matxin will know that "to arouse interest" is a UF and also that it is a flexible combination, that is:
• Other words may appear between the two components of the combination.
• The word order is variable.
Thus, when the source-language sentence is analyzed, all this information will be taken into account, and Matxin will be able to tell that examples 7 and 8 also contain a UF. An experiment shows that, thanks to the information in the database, almost 30% more UFs are detected than with the previous method.
Once the combinations have been detected, they must be rendered in Basque, and for that Matxin also needs additional information. Once again, Konbitzul will resolve its doubts:
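The difference between the two detection strategies can be sketched as follows. This is a minimal illustration under stated assumptions, not the method used in Matxin or Konbitzul: it works on already lemmatized sentences, uses English lemmas as stand-ins for the Spanish ones, and the `max_gap` limit and both function names are hypothetical. The third sentence is an invented reordering, included only to show variable word order.

```python
def detect_strict(lemmas, combo):
    """Match only when the components are adjacent and in the listed order."""
    n = len(combo)
    return any(tuple(lemmas[i:i + n]) == tuple(combo)
               for i in range(len(lemmas) - n + 1))

def detect_flexible(lemmas, combo, max_gap=3):
    """Match the two components in either order, allowing up to
    max_gap other words between them (max_gap is an assumed limit)."""
    first, second = combo
    pos_first = [i for i, lemma in enumerate(lemmas) if lemma == first]
    pos_second = [i for i, lemma in enumerate(lemmas) if lemma == second]
    return any(0 < abs(i - j) <= max_gap + 1
               for i in pos_first for j in pos_second)

COMBO = ("arouse", "interest")

adjacent = ["the", "subject", "arouse", "interest", "in", "listener"]
separated = ["the", "subject", "arouse", "great", "interest", "in", "listener"]
reordered = ["interest", "be", "arouse", "by", "the", "subject"]  # invented sentence

for sentence in (adjacent, separated, reordered):
    print(detect_strict(sentence, COMBO), detect_flexible(sentence, COMBO))
```

Only the adjacent sentence is found by the strict matcher, while the flexible one finds all three. Matching on lemmas rather than surface forms is what lets inflected variants of the verb count as the same combination.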
• For the verb "to awaken", use the Basque verb meaning "to ignite" (and not the one meaning "to awaken").
• For the noun "interest", use its Basque equivalent and put it in the definite form.
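One way to picture this kind of combination-specific information is as a lookup keyed on the verb and the noun together, with the general dictionary as a fallback. The sketch below is an assumption about how such data could be organized, not Konbitzul's actual schema: the table names, the field names and the English glosses standing in for Basque words are all hypothetical.

```python
# General dictionary (hypothetical): one equivalent per verb, written as glosses.
DEFAULT_VERB = {"awaken": "awaken"}

# Combination-specific information (hypothetical), keyed on (verb, noun).
COMBINATIONS = {
    ("awaken", "interest"): {"verb": "ignite", "noun_form": "definite"},
}

def transfer_pair(verb, noun):
    """Return the target verb and noun form for a verb+noun pair,
    preferring combination-specific data over the general dictionary."""
    info = COMBINATIONS.get((verb, noun))
    if info is not None:
        return dict(info)
    return {"verb": DEFAULT_VERB.get(verb, verb), "noun_form": "unspecified"}

print(transfer_pair("awaken", "interest"))  # {'verb': 'ignite', 'noun_form': 'definite'}
print(transfer_pair("awaken", "child"))     # {'verb': 'awaken', 'noun_form': 'unspecified'}
```

Keying on the pair rather than on the verb alone is what allows the same verb to receive different equivalents depending on the noun it accompanies.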
Thus, instead of a word-for-word rendering of "it aroused great interest", built on the verb meaning "to awaken", Matxin will be able to produce the idiomatic Basque phrase, built on the verb meaning "to ignite" with the noun in the definite form. The information for this second task has not yet been integrated into the system, but the linguistic analysis has been completed, so it is reasonable to expect that the results will soon be visible online.
Filling the sack to satisfy curiosity
The work does not end there, of course, because Matxin is a very curious student. The next step will be to collect the information needed to translate UFs from English, and from then on Konbitzul will have to keep gathering data to fill the sack little by little, so that the student becomes a better translator as he grows.