}

MultiMeteo also knows Basque

2001/11/01 Díaz de Ilarraza, Arantza | Sarasola, Kepa | Mayor, Aingeru | Loinaz, Miel | Chevreau, Karine | Coch, José. Source: Elhuyar aldizkaria

Weather has a great influence on our daily lives. People have always looked to the sky, trying to work out whether it will bring rain, storms, sunshine or something else entirely. Technical advances have made it possible to achieve very high reliability in predictions for the next 48 hours. As a result, our society in general now waits and thirsts for these forecasts; after all, are the weather forecasts not among the key moments on television and radio? This thirst has created an ideal situation for researching and marketing systems that write forecasts, and for devising automatic tools that disseminate these kinds of texts in several languages.

The work of a human translator will undoubtedly be better and richer, but today it is possible to create documents automatically in a specific, technical field such as meteorology. In this article we present MultiMeteo, an interactive system for multilingual text generation in the field of meteorology, together with the adaptation we have made so that it generates Basque. The system offers daily weather forecasts at the following web address: http://www.inm.es/wwi/Multimeteo/Multimeteo.html

Background

Image received by the satellite Meteosat.

Although it does not use automatic text generation, one system that automatically translates weather forecasts must be mentioned here. The METEO system, created by the TAUM group in Montreal, has been the most successful machine translation system of all time. It was hard to find translators for such tedious translations, which looked much the same day after day, so Canada's official weather service began to investigate automatic approaches. The resulting METEO system has been translating meteorological bulletins from English into French since 1977, and 80% of its translations are used directly. However, the success achieved in meteorology has not spread: although the system has been adapted to other subjects, no results of equal quality have been obtained. It seems that the field of weather forecasts is especially well suited to this type of automatic process.

The Forecast Generator (FoG) work environment was also launched in Canada in 1993. In this system, the meteorologist uses a graphical editor to adapt the map showing the weather data and subsequently the system automatically generates the weather forecast in English and French for the region.

History of the MultiMeteo system

Table 1. Meteorological prediction data matrix (Barcelona).

In 1995 the French Meteorological Service (Meteo France) promoted the MultiMeteo project for publishing weather forecasts in several languages. It brought together the National Meteorological Institute (INM) of Spain, the Royal Meteorological Institute (RMI) of Belgium, the Zentralanstalt für Meteorologie und Geodynamik (ZAMG) of Austria and two companies specialized in language generation: Lexiquest, based in Paris, and CL Language Services in Madrid. The German Meteorological Service (DWD) also joined initially, but subsequently withdrew.

These partners presented the project called “Multilingual Production of Weather Forecasts” and obtained European Community funding. The system was developed in four languages: French, English, Spanish and German. The results of the evaluation carried out in February 1999 were very positive.

In 2000 INM and Lexiquest reached an agreement to extend the system to four more languages: Dutch, Catalan, Galician and Basque. The Ixa Group of the Faculty of Computer Science of San Sebastian and the UZEI terminology center have been in charge of the extension to Basque, and at this moment we are about to finish the development phase of the project.

Usual procedure for creating weather predictions

Meteorological data are collected from two sources: the surface and space. Surface data are taken at meteorological observatories, where the physical variables describing the state of the atmosphere are measured and recorded continuously. The data from space come from meteorological satellites: the geostationary METEOSAT satellites and the polar satellites of the TIROS-NOAA series, which send information without interruption.

All the numerical data obtained are processed by complex mathematical models. Automatic processes simulate how the physical variables will evolve over the coming days and generate the data matrices for the meteorological predictions. The meteorologist then has the opportunity to retouch these matrices, that is, to complete and refine the forecast on the basis of experience. As a result, as shown in Table 1, the matrices contain temperature (Te), wind direction (DD) and wind force (FF), cloud cover, rain and so on for different hours (periods of 3 hours in the case of the INM system). One matrix of this type is obtained for each point of the map.
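As a rough illustration of the data matrix described above, the following Python sketch stores one record per 3-hour period for a single map point. Only the field names Te, DD and FF come from the article; the class and function names, the units and the sample values are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ForecastCell:
    """One 3-hour period for one map point (hypothetical structure)."""
    hour: int   # start hour of the 3-hour period
    te: float   # Te: temperature (unit assumed: degrees Celsius)
    dd: str     # DD: wind direction
    ff: int     # FF: wind force (scale assumed)


def build_matrix(values):
    """Build the list of 3-hour cells for one map point, starting at 00:00."""
    return [ForecastCell(hour=3 * i, te=te, dd=dd, ff=ff)
            for i, (te, dd, ff) in enumerate(values)]


# Invented sample values produced by a numerical model for one point.
matrix = build_matrix([(12.0, "N", 2), (11.5, "N", 2), (14.0, "NW", 3)])

# The meteorologist can then retouch any cell before text is generated.
matrix[2].ff = 4
```

Keeping one such list per map point mirrors the statement that a matrix is obtained for each point of the map.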

With these data, meteorologists write the weather forecasts by hand. This work is long and expensive, especially when several versions of a single prediction have to be produced in different languages or styles (general forecasts, beaches, sea, mountain, by community, by province...).

This is where MultiMeteo becomes useful. The aim is not to replace the work of meteorologists but to support their tasks interactively, so that predictions can be disseminated in different languages and styles. In addition, it makes it possible to produce predictions for different places on the map.

A support tool: interactive multilingual creation

Figure 1. Bulletin created in Basque by MultiMeteo.

This technique first generates a draft automatically from input data that may be incomplete. Although the system can produce text in several languages, the draft is shown to the meteorologist, who acts as reviewer, only in his or her native language. To correct a fragment of text, the meteorologist clicks on the part to be modified. A pop-up menu then offers a number of options and alternative wordings, and choosing one of them makes the correction easily. Taking the changes into account, the system then generates the forecast texts in all the languages.

The advantages of this technique are its speed (producing each text in each language takes about 2 seconds, whereas a human translator needs about 10 minutes); the ability to generate text even when some data have not yet been collected; the high quality of the texts produced (sometimes with human touches); the ease of maintenance and adaptation; and, finally, acceptance by human users (meteorologists do not have to write in foreign languages).

Automatic bulletin creation

Figure 2. System architecture.

MultiMeteo generates text in two ways:

  • For the title of each paragraph, a fixed text with the name of the province is used, and for the header of the bulletins (see figure 1) a template with several internal variables is used, for example:

Weather forecast *IS *CO. *MO *FD.
Local time: *FP.
Forecast validity: *TT.

where:

  • IS can be "by provinces", "by islands" or empty.
  • CO is the name of the community (for example, "for the Autonomous Community of Galicia").
  • MO is the month ("June").
  • FD is the date, expressed in figures.
  • FP indicates the time.
  • TT is the forecast period (e.g., “today from 06:00 to 12:00 midnight”).
  • A much more complex method is used to write the body of the paragraphs. The following sections explain the architecture and the modules needed for automatic generation at this level.
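The header template described above can be sketched in a few lines of Python. This is only an illustration of template filling, not the MultiMeteo implementation: the function name, the wording of the third template line and the sample values for FD and FP are assumptions.

```python
import re

# Template with the internal variables named in the article (*IS, *CO, ...).
HEADER_TEMPLATE = (
    "Weather forecast *IS *CO. *MO *FD.\n"
    "Local time: *FP.\n"
    "Forecast validity: *TT."
)


def fill_header(template: str, values: dict) -> str:
    """Replace each *XX variable with its value; a missing variable raises KeyError."""
    filled = re.sub(r"\*([A-Z]{2})", lambda m: values[m.group(1)], template)
    # Collapse the double space left behind when a variable such as IS is empty.
    return re.sub(r" {2,}", " ", filled)


header = fill_header(HEADER_TEMPLATE, {
    "IS": "by provinces",
    "CO": "for the Autonomous Community of Galicia",
    "MO": "June",
    "FD": "15",        # invented sample date
    "FP": "10:00",     # invented sample time
    "TT": "today from 06:00 to 12:00 midnight",
})
print(header)
```

A fixed template is enough here because the header has no linguistic variation; the body of each paragraph needs the planning and execution modules described next.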

General system architecture

The generation engine used by the system was developed in 1994, for French, for the automatic generation of commercial letters. In 1995 it was extended to English when it was integrated into a prototype for translating technical manuals. That same year it was also brought into the “Multilingual Production of Weather Forecasts” project, where new languages and new functions (interactive generation and the management of stylistic knowledge) were added for producing meteorological bulletins.

The system architecture can be seen in figure 2. The first phase consists of extracting the data from the meteorological database and reformatting them so that the generation modules can use them. The work of the generation module is then divided into two parts: planning and execution.

Planning module

Planning uses knowledge bases of concepts and styles and is divided into two phases:

  • General planning: the bulletin is organized into several paragraphs (header, one paragraph for each province, etc.).
  • Weather planning: the content of each paragraph is determined from the input data. The events that must appear in the paragraph and the relations between them are collected in a list using an interlingua, so that the description is independent of any language. The subsequent modules are run separately for each language.

An event is a conceptual object associated with a meteorological situation or with the evolution of a situation. Events are of two types: atomic and molecular.

An atomic event represents a meteorological parameter without evolution, with a single associated value (the Value attribute). For example, the atomic event representing an overcast sky is:

Event_CloudCovering4: Event{ Value= Class CloudCovering_code4;
                             Time_Representation= Time<unk>
                             Mod{};}

Class CloudCovering_code4 is a set of simple concepts: Overcast, NoSun and VeryCloudy-Overcast. Each of these concepts is associated with a term in each language.

A molecular event involves more than one parameter. For example, for wind we may have data on force, direction and evolution. Molecular events can carry several values (the attributes Value0, Value1, etc.), as well as an operator (the Operator attribute) that specifies how these values are combined. For example, the molecular event describing a clear sky becoming overcast is:

Cloudier_Min0: Event_mol{ Value0= Event_CloudCovering0;
                          Value1= Event_CloudCovering4;
                          Operator= Class <unk> Cloudier_Min0;
                          Time_Representation= Time<unk>
                          Mod{};}

This molecular event is made up of two atomic events and an operator. The Time_Representation attribute situates the event in time (present, past or future) and indicates the period (day, morning, afternoon, night...).

At the output of the planning module, one concept has been selected for each atomic event and for each class in the Operator attribute of the molecular events. In addition, other attributes can be added (automatically or in interaction with the meteorologist): probability index, phase, period...
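The atomic and molecular events produced by planning can be pictured with the following Python sketch. It is a loose model of the structures shown above, not the system's own formalism: the class and function names are hypothetical, and the identifier CloudCovering_code0 is assumed by analogy with CloudCovering_code4.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AtomicEvent:
    """A parameter without evolution: a single value class."""
    value_class: str                        # e.g. "CloudCovering_code4"
    time_representation: Optional[str] = None
    mods: List[str] = field(default_factory=list)


@dataclass
class MolecularEvent:
    """Several atomic events combined by an operator."""
    values: List[AtomicEvent]               # Value0, Value1, ...
    operator_class: str                     # e.g. "Cloudier_Min0"
    time_representation: Optional[str] = None
    mods: List[str] = field(default_factory=list)


# Each class groups the simple concepts it can be expressed with.
CONCEPT_CLASSES = {
    "CloudCovering_code4": ["Overcast", "NoSun", "VeryCloudy-Overcast"],
}


def select_concept(event: AtomicEvent, choice: int = 0) -> str:
    """Pick one simple concept for an atomic event (the first by default)."""
    return CONCEPT_CLASSES[event.value_class][choice]


clear = AtomicEvent("CloudCovering_code0")      # assumed identifier
overcast = AtomicEvent("CloudCovering_code4")
cloudier = MolecularEvent(values=[clear, overcast],
                          operator_class="Cloudier_Min0")
```

Because these objects name concepts rather than words, the same event list can feed the execution module of every language.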

Execution module

Simple concept | Term in Basque | Definition of the term: semantic units (USem) | Semantic representation (RSem)
Overcast | Sky, covered | Usem = Zeru1Sem; UsemR1_WINTER = Estali1Sem | zeru1Sem, estali1sem
Overcast | Covered | Usem = Estali1Sem | estali1sem
Table 2. Simple concepts, terms in Basque and their semantic expression.

The module that realizes the resulting concepts linguistically in each language is based on Meaning-Text Theory (Mel'čuk 1988, Polguère 1988). This phase uses a linguistic knowledge base and is divided into five stages: predenotation, semantics, deep syntax, surface syntax and morphology.

  1. Predenotation. At this stage, a term in the target language is selected for each simple concept produced by planning. For example, for the simple concept Overcast of the class CloudCovering_code4 mentioned above, one of the terms “Sky, covered” or “Covered” is selected. These terms are broken down into semantic units (USem), from which the semantic representation (RSem) is built (see Table 2).
  2. Semantics. From the semantic representation RSem, the deep-syntax graph of nodes and relations is built; to do so, the lexical unit corresponding to each semantic unit is selected.
  3. Deep syntax. A graph is built whose nodes contain all the words of the sentence to be generated.
  4. Surface syntax. The nodes are ordered to determine the position of each word in the sentence.
  5. Morphology. The word form matching the morphosyntactic information of each node is retrieved from the dictionary. All declined forms are stored in the dictionary, so that no morphological generation is needed.
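The five stages listed above can be pictured as a chain of small functions. The Python sketch below is a toy model under strong assumptions: the data structures and function names are hypothetical, and apart from "euria", which the article quotes, the Basque word forms are standard declensions added only for illustration.

```python
# Predenotation data: simple concept -> semantic units of the chosen term.
TERMS = {"Overcast": ["zeru", "estali"]}

# Dictionary of stored declined forms: (lemma, case) -> word form.
LEXICON = {
    ("zeru", "ABS"): "zerua",
    ("estali", "ABS"): "estalia",
    ("euri", "ABS"): "euria",
    ("euri", "SOZ"): "euriarekin",
}


def predenotation(concept):
    """Stage 1: choose a term and return its semantic units."""
    return TERMS[concept]


def semantics(units):
    """Stage 2: select a lexical unit for each semantic unit."""
    return [{"lemma": unit} for unit in units]


def deep_syntax(nodes):
    """Stage 3: build a small graph: the first node is the head, the rest modify it."""
    head, *modifiers = nodes
    return {"head": head, "modifiers": modifiers}


def surface_syntax(tree):
    """Stage 4: order the nodes; here the head comes first."""
    return [tree["head"], *tree["modifiers"]]


def morphology(nodes, case="ABS"):
    """Stage 5: look up the stored form instead of generating it."""
    return [LEXICON[(node["lemma"], case)] for node in nodes]


def realize(concept):
    nodes = surface_syntax(deep_syntax(semantics(predenotation(concept))))
    return ", ".join(morphology(nodes))
```

Storing every declined form in the lexicon keeps the last stage a plain lookup, which is the design choice the article describes for the dictionary.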

Adaptation to Basque

Concept | Execution in Basque | Execution in French | Execution in Spanish
NebDim_inm | Cloud reduction | nebulosite diminution | decreased cloudiness
Neb0_inm | sky, opscarbia | sentence | clear sky
Neb6 | cloud range | nuageux | cloud intervals
Neb8_inm | clouds developed throughout the day | In the case of the Basques | cloudiness of daytime evolution
DD1 | north wind | vent | North wind
FF4 | wind, very strong | comfort | very strong
FF5 | wind, hurricane | loss | hurricane wind
TempeRel1 | significant drop in temperatures | important chute of temperatures |
TempeRel2 | moderate decrease in temperatures | ambient temperature |
TN2 | rain | rainfall | rain
RT3 | basins | aversion | basins
Br1 | raw | bruise | calima
Br2 | cloudy | mist | mist
Morning_Mid | in the morning | in milieu of matinées | in the morning
Table 3. Execution of some atomic concepts in Basque, French and Spanish.

The computational work of extending MultiMeteo to Basque has been carried out by the IXA group, and the terminological work by UZEI. The adaptations to Galician and Catalan were made from the Spanish version and mainly required work on the lexicon, since no major changes in syntax or morphology were needed. For Basque, although we started from Spanish (and sometimes French), most of the sentence structures had to be modified, and the declension case marks required particular work.

We organized our work in three phases:

  • collection and analysis of a corpus of weather texts in Basque,
  • study of the MultiMeteo system and its architecture, and
  • adaptation of the system.

The adaptation was carried out in three subphases: first we addressed the atomic events (for example, “sky, covered”), then the easier molecular events (for example, “wind, weak, from the north”), and finally the molecular events that presented special difficulties (for example, “sky, initially covered, with rain, later very temporarily covered”).

Each adaptation phase involved a prior linguistic analysis; the analysis and design of the information to be included in the knowledge base; the entry and testing of that information for one representative example of each event; and, finally, the entry and testing of all the possibilities for each type of event.

The main characteristics of this adaptation are:

  • Given that the predictions generated by the system had to follow the telegraphic style of the INM, we decided to omit the verbs. In addition, the modifiers of the noun that is the head of the sentence are separated from it by commas, as attributive phrases. For example, instead of an ordinary noun phrase such as “weak north wind”, the system generates “wind, weak, from the north.”
  • The meteorological evolutions that French and Spanish express with a gerund are expressed differently in Basque. For example, "Clear sky becoming cloudy" is generated in Basque as “Sky, at first clear, then cloudy.”
  • In the dictionary we have entered all the forms of the words (sometimes multi-word units) that can be used in the bulletins. Two cases are used in the bulletins: absolutive and sociative. The lemma of the word can also be used.

If the system is later extended to other styles, more declension cases will be needed, and those cases will have to be added to the dictionary. Let us look, for example, at the dictionary entry for the word for rain:

BA_Euri1: Lexeme NomBA{
  CatMorph = NOM; SsCatMorph = COMMUN;
  UMorph = [ morpho{Cas= ABS; Nombre= SINGULIER; UMG= "euria"},
             morpho}= Phuns;
  • The head of the sentence takes the absolutive case by default, and the case of its modifiers is determined in the definition of the concept or term. For example, the concept that generates "Sky, covered, with rain" must specify that the term for covered takes the absolutive singular and the term for rain the sociative singular. The term zeru (sky) appears in the absolutive singular because it is the head of the sentence.
  • In Basque, the case mark of a phrase attaches to the last word of that phrase, and the system offered no elegant way to handle this. We therefore had to add a series of rules: first, at the conceptual level, the system attaches the case mark to every word of each phrase; then, when the words are ordered at the surface syntax stage, it removes the case from every word that is not the last one. For example, to generate the phrase “Sky, covered, with general rains and storms”, the concept states that the whole phrase for general rains and storms must carry the sociative case. To do this, all the terms are first marked with the case, rain(soz)+general(soz)+ekaitz(soz); later the mark is removed from the terms for rain and general, which precede the last word.
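The two-step case rule described in the last point can be sketched as follows. This is a minimal Python illustration, assuming a phrase is a list of (lemma, case) pairs; the function names are hypothetical, and "euri" and "orokor" are used here as the Basque lemmas for rain and general.

```python
def mark_all(lemmas, case):
    """Conceptual level: attach the case mark to every word of the phrase."""
    return [(lemma, case) for lemma in lemmas]


def keep_case_on_last(ordered):
    """Surface syntax: once the words are ordered, strip the mark from all but the last."""
    if not ordered:
        return []
    return [(lemma, None) for lemma, _ in ordered[:-1]] + [ordered[-1]]


# rain(soz) + general(soz) + ekaitz(soz), as in the example above.
phrase = mark_all(["euri", "orokor", "ekaitz"], "SOZ")
result = keep_case_on_last(phrase)
# Only the final word of the phrase keeps the sociative mark.
print(result)
```

Marking every word first and pruning later means the rule still works whatever order the surface syntax stage finally chooses.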

Table 3 shows how several atomic concepts are realized in Basque (with the Spanish and French versions for reference).

Table 4 shows the execution of several molecular concepts. Where variables appear, they stand for the values of the event: N is the state of the sky (clear, cloudy, covered...); DD is the wind direction (north, southwest, etc.); FF is the wind force (moderate, strong...); TS is the precipitation (rain, sirimiri...); PER is the period (mornings...).

Future work

Concept | Execution in Basque | Execution in French | Execution in Spanish
OrageGrele | thunderstorms with hail | orage compagné de grel | storms with hail
NebEvSpec | sky, at first N1, then N2 | ciel N1devenorg N2 | sky Expanding/Reducing to N2
NebEvSpecTSPer | sky, PER N1 with TS1, then N2 | 2. | PER A N1 Increasing/ N2 Decrease
NebEvSpecTSOrage | sky, initially with N1, TS1 and thunder storms, then N2 | 2. | sky N1 with TS1 and storms to N2
VentSecteur | wind, FF1, overall DD1 | Vent FF1 of secteur DD1 dominant |
Specifications | wind, DD1, at the beginning FF1, then FF2 | 2. | FD1 F1 More/ FF2 Avancez
Pass_var_inm | wind, variable, FF1, temporary DD2, FF2 | Variable vent FF1 passagerement FF2 DD2 | variable wind FF1 passenger FF2
Table 4. Molecular concepts made in Basque, French and Spanish.

The project is currently in the last stages of development. The next step is large-scale testing to detect possible system errors, followed by the necessary changes and the final evaluation. Nevertheless, the adaptation is already integrated into the INM system, and weather forecasts for the autonomous communities of Spain are offered every day on the web at http://www.inm.es/wwi/MultiMeteo/Multimeteo.html.

Beyond the general-purpose telegraphic style, producing special-purpose forecasts (for beaches, mountaineers, skiers...) and richer texts (for example, complete sentences with verbs) would be feasible steps in the medium term. Complete versions of this kind have been built for French and are currently in use. For now, it is enough to analyze how useful the system developed for Basque is; if a need is detected later, the improvements mentioned above can then be addressed.