Internet search engines: constant evolution
2010/05/01 Leturia Azkarate, Igor - Computer scientist and researcher, Elhuyar Hizkuntza eta Teknologia. Source: Elhuyar aldizkaria
The Internet is the largest knowledge bank available to humanity. To find the information we want, we need search engines such as Google, Yahoo or Bing. At first they only looked for words, but they now offer more and more possibilities, and new types of search engines are beginning to appear.
When the web was created, search engines were very simple. They looked for documents containing the given word or words, ranked them according to general criteria, and that was all. But they had many problems: for example, the results were often not in the language we wanted; they did not look for inflected or derived forms of the search word; nor did they look for its synonyms; and so on. Over time, search engines have solved these problems. Some solutions have been integrated directly into ordinary searches and others are offered as advanced options. In addition, specialized search engines have been created to address some of these problems.
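The behaviour of those early engines can be illustrated with a minimal sketch. It is not the code of any real search engine: the function names and the three sample documents are assumptions made for this example. It builds a word index and returns only the documents that contain every query word exactly as typed.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query word (exact match only)."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample documents, invented for this illustration.
docs = {
    1: "search engines index the web",
    2: "the engine of a car",
    3: "searching the web for engines",
}
index = build_index(docs)
print(search(index, "engines web"))  # documents 1 and 3
print(search(index, "engine"))       # only document 2: "engines" is not matched
```

The second query shows the limitation described above: searching for "engine" misses the documents that only contain "engines".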
Language-based improvements
One of the first improvements was language detection. Using language technologies, search engines are able to detect which language a web page is written in, and can therefore offer only pages in a specific language. In addition, once the language of a page is known, they can process it in a way that suits that language. For example, they apply stemming or, better still, lemmatization to all the words, so that the search matches the lemma of the word, which overcomes the problem of inflected and derived forms mentioned above.
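As a rough illustration of these two steps, the sketch below guesses a language by counting stopwords and then strips a few English suffixes. Both the stopword lists and the suffix list are small, hand-picked assumptions; real engines rely on much more robust language identification and on proper stemmers or lemmatizers.

```python
# Tiny hand-picked stopword lists (illustrative only, not a real language model).
STOPWORDS = {
    "en": {"the", "and", "of", "is", "in", "to"},
    "es": {"el", "la", "de", "y", "en", "que"},
    "eu": {"eta", "da", "bat", "ez", "du", "dira"},
}

# Hypothetical English suffix list; real stemmers use far more careful rules.
SUFFIXES = ("ing", "ed", "s")


def detect_language(text):
    """Guess the language whose stopwords appear most often in the text."""
    words = text.lower().split()
    counts = {
        lang: sum(word in stops for word in words)
        for lang, stops in STOPWORDS.items()
    }
    return max(counts, key=counts.get)


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


print(detect_language("the engine is in the car"))
print(naive_stem("engines"), naive_stem("searching"), naive_stem("searched"))
```

With this crude stemmer, "engines" and "engine" reduce to the same form, so a search for one can match the other. Highly inflected languages such as Basque need true lemmatization rather than suffix stripping.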
However, the main search engines only do this for the most important languages or those with the greatest presence on the web, and Basque is not among them. If you want results only in Basque, with lemmatization, you can use Elebila (http://www.elebila.eu), a search engine marketed by Eleka and based on technology from the Elhuyar Foundation R&D group.
Multilingual search
In other cases we want the opposite: to obtain the most interesting web pages on a given topic, whatever language they are in. The line of research that aims to make this possible is called cross-language information retrieval. The search word or words are translated into other languages and searched for in each of them, and the most significant results for each language are then returned. To close the circle, all the results can be translated back into the original language by machine translation.
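The pipeline just described (translate the query, search each language, merge the best results) can be sketched as follows. The bilingual dictionary, the document collections and the scoring by simple word counts are all assumptions invented for this example, and the final machine-translation step is left out.

```python
# Hypothetical toy dictionary: source word -> translation per language.
TRANSLATIONS = {
    "water": {"en": "water", "es": "agua", "eu": "ura"},
    "energy": {"en": "energy", "es": "energía", "eu": "energia"},
}

# Hypothetical per-language collections: language -> {doc id: text}.
COLLECTIONS = {
    "en": {"en1": "water and energy policy", "en2": "history of art"},
    "es": {"es1": "el agua como fuente de energía"},
    "eu": {"eu1": "ura eta energia", "eu2": "musika"},
}


def translate_query(words, lang):
    """Translate each query word; untranslatable words are kept as they are."""
    return [TRANSLATIONS.get(word, {}).get(lang, word) for word in words]


def cross_language_search(query, top_per_language=1):
    """Search every collection with the translated query and merge the best hits."""
    words = query.lower().split()
    merged = []
    for lang, docs in COLLECTIONS.items():
        translated = translate_query(words, lang)
        scored = []
        for doc_id, text in docs.items():
            tokens = text.lower().split()
            score = sum(tokens.count(term) for term in translated)
            if score > 0:
                scored.append((score, doc_id))
        # Highest score first; ties are broken by document id.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        merged.extend(doc_id for _, doc_id in scored[:top_per_language])
    return merged


print(cross_language_search("water energy"))
```

A production system would also have to deal with ambiguous translations and with comparing scores across languages, which this sketch ignores.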
Some experimental examples can be found at http://terpconnect.umd.edu/~dlrg/clir/systems.html. Among commercial search engines, Google is the only one that does something like this, through the Google Translated Search service (http://translate.google.com/translate_s). For example, we can ask it to search for "bars in Moscow" on Russian pages. It will translate the query into Russian, run the search, and return the results translated into English.
As for Basque, the Elhuyar Foundation R&D group will soon release Zientzianitz, a cross-language search engine for science. A query entered in Basque will be searched for on the most significant science websites in Basque, Spanish and English.
Meaning-based search
There may be several words that express the concept we are looking for, but the search engine will only return the pages containing the specific word we typed. To improve the results, a technique called query expansion can be used, which consists of also searching for synonyms or variants of the word. Google, for example, also searches for synonyms when the ~ sign is placed before the word. The Basque search engine Elebila does not do this automatically, but it lets you select variants or synonyms of the word.
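Query expansion can be sketched in a few lines. The synonym table below is a hypothetical stand-in for a real thesaurus, and the matching rule (every query word must be satisfied by the word itself or by one of its synonyms) is one simple choice among several.

```python
# Hypothetical synonym table; a real system would use a thesaurus or learned data.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive"],
}


def expand_query(query):
    """Return one group of alternatives per query word: the word plus its synonyms."""
    return [[word] + SYNONYMS.get(word, []) for word in query.lower().split()]


def matches(text, expanded):
    """A document matches if it contains at least one alternative from every group."""
    tokens = set(text.lower().split())
    return all(any(alt in tokens for alt in group) for group in expanded)


expanded = expand_query("cheap car")
print(expanded)
print(matches("an inexpensive automobile", expanded))
print(matches("an expensive automobile", expanded))
```

The first document matches even though it contains neither "cheap" nor "car", which is exactly what the plain word search described earlier cannot do.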
On the other hand, if the word we are searching for has more than one meaning, we will normally be interested only in the results associated with one of them. Returning only those results is of great help, or at least showing the results grouped by meaning. Microsoft's Bing Reference search engine (http://www.bing.com/reference), though only over Wikipedia articles, and the Hakia search engine (http://www.hakia.com) try to do something like this.
In any case, to implement these options the search engine has to work out which meaning of the word the user is interested in. There are several ways to do this. One is to ask the user directly which meaning they are interested in, or whether the word has been translated correctly. Another is to try to infer the meaning with language technologies, using the context provided by the other words in the query; for this, however, the query must consist of several words. A third is to try to infer the meaning from the user's search history or geographical location. This last approach is what Google does if we expressly authorize it.
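The second approach, inferring the meaning from the other words in the query, can be sketched as a simple overlap count. The sense inventory below is invented for this example; real systems draw on lexical resources and statistical models rather than a hand-written table.

```python
# Hypothetical sense inventory: each sense of a word has a few typical context words.
SENSES = {
    "jaguar": {
        "animal": {"cat", "jungle", "habitat", "wild"},
        "car": {"engine", "price", "dealer", "model"},
    },
}


def guess_sense(word, query):
    """Pick the sense sharing most words with the rest of the query, or None."""
    context = set(query.lower().split()) - {word}
    best_sense, best_overlap = None, 0
    for sense, signature in SENSES.get(word, {}).items():
        overlap = len(context & signature)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(guess_sense("jaguar", "jaguar engine price"))
print(guess_sense("jaguar", "jaguar jungle habitat"))
print(guess_sense("jaguar", "jaguar"))
```

The last call returns None: a one-word query gives no context to work with, which is why this method needs queries of several words.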
Answering questions
In some cases we go to the Internet looking for the specific answer to a question. If we put a question to an ordinary search engine, it will return a list of documents containing the words of the question, but there are also systems capable of answering questions. Some work on texts, combining information retrieval techniques with language technologies, such as MIT's START system (http://start.csail.mit.edu/) or Ihardetsi, developed by the IXA Group (IXA Taldea), which answers questions in Basque. Others use structured knowledge and automated reasoning, such as Wolfram Alpha (http://www.wolframalpha.com) or TrueKnowledge (http://www.trueknowledge.com). And systems that draw on the semantic web are also being developed, such as DBPedia (http://dbpedia.org).
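To give an idea of how answering from structured knowledge differs from returning a list of documents, here is a deliberately tiny sketch. The fact store and the single question pattern are assumptions made for this example and say nothing about how the systems named above actually work.

```python
# Hypothetical fact store of (subject, relation) -> object entries.
FACTS = {
    ("france", "capital"): "Paris",
    ("spain", "capital"): "Madrid",
}


def answer(question):
    """Answer questions of the form 'What is the capital of X?'; otherwise None."""
    words = question.lower().rstrip("?").split()
    if words[:5] == ["what", "is", "the", "capital", "of"] and len(words) == 6:
        return FACTS.get((words[5], "capital"))
    return None


print(answer("What is the capital of France?"))
print(answer("Who wrote Hamlet?"))
```

The system returns the answer itself rather than pages that mention the question words, and it returns nothing when the question falls outside what it knows.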
There is no doubt that search engines have evolved a great deal since their origins, and they continue to improve today. Thanks to them, and to the new search engines that are still at a fairly experimental stage but offer new capabilities and possibilities, searching the web will surely be much simpler in the future.