Boom of creative artificial intelligence
2023/06/01 Leturia Azkarate, Igor - Informatikaria eta ikertzaileaElhuyar Hizkuntza eta Teknologia Iturria: Elhuyar aldizkaria
We are certainly all familiar with the names of Stable Diffusion, LaMDA, Midjourney, GPT, DALL-E, Bard and, above all, ChatGPT, which in recent months has hardly been mentioned, not only in the world of technology, but also in the media, social networks and society in general. They are all creative artificial intelligence systems capable of producing long texts, structured answers to questions and images. The sector has made a truly surprising step forward in the last year, to the point of delivering really useful and even surprising results for many cases, and there are increasingly many areas in which these systems are most being implemented. Let's know better what creative artificial intelligence is, how it works, what possibilities it offers and what risks it carries.
The people who work with language and speech technologies have worked mainly in the analysis or comprehension part, that is, in systems that provide a shorter output of an entry: text classification, synthesis, extraction of entities or terminology… Or also in those that provide a result of a similar extension, transformed into another medium or code: analysis, correction or translation of texts, text to speech conversion, audio transcription, etc. But in systems that generate a longer or more worked output than input, i.e. in the NLG (Natural Language Generation) subarea, we have worked very little because the task is much more difficult and the right results were not obtained. The same has happened in other areas of artificial intelligence, such as images or videos, where images were analyzed to detect objects present or to describe the image, but image creation hardly existed. I mean, the so-called creative artificial intelligence was or seemed underdeveloped.
A summer of creative artificial intelligence
And first and foremost, in the last year there has been an explosion of long text production systems, images or structured answers to questions. Several companies and organizations, beyond the publication of scientific articles on their research, have opened the use of developed systems or have shown their results to the general public, finding that the quality of creative artificial intelligences was very high. Examples of these open or displayed systems are Stable Diffusion, Midjourney and DALL-E image creation tools, GPT and LaMDA text creation tools and ChatGPT and Bard question response boots. It was no accident that everyone opened up in such a similar time (almost ramp). Of course they wanted to surprise people with the results and generate the yearning for the instrument, besides anticipating the competition or demonstrating that they are not lagging behind. But all these tests carried out by people have undoubtedly served to refine their systems, detect failures in real cases, detect and limit from now on inappropriate uses, etc.
The main cause of this explosion is OpenAI, which is DALL-E, GPT and ChatGPT. OpenAI is a non-profit company, founded in 2015, in which Elón Muskiz, Microsoft, etc. contribute to the development of artificial intelligence, collaboration with other actors and the free realization of their developments. However, M is not in this company, since 2019 works with the aim of conquering, little by little Microsoft has put more money and has won power until, in practice, most developments do not make them free (GPT Microsoft can take advantage of it exclusively). In addition to OpenAI is Google with LaMDA and Bard.
Base: Great models of languages
Most of these systems are based on large language models (LLM, Large Language Models). Large language models, like almost every artificial intelligence system today, are a kind of deep neural network. They are trained with the texts for a task, in principle very simple: to predict the next word of a fragment, according to the probabilities of the texts observed during training. But being extremely large networks (with millions or thousands of parameters) and training with huge amounts of texts (billions of words), they absorb much of the syntax and semantics of language, as well as a great general knowledge of the world. And if the word length of the phrases you accept in the input is quite large and chaining the word predisposed in the output again, long texts can emerge. This GPT or LaMDA are the LLM to form a long text with a text boot.
In addition, these large language models can adapt to other tasks through fine-tuning. Thus, adjusted to a couple of questions can be trained to act as chatbot and, if the length of the entry is large enough, their previous answers can be included as inputs, forming a complete conversation, remembering and refining the previous answers if required. And that's ChatGPT and Bard. Or, by adjusting it to the images and their descriptions, giving a text you can get a system of image creation, which is ultimately DALL-E. You can also get a LLM to be used for other tasks without needing to be adjusted or retrained by prompt engineering or input sentence engineering. For example, if we give you a text and specify “This is the subject of this previous text”, I would make a text classification; or a text summary if we include “This is a summary of the previous text”; or a translation if we ask you “Translate I don’t know what” in English.
The AES mentioned are trained with texts in some languages, so they are able to understand and answer the questions raised in them and to perform an automatic translation among them through prompting (that is, the number of texts in all languages is not the same, and in some of them it works very well but in others it does not). Rather, in the training it has been provided a code composed of several content languages, and is also able to write programs. And GPT-4 also accepts images as an introduction to give answers or arguments about them.
Thus, the latest version of ChatGPT contains 175 billion parameters, is trained with 500 billion words in 95 languages and the extension of words it admits in its introduction is 32,000 words. And it really gives very good results, often surprising: long, linguistically correct and well-structured. And the answers are always very plausible and they are ejected safely. However, answers are often not always correct, complete or optimal.
In short, ChatGPT is not a global or hard artificial intelligence (AGI, Artificial General Intelligence or Strong AI), that is, a real intelligence able to “understand” (classify the facts in a structured way) and reason about them, nor an attempt to achieve it. As has been said, the word is only shaped according to the probability, usually very well, due to the large number of texts given to it to learn. But in those training texts there is fiction, misinformation, contradictions… and it can respond from them, mix very different things… As it is said in technical language, it can “hallucinate”. And as chatbot tries to answer all the questions so the user feels satisfied, it can correct when lacking information, adapt the answer to the concrete words of the question (now saying one thing and then the opposite, even in the same conversation if you are encouraged)… And that without being aware that everything is true or lies, or that it is being invented! Some have argued that we do too and that our brain works like this, but experts are clear that it doesn't.
A crazy race to get where you are
In any case, both Microsoft and Google have entered a crazy race, each of which has put its virtual assistant (ChatGPT and Bard respectively) before available in their tools (search engines, office programs, etc. ). Of course, they do so with the intention of increasing the user fee and removing it a little from the other. People are also eager to start using it.
But I don't know if this is really useful for search engines. In fact, search engines work well for navigational (which are intended to know the address of a person’s website) and transactional (which are intended to buy something) searches. In the former, the most appropriate result is usually among the former, we detect and select quickly and in the latter we want to analyze all the alternatives. Would we trust the only solution given to us by an omniscient chatbot? In addition, they cannot provide up-to-date information as training data on these topics will be obsolete as soon as training is completed and the new system is published (ChatGPT data date back to September 2021). And many tools based on relatively simple databases and algorithms work perfectly: to take them to a site with GPS, to buy airline tickets ... There is no point in replacing them with others such as ChatGPT or Bard (and they will probably not replace them, if it is found that a question of this kind has been asked, they will be referred to classical systems).
They may be more appropriate for informational searches (when we seek information about something), as instead of analyzing a one-to-one response list, it will give us a direct answer and save us work. But you won’t be able to know if the information is correct or complete without assuming that work… And if the users stop entering the results websites, how will it be progressively maintained without visiting the current ecosystem of websites and media?
In fact, rather than the general public seeking information or asking questions, these creative artificial intelligence instruments are more useful to those who have to make the creation and, instead of replacing search engines, it makes more sense to create other instruments or services for those tasks.
As many risks as advantages
In any case, many of the people and sectors that have to write texts or make images have already started to use them or are making adaptations to use them as soon as possible: teachers, writers, journalists, cartoonists, students, scientists… But it must be clear that starting to use them without being aware of the advantages and risks and without thinking about how and for what.
On the one hand, the risk of relying directly on the response provided by these chatbots, that is, of publishing erroneous information or sending it to the evaluator. The people who are working on developing machine translation systems must draw what we always say, that it must be revised before it is published or used. Obviously, in many cases it will not be done because that requires some knowledge and work, and there are students who prefer to risk than learning or writing, some creators who earn time, some publishers who save money… Other times it will not be done because they think artificial intelligence systems are infallible (as is currently the case with some machine translation systems).
In the educational field, there are those who say that these systems cannot be avoided and that students must be taught to make good use of them, but there is also a concern about the consequences that a competence such as writing can have on academic results or on learning.
Beyond personal consequences, this can eventually lead to the proliferation of mediocre, erroneous, similar or deliberately false content in networks or media. And because future AI systems will feed and train with these contents, the system itself can also cause a worsening (similar to the genetic worsening caused by endogamy).
On the other hand, it is clear that it can lead to a significant loss of jobs. There are sectors concerned and/or angry, especially the cartoonists are excited, taking into account that surely to train this AA system that can take away the work they have used their drawings without respect to copyright and illegally… And the same happens with text generators. Some are calling for legislation to be able to audit with which data these systems have been trained.
Furthermore, such systems facilitate the creation of false content that appear to be truthful (in addition to creating text and image, there are also systems that can produce audio and video) and require systems or mechanisms to detect them (such as hidden watermarks), which is technically very difficult.
There are also environmental concerns, because these gigantic systems consume a huge amount of energy to train and use. The massive use and expansion of such systems does not seem to be the best in the context of the climate emergency we are in.
Finally, all these systems are negatively influenced by the development of the R.D.T. from the United States, which have been clearly seen with existing tools (office, machine translation, smart speakers, search engines…), multi-level biases (sex, race, language, culture…), loss of privacy, loss of technological sovereignty, concentration of wealth in a few hands…
To learn more about these risks and damage, it is highly advisable to follow Emily Bender and Timnit Gebru, who are in charge of warning them.
And Euskera, what?
We have already commented that ChatGPT is multilingual but not in a balanced way. Of course, English is the language that works best (linguistic bias mentioned), in other large languages it works very well, but in small languages not so much. ChatGPT has Basque and you can say it works quite well, but it still has a way to improve. And the knowledge of Basque culture or subjects is not what I would need (cultural bias mentioned). If the use of ChatGPT is made everyday or regular, it will be another area that the Basque country will lose, as, as in many other areas, the Basques will use it in Spanish, French or English if it works better, and think about what this can influence, especially if we do it in education.
Instead of dedicating ChatGPT to an uncertain or demanding hope for a better future in Basque, the path is to develop this type of technology in another way or locally. And that is that, although ChatGPT was very well in Euskera, the aforementioned damages deriving from the ownership of the technological multinationals (lack of privacy, concentration of wealth, technological dependence…) would still be there.
In this regard, there are initiatives to develop CSD in a different way and to liberate the systems developed, such as GPT-J, EleutherAI -founded by the non-profit company, or BLOOM, jointly developed by many researchers from around the world. Both demand results similar to those of GPT or ChatGPT. However, the GPT-j is much smaller in size and has no Basque in the training data, while the BLOOM is the size of the GPT and in the training data it has little Basque. Therefore, it is to be assumed that both will not work very well in Basque.
And that is, when giants introduce Euskera into their tools, they do so generically and with minimal effort, that is, using data that can be easily obtained, without analyzing quality or without making a special manual effort to obtain larger quantities, confusing with all the other data and leaving them in very small proportion. On the contrary, the local agents on numerous occasions have demonstrated that, by making an effort to obtain or produce quality data in Euskera in large quantities, developing tools only for the Basque Country, and making a specific development for specific tasks rather than a single instrument valid for all, we are able to obtain very good results for the Basque Country (often better than those of technological giants), for example in the field of translation and/or automatic transcription.
At the Orai NLP Theknologia work center where I work, which is part of Elhuyar, we are working in the field of creative intelligences GPT or ChatGPT for the Basque Country, but it is not an easy task, on the one hand, the demands of gigantic and growing structures to store them, train them and produce results in Basque are not in the hands of anyone. Therefore, in line with other minority or minority languages, research is under way to obtain smaller structures with less data, instruments of this kind and similar results. Let's see if we get it and then if we all get it right!
Gai honi buruzko eduki gehiago
Elhuyarrek garatutako teknologia