Data, happy data
“We give and give data; do we have to feed the monster ourselves?” a friend once asked me. It seemed to me that this was not just a simple question but a call to account, and that what it really meant was “what are you doing, undervaluing the work of creators on the pretext of feeding artificial intelligence?”. The portrayal of artificial intelligence as a monster also carried weight in the question.
And it gave me something to think about. My friend made the two sides clear to me: on the one hand, the data providers; on the other, the data consumers, those dedicated to the research and development of so-called generative artificial intelligence (AAS, from here on). To avoid opening up gaps between the two sides, it is advisable to clarify the roles and perspectives of both.
The author is the owner of the work he or she has created, whatever its format, and authorship is an inalienable right. The author decides whether or not to publish and, if so, how. When publication is entrusted to a publishing house, the exploitation rights are established by contract with the publisher, which sets out the conditions of the authorizations for the reproduction, distribution and sale of the work. So far, nothing new.
These works, let’s call them data, are essential to the development of AAS. The technology giants began to systematically collect texts, audio and video before we were even aware of it, and that collection has not stopped since. It is astonishing how massive data collection has become. Data is collected from anywhere, at any time and in any way, hoovered up intensively.
There is considerable confusion about the legality and legitimacy of this collection. To begin with, if the data are published under an open license, say on the web, they are accessible and therefore available for use. In this case, unless the owner of the data indicates otherwise, the language model developer may use and publish them. If, on the other hand, they are published under more restrictive licenses, republication by a third party may be prohibited. But the question is: could language models be trained on this data?
There are good reasons to say yes. Language models do not reproduce, distribute or sell the data as such. They use the data. Strictly speaking, this is neither plagiarism nor copying. Here AAS has brought a radical novelty. Until now, only people used data to educate themselves, and that is why it is said that from the moment data is published, it becomes collective knowledge. Language models do exactly that: they draw on that knowledge to build the mathematical model they contain. Therefore, there does not seem to be any obvious legal impediment to this practice. This view is widely held among researchers and developers in the field of AAS.
But here we must also stress that the technological use of this collective knowledge has great economic value. How should all this be managed? Behind this lies, of course, the issue of benefit-sharing, which requires recognition of the work of authors and data providers. How? This is a complex issue, too complex to deal with in this small space.
What is clear is that the solution is not to place limits and obstacles on knowledge that has been collectivized, on data that has been published. Such behaviour runs counter to open data and open knowledge, and ultimately harms small, low-resource languages.
Languages such as Basque need to make data easy to use and to adopt open licenses, so that what we Basques have created in Basque is also reflected in technological services and products. What we really need is for these products to be available in Basque too, on a par with the dominant languages.
We have talked about authors, publishers, collectors and developers, but there are also users, and in the era of AAS users are not just users but also data providers. When we make queries, or approve or reject the answers, we are providing information. The first step is to be aware of this, and the second is to act responsibly.
My friend was talking about the fear of feeding the monster. Responsibility and recognition of creators are necessary, but paralyzing fears and the withholding of data will do the Basques no good.