A bitter lesson?
In 2019, Rich Sutton published his short essay The Bitter Lesson. In it he put into words beliefs that many people working in artificial intelligence already held; by the time he wrote them down, practice had already shown that we were in an era of profound change. What caused the stir was not so much the originality of the ideas as how general and categorical their formulation was.
And what did Sutton argue in The Bitter Lesson?
He argued that the most significant advances in artificial intelligence have come not from imitating human intelligence, but from general methods that leverage ever-greater computing power. In other words, general search and learning algorithms running on machines with massive computing power beat approaches built on specialized human knowledge. Sutton knew when he wrote this that the claim would frustrate researchers who had spent years on painstaking, domain-specific work. That is why he called it a “bitter lesson.”
He already had examples at hand. Deep Blue, the program that beat Kasparov in 1997, relied on massive, exhaustive search rather than on the chess-specific knowledge that computational chess researchers had cultivated until then.
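To make concrete what “general search without domain knowledge” looks like, here is a minimal sketch of plain minimax on a toy game. It is an illustration of mine, not Deep Blue's engine (the real system added alpha-beta pruning, custom hardware and far deeper search); nothing beyond the rules of the game is encoded, and playing strength comes purely from searching more positions, that is, from spending more compute.

```python
# Minimal illustration (not Deep Blue's engine): exhaustive minimax search
# on a toy game of Nim, where players alternate taking 1 to 3 sticks and
# whoever takes the last stick wins. No expert strategy is encoded, only
# the rules; the search itself discovers which positions are won or lost.

def minimax(sticks, maximizing):
    """Return +1 if the maximizing player can force a win, else -1."""
    if sticks == 0:
        # The previous player took the last stick and won, so the
        # side to move now has already lost.
        return -1 if maximizing else 1
    outcomes = [minimax(sticks - take, not maximizing)
                for take in (1, 2, 3) if take <= sticks]
    return max(outcomes) if maximizing else min(outcomes)

print(minimax(5, True))   # 1: the first player can force a win from 5 sticks
print(minimax(4, True))   # -1: a pile of 4 is lost for the player to move
```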
Deep learning has reinforced this lesson, reducing the dependence on human knowledge even further. Given large data sets and the computational capacity to process them, general learning methods serve a wide variety of tasks. It is a general pattern, and it has been applied as such in many areas.
This paradigm has also prevailed in language processing, provoking a crisis in classical computational linguistics. Why teach systems morphology, syntax and the rest, if they learn well enough from large collections of text without any added linguistic knowledge? And what became of all that labor invested in the computational description of the language? Nostalgia is what remains.
It is true, however, that some authors do not fully agree with Sutton's lesson; many accept it as a guiding principle, but not as universal dogma. Three criticisms stand out. First: the principle of “ever more computing” is not sustainable, because computing is not an infinite resource.
Second: because efficiency matters, scaling must be done optimally, balancing the amount of data against the capacity of the model. Third: although general methods are powerful, adding expert knowledge is still beneficial in many contexts, especially when little data is available.
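A concrete illustration of that second criticism, which the essay itself does not cite, is the well-known “Chinchilla” analysis of compute-optimal training (Hoffmann et al., 2022). Estimating the cost of training a language model with $N$ parameters on $D$ tokens as roughly

$$C \approx 6ND,$$

it found that for a fixed budget $C$ the loss-optimal recipe grows both factors together, $N_{\text{opt}} \propto \sqrt{C}$ and $D_{\text{opt}} \propto \sqrt{C}$: about twenty training tokens per parameter as a rule of thumb, rather than an ever-larger model on the same data.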
From this third point of view, what about the language models that speak Basque? Some are doing quite well (even without Basque speakers on the teams that build them), but shortcomings remain, both in the quality of their Basque and in how they reflect Basque culture.
That is why we Basques know what to do with the frustration Sutton's lesson causes: let us find ways to enrich these data-hungry black-box systems with our knowledge.