Isabelle Guyon: “We have to bring artificial intelligence to as many people as possible”

That's it. Some tasks that are simple for humans are very difficult for machines. For example, if a machine has to learn to distinguish pears from apples, it sometimes struggles, because some apples look like pears and some pears look like apples. It is hard to know where the boundary lies. And we developed complex mathematical algorithms that helped us find that boundary: the support vectors.
In fact, many methods are used in machine learning. I started out with neural networks and compared several methods, including the so-called kernel methods. But then, when I met Professor Vladimir Vapnik at Bell Labs, we developed support vector machines, building on a method he had invented in the 1960s for discriminating between data examples.
I realized that those algorithms and the kernel methods could be combined. My husband, Bernhard Boser, implemented that combination, and it worked quite well. We started applying it to a number of problems. With Bernhard Schölkopf we developed a whole field around kernel methods, multiplying their applications.
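To make the combination concrete, here is a minimal sketch of a kernel support vector machine in Python using scikit-learn; the toy dataset and parameters are illustrative, not from the original Bell Labs work:

```python
# A kernel SVM separating two classes that no straight line can split.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy two-class data shaped like interleaved half-moons.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The RBF kernel implicitly maps points to a higher-dimensional space
# where the support vector machine can find a linear separator.
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
print("number of support vectors:", len(clf.support_vectors_))
```

The support vectors are the training examples that end up defining the decision boundary, the “limit” between classes mentioned above.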
I worked in that field for many years. It displaced my first love, neural networks. Not intentionally, but in my work the two areas were in competition. In practice, however, they are not competitors; on the contrary, I think they are very complementary. In machine learning you can combine neural networks and support vector machines, and many people now combine them to create more powerful techniques.
That is, there was a turning point. Once enough data became available to train neural networks and other learning machines, machines matched human capacity. Sometimes they have even exceeded it, because humans are very limited when it comes to processing large amounts of data. For example, machines were trained to play Go by showing them hundreds of games, and they surpassed human capacity. It was a surprise when a machine beat the Go champion, because we thought that was still far away. And of course, this stirs up in people as many fears as dreams.
They fear that machines will become “superbeings”. But I believe this is a great opportunity: we should not be afraid of it, but rather take advantage of it and make it available to the largest possible segment of the population.
Yes, and I think we're at the beginning of the revolution, because these techniques are spreading widely, especially the algorithms that find patterns in data. Our phones and computers now carry many machine learning products that recognize faces or do automatic translation, for example. There are numerous applications of artificial vision thanks to convolutional neural networks, which were actually developed at Bell Labs when I was working there. And at the same time we work with support vector machines, because they are complementary.
For example, suppose you train a neural network to decompose an image into small segments that are then combined into larger patterns of lines and crossings. You need big databases to train it well. But what if you don't have much data, and the data you have is not the right kind? Imagine you want to train a system to recognize faces, but most of the images you have are images of objects, other kinds of data, and you only have a few images of children's faces. To train the system you then need a method based on examples, like support vector machines, rather than a method that has to learn its features from the data.
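As a rough illustration of this small-data regime, here is a sketch in which an example-based method, a support vector machine, is trained on only five labeled images per class; the standard digits dataset stands in for the face images in her example:

```python
# Training an example-based classifier from a handful of labeled images.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.svm import SVC

digits = load_digits()
X, y = digits.data, digits.target
rng = np.random.default_rng(0)

# Pretend only 5 labeled examples per class are available.
train_idx = np.hstack([rng.choice(np.where(y == c)[0], 5, replace=False)
                       for c in range(10)])
test_idx = np.setdiff1d(np.arange(len(y)), train_idx)

clf = SVC(kernel="rbf", gamma="scale").fit(X[train_idx], y[train_idx])
print("accuracy from 50 training images:", clf.score(X[test_idx], y[test_idx]))
```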
That's right, and we humans also have different ways of learning. For example, we have a long-term memory, which needs a lot of data and allows us to learn strategies for telling patterns apart. And we have a short-term memory, in which we simply learn examples by heart and then make decisions by comparison.
Yes, that has been very important, and it still is. We talk about big data, which means having a lot of data. But what data do we need? There are basically two sides to the question: a high number of examples, and a high number of features describing each one. In chemistry, for instance, a molecule can be described by thousands of features. Likewise, in biomedical research a patient can be described by thousands of features.
For example, if we measure the activity of all the genes, we study thousands of features. That is a different kind of big data: we don't have many examples, but each one has many features. Here support vector machines can be used, and they have been used a great deal in biomedicine, and now also in chemistry.
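A quick sketch of this “few examples, many features” regime, with simulated data standing in for gene-activity measurements (the numbers are illustrative):

```python
# 50 "patients", each described by 2000 simulated gene-activity features.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=50, n_features=2000,
                           n_informative=10, random_state=0)

# A linear SVM remains usable even when features far outnumber examples.
score = cross_val_score(SVC(kernel="linear"), X, y, cv=5).mean()
print(f"cross-validated accuracy: {score:.2f}")
```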
Yes. And the most interesting thing is that we have combined different disciplines: statistics, optimization and other traditional methods. Many people have joined forces over the last 20 years. Conventional statistical methods were sometimes unknown in computer science. And for people who have worked on other kinds of artificial intelligence, it is exciting that such powerful things can be done with numbers alone, by manipulating numbers and collecting lots of data.
But it's not black magic. If we have hundreds of thousands of features, how can we distinguish patterns? We try to find the features that are most characteristic of one thing or the other. Suppose we want to separate dogs from cows. Having four legs doesn't help, because both dogs and cows have four legs; but cows have horns and dogs don't. The algorithms look for that kind of feature. Ultimately, out of hundreds of thousands of numbers, you can simplify the problem by analyzing just the few that matter for your particular problem.
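One standard way to find “the few numbers that matter” is recursive feature elimination with a linear SVM (SVM-RFE, a method Guyon co-authored for gene selection); the sketch below uses synthetic data purely for illustration:

```python
# Recursive feature elimination: keep only the informative features.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

# 1000 features, of which only 5 actually carry class information.
X, y = make_classification(n_samples=100, n_features=1000,
                           n_informative=5, random_state=0)

# Repeatedly fit a linear SVM and drop the 20% least-weighted features.
selector = RFE(SVC(kernel="linear"), n_features_to_select=5, step=0.2)
selector.fit(X, y)
print("selected feature indices:",
      selector.get_support(indices=True).tolist())
```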
People often think the hard thing is having a lot of data, but the hardest thing is having little data. In fact, Vapnik's theory helped us a lot to understand that when we have little data, we need to use fairly simple models. Interestingly, the neural networks that cope with little data are small networks. There is a complex theory behind this, now called regularization theory: to work with little data, the key is not only the model you use, but also the way you train it.
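The effect she describes can be glimpsed by varying the regularization strength of a model trained on very few examples; in this sketch, the SVM's C parameter plays that role (small C means strong regularization, i.e. a simpler effective model):

```python
# With few examples, stronger regularization tends to generalize better.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Only 30 examples, 50 features: a small-data problem.
X, y = make_classification(n_samples=30, n_features=50,
                           n_informative=3, random_state=0)

for C in (0.01, 1.0, 100.0):
    score = cross_val_score(SVC(kernel="linear", C=C), X, y, cv=5).mean()
    print(f"C={C}: cross-validated accuracy = {score:.2f}")
```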
I am particularly interested in what we call few-shot learning, that is, systems that must learn from just a few examples. For these problems we organize competitions. That is my way of working: instead of me and my students doing all the work, we open the problem up to a large group of researchers. We pose problems and give anyone the chance to solve them. In this way, a system trained on other tasks can be made to tackle a new task.
Yes. GAN networks have revolutionized neural network training in recent years. People keep inventing new methods and new ideas to exploit them. One of the things we have done is generate realistic artificial data. One of the goals is to protect privacy: such data often raise privacy concerns or have commercial value, so they cannot simply be released. A big problem has been that some large companies have been denounced for releasing private data, so now they are very cautious. And that is bad for the research community, because researchers cannot study the most interesting problems and try to find solutions.
So I've worked with my colleagues in New York on methods based on GAN networks that generate realistic artificial data containing no information about individuals. These data preserve the statistical properties of the real data, so they are useful for research.
Thus, students can use them to train systems. The problem is that we would also like to use them to make real discoveries, and for that they do not yet suffice. If they kept all the properties of the real data, we could use them in research to make real discoveries. We are trying to progressively push the boundaries of what these realistic artificial data can do.
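To give a flavor of the underlying mechanism, here is a deliberately tiny GAN in PyTorch that learns to imitate a one-dimensional “real” distribution; actual privacy-preserving medical-data generators are far more elaborate, so treat this purely as a sketch of the idea:

```python
# Minimal GAN: a generator learns to mimic the real data distribution
# while a discriminator learns to tell real samples from generated ones.
import torch
import torch.nn as nn

def real_data(n):
    # Stand-in for "real records": samples from a Gaussian N(3, 0.5).
    return torch.randn(n, 1) * 0.5 + 3.0

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    # Discriminator step: real samples labeled 1, generated samples 0.
    real = real_data(64)
    fake = G(torch.randn(64, 8)).detach()
    loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator step: try to make the discriminator call fakes real.
    fake = G(torch.randn(64, 8))
    loss_g = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

# The generated samples should now roughly match the real mean and spread.
with torch.no_grad():
    samples = G(torch.randn(1000, 8))
print("synthetic mean/std:", samples.mean().item(), samples.std().item())
```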
Yes, in biomedicine we have created many fake medical records, because the information is very sensitive. In general we collaborated with companies holding sensitive data, but they did not allow us to export the data. Now, however, we can export models that generate data without crossing certain security or privacy limits. I hope this will serve the scientific community.