The AI Hunger Games - Why is modern Artificial Intelligence so data hungry? (Part I)

Tuesday, June 5, 2018

Guest Post written by Paulo Villegas - Head of Cognitive Computing at AURA in Telefónica CDO

Modern Artificial Intelligence is performing human-like tasks that seemed out of reach just a few years ago. Granted, we are talking about narrow AI (tasks involving only a small subset of human capabilities) – general AI is still far away. But on those narrow tasks we are seeing spectacular advances. The most salient results are in perception: visual perception (image recognition, which in tasks such as large-scale object recognition is achieving human-like performance) and audio perception (speech recognition is also achieving unprecedented results). But other noteworthy results have also made headlines, such as Google’s AlphaGo beating Go champions. There are also initial forays into ‘artistic’ traits such as painting styles or music composition.

Many of these advances are tightly related to developments in one particular area of Machine Learning: Deep Learning (statistical learning performed by neural nets with many layers). Deep Learning is achieving impressive results in many areas thanks to its versatility and the capability of modern networks to be trained very efficiently. But there is a catch: to achieve its magical performance, a deep learning instance typically needs to be trained with lots of data.

One typical example is ImageNet, the image database typically used to train deep learning classifiers for object recognition. ImageNet is large: it contains more than 14 million images. These are distributed into many classes: there are nearly 22,000 different classes in it (each class grouping the images that contain instances of a given entity). In order to be able to recognize, say, cats in images, we could gather the cat images in ImageNet and train a deep learning neural network with them (along with a varied and sizable collection of images that are not cats). How many images would we use? Including subcategories (such as Siamese cat, Angora cat, etc.) there are 22,387 images with cats in ImageNet. That’s indeed a lot of cats.

Figure 1. Everybody likes cat pictures, so here we go: the “cat” class in ImageNet
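
To make this concrete, here is a minimal sketch of how such a binary “cat vs. not-cat” training set could be assembled. It assumes PyTorch and torchvision and a hypothetical folder layout (data/cat, data/not_cat) – none of this is from the post itself, it is purely an illustration:

```python
# A minimal sketch, assuming PyTorch/torchvision and a hypothetical
# data/cat + data/not_cat folder layout (not part of the original post).
import torch
from torchvision import datasets, transforms

# Standard ImageNet-style preprocessing: resize, crop, convert, normalize.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# ImageFolder turns the two subfolders into a labelled binary dataset:
#   data/cat/*.jpg      -> the ~22,000 cat images
#   data/not_cat/*.jpg  -> a varied, sizable collection of non-cat images
dataset = datasets.ImageFolder("data", transform=preprocess)
loader = torch.utils.data.DataLoader(dataset, batch_size=64, shuffle=True)
```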

Modern AI is highly concerned with statistical pattern recognition, and that is a huge difference from older (“classic”) AI, which was highly symbolic. Back at the beginning of Artificial Intelligence research it was believed that AI was the realm of logic and reasoning. Good Old-Fashioned Artificial Intelligence (abbreviated to GOFAI) was all about establishing rules and reasoning over them, trying to emulate the higher levels of human thinking. This did not work. For perception tasks, GOFAI failed miserably, being incapable of coping with the sheer variety of reality, which is full of noisy instances not really suitable for “sharp” reasoning.

Nowadays it is accepted that tasks related to perception (making sense of the world around us) are much better solved with statistical machine learning, training systems with real examples of that world around us. But why should we need so many examples? Do humans need that many examples to recognize stuff?

I don’t have data at hand, but it seems unlikely that a human child needs to see 22,000 labeled instances of cats (i.e. having her parents and teachers show her 22,000 cats explicitly identified as cats) before she can recognize one. Of course, since each single cat is often presented to the child not as a still image but as a living animal, the child can probably see it from different angles and moving, which aids recognition. But still. Humans seem to need far fewer examples to be able to recognize things.

However, that comparison is not fair.

A Deep Neural Network prepared for visual identification starts as a blank slate. Yes, we fix its topology (number and shape of neuron layers, activation functions, etc.) and the training procedure (mini-batches, dropout, momentum, etc.). But the network parameters (neuron weights, of which a big deep learning network can have millions) start out uninitialized, or with random initial values.
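
As an illustration of that “blank slate” (again a PyTorch-style sketch, not taken from the post): everything about the architecture and the training recipe is chosen by hand, while the weights inside the layers begin as random numbers.

```python
import torch.nn as nn

# Topology fixed by hand: layers, activation functions, dropout.
# The weights inside these layers start out random - the "blank slate".
class SmallCNN(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(0.5),                       # part of the training recipe we choose
            nn.Linear(32 * 56 * 56, num_classes),  # 224x224 input -> 56x56 feature maps
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SmallCNN()
print(sum(p.numel() for p in model.parameters()), "randomly initialised parameters")
```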

From then on, the network is shown all the labelled training data (including our 22,000 cats) many times; at the end of each training epoch the system has learned from the examples seen and (usually) improves its recognition performance, step by step, until it reaches its final (impressive) success.
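
The training loop itself is just as mechanical. Here is a sketch reusing the hypothetical model and loader from the snippets above, with mini-batch SGD plus momentum as one common (assumed) choice of training procedure:

```python
import torch.nn.functional as F
from torch.optim import SGD

# Mini-batch SGD with momentum; `model` and `loader` are the hypothetical
# network and cat/not-cat data loader sketched earlier.
optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9)

for epoch in range(10):                  # one epoch = one full pass over the data
    correct, total = 0, 0
    for images, labels in loader:        # the network sees every labelled example...
        optimizer.zero_grad()
        logits = model(images)
        loss = F.cross_entropy(logits, labels)
        loss.backward()                  # ...and nudges its weights a little each time
        optimizer.step()
        correct += (logits.argmax(dim=1) == labels).sum().item()
        total += labels.size(0)
    print(f"epoch {epoch}: training accuracy {correct / total:.2%}")
```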

A human brain is nothing like a blank slate. 

Instead, it comes preconfigured with a lot of wiring. This wiring is imprinted in the human genome, which contains the instructions for building the brain. Those instructions have been shaped by millions of years of natural selection: evolution has selected the brain wirings that made us fitter for survival, among them wirings that amount to a huge number of “training epochs” for our neural circuits. We are not born with “random weights” in our brain, but with structures already well prepared for perception tasks. Among them, the recognition of cat-like animals – an ability likely to be very useful for survival, and hence a good trait to be acquired through natural selection.

Figure 2. A tiger with cubs: human brains better come pre-trained for fast cat identification … or else

And in that sense we could argue that the human brain comes pre-trained by evolution. That means our neural wiring has a great advantage over current AI systems, one they need to overcome by training on many examples.

In the next part we will talk about some ways to alleviate this data appetite of Deep Learning systems.


2 comments:

  1. Nice post! It would be interesting to compare intermediate outcomes of a child and a Neural Network in the process of learning what a cat is. In the beginning, a child might overgeneralise and call every four-legged animal a cat, only learning later to distinguish between a cat and a dog. What does a NN do?

  2. That's an intriguing comparison. I'd say that a NN (say, a CNN for visual classification) typically goes through two main stages of learning:
    * during the initial ramp-up it will produce egregious mistakes, like mixing up cats and locomotives, since the layers are still trying to adjust the main information flow patterns
    * then, in the refinement phase, once it has already achieved reasonable performance (say, 70-80%), errors will be more nuanced: not as semantic as a toddler's (like labeling all four-legged furry animals as cats), since that higher-level abstraction is (probably) not embedded in the net, but more like labeling as cats objects in which patches have a cat-like resemblance (i.e. it is more of a piecewise classification)

    In the NN case the confusion depends a lot on the negative examples (i.e. the non-cat images it is shown). I could imagine that for a child the negative examples come from all her background knowledge, which is probably more focused than the collections fed to the NN (i.e. the range of real-life scenes a typical child is exposed to is probably more restricted than an arbitrary database). This could help recognition, by letting her concentrate on more specific differences, but at the same time could make generalization more difficult for the child (e.g. when exposed to a never-seen-before animal). This is where semantic abstractions ("four-legged furry animal") would be most applicable.

    If we follow utility theory, we might also conclude that evolution could have shaped us towards taxonomy mistakes that are nevertheless useful and provide natural selection advantages. Mislabeling a jaguar as a tiger is a zoological error, but an evolutionary advantage: the outcome (get away now!) is still a good survival fit.
