De cerca, nadie es normal

On Master Algorithms, ML Schools of Thought, and Data Privacy

Posted: November 19th, 2021 | Filed under: Artificial Intelligence, Machine Learning


Although not a recent book (2015), The Master Algorithm by Pedro Domingos is a pleasant read, mainly as a basic pedagogical introduction to machine learning. As the author states in the book, “when a new technology is as pervasive and game changing as machine learning, it’s not wise to let it remain a black box. Opacity opens the door to error and misuse.” His effort to democratize this subfield of artificial intelligence is therefore welcome.

Professor Domingos is a machine learning practitioner, and his bias with respect to other approaches to artificial intelligence shows; that said, it is interesting how he divides and frames the different schools of thought within machine learning. From his standpoint, there are five schools:

  1. Symbolists: they view learning as inverse deduction and they take ideas from philosophy, psychology, and logic.
  2. Connectionists: they reverse engineer the brain and they are inspired by neuroscience and physics.
  3. Evolutionaries: they simulate evolution on the computer and they draw on genetics and evolutionary biology.
  4. Bayesians: they believe learning is a form of probabilistic inference and they have their roots in statistics.
  5. Analogizers: they learn by extrapolating from similarity judgements and they are influenced by psychology and mathematical optimization.

Each of the five tribes of machine learning has its own master algorithm, a general-purpose learner that you can in principle use to discover knowledge from data in any domain. The symbolists’ master algorithm is inverse deduction; the connectionists’ is backpropagation; the evolutionaries’ is genetic programming; the Bayesians’ is Bayesian inference; and the analogizers’ is the support vector machine.

For Symbolists, all intelligence can be reduced to manipulating symbols. Symbolists understand that you can’t learn from scratch: you need some initial knowledge to go with the data. Their master algorithm is inverse deduction, which figures out what knowledge is missing in order to make a deduction go through, and then makes it as general as possible.
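To make the idea concrete, here is a toy sketch of inverse deduction; the facts and the helper `induce_rules` are assumptions made up purely for illustration, not anything from the book. Given the fact that Socrates is human and the observation that Socrates is mortal, it proposes the missing general rule that would let the deduction go through.

```python
# Toy illustration of inverse deduction (hypothetical example):
# given known facts and an observed conclusion, propose the missing
# general rule that would let the deduction go through.

facts = {("Socrates", "is_human")}
observations = {("Socrates", "is_mortal")}

def induce_rules(facts, observations):
    """Propose rules 'whatever has P also has Q' that explain each observation."""
    rules = set()
    for entity, conclusion in observations:
        for fact_entity, premise in facts:
            if fact_entity == entity:
                # The missing knowledge: anything with `premise` also has `conclusion`.
                rules.add((premise, conclusion))
    return rules

print(induce_rules(facts, observations))
# {('is_human', 'is_mortal')}  i.e. "humans are mortal"
```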

Symbolist machine learning is an offshoot of the knowledge engineering school of AI. In the 1970s the so-called knowledge-based systems scored some impressive successes, and in the 1980s they spread rapidly, but then they died out. The main reason was the infamous knowledge acquisition bottleneck: extracting knowledge from experts and encoding it as rules is just too difficult, labor-intensive, and failure-prone. Letting the computer automatically learn to, say, diagnose diseases by looking at databases of past patients’ symptoms and the corresponding outcomes turned out to be much easier than endlessly interviewing doctors.

For Connectionists, learning is what the brain does. The brain learns by adjusting the strengths of connections amongst neurons, and the crucial problem is figuring out which connections are to blame for which errors and changing them accordingly. The connectionist master algorithm is backpropagation, which compares a system’s output with the desired one and then successively adjusts the connections in layer after layer of neurons, so as to bring the output closer to what it should be.
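As a rough illustration of that loop, the sketch below trains a tiny two-layer network on XOR with plain NumPy; the architecture, learning rate, and data are assumptions chosen for brevity, not anything prescribed in the book.

```python
import numpy as np

# Minimal backpropagation sketch: a 2-4-1 sigmoid network learns XOR by
# comparing its output with the desired one and adjusting weights layer by layer.

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 4))   # input  -> hidden weights
W2 = rng.normal(size=(4, 1))   # hidden -> output weights
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(10_000):
    # Forward pass
    h = sigmoid(X @ W1)
    out = sigmoid(h @ W2)
    # Backward pass: error at the output, then blame assigned to the hidden layer
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient step
    W2 -= 0.5 * h.T @ d_out
    W1 -= 0.5 * X.T @ d_h

print(out.round(2))   # typically approaches [[0], [1], [1], [0]]
```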

Connectionist representations are distributed, mirroring what happens in the human brain. Each concept is represented by many neurons and each neuron participates in representing many different concepts. Neurons that excite one another form a cell assembly. Concepts and memories are represented in the brain by cell assemblies. Each of these can include neurons from different brain regions and overlap with other assemblies.

The first formal model of a neuron was proposed by Warren McCulloch and Walter Pitts in 1943. It looked a lot like the logic gates computers are made of. McCulloch and Pitts’ neuron did not learn, though; for that, the connections amongst neurons needed variable weights, resulting in what is called the perceptron. Perceptrons were invented in the late 1950s by Frank Rosenblatt, a Cornell psychologist. In a perceptron, a positive weight represents an excitatory connection, and a negative weight an inhibitory one. The perceptron generated a lot of excitement: it was simple, yet it could recognize printed letters and speech sounds just by being trained with examples.
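A bare-bones version of the perceptron learning rule might look like the sketch below; the AND task and the number of training passes are arbitrary choices for illustration, not a historical reconstruction.

```python
import numpy as np

# A toy perceptron in the spirit of Rosenblatt's learning rule:
# positive weights act as excitatory connections, negative ones as inhibitory.

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
y = np.array([0, 0, 0, 1])                                   # logical AND

w = np.zeros(2)
b = 0.0
for _ in range(10):                 # a few passes over the examples
    for xi, target in zip(X, y):
        pred = int(w @ xi + b > 0)  # fire if the weighted sum exceeds the threshold
        w += (target - pred) * xi   # strengthen or weaken connections on mistakes
        b += (target - pred)

print([int(w @ xi + b > 0) for xi in X])   # [0, 0, 0, 1]
```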

In 1985 David Ackley, Geoff Hinton, and Terry Sejnowski replaced the deterministic neurons in Hopfield networks with probabilistic ones. A neural network then had a probability distribution over its states, with higher-energy states being exponentially less likely than lower-energy ones. One year later, in 1986, backpropagation was invented by David Rumelhart, a psychologist at the University of California, with the help of Geoff Hinton and Ronald Williams.
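That “exponentially less likely” relationship is the Boltzmann distribution; the sketch below shows it with made-up energies and a made-up temperature, purely to illustrate the shape of the distribution.

```python
import numpy as np

# Boltzmann idea in a few lines: a state with energy E has probability
# proportional to exp(-E / T), so higher-energy states are exponentially
# less likely than lower-energy ones. (Numbers are hypothetical.)

energies = np.array([0.0, 1.0, 2.0, 4.0])   # energies of four network states
T = 1.0                                      # "temperature"
p = np.exp(-energies / T)
p /= p.sum()                                 # normalize into a distribution
print(p.round(3))                            # probability drops exponentially with energy
```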

Evolutionaries believe that the mother of all learning is natural selection. The master algorithm is genetic programming, which mates and evolves computer programs in the same way that nature mates and evolves organisms. Whilst backpropagation entertains a single hypothesis at any given time and the hypothesis changes until it settles into a local optimum, genetic algorithms consider an entire population of hypotheses at each step, and these can make big jumps from one generation to the next thanks to crossover. Genetic algorithms are full of random choices; they make no a priori assumptions about the structures they will learn, other than their general form.
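The sketch below shows the population-plus-crossover idea on the classic OneMax toy problem (maximize the number of ones in a bit string). It is a plain genetic algorithm rather than full genetic programming, and the population size, mutation rate, and other parameters are arbitrary choices for illustration.

```python
import random

# A minimal genetic algorithm: a population of bit-string "hypotheses"
# evolves via selection, crossover, and mutation; crossover lets offspring
# jump far from either parent in a single generation.

random.seed(0)
LENGTH, POP, GENS = 20, 30, 40
fitness = lambda bits: sum(bits)                      # OneMax: count the 1s

population = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP)]
for _ in range(GENS):
    # Select the fitter half as parents
    parents = sorted(population, key=fitness, reverse=True)[:POP // 2]
    offspring = []
    while len(offspring) < POP:
        a, b = random.sample(parents, 2)
        cut = random.randrange(1, LENGTH)             # one-point crossover
        child = a[:cut] + b[cut:]
        if random.random() < 0.1:                     # occasional point mutation
            child[random.randrange(LENGTH)] ^= 1
        offspring.append(child)
    population = offspring

print(max(fitness(ind) for ind in population))        # close to LENGTH (20)
```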

Bayesians are concerned above all with uncertainty. The problem then becomes how to deal with noisy, incomplete, and even contradictory information without falling apart. The solution is probabilistic inference, and the master algorithm is Bayes’ theorem and its derivatives. Bayes’ theorem is just a simple rule for updating your degree of belief in a hypothesis when you receive new evidence: if the evidence is consistent with the hypothesis, the probability of the hypothesis goes up; if not, it goes down.
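A worked update with hypothetical numbers shows the rule in action: a positive result from an imperfect diagnostic test raises belief in the disease hypothesis, but only in proportion to how rare the disease was to begin with.

```python
# Bayes' theorem as a belief update (numbers invented for illustration):
# P(hypothesis | evidence) = P(evidence | hypothesis) * P(hypothesis) / P(evidence)

prior = 0.01                 # P(disease): degree of belief before the test
likelihood = 0.95            # P(positive test | disease)
false_positive = 0.05        # P(positive test | no disease)

evidence = likelihood * prior + false_positive * (1 - prior)   # P(positive test)
posterior = likelihood * prior / evidence                      # updated belief

print(round(posterior, 3))   # ~0.161: the test raises belief from 1% to about 16%
```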

For Analogizers, the key to learning is recognizing similarities between situations and thereby inferring other similarities. The analogizers’ master algorithm is the support vector machine, which figures out which experiences to remember and how to combine them to make new predictions. The nearest-neighbor algorithm, which predates the support vector machine, was the first preferred option for analogy-based learning.
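The nearest-neighbor idea fits in a few lines: remember the training examples and, for a new case, copy the label of the most similar one. The toy data below is invented purely for illustration.

```python
import numpy as np

# Nearest-neighbor classification: predict for a new point by recalling
# the most similar remembered experience (similarity = Euclidean distance).

X_train = np.array([[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [7.5, 8.5]])
y_train = np.array(["cat", "cat", "dog", "dog"])

def nearest_neighbor(x_new):
    distances = np.linalg.norm(X_train - x_new, axis=1)  # distance to each memory
    return y_train[np.argmin(distances)]                 # copy the closest label

print(nearest_neighbor(np.array([7.0, 9.0])))            # "dog"
```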

Up to the late 1980s, researchers in each tribe mostly believed their own rhetoric, assumed their paradigm was fundamentally better, and communicated little with the other schools. Today the rivalry continues, but there is much more cross-pollination. For Professor Domingos, the best hope of creating a universal learner lies in synthesizing ideas from different paradigms. In fact, just a few algorithms are responsible for the great majority of machine learning applications.

As a coda to his pedagogical explanation of machine learning, Professor Domingos’s views on data privacy are worth highlighting. From his standpoint, our digital future begins with a realization: every time we interact with a computer, whether it’s a smartphone or a server thousands of kilometers away, we do so on two levels. The first is getting what we want there and then: an answer to a question, a product we want to buy, a new credit card. The second level, in the long run the most important one, is teaching the computer about us. The more we teach it, the better it can serve us, or manipulate us.

Life is a game between us and the learners that surround us. We can refuse to play, but then we will have to live a twentieth-century life in the twenty-first. Or we can play to win. What model of us do we want the computer to have? And what data can we give it that will produce that model? Those questions should always be in the back of our minds whenever we interact with a learning algorithm, just as they are when we interact with other people.