The Master Algorithm gives readers an overview of the current state of machine learning. The book’s hypothesis is that there can be one general-purpose algorithm that’s able to learn anything. The book reviews the philosophies, strengths, and weaknesses of each of the “five tribes” of machine learning and, in the process, readers learn the history of artificial intelligence (AI). The Master Algorithm concludes with some thoughts about a future when computers surpass the capabilities of the human brain for tasks that previously required conscious beings. Will learners solve all of our problems and bring about the technological singularity, or will our future learning machines become our overlords in a Skynet-like dystopia?
The book is intriguing and informative but also sometimes bewildering and frustrating. The text is written in a fanciful manner, which made it fun; however, it often doesn’t fully explain concepts. Consequently, I needed Google and Wikipedia to fill in the gaps. The book’s author, Pedro Domingos, is a respected machine learning expert. Read this book if you want a comprehensive overview of machine learning and don’t mind resorting to other sources for background information about unfamiliar topics. If you already have a solid background in machine learning or statistics, you may enjoy this book even more than novices do and still gain valuable insights.
Taming the Complexity Monster
If you’ve been involved in software development over the past decade or two, then you know that the resources available to developers have increased significantly. Software engineers build on each other’s efforts and often contribute their work to free open source projects, resulting in platforms that enable us to develop ever more sophisticated systems faster and at a lower cost. The problem that we encounter on our way to software nirvana is something Domingos calls the “complexity monster,” which he likens to a multi-headed hydra. Space complexity renders an algorithm useless when the bits required to run an algorithm exceed a computer’s memory. Time complexity destroys usefulness when it takes too long for algorithms to give us answers. But the most troublesome complexity monster rears its head when the task we’re trying to solve is too complicated for our feeble human brains. Machine learning can rescue us from that complexity monster. Traditional computer programs take input data, then use an algorithm to process that data, and finally produce results. With machine learning, the input is both the data and the results, while the output is the algorithm. In that way, learners are programs that write themselves.
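That inversion is easy to sketch. Here’s a toy Python example of my own (not code from the book) contrasting a hand-written program with a learned one; the function names are invented for illustration:

```python
# Traditional programming: a human writes the algorithm, data goes in, results come out.
def double(x):          # the algorithm, written by hand
    return 2 * x

results = [double(x) for x in [1, 2, 3]]   # [2, 4, 6]

# Machine learning: the data AND the results go in; the algorithm comes out.
def learn_linear_rule(inputs, outputs):
    """Fit y = w * x by least squares; the returned function IS the learned program."""
    w = sum(x * y for x, y in zip(inputs, outputs)) / sum(x * x for x in inputs)
    return lambda x: w * x

learned = learn_linear_rule([1, 2, 3], [2, 4, 6])
# learned now behaves like the hand-written double(): learned(10) == 20.0
```

No human ever wrote the learned function’s rule down; it was recovered from the data and the results.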
Learners Will Cure Cancer
An algorithm for curing cancer is one that Domingos says would be too complex for humans to develop without machine learning. Such an algorithm could instantiate models of cells in the human body and anticipate how those cells interact. We would encode the algorithm with what we already know about molecular biology and input vast amounts of data from biomedical research, including DNA sequences, patient histories, and so on. When an individual gets cancer, the program would instantiate models of that patient’s normal and cancerous cells. It would then try all available drugs against the models before selecting a treatment most likely to destroy the tumor without harming the patient’s normal cells. Holy sh##!
Overfitting is a Universal Problem
Overfitting is the most common problem in machine learning. It happens when algorithms “hallucinate”; in other words, they detect relationships that aren’t there. As models include more parameters or increase in complexity, overfitting happens more often. Suppose you’re a human resources manager who needs to replace Joe, a key employee who just quit. So you write the following in a job ad: “Wanted: Software Engineer. Must have BS and MS degrees in Computer Science from Boston College with five and one-half years of Python experience. Must be a 36-year-old male with reddish-brown hair, own a German Shepherd named Schatzie, and drive a dilapidated red Honda Civic.” That job ad overfits because while it accurately describes Joe (i.e., the training data), many of the attributes listed are irrelevant for finding a qualified replacement. Stated another way, the ad is a good fit for a training data set, but it doesn’t capture the relevant dimensions required to fit new data sets. How to tune algorithms to reduce overfitting is a challenge that all tribes in the machine learning community must address.
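The job-ad example can be caricatured in code. This is my own illustrative sketch with made-up data; the attribute names are hypothetical:

```python
# Two training examples: Joe (hired, great) and someone who wasn't a fit.
train = [
    ({"degree": "MS CS", "hair": "red", "dog": "yes"}, "hire"),
    ({"degree": "none",  "hair": "red", "dog": "yes"}, "pass"),
]

def overfit_model(candidate):
    """Memorizes every attribute of the training examples -- like the job ad."""
    for attrs, label in train:
        if attrs == candidate:
            return label
    return "unknown"          # fails on anyone who isn't an exact match for Joe

def simple_model(candidate):
    """Uses only the attribute that plausibly matters."""
    return "hire" if candidate["degree"] == "MS CS" else "pass"

# A new, qualified candidate who doesn't share Joe's irrelevant traits:
new_candidate = {"degree": "MS CS", "hair": "brown", "dog": "no"}
# overfit_model(new_candidate) -> "unknown"; simple_model(new_candidate) -> "hire"
```

The memorizer scores perfectly on the training data and uselessly on everyone else, which is overfitting in a nutshell.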
“With machine learning, the input is both the data and the results, while the output is the algorithm.”
The Five Tribes of Machine Learning
Each of the five tribes of machine learning can cite successful implementations. But each tribe also has weaknesses, and none yet can lay uncontested claim to the master algorithm. The tribes have always competed with one another and have traded places over time when one algorithm gains momentum while others fall from grace. Domingos delves into some of the personalities (and their egos) who drove the movement. The following subsections summarize the characteristics of each tribe.
The symbolists tribe employs an algorithm known as inverse deduction, which is another way to say induction. Whereas deduction uses general rules to reach conclusions about instances, induction starts with specific examples to arrive at general rules. If, for example, all four people who just came out of a room have blue eyes, then the algorithm would induce that everyone in that room has blue eyes.
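The blue-eyes example can be written as a tiny induction routine (my own toy code, not the book’s):

```python
# Observed instances: four people who just left the room.
observed = [
    {"person": "Ann",  "eyes": "blue"},
    {"person": "Bob",  "eyes": "blue"},
    {"person": "Cara", "eyes": "blue"},
    {"person": "Dan",  "eyes": "blue"},
]

def induce_rule(examples, attribute):
    """If every observed instance shares a value, propose it as a general rule."""
    values = {ex[attribute] for ex in examples}
    if len(values) == 1:
        return f"everyone in the room has {values.pop()} {attribute}"
    return None   # no single shared value, so no rule is induced

rule = induce_rule(observed, "eyes")   # "everyone in the room has blue eyes"
```

Of course, the induced rule may be wrong (the fifth person’s eyes could be brown), which is exactly the risk induction accepts that deduction doesn’t.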
Symbolists enjoyed successes in the 1970s and 80s with the expert systems they developed during that period. But as Domingos points out, those types of systems suffer from a knowledge acquisition bottleneck. Obtaining rules from experts and encoding those rules in computer programs is far too laborious and error-prone a process to make this approach a suitable master algorithm. As Domingos noted, “Letting a computer learn to, say, diagnose a disease by looking at a database of past patients’ symptoms and the corresponding outcomes turned out to be much easier than endlessly interviewing doctors.” A more independent learning system like that is only effective with a lot of training data (e.g., a large patient database). Keeping that in mind, consider the following chart:
According to the preceding chart, humans created 5 exabytes of data from the beginning of time until 2003. By 2015, we produced that amount of data every two minutes. In addition to YouTube cat videos, that data includes the Web links we click on, MRI results, geopositioning data collected by cell phone companies, and much more. Clearly, we now have a great deal of data that we could use to train our learning algorithms. The combination of more advanced learning algorithms and the data we need to teach those algorithms has made increasingly useful self-writing programs a reality. The torrent of training data also helped other tribes, most notably the connectionists, to eclipse the symbolists.
The connectionists tribe models their learners after the brain. Those learners include a software representation of interconnected neurons (a.k.a. a neural network). The connections are like the synapses we have in our brains. This tribe uses a backpropagation algorithm to give neurons feedback on whether they are generating good results. Based on that feedback, the network adjusts its connection weights, which is how neural networks learn. In real life, something like backpropagation (i.e., reinforcement) happens when, for example, a toddler learns that he can get his favorite toy by screaming at the top of his lungs. But a toddler (and many other animals) can also learn things from one-off events, like when he learns not to touch a hot stove. Backpropagation does not handle those one-time learning events. Moreover, even though neural networks have made tremendous progress, they still do not approach the complexity of the human brain. For example, a neural network can learn to recognize a cat in pictures like those it was trained on, while a person can effortlessly identify the same cat from angles they’ve never seen before.
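Full backpropagation chains this feedback through many layers, but the core loop fits in a one-neuron sketch. This is my own minimal example, assuming a single-weight “network” learning the rule output = 2 × input:

```python
# A one-neuron "network" trained by gradient descent -- backpropagation in miniature.
w = 0.0                                   # the single connection weight, initially ignorant
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # (input, target) pairs

for _ in range(200):                      # repeated rounds of feedback
    for x, target in examples:
        output = w * x                    # forward pass: the neuron's guess
        error = output - target          # feedback: how wrong was it?
        w -= 0.05 * error * x            # nudge the connection weight downhill

# After training, w is close to 2.0: the neuron has "learned" the rule.
```

Real networks do the same thing with millions of weights and the error propagated backward through layers, hence the name.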
The term deep learning is most often associated with connectionists. The connectionists tribe currently has the greatest momentum in AI. Google, Facebook, Amazon, and others are investing heavily in connectionist projects. Google, for example, paid half a billion dollars to acquire DeepMind, a London-based deep learning startup that had no customers, no revenue, and just a few employees. Notable areas where neural networks perform well include handwriting and speech recognition (e.g., “Alexa, play something that I would like”).
Like the connectionists, the evolutionaries tribe draws its inspiration from nature. They believe that the best learners mimic natural selection. This tribe uses what they call, unsurprisingly, genetic algorithms. Things that work are allowed to reproduce, and also mutate, creating a new set of candidates that compete for survival in subsequent generations. Unsuccessful features die childless. Genetic algorithms have been very successful in robotics.
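A minimal genetic algorithm can be shown with the classic “OneMax” toy problem (my own example, not from the book): evolve a bit string toward all ones, where fitness is simply the count of ones.

```python
import random

random.seed(0)
GENES = 16

def fitness(individual):
    return sum(individual)                 # more ones = fitter

# A random starting population of 20 bit strings.
population = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(20)]

for generation in range(100):
    # Selection: the fitter half survives to reproduce.
    population.sort(key=fitness, reverse=True)
    parents = population[:10]
    # Reproduction: crossover of two parents, with occasional mutation.
    children = []
    for _ in range(10):
        a, b = random.sample(parents, 2)
        cut = random.randrange(GENES)
        child = a[:cut] + b[cut:]          # crossover
        if random.random() < 0.2:          # mutation
            i = random.randrange(GENES)
            child[i] = 1 - child[i]
        children.append(child)
    population = parents + children

best = max(population, key=fitness)        # typically all (or nearly all) ones
```

Nothing in the loop “knows” the goal is all ones; selection pressure alone pushes the population there, which is the tribe’s whole point.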
Genetic algorithms are particularly vulnerable to the exploration-exploitation dilemma. If we know that something works, should we stick with it or try something new? In nature, life forms regularly mutate. Most mutations are harmful in that they reduce the probability that the mutated organism will successfully pass on the mutation to its offspring (or even survive). But the rare helpful mutation will give the organism an advantage, resulting in the mutation spreading to future generations and sometimes throughout the species’ population. A subfield of machine learning includes algorithms that explore on their own. We refer to those algorithms as reinforcement learners. Your future home robot will probably use reinforcement learning. The first thing it might do after you take it out of the box will be to roam around your kitchen discovering where everything is. Reinforcement learners address the exploration-exploitation dilemma by sometimes making a random choice as opposed to the best-known choice. For example, you and your robot may one day learn that you love chorizo eggs Benedict, even if you’ve never asked for it.
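The robot’s explore-versus-exploit strategy is often implemented as an epsilon-greedy rule. Below is a toy sketch of my own, with invented dish names and made-up reward probabilities:

```python
import random

random.seed(1)

# Hypothetical probability that each breakfast delights you (unknown to the learner).
true_reward = {"oatmeal": 0.4, "pancakes": 0.6, "chorizo_benedict": 0.9}
estimates = {dish: 0.0 for dish in true_reward}   # the learner's running estimates
counts = {dish: 0 for dish in true_reward}

def choose(epsilon=0.1):
    if random.random() < epsilon:                  # explore: try something random
        return random.choice(list(true_reward))
    return max(estimates, key=estimates.get)       # exploit: the best-known choice

for _ in range(2000):
    dish = choose()
    reward = 1.0 if random.random() < true_reward[dish] else 0.0
    counts[dish] += 1
    # Running-average update of the estimated value of this dish.
    estimates[dish] += (reward - estimates[dish]) / counts[dish]

best_dish = max(estimates, key=estimates.get)      # the dish the learner now prefers
```

The 10% random choices are what let the learner stumble onto chorizo eggs Benedict at all; pure exploitation would have locked onto the first dish that ever paid off.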
Domingos argues that nature-inspired algorithms alone are unlikely to solve our most pressing problems, such as curing cancer. That’s because while backpropagation and genetic algorithms can produce excellent results, some results will probably be suboptimal. He cites an inside joke among molecular biologists who say that the biology of cells is such a mess that only people who are unfamiliar with it could believe in intelligent design.
Reverend Thomas Bayes lived in early 18th-century England. Without knowing it, Bayes invented a new way to think about chance. Pierre-Simon de Laplace, a French mathematician who was born almost 50 years after Bayes, first codified Bayes’ ideas into what we now call Bayes’ theorem.
Bayesians use the probabilistic inference algorithm. It describes the probability of an event based on prior knowledge of conditions that could be related to the event. Bayesians, therefore, start with a set of hypotheses that could explain the data. Then they test those hypotheses. They use Bayes’ theorem to update probabilities in response to test results. For example, you might start with a hypothesis that an individual has a certain percent chance of having AIDS. If that person is an IV drug user, the probability goes up. If they test negative for AIDS, the probability goes down. Probabilistic inference adjusts for new information that impacts probabilities. Crucially, it also considers the effects of false positives and false negatives in its probability calculations.
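Bayes’ theorem itself is compact enough to sketch. The numbers below are purely illustrative (made up for this post, not clinical data), but they show how the update accounts for false positives and false negatives:

```python
def bayes_update(prior, sensitivity, false_positive_rate, test_positive):
    """Return P(disease | test result) via Bayes' theorem."""
    if test_positive:
        numerator = sensitivity * prior
        evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    else:
        numerator = (1 - sensitivity) * prior
        evidence = (1 - sensitivity) * prior + (1 - false_positive_rate) * (1 - prior)
    return numerator / evidence

prior = 0.01          # hypothetical base rate: 1% of the population has the disease
posterior_pos = bayes_update(prior, sensitivity=0.99,
                             false_positive_rate=0.02, test_positive=True)
posterior_neg = bayes_update(prior, sensitivity=0.99,
                             false_positive_rate=0.02, test_positive=False)
# posterior_pos ~= 0.33: even a positive result from a 99%-sensitive test leaves the
# probability well below certainty, because false positives swamp a rare condition.
```

That counterintuitive one-in-three result is exactly the kind of reasoning the Bayesian tribe says every learner should be doing.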
Bayesians have been successful at implementing spam filters. The filters determine the probability that a message is spam by considering specific words that occur in the text, evaluating all of the words, and looking for rare words. You can also tune Bayesian spam filters for individual users.
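A bare-bones version of such a filter is a naive Bayes classifier over word counts. The messages below are toy data I made up; real filters train on millions of messages:

```python
import math

spam_msgs = ["win money now", "free money offer", "win free prize"]
ham_msgs  = ["meeting at noon", "lunch at noon tomorrow", "project meeting notes"]

def word_counts(messages):
    counts = {}
    for msg in messages:
        for word in msg.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

spam_counts, ham_counts = word_counts(spam_msgs), word_counts(ham_msgs)
spam_total = sum(spam_counts.values())
ham_total = sum(ham_counts.values())

def spam_score(message):
    """Log-probability ratio: positive means 'looks like spam'. Laplace-smoothed
    so that words never seen in one class don't zero out the whole product."""
    vocab = set(spam_counts) | set(ham_counts)
    score = 0.0
    for word in message.split():
        p_spam = (spam_counts.get(word, 0) + 1) / (spam_total + len(vocab))
        p_ham = (ham_counts.get(word, 0) + 1) / (ham_total + len(vocab))
        score += math.log(p_spam / p_ham)
    return score

# spam_score("free money") > 0, while spam_score("meeting tomorrow") < 0
```

Per-user tuning amounts to adding that user’s own messages to the counts, which is why Bayesian filters personalize so naturally.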
Bayesian learning relies on probability. Domingos notes that “Most experts believed that unifying logic and probability was impossible.” That made the prospects of finding a master algorithm “not look good,” particularly when learners need to combine multiple data sets. In the book, Domingos recounts how he was frustrated when he tried to implement logic in a Bayesian model.
Members of the analogizers tribe argue that entities with similar known properties have similar unknown ones. For example, if you liked the movie Pulp Fiction, and a lot of people who enjoyed Pulp Fiction also liked No Country for Old Men, then you’ll probably like No Country for Old Men, too. Netflix and Amazon have used the nearest neighbor method for making recommendations.
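The nearest neighbor idea fits in a few lines. This is a toy sketch with invented users and ratings, not how Netflix or Amazon actually implement it:

```python
# Star ratings (1-5) from a handful of made-up users.
ratings = {
    "alice": {"Pulp Fiction": 5, "No Country for Old Men": 5, "Fargo": 5, "Titanic": 2},
    "bob":   {"Pulp Fiction": 4, "No Country for Old Men": 5},
    "carol": {"Titanic": 5, "Pulp Fiction": 1},
}

def similarity(a, b):
    """Count of movies two users rated within one star of each other."""
    shared = set(ratings[a]) & set(ratings[b])
    return sum(1 for m in shared if abs(ratings[a][m] - ratings[b][m]) <= 1)

def recommend(user):
    """Suggest the unseen movie best liked by the user's nearest neighbor."""
    neighbor = max((u for u in ratings if u != user),
                   key=lambda u: similarity(user, u))
    unseen = {m: r for m, r in ratings[neighbor].items() if m not in ratings[user]}
    return max(unseen, key=unseen.get) if unseen else None

# recommend("bob") -> "Fargo": bob's nearest neighbor is alice, who rated it 5
```

The learner never analyzes the movies themselves; it reasons purely by analogy between people, which is the tribe’s signature move.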
Analogizers use a type of algorithm called a support vector machine (SVM). Vladimir Vapnik, a Russian émigré and Bell Labs researcher, invented the SVM approach during the 1990s. SVMs dominated machine learning by about 2000; however, fancier deep learning algorithms have eclipsed SVMs in recent years.
Domingos says that the curse of dimensionality affects all learners; however, learners that employ the nearest neighbor approach are particularly vulnerable. If the model only includes a few dimensions, the curse is not a problem. But the training data we have today can include numerous dimensions. The likelihood that “Joe” has cancer is probably most strongly related to only a few factors. But our training data may tell us a million things about Joe, making it more likely that irrelevant factors creep into our model. As the number of attributes in your model goes up, the number of examples you need to find in your training data grows exponentially. As Domingos puts it, “With twenty Boolean attributes, there are roughly a million different possible answers. With twenty-one, there are two million…”
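The arithmetic behind that quote is easy to check: each additional Boolean attribute doubles the number of possible attribute combinations a learner must somehow cover with examples.

```python
# Each Boolean attribute doubles the space of possible combinations.
combos_20 = 2 ** 20      # 1,048,576 -- "roughly a million"
combos_21 = 2 ** 21      # 2,097,152 -- "two million"
```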
Domingos suggests that an approach he researched, called Markov Logic Network (MLN), is the most promising candidate so far for the master algorithm. He believes that the master algorithm must provide unified support for both logic and probability. To be honest, I’d probably have to read a great deal more than the few pages he devoted to explaining MLNs in his book before I could even begin to reach my own judgment on that.
Understanding machine learning is hard. Doing so requires knowledge of numerous abstract concepts. Advanced degrees in math and computer science wouldn’t hurt either. As far as I can tell, Domingos is the only author to even attempt to provide a comprehensive overview of the area. That required both smarts and courage. To make the book more accessible, Domingos left out almost all of the math. Ironically, that could have made the book a little harder to follow for people like me who are uncomfortable reading about technical topics without understanding the details. Although I don’t believe he was entirely successful, Domingos did a pretty darn impressive job of tackling a difficult challenge. For me, the book was still well worth reading.
If you think that trends like automation, outsourcing, and globalization have cost workers a lot of jobs in the past few years, then you’re in for a shock. Machine learning is going to bring about far more drastic changes to our economy and careers in the next decade. And the changes won’t just impact Uber drivers; even highly skilled professionals, like attorneys, doctors, and financial planners, are going to see significant changes. Silicon Valley billionaires are already talking about socialist-sounding concepts like Universal Basic Income (UBI). Domingos predicts that instead of measuring the unemployment rate, soon we’ll be following the employment rate. While the potential benefits of machine learning (e.g., curing cancer) could not be more exciting, it will also cause a great deal of economic and social turmoil.
I’ll be doing a lot of studying in this area.
If you want to learn more about machine learning, then check out these links:
- Coursera course on machine learning by Stanford University.
- Calculus for Dummies – You’ll need it (and probably linear algebra as well) to really understand the math in machine learning.
If you want to get your hands dirty with machine learning, then you might want to also learn how to write code in Python. I’m not tackling the Stanford Course just yet. But I am teaching myself Python 3 (on a very low priority thread) via a Lynda course. Then I’m going to write some machine learning code myself when I take Machine Learning Essential Training.
While writing this post, I found the searchable online version of The Master Algorithm on Google Books to be very useful. The book also includes an unusually good Further Readings section at the end.
Another book that I’m planning to read is Superintelligence: Paths, Dangers, Strategies.