Earlier this year, an essay by Gary Marcus threw some cold water on the idea that humanity is on the brink of general artificial intelligence. Specifically, Marcus targeted Deep Learning, the method underlying modern neural networks. Marcus makes a series of critiques, but one prominent one (discussed earlier) is that neural networks have abysmal judgment for problems that lie outside the borders of their training set. This doesn’t mean they can’t be very good inside their training set. But give them a problem not well approximated by a mix of the features of examples they are trained on, and they fall to pieces.
On this (and other) dimensions, neural networks fare worse than human beings. But how important is this? In this post, I want to think a little about how far you can get with purely “interpolative innovation.” By interpolative innovation, I mean innovation that consists in the discovery of new things that are “mixes” of preexisting things. A lot of innovation falls under this category.
A nice 2005 paper by economist Ola Olsson serves as a roadmap. Olsson’s “Technological Opportunity and Growth,” published in the Journal of Economic Growth (paywall, sorry), makes no mention of neural networks. It was meant to be a paper about how innovation in general happens. But it nicely illustrates how the dynamics of primarily interpolative innovation might play out in the long run, and how interpolative innovation can come to look like it evades Marcus’ critique: it (seemingly) breaks free of its training set.
The Technology Space
Olsson asks us to imagine a high-dimensional space. Each of the dimensions in this space corresponds to some kind of attribute that an idea might have, and which might be measured by a number. Olsson suggests dimensions could correspond to things like “complexity”, “abstractness”, “mathematics”, “utility”, and so on. Scattered throughout this space are specific technologies, represented as points. Essentially, he asks us to imagine the human technological system as a cloud of points, where each point corresponds to a technology and a point’s position tells us about its features. Technologies with a lot of similar features are closely bunched together, and technologies with very different features are distant.
This can be mapped into a neural network setting pretty easily. For a neural network to work with data, the features of the data need to be fed to specific neurons as numbers. We can imagine those numbers correspond to positions along axes in Olsson’s technology space. Just as the set of technologies floats out there as a cloud of points in high-dimensional space, so too does the set of training examples. Examples that have very similar features are close together, and examples with very different features are far apart.
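To make this concrete, here is a minimal sketch of the idea in Python. The technology names, the three dimensions, and every feature value are invented for illustration; the only point is that “similarity” becomes distance in the space.

```python
import math

# A toy version of Olsson's technology space. Each technology is a point
# whose coordinates are hypothetical scores on dimensions like
# "complexity", "abstractness", and "utility" (all values made up).
technologies = {
    "water_wheel":  (0.3, 0.1, 0.7),
    "steam_engine": (0.5, 0.2, 0.9),
    "transistor":   (0.9, 0.8, 0.95),
}

def distance(a, b):
    """Euclidean distance between two points in the technology space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Technologies with similar features sit close together; technologies
# with very different features are far apart.
close = distance(technologies["water_wheel"], technologies["steam_engine"])
far = distance(technologies["water_wheel"], technologies["transistor"])
assert close < far
```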
Innovation in the Technology Space
In Olsson’s model, all innovation is an interpolation between existing technologies. To begin, he defines incremental innovation as the discovery of a new technological point lying on a line connecting two already-known technologies that are “close” together. Essentially, we add new points to our cloud, but we can only add points in the spaces between existing technologies. However, as we innovate, we add new technologies, and these give us new possibilities for combination. If incremental innovation were all there was, then in the long run we would eventually fill up all the gaps between technologies. In technical parlance, we would be left with the convex hull of the technologies (at least, of those close enough to one another to eventually be fully connected). The convex hull is the region such that no line segment drawn between points in it falls outside the region.
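Incremental innovation, in this geometric picture, is just a convex combination of known points. A quick illustrative sketch (the two technologies and their coordinates are made up):

```python
import random

def interpolate(a, b, t):
    """The point a fraction t of the way along the line from a to b."""
    return tuple(x + t * (y - x) for x, y in zip(a, b))

# Two "close" known technologies in a two-dimensional feature space.
a, b = (0.2, 0.5), (0.4, 0.6)

# Incremental innovation: discover a new point somewhere on the line
# between them.
new_tech = interpolate(a, b, random.random())

# Because the new point is a convex combination of old points, each of
# its coordinates stays inside the range spanned by the old points.
# Repeat this forever and you never leave the convex hull.
for coord, x, y in zip(new_tech, a, b):
    assert min(x, y) <= coord <= max(x, y)
```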
This is precisely Marcus’ critique of neural networks. They cannot extrapolate, and they cannot go beyond their training data. At best, they can recover the convex hull of their training set. Additionally, note that Olsson assumes it’s only possible to interpolate in between technologies that are already “close” in technological space. This is reminiscent of the way that neural networks need to be tuned to the kind of data they receive. For instance, the inputs to image classification neural nets differ dramatically from AlphaGo’s inputs, and the networks cannot transfer what they’ve learned in one domain to another (indeed, this is another of Marcus’ critiques). So we might imagine, in Olsson’s framework, that neural networks are only capable of interpolating between very similar (i.e., “close”) sets of technologies.
Olsson adds to his model the assumption that every once in a while, purely by random chance, serendipitous discoveries are made. These are points in the technological space that simply appear on the scene. By chance, some of them will lie outside the convex hull described above. We can imagine these correspond to incidents like Fleming’s lucky observation that a stray bit of penicillin mould had retarded the growth of bacteria in a petri dish. Or maybe they correspond to scientific anomalies in Thomas Kuhn’s sense.
So long as incremental innovation is feasible, researchers ignore these discoveries. However, at some point, incremental innovation begins to exhaust its possibilities. All possible combinations have been discovered. When this is the case, researchers turn their attention to these discoveries and engage in what Olsson calls radical innovation. He assumes this is riskier or more costly, and therefore the less favored choice of researchers. However, when incremental innovation is not possible, they have no choice but to turn to radical innovation.
Radical innovation is the creation of a new technology lying on a line between an existing technology and one of the serendipitous discoveries lying outside the convex hull. After much radical innovation, there are enough new technologies close enough to existing ones for incremental innovation to begin again. This time, the incremental innovation makes use of the technologies discovered by radical innovation. In this way, innovation proceeds, breaking free of its old paradigm by exploiting and developing serendipitous discoveries.
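In the same geometric spirit, radical innovation is interpolation toward a point outside the hull, which leaves the new technology outside the old hull too. A sketch, with all coordinates invented for illustration:

```python
def interpolate(a, b, t):
    """The point a fraction t of the way along the line from a to b."""
    return tuple(x + t * (y - x) for x, y in zip(a, b))

# The known stock of technologies; no first feature exceeds 0.8.
known = [(0.2, 0.3), (0.8, 0.9)]

# A serendipitous discovery lying outside the convex hull of the stock.
discovery = (1.5, 0.5)

# Radical innovation: a new technology on the line between an existing
# technology and the discovery (roughly (1.15, 0.7) here).
radical = interpolate(known[1], discovery, 0.5)

# The result lies outside the old hull, so subsequent incremental
# innovation can now combine it with the old stock of ideas.
assert radical[0] > max(p[0] for p in known)
```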
Again, this framework seems a good fit for neural networks. We can imagine radical innovation corresponds to retraining a neural network with additional examples that lie outside its previous training set. Just as Olsson assumes radical innovation to be in some sense harder than incremental innovation, retraining neural networks in this way seems harder than business as usual. In our last post, for example, it took several years before computer scientists figured out how to represent the styles of multiple painters in the same neural network. If we want to add new and distinctive examples to our training data, we may need to modify the network so that it can handle both the existing data and the new examples. This kind of “radical” training set expansion is painful and time-consuming, but it eventually enables neural networks to go to work interpolating new technologies between existing data and the new discoveries.
An Autopilot for Technological Progress
In Olsson’s model, incremental innovation proceeds as long as the paradigm remains productive (meaning not all ideas in the convex hull have been found). At the start of a paradigm, there are abundant opportunities for combining ideas, and the returns to R&D are high. Over time, these ideas get fished out and the return on R&D falls. Throughout this period, serendipitous discoveries or anomalies occasionally turn up, but they are left unexploited. As time goes on though, incremental innovation runs out (and this happens more quickly if there are more researchers). At that point, a period of difficult and groping R&D begins as firms engage in radical R&D, interpolating between the known paradigm and the serendipitous discoveries. After some successes though, the convex hull is expanded and a period of incremental R&D starts anew.
Olsson meant to describe the cyclical nature of innovation by human civilization. But his model also provides an (admittedly speculative) blueprint for open-ended automated innovation by next generation neural networks. For as long as it’s valuable, the neural networks generate new discoveries by “filling in the gaps” of their training data with interpolations. Think of AlphaGo discovering better moves in Go, and style-transfer networks discovering new styles of painting, but also next generation neural networks discovering new molecules for drugs or material science, and new three-dimensional structures for buildings and vehicles (no links because I made those up).
And we could also automate non-incremental innovation. We could begin by programming the neural network to look for its own serendipitous discoveries. In Olsson’s model, these come about by random luck. But in a neural network, we could program it to occasionally try something completely random (and outside the boundaries of its training set). This will almost never work. But on the rare occasions when the neural net tries something outside its training set that actually works, it can incorporate this information into its next training set. It can interpolate between this discovery and its existing ideas, and in this way it can escape the cage of its training set.
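The “occasionally try something random” loop can be sketched in a few lines. Everything here is invented for illustration: the exploration rate, the success oracle, and the toy training set stand in for whatever a real system would use.

```python
import random

random.seed(0)  # deterministic for illustration

def works(point):
    """Stand-in oracle for "the wild attempt actually pays off" -- in
    reality this would be an experiment, a simulation, or a market test.
    Here it succeeds 5% of the time at random."""
    return random.random() < 0.05

known = [(0.4, 0.4), (0.6, 0.6)]   # the current "training set"
EXPLORE = 0.1                       # how often to try something random

for _ in range(10_000):
    if random.random() < EXPLORE:
        # A completely random point, possibly far outside the convex
        # hull of the training set.
        candidate = (random.random(), random.random())
        if works(candidate):
            # Incorporate the lucky discovery into the next training set.
            known.append(candidate)

# Most wild attempts fail, but the rare successes expand the stock of
# ideas available for later interpolation.
assert len(known) > 2
```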
For this kind of autonomous technological progress to work, (at least!) two problems would need to be solved. The first we have already alluded to. Neural networks are quite domain-specific. There is no guarantee they can even “see” examples that lie outside their training data, especially if different features of that data are what’s relevant. Maybe we could build neural networks that are trained on the specific task of putting new data into a form usable by old neural networks… but we are well outside my area of expertise here. I have no idea. In any event, maybe humans can do that task (not that it would be easy!).
The other barrier is the nature of the feedback a neural network would receive. Neural networks tune their internal structure according to well-defined goals, whether that goal is winning at Go or matching the style of a painter while preserving the content of an image. A neural network trained to deliver useful technologies would need feedback on how valuable its discoveries are. How would that be determined? The answer is not so clear. In some cases, it’s relatively easy. If the neural network is generating new drugs, we can run clinical trials and see how they fare. But what if we’re developing polymers for material science or three-dimensional structures? We can rate these discoveries on various criteria, but they may have unanticipated uses. An alternative would be to let the market decide: after all, technologies that are profitable are ones that consumers will pay a lot for, relative to production costs, and this seems closely related to the value of an idea. But this solution is not without its own problems. For example, it might lead the neural networks to develop baubles for the super-rich.
I don’t intend to resolve this issue here. Indeed, how best to incentivize human innovators to focus their efforts where they are most valuable is an open question (and one I’ll explore in later posts)! But this need not distract us too much; my main point is to illustrate that it is at least possible for innovation to go a very long way, even if it’s primarily interpolative.
However, just because something is possible, doesn’t mean it’s a good idea. Might there be better ways to innovate? At the end of the day, neural networks are only one way to represent regularities in nature. In upcoming posts, we’ll discuss some of the others.