Ripples in the River of Knowledge

Measuring the flow of science from upstream to downstream

Apr 06, 2021

In More Science Leads to More Innovation, we looked at four natural experiments where the “supply” of science was increased or decreased differently across scientific fields. When the supply of science increased, we saw more downstream technological innovation, and when the supply of science decreased, less. In this post, I’ll argue those studies underestimate the influence of science on innovation.

Direct Dependence on Science Is Uncommon

To begin, while it’s true that more science leads to more innovation, the majority of technological innovations probably do not directly depend on recent science:

Citing academic articles in patents has become more common over time, but even in 2018, 74% of patents did not cite any scientific journal articles.
In a survey from the 1990s, European inventors rated the importance of knowledge from scientific literature for developing innovations at 2.5 out of 5 (lower than they rated the importance of knowledge from customers/users and the patent literature). They rated the importance of universities and public research laboratories at just 1.4 out of 5.
In a 1994 survey, US R&D managers estimated only 20% of R&D projects relied on public research.

Rather, dependence on science is unevenly distributed. The figure below illustrates the average number of citations to science per patent, by technical classification. Patents in chemistry/metallurgy and human necessities (which include the biomedical and pharma sectors) cite science much more intensively than other fields. Fields like mechanical engineering barely cite the scientific literature at all.

Average Citations to Science per Patent by technical classification and grant year (Marx and Fuegi 2020)

This is broadly consistent with a 1994 survey of corporate R&D managers that found R&D projects in automobiles, general manufacturing, and electrical equipment relied on public research significantly less than in fields like biotechnology and pharmaceuticals.

Indirect Dependence on Science?

But just because an invention doesn’t rely directly on science doesn’t mean that science plays no role in it. Scientific knowledge and principles can become embodied in technologies that other technologies, in turn, use as components. For example, chemistry and metallurgy patents heavily cite the scientific literature. It may be that new chemical production processes (which are patented) allow for the manufacture of new kinds of composites, which in turn allow for, say, the creation of new, more powerful engines. But the patents for the composites and the engines may not cite science, and the inventors of these things might not report any dependence on science, even though without the science-enabled production processes they would be out of luck.

There is actually a way to measure this “distance from science” that we’ve discussed before. In the example above, although a new type of composite might not cite any scientific articles itself, it might cite the patents for the new chemistry production processes that do. And the engine patent might cite the composite patent, which in turn cites science-based production processes. Ahmadpoor and Jones (2017) use this basic idea to measure the “distance” from science of US patents by counting the smallest number of citation steps between a patent and a scientific article. A patent that cites a scientific article has a distance of 1. A patent that cites no science itself, but does cite a patent that cites a scientific article has a distance of 2. And so on. In Ahmadpoor and Jones’ sample of patents from 1976 to 2013, although only 16% of patents directly cite a scientific article, 61% of patents are “connected” to science via some kind of chain of citation (most often, a distance of 3).
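To make the measure concrete, here is a minimal Python sketch, assuming we already know which patents cite at least one journal article and have a list of patent-to-patent citations. The function and variable names are hypothetical, and this is not Ahmadpoor and Jones’ actual code. Because every citation step counts as one, a breadth-first search that starts from all the science-citing patents at once finds each patent’s shortest distance.

```python
from collections import deque


def distance_to_science(cites_science, patent_cites):
    """Shortest number of citation steps from each patent to a scientific article.

    cites_science: set of patent ids that cite at least one journal article
                   (these have distance 1 by definition).
    patent_cites:  dict mapping a patent id to the list of patent ids it cites.
    Returns {patent id: distance}; patents with no citation chain to science
    are left out.
    """
    # Reverse the citation graph so we can walk from science-citing patents
    # toward the patents that (directly or indirectly) build on them.
    cited_by = {}
    for citing, cited_list in patent_cites.items():
        for cited in cited_list:
            cited_by.setdefault(cited, []).append(citing)

    # Multi-source breadth-first search: every science-citing patent starts at 1,
    # and each extra patent-to-patent citation step adds 1.
    dist = {p: 1 for p in cites_science}
    queue = deque(cites_science)
    while queue:
        p = queue.popleft()
        for citing in cited_by.get(p, []):
            if citing not in dist:  # in BFS, the first visit is the shortest path
                dist[citing] = dist[p] + 1
                queue.append(citing)
    return dist


# Toy example (made-up patents): only "A" cites a journal article.
patent_cites = {"B": ["A"], "C": ["B"], "D": ["C", "A"], "E": []}
print(distance_to_science({"A"}, patent_cites))
# A: 1, B: 2, D: 2 (it also cites A directly), C: 3; E is not connected
```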

There is some indirect evidence that this measure of distance from science is capturing something real. Patents that are “closer” to science as measured in this way have some of the characteristics of patents that directly cite science. For example, as discussed in other posts, patents that cite scientific articles are more valuable than those that do not, and also more likely to be traded. But it’s also true that, looking at patents that don’t directly cite science, those that are closer to science are still more valuable and more likely to be traded than those that are farther from science.

Most of the natural experiments discussed in More Science Leads to More Innovation pertain to patents with a distance of 1 (the closest to science). Indeed, half of them explicitly measure the link between science and technology via a citation from a patent to a journal article (which means, by definition, they have a distance of 1). But if there is a knock-on effect for patents farther from science (distance 2 or greater), they probably miss it.

Upstream Patenting Predicts Downstream Patenting

Another strand of literature gives us some good reasons to think there are significant knock-on effects. Technologies tend to be hierarchically composed of many sub-technologies, and to build on each other in ways that are relatively stable and predictable over multiple years. This means there is some degree of predictability about technological trends. If there is a flurry of breakthroughs in an upstream technology, downstream technologies that use it as an important component, or which adapt its principles and uses for new contexts, are likely to see a flurry of breakthroughs in subsequent years. (Think of how we might be reasonably confident that all sorts of new and improved engine designs will be enabled by a new and improved composite.)

There are ways to observe this hierarchy and use these relationships to make predictions. US patents are classified as primarily belonging to one of several hundred technology classifications (examples range from “Class 012: Boot and shoe making” to “Class 706: Data processing – artificial intelligence”). The hierarchical relationship between these technology classes can be observed in the citations of the patents belonging to these classes. Acemoglu, Akcigit, and Kerr (2016) build a directed network between different technology classes, where the strength of a link between two classes is given by the probability a patent in one cites the other.

Directed citation network between broad technology categories, from Acemoglu, Akcigit, and Kerr 2016
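A network like this can be built from raw citation data. The sketch below is one simple version, assuming we have one (citing class, cited class) pair per patent citation. It uses the share of a class’s citations that go to each other class as the link weight and drops within-class citations; both choices are assumptions for illustration, and Acemoglu, Akcigit, and Kerr’s exact normalization may differ.

```python
from collections import Counter, defaultdict


def citation_shares(class_citations, drop_self=True):
    """Directed network between technology classes.

    class_citations: iterable of (citing_class, cited_class) pairs, one per
                     patent-to-patent citation.
    Returns {citing_class: {cited_class: share}}, where each share is the
    fraction of the citing class's citations that go to the cited class.
    """
    counts = defaultdict(Counter)
    for citing, cited in class_citations:
        if drop_self and citing == cited:
            continue  # assumption: ignore citations within the same class
        counts[citing][cited] += 1

    shares = {}
    for citing, row in counts.items():
        total = sum(row.values())
        shares[citing] = {cited: n / total for cited, n in row.items()}
    return shares


# Toy data: four citations from Class 81 (one to itself), one from Class 29.
pairs = [("81", "29"), ("81", "29"), ("81", "30"), ("81", "81"), ("29", "30")]
print(citation_shares(pairs))
# Class 81 sends 2/3 of its cross-class citations to 29 and 1/3 to 30.
```

Each row sums to one, so a row can be read as “if a patent in this class cites another class, which one is it likely to be?”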

Acemoglu, Akcigit, and Kerr show a statistically significant relationship between patenting activity in upstream classes and patenting in downstream classes (that is, the classes that historically cite the upstream class heavily). Pichler, Lafond, and Farmer (2020) perform a similar exercise. In the figure below, they plot the correlation between the growth rate of patents in a given class and the weighted average growth rate of its upstream technology classes.

Patent growth rate in technology class, vs. average growth rate of upstream classes, from Pichler, Lafond, and Farmer 2020

It turns out these correlations are robust enough to be used for forecasting. In one application, Acemoglu, Akcigit, and Kerr use data from 1975 to 1994 to fit their statistical model, and then use it to predict the number of patents in the following ten years. After adjusting for the influence of technology classification (some classes always patent more than others) and time (in most classes, patent applications tend to rise from year to year), they find that a 10% increase in predicted patenting (based on the growth of patenting in upstream classes) is associated with an actual out-of-sample increase in patenting of 3-4%.
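The “10% in, 3-4% out” result is an elasticity. A stylized way to estimate one, assuming a balanced panel of actual and predicted patent counts by class and year, is to regress log actual patents on log predicted patents after stripping out class and year fixed effects. This is a sketch of the idea, not the authors’ exact specification; the function name is hypothetical.

```python
import math


def fe_elasticity(actual, predicted):
    """Elasticity of actual patenting with respect to predicted patenting,
    net of class and year fixed effects.

    actual, predicted: {(cls, year): patent count > 0}; must form a balanced
    panel (every class observed in every year) with the same keys.
    Two-way demeaning removes additive class and year effects exactly in a
    balanced panel, so the slope on the demeaned logs is the elasticity.
    """
    keys = list(actual)
    classes = sorted({c for c, _ in keys})
    years = sorted({t for _, t in keys})
    if len(keys) != len(classes) * len(years) or set(keys) != set(predicted):
        raise ValueError("need a balanced panel with matching keys")

    def demean(v):
        grand = sum(v.values()) / len(v)
        cls_mean = {c: sum(v[(c, t)] for t in years) / len(years) for c in classes}
        yr_mean = {t: sum(v[(c, t)] for c in classes) / len(classes) for t in years}
        return {(c, t): v[(c, t)] - cls_mean[c] - yr_mean[t] + grand for (c, t) in keys}

    y = demean({k: math.log(actual[k]) for k in keys})
    x = demean({k: math.log(predicted[k]) for k in keys})
    # OLS slope through the origin on the demeaned data.
    return sum(x[k] * y[k] for k in keys) / sum(x[k] ** 2 for k in keys)
```

An estimate of about 0.35 would mean a 10% increase in predicted patenting goes with roughly a 3.4% increase in actual patenting, since 1.10^0.35 ≈ 1.034.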

Pichler, Lafond, and Farmer predict the growth rate of patenting as a function of upstream patenting activity using methods derived from machine learning. They fit a number of alternative models based on data from 1945-1987 to model the correlation between the growth rate of patenting in each technology class and prior patenting growth in upstream technologies. They then choose the model that makes the best forecast for 1988-2002. Finally, they use all the data from 1945-2002 to refit this model and predict out-of-sample patent growth rates, in each class, over 2003-2017.

How well does it do? To understand its performance, they need a benchmark. They replicate the whole process above, but for models that exclude data on upstream patenting. Instead, the benchmark predicts patenting in class x using only the historical patenting activity of class x (is patenting in this class rising or falling over time? Does it tend to move in booms and busts? And so on). Again, they find the model that best predicts out-of-sample over 1988-2002, re-estimate it with data up to 2002, and then forecast out-of-sample through 2017.
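The validate-then-refit protocol can be sketched generically, assuming each candidate model is a function that fits on data through some year and returns a forecaster. The two toy candidates here (a class’s historical mean growth, and zero growth) are hypothetical stand-ins, not the machine-learning or ARIMA models Pichler, Lafond, and Farmer actually use. Running the same procedure once with upstream-based candidates and once with own-history-only candidates gives the comparison described above.

```python
def validate_refit_test(candidates, history, train_end, valid_end, test_end, error):
    """Pick a model on a validation window, refit it, then score it on a test window.

    candidates: {name: fit}, where fit(history, last_year) returns a function
                predict(cls, year) built only from observations up to last_year.
    history:    {(cls, year): observed patent growth rate}.
    error:      function of a list of (actual, predicted) pairs; lower is better.
    """
    def score(fit, last_fit_year, first, last):
        predict = fit(history, last_fit_year)
        pairs = [(g, predict(c, y)) for (c, y), g in history.items() if first <= y <= last]
        return error(pairs)

    # 1. Fit every candidate on data through train_end; score on the validation years.
    valid_scores = {name: score(fit, train_end, train_end + 1, valid_end)
                    for name, fit in candidates.items()}
    best = min(valid_scores, key=valid_scores.get)
    # 2. Refit the winner on all data through valid_end; score on the test years.
    test_score = score(candidates[best], valid_end, valid_end + 1, test_end)
    return best, valid_scores, test_score


def mse(pairs):
    return sum((a - p) ** 2 for a, p in pairs) / len(pairs)


def fit_class_mean(history, last_year):
    # Hypothetical stand-in: forecast each class's growth as its mean growth so far.
    sums, counts = {}, {}
    for (c, y), g in history.items():
        if y <= last_year:
            sums[c] = sums.get(c, 0.0) + g
            counts[c] = counts.get(c, 0) + 1
    return lambda cls, year: sums[cls] / counts[cls]


def fit_zero(history, last_year):
    # Hypothetical stand-in: always forecast zero growth.
    return lambda cls, year: 0.0


# Made-up history: class A always grows 5% a year, class B shrinks 2%.
history = {(c, y): g for c, g in [("A", 0.05), ("B", -0.02)] for y in range(1945, 2018)}
print(validate_refit_test({"class_mean": fit_class_mean, "zero": fit_zero},
                          history, 1987, 2002, 2017, mse))
```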

The results are in the figure below, with the relative performance of the models using upstream patent data in blue (the green is a model not discussed in this post). At their peak, models using data on upstream patenting gain nearly 40% in predictability relative to the benchmark.

Average gain in predictivity, relative to benchmark ARIMA model; shaded area is two standard errors; from Pichler, Lafond, and Farmer (2020)
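The post doesn’t define “gain in predictivity” precisely. One simple reading, assumed here for illustration and not necessarily the paper’s definition, is one minus the ratio of the model’s forecast error to the benchmark’s, averaged across classes, so a gain of 0.4 means errors about 40% smaller than the benchmark’s.

```python
def average_gain(model_errors, benchmark_errors):
    """Average relative reduction in forecast error versus a benchmark.

    model_errors, benchmark_errors: per-class forecast errors (for example,
    mean squared errors), listed in the same class order.
    """
    if len(model_errors) != len(benchmark_errors) or not model_errors:
        raise ValueError("need matching, non-empty error lists")
    gains = [1 - m / b for m, b in zip(model_errors, benchmark_errors)]
    return sum(gains) / len(gains)


# Hypothetical errors for two classes: the model cuts error by 40% in each.
print(average_gain([0.6, 0.3], [1.0, 0.5]))  # roughly 0.4
```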

It all boils down to this: historical patent citations allow us to identify the technology classes that lie “upstream” of any other class; and upstream patenting predicts downstream patenting in the future, out of sample, in two different papers.

Upstream = Closer to Science?

So we have two related but different ways of measuring indirect knowledge flows among patented technologies. Some papers have measured the distance from science, via the shortest citation chain to a scientific paper. Others have defined upstream and downstream relationships among technologies based on the total share of citations that flow from one technology class to another. A natural question is the extent to which the two line up. That can tell us something about how science indirectly impacts technology.

For example, we know that science tends to lead to more innovation in technology classes that directly depend on science. Are these technology classes, in turn, directly upstream of many other classes? If so, the results from Acemoglu-Akcigit-Kerr and Pichler-Lafond-Farmer would predict an increase in science-based innovation would lead to a second round of innovation in the technologies that lie immediately downstream. And if these classes are themselves upstream of many other classes, there would be a second reverberation, and so on.

In the figure below, I computed the average distance to science for US patents over 1976-2018, based on the Ahmadpoor and Jones method and using Marx and Fuegi’s dataset. On the horizontal axis, we have the average distance to science of the patents belonging to each of 307 different technology classes. Since I’m interested here in the indirect impact of science on technology, I limited my attention to technology classes with distance of 2 or greater: that is, technologies that do not typically cite scientific papers directly. Classes to the left are closer to science, classes to the right are farther away. On the vertical axis, we have the average distance to science of their upstream technology classes, weighted by citation share. The lower the dot, the closer to science are the classes cited.

Average distance to science for technology classes (horizontal axis) and weighted average of upstream technology classes (vertical axis). Author calculations.
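The two axes can be computed along these lines, assuming per-patent distances from a shortest-path calculation like Ahmadpoor and Jones’, a primary class for each patent, and citation-share weights between classes. Skipping patents with no chain to science is an assumption in this sketch, and the names are hypothetical.

```python
from collections import defaultdict


def class_distances(patent_class, patent_distance, shares, min_distance=2.0):
    """Average distance to science for each class and for its upstream classes.

    patent_class:    {patent id: technology class}.
    patent_distance: {patent id: shortest citation distance to science}.
    shares:          {citing_class: {cited_class: citation share}}.
    Returns {cls: (own_avg, upstream_weighted_avg)} for classes whose own
    average distance is at least min_distance.
    """
    total, count = defaultdict(float), defaultdict(int)
    for patent, cls in patent_class.items():
        if patent in patent_distance:  # assumption: unconnected patents are skipped
            total[cls] += patent_distance[patent]
            count[cls] += 1
    own = {cls: total[cls] / count[cls] for cls in count}

    result = {}
    for cls, avg in own.items():
        if avg < min_distance:
            continue  # keep only classes that do not typically cite science directly
        row = {up: w for up, w in shares.get(cls, {}).items() if up in own}
        weight = sum(row.values())
        if weight > 0:
            result[cls] = (avg, sum(w * own[up] for up, w in row.items()) / weight)
    return result


# Made-up data: two Tools (81) patents and one patent each in Classes 29 and 30.
classes = {"t1": "81", "t2": "81", "m1": "29", "c1": "30"}
distances = {"t1": 4, "t2": 5, "m1": 3, "c1": 4}
print(class_distances(classes, distances, {"81": {"29": 0.5, "30": 0.5}}))
# {'81': (4.5, 3.5)}
```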

Let’s look at an example. In the upper right corner, we have a red dot corresponding to Class 81: Tools. On the horizontal axis, we see the average patent in this class has a distance to science of 4.5. That means the shortest path to science for tool patents often involves citing a patent that cites a patent that cites a patent that cites a patent that cites a science article. On the vertical axis, we see the typical distance to science for technology classes that are heavily cited by tools patents is just 3.6. These upstream classes include Class 29: Metal working (average distance to science 3.0) and Class 30: Cutlery (average distance to science 4.1). The main takeaway is that the classes directly upstream of tool patents tend to be closer to science than tool patents themselves.

The black line cutting through the middle of this figure demarcates the split between technologies that mostly cite technologies closer to science (dots below the line) and those that mostly cite technologies farther from science (dots above the line). For the classes displayed, 78% lie below this line; that is, the technologies that lie upstream also tend to lie closer to science. This pattern isn’t uniform, though. Looking only at technologies far from science, with a distance of 3 or greater, 95% lie downstream of technologies closer to science than they are! But among classes in the 2-3 interval, it’s basically 50/50. Essentially, that means technologies that are 1-2 citation steps removed from patents that cite science directly largely cite each other; they don’t primarily build on technologies that are closer to science. But technologies farther out do.
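Given each class’s own and upstream average distances, these shares are simple counts of dots below the line where upstream distance equals own distance. A minimal sketch follows; the Tools numbers come from the example above, while the other classes are made up, and the function name is hypothetical.

```python
def share_below_diagonal(points, lo=None, hi=None):
    """Fraction of classes whose upstream classes are, on average, closer to science.

    points: {cls: (own_avg_distance, upstream_avg_distance)}.
    lo, hi: optional bounds on own_avg_distance (lo <= own < hi).
    """
    selected = [(own, up) for own, up in points.values()
                if (lo is None or own >= lo) and (hi is None or own < hi)]
    if not selected:
        raise ValueError("no classes in the requested range")
    return sum(up < own for own, up in selected) / len(selected)


pts = {"81": (4.5, 3.6), "x": (3.2, 2.9), "y": (2.4, 2.1), "z": (2.6, 2.9)}
print(share_below_diagonal(pts))            # all classes: 0.75
print(share_below_diagonal(pts, lo=3.0))    # distance 3 or more: 1.0
print(share_below_diagonal(pts, 2.0, 3.0))  # the 2-3 interval: 0.5
```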

Indirect Impact of Science

So where do we stand? In More Science Leads to More Innovation, we looked at pretty compelling evidence that increasing the supply of science tends to lead to more innovation. But that direct effect is concentrated in a relatively small share of technologies; only about a quarter of patents directly cite scientific work. However, any innovation has spillover effects. As discussed elsewhere, the magnitude of the unintended benefits of science tends to be at least comparable to the intended benefits. In this post we focused on a specific kind of spillover: an increase in innovation tends to lead to further innovation in “downstream” technologies (which we can identify based on citation patterns).

We have no reason to think this spillover effect wouldn’t hold if innovation increased because of science. Any field that sees an increase in innovation due to science will probably have some downstream fields that also benefit. Initially, these downstream fields might be relatively “close” to science themselves, so they might also benefit directly from an increased supply of science. But eventually, the passing of technological concepts and improved components from upstream to downstream becomes a channel through which the fruits of science might also measurably flow.
