New Things Under the Sun is a living literature review; as the state of the academic literature evolves, so do we. This post highlights some recent updates.
Science: Trending Less Disruptive
The post “Science is getting harder” surveyed four main categories of evidence (Nobel prizes, top cited papers, growth in the number of topics covered by science, and citations to recent work by patents and papers) to argue it has become more challenging to make scientific discoveries of comparable “size” to the past. This post has now been updated to include an additional category of evidence related to a measure of how disruptive academic papers are. From the updated article:
…The preceding suggested a decline in the number of new topics under study by looking at the words associated with papers. But we can infer a similar process is under way by turning again to their citations. The Consolidation-Disruption Index (CD index for short) attempts to score papers on the extent to which they overturn received ideas and birth new fields of inquiry.
To see the basic idea of the CD index, suppose we want to know how disruptive some particular paper x is. To compute paper x’s CD index, we would identify all the papers that cite paper x or any of the papers that x itself cites. We would then look to see if the papers that cite x also tend to cite x’s references, or if they cite x alone. If every paper citing paper x also cites x’s own references, paper x has the minimum CD index score of -1. If none of the papers citing paper x cite any of paper x’s references, paper x has the maximum CD index score of +1. The intuition here is that if paper x overturned old ideas and made them obsolete, then we shouldn’t see people continuing to cite older work, at least in the same narrow research area. But if paper x is a mere incremental development, then future papers continue to cite older work alongside it.
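To make the scoring concrete, here is a minimal sketch in Python. It assumes the commonly stated formulation of the index, (papers citing only x, minus papers citing both x and its references), divided by all later papers citing x or its references. That formula is my assumption rather than something spelled out above. The function name and toy paper ids are hypothetical, and the sketch ignores details such as the time window over which citations are counted.

```python
def cd_index(focal, citations):
    """Toy CD index for `focal`.

    `citations` maps each paper id to the set of paper ids it cites.
    Assumption: it contains only the focal paper and papers published after it.
    """
    refs = citations.get(focal, set())
    only_focal = both = only_refs = 0
    for paper, cited in citations.items():
        if paper == focal:
            continue
        cites_focal = focal in cited
        cites_refs = bool(cited & refs)
        if cites_focal and cites_refs:
            both += 1          # cites x alongside x's references (consolidating)
        elif cites_focal:
            only_focal += 1    # cites x alone (disruptive)
        elif cites_refs:
            only_refs += 1     # bypasses x and cites the older work directly
    total = only_focal + both + only_refs
    if total == 0:
        return None            # undefined if nobody cites x or its references
    return (only_focal - both) / total


# Hypothetical example: x cites a and b; four later papers follow.
toy = {
    "x": {"a", "b"},
    "p1": {"x"},
    "p2": {"x"},
    "p3": {"x", "a"},
    "p4": {"b"},
}
score = cd_index("x", toy)
```

In this toy network two papers cite x alone, one cites x together with one of its references, and one cites only the older work, so the score lands between the two extremes described above.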
That’s the idea anyway; does it actually map to our ideas of what a disruptive paper is? It’s a new measure and its properties are still under investigation, but Wu, Wang, and Evans (2019) tried to validate it by identifying sets of papers that we have independent reasons to believe are likely to be more or less disruptive than each other. They then checked to see that the CD index matched predictions. Nobel prize winning papers? We would expect those to be disruptive, and indeed, Wu and coauthors find they tend to have high CD index scores on average. Literature review articles? We would expect those to be less disruptive than original research, and their CD index is indeed lower on average than the CD index of the papers they review. Articles which specifically mention another person in the title? We would expect those to tend to be incremental advances, and they also have lower CD index scores. Lastly, for a sample of 190 papers suggested by a survey of 20 scholars as being distinctively disruptive or not disruptive, the CD index closely tracked which papers were disruptive and which were not.
Park, Leahey, and Funk (2022) compute the CD index for a variety of different datasets of academic publications, encompassing many millions of papers. Below is a representative result from 25 million papers drawn from the Web of Science. Across all major fields, the CD index has fallen substantially.
This decline is robust to a lot of different attempts to explain it away. For example, we might be worried that this is a mechanical outcome of the tendency to cite more papers, and to cite older papers (which we discuss in the next section). For any given paper x, that would increase the probability we cite paper x’s references, in addition to x. Park, Leahey, and Funk try to show this isn’t solely driving their results in a few different ways. For example, they create placebo citation networks, by randomly shuffling the actual citations papers make to other papers. So instead of paper y citing paper x, they redirect the citation so that paper y now cites some other paper z, where z is published in the same year as x. This kind of reshuffling preserves the tendency over time of papers to cite more references and to cite older works. But when you compute the CD index of these placebo citation networks, they exhibit smaller declines than in the actual citation networks, suggesting the decline of disruption isn’t just a mechanical artifact of the trend towards citing more and older papers.
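One way to picture the placebo exercise is the rough sketch below. It shuffles the cited end of each citation among citations whose target was published in the same year. This is my own simplification, not necessarily the exact procedure Park, Leahey, and Funk use. The function name and toy data are hypothetical, and the sketch does not guard against a paper ending up citing the same target twice.

```python
import random
from collections import defaultdict


def placebo_network(edges, year, seed=0):
    """Return a shuffled copy of `edges`, a list of (citing, cited) pairs.

    `year` maps paper id -> publication year. Cited papers are permuted only
    among citations whose original target shares a publication year, so each
    citing paper keeps its number of references and their age profile.
    """
    rng = random.Random(seed)
    by_year = defaultdict(list)
    for idx, (_, cited) in enumerate(edges):
        by_year[year[cited]].append(idx)

    shuffled = list(edges)
    for idxs in by_year.values():
        targets = [edges[i][1] for i in idxs]
        rng.shuffle(targets)  # permute targets within one publication year
        for i, new_cited in zip(idxs, targets):
            shuffled[i] = (edges[i][0], new_cited)
    return shuffled


# Hypothetical toy data: two citing papers, four older papers.
year = {"a": 1990, "b": 1990, "c": 1991, "d": 1991, "y1": 2000, "y2": 2001}
edges = [("y1", "a"), ("y1", "c"), ("y2", "b"), ("y2", "d"), ("y2", "a")]
placebo = placebo_network(edges, year, seed=1)
```

Because only same-year targets are swapped, any trend in how many and how old the references are survives the shuffle, while the specific links between a paper and its intellectual predecessors are scrambled.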
Lastly, it turns out this decline in the average value of the CD index is not so much driven by a decrease in the number of disruptive papers, as it is a massive increase in the number of incremental papers. The following figure plots the absolute number of papers published in a given year with a CD index in one of four ranges. In blue, we have the least disruptive papers, in red, the most disruptive, with green and orange in the middle.
While the annual number of the most disruptive papers (in red) grew over 1945-1995 or so, it has fallen since then so that the number of highly disruptive papers published in 2010 isn’t much different from the number published in 1945. But over the same time period, the number of the mostly incremental papers (in blue) has grown dramatically, from a few thousand a year to nearly 200,000 per year.
As an aside, the above presents an interesting parallel with the Nobel prize results discussed earlier: Collison and Nielsen find the impact of Nobel prize-winning discoveries is not rated as worse in more recent years (except in physics), but neither is it rated better (as we might expect given the increase in scientific resources). Similarly, we are not producing fewer highly disruptive papers; we simply are not getting more for our extra resources.
The updated article also includes some new discussion of additional text-based evidence for a decline in the number of topics under study in science, relative to the number of papers, again from Park, Leahey, and Funk (2022). It also adds in some evidence that the rise in academic citations to older works does not merely reflect a rise in polite but inconsequential citations - at least in recent times, the citations to older work are just as likely to be rated influential citations as the citations to younger work.
Creative Patents and the Pace of Technological Progress
The article “Innovation (mostly) gets harder” has a similar conclusion to “Science is getting harder”, but applied to the case of technological progress: eking out a given proportional increase along some technological metric seems to require more and more effort. The original article reviewed evidence from a few specific technologies (integrated circuits, machine learning benchmarks, agricultural yields, and healthcare) as well as some broad-based proxies for technological progress (firm-level profit analogues, and total factor productivity). I’ve now updated this article to include a discussion of patents derived from a fascinating PhD job market paper by Aakash Kalyani:
…it’s desirable to complement the case studies with some broader measures less susceptible to the charge of cherry-picking. One obvious place to turn is patents: in theory, each patent describes a new invention that someone at the patent office thought was useful and not obvious. Following Bloom et al., below I calculate annual US patent grants[1] per effective researcher. As a first pass, this data seems to go against the case study evidence: more R&D effort has been roughly matched by more patenting, and in fact, in recent years, patenting has increased faster than R&D effort! Is innovation, as measured by patents, getting easier?
The trouble with the above figure is that patents shouldn’t really be thought of as a pure census of new inventions for a few reasons. First off, the propensity of inventors (and inventive firms) to seek patent protection for their inventions seems to have increased over time.[2] So the observed increase in annual patenting may simply reflect an increase in the share of inventions that are patented, rather than any change in the number of new inventions. Second, patents vary a lot in their value. A small share of patents seems to account for the majority of total patent value. We don’t care so much about the total number of patents as the number of valuable patents.
On the second problem at least, Kalyani (2022) shows that one way to separate the patent wheat from the patent chaff is to look at the actual text of the patent document. Specifically, Kalyani processes the text of patents to identify technical terminology and then looks for patents that have a larger than usual share of technical phrases (think “machine learning” or “neural network”) that are not previously mentioned in patents filed in the preceding five years. When a patent has twice as many of these new technical phrases as the average for its technology type, he calls it a creative patent. About 15% of patents are creative by this definition.
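The classification rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Kalyani’s actual pipeline. It assumes the technical phrases have already been extracted for each patent, it measures novelty as the share of a patent’s phrases that are new, and it compares that share to the average within the same technology class. The field names, the `first_scored_year` parameter, and the toy patents are all hypothetical.

```python
from collections import defaultdict


def flag_creative(patents, first_scored_year, lookback=5, multiple=2.0):
    """Flag patents whose share of new phrases is at least `multiple` times
    the average share in their technology class.

    Each patent is a dict with keys "id", "year", "tech_class", "phrases"
    (a set of pre-extracted technical phrases). Patents filed before
    `first_scored_year` only serve as history for the lookback window.
    """
    phrases_by_year = defaultdict(set)
    for p in patents:
        phrases_by_year[p["year"]] |= p["phrases"]

    scored = [p for p in patents if p["year"] >= first_scored_year]
    share = {}
    for p in scored:
        prior = set()
        for y in range(p["year"] - lookback, p["year"]):  # preceding years only
            prior |= phrases_by_year.get(y, set())
        new = p["phrases"] - prior
        share[p["id"]] = len(new) / len(p["phrases"]) if p["phrases"] else 0.0

    by_class = defaultdict(list)
    for p in scored:
        by_class[p["tech_class"]].append(share[p["id"]])
    avg = {c: sum(v) / len(v) for c, v in by_class.items()}

    return {
        p["id"]: avg[p["tech_class"]] > 0
        and share[p["id"]] >= multiple * avg[p["tech_class"]]
        for p in scored
    }


# Hypothetical toy data: P1 supplies history, P2-P4 are scored.
patents = [
    {"id": "P1", "year": 2000, "tech_class": "G06", "phrases": {"data bus", "logic gate"}},
    {"id": "P2", "year": 2003, "tech_class": "G06", "phrases": {"data bus", "logic gate"}},
    {"id": "P3", "year": 2003, "tech_class": "G06", "phrases": {"data bus", "neural network"}},
    {"id": "P4", "year": 2003, "tech_class": "G06", "phrases": {"logic gate", "data bus"}},
]
creative = flag_creative(patents, first_scored_year=2003)
```

Here only P3 introduces a phrase (“neural network”) that no patent used in the preceding five years, so it is the only one whose share of new phrases clears twice the class average.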
Kalyani provides a variety of evidence that creative patents really do seem to measure new inventions, in a way that non-creative patents don’t. Creative patents are correlated with new product announcements, better stock market returns for the patent-holder, more R&D expenditure, and greater productivity growth. Non-creative patents, in general, are not. And when you look at the number of creative patents (in per capita terms - it’s the solid green line below), Kalyani finds they have been on the decline since at least 1990.
If we focus on creative patents, again it looks like innovation has become harder. And the above figure probably understates how much harder things are becoming for a few reasons. First, it is expressed in per-capita terms, rather than per-effective-researcher terms, and the number of effective researchers has grown faster than the population over this time period. Second, as noted above, it seems probable that a greater share of inventions are patented over time, so that the above figure may understate the number of creative inventions at the beginning of the sample.
As an aside, the decline in creative patents is not universal. Kalyani documents, for example, rapid growth in the number of computer and IT-related patents over 1980-1995, before they too fall off. Again, it is not that innovation universally gets harder, merely that this is the norm.
An interesting additional link between these two updates is that Kalyani also shows patents that cite recent academic literature are more likely to be creative patents. Perhaps, as science has gotten harder, less science has proven useful for inventing creative new patents? Indeed, as documented in “Science is getting harder” the share of patent citations to recent research has been declining along a similar time-trend as the decline in creative patents.
Seems bad? What to do?
Why does progress seem to be getting harder? I have some ideas (note though those links are not exhaustive). But whatever the cause, there are a lot of options for things we might be able to do to accelerate progress. The article “How to accelerate technological progress” is written to be a sort of guide to the content on New Things Under the Sun (.com), framed around identifying ways to accelerate technological progress. This article just went through a large update, adding in a lot of content I’ve written over the last year. It’s gone from 2500 to 3500 words, and the number of “claim” articles discussed has increased by about 50%. Here’s a new excerpt from the piece:
…let’s assume we have identified a domain where the traditional scientific incentive system is not working well; specifically, we’ll assume it is underinvesting in research on an important topic. Now what?
The article “Building a new research field” looks at two specific challenges that come with trying to encourage development of an understudied topic. First, because it is hard to do great science in isolation, there is a coordination challenge with getting a critical mass of scientists to begin work in a new field. One tool for solving this challenge is scientific prizes, discussed at length in the article “Steering science with prizes.”
It seems scientific prizes can sometimes galvanize research in overlooked areas by creating credible, public, signals of areas of promising research.
A second challenge to building a new research field is that it is challenging to do great work in a new field, at least on average. That creates a strong incentive for scientists to “stay in their lane” rather than branch out and try new things. If we want to encourage scientists to try new things regardless, one approach may be to offer some insulation from the usual demands of academia to continuously produce high profile publications. Alas, as discussed in “Building a new research field” the evidence on the efficacy of this strategy is mixed.
Read the whole thing below. If you are a newish subscriber, this is also a good way to quickly survey a good chunk of the material available on the site.
What else is new?
There are smaller updates scattered over the site. The article “An example of successful innovation by distributed teams: academia” has been updated to include discussion of Lin, Frey, and Wu (2022) and Frey and Presidente (2022), two articles discussed recently in the post “Remote Breakthroughs.” I’ve also added a bit of discussion of attempts to validate the disruption index to all the articles that discuss it.
Answering Your Questions
I’m going to start closing out these update posts by answering reader-submitted questions. If you would like to submit a question, please use this form.
What kind of national research institutions should states prioritize beyond universities, if the objective is to produce new technologies that can later on be democratized by the private sector? For example, is DARPA the example to follow here? -Pablo from Spain
DARPA models are great for some applications. Azoulay, Fuchs, Goldstein, and Kearney have a nice paper looking at the qualities of a technology or research project particularly amenable to the ARPA model.
But in terms of new research institutions, I think the most interesting new development in this space is the proposal for focused research organizations (FROs). These are intended to be mission-driven non-profit organizations focused on building technological/research public goods. It’s hard for academia to do this kind of integrative applied work, but the private sector may lack the right incentives to invest in technologies that generate huge spillovers to their rivals. What I like about this approach is it tries to learn from the past to inform new solutions to contemporary problems, rather than just copying institutions that worked decades ago but which may no longer be appropriate today.
For example, Ashish Arora, Sharon Belenzon, Andrea Patacconi, and Jungkyu Suh have a great paper on the role of the corporate science lab in the history of innovation. They argue, in the past, these labs played a crucial role in integrating a wide variety of different scientific discoveries to realize new general purpose technologies, which were subsequently adapted or built on by many other players. We still see heavily science-based work like this in biology (think the mRNA vaccine platforms) and artificial intelligence, but on the whole Arora and coauthors document a retreat of the private sector from doing this more fundamental scientific work. And they argue this retreat is rational from the perspective of the private sector: these labs aren’t as good an investment in an era when public science is strong, rivals can more easily copy ideas, and firms have become more specialized (which means they are less likely to benefit from discoveries with applications in unanticipated domains).
Conditions on the ground have changed, and so we shouldn’t just try to make corporate labs return. But with FROs, maybe we can find a different actor to fulfill the function they performed.
Until Next Time
Thanks for reading! As always, if you want to chat about this post or innovation in general, let’s grab a virtual coffee. Send me an email at matt.clancy@openphilanthropy.org and we’ll put something in the calendar.
[1] Note the year a patent is granted usually differs from the year research is performed: in an important sense, the following graph is lazy and wrong. But the date a patent is granted is close enough to the date research is performed (usually within a few years) for the purposes of this post, which is just to illustrate that in the long-run patents per effective researcher do not seem to follow the “innovation is getting harder” story.
[2] There are a few reasons this may be the case - more kinds of invention have become eligible for patent protection, arguably federal courts began treating patent claims more favorably, and corporate management of intellectual property rights may have changed. See Kortum and Lerner (1998) for one discussion of these issues.