Welcome to day 3 of patent data week!
This week we’re working through four related posts I wrote about using patent data to measure innovation:
Tuesday: Patents (weakly) predict innovation
Wednesday: Do studies based on patents get different results?
If you can’t bear to wait though, all the posts are up on New Things Under the Sun (.com): just click the links above.
And now, on to the third of this four-part series…
Do Studies Based on Patents Get Different Results?
This article will be updated as the state of the academic literature evolves; you can read the latest version here. You can listen to this post above, or via most podcast apps here.
Lots of social science research about innovation relies on patents as a way to measure innovation. But it’s not clear that patents are a great way to measure innovation. Probably only a relatively small share of inventions receive patent protection; moreover, while patenting does predict a lot of other measures of innovation, the linkage tends to be a pretty noisy one. Maybe the patent-based innovation literature is built on a foundation of sand?
One way to validate patents as a measure of innovation is to exploit the fact that tons of papers study the same phenomena with different datasets: some use patents, some don’t. Do they tend to arrive at different results? If so, that suggests the papers using patent data might be picking up something unique about patents, rather than something about innovation per se. On the other hand, if analyses built on patent and non-patent data tend to get similar results, that suggests patents are roughly as good a measure of innovation as the available alternatives.
I think New Things Under the Sun can itself be a useful data source on this particular question. At the time of writing (March 2024), New Things Under the Sun consists of 73 articles that synthesize multiple academic papers to examine various narrow claims about innovation. I count 37 articles that discuss studies built on both patent and non-patent data.1 Among these 37, how often do the patent-based analyses disagree with the non-patent analyses?
I looked them over to see.
My takeaway from this exercise is that studies relying on patent data tend to obtain similar results to those that don’t. In 31/37 (84%) of the claims I looked at, I didn’t think there was meaningful disagreement between the patent and non-patent studies: regardless of which type of data was used, results were broadly consistent. In the other 6/37 (16%), I thought there was a mix of agreement and disagreement: the patent and non-patent data differed along some qualitatively important dimension, though even in these cases I didn’t find uniform disagreement. For example, in the article Are ideas getting harder to find because of the burden of knowledge?, non-patent data indicates first discoveries are being made at increasingly older ages, but patent data doesn’t show this. However, both the patent and non-patent data were consistent with team sizes and specialization increasing. Nonetheless, because there was some disagreement, I classified this article as exhibiting some disagreement between the patent and non-patent evidence.
Actually, I’m not sure the differences I found between patent and non-patent data were any more severe than you would find if you were to explore the same phenomena with the same dataset (for example, two papers looking at the same thing with data on journal articles). That said, note that my definitions of agreement and disagreement are kind of loose and subjective; directionally the same, rather than numerically the same. Moreover, not all of the scope for agreement and disagreement was super substantive. Sometimes the bulk of the evidence comes from almost exclusively patent or almost exclusively non-patent data, and the data from the other source only covers a part of the overall claim. Even so, in many cases, it’s a bit surprising to me there aren’t more disagreements, since in some cases there are important differences between the kinds of innovation that are studied by patent or non-patent data.
In the next section, I display how I classified these 37 articles, along with a short description of where I saw agreement or disagreement. Feel free to skip it for some further discussion about potential biases with this exercise, due to selection effects.
Classifications of New Things Articles
At least some disagreement
Age and the impact of innovation: As scientists or inventors age, their work receives fewer citations, from a narrower set of inventors, and becomes less disruptive, as measured by both papers and patents. But an academic’s productivity appears to remain high for a longer period of time (as measured by production of papers) than an inventor’s (as measured by patents).
Are ideas getting harder to find because of the burden of knowledge? The age of first scientific discovery has steadily increased, while the age of first patent rose, but then fell. However, both patents and academic papers find team size and specialization are on the rise.
How common is independent invention? Evidence from both patents and papers finds the incidence of simultaneous independent discovery is quite rare; but the rate implied by patent interference hearings is orders of magnitude lower than for papers. At the same time, evidence from both patents and papers suggests multiple independent discovery is more likely for more valuable research ideas.
Innovation (mostly) gets harder: The same level of research effort yields successively smaller improvements by most measures. This is not true for raw patent counts, but it is true for one measure of particularly innovative patents.
Teaching innovative entrepreneurship: One study of two particular entrepreneurship training programs looked at many different indicators of successful entrepreneurship. Neither program had a statistically significant effect on patenting by participants. For one of the programs, this was consistent with it having no impact on any other measures; for the other, it had a positive effect on some measures of successful entrepreneurship, but not patents and a few other measures.
The best new ideas combine disparate old ideas: Patents and papers that comprise unusual combinations of ideas are associated with higher impact. There is some evidence that the highest impact papers also make some more conventional combinations than patents.
No disagreement
Adjacent knowledge is useful: Patent evidence from agricultural technology, and a variety of non-patent evidence, suggest knowledge spillovers tend to be most often pulled from fields that are not too “far” away.
Age and the nature of innovation: Evidence from academia and patentees is consistent with older innovators relying on older ideas in their work. Measures of how disruptive a paper or patent is also decline with the age of the author.
An example of high returns to publicly funded R&D: Comparing companies that barely win an SBIR grant to those that barely lose, the winners get more patents, but also do better on a variety of other measures of business success.
Big firms have different incentives: Analyzing the text of patents indicates larger firms have more process patents; survey data also indicates larger firms spend a greater share of R&D on processes.
Building a new research field: Scientists who pivot in their research topics are less likely to produce highly cited research; inventors who jump to working in a new technology class receive fewer citations to their patents.
Do academic citations measure the impact of new ideas? Patents, like government policy papers, are disproportionately likely to cite academic research that is highly cited within academia.
Entrepreneurship is contagious: People exposed to entrepreneurial peers are more likely to become entrepreneurs themselves, as measured by entrepreneurial activity. Postdocs with advisors who have patents are more likely to patent themselves as well.
Free knowledge and innovation: Patents incorporate information available at local (physical) libraries. Similarly, academic articles in chemistry incorporate information freely available on Wikipedia.
Gender and what gets researched: Evidence from both patents and academia finds that women are more likely to research medical problems related to their gender. There is also some evidence from both that as gender representation improves, men also become more likely to work on these topics.
Geography and what gets researched: Evidence from both patents and academia finds that people are more likely to conduct innovation related to local problems and priorities.
Highly cited innovation takes a team: Academic papers, patents, and software all receive more citations as the size of the team involved in their creation rises. Comic books by bigger teams are also more valuable. Other related variables also correlate similarly with team size, across papers and patents.
How long does it take to go from science to technology? The statistical correlation between funding for relatively basic science and subsequent productivity gains is strongest at around 20 years. The typical gap between a patent application’s filing date and the publication of the academic articles it cites is similarly long.
How to impede technological progress: Policies that reduce the return on research effort disproportionately impact marginal players, both in academic settings, where innovation is measured in papers, and in industry, where innovation is measured with patents or with new drug products.
Importing knowledge: Evidence from both patents and academic paper citations shows that immigration seeds knowledge prevalent in the originating country among non-immigrants in the receiving country.
Innovators who immigrate: When US or EU inventors emigrate, their patenting rises. Similarly, when scientists move to well-resourced places for science, their academic productivity rises (across many measures).
Is technological progress slowing? The case of American agriculture: Patent data indicates agricultural invention substantially builds on knowledge discovered outside the agricultural sector; TFP data suggests agricultural productivity growth follows productivity growth in the rest of the economy with a long lag.
Knowledge spillovers are a big deal: Data from patents, academic papers (and the grants that fund them), and R&D spending all suggest the quantitative impact of knowledge spillovers is large.
More science leads to more innovation: A variety of patent data documents linkages between the supply of scientific research and subsequent technological progress. There is also a correlation between the supply of scientific publications and industrial productivity in related sectors, after a substantial lag.
Publish or perish and the quality of science: Researchers working outside the academic system in structural biology tend to be higher quality, holding constant the citation potential of a protein. Patent evidence suggests industry prefers industry research to academic research, holding constant the nature of the discovery.
Pulling more fuel efficient cars into existence: Rising fuel prices and fuel efficiency standards tend to improve the fuel efficiency of cars, whether measured by patents or the actual traits of vehicles.
Remote breakthroughs: Innovators increasingly collaborate at a distance, whether we measure collaboration among patentees or coauthors on academic papers. More remote teams have typically been less disruptive/novel than colocated ones, but this effect has moderated or even reversed over time, whether measured by papers or patents.
Science is getting harder: Both patents and academic papers have become progressively less likely to cite recent academic work for several decades.
Science is good at making useful knowledge: Papers that are highly cited in one domain tend to be more highly cited in other domains as well. Economics papers highly cited by economists are likely to be cited outside economics; academic work highly cited by other academics is likely to be cited by patents.
Teacher influence and innovation: Various studies show students adopt the interests of their mentors, where interest is measured in several different ways, including interest in seeking a patent and other non-patent measures.
The internet, the postal service, and access to distant ideas: When the cost of communicating via text between two geographically distant establishments of the same firm falls because they get internet access, they are more likely to cite each other’s patents or collaborate. When the cost of communicating via text fell in Great Britain, due to postal reforms, distant regions became more likely to cite each other’s scientific work.
The size of firms and the nature of innovation: As firms get larger, they obtain fewer inventions per R&D dollar, whether inventions are measured with patents or alternatives. The inventions they do get also tend to be more incremental, again whether we measure with patent-based proxies or others.
Transportation and innovation: When regions are better connected by transit links, collaboration by inventors and scientists across those regions increases, as measured by either patents or papers.
What does peer review know? One study looking at NIH peer review scores finds that grants with higher scores tend to lead to more publications, more citations, and more patents.
When extreme necessity is the mother of invention: Covid-19 spurred a surge in new invention in technologies to mitigate its effects, whether medical treatments (measured by new clinical trials) or patent applications for remote work technology.
When technology goes bad: A greater share of R&D is focused on health and safety, whether measured by the share of patents that correspond to medical technologies, or the share of publicly funded research spending on health and environment.
Why proximity matters: who you know: Evidence from patent citations is consistent with a story where distance is not a strong impediment to sharing knowledge with people you already have relationships with, but is an impediment to forming such relationships. This is consistent with evidence from academia.
Selection Bias?
The above finds broad agreement between innovation studies that use patent data and those that don’t, when they study closely related phenomena. But we might reasonably worry: is this just an artifact of selection?
Indeed, there are multiple possible layers of selection bias.
The first level of selection bias is that researchers decide when and when not to use patent data. In this post’s exercise, I’m only observing the cases where the researcher thought patents would be an appropriate measure of innovation and where I thought the paper was a good fit for New Things Under the Sun. And so the claim that “patent and non-patent data tend to arrive at similar conclusions” only applies to the set of claims where researchers thought patents were an appropriate dataset (and I thought the researchers wrote a nice paper).
To give a concrete example, I have a series of posts about publication bias in the sciences: the notion that the research record gives us a biased picture of the evidence, since only positive findings are publishable. Only one of those posts features any studies reliant on patent data (see Publish or perish and the quality of science in the “No disagreement” list above). It makes sense that few researchers thought it was appropriate to study publication bias with patents, since publication bias is typically assumed to be an outcome of incentives peculiar to academia, not private sector invention. If someone did try to study publication bias in patents, they might get quite a different result than if they had studied it with data on journal articles.
The upshot is that this post’s analysis implies that if you think a paper is by a good researcher and it uses patent data, the results of the paper would probably agree with another paper on the same topic that didn’t use patent data. But, if you instead start with a specific research question, these results don’t imply you would get the same results whether you use patents or not. They instead imply that you would, if it’s the kind of research question that researchers think patents are appropriate for. If it’s not, then the results of this post don’t really apply. The claim is not that patents measure innovation well in all cases. The claim is that innovation researchers have done a decent job of restricting their attention to cases when patents do work well.
There is a second potential layer of selection bias though, above the researcher’s own decision about whether to use patents. Publication bias might be giving us a skewed perception of how reliable patent data itself is! Suppose patents really are a bad measure of innovation, and accordingly rarely deliver positive findings. It might be that we only observe the papers that do get positive results, since those are the only ones that are publishable. If this issue is serious, it would mean I’m overstating the extent to which research using patent data arrives at similar conclusions as research that does not. I think the popularity of patent data as a data source is some evidence against this concern: if the data had a reputation for leading disproportionately often to unpublishable null results, it probably wouldn’t be so popular. But it is something to bear in mind.
Lastly, there could be bias from the fact that my choice of topics on New Things Under the Sun isn’t random. I like writing about topics I think are important or where I think academic research can tell us something useful. The latter preference is potentially a serious source of bias. All else equal I feel less enthusiastic writing about a field where there is a muddle of different findings depending on which dataset you use (though I would still write a post if I thought the topic was important). That might mean my selection of topics is biased towards claims where patent and non-patent data obtain similar results, since those are the ones where I’m most confident social science research can tell us something.
There’s at least one way to evaluate how much of a concern this should be. New Things Under the Sun is a living literature review. There may well be selection bias in how I choose which articles to write. But after the articles are written, there is much less bias in my choice of which articles to update. One of my goals for this project is for the posts to provide an honest account of the state of the literature. That means if new studies come out that contradict what I’ve already written, I feel obliged to update the post to reflect this. That presents an opportunity to check for this last form of selection bias: if updates tend to find more disagreement between patent and non-patent data than original articles do, that would suggest my choice of what to initially write about is overstating the extent to which patent and non-patent studies agree.
Going through my newsletter archive, I found 20 updates to existing articles that include patent and non-patent data. Of these updates, 3 have at least some disagreement between the patent and non-patent analyses. The other 17 do not have any meaningful disagreement, in my judgment. This is pretty close to the ratio I found in my original survey of 37 articles that examine both patent and non-patent data: about 15% (3/20) of the time, there is some disagreement between analyses reliant on patent data and those that are not, compared to 16% (6/37) in my main analysis. See the appendix on New Things Under the Sun (.com) for how I classified each of these 20 updates, along with a short description of the nature of agreement or disagreement.
All in all, this exercise formalizes an intuition I’ve had for a long time. I’ve noticed that when I write about studies that use patent data, I often encounter some skepticism. For that very reason, I often go out of my way to try and find articles that do not rely on patent data, but which study the same phenomena as the patent-based papers I’m writing about. And in my experience, that exercise rarely leads me to substantively revise my original views. In the academic literature, if it’s possible and sensible to study a question with both patent data and non-patent data, in my experience results are subjectively similar.
Thanks for reading! The final post in this series will be posted to Substack tomorrow - stay tuned! As always, if you want to chat about this post or innovation in general, let’s grab a virtual coffee. Send me an email at matt.clancy@openphilanthropy.org and we’ll put something in the calendar.
1. The remaining New Things Under the Sun articles look exclusively at patent data, or exclusively at non-patent data.