Welcome to patent data week!
Towards the end of February, I started working on what was supposed to be a single post about using patents to measure innovation (analogous to this post on whether academic citations are a good measure of the impact of new ideas). It grew into four posts! Since they are all closely related, I decided to release them all on substack and the podcast over the course of a week:
Tuesday: Patents (weakly) predict innovation
Wednesday: Do studies based on patents get different results?
That said, if you can’t bear to wait until Thursday, all four posts are already up on New Things Under the Sun (.com) - just click on the links above. And if you can’t be bothered to read more than one post about using patent data to measure innovation, my suggestion is to read the last one (Can we learn about innovation from patent data?), as it tries to summarize and synthesize what’s in the rest. But otherwise, they’ll pop into your inbox or podcast feed, at a manageable (?) one per day.
And now, on to the first of this four-part series…
How many inventions are patented?
This article will be updated as the state of the academic literature evolves; you can read the latest version here. You can listen to this post above, or via most podcast apps here.
Patents are ostensibly a useful data source on invention for many reasons. Inventors are incentivized to seek patents, because obtaining one means you have the right to exclude others from using your technology for 20 years. So we might think most important inventions will have associated patents. Meanwhile, inventions get screened by patent examiners, to ensure what gets a patent is novel, non-obvious, and useful. So we might think that patents mostly contain information on genuine invention. Lastly, when there is a patent on an invention, we get a host of information on it: a description of the invention, who the inventors are, where they live, a classification into technological categories, information on cited references, and much more.
That’s the case for patents as data anyway. And indeed, it’s quite common for research papers on innovation to use patents as a dataset about innovation. But in fact, many inventions are not patented. This is for a variety of reasons. First, some forms of knowledge creation, such as abstract ideas, are not patentable. Second, patent protection is imperfect. It runs out after twenty years, rivals may be able to “invent around” the patent, and enforcement is costly and uncertain. Third, patenting requires one to disclose information about the invention, essentially forcing you to waive the option to protect an invention with secrecy. Fourth, patenting isn’t free.
If we’re going to use patents as a way to measure innovation (setting aside today the question of whether they are a good way to incentivize innovation), then a natural question is: how many inventions are patented?
This is an easy problem to state, but a hard problem to decisively pin down for a few reasons. First off, with a few exceptions, we don’t usually have an exhaustive list of inventions, which would let us compute the share that are patented. Second, if you were to try and draw such a list up, you would find it gets challenging to cleanly divide innovation into different discrete inventions. For example, many new products exhibit multiple technological advances; are these several inventions or one? Third, there may not be a one-to-one match from a patent to an invention. An invention that bundles together multiple breakthroughs might be protected by many patents that seem only loosely connected to the invention. And one patent might protect many different products that rely on different aspects of a breakthrough.
R&D Performers and Patents
Still, we can probably get an approximate answer that’s useful. To start, let’s assume that firms who conduct R&D are likely to also have inventions, and then see how many firms that do R&D also have patents. Mezzanotti and Simcoe (2023) report on the Business R&D and Innovation survey, which was conducted between 2008 and 2015 by the US Census Bureau and the National Science Foundation. This survey asked more than 40,000 US firms, from a nationally representative sample, about their use of intellectual property.
Only a small share of firms (roughly 6%) conduct any R&D at all. And of the ones that do, most do not seek patent protection. In fact, only 18% of firms that perform any R&D also seek patent protection. But this stat is a misleading indicator of the share of inventions that have patent protection. That’s because firms are highly unequal in their size: the economy is characterized by a small number of very large firms (who do a lot of R&D) and a large number of very small firms (who do a small share of total R&D). Firms with fewer than 100 employees did about 9% of domestic US R&D in 2015; firms with more than 100 employees did the other 91%. Large firms are also more likely to seek patents. While most firms that do R&D don’t seek patents, the small share of firms that do patent also do most of the R&D. In fact, patent-holding firms account for about 90% of total R&D spending.
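To see how both numbers can be true at once, here is a minimal sketch of the arithmetic. The ten firms and their R&D budgets below are made up for illustration (they are not from the survey); the point is only that a share counted per firm and a share weighted by R&D spending can diverge sharply when a few large firms do most of the spending.

```python
# Hypothetical toy data: (R&D spending, holds a patent?) for ten R&D-performing firms.
# These numbers are invented for illustration; they are NOT survey data.
firms = [
    (850.0, True),   # one very large firm that patents
    (50.0, True),    # one mid-sized firm that patents
] + [(12.5, False)] * 8  # eight small firms that do not patent


def patenting_shares(firms):
    """Return (share of firms that patent, share of R&D done by patenting firms)."""
    n_patenting = sum(1 for _, has_patent in firms if has_patent)
    total_rd = sum(rd for rd, _ in firms)
    patenting_rd = sum(rd for rd, has_patent in firms if has_patent)
    return n_patenting / len(firms), patenting_rd / total_rd


firm_share, rd_share = patenting_shares(firms)
print(f"{firm_share:.0%} of firms patent; they account for {rd_share:.0%} of R&D")
```

In this toy economy only a fifth of firms patent, but because those firms are the big spenders, they account for nine-tenths of R&D.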
You can just ask
As noted above, we don’t generally have a list of inventions that would let us compute the share with patents. But as an alternative we can ask firms to just estimate the share as best they can. Cohen, Nelson, and Walsh (2000) describes the results of a 1994 survey administered to ~1500 R&D labs in the US manufacturing sector. In this survey, firms were asked to estimate the percent of innovations that were patented. Across all sectors, on average firms reported they patented 49% of their product innovations and 31% of their process innovations (see appendix table A1). Cohen et al. (2002) reports on a similar survey offered to large Japanese R&D performing manufacturing firms. Respondents estimated that 62% of product inventions were patented and 42% of process inventions (see footnote 19).
Note though that there was substantial variation in those figures. In the US survey, drug manufacturers stated they patented 96% of products; glass manufacturers patented 6% of product innovations and 2% of processes. So, at least in 1994, US firms themselves estimated they only patented between a third and a half of their inventions, at least in the manufacturing sector. The figure was a bit higher in Japan. Notably, the share of inventions the respondents estimated were patented was higher than the share for which they believed patent protection was effective. For example, US respondents claim to patent about half their product inventions, but estimate patents are effective protectors of only about 35% of their product inventions.
Reasonably or not, economists tend to be a bit wary of self-reported survey data. So to get some non-survey evidence on how many inventions get patented, let’s next turn to some papers that actually do draw up lists of inventions, and then look at how many inventions on the list get patents.
Matching Patents to Invention Lists
Perhaps the sector where this approach works best is the pharmaceutical sector. That’s because in this sector, inventions actually are fairly discrete; they are distinct small molecules that have been approved by the FDA. In addition to providing a complete record of all drugs available in the US that make claims of medical efficacy, the FDA also collects and publishes information on which of those drugs are currently protected by patents (with the intention of providing guidance to generic drug manufacturers). Durvasula et al. (2023) analyzes this data, finding 68% of new drug approvals since 1985 have at least one associated patent. But this includes very minor new drug approvals that amount to small tweaks to existing drugs. If we limit our attention to new molecular entity drugs, those with a novel active ingredient, the share with patent protection rises to 85%. This is a majority of inventions, but it’s also less than Cohen, Nelson, and Walsh (2000), which found that in surveys drug manufacturers claimed to patent 96% of their products.
Moreover, the drug industry is generally understood to be an unusual case, where patents work particularly well. For a broader perspective, another particularly rich strand of literature uses innovation prize competitions to source lists of inventions. Each of these papers obtains a list of all inventions that either win prizes, or which are in a competition for a prize, and then sees what share of the inventions also enjoy patent protection. This literature spans many countries and time periods.
Moser (2012): At the 1851 Crystal Palace world’s fair in London, countries from around the world submitted inventions for display. Moser uses the exhaustive exhibition catalogs from this fair to generate a list of both patented and unpatented inventions. She then identifies which of these exhibited inventions were patented - for British inventions, this information is in the catalogs (self-reported, but verified by fair organizers), while for US inventions, Moser searches US patent records for patents related to the inventions listed in the catalog. Out of over 6,000 British exhibits, only 11% were patented. Out of over 500 US exhibits, only 15% were patented.
Brunt, Lerner, and Nicholas (2012): Over 1839-1939, the Royal Agricultural Society of England held annual prize competitions related to agricultural technology. Brunt, Lerner, and Nicholas obtain data on some 15,000 inventions entered into these competitions, and hand match them to UK patents. They estimate 18% of these inventions were patented, though that share seems to exhibit a modest upward trend over the century under study.
Shimizu and Hoshino (2012): Each year since 1954, the Okouchi Memorial Foundation has typically handed out 10-15 awards to Japanese innovations, with a focus on production technology. Shimizu and Hoshino compile data on these awards from 1972-2007, including whether the invention is patented/applying for a patent, which is reported by the applicant. In 1977, roughly 40% of prize-winning inventions were patented, but this has steadily risen, such that from 1990 on, essentially all Okouchi award-winning inventions are patented.
Fontana et al. (2013): Since 1963, the magazine Research and Development (previously called Industrial Research) has run the annual “R&D 100 award” competition to identify the ~100 most technologically significant products available for sale/licensing in the previous year. Fontana and coauthors gather data on 2802 award-winning inventions over 1977-2004 and then search for patents that list the same inventors, same organization, roughly describe the award-winning invention, and which were granted within 3 years (before or after) the prize. By these criteria, only 9.1% of award-winning inventions have an associated patent.
Capponi, Martinelli, and Nuvolari (2022): These authors obtain data on 1,234 inventions awarded the Queen’s Award for Innovation over 1976-2015, a prestigious award organized by the UK government to recognize striking achievement in innovation. Capponi and coauthors identify the firms associated with these prizes, and then look through their patents to identify matching patents, searching for patents filed no more than ten years prior to the prize. By these criteria, 32% of prize-winning innovations were protected by at least one patent.
To sum up, most of these papers find that approximately 10-30% of inventions associated with prizes are also protected by patents, with one study finding numbers approaching 100% in Japan. If we focus on papers covering the post-WWII era, which might be most relevant to the share of inventions patented today, we get a low of 9.1% from Fontana et al. (2013), and a high of nearly 100% from Shimizu and Hoshino (2012). Brunt, Lerner, and Nicholas find on the order of 30% of agricultural prize winners were patented in the years leading up to 1940. With the exception of Shimizu and Hoshino, these estimates are notably less than the average share of inventions patented in survey responses from Cohen, Nelson, and Walsh.
Let’s look at one more study, which also illustrates how conceptually tricky it can be to assign patents to inventions. Argente et al. (2023) use the Nielsen Retail Measurement Services data to get information on how often new consumer products are patented. Their dataset covers data on more than one million products sold over 2006-2015, accounting for a large share of sales in grocery stores and drug stores. This lets them identify inventions as new UPC codes; for example, a new kind of disposable cup, which is assigned a new UPC code when it reaches the market. They also have data on the product attributes of products sold, which they use to identify larger and smaller innovations (a new product with previously unseen product attributes is a bigger innovation than one without), but I’ll set that discussion aside for now. The important thing is that they have, again, a fairly representative list of new products in this sector and since they know the identities of the firms, they can also find all the patents associated with those new products.
Rather than try to identify one-to-one matches between products and patents, Argente and coauthors look to see if firms have patents related to the product categories where they introduce new products. To do that, they rely on the fact that Nielsen places each product into one of more than 1,000 different product categories. Argente and coauthors build a text description of each of these categories by combining hand-selected Wikipedia articles about the product category with short descriptions of each product category provided by Nielsen. They then use clustering algorithms to group these 1,000+ categories into 400 clusters that use similar text to describe product categories - for example, disposable cups and disposable plates might be grouped. They then look to see which of a firm’s patents are textually close to the text of these clusters. In short, they look to see if a firm has patents that use similar words (in their titles, abstracts, etc.) as the words used to describe product categories in which it is active. When they do this, they find that 23% of new products are introduced by firms who have patents related to that product category.
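To give a flavor of what “textually close” can mean, here is a minimal sketch using bag-of-words cosine similarity. This is a stand-in chosen for simplicity, not the procedure Argente and coauthors actually use (their pipeline involves Wikipedia text, Nielsen descriptions, and a clustering step); the category descriptions, the patent text, and the function names are all hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count how often each word appears."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_categories(patent, categories):
    """Sort product categories from most to least textually similar to the patent."""
    p = bag_of_words(patent)
    scores = {name: cosine_similarity(p, bag_of_words(desc))
              for name, desc in categories.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical category descriptions and patent text, invented for illustration.
category_text = {
    "disposable tableware": "disposable paper cup plate single use drinking container",
    "laundry detergent": "laundry detergent liquid stain washing fabric cleaner",
}
patent_text = "A disposable drinking cup formed from coated paper with a rolled rim"

ranking = rank_categories(patent_text, category_text)
best_category = ranking[0][0]
print(best_category)
```

A real application would need to handle common words, synonyms, and long documents much more carefully, which is part of why matching patents to products is hard.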
More than zero, less than half
So to sum up, most R&D is performed by companies that have at least one patent. In the pharma sector, we can be pretty confident that most (but not all) new drugs get a patent. In manufacturing, if we survey firms, they claim to protect roughly half their products with patents and a third of their process inventions. But if we look at the share of inventions that either win prizes or are entered into prize contests, we can generally find patent matches only 10-30% of the time (with one major exception). This is also about the rate at which consumer goods firms have patents related to product categories in which they introduce new products.
We’ve got three different estimates here: 90% of R&D is performed by patent-holders, 30-60% of inventions are patented according to survey based evidence, and roughly 10-30% of inventions can be matched with patents. Why the divergence?
It’s not hard to reconcile the very high share of R&D associated with patents with the other measures; it just implies firms patent some but not all of the work they perform R&D on.¹ It’s harder to reconcile the survey-based and matching-based estimates, but a few factors are at play. One is who is in the sample. The prize-based papers include innovations that come from outside the private sector, where innovators are less likely to patent. The survey-based papers are primarily based on relatively large manufacturing firms, and large firms are more likely to patent. As we’ve seen, there is also a lot of variation in the propensity to patent by sector, and the sectors covered differ as well. Lastly, it might be that people who submit inventions to prize contests differ from those who seek recognition or reward for their inventions from patents.
Another factor is that surveys can pick up patents that would seem to be unrelated to an invention, based only on reading the patent, but which an expert skilled in the field would recognize is actually relevant. We might expect inventors to have a better idea of which seemingly unrelated patents are actually relevant to their invention, and indeed, we do find higher patenting rates when we let inventors self-report the share of inventions they patent. Brunt, Lerner, and Nicholas note that 28% of prize winning inventions at agricultural technology competitions report patent protection in the prize announcements, but they can locate patents for only 22% of winners. Shimizu and Hoshino, who also rely on self-reporting of patent protection, find nearly all Okouchi prize winners enjoy some kind of patent protection; in contrast, Fontana et al. (2013) are only able to match inventions to patents for 25% of R&D 100 winners from Asia. We don’t know how many of the inventions Fontana et al. (2013) study are from Japan, but it was likely a large share, since even as late as 2004 Japan spent more on R&D than South Korea, China, and Taiwan combined. And the survey results from Cohen, Nelson, and Walsh are on average higher than the levels implied by matching patents to prizes, and even higher than FDA data on drugs protected by patents.
Argente et al. (2023) also have some evidence on this matching issue. Recall, their method involves classifying patents into different related product categories, based on the similarity of patent text to text associated with different product categories. To validate their approach, they take advantage of the fact that, since 2011, firms have been able to list patents that protect their product lines on a website. Few firms currently make use of this right, but Argente and coauthors can at least look at the patents that Procter & Gamble and Kimberly-Clark report to be protecting their various product lines. This provides some information on what product categories the patent-holders believe their patents are related to. They find their text-based algorithm picks the “right” product category (defined here as the one the patent-holder reports as the relevant one) as its first or second choice about 80% of the time. But in a significant fraction of cases (more than 10% of the time), the text-based approach is not very close; it does not identify the product category the patent-holder says is relevant to be textually close. All told, I think this difficulty in matching a patent to an invention is a reason to think of the matching literature (which found patent matches on the order of 10-30%) as a lower bound estimate. It appears easier to miss a related patent than to incorrectly match a patent to an invention when it has similar text and the same inventors.
So, how many inventions are patented?
Probably more than zero, but less than half. That’s enough for us to be confident that the patent register is not a census of invention. Instead, it’s a sample. The good news is that with hundreds of thousands of new patents each year, it’s a sample that is plenty big enough for us to do statistical analyses and learn interesting things. The bad news is that it’s not a random sample, but a biased sample: some kinds of invention are much more likely to show up in the patent record than others. Much of the challenge of doing good analysis with patent data is understanding and correcting for these biases.
Thanks for reading! The next post in this series will be posted to Substack tomorrow - stay tuned! As always, if you want to chat about this post or innovation in general, let’s grab a virtual coffee. Send me an email at matt.clancy@openphilanthropy.org and we’ll put something in the calendar.
¹ Notably, about 7.5% of firms with patents report doing zero R&D. Note, this is not necessarily 7.5% of patents, but rather firms with patents.