How Useful is Science?

Very.

Feb 27, 2020

What’s the state of scientific progress? Is it slowing down? Speeding up? One way to measure this is to look at how much technology “uses” science. Improving technology isn’t the only point of funding science, but it’s a big justification.

We’ll use patents because, though they’re frustratingly imperfect, they remain the best source of detailed information on broad-based technological innovations. And we can get an idea of how much patented inventions rely on science by looking at citations in the patent document.

Marx and Fuegi (forthcoming) use text processing algorithms to match scientific references in US and EU patents to data on scientific journal articles in the Microsoft Academic Graph. The average number of citations to scientific journal articles has grown rapidly from basically 0 to 4 between 1980 and today.

This is a bit of an encouraging vote of confidence in science. But what do these citations really mean?

Watzinger and Schnitzer (2019) have a cool paper that suggests scientific research is a wellspring of new ideas that get transformed into technology, and that these connections are well proxied by citations. They build directly on Marx and Fuegi, but characterize patents’ dependence on science in a slightly more nuanced way.

They begin by assuming patents that directly cite scientific research depend on science the most. These patents are called “D = 1” patents, meaning the “distance” to science is just one citation. Patents that cite “D = 1” patents, but not science directly, are called “D = 2” patents, indicating their distance to science is two citations (one citation to a patent that, in turn, cites a scientific article). Patents citing “D = 2” patents, but not any science or “D = 1” patents, are called “D = 3” patents, and so on. The idea is that the higher “D” is, the “farther” the patent is from relying on science. It’s a measure of how many links there are in the shortest citation chain between the patent and a cited scientific article. (This measure is based on another cool paper by Ahmadpoor and Jones, 2017.)
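To make the “D” measure concrete, here is a minimal sketch of how it could be computed with a breadth-first search over a citation graph. This is an illustration, not the authors’ code: the function name `distance_to_science`, the input format, and the toy patents P1 to P5 are all assumptions made up for the example.

```python
from collections import deque


def distance_to_science(patent_cites, cites_science):
    """Compute D for each patent (hypothetical helper, not from the paper).

    patent_cites: dict mapping a patent to the list of patents it cites.
    cites_science: set of patents that cite at least one scientific article.
    Returns a dict patent -> D. Patents with no citation chain to science
    are left out of the result.
    """
    # Reverse the graph: for each cited patent, who cites it?
    cited_by = {}
    for citing, cited_list in patent_cites.items():
        for cited in cited_list:
            cited_by.setdefault(cited, []).append(citing)

    # Every patent that cites science directly starts at D = 1.
    distance = {patent: 1 for patent in cites_science}
    queue = deque(cites_science)

    # Breadth-first search outward, so the first time we reach a patent
    # is along its shortest citation chain back to science.
    while queue:
        current = queue.popleft()
        for citing in cited_by.get(current, []):
            if citing not in distance:
                distance[citing] = distance[current] + 1
                queue.append(citing)
    return distance


# Toy data: P1 cites science; P2 cites P1; P3 cites P2;
# P4 cites both P3 and P1; P5 cites nothing.
toy_cites = {"P1": [], "P2": ["P1"], "P3": ["P2"], "P4": ["P3", "P1"], "P5": []}
toy_distances = distance_to_science(toy_cites, {"P1"})
print(toy_distances)
```

In the toy data, P4 cites both a “D = 3” patent and a “D = 1” patent, so it ends up at D = 2: the shortest chain wins. P5 has no path to science and gets no D at all.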

Watzinger and Schnitzer then show patents with lower “D” tend to be higher value: closer to science, more valuable patent.

To do this, they need a way to measure the value of patents. There are a lot of approaches to doing this, but the one they use is based on a paper by Kogan et al. (2017). Essentially, the idea is to see what happens to the stock price of companies in the 3 days before and after they get a patent granted. Under some assumptions, you can translate this into the market’s estimated value of the patent grant. Kogan et al. (2017) show this measure of patent value is correlated with a lot of other stuff, and it’s become a new standard way to measure the value of patents in dollar terms.

Watzinger and Schnitzer (2019) find patents with D = 1 are nearly $3mn (in 1982 dollars!) more valuable than similar patents in the same year and tech field with no connection to science! Patents with D = 2-3 are also more valuable, but the science premium declines in the way you would expect.

What is it about science that makes these patents so valuable? Watzinger and Schnitzer (2019) also scan the text of patent abstracts and look for new and unusual words - those that have not previously appeared in patent abstracts. They show these text-based measures of novelty are also associated with more value. Finally, they find patents closer to science are indeed more likely to introduce new and unusual words. Their interpretation is that science discovers new concepts, and that these concepts get spun into valuable new technologies.
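As a rough sketch of the text-based novelty idea, one could walk through abstracts in chronological order and count the words in each that have never shown up before. The function name `new_word_counts`, the simple lowercase-letters tokenizer, and the three toy abstracts are assumptions for illustration; the paper’s actual text processing is surely more careful.

```python
import re


def new_word_counts(abstracts):
    """Count previously unseen words per abstract (hypothetical helper).

    abstracts: list of (patent_id, year, text) tuples.
    Returns a dict patent_id -> number of distinct words that did not
    appear in any earlier abstract. Abstracts from the same year are
    processed in input order, which is a simplification.
    """
    seen = set()
    counts = {}
    for patent_id, _year, text in sorted(abstracts, key=lambda a: a[1]):
        # Crude tokenizer: runs of letters, lowercased.
        words = set(re.findall(r"[a-z]+", text.lower()))
        counts[patent_id] = len(words - seen)
        seen |= words
    return counts


# Made-up abstracts, purely for illustration.
toy_abstracts = [
    ("A", 1980, "A method for welding steel"),
    ("B", 1985, "A method for editing genes with CRISPR"),
    ("C", 1990, "Welding genes"),
]
print(new_word_counts(toy_abstracts))
```

Here the first abstract is all new by construction, the second introduces four words the corpus has not seen, and the third only recombines existing vocabulary, so it scores zero.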

This probably isn’t the only way science contributes to technology, but let’s follow this thread. If one of the contributions of science to technology is the discovery of new concepts, reflected in new text, then it might be interesting to see how much science generates new phrases and concepts.

Milojevic (2015) counts the number of unique phrases in the text of journal article titles in physics, astronomy, and biomedicine as a way to measure how the “cognitive extent” of disciplines changes over time. Here, a “phrase” is a string of words that begins or ends in a common word (e.g., “and”) or a punctuation phrase delimiter (e.g., “;”). For phrases longer than three words (e.g., “high resolution energy filtered scanning tunneling microscopy”), she collapses the phrase to the last three words. This practice does not significantly change the results. The idea is that phrases serve as a proxy for different scientific and technical concepts.

By counting the number of unique phrases that appear in an annual sample (of 10,000 phrases), Milojevic can get a measure of how many different concepts the field is researching at any given time (inset is the log number of annual publications).
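Here is a minimal sketch of that procedure: split titles into phrases at common words and punctuation, keep the last three words of long phrases, then count the unique phrases in a fixed-size random sample. The stop-word list, the punctuation set, and the function names `title_phrases` and `unique_in_sample` are assumptions for illustration, not the ones used in the paper.

```python
import random
import re

# Illustrative list of common words; the paper's actual list is not given here.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "with", "to", "by"}


def title_phrases(title):
    """Split a title into phrases delimited by common words or punctuation."""
    phrases = []
    # Punctuation marks act as phrase delimiters.
    for chunk in re.split(r"[;:,.()?!]", title.lower()):
        current = []
        for word in chunk.split():
            if word in STOP_WORDS:
                if current:
                    # Collapse phrases longer than three words to the last three.
                    phrases.append(" ".join(current[-3:]))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(" ".join(current[-3:]))
    return phrases


def unique_in_sample(titles, sample_size=10_000, seed=0):
    """Count unique phrases in a random sample of phrases from the titles."""
    all_phrases = [p for t in titles for p in title_phrases(t)]
    rng = random.Random(seed)
    sample = rng.sample(all_phrases, min(sample_size, len(all_phrases)))
    return len(set(sample))


example = (
    "High resolution energy filtered scanning tunneling microscopy "
    "of graphene and silicon"
)
print(title_phrases(example))
```

Fixing the sample size is what makes years comparable: a year with ten times as many titles would otherwise show more unique phrases just because there is more text to draw from.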

The good news is that the number of unique phrases in each field is rising. The bad news is that the rate of increase seems to have slowed in physics, and especially in biomedicine. Notably, these are two areas where other researchers have also argued scientific progress is slowing (physics, biomedicine).

That said, this measure does not tell us exactly how many new concepts are being created by science for a few reasons. First, the number of unique phrases in a year is not synonymous with the number of new phrases in a year. Second, the above figures each count the number of unique phrases in a sample of 10,000 title phrases. But the total number of titles has been growing at an exponential rate (see inset figures, which are the log number of annual publications).

This should imply the total number of unique phrases is actually growing at a slightly faster rate than the number of articles. And if those phrases make their way into valuable new technologies (as reflected in patents), then at least by this narrow thread of evidence, the state of science is fine.

What's New Under the Sun
