Free Knowledge and Innovation

  
Sometimes obvious ideas work. If you want to encourage more innovation, give people better access to knowledge.

Let's start in the 1800s. Over 1883-1919 (but mostly after 1899) Andrew Carnegie provided the funds for the construction of ~1700 free public libraries, scattered across the USA. There was even one in my hometown.

Berkes and Nencka (2020) set out to measure the impact of this library building spree on innovation. They want to compare the patenting rate of cities and towns that received libraries to otherwise identical ones that did not. The challenge is finding cities that did not receive libraries but otherwise serve as a good control group. Berkes and Nencka use a set of 200 towns that applied for library funding and were approved, but then changed their minds and rejected Carnegie’s money.

Why would they do that? There are a variety of possible reasons, but a big one is distaste for Carnegie himself after he hired a militia to violently put down a mining strike (many people died). Berkes and Nencka show that these rejecting cities were, on average, no different than the acceptors on various observable metrics such as education levels, racial makeup, job mix, age, and population. For Berkes and Nencka’s comparison to work, we have to believe that whatever the reason these cities rejected funding, it’s largely uncorrelated with a tendency to change patenting behavior after an application is accepted. Berkes and Nencka do a couple of other things to establish this, pretty convincingly in my view.

And it turns out patenting does change pretty noticeably for the towns that get libraries, compared to the ones that apply and do not. The main story is visible in the figure below, which shows the average number of patents per city-year on a log scale: towns that ended up with libraries looked pretty similar to towns without them up until they both applied for libraries, but afterwards the towns that got libraries tended to have 8-12% more patents than the ones that didn’t. In that figure, the solid line is when funds for a library are granted, and the dashed line is when libraries were typically opened (three years later).

(Aside: why does the figure above have this inverted U-shape? That has to do with larger trends of patent activity shifting away from towns and into big cities during the period under study - having a library did not stop that trend)

Moreover, Berkes and Nencka also provide some supportive evidence that the increase in patenting really does come from library access: patents from cities with libraries are more likely to contain words associated with citing a book (e.g., "vol.", "his book", "pp.", "pages", etc).
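To make that kind of check concrete, here is a minimal sketch of a keyword test for book citations. It is not Berkes and Nencka's code: the function names are hypothetical, the marker list contains only the four examples quoted above (the paper's full list is not reproduced here), and the patent snippets are invented for illustration.

```python
# Markers quoted in the text above; the paper's full list is longer.
BOOK_MARKERS = ("vol.", "his book", "pp.", "pages")


def mentions_book(patent_text: str) -> bool:
    """Return True if the text contains any book-citation marker (case-insensitive)."""
    lowered = patent_text.lower()
    return any(marker in lowered for marker in BOOK_MARKERS)


def share_mentioning_books(patent_texts: list[str]) -> float:
    """Fraction of patents whose text contains at least one marker."""
    if not patent_texts:
        return 0.0
    return sum(mentions_book(text) for text in patent_texts) / len(patent_texts)


# Invented snippets, purely for illustration (not real patent text).
library_town = [
    "As described in Vol. 2, pp. 14-18 of the treatise",
    "A new latch for gates",
]
no_library_town = [
    "An improved plow blade",
    "A new latch for gates",
]

library_share = share_mentioning_books(library_town)
no_library_share = share_mentioning_books(no_library_town)
```

Plain substring matching like this is crude (it would flag "pages" inside a longer word, for example), so treat it as a first pass rather than a careful classifier.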

Let’s fast forward to 1975. In that year, the US Patent and Trademark Office began a program to dramatically increase the number of patent depository libraries around the country, with a goal of having at least one library in every state. Patents provide inventors with exclusive rights over their invention for a period of time, in exchange for disclosing how the invention works. In theory, that should let other people build on the underlying principles expressed in new inventions. But prior to the internet, to easily read published patent documents, you needed to go to one of these libraries, and access to them was very unequally distributed.

Furman, Nagler, and Watzinger (2018) measure the impact of getting a patent library by following a similar strategy to Berkes and Nencka (2020): they compare regions that got a patent depository library to otherwise similar regions that did not. In their case, federal depository libraries serve as a natural control group. Federal depository libraries are libraries that provide access to federal regulations and laws, and they were also the most common sites where patent depositories were set up. The authors compare patent rates in the 15 miles around federal depository libraries that got a patent depository to the patent rates in the 15 miles around federal depository libraries that did not, but which are close to ones that did. (Why did some federal depository libraries get patent depositories while others didn’t? The patent office basically followed a principle of first come, first served, so reasons were often idiosyncratic, like the person running a federal depository library wanting to travel to DC for annual patent training.)

As with Berkes and Nencka, this paper finds the patent rate of regions that get a patent library diverged from those that didn’t in the subsequent years, as illustrated in the figure below. All told, getting a library seems to boost patent rates by about 17%.
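Both library studies rest on the same basic comparison: how much did patenting change in places that got a library, relative to how much it changed in similar places that did not? Here is a minimal difference-in-differences sketch of that logic on log patent counts. The numbers are invented (the treated group is built to rise 17% so the arithmetic is easy to follow), the function names are hypothetical, and the actual papers use regressions with many more controls than this.

```python
import math


def mean_log(counts):
    """Average of log patent counts across places."""
    return sum(math.log(c) for c in counts) / len(counts)


def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change in mean log patents for treated places minus the change for controls."""
    treated_change = mean_log(treated_after) - mean_log(treated_before)
    control_change = mean_log(control_after) - mean_log(control_before)
    return treated_change - control_change


# Invented patent counts per region, purely for illustration.
treated_before = [10.0, 20.0]
treated_after = [11.7, 23.4]   # each treated region rises 17% by construction
control_before = [10.0, 20.0]
control_after = [10.0, 20.0]   # controls stay flat by construction

effect_in_logs = diff_in_diff(
    treated_before, treated_after, control_before, control_after
)
# Convert the log-point difference into a percentage change.
effect_in_percent = (math.exp(effect_in_logs) - 1) * 100
```

Working in logs means the estimate reads as a proportional change, which is why a figure drawn on a log scale shows the gap between the two groups directly.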

Furman, Nagler, and Watzinger are also able to provide a lot of supporting evidence that this increase in patenting is driven by improved access to patents:

  • The effect is strongest for young firms and small firms, which we might assume are less likely to have alternative ways of accessing patents.

  • The effect is strongest for technologies that disclose the most information in patents (chemistry patents).

  • Patents from near patent libraries cite more geographically distant patents and a wider variety of technology classes. That suggests the inventors are learning about what’s relevant from the library, rather than their social network (which is more likely to be local and to work on similar technologies).

Of course, today, anyone can read any patent online. We shouldn’t really expect it to matter if you are close to a patent library anymore. And this actually provides confirmatory evidence that those patent depository libraries really mattered. The first internet-searchable patent databases became available in 1995. And it turns out the positive impact of having a physical patent depository library disappeared in that year!

That also suggests improving online access to knowledge might be another way to boost innovation. Enter Wikipedia. Anyone with access to the internet can now read a free encyclopedia that has 6.3mn articles; for comparison, the Encyclopedia Britannica never had more than 100,000 articles, even in its digital incarnations. Wikipedia also has extremely detailed scientific articles: Thompson and Hanley (2020) find Wikipedia covered 93% of the topics in upper level undergraduate chemistry classes and nearly half of the topics in masters' level graduate school. Does access to Wikipedia have a similar effect on innovation as access to libraries?

To test this, Thompson and Hanley perform an experiment. They commission 43 new chemistry articles, written by PhD students, and then they randomly post half to Wikipedia. They want to see if access to these articles (as compared to the unposted ones) exerts an influence on science.

Only 0.01% of academic papers directly cite Wikipedia (I guess it’s embarrassing), but Thompson and Hanley provide a lot of evidence that scientists read and are influenced by these Wikipedia articles. First off, these are quite specialized chemistry topics - for their experiment, they focus on material from chemistry grad school that wasn't already on Wikipedia. But even though the topic is quite niche, the readership is huge: 4,400 views per month, 2 million total views as of February 2017. Moreover, while people may be shy to cite Wikipedia, they are not shy about citing the scholarly literature that is listed in the Wikipedia reference list. Thompson and Hanley show that references in articles they published to Wikipedia got 91% more citations on average than references in their control group of unpublished articles. People are reading these articles and citing the referenced literature, rather than Wikipedia itself.

Finally, the bulk of Thompson and Hanley's paper uses a textual similarity metric to identify the influence of Wikipedia. For each of their chemistry articles, they compute the similarity of the wiki text to the text of academic work published by Elsevier. Basically, they have a method of checking things like the extent to which both articles use the same unusual words. For each wiki article, they compare its similarity to Elsevier articles published in the 6 months before it was posted with its similarity to Elsevier articles published in the 6 months after. The figure below shows how the distribution of similarity changed over that period. The blue line corresponds to similarity with wiki articles that were posted and the green to similarity with the control wiki articles that were not posted (until after the experiment ended). The way to interpret this is that after wiki articles get posted, you see more Elsevier articles that have high similarity to them (the ones in the top 10% of similarity). In the control, you see the opposite: more Elsevier articles dissimilar to the (unposted) wiki articles are published over time.
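To give a feel for what "using the same unusual words" can mean in practice, here is a minimal sketch of one standard approach: weight each word by how rare it is across documents (TF-IDF), then take the cosine similarity between two documents. This is a stand-in technique I'm assuming for illustration, not necessarily the exact metric Thompson and Hanley use, and the three snippets of text are invented.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def idf_weights(docs):
    """Smoothed inverse document frequency: rarer words get larger weights."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))
    return {
        word: math.log((1 + n_docs) / (1 + df)) + 1.0
        for word, df in doc_freq.items()
    }


def tfidf_vector(tokens, idf):
    """Map each word to (count in this document) * (rarity weight)."""
    counts = Counter(tokens)
    return {word: count * idf.get(word, 0.0) for word, count in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(word, 0.0) for word, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented text, purely for illustration.
wiki_article = "the catalyst forms a zwitterion intermediate"
related_paper = "we observe a zwitterion intermediate in the reaction"
unrelated_paper = "the results of the survey are reported in the appendix"

docs = [tokenize(t) for t in (wiki_article, related_paper, unrelated_paper)]
idf = idf_weights(docs)
vectors = [tfidf_vector(tokens, idf) for tokens in docs]

similarity_related = cosine(vectors[0], vectors[1])
similarity_unrelated = cosine(vectors[0], vectors[2])
```

In this toy example the related paper scores higher because it shares the rarer words ("zwitterion", "intermediate") with the wiki text, while the unrelated paper shares only "the". The before-and-after comparison then asks whether scores like these shift upward for papers published after the wiki article goes live.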

So, Thompson and Hanley provide lots of evidence that these Wikipedia articles shape the direction of science. They get read a bunch; the things they cite get cited a bunch; and after they are published, you see more peer-reviewed articles using similar words and phrases as the wiki article. Another thing the paper does is generalize this approach to all of chemistry Wikipedia. Looking at 27,000 chemistry articles published on Wikipedia and 326,000 chemistry articles published by Elsevier, they ask how the distribution of similarity changes for Elsevier articles published before and after Wikipedia articles. What they find is quite similar to their much smaller experiment comprising 43 new Wikipedia articles. After a given Wikipedia article gets published, there is an increase in the number of Elsevier articles using similar language, as compared to the Elsevier articles published before it.

(As the author of a free newsletter about academic research that strives to be accessible to lay readers, I have chosen to believe I too exert an inescapable gravitational pull on the direction of research)

Public libraries, patent depository libraries (in the pre-internet days), and chemistry articles in Wikipedia: in all three cases, new access seems to have had a measurable impact on innovation.
