New Things Under the Sun is a living literature review; when the literature changes, so do we! This post covers a few updates to articles. As a reminder, the most up-to-date versions of each article live on NewThingsUnderTheSun.com.
Consistency in the Returns to R&D
The post Government Funding for R&D and Productivity Growth reviewed a few recent papers that try to empirically measure the productivity impact of government spending on R&D. Those papers qualitatively found similar results, but it was hard to assess whether the magnitudes from each approach were consistent with each other, because they each used different methods and reported results in different ways. Now, Fieldhouse and Mertens (2025) makes some welcome progress on this question. The updated article now concludes:
So… what is the return to government funded R&D?
To answer this, recall that Jones and Summers (2021), the paper I mentioned at the outset of this article, argued via a thought experiment that a dollar of R&D on average is worth several dollars in benefits. Another way to express the benefits of R&D is via a social rate of return, which can be understood as the interest rate you would need to be offered on a conventional investment to be indifferent between it and getting the returns from R&D. By this measure, Jones and Summers argue as a baseline that the return on R&D averages 67% (much higher than the return on most investments!).
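To make that definition concrete, here is a minimal sketch in Python that treats the social rate of return like an internal rate of return: the discount rate at which the stream of social benefits exactly pays back the R&D dollar. The benefit stream is invented purely for illustration; it is not data from Jones and Summers (2021).

```python
# Stylized illustration of a social rate of return, computed like an
# internal rate of return (IRR): the discount rate at which the stream
# of social benefits exactly pays back the R&D dollar. The benefit
# stream below is invented for exposition, NOT data from Jones and
# Summers (2021).

def npv(rate, cashflows):
    """Net present value of cashflows[t] received in year t."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))

def irr(cashflows, lo=0.0, hi=10.0, tol=1e-9):
    """Find the rate where NPV crosses zero, by bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if npv(mid, cashflows) > 0:
            lo = mid  # benefits still outweigh the cost; try a higher rate
        else:
            hi = mid
    return lo

# Hypothetical: spend $1 on R&D today, receive $0.72 of social benefits
# per year for five years. With these invented numbers the implied
# return is roughly 66%, in the ballpark of the 67% baseline.
flows = [-1.0] + [0.72] * 5
print(f"implied social rate of return: {irr(flows):.0%}")
```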
At first glance, it seems like Fieldhouse and Mertens (2023) find answers that are dramatically different—they estimate social returns to nondefense public R&D in the range of 140-210%, compared to the 67% from Jones and Summers. But in a follow-up paper (Fieldhouse and Mertens 2025), they show these answers are not that difficult to reconcile if you keep in mind a few key points.
First, the 67% estimate from Jones and Summers is an average over all R&D spending, including both government and private sector R&D. The Fieldhouse and Mertens (2023) estimate is not an overall average—it's specifically for nondefense government R&D.
To see how these different estimates can be reconciled, we need to consider the composition of R&D spending. Over the post-war period, private sector R&D has accounted for roughly 54% of all R&D spending, defense R&D for about 26%, and nondefense government R&D for approximately 20%. Suppose private sector R&D generates a return of about 55% (an estimate they pull from another paper, Bloom et al. 2013, discussed in more detail here), defense R&D generates a return of about 25% (consistent with the findings in Fieldhouse and Mertens 2023), and nondefense R&D generates a return of 175% (the midpoint of their 140-210% range). Then the weighted average return would be:
54% × 55% + 26% × 25% + 20% × 175% = 71%
This is very close to the 67% estimated by Jones and Summers (2021).
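For readers who want to check the arithmetic, here is the same weighted average as a few lines of Python, using only the shares and returns quoted above:

```python
# Back-of-the-envelope check of the weighted average return to R&D,
# using the post-war spending shares and per-category returns quoted above.
shares  = {"private": 0.54, "defense": 0.26, "nondefense": 0.20}
returns = {"private": 0.55, "defense": 0.25, "nondefense": 1.75}

avg = sum(shares[k] * returns[k] for k in shares)
print(f"weighted average return: {avg:.0%}")  # 71%, vs. 67% in Jones and Summers
```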
What about Dyévre (2024)? I think his answers are also consistent with the conclusion that the return to public funding for R&D is actually higher than the average return implied by Jones and Summers (2021). Here's a simple argument to see why.
As a benchmark, suppose that 100% of annual economic growth is driven by (100% of) annual R&D; this is basically the assumption made in Jones and Summers (2021). Suppose further, again just as a benchmark, that there is a constant relationship between R&D spending and growth. If that's true, then we should expect a 1% increase in annual R&D to generate a 1% increase in annual growth. Annual GDP per capita growth in the USA has averaged about 1.8% per year since the 1950s, so a 1% increase in the growth rate is 0.018%. Do Dyévre's results match this benchmark?
As noted above, government R&D (defense plus nondefense) has averaged only about 46% of total R&D, so increasing overall R&D by 1% would require increasing government R&D spending by more than 1%: specifically, about 2.2%. Recall that Dyévre finds a 1% increase in government R&D funding generates roughly a 0.024% increase in productivity after five years, so a 2.2% increase should lead to a 0.0528% increase in productivity.
In other words, Dyévre's empirical approach finds that an increase in government spending equivalent to 1% of total R&D spending tends to lead to a 0.0528% increase in productivity; that's a lot more than the 0.018% benchmark that I think is consistent with Jones and Summers (2021). One would need to do some careful work to be sure, but it also seems broadly consistent with the Fieldhouse and Mertens (2023) result, which found the return to nondefense R&D was much higher than the overall average for R&D.
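Here is that back-of-the-envelope comparison as a short Python sketch; every figure is the one quoted in the text above:

```python
# Comparing Dyévre's estimate to the Jones and Summers benchmark.
# All figures are taken from the text above; values are in percent.

growth = 1.8                 # average US GDP per capita growth, % per year
benchmark = growth * 0.01    # a 1% rise in total R&D: 1% more growth = 0.018%

gov_share = 0.46                        # government share of total R&D
gov_increase = round(1 / gov_share, 1)  # ~2.2% rise in government R&D needed
dyevre = 0.024 * gov_increase           # 2.2 x 0.024 = 0.0528%

print(f"Jones and Summers benchmark: {benchmark:.4f}%")  # 0.0180%
print(f"implied by Dyévre (2024):    {dyevre:.4f}%")     # 0.0528%
```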
Changes in the Composition of Government R&D
In my post Frequently Asked Questions About US Government Funding for R&D, I noted that government spending on R&D has grown more slowly than overall government spending, private sector spending on R&D, and US GDP. But looking only at aggregate funding for R&D hides some interesting dynamics; not all kinds of R&D have seen their level of government support fall, relative to GDP. In this update, I note:
There have been substantial changes in the composition of federal spending on R&D over time. Federal support for basic and applied research has been roughly constant as a share of GDP since the 1970s, at about $0.38 for every $100 of GDP. Meanwhile, support for development has fallen from a high of $1.50 for every $100 of GDP during the height of the space race to roughly parity with support for basic and applied research.

A big retraction
To close, one of the biggest papers in the economics of innovation of the last year was Toner-Rodgers (2024), which purported to describe an experiment conducted by an unnamed materials science company on the effects of AI on R&D. The paper had a lot of interesting implications, and I wrote about it in the articles Prediction Technologies and Innovation and Do Prediction Technologies Help Novices or Experts More?, and, to a lesser degree, in the post What if we could automate innovation? On May 16, 2025, MIT announced it had conducted a confidential internal review and concluded the paper should be withdrawn from public discourse. We don't know the full story here, but MIT economists Daron Acemoglu and David Autor stated "we have no confidence in the provenance, reliability or validity of the data and in the veracity of the research."
New Things Under the Sun is a living literature review, and I aim for articles to reflect the current state of our knowledge. I think the current state of knowledge no longer includes the findings in Toner-Rodgers (2024), so I've dropped them from the review. This has had the following effects on the updated articles:
Prediction Technologies and Innovation looked at evidence that access to a better prediction technology (which included everything from gene maps to software to AI) can inadvertently retard the pace of knowledge growth, by encouraging researchers to narrow their research agenda to domains where the new prediction technology can be useful. Papers looking at early prediction algorithms in structural biology and at gene maps illustrated that this kind of research narrowing can indeed happen. Somewhat in contrast, Toner-Rodgers (2024) had found that access to AI tools increased the novelty of materials discovered (which is not exactly the same thing as the question about narrowing). Without Toner-Rodgers (2024), the article is now a bit more internally consistent, in that all the papers discussed illustrate a narrowing effect of new prediction technologies. But the article no longer has any discussion of modern AI tools.
Do Prediction Technologies Help Novices or Experts More? argued that prediction technologies can benefit experts or novices more, depending on what kinds of sub-problems they help solve. Most of the papers discussed found evidence for both dynamics: in some situations a prediction technology (like a satellite map or a genome association study) helps novices more, and in others, experts. Again, somewhat in contrast, Toner-Rodgers (2024) found pretty unambiguous evidence that experts benefitted more from access to the AI tool; indeed, researchers who were the least productive in the pre-AI era didn't benefit at all. Without Toner-Rodgers (2024), I think we are more or less back where we started, asserting that prediction technologies in general can help either experts or novices more, but now the post doesn't discuss any articles about the effects of AI on this process. I predict we won't have to wait too long before that changes, though.
What if we could automate innovation? is mostly a discussion of one theoretical model of innovation, which assumes innovation is comprised of many different tasks. In the model, when a subset of tasks get automated, researchers reallocate their effort away from that task and towards the ones that are not (yet) automated. The post discusses Toner-Rodgers (2024) as an illustration of this phenomenon, but the theoretical argument does not depend on it (and was written well before the paper).
So a general theme is that I haven't dramatically revised my views on the effects of prediction technologies on innovation. We now know less than we thought about the impact of AI on innovation, but I think what we knew from this paper was always pretty provisional, given how fast AI changes; it was always more like evidence on the impact of a specific technology at a specific time. I suspect we'll soon have lots of new papers on these topics.
On a personal note, I'm sorry I got taken in by this paper and signal-boosted it. As with all the papers I write about, I read it carefully. At the time, I thought it was almost too good to be true (I had never seen a more comprehensive set of data) but not actually too good to be true. It now seems like I was wrong.
Should I tighten up my standards so this doesn’t happen again? I ultimately think the tradeoffs for any feasible changes I could make to avoid this kind of error would hurt New Things Under the Sun more than help it. The premise of this project is that I publish faster and with less hedging than would often be the norm in academia, but what I publish is more provisional. It’s a living literature review, with the expectation that findings will be revised as we learn more.
Should academia tighten its standards? I am in favor of a regime where the level of resources devoted to assessing a paper's veracity is tailored to the likely impact of the research. In such a regime, preliminary research remains easy to do, which ensures we get lots of shots on goal, but we can also be (more) confident in influential results. In general, I don't think we currently invest enough in ensuring the rigor of high-impact papers (we should make replication the norm there, for example). But I would not advocate for increasing the level of oversight for a preprint like Toner-Rodgers (2024).
Until Next Time
Thanks for reading! If you want to chat about this post or innovation, don't hesitate to reach out at matt@newthingsunderthesun.com.