USAID Administrator Samantha Power speaks at the University of Khartoum in Sudan’s capital on August 3, 2021. | Ashraf Shazly/AFP via Getty Images
The rise of cash benchmarking at USAID, explained.
How good does a foreign aid program have to be to be good enough? It’s a question I have wrestled with since my first post with the US Agency for International Development. Now, after a decade of work, the agency, which administers billions in civilian foreign aid and development assistance annually, may be on the cusp of a big step forward.
My involvement with the question began in 2011, when, as a newly minted foreign service officer in Nicaragua, I was assigned to participate in the final stages of a typical smallholder farmer program. The program claimed to have spent about $3,000 per household to raise a farmer’s income by about 20 percent. But these were poor farmers making less, perhaps much less, than $1,500 per year. Was this really a success if it would take at least a decade of that higher income just to recoup what the program spent? I found myself wondering if we could have been more effective if we had simply given away the money as a cash transfer.
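To make that back-of-the-envelope math concrete, here is a minimal sketch using the rough figures above, taken at face value:

```python
# Back-of-the-envelope payback period, using the article's rough figures
cost_per_household = 3000      # program spending per household, in dollars
baseline_income = 1500         # upper bound on annual farm income, in dollars
income_gain_rate = 0.20        # the program's claimed income increase

annual_gain = baseline_income * income_gain_rate    # at most $300 per year
payback_years = cost_per_household / annual_gain    # at least 10 years
print(payback_years)  # 10.0 -- longer still if true incomes were lower
```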
Two years later, I landed in Rwanda for a new posting just as a study on the work of a small, new nonprofit organization called GiveDirectly became public. The results suggested that direct cash programs could give traditional programs, well, a run for their money — and inspired me to spend the better part of 10 years on an initiative called “cash benchmarking.” It’s a simple idea: No USAID program should be approved unless it is likely more cost-effective than an unconditional cash transfer. Now that I am out of the agency, I am finally free to talk about it.
In her Senate confirmation hearing in March 2021, USAID Administrator Samantha Power promised to “work tirelessly with Members on both sides of the aisle to ensure that taxpayer dollars are well spent. Guided by evidence, I will work with you to adapt or replace programs that are not delivering.”
That USAID’s programs are supposed to be based on evidence isn’t in doubt. The first principle of USAID’s own operational policy is to “apply analytic rigor to support evidence-based decision-making.” Importantly, USAID’s new Policy Framework, released this March, makes “investing in USAID’s enduring effectiveness” a core pillar to be achieved in large part through “grounding responses in evidence.” One would imagine that, in response to such a clear mandate prioritizing effectiveness, significant efforts would be made to know whether or not programs are working. But to date, that hasn’t happened.
USAID does, of course, do many evaluations of its programs — 93 in 2022, to be exact. The problem is that a large majority are performance evaluations, which, according to USAID’s own guidance, can’t tell you much about the effectiveness of a program. Impact evaluations, on the other hand, can. They use a control group to tease out the effect of a program from all the other things happening in the background. For example, you can’t know if an agriculture program increased farmers’ yields if you don’t account for things like good or poor rains that growing season. An internal analysis I helped conduct while leading USAID’s Evaluation Team in 2020 found that staff often procured a performance evaluation when they were asking questions only an impact evaluation could answer.
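A stylized illustration of why the control group matters, with numbers invented purely for illustration:

```python
# Stylized example: separating a program's effect from background conditions.
# All numbers here are made up for illustration.
treated_change = 0.15   # yield growth among farmers in the program
control_change = 0.10   # yield growth among comparable non-participants,
                        # capturing background factors like good rains

program_effect = treated_change - control_change
print(program_effect)   # 0.05 -- the program's effect, not the full 0.15
```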
Even when USAID does conduct impact evaluations, unfortunately, they’re often not very good. Roughly half don’t meet USAID’s own definition of impact evaluation, despite bearing the label, and only 3 percent are of the highest quality. To make matters worse, USAID, like the rest of the foreign aid field, very rarely does cost analyses of its programs. This means that even when USAID knows the amount of change a program created, it doesn’t have any way to assess if the change is worth the money it took to get it. As a result, USAID is left with very little evidence on whether its programs are delivering. Thankfully, improvement is in the works.
What does “effective” actually mean? Enter cash benchmarking.
Even if USAID generated plentiful evidence of the success of its programs, exactly how much improvement is necessary to call a program a good use of money is subjective. Is a 1 percent increase in dietary diversity a success if it costs $1 per beneficiary to get there? How about if it costs $1,000 per beneficiary? What is needed is a minimum standard of cost-effectiveness that can apply broadly, so that staff have a point of comparison when considering different approaches.
Cash benchmarking, the USAID initiative I helped start in 2013, tries to provide that standard. It makes the case that a program shouldn’t be approved unless it is likely to do more good than taking the amount of money it planned to spend on behalf of beneficiaries and giving it directly to them. After all, if the USAID activity design, procurement, and management system isn’t “[doing] more good for the poor with a dollar than the poor can do for themselves,” as GiveDirectly co-founder Paul Niehaus put it, the aid system is detracting value from the money entrusted to it by Congress.
With support from USAID’s Development Innovation Ventures team and then-Administrator Rajiv Shah, Google.org (the charitable arm of parent company Alphabet), the Development Impact Lab at UC Berkeley, and Innovations for Poverty Action, we set up experiments comparing cash to traditional programs. It took two years, a legal opinion from USAID’s general counsel, and a briefing to the congressional appropriations committees to get the experiments off the ground. Coordinating the studies with the implementers of a standard USAID nutrition program and a separate workforce development program was complex, but in 2018, the first results came out. The public reaction was incredibly positive; USAID’s reaction was … cautious.
Thankfully, despite significant discomfort in some quarters of the bureaucracy, key members of USAID leadership, under then-Administrator Mark Green, went to bat to keep the initiative going. They leaned into USAID’s own risk appetite statement, which commits to a culture of learning and transparency, even when others were more concerned about the optics of publicly acknowledging a few weak programs. Credit also goes to the implementers of the evaluated programs for their commitment to evidence and their desire to know if what they were implementing was working. Cash benchmarking was expanded to several other countries with the help of Good Ventures and the Development Impact Lab. USAID eventually referred to cash benchmarking in its operational guidance, gave it a shoutout in its new Economic Growth Policy, and declared it a key accomplishment. Administrator Power has followed through with a public commitment to “expand the practice of cash benchmarking” and highlighted it in USAID’s newly released Policy Framework.
Although cash benchmarking has come a long way in just a few years, several misunderstandings about it have hampered the beneficial impact it could have at USAID, other aid agencies, and philanthropies.
What cash benchmarking is … and isn’t
Cash benchmarking creates a hurdle rate of cost-effectiveness: a hard line in the sand that asks aid donors to justify not only their expenses, but also the paternalism implicit in deciding, on behalf of people in poverty, how money meant to help them should be spent.
However, cash benchmarking does not assume that cash is particularly effective or ineffective at achieving any particular development goal. It is the relative effectiveness that matters. As a noted cash researcher once put it, “It is not that cash transfers to the poor are a panacea, [it’s] more like, they probably suck less than most of the other things we are doing. This is not a high bar.”
What makes unconditional cash transfers, rather than some other intervention, the right standard against which to compare proposed interventions is that they are the most neutral form of assistance. They strip away outsider preferences and ideas. Cash transfers are also broadly applicable, as they have been shown to move a tremendous number of outcomes that agencies like USAID care about, including food security and female empowerment. The typically lower costs of cash programs make them compelling as a point of cost-effectiveness comparison. For example, the Rwanda youth employment cash benchmarking study found that the overhead for the workforce program was 60 percent, while the overhead of cash was between 11 percent and 20 percent. Traditional programs have to be a lot more effective than cash to make up for their higher administrative costs.
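A rough sketch of how those overhead rates compound the comparison; the $100 budget is purely illustrative, while the overhead rates come from the Rwanda study:

```python
# How overhead raises the bar, using the Rwanda study's overhead rates
budget = 100.0  # illustrative dollars per beneficiary

workforce_delivered = budget * (1 - 0.60)   # $40 reaches actual programming
cash_delivered = budget * (1 - 0.20)        # $80 reaches recipients, taking
                                            # the study's higher cash overhead

# Each programming dollar must do roughly twice the good of a dollar
# placed directly in recipients' hands just to break even.
print(cash_delivered / workforce_delivered)  # 2.0
```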
A related misconception is that the case for cash benchmarking is a case for doing more cash-assistance programs. Cash benchmarking simply asks that proposed programs make an evidence-based case that they are likely more cost-effective than unconditional cash transfers. The case for cash programming would be that a program designer has looked at the options and decided that nothing is more cost-effective than unconditional cash transfers. In those cases, by all means do cash programming. But the vision of cash benchmarking is that, if a proposal doesn’t make a convincing case that it beats the cash benchmark, program designers go back to the evidence base and look for something that does. They shouldn’t necessarily pivot to a cash program.
It’s worth noting that cash benchmarking is not applicable to all types of aid programs. Many interventions inherently beat the cash benchmark, since no amount of money given to individuals or households would accomplish the program’s goal. For example, cash benchmarking isn’t relevant for public goods programs, such as policy reform, trade facilitation, media freedom, large infrastructure, or rule of law, but it does apply to any program targeting an outcome that unconditional cash transfers have been shown to improve. And cash has been shown to move a lot of outcomes.
Cash benchmarking is not a judgment on implementers. While surely some excellent designs are derailed by poor implementation, if a program is found to be less effective than an equivalent amount of cash, the first doubt should be about the intervention chosen and prescribed by the program designer.
Even if one takes a very conservative view and assumes that just 30 percent of USAID’s $50 billion budget is compatible with cash benchmarking, the impact on the agency’s efficiency would be immense. Killing off the worst programs before they start and reallocating that spending to programs that are just 10 percent more cost-effective would be equivalent to saving $1.5 billion per year, every year.
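Spelled out, the arithmetic behind that estimate looks like this:

```python
# The arithmetic behind the $1.5 billion figure
usaid_budget = 50e9            # annual USAID budget, in dollars
benchmarkable_share = 0.30     # conservative share compatible with benchmarking
efficiency_gain = 0.10         # reallocating to programs 10% more cost-effective

annual_savings = usaid_budget * benchmarkable_share * efficiency_gain
print(f"${annual_savings / 1e9:.1f} billion per year")  # $1.5 billion
```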
Cash benchmarking in practice
To achieve Power’s commitment to expanding the use of cash benchmarking, several things are needed.
First, USAID’s extensive program design guidance must explain cash benchmarking and give it some teeth. Activity design guidance should clearly state that no new program will be approved unless it either takes a truly innovative approach that has never been tested or provides rigorous evidence that it is likely to beat the cash benchmark. According to USAID’s evaluation policy, approaches that have never been tested are supposed to be evaluated with an impact evaluation. If program designers want to claim that an approach is innovative, they should be asked to comply with this requirement.
Second, USAID should facilitate the use of cash benchmarking by providing program designers with an estimate of the impact per dollar they need to provide evidence that they can beat. Say you are a nutrition officer in Senegal, designing a program that will cost $200 per household. What is the likely impact of $200 in cash transfers on dietary diversity? To provide these cost-effectiveness estimates, USAID and other aid organizations need to invest in more systematic reviews, including meta-analysis, which USAID itself says is the most rigorous form of evidence. Thankfully, that is beginning to happen. In the meantime, imperfect evidence is better than nothing: USAID staff should make use of individual impact evaluations of cash transfers that study similar expenditure levels and outcomes to the programs they are designing.
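As a minimal sketch of the comparison a program designer would run: suppose a meta-analysis supplies a cash effect for the relevant outcome. Every number and name below is a placeholder of my own, not a USAID figure:

```python
# Hypothetical sketch: does a proposed design beat the cash benchmark?
# All effect sizes below are invented placeholders, not real estimates.

def impact_per_dollar(effect: float, cost_per_household: float) -> float:
    """Change in the outcome (e.g., a dietary diversity index) per dollar spent."""
    return effect / cost_per_household

# Suppose a meta-analysis finds a $200 unconditional cash transfer raises
# a dietary diversity index by 0.30 points (hypothetical).
cash_benchmark = impact_per_dollar(effect=0.30, cost_per_household=200)

# The proposed $200-per-household nutrition program, per prior impact
# evaluations, is expected to raise the index by 0.25 points (hypothetical).
proposed = impact_per_dollar(effect=0.25, cost_per_household=200)

if proposed > cash_benchmark:
    print("Design clears the cash benchmark.")
else:
    print("Design does not clear the benchmark; revisit the evidence base.")
```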
Third, it would make things easier for program designers if they were provided with rules of thumb for which interventions are generally likely to beat and not beat the cash benchmark. For example, providing information on the benefits and costs of education to improve learning probably beats the cash benchmark. Vocational and life skills training to improve income probably does not. Rules of thumb similar to the Smart Buys for Education analysis would make it much easier for staff to comply with the cash benchmarking requirement. Ideally, these rules of thumb would be derived from comparative cost-effectiveness analysis, but a simple placeholder rule could be to assume that any program that costs 50 percent or more of the beneficiaries’ annual household income is unlikely to beat the cash benchmark. The opportunity cost of providing some of the lowest-income people in the world with what is for them a windfall fortune is too high.
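That placeholder rule is simple enough to state as a screen. In code, it might look like the following sketch; the function name and example figures are mine, not USAID’s:

```python
# Sketch of the placeholder screen: a program costing 50% or more of a
# beneficiary household's annual income is unlikely to beat the benchmark.

def passes_cost_screen(cost_per_household: float,
                       annual_household_income: float,
                       threshold: float = 0.50) -> bool:
    """Passing the screen doesn't prove a design beats the benchmark;
    failing it flags a likely loser before approval."""
    return cost_per_household < threshold * annual_household_income

# The Nicaragua program from the opening: $3,000 spent on households
# earning at most $1,500 a year fails the screen decisively.
print(passes_cost_screen(3000, 1500))  # False
print(passes_cost_screen(200, 1500))   # True
```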
Cash benchmarking and the broader evidence agenda
Cash benchmarking is an important piece of the puzzle in making USAID a leader in evidence-informed foreign aid, but just a piece. After all, the goal of foreign assistance programming shouldn’t merely be to be better than cash, but to be as cost-effective as possible. To achieve that goal, there are a number of other strategies USAID should embrace.
For example, an HIV program shouldn’t just be asked to show that it beats the cash benchmark, but that it is more cost-effective than a known effective alternative like providing antiretroviral drugs. However, despite the large number of systematic reviews looking at the impact of different development interventions, we are still far from knowing the best approach to address many problems. While evidence is being generated and considered, it is worth using the cost-effectiveness of unconditional cash as a temporary benchmark to ask that programs at the very least be more cost-effective than just giving away the money. This can be seen as an important part of decolonizing aid.
Additionally, USAID should clearly state in its operational guidance that maximizing the cost-effectiveness of its programs is the primary goal of activity design. This would best respect both US taxpayers and the intended beneficiaries of aid. USAID should also expand the use of problem diagnostics beyond economic growth to help focus resources on the most important barriers to fixing a development problem. A requirement to use rigorous evidence to justify proposed programs should be baked into mandatory activity and project design templates and guidance, even if cash benchmarking isn’t required. USAID should meet this new requirement by following its own advice and making better use of systematic reviews in deciding which interventions to fund. Testing multiple approaches at the beginning of a program would overcome challenges in translating results of impact evaluations across contexts.
Cash benchmarking results have continued to come in. While many cash studies analyze small transfer amounts, recent USAID cash studies look at transfer amounts comparable to the per-beneficiary spend of many traditional programs. In Liberia, cash had a positive impact on food security, wealth, intimate partner violence, and child education two years after the transfer. In Malawi, researchers found similar results two years on. In Congo, a study found a traditional program combined with cash improved business performance and personal agency. A follow-up study looking at the impacts of the cash and workforce program in Rwanda three years on showed fading, but still significant and sustained, effects.
With these results in hand, USAID now has more rigorous, longer-term cost-effectiveness studies of cash transfers than it does of interventions it funds year in and year out without question. USAID should first follow through on its own commitment to help field staff conduct better-quality impact evaluations and then make a concerted effort to evaluate the sustained cost-effectiveness of its most common and highly funded interventions. It’s worth noting that relying on overstretched field staff to take responsibility for procuring the right type of evaluation and ensuring the evaluations are well done hasn’t worked. USAID’s evaluation experts in Washington should be given more direct responsibility for scoping, procuring, and managing evaluations.
By doing these things, USAID can better meet its goal of delivering progress beyond programs through the use of evidence.
Reasons for optimism
Ten years after the start of the cash benchmarking program, USAID’s practical commitment to evidence continues to advance. Last November, USAID continued its run of excellent chief economists by hiring Dean Karlan, a renowned expert in impact evaluation and evidence use who is now leading the establishment of the best-resourced Office of the Chief Economist USAID has ever had. He has the mandate, top-level support, and expertise to tackle these problems of evidence quality and use. In March, USAID released its new Policy Framework with an emphasis on evidence use. In May, USAID released guidance to staff on how to analyze the costs of their programs (which is half the cost-effectiveness equation). In June, USAID published guidance to improve commissioning of impact evaluations.
USAID’s Policy Framework states, “To make the maximum impact with our programming … USAID must transform ourselves with urgency to meet the moment.” For the use of evidence to improve program effectiveness, that moment has arrived. Other donors should follow USAID’s lead in upping their game. The opportunity cost, in lives that could be improved or even saved, is too high for us to keep dragging our feet on asking decision-makers to justify their programs with evidence.
Editor’s note: The views expressed in this article are the author’s own and do not necessarily reflect the views of 3ie.