Randomized Controlled Trials: panacea or mirage?

May 7, 2010

Randomized Controlled Trials (RCTs) are all the rage among development wonks at the moment. Imported from medical research, they offer social scientists the tantalizing prospect of finally overcoming the Achilles’ heel of real-world research – the counterfactual (aka ‘how do we know what would have happened if we hadn’t lobbied the government, employed the teachers, built the road, etc.?’). Here, one of the RCT gurus (and recent winner of the almost-as-good-as-a-Nobel John Bates Clark medal for young economists), Esther Duflo, sets out the case for the technique [h/t Chris Blattman]. She is the Abdul Latif Jameel Professor of Poverty Alleviation and Development Economics in the Department of Economics at MIT and a founder and director of the Jameel Poverty Action Lab (J-PAL) (must take a look at her business card sometime).
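For readers new to the jargon, the logic can be shown with a toy simulation (a minimal sketch with invented numbers, nothing to do with any real study): because the treatment is assigned at random, the untreated group stands in for the missing counterfactual, and a simple difference in means recovers the effect.

    # Toy sketch (invented numbers): why randomisation solves the
    # counterfactual problem.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000
    true_effect = 5.0                      # hypothetical programme effect

    # Each person has two potential outcomes; we only ever observe one.
    baseline = rng.normal(50, 10, n)       # outcome without the programme
    treated = rng.random(n) < 0.5          # random assignment

    observed = baseline + np.where(treated, true_effect, 0.0)

    # Randomisation makes the two groups comparable on average, so the
    # difference in means is an unbiased estimate of the true effect.
    estimate = observed[treated].mean() - observed[~treated].mean()
    print(f"Estimated effect: {estimate:.2f} (true effect: {true_effect})")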

Not everyone is convinced – in a recent debate in the Enterprise Development and Microfinance journal James Copestake (anti) slugged it out with Dean Karlan and Nathanael Goldberg (pro). Copestake criticizes RCTs on four grounds:

“Problem selection bias. I am worried by reports of bright young “randomistas” narrowing the research agenda by selecting issues for research to fit their preferred tool, rather than finding the best tool to fit the most important issues. For example, use of RCTs to test product design changes should not divert attention from other influences on impact that may be harder to randomise: geographical targeting methods and organisational culture, for example.

External validity. RCTs require a fixed investment and generate evidence at the end of a discrete period of time, rather than continuously. This accentuates the difficulty of choosing which few among many possible ‘treatments’ should be studied, where and when. The value of findings then depends upon their transferability.

Cost effectiveness. I’m very much in favour of experimentation and testing, but remain to be convinced that RCTs are necessarily the most cost-effective way for managers and policy makers operating in complex, diverse and uncertain contexts to evaluate them, compared to triangulating routine monitoring data against focus group discussions and individual satisfaction surveys, for example.

Fourth, there may well be other more technical problems with RC studies. For example, it will not always be possible to ensure that treatment and control groups are not contaminated through spillover effects between them: response to not having a treatment being affected by my knowledge that others are having it, for example.”
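Copestake’s last point is easy to illustrate with a toy simulation (invented numbers, not from his article): if the control group captures part of the treatment’s benefit through spillovers, the naive treatment-versus-control comparison understates the true effect.

    # Toy sketch (invented numbers): spillovers into the control group
    # bias the naive treatment-minus-control estimate downwards.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 10_000
    true_effect = 5.0
    spillover_share = 0.4                  # hypothetical: controls capture 40%

    baseline = rng.normal(50, 10, n)
    treated = rng.random(n) < 0.5
    observed = baseline + np.where(treated, true_effect,
                                   spillover_share * true_effect)

    # Prints roughly 3.0: spillover has eaten 40% of the measured effect.
    estimate = observed[treated].mean() - observed[~treated].mean()
    print(f"Naive estimate: {estimate:.2f} (true effect: {true_effect})")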

If you want to read Karlan and Goldberg’s replies to these criticisms and make up your own mind about who’s right, I’m afraid you’ll have to pay (or go to the library). Anyone know of an ungated version? Alternatively, read the Wikipedia entry linked at the top of this post, which I thought gave a rather good summary of the pros and cons.

Comments

  1. “I am worried by reports of bright young “randomistas” narrowing the research agenda by selecting issues for research to fit their preferred tool, rather than finding the best tool to fit the most important issues”?

    First of all, this doesn’t make any sense. Duflo’s entire goal is to prevent organizations and policymakers from narrowing the development agenda by examining what really works.

    Moreover, Copestake’s comment here smacks of sexism. What other “bright young ‘randomistas’” are out there? The only prominent economist I can think of who has really been pushing for similar RCT analysis is William Easterly, who is bright but hardly young, and incidentally not female, so I doubt anybody would call him a “randomista.”

    Unfortunately, Duflo works in a field that is still dominated by old white men who are not as cute as she is, and apparently this makes her less credible among her peers. Maybe economists ought to be doing double-blind reviews of each other’s research, in which they don’t know who wrote the papers they review until after the fact. In Copestake’s case, I think the results might be interesting.

    Duncan: thanks Audrey, I have no idea who Copestake was referring to, but would point out that ‘randomista’, like ‘Sandinista’, is gender neutral!

  2. The problem seems to be “finding the best tool to fit the most important issues”.
    I do not see how distributing lentils to encourage inoculations can be considered “selecting issues for research to fit their preferred tool”.
    I think the problem has been finding neat theories that work across many issues. If RCTs show that some ideas work in specific situations, maybe later one can try to discern some general principles, if there are any.

  3. Dear Duncan et al

    You’ve made some good points – many of which are corroborated by last year’s ODI study of the production and use of impact evaluations (see link at the bottom).

    Aside from it being unwise to engage in methodological argument with people who are so hot on that sort of thing, randomistas should, to a certain extent, be left to carry out their work, which is essentially a scientific enterprise producing ‘global public good’ knowledge. There are a few problems with this, however:
    – RCTs do tend to be carried out where they are methodologically convenient rather than where the new knowledge is really needed
    – there is massive publication bias: by our count over 95% of published RCTs show ‘positive’ impact, which severely limits the ability to really learn (see the toy simulation below)
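    A toy simulation (invented numbers, not Harry’s data) shows how such a filter distorts the evidence base: if only statistically significant ‘positive’ results get published, the published record reports a healthy average effect even when the true effect is zero.

        # Toy sketch (invented numbers): publication bias. 1,000 small
        # trials of a programme with zero true effect; only significant
        # 'positive' results get 'published'.
        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(2)
        published = []
        for _ in range(1_000):
            t = rng.normal(0.0, 1.0, 50)   # treatment arm, true effect = 0
            c = rng.normal(0.0, 1.0, 50)   # control arm
            stat, p = stats.ttest_ind(t, c)
            if p < 0.05 and stat > 0:      # the significance filter
                published.append(t.mean() - c.mean())

        print(f"{len(published)} of 1,000 trials 'published'")
        print(f"Mean published effect: {np.mean(published):.2f} (true: 0)")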

    The bigger problem concerns the relationship between RCTs and policy. They were hugely fashionable in the US in the 1960s until it became clear that they couldn’t deliver all that was hoped, and now it seems that development policy makers are in love with the idea of having clear numbers on the impact of all of their work. The problems:
    – policy makers often assume a much higher level of external validity than is actually appropriate, and tend to ignore the careful caveats which come along with RCTs
    – as even people like Duflo would have to admit, the RCT model is only suitable for measuring impact in a subset, nay a minority, of the kinds of intervention required for development; however, RCTs are being given a disproportionate amount of attention
    – this is an issue of cost-effectiveness as you say, about where we choose to spend evaluation budgets and with what coverage
    – but this is also about the danger that demanding ‘rigorous impact evaluation’ (which = RCTs for some) incentivises the programmes which fit easily into that model. These are generally output-driven programmes like distributing bed-nets, while governance, capacity building, budget support and policy influencing will struggle.

    I wrote up my worries about these issues in this ODI opinion piece: http://www.odi.org.uk/resources/download/2811.pdf
    And here is the ODI working paper on the production and use of impact evaluations: http://www.odi.org.uk/resources/download/3177.pdf

    Cheers

    Harry

  4. I like the methodology, but the presentation made me wonder about some things. The example of using food to incentivize mothers to come and get their kids vaccinated is a pretty old strategy dating back to the 1970s, and has been applied on a widespread basis by USAID, WFP, UNICEF and almost every Ministry of Health in every developing country that I can remember, with the cooperation of large private development organizations such as CARE, CRS, World Vision etc., for over three decades. Does the RCT in India confirm the wisdom of 30 years of maternal-child health interventions? If so, what made all these agencies so smart that they pursued such programs without the benefit of an RCT during any of that time? Can we say that there are many paths to heaven?

  5. Not that RCTs are any better or worse than so many other approaches, but once I heard Duflo’s speech … sigh! I feel like asking, once again: do American academics know what distinguishes method from methodology?

  6. Duncan,

    Good to see this post. Panacea or mirage? Neither, but a very useful development in a sector that is hardly the most transparent about success and failure.

    I would second everything that Harry has posted about pitfalls, especially assumptions about external validity. Most methods, RCTs included, have low external validity – i.e. it is difficult to extrapolate research findings to other places and times. This is especially so for interventions where the causal mechanism plays out in a socially contingent, open system; contrast this with a medical intervention, where human bodies are more similar to each other than they are different.

    RCTs have the highest internal validity of any method – they are excellent at isolating specific causal factors. The problem comes in conflating high internal validity with “rigour”. We usually want to be able to make inferences about how things will play out for populations, not samples, so external validity is also important. I like Nancy Cartwright’s writing on this topic (the philosopher, not the woman who does the voice for Bart Simpson…):

    http://personal.lse.ac.uk/cartwrig/PapersOnEvidence/Are%20RCTs%20the%20gold%20standard.pdf
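    A toy simulation (invented numbers) of that contrast: an RCT in region A recovers A’s effect almost exactly (high internal validity), yet says nothing about region B, where the hypothetical effect happens to be different (low external validity).

        # Toy sketch (invented numbers): high internal validity does not
        # guarantee external validity when effects vary by context.
        import numpy as np

        rng = np.random.default_rng(3)
        n = 10_000

        def run_rct(true_effect: float) -> float:
            """Simulate one RCT; return the difference-in-means estimate."""
            baseline = rng.normal(50, 10, n)
            treated = rng.random(n) < 0.5
            observed = baseline + np.where(treated, true_effect, 0.0)
            return observed[treated].mean() - observed[~treated].mean()

        effect_A, effect_B = 5.0, 1.0      # hypothetical context-specific effects
        print(f"RCT in A estimates {run_rct(effect_A):.2f} (true in A: {effect_A})")
        print(f"True effect in B is {effect_B}: the A result transfers poorly")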

    That said, I think one of the best things about the randomistas is their sense of humility and inventiveness. These are not ivory tower academics. I really appreciate the spirit in which they work, and the method allows them to be very transparent in testing out new ideas. For example, Oxfam America are running some on a savings group intervention (I think) in Africa. This really helps them understand whether this kind of intervention does work in that context, and if they do it well, they’ll also be able to understand why.

    The reason RCTs get so much attention is that policy-makers / NGO directors can (I hope) understand the methods and the results much more easily than a complex bit of econometrics, and so have a greater degree of trust in the findings.

    See also Chris Blattman’s excellent note on Impact Evaluation 2.0:
    http://www.chrisblattman.com/documents/policy/2008.ImpactEvaluation2.DFID_talk.pdf

    All power to the elbow of those with an interest in promoting more transparent and credible evaluation methodology in development agencies of all stripes – it’s a hard row to hoe.
    Cheers,
    James

  7. Great article and great discussion!

    The RCT bonanza seems to me to be a reaction to the data-less (or at least data-poor) development of the past and a desire to be associated with the success of medical science over the past decades (explicitly stated by Duflo). RCTs are the gold standard of medicine, but there the assumption of external validity is much more reasonable. Human beings, more or less, have the same biochemistry; if a drug works in an RCT, it will probably work for everyone (though personalized medicine is challenging even this assumption). Human behavior is much more variable than human biochemistry, so I agree with Duncan and others who criticize development RCTs on these grounds. I think both personalized medicine and development will need new techniques to monitor and characterize local and individual differences.

    It’s important to consider not just the factual truth of a claim, but also its local validity. To an NGO, “low external validity” doesn’t mean that we shouldn’t give out bed nets or deworm kids. It means that those interventions are probably a good place to start, but progress should be monitored and the intervention changed if it’s not effective. I’m very glad that I can search the literature and find a hundred studies on the cost effectiveness of every possible intervention under the sun. But that only gives me a ranked list of options, not a rigid prescription. Neither medicine nor development can ever be done completely by the book; I am no more in favor of cookbook medicine than I am of cookbook development.

    Like good doctors, development workers need to monitor the progress and impact of their work. There have been some pretty exciting developments in technology that I’ve been trying to stitch together for my NGO, Nuru International. I’ve been able to set up a mobile data collection system for our seed project in Kenya that has been extremely helpful. We use cell phones and their web browsers to collect data with Google Forms at very low cost; the total setup was about $5,000 for 75 users (every member of our field staff), and it has been costing around $1/user/month in data charges. For more info: bit.ly/cOYOyg .
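    For the curious, here is a rough sketch of how a record could be pushed into a Google Form over plain HTTP, e.g. from a script rather than the phone’s browser. The form URL and the entry.* field IDs are placeholders, not Nuru’s actual setup; every real form has its own IDs, visible in its HTML source.

        # Hypothetical sketch: submitting one survey record to a Google Form
        # via its formResponse endpoint. FORM_ID_PLACEHOLDER and the entry.*
        # field IDs are invented; every form defines its own.
        import requests

        FORM_URL = "https://docs.google.com/forms/d/e/FORM_ID_PLACEHOLDER/formResponse"

        def submit_record(farmer_id: str, village: str, maize_yield_kg: float) -> bool:
            """POST one record; True if the form accepted it."""
            payload = {
                "entry.1111111": farmer_id,          # placeholder field IDs
                "entry.2222222": village,
                "entry.3333333": str(maize_yield_kg),
            }
            resp = requests.post(FORM_URL, data=payload, timeout=10)
            return resp.status_code == 200

        if __name__ == "__main__":
            ok = submit_record("F-0042", "Example Village", 180.5)
            print("submitted" if ok else "failed")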

  8. There is a massive need for evidence on what works, when, why and at what cost in the field of international development, as acknowledged in the Evaluation Gap Working Group report. The International Initiative for Impact Evaluation (3ie) was set up to help fill this gap by promoting and funding impact evaluations of social and economic development interventions, along with summaries of the existing evidence. We read Duncan’s original post and the additional contributions with great interest, and offer a few comments below.

    As stated in Duncan’s post, the vast majority of social and economic development interventions are probably not randomisable; it’s been estimated that only 5% are. Randomisation may be ruled out for ethical reasons, since the control group needs to be excluded from the intervention, at least during the study period, or for political reasons, such as the desire to reach certain populations, or the neediest, first. RCTs may also not be feasible for certain types of programme, such as a national policy change. But there are other approaches to estimating impact rigorously which don’t require randomised assignment of an intervention to people or communities: so-called quasi-experiments, one of which is sketched below.
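    As a toy illustration (invented numbers), one such design is difference-in-differences: compare the before-and-after change in a treated group with the change in an untreated comparison group, relying on an assumed shared time trend rather than on randomisation.

        # Toy sketch (invented numbers): difference-in-differences, a
        # quasi-experimental design that needs no randomised assignment.
        import numpy as np

        rng = np.random.default_rng(4)
        n = 5_000
        true_effect = 5.0
        trend = 3.0                              # shared time trend

        treated_before = rng.normal(50, 10, n)
        control_before = rng.normal(55, 10, n)   # groups can start at different levels
        treated_after = treated_before + trend + true_effect
        control_after = control_before + trend

        did = ((treated_after.mean() - treated_before.mean())
               - (control_after.mean() - control_before.mean()))
        print(f"Diff-in-diff estimate: {did:.2f} (true effect: {true_effect})")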

    Generalisability of findings is indeed incredibly important. What works in post-conflict Liberia may not be readily transferable to Rwanda, still less to Latin America. This is one argument for why more impact evaluations in different contexts are needed. But while the RCTs done in the US in the 1960s only provided a number for impact (what Melvin Mark calls “bare bones RCTs”), the emphasis in international development programme evaluation, especially with the push from 3ie and the impact evaluation network NONIE, is on theory-based impact evaluation, which aims to answer not just what works, or not, but why and in which contexts.

    A good theory-based impact evaluation “opens the black box” by collecting information on the reasons for effectiveness, or lack of it, which can then be used to inform programme planning elsewhere. It builds on a theory of change and examines information along the causal chain, as set out for example in a log-frame, to assess the reasons why interventions succeed or fail – what Chris Blattman has called Impact Evaluation 2.0. This information is just as likely to be the opinions of participants and practitioners, collected through beneficiary interviews and focus groups, as ‘hard’ numbers collected using surveys or administrative data. The use of mixed methods to triangulate data is a key tool in theory-based impact evaluation.

    Despite earlier doubts, the range of interventions which are amenable to rigorous impact evaluation is large. For example, 3ie is funding RCTs of:
    – Community driven development in Sierra Leone
    – Monitoring patient compliance with tuberculosis treatment in Pakistan
    – Expanding secondary education in Ghana
    – Mexico’s Payments for Ecosystem Services Program
    – Scaling up male circumcision service provision in Malawi
    – BRAC’s Graduation Model in Ghana
    – Micro-entrepreneurship support in Chile
    – Diffusion of health knowledge through social networks in Burkina Faso
    – Property taxation in Pakistan

    Innovations are being piloted so that the impact evidence collected and disseminated is relevant to all stakeholders, not just researchers in ivory towers. For example, 3ie’s Policy Window enables policy makers in developing country governments and NGOs to request evaluations of the programmes they want assessed. Systematic reviews aim to provide an unbiased assessment and synthesis of all the available evidence on an intervention, including unpublished studies, which may remain hard to access because they do not provide evidence of positive effects. 3ie, DFID and AusAID will be launching a joint call for systematic review proposals in late September.

    We sincerely look forward to seeing the growth of theory-based impact studies and systematic reviews, and their dissemination and use by policy makers and practitioners in international development.

    Duncan: thanks Hugh, really interesting comment. For Chris Blattman on Impact Evaluation 2.0, go to http://chrisblattman.com/2008/02/16/impact-evaluation-2-0/

  9. Like any other system, RCTs are continually improving based on past experience, new ideas and new technology. For example, our RCT consultancy http://researchtool.org has built an extensive cloud system to minimise costs, improve access and streamline processes. It is healthy to have critics, but there is no doubt that there is no substitute for a well designed Randomised Controlled Trial.

    One thing that could accelerate the evolution of RCTs is better communication between researchers. Hopefully that is now changing, as there is a new community forum dedicated to RCTs: http://rctrials.org
