Lant Pritchett v the Randomistas on the nature of evidence – is a wonkwar brewing?

November 21, 2012

Last week I had a lot of conversations about evidence. First, one of the periodic retreats of Oxfam senior managers reviewed our work on livelihoods, humanitarian partnership and gender rights. The talk combined some quantitative work (for example the findings of our new ‘effectiveness reviews’), case studies, and the accumulated wisdom of our big cheeses. But the tacit hierarchy of these different kinds of knowledge worried me – anything with a number attached had a privileged position, however partial the number or questionable the process for arriving at it. In contrast, decades of experience were not even credited as ‘evidence’, but often written off as ‘opinion’. It felt like we were in danger of discounting our richest source of insight – gut feeling.

In this state of discomfort, I went off for lunch with Lant Pritchett (right – he seems to have forgiven me for my screw-up of a couple of years ago). He’s a brilliant and original thinker and speaker on any number of development issues, but I was most struck by the vehemence of his critique of the RCT randomistas and the quest for experimental certainty. Don’t get me (or him) wrong, he thinks the results agenda is crucial in ‘moving from an input orientation to a performance orientation’ and set out his views as long ago as 2002 in a paper called ‘It pays to be ignorant’, but he sees the current emphasis on RCTs as an example of the failings of ‘thin accountability’ compared to the thick version.

In a forthcoming paper (which I will definitely link to when it’s published), Lant defines thick accountability as ‘an “account” in the sense of a justificatory narrative of my actions, the story of my actions I tell to those whose opinion of me is important (including myself, but including family and kinsmen, friends, co-workers, co-religionists, people I respect and desire admiration from) that explains why my actions are in accord with, and deserving of, a positive view of myself. In contrast, thin accountability is “accounting”, which is that small part of the account about which objective facts can be established.’ He sketched out the inevitable 2×2 matrix for me:

Thin accountability, low performance: e.g. fragile states

Thin accountability, high performance: e.g. post office and road-building

Thick accountability, low performance: e.g. families and other non-performance-oriented institutions

Thick accountability, high performance: e.g. just about any complex institutional ecosystem

The challenge in most development work is to move from top left to bottom right. There are occasions when thin accountability/high performance works – typically routine functions like delivering mail or building roads. But anything involving the messiness of people and institutions requires thick accountability, involving deep bonds of trust and reciprocal relationships that are likely to be defined by a setting’s unique history and geography – what he calls ‘folk practices, from which formal organizations can (re)emerge’.

He argues that the randomistas just don’t get this. His critique of RCT culture ranged pretty wide:

  • The politics of RCTs: ‘RCTs are a tool to cut funding, not to increase learning.’  ‘Randomization is a weapon of the weak’ – a sign of how politically vulnerable the argument for aid has become since the end of the Cold War. ‘Henry Kissinger wouldn’t have demanded an RCT before approving aid to some country.’ And I can’t see the military running RCTs to assess the value for money of new weaponry before asking for more cash (mind you, if they did, that might at least save some money on Trident….).
  • The lack of interest in theory: ‘the randomistas are going back to alchemy – atheoretic experimentation’.
  • RCTs test at most a few project variants using ‘project vs non-project’, whereas interventions are typically multiple, overlapping and synergistic (i.e. the whole cannot be reduced to a sum of parts).
  • No-one evaluates the evaluators. At the very least, given how much RCTs cost, you need to know that the findings are useful elsewhere (so-called ‘external validity’). But once you have multiple RCTs on the same issue (and their spread is starting to produce such comparable studies), you find very little external validity – the results of an RCT in one country and time are not replicated elsewhere (with the possible exception of deworming in schools, but even that iconic RCT story is contested). This is the big contrast with real science, where replicability is a key condition of validity.
Patronising? Overpromising? Nah....

In another recent paper, he argues instead for ‘structured experiential learning’, which involves rigorous and intelligent conversation, rather than the illusory certainty of numbers. Get people in a room, agree what the problem is, agree to try out some experiments to solve the problem, and set up rapid feedback to identify failure and/or build on success. In another recent paper, he calls this ‘Problem Driven Iterative Adaptation (PDIA)’. It sounds very similar to the conclusions of the Africa Power and Politics Programme, which I reviewed recently. In yet another paper (he’s horribly prolific), he also draws a neat distinction between experiments and experimentation:

‘Perhaps surprisingly, the experimentation and experiments approaches are not at all the same. I argue that experiments, while a terrific method for generating PhD dissertations and published papers, will have impact on development and development practice only insofar as they are embedded in an experimentation approach (which they are often not).’

The feeling I got from these conversations was of two tribes encamped and preparing for battle. That line from Henry V comes to mind: ‘from camp to camp, through the foul womb of Boston night, the hum of either army stilly sounds.’ On one side are the ‘best fit’ institutionalists and complexity people, with their focus on path dependence, evolution and trial and error. On the other are the ‘universal law’ experimentalists, offering the illusory certainty of numbers, and (crucially) comfort to the political paymasters seeking to prove to sceptical publics that aid works. It’s hard to see how they can both be right, or happily coexist for long. Time for a wonkwar on this blog, I think…..


  1. I think the Pritchett and APPP message to get people in a room and see if they agree on the problem and possible ways of tackling it is useful, but it raises questions of power (who gets in the room?) and conflict (what if there isn’t agreement on the problem?). Some of Lant Pritchett’s collaborators have addressed these issues a bit (see Interim Institutions and the Development Process by Adler, Sage and Woolcock), but I’m hoping there is more to be published by Pritchett et al on this.

    Of course, if the locally-defined problems turn out to be ones where an RCT can help, fine. But I also wonder if there is a danger where RCTs are presented as being within an experimentation approach even if they aren’t really, as Lant suggests above. This was a weakness of Tim Harford’s otherwise very good book Adapt. Harford cites RCTs and the randomistas as a model of experimentation. But the methodology is looking for average changes over wide programmes rather than letting individual actors adapt to very different contexts. So there are some interesting questions here about the levels at which adaptation and experimentation occur.

  2. Interesting.

    I’m picking up specifically on your point that ‘external validity is in contrast with real science where replicability is a key condition of validity’.

    A few things on this (as a physicist who works with the randomistas):

    – external validity is a problem with any type of evaluation of social programmes. This is because schools and families and ‘the system’ in Kenya, say, are manifestly different to schools and families and ‘the system’ in Mexico or Scotland, so it’s perfectly likely that what works in one place won’t work in them all. Malarial bednets reduce deaths in Kenya but not Scotland: that ‘external invalidity’ is not a problem as such, it just means that we need care in applying results. That bednets don’t achieve much in Scotland isn’t a reason against deploying them in Kenya.
    Therefore the external validity problem is not confined to RCTs – it arises with any evaluation method, so it’s not a weakness of RCTs as such.

    – strictly speaking, it’s not even true that external validity problems don’t arise in ‘real science’ (I assume that physics would come within ‘real science’). There are loads of physical laws for which we have no clue whether they hold in other parts of the universe (nobody’s ever run an electrical circuit on Mars, as far as we know), so for all we know, our brilliant physical laws are only valid in this little local place where we live. In fact, there’s lots of evidence that that’s true (e.g., dark matter, blah blah). That doesn’t make those laws unhelpful here.

    And just as physicists spend their time looking for underlying models which explain different phenomena in different places, so do good development economists. If you read the second half of Poor Economics (or talk for more than about 2 minutes to Dean Karlan, who you seem to dislike), you’ll find them doing just that.

    1. Thanks Caroline (and nice to hear from a physicist – I was one a long time ago). Obviously if you define ‘external validity’ as external to this planet, solar system or universe, everything has an external validity problem. The issue here is external validity between different geographies on planet earth. You can test gravitational attraction in Scotland and Kenya and find you get pretty much the same result.

      And of course development economists are looking for underlying patterns (did I or Lant say otherwise?). The point is that mimicking the natural sciences is a seductive, but often misconceived way to find them.

      Finally, Poor Economics is a great book (see my review), and More than Good Intentions is not a bad one – my problem is with the title. What message do you think it conveys to the many hard-working and (dare I say it) rigorous non-randomistas?

  3. I had a hard time with the Karlan and Appel book. I can’t tell who the audience is. It’s too basic for most people actually working in aid, and too complex for outsiders.

  4. While RCTs (as well as their quasi-experimental counterparts) certainly have their limitations (e.g. a focus on average treatment effects, and limited external validity), they do have a place in the spectrum of research approaches that give us reliable feedback on whether what we are doing is making a meaningful difference. Let’s not throw the baby out with the bathwater simply because certain development bureaucrats/institutions have perhaps gone overboard in viewing them as the holy grail of evidence for all types of interventions and research.

    Indeed, many of us know from a variety of ways of knowing (field experience, “gut feelings”, and more conventional forms of evidence) that a lot of development resources and effort could certainly be used more effectively. We need to be more critical of our approaches and get more honest and reliable feedback on what difference they are really making, if any. We do need more than simply good intentions, and the wise and strategic application of RCTs (and the integration of the insights they generate into decision-making) is part and parcel of the wider spectrum of evaluation approaches we should be pursuing to get us there.

  5. Oh Duncan! Always trying to get us to draw battle lines. I think the key thing is the agreed need for increased rigour coming from both the randomistas and the ‘hard-working rigorous non-randomistas’. I don’t think anyone is arguing that RCTs are a panacea that can measure all things, but nor do we accept that opinions, unchecked for issues of representativeness and bias, are all that can be gleaned from those interventions that don’t lend themselves to an RCT-style evaluation. The evaluation design will necessarily be informed by the evaluation/research question at hand and the type of intervention that’s under the microscope. The question is what we can credibly, usefully and practically ascertain, and how we can use this information to increase our understanding and inform adaptations and improvements.

    1. Well put Claire, but that discussion with our bosses worried me – a lot of useful knowledge risks getting devalued because it is not numerical, and that seems a big mistake. As for battle lines – it’s all dialectical comrade (and anyway, the readers like it!)

    2. And of course the first thing that happens when you put supposedly neutral ‘data’ on the table, is that everyone pounces on the bits that support their preferences and ignores the rest!

    3. Sorry, me again! And the thing both conversations helped me realize was that the important bit is having some kind of control/comparator (obvious I know). So we take a control/comparable population, but then do qualitative/institutional/complexity type things in both, and that helps us a bit with attribution. I assume that we are doing this already (we usually are….)

  6. Nice to see a mention for the ‘complexity people’ here.

    Also, don’t miss out on the opportunity to read the powerful (and occasionally hilarious) diatribe against J-PAL et al. by Princeton economist Angus Deaton, called ‘Instruments, randomization, and learning about development’.

    It’s on his website and is definitely worth reading.

  7. Duncan, Great provocative piece. I love it. Unfortunately, the embrace of diversity of ideas, meanings, and perspectives is an unrealistic goal in a field where one particular discipline has such a monopoly of power and resources. And it is also a particular discipline that prizes “elegant” simplicity. If I could make one change at the World Bank, where I freelance, it would be the creation of an office for a “chief social scientist” with an expert staff and budget at least equal to the chief economist’s. Imagine if the two had to cooperate but rotate on managing the WDRs? While that would not take care of the fact that there are numerous non-economic social science disciplines, or the need for more humanities in our models, it would darn sure shake things up around at least some tables there.

  8. Re: Duncan’s comment #10… If you have a control group, the sane way to assign people to it is randomly, so that confounding variables are randomised away. Then, whatever evaluation technique you choose to apply to assess the outcome, haven’t you just performed an RCT?

    Also, from the main article, the claim is made that RCTs are ineffectual because ‘interventions are typically multiple, overlapping and synergistic’. That’s exactly the point of proper randomisation: if a large enough sample is chosen genuinely randomly, the effects of any other interventions being enacted are also randomised away, and you can look at the effects of each of them individually.
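    Andrew’s mechanism can be shown in a toy simulation (everything below – the numbers, variable names and effect sizes – is invented for illustration, not drawn from any study mentioned in this thread): with purely random assignment, both an unobserved confounder and an overlapping second programme balance out across arms, so a simple difference in means recovers the built-in treatment effect.

```python
import random

random.seed(0)

# Toy sketch (invented numbers): under random assignment, an unobserved
# confounder and an overlapping second intervention balance out across
# arms, so a simple difference in means recovers the true effect.

N = 100_000          # total population in the hypothetical trial
TRUE_EFFECT = 2.0    # the effect we hope to recover

treated, control = [], []
for _ in range(N):
    confounder = random.gauss(0, 1)              # unobserved background factor
    other_programme = random.random() < 0.3      # overlapping intervention
    outcome = 5 + 3 * confounder + (1.5 if other_programme else 0)
    if random.random() < 0.5:                    # coin-flip assignment to arms
        treated.append(outcome + TRUE_EFFECT)
    else:
        control.append(outcome)

estimate = sum(treated) / len(treated) - sum(control) / len(control)
print(f"estimated effect: {estimate:.2f}")       # close to 2.0 in large samples
```

    Of course, the sketch assumes the treatment effect is a single separable number – which is precisely the premise that the ‘multiple, overlapping and synergistic’ objection in the main article disputes.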

  9. Sadly all too often a case of “who pays the piper calls the tune”. Increasingly I am finding donors “know the cost of everything and the value of nothing”.

    Plus my sense is that many field-based workers still find the whole M&E process something they do for others – a hurried add-on to keep a donor happy, not an intrinsic part of their “learning and development”.

    Add to that the fact that we need success stories to raise money and convince the public and politicians, and it is perhaps no great surprise that outside policy-wonk land “accountability” is often reduced to a PR exercise reflecting the values of those who “fund/control” rather than the life experiences of those we seek to work with.

    It seems to me that challenging and lobbying donors is a vital part of our work, but when we want their money it is not something we always do with the vitality that is required.

  10. Plenty of external validity issues on Earth, by the way. For example, the Earth’s gravitational pull is weaker in Singapore than in Scotland (planet not spherical). So physicists look for laws which explain both results.
    Just like (good) development economists do.

  11. Though I understand the problems associated with leaving M&E until a project is complete, I’m concerned that more responsive feedback systems could lead to premature change. Sometimes change unfolds at a glacial pace, and an instant feedback system may not be sensitive to this reality.

  12. To Andrew: The issue is that your definition of “proper randomization” is unfeasible for most complex challenges and interventions in development, as the required sample would be tremendous, or the items to be sampled (e.g. national parliaments in Muslim countries of South Asia) simply do not exist on earth in large enough numbers.

    Picking examples from the effect of girls’ education on ecosystem management to the efficacy of collaborative governance assessments in improving equity in service delivery, the topics that one sees in a donor agency are too broad. The randomista response has, unfortunately, often been to then focus on whether a given approach to increasing girls’ attendance does so successfully, or whether a collaborative meeting or a government audit results in local authorities allocating more funding for a given budget category.

    Answering these questions often does not significantly advance a learning agenda – though it can often advance knowledge in incremental ways.

    The pushback to the RCT emphasis is basically on opportunity cost – that for those who wish to apply rigor to learning in service of identifying what works and deserves support or replication, there are sufficiently rigorous methods to the task that advance learning more quickly than RCTs because they can engage with problems in a different way.

    Or, to put it another way, let’s not look for our keys under the RCT streetlight because that’s where we can most clearly see.

  13. “In contrast, decades of experience were not even credited as ‘evidence’, but often written off as ‘opinion’. It felt like we were in danger of discounting our richest source of insight – gut feeling.”
    Having read Robyn Dawes’ “House of Cards”, that sounds like a fantastic development!

  14. Dear Duncan,
    Very good post that I discover a bit late.
    I also feel that we (although I am an ex-Oxfam person, I still say “we” and use the present tense…) still have to find the right way to respond to that growing demand for “numbers” (from institutional donors and also from the general public).
    We should at least try to avoid the endless cycle of always providing more space (and time) for quantity and always less space (and time) for quality. There must be another innovative way for a great organisation like Oxfam to lead.
    Please read that early twentieth-century book, “The Reign of Quantity and the Signs of the Times”, by the excellent French writer René Guénon.
