The Devil Is In The Data

(Deborah Hardoon, Senior Researcher at Oxfam, unpicks some of the World Bank findings presented last week during its annual meetings. It’s a long post. Bear with us this time and read until the end. There’s lots of information and, after a long back and forth, we decided it was better in one piece than in two separate posts. Follow Deborah on Twitter @DeborahHardoon.)

Last week the World Bank released its Global Monitoring Report (GMR) 2014/2015. The report contains 240 pages of data, charts and analysis covering the progress towards the twin goals of ending poverty and boosting shared prosperity, the Millennium Development Goals and the broader global economic outlook (IMF). This is a great resource, providing data and insights on critical measures of development and all complemented by nice infographics and an easy to navigate website. Nice, huh?

But most of us do not have the time to dig through the 240 pages or hang out on the website hovering over charts. Rather, we trust that the World Bank is full of smart people and, as they are the responsible guardians of so much development data, we skip straight to the findings, nicely summarized in a foreword signed off by the President of the Bank and the MD of the IMF.

The foreword includes the following ‘remarkable finding’:

Preliminary work on shared prosperity has led to remarkable findings. In 58 out of 86 countries for which the Global Monitoring Report had adequate data, incomes among the poorest 40 percent grew faster than for the population as a whole between 2006 and 2011. In 13 countries, income or consumption of the poorest 40 percent grew by more than 7 percent annually during this period.

The World Bank says (therefore it must be true) that in an important majority of countries the incomes of the poorest people are growing faster than those of richer people: they are catching up, and inequality is reducing. Brilliant! But wait a minute, this is all rather surprising given the growing body of evidence from all over the world to the contrary, evidence that finds that in fact the incomes of the richest people are growing faster than those of the rest of us, and that the levels of economic inequality we see around the world today are extreme and growing. In the US, 95% of post-recession growth has been captured by the richest 1%. In India, where poverty remains stubbornly high, the wealth of billionaires continues to grow. Last year, the wealth of just 85 people was the same as that of half the planet. An economist has become a rockstar over this trend!

So do we have a problem with growing inequality or not? To find out, let’s unpack this ‘remarkable finding’ by finding out what the World Bank are measuring and how. So here’s the finding in 5 simple steps:

  1. Firstly, the World Bank use household survey data for 86 countries that provides information on how much different people earn, from which they calculate the average income of each decile (10%) of the population.
  2. These surveys are not conducted every year in every country, so they use data from the most recent period of 5 years for which data is available and which still falls within the recent 2002-2012 period.
  3. They then extract the average income of the poorest 4 deciles (the bottom 40% of the population) and calculate the growth (or decline) of this income over the 5 year period; call this % change ‘A’.
  4. This is then compared against the change in average income of the whole country over this same period, call this % change ‘B’.
  5. When A is greater than B (as it is in 58 of the 86 countries), the poorest people in the country are increasing their incomes at a faster rate than the average, thereby catching up with everyone else, reducing the gap between the poor and the rest and reducing inequality. Remarkable.
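To make the arithmetic concrete, here is a minimal sketch of steps 3 to 5 in Python. It assumes we already have the average income of each decile at the start and end of the period; the function names and every number in it are hypothetical illustrations, not World Bank data or World Bank code. Because each decile holds the same number of people, the average income of the bottom 40% is simply the mean of the first four decile averages.

```python
from statistics import mean


def pct_change(start, end):
    """Percentage change from start to end."""
    return (end - start) / start * 100.0


def bottom40_vs_average(deciles_start, deciles_end):
    """Return (A, B, A > B) from two lists of ten decile average incomes.

    A is the % change in the average income of the bottom four deciles,
    B is the % change in the average income of the whole population.
    Deciles must be ordered from poorest to richest.
    """
    if len(deciles_start) != 10 or len(deciles_end) != 10:
        raise ValueError("expected ten decile averages for each survey")
    a = pct_change(mean(deciles_start[:4]), mean(deciles_end[:4]))
    b = pct_change(mean(deciles_start), mean(deciles_end))
    return a, b, a > b


# Hypothetical decile averages (poorest first) from two surveys 5 years apart.
survey_start = [10, 20, 30, 40, 50, 60, 70, 80, 90, 150]
survey_end = [12, 24, 35, 45, 55, 65, 75, 85, 95, 159]

a, b, catching_up = bottom40_vs_average(survey_start, survey_end)
print(f"A = {a:.1f}%, B = {b:.1f}%, bottom 40% catching up: {catching_up}")
```

In this made-up country A is 16% and B is about 8.3%, so it would count as one of the 58.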

OK so far, this totally makes sense. Conceptually it captures the dynamics of reducing inequality by way of the poorest people catching up (and indeed is the thinking behind target 10.1, the proposed inequality target for the future Sustainable Development Goals). Methodologically it’s pretty straightforward and based on real data.

But because I find the result so surprising, I’m just going to do a bit more digging and unpack three issues hidden within these calculations, all of which have an important bearing on the result. Let the data crunching commence:

Firstly, let’s look at the data source – Household survey data

Household surveys provide a wealth of information about a population, but it is well established that they do not capture information about the very top of the distribution and that supplementary data on this group is needed to get the full picture of income distribution in a country. In fact, research that is available on the top of the distribution does find that incomes for this group of people are pulling away and leaving the rest of us behind with a smaller share of the economic pie.

Household data is also expensive and complicated to collect. As a result, we do not have data for all countries for all years and in fact, for the purposes of this calculation, there is only sufficient data of this nature for 86 countries. That is less than half of the countries in the world. Exactly which countries are included in this sample is important, particularly in the context of a report to measure development progress. Sixteen countries from Sub-Saharan Africa are included in the 86. But there are more than 50 countries in Africa, where more than a billion people live and where poverty remains stubbornly high. Seventeen countries from Latin America are included in the 86, one more than from Africa, yet there are only 21 independent countries in Latin America, with a population approximately half that of Africa (600 million). This is also a region that has made notable progress in recent years in reducing poverty and inequality, as the World Bank acknowledges.

Secondly, the time frame for the analysis – 5 years, between 2002 and 2012

Five years is a very short period of time with which to establish a ‘trend’. In fact, when we look at the same data over a longer period, the trends look very different. The World Bank present this longer term data for several countries in the report, with some notable differences from the results for these same countries when looking just at the five year period. In Sri Lanka for example, it is clear from the longer term chart that the growth rate of the bottom 40% has been consistently lower than growth of average incomes for the last decade, yet data over the shorter time frame finds the bottom 40% growing faster. The opposite is true of the data on Uganda. Here the World Bank finds that ‘the average income of the bottom 40 percent has increased over time, and at rates that were equal or higher than the national average’, yet Uganda features as one of the 18 countries where the income of the bottom 40% grew at a rate slower than the average. To their credit, in the policy report that accompanies the GMR, the World Bank go into some detail explaining the sensitivities to time, recognizing the importance of the choice of time period and how it can affect the results.

Not only does such a narrow time horizon limit the degree to which the finding can be called a trend, but anomalous events that are not representative of a trend can also determine the results and therefore be misleading. This is particularly the case where there are just two data points available for this period (one survey conducted at the start and one at the end) and where we know that anomalous events, or ‘shocks’, occurred. The global economic crisis is one such shock. We know that between 2008 and 2009 investment and output crumbled, and wealth was destroyed as property prices and share values plummeted. People with lots of assets lost substantial amounts of wealth. The poorest people had less to lose. Such a massive wholesale economic shock will affect the data, but the impact on the data is not typical of the long term economic trajectory. The fact that the data used falls in and around 2008/2009, in some cases with these as the reference years, may tell us something about the immediate impact of the crisis, but does not tell us about past trends or, importantly, about prospects for distribution trends in the future.
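A small sketch shows how much the choice of window can matter. The income series below are entirely hypothetical (an invented country, not any country in the report), built so that average incomes dip sharply in 2009 while the bottom 40% grows slowly but steadily. The function `window_gaps` is my own illustrative helper: it computes A minus B for every pair of years a fixed number of years apart.

```python
def window_gaps(years, bottom40, overall, span=5):
    """Return {(start_year, end_year): A - B} for every window of `span` years.

    A is the % change in bottom-40% average income over the window,
    B is the % change in overall average income over the same window.
    """
    index = {year: i for i, year in enumerate(years)}
    gaps = {}
    for year in years:
        if year + span in index:
            i, j = index[year], index[year + span]
            a = (bottom40[j] / bottom40[i] - 1) * 100.0
            b = (overall[j] / overall[i] - 1) * 100.0
            gaps[(year, year + span)] = a - b
    return gaps


# Hypothetical average incomes, 2002 to 2012, with a shock to the average in 2009.
years = list(range(2002, 2013))
overall = [100, 104, 108, 112, 117, 122, 127, 115, 120, 127, 135]
bottom40 = [30, 31, 32, 33, 34, 35, 36, 36, 37, 38, 39]

for window, gap in window_gaps(years, bottom40, overall).items():
    print(window, f"A - B = {gap:+.1f} points")

# The same comparison over the whole decade.
print(window_gaps(years, bottom40, overall, span=10))
```

In this invented series the bottom 40% falls behind over the full decade and in the windows that end before the shock, yet appears to be catching up in every five year window that ends in 2009 or later.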

Finally, let’s take a look at the unit of analysis – the bottom 40% versus the rest

Inequality and shared prosperity require that we look at the whole of the economic pie and the whole of the distribution. So why do these calculations focus only on the bottom 40%? Whilst the bottom 40% is a constant proportion of the population in each country, the makeup of this group can look very different. Indeed the World Bank acknowledges this variance with this great chart (figure 7). In Bolivia the bottom 40% captures more or less everyone below the national poverty line (45% of the population), which therefore makes sense as a unit of analysis. The World Bank finds that the income of this group grew remarkably faster than the average over the 5 year period. However, the data deserve a closer look. Whilst still poor, the people in the 4th decile have almost 9 times the income of those in the bottom 10%, and the poorest 10% of people actually had a lower share of national income in 2008 (0.44%, the most recent data) than they did in 2005 (0.46%), and substantially less than in 1990 (2.32%). In Ethiopia, India and Laos the bottom 40% captures less than half of the people who are poor and vulnerable, so focusing just on this group misses important groups that remain in poverty. In fact, in these countries, the income of the bottom 40% grew at a rate slower than the average. When we look at data for all deciles, in Ethiopia only the top 30% actually grew at a rate higher than the average, in Laos the growth of income share was captured by the top 20%, and in India only the top 10% increased their share of income.

Picking different comparison groups can clearly yield very different results. The OECD recently took the very ends of the distribution and compared the household disposable incomes of the top 10% with the bottom 10%. In Mexico, Ireland and Estonia, the poorest 10% saw their incomes fall significantly, declining at a much greater rate than the average decline in the economy. All three of these countries are found by the World Bank stats to have a growth rate of the bottom 40% above the national average over a similar period. Attention to the bottom 40% alone misses much of what is actually happening in the income distribution.
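To see how a healthy bottom-40% figure can hide what happens to the very poorest, here is a rough sketch using invented decile averages (hypothetical numbers, not the Bolivian or any other real data). The helper names `income_shares` and `deciles_beating_average` are assumptions of mine for illustration.

```python
def income_shares(decile_means):
    """Each decile's share of total income, in percent (poorest first)."""
    total = sum(decile_means)
    return [m / total * 100.0 for m in decile_means]


def deciles_beating_average(start, end):
    """Decile numbers (1 = poorest) whose income grew faster than the average."""
    average_growth = sum(end) / sum(start) - 1
    return [
        number
        for number, (s, e) in enumerate(zip(start, end), start=1)
        if e / s - 1 > average_growth
    ]


# Hypothetical decile averages: the poorest decile barely moves,
# while deciles 2 to 4 grow quickly.
start = [10, 20, 30, 40, 50, 60, 70, 80, 90, 150]
end = [10.2, 24, 36, 48, 55, 65, 75, 85, 95, 160]

bottom40_growth = (sum(end[:4]) / sum(start[:4]) - 1) * 100.0
average_growth = (sum(end) / sum(start) - 1) * 100.0

print(f"bottom 40% growth: {bottom40_growth:.1f}%")
print(f"average growth:    {average_growth:.1f}%")
print(f"poorest 10% share: {income_shares(start)[0]:.2f}% -> {income_shares(end)[0]:.2f}%")
print("deciles growing faster than average:", deciles_beating_average(start, end))
```

Here the bottom 40% as a group grows at roughly twice the average rate, which would pass the World Bank test comfortably, while the poorest 10% grow more slowly than the average and end up with a smaller share of national income than they started with.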


Summary – So if you scrolled straight down to this point, you missed the point of this post. Whilst summaries are handy snippets easily read on your smart phone on a bus, without properly understanding all the inputs and workings out that got us to the summary, you will have no idea whether the conclusions are remarkable or not.

But more critically, we know that how progress is measured will matter. This is stated clearly on page 2 of the World Bank’s accompanying policy research report. Using incomplete data for a subset of countries, with arbitrary cut off points, can be at best misleading and at worst result in policies and actions that fail to respond to real development dynamics and priorities. So to the World Bank, my request is to invest more in gathering more and better data before publishing findings.


2 thoughts on “The Devil Is In The Data”

  1. Claire Godfrey

    Great blog Deborah. Just a few thoughts:

    - We’re talking about the accelerating wealth accumulation of the top 1% or 0.1% when we focus on extreme inequality. Don’t we want data gathered on that group – the top 85 etc – and the pace of their widening gap even from the rest of the top 10% – and of course the bottom 10%? But, of course, because of financial secrecy etc. it’s so difficult to access accurate information about this small number.

    - If you needed to look at redistribution, and examine government policies on redistribution, I’d hazard a guess that it’s the 20-80 decile range that policy targets, i.e. the middle classes take the heaviest tax burden, but the benefits – health systems, roads to markets, irrigation etc. – don’t necessarily fall to the poorest or those who bear the heaviest tax burden? It would be good to have data available that could be used to investigate this further. This is surely within the remit of the WB to do this kind of study?

    - The MDGs framework provided a useful tool for establishing indicators for governments and donors, development agencies etc. to base their programming and data collection on. We need to bear this in mind for the SDGs. What are the best targets and indicators to generate worthwhile data?

  2. Vamsee Kanchi

    High-quality data must be a core component of robust analysis of the World Bank Group’s Twin Goals. Peter Lanjouw, author of the Policy Research Report mentioned in the blog post above, and Ana Revenga, Senior Director of the Poverty Global Practice, separately wrote blog posts last week on many of the concerns raised. The links are as follows:

    Parsing the challenges of measuring poverty and shared prosperity

    Making data work for everyone

