Building empirical knowledge in ethnically diverse countries: persistent data challenges

Extracts from Canelas, C. & Gisselquist, R.M. (2018) Horizontal inequality and data challenges. WIDER Working Paper 2018/55. Helsinki: UNU-WIDER

Stewart (2008: 3) defines horizontal inequalities as ‘inequalities in economic, social or political dimensions or cultural status between culturally defined groups’. Consistent with much of the literature, we treat ‘culturally defined groups’ here as generally equivalent to ‘ethnic’ groups, broadly defined. ‘Ethnic’ as understood here includes categories based on ascriptive attributes such as skin colour, native language, tribe, caste, religion, and sometimes region (Chandra 2004; Horowitz 1985; Htun 2004). We also consider ascriptive groups in a broader sense to include gender.

In terms of filling empirical gaps on horizontal inequality using survey and census data, two points are clear from our research:

  • First, more can be learned from rigorous re-examination of existing datasets, and this can help us to fill (some) empirical gaps.

  • Second, our country-focused studies show clearly that, even with such focused analysis, it is not uncommon for significant gaps to remain, because survey and census data lack the information on ascriptive groups needed to produce valid and reliable quantitative measures of horizontal inequality.

    Consider, for example, Tanzania. Standard quantitative measures of ethnic diversity list Tanzania as among the most ethnically diverse countries in the world, if not the most diverse (see Alesina et al. 2003). However, only limited data on ‘ethnicity’ and religion are available from the Tanzanian census and standard surveys, such as the Demographic and Health Surveys (DHS).
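To make concrete what a ‘standard quantitative measure of ethnic diversity’ can look like, the sketch below computes the fractionalization index used in Alesina et al. (2003): one minus the sum of squared group shares, which can be read as the probability that two randomly drawn individuals belong to different groups. The extract does not say which measure it has in mind, so treating this index as the relevant one is an assumption. The function name and the example shares are hypothetical and are not Tanzanian data.

```python
def fractionalization(group_sizes):
    """Fractionalization index: 1 - sum of squared population shares.

    group_sizes: positive counts or shares, one per group (hypothetical input).
    Values are normalized so they need not sum to 1.
    """
    total = sum(group_sizes)
    if total <= 0:
        raise ValueError("group sizes must sum to a positive number")
    shares = [size / total for size in group_sizes]
    return 1.0 - sum(share * share for share in shares)


# Illustrative, made-up populations:
two_equal_groups = fractionalization([50, 50])        # 0.5
single_group = fractionalization([100])               # 0.0
many_small_groups = fractionalization([1] * 100)      # close to 0.99
print(two_equal_groups, single_group, many_small_groups)
```

The index needs a population share for every group, which is exactly the information that is missing when a census or survey does not record ethnicity.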

Some gaps in the extant data on ‘ethnic’ groups can be traced to issues common across multiple areas, but we find three sets of challenges in particular problematic in terms of survey and census data for ‘ethnic’ groups broadly defined.

  • The first set of challenges is ‘methodological’ and largely particular to small minority populations. With normal sampling procedures, nationally representative surveys could miss such populations altogether, or include too few of their members to support representative estimates for them. Moreover, it is not unusual for small minority populations to be located in remote areas, or to be unable to speak the dominant national language, which can add expense to data collection and pose methodological and practical challenges for survey and census enumerators.

  • A second set of issues stems from the conceptual challenge of capturing ‘ethnic’ identities and groups. For one, there are multiple such groups in a society that have political, social, and economic salience (see Laitin 1986; Posner 2005). How do we decide where our attention should be focused? For instance, in the SIR, McDoom et al. (2018) discuss how the groups that are salient in socio-political terms in the Philippines in fact differ from those in the census. The Philippine census in 2000 identified 147 ethno-linguistic groups and 93 religions (in 2010, 182 and 97, respectively). However, McDoom et al. (2018) make the case on the basis of socio-political salience for analysing horizontal inequality with reference to only three ethno-religious groups (Muslims, indigenous persons, and everyone else), thus reclassifying the categories listed in the census into these groupings for their analysis. Such detailed consideration of group salience is clearly within the scope of a country-focused study, but it can pose greater challenges for cross-national data efforts. … To add further complexity, it is not simply that ethnic groups and boundaries shift over time or that different groups are salient in different spheres of life, but also that individual ethnic identifications vary across contexts (e.g. Okamura 1981; Posner 2017). In other words, in responding to surveys or the census, individuals may self-identify in ethnic terms differently depending, for instance, on the way in which questions are asked and the choice of options they are given.

  • A third set of challenges stems from the political salience of ethnicity and the fact that data are political, and ethnic data can be especially so. This relates, for one, to the use of census and survey data by governments in the distribution of public resources. For instance, population figures may be taken into account in budget allocations, and ethnic data in particular may be used when transfers are linked to ethnicity (e.g. support for disadvantaged minorities). Moreover, simple ethnic statistics can have implications for perceptions of power and for political manoeuvring among groups in and outside of government. In a majoritarian system, for instance, whether a group is 49 or 51 per cent of the population is extremely important. Census data also have implications for political representation: for example, the number of legislative seats from particular regions is regularly tied to the number of people who live there, which is determined by the census. Ethnic groups are often concentrated in particular administrative regions. In some countries, political representation is directly linked to the relative population shares of ethnic groups, such as via ethnic quotas.
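The sampling problem raised in the first set of challenges can be illustrated with a small calculation. The sketch below assumes simple random sampling, so that the number of sampled members of a group follows a binomial distribution. This is a simplification, since national surveys typically use stratified, clustered designs. The function names, the sample size, the population share, and the threshold of 30 respondents are all hypothetical values chosen for illustration, not figures from the paper.

```python
from math import comb


def expected_members(sample_size, population_share):
    """Expected number of sampled respondents from a group under simple random sampling."""
    return sample_size * population_share


def prob_fewer_than(threshold, sample_size, population_share):
    """P(X < threshold) for X ~ Binomial(sample_size, population_share).

    Gives the chance that the sample contains fewer than `threshold`
    members of the group.
    """
    p = population_share
    return sum(
        comb(sample_size, i) * p**i * (1 - p) ** (sample_size - i)
        for i in range(threshold)
    )


# Hypothetical survey: 2,000 respondents, group is 0.5 per cent of the population.
n, share = 2000, 0.005
print("expected respondents from the group:", expected_members(n, share))
print("chance of missing the group entirely:", prob_fewer_than(1, n, share))
print("chance of fewer than 30 respondents:", prob_fewer_than(30, n, share))
```

Under these assumptions the survey would expect only about ten respondents from the group, which is why oversampling or dedicated surveys are often needed before group-level estimates can be produced.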

Finally, the act of compiling such information, especially in official sources such as the census, could be (or could be seen as) nationally divisive (Lieberman and Singh 2016). In Rwanda, for instance, where official ethnic identification was a major component of the genocide, the official line has become: ‘There is no ethnicity here. We are all Rwandan’ (Lacey 2004). More broadly, ‘official’ projects like the census lend such subnational identities official legitimacy and could have an impact on their continued salience (see, e.g. Hochschild and Powell 2008; Mazumder 2018). In many countries, such as Tanzania, the building of a national identity that supersedes subnational ethnic identities has been seen as a government priority and one with important links to national development (Campbell 1999).

Source: Horizontal inequality and data challenges