Ethnicity coding in English health service datasets

The Covid-19 pandemic has highlighted ethnic disparities in health care and outcomes in England, but data on the ethnicities of patients remains poor. This report looks at the quality and consistency of ethnicity coding within health datasets and calls on NHS England to provide new guidance for health service providers and GPs.

The Covid-19 pandemic has highlighted the extent and impact of ethnic disparities in health to communities, health services and government.

The pandemic has also demonstrated that the limited availability of ethnicity data and the quality of the data are reducing understanding of ethnic inequalities and the ability to identify effective responses. Current challenges range from the absence of ethnicity data in essential data sources such as death registrations (from which mortality statistics are derived), to poor coverage in primary care data, outdated ethnicity codes used within the NHS compared with those used in the 2011 and 2021 censuses, and systematic differences in ethnicity coding between White and minority ethnic groups. Effectively using currently available ethnicity data and improving the quality of the data are vital for identifying and addressing ethnic disparities in health.

For this report we have analysed the quality and consistency of ethnicity coding within widely used health datasets, in order to inform users of ethnicity data and identify the actions needed to improve the quality of the underlying data.

Along with providing insights for data users, the report sets out recommendations for policy-makers and organisations that generate and regulate health data.

Key points 

  • The Covid-19 pandemic has highlighted the extent and impact of ethnic disparities in health to communities, health services and government. However, poor data about ethnicity has obscured the true extent of ethnic disparities in the impact of the pandemic.
  • Many health-related datasets do not routinely include ethnicity. Ethnicity recorded within hospital records is used instead, but mis-coding in hospital data means that Covid-19 infections, hospitalisations and deaths could be over- or under-counted in minority ethnic and White groups.
  • Our analysis of the quality of ethnicity coding in hospital datasets found data quality problems including:
    • incomplete coding and inconsistent use of codes
    • an excessive and growing proportion of patients with ethnicity recorded as “not known”, “not stated” or “other”, which impedes reliable analyses of ethnic differences, and
    • systematic biases in data quality - for example, data quality is worse in London, in adults of working age, and for patients with short hospital stays.
  • Importantly, data quality problems affect records for minority ethnic patients disproportionately.
  • The lack of comprehensive, high-quality data on health and mortality by ethnicity is a significant obstacle to understanding ethnic inequalities in health, and therefore how the diverse health needs of different ethnic groups can be addressed.
  • Action is needed to improve data quality at source by developing and implementing up-to-date guidance on ethnicity coding for health service providers and GPs.
  • In the meantime, users of data need to be aware of problems with ethnicity coding, and it is essential to analyse and report ethnicity data quality issues.
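
For data users who want to check ethnicity data quality in their own extracts, one simple measure is the share of records in each group whose ethnicity is recorded as “not known”, “not stated” or “other”. The sketch below is illustrative only and is not the method used in the report: the field names (`ethnicity`, `region`), the category labels, the choice to treat a missing value as “not known”, and the sample records are all assumptions made for the example.

```python
from collections import Counter

# Assumed labels for categories that cannot support analysis of ethnic
# differences; real datasets use their own code lists.
UNUSABLE = {"not known", "not stated", "other"}


def unusable_share(records, group_key):
    """Return, for each group, the share of records with unusable ethnicity."""
    totals = Counter()
    unusable = Counter()
    for record in records:
        group = record[group_key]
        totals[group] += 1
        # Assumption: a missing ethnicity value is treated as "not known".
        ethnicity = (record.get("ethnicity") or "not known").strip().lower()
        if ethnicity in UNUSABLE:
            unusable[group] += 1
    return {group: unusable[group] / totals[group] for group in totals}


# Made-up records for illustration; these are not data from the report.
sample = [
    {"region": "A", "ethnicity": "White British"},
    {"region": "A", "ethnicity": "Not stated"},
    {"region": "A", "ethnicity": "Pakistani"},
    {"region": "A", "ethnicity": "Chinese"},
    {"region": "B", "ethnicity": "Indian"},
    {"region": "B", "ethnicity": None},
    {"region": "B", "ethnicity": "Other"},
    {"region": "B", "ethnicity": "Caribbean"},
]

shares = unusable_share(sample, "region")
for region, share in sorted(shares.items()):
    print(f"{region}: {share:.0%} of records have unusable ethnicity")
```

Grouping by other fields, such as age band or length of stay, would show whether data quality varies systematically across those groups.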

Report

Published: 7/06/2021

Suggested citation

Scobie S, Spencer J, Raleigh V (2021) Ethnicity coding in English health service datasets. Research report, Nuffield Trust