Ethnicity coding in English health service datasets


Published: 10/12/2020

Project status: Current

Aim of the project

Understanding inequalities in health, and access to and outcomes of healthcare services, between ethnic groups relies upon high quality ethnicity coding in patient records.

This project is being undertaken in collaboration with the NHS Race and Health Observatory.  We will examine the completeness, validity, and consistency of ethnicity coding within NHS health datasets in England (excluding GP records), in order to establish the extent and nature of data quality issues.  This will provide the basis for action to improve data quality, and to inform more robust analysis and reporting of ethnic inequalities.

Why it's important

This work is timely given the priority now attached to analysing access to care and health outcomes among ethnic minority groups, following the disproportionate impact of the Covid-19 pandemic on these communities. Moreover, the government is introducing mandatory ethnicity recording in death certification, with ethnicity being taken from health records. It is therefore critically important that ethnicity data is fit for purpose. Without this, it will not be possible to reliably assess the health status of ethnic minority groups or the inequalities between ethnic groups. Current evidence suggests that ethnicity coding in health records has significant data quality issues and that the data contains systematic biases.

This report will provide a thorough assessment of the quality of ethnicity coding in NHS health records (excluding GP records).  

What we'll do

We will conduct a descriptive analysis of NHS hospital datasets and the NHS community services dataset to assess:

  • completeness – what proportion of records have an ethnic group coded?
  • validity – what proportion of records have a valid ethnic code as per the official classification of ethnic categories?
  • consistency – are there discrepancies in the ethnicity recorded for patients who have multiple health records? This will be assessed for patients within the same data set and across data sets.

We will also compare regional usage of healthcare services by age, gender and ethnic category with the population estimates for these groups provided by the Office for National Statistics.
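
To make the three data quality measures concrete, the sketch below shows one way they could be calculated from a list of patient records. It is illustrative only and is not the project's method. The function and variable names are hypothetical, and the code list is an assumption that should be checked against the official classification. It also assumes that "not stated" and "not known" do not count as having an ethnic group coded, and that consistency is judged only among patients with two or more validly coded records.

```python
from collections import defaultdict

# Assumption: lettered ethnic category codes A to S (no I, O or Q), with "Z"
# for "not stated" and "99" for "not known". Verify against the official
# classification before relying on this list.
VALID_CODES = set("ABCDEFGHJKLMNPRS")
NOT_KNOWN = {"Z", "99"}


def proportion(numerator, denominator):
    """Return numerator / denominator, or None when there is nothing to count."""
    return numerator / denominator if denominator else None


def quality_summary(records):
    """records: iterable of (patient_id, ethnic_code); code may be None or ''."""
    total = coded = valid = 0
    valid_codes_by_patient = defaultdict(list)
    for patient_id, code in records:
        total += 1
        # Completeness: any value other than blank, "not stated" or "not known".
        if code and code not in NOT_KNOWN:
            coded += 1
        # Validity: the value is one of the recognised ethnic categories.
        if code in VALID_CODES:
            valid += 1
            valid_codes_by_patient[patient_id].append(code)
    # Consistency: among patients with two or more validly coded records,
    # the share whose records all carry the same code.
    repeat = [c for c in valid_codes_by_patient.values() if len(c) > 1]
    agreeing = sum(1 for c in repeat if len(set(c)) == 1)
    return {
        "completeness": proportion(coded, total),
        "validity": proportion(valid, total),
        "consistency": proportion(agreeing, len(repeat)),
    }


# Made-up records for illustration only.
example = [
    ("p1", "A"), ("p1", "A"),   # same code on both records
    ("p2", "H"), ("p2", "J"),   # conflicting codes
    ("p3", None),               # missing
    ("p4", "99"),               # not known
    ("p5", "X"),                # coded, but not a recognised category
    ("p6", "A"),
]
summary = quality_summary(example)
```

The same function could be applied to records pooled from several data sets to look at consistency across data sets, provided patient identifiers can be linked between them.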

We will examine coding for different groups of patients and services (for example, comparing NHS and independent providers of NHS-funded care, and elective and emergency care) to explore hypotheses for why ethnicity may be incompletely or incorrectly recorded.
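
As a rough illustration of this kind of comparison, the sketch below calculates the proportion of records with an ethnic group coded within each category of a chosen field. The field names (`provider_type`, `admission_type`, `ethnic_code`) and the treatment of "Z" and "99" as uncoded are assumptions made for the example, not features of any particular data set.

```python
from collections import defaultdict

# Assumption: "Z" (not stated) and "99" (not known) are treated as uncoded.
NOT_KNOWN = {"Z", "99"}


def completeness_by(records, field):
    """Share of records with an ethnic group coded, for each value of `field`."""
    totals = defaultdict(int)
    coded = defaultdict(int)
    for record in records:
        group = record[field]
        totals[group] += 1
        code = record.get("ethnic_code")
        if code and code not in NOT_KNOWN:
            coded[group] += 1
    return {group: coded[group] / totals[group] for group in totals}


# Made-up records for illustration only.
sample = [
    {"provider_type": "NHS", "admission_type": "elective", "ethnic_code": "A"},
    {"provider_type": "NHS", "admission_type": "emergency", "ethnic_code": "Z"},
    {"provider_type": "independent", "admission_type": "elective", "ethnic_code": None},
    {"provider_type": "independent", "admission_type": "elective", "ethnic_code": "H"},
]
by_provider = completeness_by(sample, "provider_type")
by_admission = completeness_by(sample, "admission_type")
```

Differences between groups in a table like this would only suggest where recording practice may differ; they would not by themselves explain why.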

The data sets to be analysed cover inpatients, outpatients, accident and emergency attendances (including those recorded in the new Emergency Care Data Set), and community services contacts.

Project outputs

We will produce a briefing for users of these data sets setting out:

  • The scale of data quality issues with ethnicity coding
  • The implications for analysis and decision making about ethnicity and health
  • Recommendations for action to improve ethnicity coding and use of data
  • Recommendations for the research agenda on ethnicity and health

Our target audiences for this work include policymakers, the Race Disparity Unit, academic and other researchers, generators and users of data in the NHS and government, statistics regulators, the media, patients and the public.


The project aims to report in spring 2021.

Preliminary findings from our work were discussed with stakeholders, including Public Health England and the Office for National Statistics, at a workshop at the end of March 2020. See notes here.

Further information

Please contact for further information about the project.