Administrative Database Guidance

Using administrative data to support evaluation in higher education

Author

Nicholette Pollard-Odle

Published

April, 2026

Modified

March, 2026

What is administrative data? 

Personal data is routinely collected in our day-to-day lives. For instance, when a child enrols in school, information is collected about them, such as demographic characteristics, attendance, and attainment. Schools use this information for day-to-day operations (timetabling, class sizes, safeguarding), but the same data can be repurposed for research: to answer research questions, track long-term outcomes, or evaluate policies.

This type of information is known as administrative data because it is a by-product of the routine operations and administrative systems of public bodies or private organisations, collected when people interact with their services (Office for National Statistics, 2024).

List of administrative datasets available and relevant to higher education (HE)
  1. National Pupil Database (NPD) – compiled by the Department for Education, contains information on pupils in state-funded schools in England. 

  2. Higher Education Statistics Agency (HESA) datasets – collected by Jisc, contain wide-ranging information about the HE sector in the UK.

  3. Longitudinal Education Outcomes data (LEO) – compiled by the Department for Education, contains the educational records of individuals linked to later employment, tax and benefit data to create a de-identified person level dataset. 

  4. Universities and Colleges Admissions Service (UCAS) datasets – contain information on admissions, offers, and enrolment in further education and HE institutions.

  5. Individualised Learner Record (ILR) – compiled by the Department for Education, contains information on students aged 16 to 18 and those aged 19 and older studying in vocational or further education in England.

Why is administrative data useful?

The primary advantage of administrative data is its high coverage¹ and the large amount of information it holds on individuals who interact with private or public services. This makes it particularly useful for drawing inferences about entire populations or specific subgroups that may be missed by other data collection methods, such as surveys. Furthermore, because of its population-level coverage and longitudinal structure, studies that use administrative data often have greater statistical power (ADR UK, n.d.).

Another advantage of administrative data is that it can be linked (combined) with other sources of information to build a more comprehensive dataset, helping researchers and policymakers to better understand the societal issues they hope to address (National Centre for Social Research, n.d.). For instance, early years data can be linked with labour market data, which would traditionally be held in two separate databases. Linking them can help identify critical windows in which interventions most effectively narrow lifetime inequalities, something that could not be inferred from either dataset alone.
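The idea of linkage can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are made up, the field names (`person_id`, `ks4_points`, `earnings`) are hypothetical rather than taken from any real dataset, and it assumes both extracts share a de-identified person identifier. Real linkage is carried out by data owners under strict access controls and often uses more sophisticated matching.

```python
# Hypothetical, made-up records: field names and values are assumptions
# for illustration and do not reflect any real administrative dataset.
education = [
    {"person_id": "A1", "ks4_points": 46.0},
    {"person_id": "B2", "ks4_points": 52.5},
    {"person_id": "C3", "ks4_points": 38.0},
]
labour_market = [
    {"person_id": "A1", "earnings": 24000},
    {"person_id": "C3", "earnings": 19500},
    {"person_id": "D4", "earnings": 31000},
]


def link_records(left, right, key):
    """Keep only people found in both sources, matched exactly on `key`."""
    right_by_key = {row[key]: row for row in right}
    linked = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # Combine the fields from both sources into one record.
            linked.append({**row, **match})
    return linked


linked = link_records(education, labour_market, "person_id")

# Share of education records that found a labour market match.
match_rate = len(linked) / len(education)
```

Reporting the match rate alongside any linked analysis is a sensible habit, because people who fail to link may differ systematically from those who do.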

In addition, administrative data is collected across a range of topics. The breadth of variables and outcome measures available enables academics and other researchers to explore key social issues in ways surveys struggle to match.

However, administrative data has drawbacks. It may not include all the variables a researcher needs, such as people’s motivations for a particular decision or sensitive experiences that would only be disclosed in interviews or surveys. In such cases, it often needs to be supplemented with other sources to fully meet research needs. Administrative data can also suffer from quality issues (e.g., missing fields, miscoding), and data collection is not standardised across different administrative databases (Office for Statistics Regulation, n.d.). These inconsistent data structures can make it time-consuming to extract and process the data required for statistical purposes.
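A basic quality check for the issues mentioned above (missing fields and miscoding) might look like the following sketch. The records, the field names (`student_id`, `ethnicity`, `mode`), and the list of valid codes are all hypothetical and chosen purely for illustration; real datasets publish their own coding frames.

```python
# Hypothetical records: field names and valid codes are assumptions.
records = [
    {"student_id": "S1", "ethnicity": "White", "mode": "FT"},
    {"student_id": "S2", "ethnicity": None, "mode": "PT"},
    {"student_id": "S3", "ethnicity": "Asian", "mode": "full time"},  # miscoded
    {"student_id": "S4", "ethnicity": "", "mode": "FT"},
]
VALID_MODES = {"FT", "PT"}  # assumed coding frame for mode of study


def is_missing(value):
    """Treat None and empty strings as missing."""
    return value is None or value == ""


def missing_rate(rows, field):
    """Share of rows where `field` is absent, None, or empty."""
    missing = sum(1 for row in rows if is_missing(row.get(field)))
    return missing / len(rows)


def invalid_codes(rows, field, valid):
    """IDs of rows whose non-missing value is not in the valid code list."""
    return [
        row["student_id"]
        for row in rows
        if not is_missing(row.get(field)) and row[field] not in valid
    ]


ethnicity_missing = missing_rate(records, "ethnicity")
bad_modes = invalid_codes(records, "mode", VALID_MODES)
```

Running checks like these before analysis gives an early indication of how much cleaning or supplementary data a project is likely to need.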

Who is this guide for? 

This guidance is for anyone interested in using administrative databases to evaluate interventions in the HE sector, including:

  • evaluators

  • academic researchers

  • delivery staff

  • senior leaders

It focuses on evaluation within the English HE system because education is devolved in the UK, and each country’s government publishes data on its own system. Those interested in Welsh, Scottish, or Northern Irish education statistics are advised to consult the relevant local guidance.

How is the guide organised?

This guide is divided into five sections. The first four sections focus on administrative data sources relevant to higher education (HE) research and policy: the National Pupil Database (NPD), the Higher Education Statistics Agency (HESA) datasets, Longitudinal Education Outcomes (LEO) data, and the Universities and Colleges Admissions Service (UCAS) datasets. The fifth section provides an overview of the Higher Education Access Tracker (HEAT), a centralised tracking service that negotiates access to most of the variables covered in this guidance. To quickly compare which variables each dataset contains, view the data catalogue.

The guide is designed to be accessible and practical, supporting users in understanding each dataset’s structure, access process and application to investigating HE inequalities. Table 1 provides a high-level summary of each dataset’s population coverage, key variables², years available³, and unit of observation. For more details, please visit the section for the relevant dataset.

Table 1. Summary of administrative datasets
| Administrative dataset | NPD | HESA | LEO | UCAS |
| --- | --- | --- | --- | --- |
| Population covered | Children who attended state-funded schools in England. | Students, staff (including governors), and graduates who have obtained HE qualifications. | Individuals who have enrolled in the English education system. | Applicants to higher education in the UK. |
| Key variables | Attainment data (KS1-5), absence, exclusions and children’s social care data. | Subject studied, mode of study, award and degree classification, postcode, employment status, and occupation. | Employment status, earnings, benefit claims, sector of employment, type and size of organisation. | Applicant domicile, provider tariff band, achieved or predicted qualification points/grades, acceptance route, number of offers or choices, and year of entry. |
| Years available | The school census is available from 2001/2 to present. Some components (KS1-3) have longer coverage. | The Student Record is available from 1994/95 and the Graduate Outcomes Survey is available from 2017/18. | Education data is available from 2001/2. Some components (employment/benefits) have longer coverage. | 2007 to present. |
| Unit of observation | Individual level and school level (not class level). | Individual level and provider level (for aggregate offshore record). | Individual level. | Individual level, by stage of the application cycle. |

References

Footnotes

  1. Coverage refers to how completely a dataset captures the population or group it is intended to represent.↩︎

  2. All administrative datasets collect demographic and socioeconomic information, but they vary in which characteristics are included. All record basic details such as name, postcode (except LEO), date of birth, ethnicity, and gender or sex. The full list of demographic characteristics can be found in the data tables for each administrative dataset.↩︎

  3. Not all components of a dataset are available for all the years listed in Table 1. To see the temporal coverage of different components of a dataset, see the detailed table in the section of this guide pertaining to that dataset.

    Further, please note that the years available refer only to the components of each administrative dataset that are relevant to understanding student pathways. Administrative databases are built from multiple underlying data collections, some of which extend beyond the scope of this resource. Users seeking full technical detail should consult the official guidance documentation for their specific dataset of interest.↩︎