Longitudinal Education Outcomes (LEO)

Introduction to LEO

The Longitudinal Education Outcomes (LEO) dataset is an administrative database that contains de-identified individual-level information on the employment and earnings of approximately 39 million learners who have attended schools, further education colleges and HE providers in England (DfE, 2024b).[1] This allows tracking of students from compulsory school age into HE and the labour market. LEO includes records for anyone born since 1985 who attended school in England. Learners from Wales or Scotland appear in LEO only once they enrol in an English institution, via the NPD or ILR.

The LEO database comprises different linked data sources (Figure 1),[2][3][4] combining school, FE and HE student records (NPD, ILR and HESA) with earnings, benefits and tax data from the Department for Work and Pensions (DWP) and His Majesty’s Revenue and Customs (HMRC). Its most recent expansion added linked data from the Universities and Colleges Admissions Service (UCAS; 18-year-old English applicants only), the Inter-Departmental Business Register (IDBR) and COVID-19 furlough data (ADR UK, 2023).

The purpose of the LEO database is to facilitate evidence-informed decision-making about educational policy and practice based on retrospective data from English learners in order to enhance the lifetime outcomes of current and future learners.

Figure 1. Data structure of the LEO I2 (Iteration 2 Standard Extract) (ADR UK, 2023)

Overview of the data available in LEO

Population covered: Individuals who have enrolled in the English education system
Unit of observation: Individual level
Key variables:

Personal characteristics,[5] such as free school meal (FSM) status, age, gender, ethnicity and POLAR (Participation of Local Areas) quintile.[6]

Educational history, such as the schools, further education (FE) providers, Alternative Providers (APs)[7] and HE providers attended,[8] prior school attainment, domicile status (UK, EU and overseas), course studied, predicted grades of HE applicants, number of offers received by applicants, and qualification level achieved.

Employment status (including sector type and size), benefits claimed, industry of employment, and earnings (available from 2003/04).

COVID-19 furlough – whether in receipt of either the Coronavirus Job Retention Scheme (CJRS) or the Self-Employment Income Support Scheme (SEISS) in the 2020/21 tax year.

Movement of graduates throughout the entire cycle of study between their home region (before entering HE), the region where they studied, and their current region (after completing their course).

Years available:
Employment (HMRC PAYE): 1997/98 to 2020/21
Benefits: 1999/00 to 2020/21
HESA: 2004/05 to 2019/20
NPD: 2001/02 to 2020/21, with earlier collections available depending on the dataset[9]

Limitations and exclusions

The LEO database does not collect information on:

  • The contractual type of employment (e.g. permanent, fixed-term or part-time/full-time), hours worked, or occupational roles.

  • Graduates in voluntary work or unpaid internships.

  • Graduates working or living abroad after graduating because HMRC tax records only cover UK-based taxable income.

As a result, individuals emigrating or working abroad are excluded from earnings data, which can bias results for highly mobile groups.

Additionally, all datasets included in LEO can be linked to each other at the individual level, but no other individual-level data can be linked to LEO.

Access to LEO

LEO data is published online in two aggregated formats, ‘Graduate and Postgraduate Outcomes’ and ‘Graduate Outcomes Provider Level Data’. Provider-level data is grouped by subject, graduate characteristics, and HE provider, while graduate-level data is aggregated by qualification level (bachelor’s, master’s, doctorate) (DfE, 2024a). Grouping the data protects the anonymity of HE graduates but still allows policymakers and researchers to compare long-term employment and earnings outcomes. The aggregated LEO data is published openly and can be accessed via the GOV.UK Data Catalogue, under the HE theme (DfE, n.d.).

Access to the LEO standard extract is highly restricted and requires ONS Accredited Researcher status. Researchers must submit a full project application through the ONS Project Accreditation Service for SRS (PASS),[10] outlining the research aims, public benefit, methods and data requirements. The application must include the LEO I2SE Variable Request Form, specifying the required tables and variables, and may also require an NSDEC ethics self-assessment or equivalent institutional ethics approval, depending on the project (ONS, n.d.). Those seeking access are advised to have some proficiency in Structured Query Language (SQL). However, strong proficiency is not essential, as data can be exported from SQL into other statistical packages available in the Secure Research Service (SRS). Guidance can be found on GOV.UK (DfE, 2025).

Application of LEO in HE

The primary purpose of LEO is to connect individuals’ education data with their employment, benefits and earnings data. The LEO dataset therefore provides a unique data source for understanding the role the English education system plays in shaping individuals’ long-term employment outcomes.

Some LEO analysis is designed to inform policies to address inequality gaps in UK HE. For example, comparing median earnings at 3 and 10 years post-graduation for FSM-eligible versus non-FSM graduates reveals large gaps in earnings. Insights like these can inform targeted interventions such as bursaries, mentoring or other support schemes (e.g. transition support) for disadvantaged students. Further, breaking results down by qualification level or subject area can reveal the pathways that yield the greatest earnings returns and employability gains. Hence, by using the LEO database, policymakers can understand which groups to support, at what stage in their studies, and in which courses (ONS, 2022).
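
The median comparison described above can be sketched in a few lines of Python. This is a minimal illustration only: the records are made-up toy values rather than LEO data, and the tuple layout and the function name median_gap are assumptions introduced for the example.

```python
from statistics import median

# Toy, made-up records (NOT LEO data):
# (fsm_eligible, years_after_graduation, annual_earnings_gbp)
records = [
    (True, 3, 18000), (True, 3, 21000), (True, 3, 19500),
    (False, 3, 22000), (False, 3, 24000), (False, 3, 20000),
    (True, 10, 27000), (True, 10, 30000), (True, 10, 25000),
    (False, 10, 34000), (False, 10, 31000), (False, 10, 38000),
]


def median_gap(rows, years):
    """Median earnings of non-FSM graduates minus FSM-eligible graduates."""
    fsm = [e for is_fsm, y, e in rows if is_fsm and y == years]
    non_fsm = [e for is_fsm, y, e in rows if not is_fsm and y == years]
    return median(non_fsm) - median(fsm)


# Gap in median earnings at each follow-up point.
gaps = {years: median_gap(records, years) for years in (3, 10)}
print(gaps)
```

In this toy example the gap widens between the two follow-up points, which is the kind of pattern such a comparison is meant to surface. Any real analysis would run inside the Secure Research Service and be subject to its disclosure rules.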

Alternatively, the LEO dataset can be used to explore the role of early attainment and post-16 pathways in shaping labour market outcomes, which can inform when careers advice and guidance may be most impactful for disadvantaged students. For example, the Education Policy Institute (EPI) used regression modelling with LEO data to examine how prior attainment, post-16 qualification routes, and disadvantage status are associated with earnings at age 25. Their findings showed that GCSE attainment explains a substantial share of the earnings gap between disadvantaged and non-disadvantaged young people, while post-16 qualification choices make an additional but smaller contribution. These patterns were more pronounced for women, for whom differences in attainment and qualification level explained a larger proportion of the earnings gap than for men (Cruikshanks & Robinson, 2025).
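
The idea of a factor "explaining a share" of an earnings gap can be illustrated with a simple stratified comparison. This is not the regression approach used by EPI; it swaps in a plainer technique (comparing like with like within attainment bands) purely to show the logic. The data are made-up toy values, and the band labels and function names are assumptions for the example.

```python
from collections import defaultdict
from statistics import mean

# Toy, made-up records (NOT LEO data):
# (disadvantaged, gcse_attainment_band, earnings_at_25_gbp)
people = [
    (True, "low", 16000), (True, "low", 17000), (True, "low", 18000),
    (True, "high", 24000),
    (False, "low", 18000),
    (False, "high", 25000), (False, "high", 26000), (False, "high", 27000),
]


def raw_gap(rows):
    """Mean earnings of non-disadvantaged minus disadvantaged people."""
    dis = [e for d, _, e in rows if d]
    non = [e for d, _, e in rows if not d]
    return mean(non) - mean(dis)


def adjusted_gap(rows):
    """Average within-band gap, weighted by disadvantaged count per band."""
    bands = defaultdict(lambda: {True: [], False: []})
    for d, band, e in rows:
        bands[band][d].append(e)
    weighted = 0.0
    total = 0
    for groups in bands.values():
        # Only bands containing both groups can contribute a within-band gap.
        if groups[True] and groups[False]:
            n = len(groups[True])
            weighted += n * (mean(groups[False]) - mean(groups[True]))
            total += n
    return weighted / total


# Share of the raw gap that disappears once attainment is held constant.
share_explained = 1 - adjusted_gap(people) / raw_gap(people)
print(raw_gap(people), adjusted_gap(people), round(share_explained, 2))
```

Here most of the raw gap disappears once people are compared within the same attainment band, which is the intuition behind saying that attainment explains a substantial share of the gap.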

Unlike the NPD and HESA data, the LEO data cannot currently be matched with bespoke datasets collected by evaluators outside of the LEO system for the purposes of evaluation.[11]

Background

In 2021, TASO commissioned State of Life and Mime to research whether HE addresses existing equality gaps between advantaged and disadvantaged students. Using the Longitudinal Education Outcomes (LEO) dataset, the study examined earnings from employment and employment status among young people who pursued different educational pathways, defined as the highest level and type of qualification obtained nine years post-Key Stage 4 (KS4) (TASO, 2023).

Process for accessing LEO

The application process for accessing the LEO dataset is managed by the ONS. At the time this application was made, researchers were required to apply through the Research Accreditation Service (RAS), which preceded the current PASS/PPS systems. The Iteration 1 Standard Extract (LEO I1SE) was the strand of LEO available at that time.

To access the LEO standard extract, State of Life and Mime completed a full project application, which involved:

  • Describing the project purpose, research questions, proposed methods and the expected public benefit

  • Identifying the funders and commissioners of the project and the research team, including confirmation that all members requiring access were Accredited Researchers, and specifying the project timeline

  • Providing a description of the data sources to be linked (Table 1), and submitting the LEO I1SE variable request form specifying the required datasets and variables

  • Outlining the planned publication and dissemination strategy

  • Submitting the UK Statistics Authority data ethics self-assessment form as part of the application

Table 1. Overview of the sources of the variables used in the analysis undertaken by State of Life and Mime for research on whether HE addresses existing equality gaps between advantaged and disadvantaged students.

| Data source | Data to be collected |
| --- | --- |
| National Pupil Database (NPD) | Pupil characteristics (gender, eligibility for FSM, ethnic group, region of England, Special Educational Needs and Disabilities (SEND)) and KS4 attainment |
| Individualised Learner Record (ILR) | Qualifications entered and achieved in Further Education (FE) and adult skills providers in England, including apprenticeships |
| Higher Education Statistics Agency (HESA) | Information on course studied, HE provider and achievement |
| Longitudinal Education Outcomes (LEO) | Earnings and employment status (in years 9 and 16 after KS4) |

The LEO linkage process

State of Life and Mime requested access to the four individual-level datasets that made up the LEO I1SE:

  • NPD for school related data

  • HESA for higher education data

  • ILR for further education and apprenticeship data

  • HMRC and DWP for employment and earnings data (Table 2)

This information was sent in the variable request form. As LEO data is highly sensitive, the de-identified, linked data is not released directly to the researchers at State of Life and Mime. Instead, all linkage is carried out centrally by the data owners (DfE, HMRC and DWP) before the LEO standard extract is made available in the ONS Secure Research Service. Identifiers within each administrative dataset are used internally to create a pseudonymised linkage key, and all direct identifiers are removed before the data is made available to State of Life and Mime researchers.

State of Life and Mime requested linkage for two cohorts of KS4 finishers in the NPD data (2002 and 2003), totalling 1,125,035 learners. These learners were then linked with their corresponding records across all four datasets. Linkage success was high, with over 95% of individuals successfully linked using characteristics such as gender and attainment as variables in the matching process.
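
A linkage rate of this kind can be checked with a simple calculation once records share a pseudonymised key. The sketch below is illustrative only: the IDs, the earnings values and the function name linkage_rate are hypothetical, and in practice the key is created centrally by the data owners rather than by researchers.

```python
def linkage_rate(cohort_ids, linked_source):
    """Share of cohort members with a record in linked_source."""
    if not cohort_ids:
        raise ValueError("cohort is empty")
    matched = sum(1 for pid in cohort_ids if pid in linked_source)
    return matched / len(cohort_ids)


# Hypothetical pseudonymised IDs for a tiny toy cohort of 20 learners.
cohort = [f"pid{n:02d}" for n in range(1, 21)]

# Hypothetical earnings records keyed by the same pseudonymised ID;
# the last cohort member has no matching record.
earnings = {pid: 20000 + 500 * i for i, pid in enumerate(cohort[:19])}

rate = linkage_rate(cohort, earnings)
unlinked = [pid for pid in cohort if pid not in earnings]
print(f"{rate:.0%} linked; unlinked IDs: {unlinked}")
```

Listing the unlinked IDs alongside the rate makes it easier to check whether unmatched learners differ systematically from matched ones.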

Table 2. Outcome measures for the research undertaken by State of Life and Mime on whether HE addresses existing equality gaps between advantaged and disadvantaged students. Note: PAYE earnings do not include any earnings from self-employment or from outside the UK.

| Outcome measure | Data to be collected | Point of collection | Sample |
| --- | --- | --- | --- |
| Primary: Earnings (9 years after KS4) | Reported total PAYE UK earnings in the relevant tax year, from the LEO dataset. This includes everyone with any PAYE earnings reported, and therefore includes both part-time and full-time employees. | Nine years after KS4: tax years 2010/11 and 2011/12 for the two cohorts, respectively. | 2002 and 2003 cohorts, at the time of collection nine years after KS4 (2010/11 and 2011/12). |
| Primary: Earnings (16 years after KS4) | Reported total PAYE UK earnings in the relevant tax year, from the LEO dataset. This includes everyone with any PAYE earnings reported, and therefore includes both part-time and full-time employees. | 16 years after KS4: tax years 2017/18 and 2018/19 for the two cohorts, respectively. | 2002 and 2003 cohorts, at the time of collection 16 years after KS4 (2017/18 and 2018/19). |
| Primary: Employment status (9 years after KS4) | Person recorded as being employed in the UK at any point in the relevant tax year, from the LEO dataset. This includes any record of employment, regardless of its length or nature. | Nine years after KS4: tax years 2010/11 and 2011/12 for the two cohorts, respectively. | 2002 and 2003 cohorts, at the time of collection nine years after KS4 (2010/11 and 2011/12). |
| Primary: Employment status (16 years after KS4) | Person recorded as being employed in the UK at any point in the relevant tax year, from the LEO dataset. This includes any record of employment, regardless of its length or nature. | 16 years after KS4: tax years 2017/18 and 2018/19 for the two cohorts, respectively. | 2002 and 2003 cohorts, at the time of collection 16 years after KS4 (2017/18 and 2018/19). |
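
The mapping from cohort to tax year in Table 2 follows a simple pattern: the tax year used is the one ending in the cohort year plus the number of years after KS4. The helper below is a small sketch of that arithmetic; the function name is an assumption for illustration, and the rule itself is inferred from the table rather than taken from the study's documentation.

```python
def tax_year_after_ks4(cohort_year: int, years_after: int) -> str:
    """Return the tax year label 'YYYY/YY' ending in cohort_year + years_after."""
    end = cohort_year + years_after
    return f"{end - 1}/{end % 100:02d}"


# Reproduce the collection points for both cohorts and both follow-ups.
for cohort in (2002, 2003):
    for years in (9, 16):
        print(cohort, years, tax_year_after_ks4(cohort, years))
```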

Ethical considerations

  • Anonymity – Any demographic groups with fewer than 20 people were suppressed, and all group sizes were rounded to the nearest five people.
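
The suppression and rounding rule above can be sketched as follows. This is a minimal illustration, assuming that suppression is decided on the unrounded count and that a suppressed group is represented as None; the function name and the group labels are hypothetical.

```python
def apply_disclosure_control(counts, threshold=20, base=5):
    """Suppress groups below threshold; round the rest to the nearest base."""
    controlled = {}
    for group, n in counts.items():
        if n < threshold:
            # Assumption: suppression is applied to the raw, unrounded count.
            controlled[group] = None
        else:
            # Integer rounding to the nearest multiple of base.
            controlled[group] = base * ((n + base // 2) // base)
    return controlled


# Hypothetical group sizes for illustration only.
raw_counts = {"group_a": 19, "group_b": 20, "group_c": 23, "group_d": 1237}
safe_counts = apply_disclosure_control(raw_counts)
print(safe_counts)
```

Applying the threshold before rounding avoids a group of 18 or 19 being rounded up to 20 and then published.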

References

ADR UK (2023) New longitudinal education outcomes data made available for public good research. https://www.adruk.org/news-publications/news-blogs/new-longitudinal-education-outcomes-data-made-available-for-public-good-research.
Cruikshanks, R. & Robinson, D. (2025) What you learn and what you earn: Educational choices and labour market outcomes. https://epi.org.uk/publications-and-research/what-you-learn-and-what-you-earn/.
DfE (2025) Apply to access the Longitudinal Education Outcomes (LEO) dataset Guidance. GOV.UK. https://www.gov.uk/guidance/apply-to-access-the-longitudinal-education-outcomes-leo-dataset.
DfE (n.d.) Find statistics and data. https://explore-education-statistics.service.gov.uk/find-statistics.
DfE (2024a) LEO Graduate and Postgraduate Outcomes, Tax year 2021-22. https://explore-education-statistics.service.gov.uk/find-statistics/leo-graduate-and-postgraduate-outcomes/2021-22.
DfE (2024b) Transparency data: Longitudinal Education Outcomes (LEO) data. https://www.gov.uk/government/publications/longitudinal-education-outcomes-leo-dataset/longitudinal-education-outcomes-leo-data.
ONS (n.d.) Access the data securely. Office for National Statistics. https://www.ons.gov.uk/aboutus/whatwedo/statistics/requestingstatistics/secureresearchservice/accessthedatasecurely.
ONS (2022) Why free school meal recipients earn less than their peers. Office for National Statistics. https://www.ons.gov.uk/peoplepopulationandcommunity/educationandchildcare/articles/whyfreeschoolmealrecipientsearnlessthantheirpeers/2022-08-04.
TASO (2023) The value of higher education. https://taso.org.uk/libraryitem/report-the-value-of-higher-education/.

Footnotes

  1. The full list of variables available in LEO is provided in the LEO I2SE variable request form, under Section 4: Using Longitudinal Education Outcomes (LEO) (ONS, 2024).

  2. There are two iterations of the LEO standard extract. The initial version, LEO I1 (Iteration 1 Standard Extract), contained NPD, ILR, HESA, HMRC and DWP data. The most recent version is an extension of LEO I1, with more years of data and additional linked datasets (Figure 1).

  3. When researchers request access to LEO, they receive only the specific linked datasets relevant to their approved project, not the full LEO database. All data sources within LEO can be linked to each other at the individual level through shared pseudonymised identifiers, but no external individual-level datasets may be linked to LEO. Data can be linked at the school, college or university level, but approval is granted only where the data owner determines that including these variables does not create an unacceptable disclosure risk. Decisions are made on a case-by-case basis.

  4. LEO linked data sources span different time periods: NPD from 2001/02 (with the exceptions of KS1 from 1997/98 and KS2 from 1995/96), ILR from 2002/03, HESA from 2004/05, DWP from 1999/00, HMRC from 1997/98, UCAS from 2007/08, IDBR from 2004/05 and COVID-19 furlough data for 2020/21 only.

  5. These variables come from different parts of the LEO database, with some common across datasets (e.g. age, gender, ethnicity) and others unique to a single dataset.

  6. LEO does not provide postcodes; substitute geographic identifiers, such as LSOA or POLAR quintiles, are available instead.

  7. The Research Accreditation Service (RAS) has now been replaced with two new services. The Project Accreditation Service for the Secure Research Service (PASS) is now used to apply for new SRS projects or request changes to existing ones. The People and Projects Service (PPS) is used to manage researcher accreditation applications for new researchers or those who are renewing, and to submit an application for a project in the Integrated Data Service (IDS).

  8. LEO does not store provider names, only anonymised identifiers and location (pseudonymised HESA provider ID), available from 2004/05 to 2019/20.

  9. In LEO, KS1 attainment data is available from 1997/98 to 2011/12.
    KS2 attainment data is available from 1995/96 to 2015/16.
    KS3 attainment data is available from 1997/98 to 2012/13.
    KS4 attainment data is available from 2001/02 to 2020/21.
    KS5 attainment data is available from 2001/02 to 2020/21.

  10. The Research Accreditation Service (RAS) has now been replaced with two new services. The Project Accreditation Service for the Secure Research Service (PASS) is now used to apply for new SRS projects or request changes to existing ones. The People and Projects Service (PPS) is used to manage researcher accreditation applications for new researchers or those who are renewing, and to submit an application for a project in the Integrated Data Service (IDS).

  11. Correct at the time of publishing in 2025.