Enhancing our disease prediction capabilities with MILTON

New cutting-edge machine learning research tool enables early disease detection for drug discovery and beyond


Machine learning enabling patient-centric discovery

In recent years, drug discovery has been transformed by integrating machine learning and artificial intelligence (AI) with vast datasets from human biobanks.

What is machine learning?

Machine learning is a branch of AI in which models learn from data to detect patterns and make predictions, rather than following explicitly programmed instructions. The algorithms also adapt in response to new data and experience, improving their performance over time.

Traditional drug development relies on hypothesis-driven approaches – where specific targets and mechanisms are identified and validated through preclinical studies – which often carry uncertainty about whether the discoveries will translate to the patient setting. Through access to datasets from large human population samples that contain comprehensive biomarker and health information from real people, contemporary R&D benefits from a human-first, hypothesis-agnostic approach to discovery. The result is more accurate identification of human disease drivers and the ability to develop more targeted, effective medicines supported by causal human evidence.

Introducing MILTON: Enhancing drug discovery and paving the way for preventative health care

Our newest machine learning-based research tool, MILTON (MachIne Learning with phenoType associatiONs), was recently described in Nature Genetics.¹ Researchers from our Centre for Genomics Research created this AI tool to strengthen case-control association statistics from large population-based cohort studies by identifying individuals who might be incorrectly classified in these studies. MILTON's ability to reclassify individuals among controls as putative cases has demonstrably enhanced the statistical power of the genetic association analyses.
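To make the idea of reclassification concrete, the sketch below moves controls with a high predicted risk into the case group before an association analysis is run. It is an illustration only, not MILTON's published procedure: the function name `augment_cases`, the `risk_score` mapping and the 0.9 threshold are all hypothetical choices made for this example.

```python
def augment_cases(cases, controls, risk_score, threshold=0.9):
    """Return (augmented_cases, remaining_controls).

    cases, controls: iterables of participant IDs.
    risk_score: dict mapping participant ID to a predicted probability
                of disease (hypothetical model output).
    threshold: hypothetical cut-off above which a control is treated
               as a putative case.
    """
    controls = set(controls)
    # Controls whose predicted risk meets the cut-off become putative cases.
    putative = {pid for pid in controls if risk_score[pid] >= threshold}
    return set(cases) | putative, controls - putative


# Toy cohort: one diagnosed case and three controls.
scores = {"p2": 0.95, "p3": 0.20, "p4": 0.90}
new_cases, new_controls = augment_cases(["p1"], ["p2", "p3", "p4"], scores)
```

In this toy cohort, two of the three controls cross the cut-off and join the case group, which is the sense in which an existing sample can yield more cases without recruiting new participants.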

While larger sample sizes remain essential for statistical confidence in genetic discoveries, MILTON offers a novel approach to maximise the value of existing samples, helping to derive meaningful signals from random noise. This innovation has the potential to expand the scope and accuracy of new gene discovery for hundreds of diseases.

Slavé Petrovski Head of Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca

Predicting disease onset before diagnosis with MILTON

During its development, researchers recognised that, by design, MILTON was also highly effective at identifying individuals at risk of a future disease diagnosis, extending its utility to early disease detection. This dual capability not only augments case-control genomics studies but also shows potential to shape early diagnostic and intervention strategies and redefine the landscape of preventative healthcare.*

With many diseases, the ability to intervene and treat early is critical to enhancing patient outcomes. However, many complex diseases are diagnosed only after clinical symptoms appear. Using machine learning to integrate information about an individual’s molecular profile could enable detection of disease at an earlier and sometimes more treatable stage to tailor preventative or early intervention therapies to those most likely to benefit.

By embracing the data, following the science and pushing the boundaries, we've turned a genomic study research tool into a potential game-changer for disease prevention and early intervention that will continue to improve as training datasets increase in genomic diversity alongside biomarker breadth and depth.

Dimitrios Vitsios Director Data Science & Genomics, Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca

MILTON shows enhanced predictive power compared to currently available disease prediction methods

MILTON was trained on a comprehensive set of 67 routine clinical biomarkers, including blood biochemistry, blood count, urine assays, spirometry, body size measures, and blood pressure from nearly 500,000 participants in the UK Biobank. A subset of models also included measurements from 3,000 plasma proteins – information from proteins circulating in the blood – which notably enhanced prediction accuracy for numerous diseases, demonstrating the potential power of integrating multi-omic data to inform disease prediction. The tool's performance was benchmarked against polygenic risk scores, with MILTON showing superior predictive capabilities for the majority of the studied diseases.

Putting the power of MILTON on a global scale

As a leader in open science with a dedication to genomic equity, we are empowering researchers around the world with accessible genetic insights to develop more inclusive and impactful medical solutions.

We are committed to open science

We have an unwavering commitment to data democratisation, and are proud to make all gene-disease associations and predictive biomarker collections available to the public for research use through our interactive portal, ensuring that researchers and healthcare professionals worldwide can access and utilise the insights derived from MILTON.


We are committed to genomic equity

Insights from genetic studies that include genomic diversity will lead to more inclusive and impactful medical solutions that benefit global health care. Though trained on the UK Biobank, MILTON-augmented genetic discoveries were validated using the independent FinnGen biobank, showcasing its value across biobanks with potential to be applied to any genomic ancestry.


*MILTON technology does not at present have the capability to be used in clinic for early diagnosis of disease in patients.

**MILTON analysed 3,200 diseases and achieved a high level of predictability for 1,091 diseases (AUC above 0.7) and excellent predictive performance for 121 diseases (AUC above 0.9). AUC (area under the curve) is a standard measure for assessing machine learning performance and ranges in value from 0 to 1. A model whose predictions are 100% wrong has an AUC of 0.0; one that does no better than chance has an AUC of 0.5; and one whose predictions are 100% correct has an AUC of 1.0.
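For readers who want to see how an AUC value arises, here is a minimal sketch that computes it as the fraction of case-control pairs in which the case receives the higher score, with ties counted as half. The function name `auc` and the toy labels and scores are assumptions made for illustration; they are not MILTON data or code.

```python
def auc(labels, scores):
    """Area under the ROC curve, computed by pairwise comparison.

    labels: 1 for a case, 0 for a control.
    scores: model-predicted risk, higher meaning more likely a case.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # case correctly ranked above control
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


labels = [1, 1, 1, 0, 0, 0]
perfect = auc(labels, [0.9, 0.8, 0.7, 0.3, 0.2, 0.1])   # every case outranks every control
inverted = auc(labels, [0.1, 0.2, 0.3, 0.7, 0.8, 0.9])  # every control outranks every case
constant = auc(labels, [0.5] * 6)                       # model cannot tell them apart
```

The three toy score lists correspond to the extremes described in the footnote: a perfect ranking, a fully inverted ranking, and a model that gives every participant the same score.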


Reference:

1. Garg, M., Karpinski, M., Matelska, D., et al. Disease prediction with multi-omics and biomarkers empowers case–control genetic discoveries in the UK Biobank. Nat Genet (2024). https://doi.org/10.1038/s41588-024-01898-1