Machine Learning Approaches in Microbial Genomics for Pathogen Identification and Antimicrobial Resistance Prediction
Emmanuel Nkansah1, Micheal Abimbola Oladosu2*, Moses Adondua Abah3, Abimbola Mary Oluwajembola2, Fwangmun Ezekiel Gushit4, Olaide Ayokunmi Oladosu5, Adesola Esther Adeneye6 and Bukola Oluwaseyi Olufosoye7
1Department of Accounting, Economics and Finance, School of Business, La Sierra University, Riverside, CA, USA
2Department of Chemical Sciences, Faculty of Science, Anchor University, Ayobo, Ipaja, Lagos, Nigeria
3Department of Biochemistry, Faculty of Pure and Applied Sciences, Federal University of Wukari, Wukari, Taraba State, Nigeria
4Department of Public Health, Faculty of Health Science, Ahmadu Bello University, Zaria, Kaduna State, Nigeria
5Department of Computer Science, Faculty of Science and Technology, Babcock University, Ilishan, Nigeria
6Department of Biological Sciences, Faculty of Science, Anchor University, Ayobo, Ipaja, Lagos, Nigeria
7Department of Medical Microbiology, Faculty of Medical Laboratory Sciences, Ambrose Alli University, Ekpoma, Edo State, Nigeria
*Corresponding Author: Micheal Abimbola Oladosu, Department of Chemical Sciences, Faculty of Science, Anchor University, Ayobo, Ipaja, Lagos, Nigeria.
Received: November 30, 2025; Published: March 31, 2026
Abstract
The emergence and rapid spread of antimicrobial resistance (AMR) pose a critical threat to global public health, necessitating innovative approaches for pathogen identification and resistance prediction. Machine learning (ML) has revolutionised microbial genomics by enabling rapid, accurate analysis of large genomic datasets to predict AMR phenotypes and identify pathogens. This review examines recent advances in ML applications for microbial genomics, focusing on supervised and unsupervised learning algorithms, deep learning architectures, and their integration with whole-genome sequencing (WGS) data. We discuss the performance of various ML models, including random forests, support vector machines, convolutional neural networks, and ensemble methods, in predicting antimicrobial resistance across different bacterial species. The review highlights challenges in model interpretability, data quality, and clinical implementation, and explores emerging trends in federated learning and transfer learning. Understanding these computational methodologies is essential for developing rapid diagnostic tools and for informing antimicrobial stewardship programs in clinical and pharmaceutical settings.
Keywords: Machine Learning; Antimicrobial Resistance; Pathogen Identification; Genomics; Whole-Genome Sequencing; Deep Learning
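The supervised workflow summarised in the abstract (genome sequence in, resistance label out) can be illustrated with a minimal sketch. It assumes one common reference-free encoding, in which each genome becomes a binary k-mer presence/absence vector, and fits a classifier to resistant/susceptible labels. Logistic regression trained by gradient descent is swapped in for the random forests, support vector machines and neural networks discussed in the review only to keep the example dependency-free. The sequences, labels and the GGCC-containing "marker" segment are synthetic and hypothetical, as are the helper names (`kmer_presence`, `train_logistic`, `predict_proba`); this is not the authors' method or any published model.

```python
import math
from itertools import product

ALPHABET = "ACGT"


def kmer_vocabulary(k: int) -> list[str]:
    """All 4**k DNA k-mers in lexicographic order, giving a fixed feature order."""
    return ["".join(p) for p in product(ALPHABET, repeat=k)]


def kmer_presence(seq: str, k: int = 3) -> list[float]:
    """Binary presence/absence vector over all k-mers.

    Windows containing non-ACGT symbols (e.g. N) are skipped.
    """
    index = {kmer: i for i, kmer in enumerate(kmer_vocabulary(k))}
    features = [0.0] * len(index)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            features[j] = 1.0
    return features


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w: list[float], b: float, x: list[float]) -> float:
    """Predicted probability that the genome is resistant (label 1)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.5, epochs=1000):
    """Fit weights and bias by full-batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j, xj in enumerate(xi):
                if xj:  # only k-mers present in this genome contribute
                    grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Synthetic toy genomes: 0 = susceptible, 1 = resistant (hypothetical labels).
train_seqs = ["ATATATATAT", "ACACACACAC", "ATATAGGCCT", "ACACAGGCCT"]
train_labels = [0, 0, 1, 1]

X = [kmer_presence(s) for s in train_seqs]
w, b = train_logistic(X, train_labels)
train_pred = [int(predict_proba(w, b, x) >= 0.5) for x in X]

# A new toy sequence carrying the same hypothetical marker k-mers.
query = "TATAGGCCTT"
p_query = predict_proba(w, b, kmer_presence(query))
print(train_pred, round(p_query, 3))
```

In real WGS studies the k-mer space is far larger (k is typically much greater than 3 and most k-mers are absent from any one genome), so sparse feature storage and a library implementation of the chosen classifier would replace the dense lists and hand-written training loop used here.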