Toward Interpretable Metagenomic Analysis: A Compositionally-Aware Explainable AI Pipeline for Taxonomic Classification and Functional Prediction

A, Rakshitha; S, Priyanka; M.U, Navya Shree

doi:https://doi.org/10.55041/ijcope.v2i4.739

Volume 02, Issue 05

Published on: May 2026

Rakshitha A, Priyanka S, Navya Shree M.U

Computer Science and Engineering, Dayananda Sagar University

Abstract

Metagenomics has transformed our understanding of microbial communities. However, combining machine learning with metagenomic analysis faces two closely related challenges: the compositional nature of sequencing data and the difficulty of interpreting deep learning networks. Current XAI methods, typically applied to shallow classifiers, rely on count-based features. Because such features are compositional, they violate the assumptions of standard machine learning techniques, which makes the resulting feature attributions unreliable. Deep learning models, meanwhile, have shown superior performance in taxonomic classification and functional pathway prediction, yet they are still regarded as black boxes. We propose a bioinformatics pipeline that addresses both issues. Its key contributions are: (i) a centered log-ratio (CLR) transformation, grounded in Aitchison geometry, applied before training to mitigate the simplex constraint; (ii) a transformer-based sequence classification model with multiple heads; and (iii) an attribution method based on a modified SHAP algorithm. In experiments on the HMP2 IBD dataset, SHAP attributions computed on raw count data yield unreliable attribution maps (faithfulness AUROC = 0.63), whereas CLR SHAP improves substantially on this baseline (AUROC = 0.81).
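
The CLR step named in contribution (i) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `clr`, the `pseudocount` parameter, and the toy count matrix are assumptions made for illustration, and the abstract does not say how zero counts are handled. The sketch shows the standard definition, in which each sample's log-abundances are centered on their mean (the log of the geometric mean).

```python
import numpy as np

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of a (samples x taxa) count matrix.

    `pseudocount` is an illustrative assumption for avoiding log(0);
    the abstract does not specify a zero-handling strategy.
    """
    x = np.asarray(counts, dtype=float) + pseudocount
    log_x = np.log(x)
    # Subtract each sample's mean log-abundance, i.e. the log of its
    # geometric mean, so every transformed row sums to zero.
    return log_x - log_x.mean(axis=1, keepdims=True)

# Toy data (hypothetical): the same community sequenced at 1x and 10x depth.
counts = np.array([[10, 20, 70],
                   [100, 200, 700]])

# No zeros are present here, so the pseudocount can be switched off.
z = clr(counts, pseudocount=0.0)
```

Without a pseudocount, both rows of `z` come out identical, because CLR depends only on the ratios between taxa and not on sequencing depth. Raw counts lack this property, which is one reason attributions computed on them can be misleading.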

Index Terms—metagenomics, compositional data analysis, explainable AI, SHAP, transformer, clinical bioinformatics, XAI.

How to Cite this Paper

A, R., S, P. & M.U, N. S. (2026). Toward Interpretable Metagenomic Analysis: A Compositionally-Aware Explainable AI Pipeline for Taxonomic Classification and Functional Prediction. International Journal of Creative and Open Research in Engineering and Management, 02(05). https://doi.org/10.55041/ijcope.v2i4.739

A, Rakshitha, et al. "Toward Interpretable Metagenomic Analysis: A Compositionally-Aware Explainable AI Pipeline for Taxonomic Classification and Functional Prediction." International Journal of Creative and Open Research in Engineering and Management, vol. 02, no. 05, 2026, pp. . doi:https://doi.org/10.55041/ijcope.v2i4.739.

A, Rakshitha, Priyanka S, and Navya M.U. "Toward Interpretable Metagenomic Analysis: A Compositionally-Aware Explainable AI Pipeline for Taxonomic Classification and Functional Prediction." International Journal of Creative and Open Research in Engineering and Management 02, no. 05 (2026). https://doi.org/10.55041/ijcope.v2i4.739.

Ethical Compliance & Review Process

• All submissions are screened under plagiarism detection.
• Review follows editorial policy.
• Authors retain copyright.
• Peer Review Type: Double-Blind Peer Review
• Published on: May 04 2026

This article is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. You are free to share and adapt this work for non-commercial purposes with proper attribution.
