IJCOPE Journal

International Journal of Creative and Open Research in Engineering and Management

A Peer-Reviewed, Open-Access International Journal Supporting Multidisciplinary Research, Digital Publishing Standards, DOI Registration, and Academic Indexing.
Journal Information
ISSN: 3108-1754 (Online)
Crossref DOI: Available
ISO Certification: 9001:2015
Publication Fee: 599/- INR
Compliance: UGC Journal Norms
License: CC BY 4.0
Peer Review: Double Blind
Volume 02, Issue 05

Published on: May 2026

ANOMALY DETECTION IN DISTRIBUTED FILE SYSTEM LOGS USING HYBRID MACHINE LEARNING AND DEEP LEARNING MODELS

Hema M S, Aliasgar Abbas Ringnodwala, Chatura J S, Chirag Ananda Kumar, Dhruv Mishra

Department of Computer Science and Engineering, RV Institute of Technology and Management, Bengaluru – 560076, Karnataka, India

Article Status

Plagiarism Passed; Peer Reviewed; Open Access

Abstract

This paper proposes a hybrid anomaly detection framework for Distributed File System (DFS) log analysis, combining statistical machine learning with deep learning sequence models. Rather than relying on a single classifier, the system integrates Random Forest, Linear Support Vector Machine (SVM), Bidirectional Long Short-Term Memory (BiLSTM), and DeepLog within a weighted soft-voting ensemble. Class imbalance is addressed through the Synthetic Minority Over-sampling Technique (SMOTE), applied to the training data of the classical learners. Precision-Recall Area Under the Curve (PR-AUC) is adopted as the primary evaluation metric, given its suitability for imbalanced classification tasks. The dual-modality design captures both event-frequency patterns via Bag-of-Words representations and temporal ordering dependencies via sequential models. Experiments on the public HDFS LogHub dataset show that the ensemble achieves a PR-AUC of 0.767 and a precision of 1.0, while Random Forest attains the highest F1-score and a ROC-AUC of 0.953. The framework provides a reliable and interpretable approach to log-based anomaly detection in large-scale distributed systems.
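The weighted soft-voting step and the PR-AUC metric named in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the function names `weighted_soft_vote` and `average_precision`, the per-model probabilities, the weights, and the 0.5 decision threshold are hypothetical and are not taken from the paper. PR-AUC is approximated by average precision, one common step-wise estimate of the area under the precision-recall curve.

```python
from typing import Dict, List, Sequence


def weighted_soft_vote(probs: Dict[str, Sequence[float]],
                       weights: Dict[str, float]) -> List[float]:
    """Combine per-model anomaly probabilities into one weighted average score."""
    total = sum(weights[name] for name in probs)
    n = len(next(iter(probs.values())))
    return [
        sum(weights[name] * probs[name][i] for name in probs) / total
        for i in range(n)
    ]


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise PR-AUC estimate: mean of the precision at each true-positive rank."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else 0.0


# Hypothetical scores for five log sessions (1 = anomalous); not from the paper.
labels = [0, 0, 1, 0, 1]
model_probs = {
    "random_forest": [0.10, 0.20, 0.90, 0.30, 0.60],
    "linear_svm":    [0.20, 0.10, 0.80, 0.40, 0.40],
    "bilstm":        [0.05, 0.30, 0.70, 0.20, 0.80],
    "deeplog":       [0.10, 0.50, 0.60, 0.10, 0.70],
}
# Illustrative weights only; the weights used in the paper are not stated here.
model_weights = {"random_forest": 2.0, "linear_svm": 1.0,
                 "bilstm": 1.5, "deeplog": 1.0}

ensemble_scores = weighted_soft_vote(model_probs, model_weights)
predictions = [int(score >= 0.5) for score in ensemble_scores]  # assumed threshold
pr_auc = average_precision(labels, ensemble_scores)
```

Because soft voting averages probabilities rather than hard labels, a confident model can outweigh several uncertain ones, and the weights let stronger learners contribute more to the final score.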

Keywords: System Log File Analysis; Anomaly Detection; Hybrid Machine Learning; Deep Learning; Log-based Monitoring; Random Forest; BiLSTM (Ensemble); DeepLog (Ensemble); SMOTE (Class Imbalance); Precision-Recall AUC (PR-AUC).

How to Cite this Paper

S, H. M., Ringnodwala, A. A., S, C. J., Kumar, C. A., & Mishra, D. (2026). Anomaly Detection in Distributed File System Logs Using Hybrid Machine Learning and Deep Learning Models. International Journal of Creative and Open Research in Engineering and Management, 02(05). https://doi.org/10.55041/ijcope.v2i4.995

S, Hema, et al. "Anomaly Detection in Distributed File System Logs Using Hybrid Machine Learning and Deep Learning Models." International Journal of Creative and Open Research in Engineering and Management, vol. 02, no. 05, 2026. https://doi.org/10.55041/ijcope.v2i4.995.

S, Hema, Aliasgar Ringnodwala, Chatura S, Chirag Kumar, and Dhruv Mishra. "Anomaly Detection in Distributed File System Logs Using Hybrid Machine Learning and Deep Learning Models." International Journal of Creative and Open Research in Engineering and Management 02, no. 05 (2026). https://doi.org/10.55041/ijcope.v2i4.995.

Ethical Compliance & Review Process

  • All submissions are screened with plagiarism detection software.
  • Review follows editorial policy.
  • Authors retain copyright.
  • Peer Review Type: Double-Blind Peer Review
  • Published on: May 02, 2026

This article is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. You are free to share and adapt this work for non-commercial purposes with proper attribution.
