IJCOPE Journal


International Journal of Creative and Open Research in Engineering and Management

A Peer-Reviewed, Open-Access International Journal Supporting Multidisciplinary Research, Digital Publishing Standards, DOI Registration, and Academic Indexing.
Journal Information
ISSN: 3108-1754 (Online)
Crossref DOI: Available
ISO Certification: 9001:2015
Publication Fee: 599/- INR
Compliance: UGC Journal Norms
License: CC BY 4.0
Peer Review: Double Blind
Volume 02, Issue 05

Published on: May 2026

AN INTELLIGENT SELF-HEALING AI FRAMEWORK FOR AUTONOMOUS ERROR DETECTION, ROOT-CAUSE DIAGNOSIS, AND ADAPTIVE RECOVERY IN DISTRIBUTED SOFTWARE SYSTEMS

Swatantra Shukla, Rakesh Kumar

Sagar Choudhary

Department of CSE, Quantum University, Roorkee, India.

Article Status: Plagiarism Passed; Peer Reviewed; Open Access


Abstract

Modern software ecosystems are increasingly dependent on distributed services, cloud-native deployment, microservices, real-time data streams, and intelligent user-facing applications. While these architectures improve scalability and flexibility, they also introduce complex runtime failures such as service unavailability, API degradation, memory exhaustion, state inconsistency, configuration drift, and unpredictable workload spikes. Traditional error-handling mechanisms, including static exception handling, manual debugging, predefined retry policies, and rule-based monitoring, are often insufficient for highly dynamic environments where faults must be detected and resolved before they affect service continuity. This paper proposes an intelligent self-healing AI framework designed to autonomously detect software anomalies, identify probable root causes, select recovery strategies, and continuously improve recovery decisions through feedback learning. The proposed model integrates real-time telemetry collection, hybrid anomaly detection, causal diagnosis, reinforcement-learning-based recovery selection, and a knowledge-driven feedback loop inspired by autonomic computing principles. The framework is evaluated using a simulated distributed application environment containing API failures, memory leaks, latency spikes, database connection errors, and container restarts. Experimental analysis indicates that the proposed approach improves fault detection accuracy, reduces mean time to recovery, and increases service availability compared with traditional rule-based recovery and static monitoring methods. The system achieved 96.4% anomaly detection accuracy, reduced mean time to recovery from 8.7 seconds to 2.1 seconds, and improved recovery success rate to 94.8% under controlled test conditions. The study demonstrates that AI-driven self-healing systems can provide a practical pathway toward resilient, adaptive, and autonomous software operations for cloud, enterprise, and mission-critical applications.
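
The detect, select, and learn loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the detector or the learning algorithm, so the sketch substitutes a simple z-score check for the hybrid anomaly detector and an epsilon-greedy bandit for the reinforcement-learning-based recovery selector. The fault labels, action names, and success probabilities are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical recovery actions and fault labels, chosen for illustration only.
ACTIONS = ["retry_request", "restart_container", "reset_db_pool"]
FAULTS = ["memory_leak", "db_connection_error", "latency_spike"]


def is_anomalous(history, value, threshold=3.0):
    """Return True if `value` lies more than `threshold` standard deviations from the mean of `history`."""
    if len(history) < 2:
        return False  # too little telemetry to judge
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


class RecoverySelector:
    """Epsilon-greedy bandit that tracks the average reward of each (fault, action) pair."""

    def __init__(self, actions, epsilon=0.1, seed=None):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)
        self.values = defaultdict(float)

    def select(self, fault):
        # Explore with probability epsilon, otherwise pick the best-known action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.values[(fault, a)])

    def update(self, fault, action, reward):
        # Feedback step: incremental mean of the rewards observed for this pair.
        key = (fault, action)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]


def run_demo(episodes=300, seed=0):
    """Simulate faults with made-up success probabilities and return the trained selector."""
    best = {
        "memory_leak": "restart_container",
        "db_connection_error": "reset_db_pool",
        "latency_spike": "retry_request",
    }
    env = random.Random(seed)
    selector = RecoverySelector(ACTIONS, epsilon=0.2, seed=seed + 1)
    for _ in range(episodes):
        fault = env.choice(FAULTS)
        action = selector.select(fault)
        p_success = 0.9 if action == best[fault] else 0.2  # assumed, not measured
        reward = 1.0 if env.random() < p_success else 0.0
        selector.update(fault, action, reward)
    return selector


if __name__ == "__main__":
    trained = run_demo()
    for fault in FAULTS:
        print(fault, "->", max(ACTIONS, key=lambda a: trained.values[(fault, a)]))
```

In this sketch the reward is a binary recovery outcome. A fuller treatment in the spirit of the abstract might also reward shorter recovery times, since mean time to recovery is one of the reported metrics.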

Keywords


Self-Healing Systems; Artificial Intelligence; Autonomous Error Recovery; Anomaly Detection; Root-Cause Analysis; Reinforcement Learning; Cloud Resilience; Fault Tolerance; MAPE-K; Predictive Maintenance; Distributed Systems; Intelligent Automation.

How to Cite this Paper

Shukla, S. & Kumar, R. (2026). An Intelligent Self-Healing AI Framework for Autonomous Error Detection, Root-Cause Diagnosis, and Adaptive Recovery in Distributed Software Systems. International Journal of Creative and Open Research in Engineering and Management, 02(05). https://doi.org/10.55041/ijcope.v2i5.800

Shukla, Swatantra, and Rakesh Kumar. "An Intelligent Self-Healing AI Framework for Autonomous Error Detection, Root-Cause Diagnosis, and Adaptive Recovery in Distributed Software Systems." International Journal of Creative and Open Research in Engineering and Management, vol. 02, no. 05, 2026. https://doi.org/10.55041/ijcope.v2i5.800.

Shukla, Swatantra, and Rakesh Kumar. "An Intelligent Self-Healing AI Framework for Autonomous Error Detection, Root-Cause Diagnosis, and Adaptive Recovery in Distributed Software Systems." International Journal of Creative and Open Research in Engineering and Management 02, no. 05 (2026). https://doi.org/10.55041/ijcope.v2i5.800.


Ethical Compliance & Review Process

  • All submissions are screened under plagiarism detection.
  • Review follows editorial policy.
  • Authors retain copyright.
  • Peer Review Type: Double-Blind Peer Review
  • Published on: May 27, 2026

This article is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. You are free to share and adapt this work for non-commercial purposes with proper attribution.
