Published on: April 2026
SHIELDLLM: A HYBRID ADVERSARIAL PROMPT INJECTION DETECTION FRAMEWORK FOR SECURING LARGE LANGUAGE MODELS
Vijay Kumar
Article Status
Available Documents
Abstract
Large Language Models (LLMs) have rapidly permeated enterprise, consumer, and governmental applications, fundamentally transforming the human–computer interaction paradigm. However, their widespread deployment has exposed critical security vulnerabilities, most notably adversarial prompt injection attacks, in which maliciously crafted inputs are designed to override system-level instructions, exfiltrate sensitive data, or hijack model behaviour. Existing safeguards, such as coarse-grained output filters and rule-based blocklists, are demonstrably insufficient against semantically sophisticated attack vectors. This paper proposes ShieldLLM—a real-time, hybrid AI firewall that combines Bidirectional Encoder Representations from Transformers (BERT)-derived semantic embeddings with an ensemble Random Forest classifier and a complementary rule-based detection layer to classify incoming prompts as either Safe or Injection Attack with a latency budget under 45 ms. Evaluated on a corpus of 10,000 labelled prompts spanning five injection sub-categories, ShieldLLM achieves 96.3% accuracy, 95.8% precision, 95.7% recall, and an AUC-ROC of 0.982, surpassing all evaluated baselines. The framework is architecturally agnostic and can be integrated as middleware within any LLM serving stack. This work advances the nascent field of LLM-specific intrusion detection and provides a reproducible benchmark dataset for the research community
How to Cite this Paper
Kumar, V. (2026). Shieldllm: A Hybrid Adversarial Prompt Injection Detection Framework for Securing Large Language Models. International Journal of Creative and Open Research in Engineering and Management, <i>02</i>(04). https://doi.org/10.55041/ijcope.v2i4.463
Kumar, Vijay. "Shieldllm: A Hybrid Adversarial Prompt Injection Detection Framework for Securing Large Language Models." International Journal of Creative and Open Research in Engineering and Management, vol. 02, no. 04, 2026, pp. . doi:https://doi.org/10.55041/ijcope.v2i4.463.
Kumar, Vijay. "Shieldllm: A Hybrid Adversarial Prompt Injection Detection Framework for Securing Large Language Models." International Journal of Creative and Open Research in Engineering and Management 02, no. 04 (2026). https://doi.org/https://doi.org/10.55041/ijcope.v2i4.463.
References
[1] T. B. Brown et al., "Language models are few-shot learners," in Advances in Neural Information Processing Systems (NeurIPS), vol. 33, pp. 1877–1901, 2020.[2] F. Perez and I. Ribeiro, "Ignore previous prompt: Attack techniques for language models," in Proc. NeurIPS Workshop on Machine Learning Safety, New Orleans, LA, USA, Nov. 2022.
[3] K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, and M. Fritz, "Not what you've signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection," in Proc. ACM Workshop on Artificial Intelligence and Security (AISec), Copenhagen, Denmark, 2023, pp. 79–90.
[4] A. Alon and M. Kamfonas, "Detecting language model attacks with perplexity," in Proc. ICLR Workshop on Secure and Trustworthy Large Language Models, Vienna, Austria, 2024.
[5] I. Markov, A. Dey, O. Harel, and Y. Goel, "Holistic approach to undesired content detection in the real world," in Proc. AAAI Conference on Artificial Intelligence, Washington, DC, USA, 2023, vol. 37, no. 12, pp. 15009–15018.
[6] J. Ebrahimi, A. Rao, D. Lowd, and D. Dou, "HotFlip: White-box adversarial examples for text classification," in Proc. 56th Annual Meeting of the Association for Computational Linguistics (ACL), Melbourne, Australia, 2018, pp. 31–36.
[7] E. Wallace, S. Feng, N. Kandpal, M. Gardner, and S. Singh, "Universal adversarial triggers for attacking and analyzing NLP," in Proc. Conference on Empirical Methods in Natural Language Processing (EMNLP), Hong Kong, 2019, pp. 2153–2162.
[8] L. Ouyang et al., "Training language models to follow instructions with human feedback," in Advances in Neural Information Processing Systems (NeurIPS), vol. 35, pp. 27730–27744, 2022.
[9] A. Wei, N. Haghtalab, and J. Steinhardt, "Jailbroken: How does LLM safety training fail?" in Advances in Neural Information Processing Systems (NeurIPS), vol. 36, pp. 80079–80110, 2023.
[10] Y. Bai et al., "Constitutional AI: Harmlessness from AI feedback," arXiv preprint arXiv:2212.08073, Dec. 2022.
Ethical Compliance & Review Process
- •All submissions are screened under plagiarism detection.
- •Review follows editorial policy.
- •Authors retain copyright.
- •Peer Review Type: Double-Blind Peer Review
- •Published on: Apr 18 2026
This article is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. You are free to share and adapt this work for non-commercial purposes with proper attribution.
← Previous Article
She FoundNext Article →
Sign Gesture to Audio Conversion

