Published on: April 2026
MCP-POWERED RAG FOR VIDEO UNDERSTANDING
S. Prathik, K. Vinitha, SK. Sameer, C. Omkar
B. Sreelatha
Abstract
The rapid growth of video content in domains such as education, security, and media has made efficient information extraction difficult. Current approaches rely on metadata that rarely describes the actual content, which makes retrieval inefficient and time-consuming. To address this, the paper presents an MCP-based framework that uses retrieval-augmented generation (RAG) to provide intelligent video understanding and semantic retrieval. The framework is organized as a pipeline: video processing, feature extraction, embedding generation, vector indexing, and query-driven retrieval that produces context-aware responses. An interaction protocol connects the system's components, which makes the system more modular and scalable. User-facing capabilities include a timestamp-based interface, clip generation, and note-taking. Experimental evaluation shows that the proposed method significantly improves efficiency and produces better results than conventional retrieval frameworks. The proposed MCP-based video search system is therefore efficient and scalable enough for practical use in intelligent video analytics, including education and research tasks.
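The retrieval stage of the pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Segment` type, the bag-of-words `embed` function, and the in-memory index are all assumptions chosen to keep the example self-contained. A real system would presumably use a learned embedding model and a dedicated vector index.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Segment:
    """A hypothetical transcript chunk with its time span in seconds."""
    start: float
    end: float
    text: str


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: lowercase bag-of-words counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(segments):
    # "Vector indexing" here is just a list of (segment, vector) pairs.
    return [(seg, embed(seg.text)) for seg in segments]


def retrieve(index, query, k=2):
    # Rank segments by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [seg for seg, _ in ranked[:k]]


def format_timestamp(seconds: float) -> str:
    # Render seconds as MM:SS for a timestamp-based interface.
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


segments = [
    Segment(0.0, 30.0, "Introduction to the course and grading policy"),
    Segment(30.0, 95.0, "Gradient descent updates the weights using the loss gradient"),
    Segment(95.0, 160.0, "Backpropagation computes the gradient layer by layer"),
]
index = build_index(segments)
hits = retrieve(index, "how does gradient descent update weights", k=1)
for seg in hits:
    print(format_timestamp(seg.start), format_timestamp(seg.end), seg.text)
```

In a full retrieval-augmented setup, the retrieved segments and their timestamps would be passed to a language model as context for the generated answer. That generation step is omitted here.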
How to Cite this Paper
Prathik, S., Vinitha, K., Sameer, S. & Omkar, C. (2026). MCP-Powered RAG for Video Understanding. International Journal of Creative and Open Research in Engineering and Management, 02(04). https://doi.org/10.55041/ijcope.v2i4.126
Prathik, S., et al. "MCP-Powered RAG for Video Understanding." International Journal of Creative and Open Research in Engineering and Management, vol. 02, no. 04, 2026. doi:10.55041/ijcope.v2i4.126.
Prathik, S., K. Vinitha, SK. Sameer, and C. Omkar. "MCP-Powered RAG for Video Understanding." International Journal of Creative and Open Research in Engineering and Management 02, no. 04 (2026). https://doi.org/10.55041/ijcope.v2i4.126.
Ethical Compliance & Review Process
- All submissions are screened under plagiarism detection.
- Review follows editorial policy.
- Authors retain copyright.
- Peer Review Type: Double-Blind Peer Review
- Published on: Apr 07 2026
This article is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. You are free to share and adapt this work for non-commercial purposes with proper attribution.

