MCP-Powered RAG for Video Understanding

Prathik, S.; Vinitha, K.; Sameer, SK.; Omkar, C.

doi:https://doi.org/10.55041/ijcope.v2i4.126

Volume 02, Issue 04

Published on: April 2026

S. Prathik, K. Vinitha, SK. Sameer, C. Omkar

B. Sreelatha

Department of CSE (Data Science), Ace Engineering College, Hyderabad, Telangana, India

Abstract

The rapid growth of video content in domains such as education, security, and media has made efficient information extraction difficult. Current approaches rely on metadata that rarely describes the actual content, which makes search inefficient and time-consuming. To address this issue, this paper presents a framework that combines MCP with retrieval-augmented generation (RAG) to provide intelligent video understanding and semantic retrieval. The framework is organized as a pipeline: video processing, feature extraction, embedding generation, vector indexing, and query-driven retrieval that produces context-aware responses. An interaction protocol mediates communication between the system's components, which keeps the design modular and scalable. User-facing capabilities include a timestamp-based interface, clip generation, and note-taking. The experimental evaluation shows that the proposed method significantly improves retrieval efficiency and produces better results than conventional retrieval frameworks. The proposed MCP-based video search system is therefore efficient and scalable enough for practical intelligent video analytics in education and research.
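The retrieval stages named in the abstract (embedding generation, vector indexing, query-based retrieval with timestamps) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Segment` type, the sample transcript segments, and every function name are hypothetical, and a toy bag-of-words vector stands in for a learned embedding model. The MCP layer and the generation step are omitted; the sketch stops at assembling the retrieved context.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds
    text: str     # transcript or caption text for the segment


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; an assumption standing in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(segments):
    """Pair each segment with its vector (a stand-in for a vector index)."""
    return [(seg, embed(seg.text)) for seg in segments]


def retrieve(index, query, k=2):
    """Return the k segments most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [seg for seg, _ in ranked[:k]]


def build_prompt(query, hits):
    """Assemble timestamped context that a language model could answer from."""
    context = "\n".join(f"[{s.start:.0f}s-{s.end:.0f}s] {s.text}" for s in hits)
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical transcript segments for a lecture video.
segments = [
    Segment(0, 30, "Introduction to gradient descent and learning rates"),
    Segment(30, 60, "Backpropagation computes gradients layer by layer"),
    Segment(60, 90, "Course logistics and grading policy"),
]
index = build_index(segments)
query = "how does backpropagation compute gradients"
hits = retrieve(index, query, k=1)
prompt = build_prompt(query, hits)
print(prompt)
```

Because each retrieved segment carries its start and end times, the same result could plausibly drive a timestamp-based interface or clip generation of the kind the abstract mentions.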

How to Cite this Paper

Prathik, S., Vinitha, K., Sameer, SK., & Omkar, C. (2026). MCP-Powered RAG for Video Understanding. International Journal of Creative and Open Research in Engineering and Management, 02(04). https://doi.org/10.55041/ijcope.v2i4.126

Prathik, S., et al. "MCP-Powered RAG for Video Understanding." International Journal of Creative and Open Research in Engineering and Management, vol. 02, no. 04, 2026. https://doi.org/10.55041/ijcope.v2i4.126.

Prathik, S., K. Vinitha, SK. Sameer, and C. Omkar. "MCP-Powered RAG for Video Understanding." International Journal of Creative and Open Research in Engineering and Management 02, no. 04 (2026). https://doi.org/10.55041/ijcope.v2i4.126.

Ethical Compliance & Review Process

• All submissions are screened with plagiarism detection.
• Review follows editorial policy.
• Authors retain copyright.
• Peer Review Type: Double-Blind Peer Review
• Published on: Apr 07 2026

This article is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. You are free to share and adapt this work for non-commercial purposes with proper attribution.
