IRIS 2.0: An Edge-Optimized Vision-Language Framework for Real-Time Spatial Assistive Narration

SINGH, HIMANSHU

doi:https://doi.org/10.55041/ijcope.v2i4.744

Published on: April 2026

Department of Computer Science & Engineering, Mahatma Gandhi Mission’s College of Engineering & Technology, Uttar Pradesh, India

Article Status

Plagiarism Passed, Peer Reviewed, Open Access

Abstract

This paper presents IRIS 2.0 (Intelligent Real-time Imaging System v2), a hybrid edge-cloud assistive system that generates continuous, spatially aware auditory descriptions of images and live environments, enabling blind and low-vision (BLV) users to access visual information safely. IRIS 2.0 is a significant upgrade over its serverless predecessor: it migrates the core perceptual computations directly to mobile hardware. It integrates a quantized Vision-Language Model (VLM) for deep semantic analysis, an on-device spatial audio engine for directional sound generation, and a cloud-fallback mechanism for high-density cognitive tasks. Unlike legacy systems that rely on network-dependent label extraction and delayed narrative construction, IRIS 2.0 processes raw video frames locally, translating discrete visual elements into coherent, 3D-spatialized narrative descriptions in real time. The system also supports dynamic object tracking, low-light compensation, and a continuous mobility mode. The prototype demonstrates the feasibility of local multimodal AI for inclusive accessibility solutions, achieving ultra-low-latency performance (300–450 milliseconds) while preserving data privacy and eliminating the need for constant cloud connectivity. IRIS 2.0 provides an empirically reproducible foundation for next-generation assistive tools built on edge-native AI pipelines.
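The abstract does not specify how detected objects are turned into directional sound. As a rough illustration only, the sketch below assumes each detected object comes with a normalized horizontal position in the frame, maps that position to an azimuth angle, and derives stereo gains with a standard constant-power pan. The `Detection` type, the function names, the 60-degree field of view, and the 10-degree "ahead" threshold are all hypothetical choices for this example, not details taken from IRIS 2.0.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Detection:
    """Hypothetical detection record (not from the paper)."""
    label: str
    x_center: float  # normalized horizontal position: 0.0 = left edge, 1.0 = right edge


def azimuth_degrees(x_center: float, horizontal_fov: float = 60.0) -> float:
    """Map a normalized horizontal position to an azimuth angle.

    0 degrees is straight ahead; negative is left, positive is right.
    The 60-degree field of view is an assumed value.
    """
    x = min(max(x_center, 0.0), 1.0)  # clamp to the frame
    return (x - 0.5) * horizontal_fov


def stereo_gains(azimuth: float, horizontal_fov: float = 60.0) -> Tuple[float, float]:
    """Constant-power pan: returns (left_gain, right_gain) with L^2 + R^2 = 1."""
    half = horizontal_fov / 2.0
    pan = min(max(azimuth / half, -1.0), 1.0)  # -1 = hard left, +1 = hard right
    theta = (pan + 1.0) * math.pi / 4.0        # sweeps 0 .. pi/2
    return math.cos(theta), math.sin(theta)


def narrate(det: Detection, ahead_threshold: float = 10.0) -> str:
    """Build a short directional phrase for one detection."""
    az = azimuth_degrees(det.x_center)
    if az < -ahead_threshold:
        side = "to your left"
    elif az > ahead_threshold:
        side = "to your right"
    else:
        side = "ahead"
    return f"{det.label} {side}"
```

For example, `narrate(Detection("chair", 0.9))` yields a phrase placing the chair on the right, and `stereo_gains(azimuth_degrees(0.9))` weights the right channel more heavily. A full spatial audio engine would typically use head-related transfer functions rather than simple stereo panning; the pan here only stands in for the idea of direction-dependent rendering.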

How to Cite this Paper

SINGH, H. (2026). IRIS 2.0: An Edge-Optimized Vision-Language Framework for Real-Time Spatial Assistive Narration. International Journal of Creative and Open Research in Engineering and Management, 02(04). https://doi.org/10.55041/ijcope.v2i4.744

SINGH, HIMANSHU. "IRIS 2.0: An Edge-Optimized Vision-Language Framework for Real-Time Spatial Assistive Narration." International Journal of Creative and Open Research in Engineering and Management, vol. 02, no. 04, 2026. https://doi.org/10.55041/ijcope.v2i4.744.

SINGH, HIMANSHU. "IRIS 2.0: An Edge-Optimized Vision-Language Framework for Real-Time Spatial Assistive Narration." International Journal of Creative and Open Research in Engineering and Management 02, no. 04 (2026). https://doi.org/10.55041/ijcope.v2i4.744.

Ethical Compliance & Review Process

• All submissions are screened with plagiarism detection.
• Review follows editorial policy.
• Authors retain copyright.
• Peer Review Type: Double-Blind Peer Review
• Published on: Apr 25, 2026

This article is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. You are free to share and adapt this work for non-commercial purposes with proper attribution.
