Published on: April 2026
IRIS 2.0: AN EDGE-OPTIMIZED VISION-LANGUAGE FRAMEWORK FOR REAL-TIME SPATIAL ASSISTIVE NARRATION
HIMANSHU SINGH
Abstract
This paper presents IRIS 2.0 (Intelligent Real-time Imaging System v2), a hybrid edge-cloud assistive system that generates continuous, spatially aware auditory descriptions of images and live environments, enabling blind and low-vision (BLV) users to access visual information safely. IRIS 2.0 is a significant upgrade over its serverless predecessor: it moves the core perceptual computations onto mobile hardware. It integrates a quantized Vision-Language Model (VLM) for deep semantic analysis, an on-device spatial audio engine for directional sound generation, and a cloud-fallback mechanism for computationally demanding cognitive tasks. Unlike legacy systems that rely on network-dependent label extraction and delayed narrative construction, IRIS 2.0 processes raw video frames locally and translates discrete visual elements into coherent, 3D-spatialized narrative descriptions in real time. The system also supports dynamic object tracking, low-light compensation, and a continuous mobility mode. The prototype demonstrates the feasibility of local multimodal AI for inclusive accessibility solutions, achieving low latency (300–450 milliseconds) while preserving data privacy through on-device processing and removing the need for constant cloud connectivity. IRIS 2.0 provides an empirically reproducible foundation for next-generation assistive tools built on edge-native AI pipelines.
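The abstract does not specify how detected objects are turned into directional cues, so the following is only a minimal illustrative sketch, not the authors' implementation. It assumes a hypothetical `Detection` record with a normalized horizontal position, an assumed 70-degree horizontal field of view, a linear position-to-azimuth approximation, and a simple constant-power stereo pan standing in for a full 3D audio engine.

```python
import math
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical detection record (not from the paper)."""
    label: str
    x_center: float  # normalized horizontal position: 0.0 = left edge, 1.0 = right edge


def azimuth_degrees(det: Detection, horizontal_fov_deg: float = 70.0) -> float:
    """Approximate azimuth of the object; negative values are left of center.

    Uses a linear mapping across the assumed field of view, which is an
    approximation of the true camera geometry.
    """
    return (det.x_center - 0.5) * horizontal_fov_deg


def stereo_gains(azimuth_deg: float, horizontal_fov_deg: float = 70.0) -> tuple[float, float]:
    """Constant-power pan: returns (left_gain, right_gain) with L^2 + R^2 = 1."""
    half_fov = horizontal_fov_deg / 2.0
    pan = max(-1.0, min(1.0, azimuth_deg / half_fov))  # -1 = hard left, +1 = hard right
    angle = (pan + 1.0) * math.pi / 4.0  # maps pan to the range 0 .. pi/2
    return math.cos(angle), math.sin(angle)


def describe(det: Detection, threshold_deg: float = 10.0) -> str:
    """Build a short spoken phrase with a coarse direction word."""
    az = azimuth_degrees(det)
    if az < -threshold_deg:
        side = "to your left"
    elif az > threshold_deg:
        side = "to your right"
    else:
        side = "ahead"
    return f"{det.label} {side}"


if __name__ == "__main__":
    chair = Detection(label="chair", x_center=0.2)
    left, right = stereo_gains(azimuth_degrees(chair))
    print(describe(chair), f"(L={left:.2f}, R={right:.2f})")
```

In a pipeline like the one described, the phrase would be passed to a speech synthesizer and the gains (or the azimuth itself) to the audio renderer; a production system would likely use head-related transfer functions rather than plain stereo panning.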
How to Cite this Paper
SINGH, H. (2026). IRIS 2.0: An Edge-Optimized Vision-Language Framework for Real-Time Spatial Assistive Narration. International Journal of Creative and Open Research in Engineering and Management, 02(04). https://doi.org/10.55041/ijcope.v2i4.744
SINGH, HIMANSHU. "IRIS 2.0: An Edge-Optimized Vision-Language Framework for Real-Time Spatial Assistive Narration." International Journal of Creative and Open Research in Engineering and Management, vol. 02, no. 04, 2026. https://doi.org/10.55041/ijcope.v2i4.744.
SINGH, HIMANSHU. "IRIS 2.0: An Edge-Optimized Vision-Language Framework for Real-Time Spatial Assistive Narration." International Journal of Creative and Open Research in Engineering and Management 02, no. 04 (2026). https://doi.org/10.55041/ijcope.v2i4.744.
Ethical Compliance & Review Process
- All submissions are screened under plagiarism detection.
- Review follows editorial policy.
- Authors retain copyright.
- Peer Review Type: Double-Blind Peer Review
- Published on: Apr 25 2026
This article is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. You are free to share and adapt this work for non-commercial purposes with proper attribution.

