Representational Divergence Between Spiking and Non-Spiking Neural Architectures Under Multimodal Contrastive Learning

V, Kumaran; M, Amarjith

DOI: https://doi.org/10.55041/ijcope.v2i5.758

Volume 02, Issue 05

Published on: May 2026

Kumaran V, Amarjith M

G. Archana

Dhanalakshmi Srinivasan University, Trichy, India

Article Status: Plagiarism Passed; Peer Reviewed; Open Access

Abstract

Multimodal representation learning enables image–text alignment through shared embedding spaces optimized with contrastive objectives. Although Spiking Neural Networks (SNNs) provide biologically inspired temporal computation, their representational behavior under static multimodal supervision remains insufficiently explored. This study presents a controlled comparison between an SNN and a Multilayer Perceptron (MLP) for multimodal image–text retrieval. To isolate the effect of temporal spiking dynamics, both architectures were trained with identical embedding dimensionality, optimization settings, contrastive learning objectives, and retrieval protocols. Experiments were conducted on a balanced episodic benchmark of 1,000 image–text pairs. Both architectures learn stable, non-collapsed embedding spaces with broad cosine similarity distributions, yet retrieval performance for both remains close to chance under static supervision. Despite comparable task-level performance, cross-model cosine similarity analysis reveals substantial representational divergence between SNN and MLP embeddings, indicating distinct embedding geometries under identical learning conditions. The findings suggest that temporal spiking dynamics alone do not improve static multimodal retrieval alignment and highlight a mismatch between the spiking inductive bias and temporally unstructured supervision. Overall, the study emphasizes the importance of representation-level analysis alongside conventional retrieval evaluation for neuromorphic multimodal learning systems.

Index Terms—Spiking Neural Networks, Multimodal Retrieval, Contrastive Learning, Neuromorphic Computing, Embedding Geometry, Representation Learning
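The quantities named in the abstract (a contrastive objective over paired embeddings, retrieval performance relative to chance, and cross-model cosine similarity) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not state the exact loss, temperature, embedding size, or retrieval metric, so the CLIP-style symmetric loss, `temperature=0.07`, `dim = 64`, Recall@K, and all function names below are assumptions chosen for illustration. The embeddings are random placeholders, not outputs of an SNN or MLP.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm so dot products equal cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def symmetric_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Assumed CLIP-style loss: row i of each matrix is a matched image-text pair."""
    logits = l2_normalize(image_emb) @ l2_normalize(text_emb).T / temperature

    def matched_pair_cross_entropy(scores):
        # Numerically stable log-softmax over each row; the target is the diagonal.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (matched_pair_cross_entropy(logits)
                  + matched_pair_cross_entropy(logits.T))


def recall_at_k(image_emb, text_emb, k):
    """Fraction of images whose paired text (same row index) ranks in the top k."""
    sims = l2_normalize(image_emb) @ l2_normalize(text_emb).T
    true_scores = np.diag(sims)[:, None]
    # Rank of the true text = number of texts scoring strictly higher.
    ranks = (sims > true_scores).sum(axis=1)
    return float((ranks < k).mean())


def cross_model_cosine(emb_a, emb_b):
    """Per-item cosine similarity between two models' embeddings of the same inputs."""
    return (l2_normalize(emb_a) * l2_normalize(emb_b)).sum(axis=1)


rng = np.random.default_rng(0)
n_pairs, dim = 1000, 64  # 1,000 pairs as in the abstract; dim is an assumed value
images = rng.standard_normal((n_pairs, dim))              # placeholder embeddings
texts = rng.standard_normal((n_pairs, dim))               # placeholder embeddings
other_model_images = rng.standard_normal((n_pairs, dim))  # stand-in for a second model

chance_r10 = 10 / n_pairs  # Recall@K of a random ranking is K / N
observed_r10 = recall_at_k(images, texts, k=10)
divergence = cross_model_cosine(images, other_model_images)

print(f"chance R@10 = {chance_r10:.3f}, observed R@10 = {observed_r10:.3f}")
print(f"mean cross-model cosine = {divergence.mean():.3f}")
```

With 1,000 candidates, a random ranking gives Recall@K of K/1000, which is the reference point for "close to chance". The direct per-item cosine used here requires both models to share an embedding dimensionality, as the abstract states they do, and it is sensitive to rotations of either embedding space; whether the study applied any alignment step before comparison is not stated.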

How to Cite this Paper

V, K. & M, A. (2026). Representational Divergence Between Spiking and Non-Spiking Neural Architectures Under Multimodal Contrastive Learning. International Journal of Creative and Open Research in Engineering and Management, 02(05). https://doi.org/10.55041/ijcope.v2i5.758

V, Kumaran, and Amarjith M. "Representational Divergence Between Spiking and Non-Spiking Neural Architectures Under Multimodal Contrastive Learning." International Journal of Creative and Open Research in Engineering and Management, vol. 02, no. 05, 2026. doi:https://doi.org/10.55041/ijcope.v2i5.758.

V, Kumaran, and Amarjith M. "Representational Divergence Between Spiking and Non-Spiking Neural Architectures Under Multimodal Contrastive Learning." International Journal of Creative and Open Research in Engineering and Management 02, no. 05 (2026). https://doi.org/10.55041/ijcope.v2i5.758.

Ethical Compliance & Review Process

• All submissions are screened under plagiarism detection.
• Review follows editorial policy.
• Authors retain copyright.
• Peer Review Type: Double-Blind Peer Review
• Published on: May 25, 2026

This article is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. You are free to share and adapt this work for non-commercial purposes with proper attribution.
