Published on: April 2026
MULTIMODAL AI-BASED MENTAL HEALTH DETECTION SYSTEM: INTEGRATING TEXT, SPEECH, AND FACIAL ANALYSIS USING DEEP LEARNING
Himanshu Gautam, Narvadesh Pandey, Mukul Sharma, Abhishek Kumar, Ashrit Patel
Abstract
This paper introduces a multimodal deep learning system for detecting signs of mental health conditions. The approach combines three distinct channels of human expression: what people say (text), how they say it (speech), and what their face reveals (facial expressions). Together, these channels build a more complete picture of a person's mental state than any single modality could offer alone. The modalities are integrated into a unified AI framework that functions as a mental health screening tool, designed with both accuracy and ethical responsibility in mind.
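One common way to combine modalities is late fusion, in which each modality produces its own scores and the scores are then averaged. The abstract does not state which fusion strategy the system uses, so the following is only a minimal sketch under that assumption. The function name `fuse_scores`, the label set, and the equal weights are hypothetical and are not taken from the paper.

```python
from typing import Dict, Optional

# Hypothetical label set, based on the conditions named in the abstract.
LABELS = ("depression", "anxiety", "stress")

# Hypothetical fusion weights; the paper does not report any.
DEFAULT_WEIGHTS = {"text": 1.0, "speech": 1.0, "face": 1.0}

Scores = Dict[str, float]


def fuse_scores(
    modality_scores: Dict[str, Optional[Scores]],
    weights: Dict[str, float] = DEFAULT_WEIGHTS,
) -> Scores:
    """Weighted average of per-modality scores (late fusion).

    A modality whose value is None (for example, no camera available)
    is skipped, and the remaining weights are renormalized.
    """
    available = {m: s for m, s in modality_scores.items() if s is not None}
    if not available:
        raise ValueError("at least one modality must provide scores")
    total_weight = sum(weights[m] for m in available)
    return {
        label: sum(weights[m] * s[label] for m, s in available.items())
        / total_weight
        for label in LABELS
    }


if __name__ == "__main__":
    # Illustrative scores only; these are not results from the paper.
    fused = fuse_scores(
        {
            "text": {"depression": 0.8, "anxiety": 0.4, "stress": 0.5},
            "speech": {"depression": 0.6, "anxiety": 0.2, "stress": 0.5},
            "face": None,  # facial channel unavailable in this example
        }
    )
    print(fused)
```

Averaging scores keeps each modality's model independent, so a missing channel degrades the estimate rather than breaking the pipeline. Feature-level fusion inside a single network is an equally plausible design.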
Early results on benchmark datasets are encouraging: the multimodal approach consistently outperforms single-modality methods in identifying signs of depression, anxiety, and stress. The next step is large-scale clinical validation across diverse, real-world populations to ensure the system holds up where it matters most. We also plan to incorporate explainable AI techniques, so that predictions are not just accurate but also understandable and trustworthy to the clinicians and patients relying on them.
Looking further ahead, the system is designed to integrate with mobile and wearable technologies, opening the door to continuous, non-invasive mental health monitoring and earlier intervention. The broader goal is a tool that does not replace clinicians, but genuinely supports them, making quality mental health assessment more accessible, especially in settings where resources are limited.
How to Cite this Paper
Gautam, H., Pandey, N., Sharma, M., Kumar, A. & Patel, A. (2026). Multimodal AI-Based Mental Health Detection System: Integrating Text, Speech, and Facial Analysis Using Deep Learning. International Journal of Creative and Open Research in Engineering and Management, 02(04). https://doi.org/10.55041/ijcope.v2i4.971
Gautam, Himanshu, et al. "Multimodal AI-Based Mental Health Detection System: Integrating Text, Speech, and Facial Analysis Using Deep Learning." International Journal of Creative and Open Research in Engineering and Management, vol. 02, no. 04, 2026. https://doi.org/10.55041/ijcope.v2i4.971.
Gautam, Himanshu, Narvadesh Pandey, Mukul Sharma, Abhishek Kumar, and Ashrit Patel. "Multimodal AI-Based Mental Health Detection System: Integrating Text, Speech, and Facial Analysis Using Deep Learning." International Journal of Creative and Open Research in Engineering and Management 02, no. 04 (2026). https://doi.org/10.55041/ijcope.v2i4.971.
Ethical Compliance & Review Process
- All submissions are screened under plagiarism detection.
- Review follows editorial policy.
- Authors retain copyright.
- Peer Review Type: Double-Blind Peer Review
- Published on: May 01 2026
This article is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. You are free to share and adapt this work for non-commercial purposes with proper attribution.

