IJCOPE Journal


International Journal of Creative and Open Research in Engineering and Management

A Peer-Reviewed, Open-Access International Journal Supporting Multidisciplinary Research, Digital Publishing Standards, DOI Registration, and Academic Indexing.
Journal Information
ISSN: 3108-1754 (Online)
Crossref DOI: Available
ISO Certification: 9001:2015
Publication Fee: 599/- INR
Compliance: UGC Journal Norms
License: CC BY 4.0
Peer Review: Double Blind
Volume 02, Issue 04

Published on: April 2026

MULTIMODAL AI-BASED MENTAL HEALTH DETECTION SYSTEM: INTEGRATING TEXT, SPEECH, AND FACIAL ANALYSIS USING DEEP LEARNING

Himanshu Gautam, Narvadesh Pandey, Mukul Sharma, Abhishek Kumar, Ashrit Patel

Department of Computer Science & Engineering, IIMT College of Engineering, Greater Noida

Article Status

Plagiarism Check: Passed | Peer Reviewed | Open Access


Abstract

Mental health disorders such as depression, anxiety, and post-traumatic stress disorder (PTSD) remain among the most pressing challenges in healthcare today, not only because they are common but because they are hard to detect early and accurately. These conditions are deeply personal and complex, which makes screening far from straightforward.

This paper introduces a multimodal deep learning system that addresses this problem. It combines three channels of human expression: what people say (text), how they say it (speech), and what their faces reveal (facial expressions). Together, these channels give a more complete picture of a person's mental state than any single modality can provide alone. The three modalities are fused into a unified AI framework that serves as a mental health screening tool, designed with both accuracy and ethical responsibility in mind.
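The abstract does not state how the three modalities are combined. One common option is late (decision-level) fusion, in which each modality has its own classifier and their class probabilities are merged into a single prediction. The sketch below assumes that design purely for illustration; the label set, the helper names `late_fusion` and `predict`, the per-modality probabilities, and the weights are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical label set; the paper's actual output classes may differ.
LABELS: List[str] = ["none", "depression", "anxiety", "stress"]


def late_fusion(
    probs: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Weighted average of per-modality class probabilities (decision-level fusion).

    probs maps a modality name (e.g. "text", "speech", "face") to a probability
    vector over LABELS; weights maps modality names to non-negative weights.
    Modalities absent from probs (e.g. no camera input) or with zero weight are
    skipped, and the remaining weights are renormalized to sum to 1.
    """
    available = [m for m in probs if weights.get(m, 0.0) > 0.0]
    if not available:
        raise ValueError("no modality with a positive weight is available")
    total_w = sum(weights[m] for m in available)
    fused = [0.0] * len(LABELS)
    for m in available:
        if len(probs[m]) != len(LABELS):
            raise ValueError(f"{m}: expected {len(LABELS)} probabilities")
        for i, p in enumerate(probs[m]):
            fused[i] += weights[m] * p / total_w
    return fused


def predict(probs: Dict[str, List[float]], weights: Dict[str, float]) -> str:
    """Return the label with the highest fused probability."""
    fused = late_fusion(probs, weights)
    return LABELS[max(range(len(fused)), key=fused.__getitem__)]


if __name__ == "__main__":
    # Hypothetical outputs of three unimodal classifiers for one person.
    example = {
        "text":   [0.20, 0.50, 0.20, 0.10],
        "speech": [0.30, 0.40, 0.20, 0.10],
        "face":   [0.40, 0.20, 0.30, 0.10],
    }
    w = {"text": 0.40, "speech": 0.35, "face": 0.25}
    print(predict(example, w))  # prints: depression
```

A practical advantage of late fusion is that a missing input, such as an interview with no video, can be skipped without retraining. Feature-level (early) fusion of learned embeddings is the other common choice; the abstract does not say which approach the authors use.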

Early results on benchmark datasets are encouraging: the multimodal approach consistently outperforms single-modality methods in identifying signs of depression, anxiety, and stress. The next step is large-scale clinical validation across diverse, real-world populations to confirm that the system performs reliably in practical settings. We also plan to incorporate explainable AI techniques so that predictions are not only accurate but also understandable and trustworthy to the clinicians and patients who rely on them.

Looking further ahead, the system is intended to integrate with mobile and wearable devices, enabling continuous, non-invasive mental health monitoring and earlier intervention. The broader goal is a tool that supports clinicians rather than replacing them, making quality mental health assessment more accessible, especially in resource-limited settings.

How to Cite this Paper

Gautam, H., Pandey, N., Sharma, M., Kumar, A., & Patel, A. (2026). Multimodal AI-Based Mental Health Detection System: Integrating Text, Speech, and Facial Analysis Using Deep Learning. International Journal of Creative and Open Research in Engineering and Management, 02(04). https://doi.org/10.55041/ijcope.v2i4.971

Gautam, Himanshu, et al. "Multimodal AI-Based Mental Health Detection System: Integrating Text, Speech, and Facial Analysis Using Deep Learning." International Journal of Creative and Open Research in Engineering and Management, vol. 02, no. 04, 2026, https://doi.org/10.55041/ijcope.v2i4.971.

Gautam, Himanshu, Narvadesh Pandey, Mukul Sharma, Abhishek Kumar, and Ashrit Patel. "Multimodal AI-Based Mental Health Detection System: Integrating Text, Speech, and Facial Analysis Using Deep Learning." International Journal of Creative and Open Research in Engineering and Management 02, no. 04 (2026). https://doi.org/10.55041/ijcope.v2i4.971.



Ethical Compliance & Review Process

  • All submissions are screened for plagiarism.
  • Review follows editorial policy.
  • Authors retain copyright.
  • Peer Review Type: Double-Blind Peer Review
  • Published on: May 1, 2026

This article is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. You are free to share and adapt this work for non-commercial purposes with proper attribution.
