Peer-reviewed | Open Access | Multidisciplinary
Mental health disorders have become a significant global concern, affecting individuals across diverse social and demographic backgrounds. Early detection and continuous emotional monitoring are critical for providing timely intervention and effective psychological support. However, conventional diagnostic approaches often rely on self-reported assessments and periodic clinical evaluations, which may not capture subtle emotional variations in everyday interactions. In response to these limitations, recent research has explored the integration of artificial intelligence techniques for automated mental health analysis. In particular, emotion-aware conversational systems have gained attention for their potential to provide scalable and accessible psychological support. This paper presents a comprehensive review of multimodal emotion recognition techniques and their role in enabling empathetic conversational artificial intelligence for mental health applications. The study examines how multiple sources of behavioral information (textual communication, speech characteristics, facial expressions, and other behavioral signals) can be combined to improve the accuracy of emotion detection. A systematic review methodology is employed to analyze existing research contributions, focusing on the datasets used, machine learning models applied, and the performance outcomes reported in recent studies. The analysis highlights the growing importance of multimodal learning frameworks that integrate linguistic, acoustic, and behavioral features to capture complex emotional states more effectively than unimodal approaches. Furthermore, the paper discusses emerging technologies such as multimodal large language models, privacy-preserving learning techniques, and wearable emotion sensing devices that are expected to shape the next generation of intelligent mental health support systems. The findings suggest that emotion-aware conversational AI can serve as a valuable complementary tool for mental health monitoring and early intervention, particularly when integrated with human-centered therapeutic practices.
Keywords: Emotion-aware AI, Multimodal learning, Conversational agents, Mental health support, Emotion recognition, Empathetic dialogue systems