Zhang Lu, the CEO of Soul, has always regarded artificial intelligence as a technology that will usher in a new era for the social networking sector. Soul, a platform known for its popularity among China’s Zoomers, is an exclusively interest-based network that has put a unique spin on digital socialization.
Soul Zhang Lu and her team were among the first on the social scene to acknowledge that the current capabilities of AI models were not enough to transform social interactions in a way that fully meets user demands, and that simply increasing the general intelligence of these models would not cut it.
Replicating human intelligence has been a longstanding pursuit in AI, and significant strides have undoubtedly been made in tasks like language translation and image recognition. Yet understanding and responding to human emotions has remained a formidable challenge.
However, a recent breakthrough by Soul Zhang Lu’s team has shown the potential to change that. The programming team behind the very popular app showed off its chops at the Multimodal Emotion Recognition Challenge (MER24).
The competition is part of the International Joint Conference on Artificial Intelligence (IJCAI), one of the most reputable symposiums on AI technology, which attracts top researchers and developers from institutions like Tsinghua University, Imperial College London, and Nanyang Technological University.
At MER24, the focus was on affective computing, the field of AI technology that explores how to interpret, simulate, and respond to human emotions. The competition invited submissions in three categories:
SEMI (Semi-Supervised Learning)
NOISE (Noise Robustness)
OV (Open Vocabulary Emotion Recognition)
Soul Zhang Lu’s team participated in the SEMI section. Their submission reflected the company’s technical prowess in using semi-supervised learning to enhance emotion recognition in real-world applications.
Although essential for various sectors, emotion recognition has always been a mind-boggling task for the simple reason that, to understand the complexities of human emotions, machines have to be taught to pick up cues across modalities. For instance, to gain a comprehensive understanding of a person’s emotional state, an AI model needs to integrate various data modalities, such as facial expressions, voice tone, and body language.
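To make the idea of integrating modalities concrete, here is a minimal late-fusion sketch in PyTorch. The encoder dimensions, the five-class emotion label set, and the class name are illustrative assumptions, not details of Soul’s actual architecture.

```python
import torch
import torch.nn as nn

class LateFusionEmotionClassifier(nn.Module):
    """Hypothetical late-fusion model: each modality's features are
    projected into a shared space, concatenated, and classified."""

    def __init__(self, face_dim=512, audio_dim=256, text_dim=768, num_emotions=5):
        super().__init__()
        # Project each modality into a common 128-dim embedding space.
        self.face_proj = nn.Linear(face_dim, 128)
        self.audio_proj = nn.Linear(audio_dim, 128)
        self.text_proj = nn.Linear(text_dim, 128)
        # Classify from the concatenated modality embeddings.
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(128 * 3, num_emotions),
        )

    def forward(self, face_feats, audio_feats, text_feats):
        fused = torch.cat(
            [
                self.face_proj(face_feats),
                self.audio_proj(audio_feats),
                self.text_proj(text_feats),
            ],
            dim=-1,
        )
        return self.classifier(fused)  # logits over emotion classes
```

The upstream feature extractors (face, voice, text encoders) are assumed to exist; the point is simply that emotional cues from separate channels must be combined before a single prediction can be made.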
When this technology sees the light of day, it will of course have far-reaching implications for a variety of applications, including social media platforms, customer service, and mental health support. But training a model for multimodal emotion recognition is a gargantuan task, and the scarcity of labeled data, along with the monumental cost of collecting it, only adds to the difficulty.
Soul Zhang Lu’s team used various innovative approaches, along with their in-house AI models, to tackle several of these problems. It is worth noting that the team already has significant experience in building and deploying large language models.
For example, a lot of the platform’s AI features are powered by Soul X, which is the platform’s internally developed large language model. The team also has a large voice model to its credit. Both of these have already been trained to a significant extent for multimodal emotion recognition.
But for its submission in the SEMI category, Soul Zhang Lu’s team had to up the ante, because this category is all about techniques designed to maximize the use of unlabeled data in training AI models. Semi-supervised learning is particularly important in emotion recognition because it reduces the dependence on labeled data, making AI models more robust and adaptable in real-world scenarios.
For the competition, Soul Zhang Lu’s team leveraged its strong technical background to adapt its existing models. Soul’s submission involved EmoVCLIP, a model fine-tuned for video emotion recognition. The team employed a self-training strategy that iteratively used pseudo-labels for unlabeled data, which greatly improved the model’s ability to generalize across different scenarios.
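The general shape of such a self-training loop can be sketched as follows. This is a generic illustration using a scikit-learn-style classifier; the confidence threshold, number of rounds, and function name are assumptions for the sake of the example, not Soul’s actual pipeline.

```python
import numpy as np

def self_train(model, X_labeled, y_labeled, X_unlabeled,
               threshold=0.9, rounds=3):
    """Iteratively promote high-confidence predictions on unlabeled
    data to pseudo-labels and retrain on the expanded training set."""
    X_train, y_train = X_labeled.copy(), y_labeled.copy()
    for _ in range(rounds):
        model.fit(X_train, y_train)
        if len(X_unlabeled) == 0:
            break
        probs = model.predict_proba(X_unlabeled)
        confident = probs.max(axis=1) >= threshold
        if not confident.any():
            break
        # Map argmax column indices back to actual class labels.
        pseudo_labels = model.classes_[probs[confident].argmax(axis=1)]
        # Fold pseudo-labeled samples into the next round's training set.
        X_train = np.vstack([X_train, X_unlabeled[confident]])
        y_train = np.concatenate([y_train, pseudo_labels])
        X_unlabeled = X_unlabeled[~confident]
    return model
```

Each round expands the training set only with predictions the model is already confident about, which is what lets unlabeled data do useful work without flooding the model with noisy labels.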
Moreover, Soul Zhang Lu’s team introduced an innovative approach known as Modality Dropout. This technique mitigated the competitive effects between different data modalities (such as text, video, and audio) in the model. By far the most groundbreaking aspect of Soul’s submission, Modality Dropout significantly boosted the model’s accuracy in recognizing emotions, allowing Soul to secure the top spot in the competition.
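A minimal sketch of the idea, assuming the late-fusion setup from the earlier example: during training, an entire modality’s features are randomly zeroed out so that no single modality can dominate the others. The drop probability and the keep-at-least-one guard are illustrative choices, not details disclosed by Soul.

```python
import torch

def modality_dropout(face, audio, text, p=0.3, training=True):
    """Randomly zero out whole modalities during training, keeping at
    least one modality intact so the input remains informative."""
    if not training:
        return face, audio, text
    feats = [face, audio, text]
    drop = [torch.rand(1).item() < p for _ in feats]
    if all(drop):
        # Never drop everything: keep one randomly chosen modality.
        drop[torch.randint(len(feats), (1,)).item()] = False
    return tuple(
        torch.zeros_like(x) if d else x for x, d in zip(feats, drop)
    )
```

Because the model occasionally has to predict emotions from, say, audio and text alone, it is forced to extract signal from every channel rather than leaning on whichever modality happens to be easiest.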
While the tweaks made to existing models and the Modality Dropout technique have yet to find their way into Soul App’s features, the platform already offers an assortment of AI-powered applications. Most noteworthy among these are AI Goudan, a human-like AI character trained to help users navigate the platform and enjoy its various offerings, and Werewolf Awakening, a popular game that was reintroduced with AI players that users can play against or even team up with.
Soul Zhang Lu’s team will find myriad ways to use their MER24 submission in real-world applications on the platform. But more importantly, the company’s victory illustrates a broader trend in AI development: the growing need for emotional intelligence (EQ) in large models.
AI’s ability to exhibit emotional intelligence is critical for its success in almost all industries, but especially in social media, a sector driven by network effects and high traffic value. Hence, when deploying AI in social scenarios, models with EQ are crucial to attaining the elusive goal of Product-Market Fit (PMF).
Che Bin, Soul’s Vice President and Product Lead, explained it perfectly when he said that emotional intelligence is crucial for AI applications in social networking, where multimodal and anthropomorphic attributes create lifelike interactions. And that is what users are craving: applications that are not just tools but friend-like conversationalists, if not actual friends.
Users want applications that understand their feelings and respond in a human-like manner, and it goes without saying that AI can provide such personalized, empathetic interactions only if it is emotionally intelligent.