EmoNet Retrospective: Speaker Recognition Evolution

Towards Data Science

The author reflects on their MS thesis, EmoNet, which achieved a weighted F1 of 39.18 on the EmoryNLP emotion recognition leaderboard. At the time, their speaker-aware transformer model placed competitively among specialized architectures such as TUCORE-GCN_RoBERTa. The work focused on capturing how speaker identity shapes emotional context in multi-turn dialogues.

EmoNet introduced three key innovations: global speaker IDs across dialogues, a speaker behavior module using GRUs to track historical patterns, and weighted cross-entropy loss for imbalanced emotion classes. Surprisingly, adding global speaker ID alone initially decreased performance, revealing the need for complementary components.
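As a rough illustration of the third component, the sketch below shows one common way to weight cross-entropy for imbalanced classes. It is not the thesis code: the inverse-frequency weighting scheme, the function names, and the toy labels are all assumptions made for this example.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed scheme: weight = N / (K * count), so rare classes weigh more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class).

    probs is a list of dicts mapping class name -> predicted probability.
    The sum is normalized by the total weight of the targets.
    """
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    norm = sum(weights[y] for y in labels)
    return total / norm


# Toy, illustrative data: three "neutral" utterances and one "sad" utterance.
labels = ["neutral", "neutral", "neutral", "sad"]
weights = inverse_frequency_weights(labels)

# A model that is confident on the majority class but weak on the rare one.
probs = [
    {"neutral": 0.9, "sad": 0.1},
    {"neutral": 0.9, "sad": 0.1},
    {"neutral": 0.9, "sad": 0.1},
    {"neutral": 0.8, "sad": 0.2},
]
loss = weighted_cross_entropy(probs, labels, weights)
print(weights, round(loss, 4))
```

With these weights, the single misclassified "sad" utterance contributes as much to the loss as all three "neutral" utterances combined, which is the effect a weighted loss is meant to have on under-represented emotions.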

The emotion recognition in conversation (ERC) landscape has transformed since 2023 and is now dominated by LLaMA-2-7B-based systems with LoRA fine-tuning. Despite these architectural shifts, the author's core insights about the importance of speaker information persist in current approaches. The work demonstrates how fundamental concepts can evolve with new tools while keeping their core value.