Researchers have introduced WEALY, a fully reproducible pipeline that uses Whisper decoder embeddings for audio-based lyrics matching. The approach targets two weaknesses of existing methods: limited reproducibility and inconsistent baselines.
WEALY, developed by Eleonora Mancini, Joan Serrà, Paolo Torroni, and Yuki Mitsufuji, builds on Whisper, a state-of-the-art speech recognition model, to create robust and transparent baselines for lyrics matching tasks. By using Whisper’s decoder embeddings to analyze audio and match recordings on their lyrical content, WEALY offers a more reliable and consistent method than previous approaches.
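To make the matching step concrete, here is a minimal sketch of embedding-based retrieval. It is not the authors' implementation: the vectors are toy stand-ins for Whisper decoder embeddings, and both the mean pooling and the cosine similarity ranking are assumptions chosen for illustration. The function names (`mean_pool`, `rank_candidates`) are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def mean_pool(frames):
    """Average per-token embeddings into one track-level vector (assumed pooling)."""
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def rank_candidates(query, catalog):
    """Return catalog ids sorted by decreasing cosine similarity to the query."""
    scores = {track_id: cosine(query, emb) for track_id, emb in catalog.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy 3-dimensional vectors standing in for real decoder embeddings.
query_frames = [[0.9, 0.1, 0.0], [1.1, -0.1, 0.0]]
catalog = {
    "cover_of_query": mean_pool([[1.0, 0.2, 0.0], [0.8, 0.0, 0.1]]),
    "unrelated_song": mean_pool([[0.0, 1.0, 0.3], [0.1, 0.9, 0.2]]),
}

ranking = rank_candidates(mean_pool(query_frames), catalog)
print(ranking)
```

In a real pipeline the toy vectors would be replaced by embeddings extracted from the model, and the catalog would typically be searched with an approximate nearest-neighbor index rather than a full sort.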
The researchers conducted extensive experiments on standard datasets to validate WEALY. Their findings show that WEALY performs comparably to state-of-the-art methods, even though those methods lack reproducibility. This underscores the potential of speech technologies in music information retrieval and gives future research in the field a reliable benchmark.
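The article does not say which metric the experiments report. As an illustration of how retrieval-style matching is commonly scored, the sketch below computes mean average precision (MAP) over ranked results. The choice of MAP is an assumption, and the track ids are made up.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of precision values at each rank where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, track_id in enumerate(ranked_ids, start=1):
        if track_id in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids) if relevant_ids else 0.0


def mean_average_precision(results):
    """results: list of (ranked_ids, relevant_ids) pairs, one pair per query."""
    return sum(average_precision(r, rel) for r, rel in results) / len(results)


# Two hypothetical queries with their ranked candidates and true matches.
results = [
    (["a", "b", "c"], {"a", "c"}),  # AP = (1/1 + 2/3) / 2
    (["x", "y"], {"y"}),            # AP = 1/2
]
map_score = mean_average_precision(results)
print(round(map_score, 4))
```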
WEALY’s pipeline integrates textual and acoustic features through multimodal extensions, which improves its ability to match audio on lyrical content. The study also reports ablations and analyses covering language robustness, loss functions, and embedding strategies, offering insight into how lyrics matching systems can be optimized.
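The article does not describe how the textual and acoustic features are combined. One simple option, shown here purely as an assumption, is late fusion: compute a similarity score per modality and take a weighted average. The function name `fuse_scores`, the `text_weight` parameter, and the scores are hypothetical.

```python
def fuse_scores(text_scores, audio_scores, text_weight=0.5):
    """Weighted average of per-track similarity scores from two modalities.

    Only tracks scored by both modalities are kept.
    """
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be between 0 and 1")
    fused = {}
    for track_id in text_scores.keys() & audio_scores.keys():
        fused[track_id] = (
            text_weight * text_scores[track_id]
            + (1.0 - text_weight) * audio_scores[track_id]
        )
    return fused


# Made-up similarity scores for two candidate tracks.
text_scores = {"track_a": 0.9, "track_b": 0.4}
audio_scores = {"track_a": 0.3, "track_b": 0.6}

fused = fuse_scores(text_scores, audio_scores, text_weight=0.7)
best = max(fused, key=fused.get)
print(best, round(fused[best], 2))
```

Late fusion keeps each modality's model independent, so the weight can be tuned on a validation set without retraining. Fusing at the embedding level is an equally plausible design that this sketch does not cover.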
For music and audio production, WEALY has several practical implications. Accurate lyrics matching could change how music is cataloged, searched, and analyzed. On streaming platforms it could support precise lyrics synchronization, more accurate recommendations, and more efficient music discovery. WEALY’s robust and reproducible methodology could also serve as a foundation for tools for music transcription, translation, and analysis, benefiting both professionals and enthusiasts in the music industry.



