Large language models (LLMs) have made significant strides in open-domain question answering, but their music-related reasoning remains limited by the scarcity of music knowledge in pretraining data. This gap is felt in music information retrieval and computational musicology, particularly in factual and contextual music question answering (MQA) grounded in artist metadata or historical context.
To address this gap, researchers Daeyong Kwon, SeungHeon Doh, and Juhan Nam introduce two resources: MusWikiDB and ArtistMus. MusWikiDB is a vector database of 3.2 million passages drawn from 144,000 music-related Wikipedia pages, enabling systematic evaluation of retrieval-augmented generation (RAG) for MQA.
ArtistMus is a benchmark of 1,000 questions covering 500 diverse artists, each annotated with metadata such as genre, debut year, and topic. It provides a framework for testing how well LLMs handle music-domain questions.
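The RAG setup these resources support follows a familiar pattern: embed the question, retrieve the most similar passages from the corpus, and prepend them to the prompt before querying the LLM. The sketch below illustrates that pattern only; the toy passages, the bag-of-words "embedding", and the function names are illustrative assumptions, not the authors' actual pipeline (which uses a dense-vector database over 3.2M passages).

```python
from collections import Counter
import math

# Toy stand-in for MusWikiDB passages (the real corpus holds
# 3.2M Wikipedia passages indexed with dense embeddings).
PASSAGES = [
    "Miles Davis was an American jazz trumpeter who debuted in the 1940s.",
    "Bjork is an Icelandic singer known for experimental electronic music.",
    "The Beatles were an English rock band formed in Liverpool in 1960.",
]

def embed(text):
    """Bag-of-words vector as a toy substitute for a dense encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, k=1):
    """Return the k passages most similar to the question."""
    q = embed(question)
    ranked = sorted(PASSAGES, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]

def build_prompt(question):
    """Prepend retrieved context to the question, RAG-style."""
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}"

print(build_prompt("What year were The Beatles formed?"))
```

In a production system the word-count vectors would be replaced by a neural sentence encoder and an approximate-nearest-neighbor index, but the retrieve-then-prompt flow is the same.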
The researchers' experiments show that RAG substantially improves factual accuracy, with open-source models gaining up to 56.8 percentage points. Qwen3 8B, for instance, rose from 35.0 to 91.8 accuracy, approaching proprietary-model performance. RAG-style fine-tuning improved not only factual recall but also contextual reasoning, yielding better results on both in-domain and out-of-domain benchmarks.
Compared with a general-purpose Wikipedia corpus, MusWikiDB also delivered roughly 6 percentage points higher accuracy and 40% faster retrieval.
Together, MusWikiDB and ArtistMus mark a meaningful advance for music information retrieval and domain-specific question answering. They lay the groundwork for retrieval-augmented reasoning in culturally rich domains like music, opening the door to more nuanced and accurate music-related AI systems.



