JukeBox: A Speaker Recognition Dataset with Multi-lingual Singing Voice Audio

Available for download here.

A text-independent speaker recognition system relies on successfully encoding speech factors such as vocal pitch, intensity, and timbre to achieve good performance. Most such systems are trained and evaluated on spoken, everyday conversational voice data. Spoken voice, however, exhibits a limited range of speaker dynamics, which constrains the utility of the resulting speaker recognition models. Singing voice, on the other hand, covers a broader range of vocal and ambient factors and can therefore be used to evaluate the robustness of a speaker recognition system. Yet most existing speaker recognition datasets focus only on spoken voice, and there is a significant shortage of labeled singing voice data suitable for speaker (i.e., singer) recognition research. To address this issue, we assemble JukeBox, a large-scale speaker recognition dataset of multi-lingual singing voice audio annotated with singer, gender, and language labels. We use current state-of-the-art methods to demonstrate the difficulty of performing speaker recognition on singing voice using models trained on spoken voice alone. We also evaluate the effect of gender and language on speaker recognition performance in both spoken and singing voice data.

Below is a set of five random samples from the JukeBox dataset.

Speaker ID 10, Original audio file 0.wav
Metadata: Aimer | Japanese | female
Speaker ID 280, Original audio file 1.wav
Metadata: Shaan | Hindi | male
Speaker ID 719, Original audio file 1.wav
Metadata: Johnny Mathis | English | male
Speaker ID 772, Original audio file 1.wav
Metadata: Kirsty MacColl | English | female
Speaker ID 1134, Original audio file 2.wav
Metadata: Vanessa Paradis | English | female
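
The listing above pairs a speaker ID and audio file name with a "singer | language | gender" metadata line. As a minimal sketch, the following Python snippet parses one such pair into a record. It assumes only the text layout shown on this page; the `Sample` class and `parse_sample` function are hypothetical names introduced for illustration, and the metadata files distributed with the dataset may be organized differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One entry of the sample listing (hypothetical record type)."""
    speaker_id: int
    audio_file: str
    singer: str
    language: str
    gender: str


def parse_sample(id_line: str, meta_line: str) -> Sample:
    """Parse the two-line layout used in the listing on this page.

    Assumed layout (taken from the page text, not from the dataset files):
        Speaker ID 10, Original audio file 0.wav
        Metadata: Aimer | Japanese | female
    """
    # Split "Speaker ID 10" from "Original audio file 0.wav".
    id_part, file_part = id_line.split(",", 1)
    speaker_id = int(id_part.replace("Speaker ID", "").strip())
    audio_file = file_part.replace("Original audio file", "").strip()

    # Drop the "Metadata:" label, then split the three pipe-separated fields.
    fields = meta_line.split(":", 1)[1].split("|")
    singer, language, gender = (field.strip() for field in fields)

    return Sample(speaker_id, audio_file, singer, language, gender)


sample = parse_sample(
    "Speaker ID 10, Original audio file 0.wav",
    "Metadata: Aimer | Japanese | female",
)
print(sample)
```

Records of this shape can then be grouped by `language` or `gender`, the two factors whose effect on recognition performance is evaluated.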

A. Chowdhury, A. Cozzo, A. Ross, “JukeBox: A Multilingual Singer Recognition Dataset,” Proc. of Interspeech (Shanghai, China), October 2020.

