A Large Scale Speaker Verification Dataset on Camera

The wordcloud is generated by the videos' tags.

About

Important Notice: Our released dataset contains only annotation data, including YouTube links, timestamps, and speaker labels. We do not release audio or visual data. It is the user's responsibility to decide whether and how to download the video data, and whether their intended use of the downloaded data is legal in their country.

1.4M/1M+ Utterances

We annotate over 1.4M/1M audio/video segments from short videos on YouTube, covering contexts such as podcasts, lives, and live-streaming highlights.

38K/18K+ Speakers

The speakers in our dataset come from 130+ countries and span multiple language families.

2100/1400+ Hours

The scenarios covered are more in line with real-life situations since we use short videos as the data source.

Features

Some new features of VoxBlink and VoxBlink-clean!

Gender

Languages

Location

Themes

Duration

Time-Varying

Publications

Please cite the following if you make use of the dataset.

Yuke Lin, Xiaoyi Qin, Ming Cheng, Ning Jiang, Guoqing Zhao, Haiying Wu, Ming Li

VoxBlink: A Large Scale Speaker Verification Dataset on Camera, ICASSP 2024

Bibtex |  Abstract |  PDF  

Guidance

Build your VoxBlink

Resource

The annotation files and meta-resources can be downloaded here. Apply for access to the resources and we will process your request shortly.
1. Video information is stored in the ./video_tags folder. To obtain it, first decompress video_tags.tar.gz. See video_tag_example for an example of these files.
2. The timestamp files are stored in the ./timestamp folder. To obtain them, first decompress timestamp.tar.gz. See timestamp_example for an example timestamp file.
3. Two versions of the VoxBlink dataset can be found in ./data. The utterances follow this naming rule:
speaker_id-video_id-utterance_id
4. The lists of videos to download are saved in ./video_list. We provide three versions:
  • spk2videos_full: Download the complete version of VoxBlink (38K speakers).
  • spk2videos_clean: Download the clean version of VoxBlink (18K speakers).
  • spk2videos_test: Test whether the scripts are runnable (one speaker).
5. Other meta information (duration, language, location, gender) is listed in ./meta.
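
The naming rule speaker_id-video_id-utterance_id can be split back into its fields, but a naive split on every hyphen may fail because YouTube video IDs can themselves contain hyphens. The sketch below is one possible approach, not part of the released scripts. It assumes that speaker_id and utterance_id never contain a hyphen, and the function name and the sample utterance name are hypothetical.

```python
from typing import NamedTuple


class UtteranceName(NamedTuple):
    speaker_id: str
    video_id: str
    utterance_id: str


def parse_utterance_name(name: str) -> UtteranceName:
    """Split 'speaker_id-video_id-utterance_id' into its three fields."""
    # Assumption: speaker_id and utterance_id contain no hyphens, so the first
    # and the last hyphen are the field separators. The video_id in between may
    # contain hyphens of its own, as YouTube video IDs can.
    head, sep1, rest = name.partition("-")
    video_id, sep2, tail = rest.rpartition("-")
    if not (sep1 and sep2 and head and video_id and tail):
        raise ValueError(f"unexpected utterance name: {name!r}")
    return UtteranceName(head, video_id, tail)


if __name__ == "__main__":
    # Hypothetical name, made up for illustration only.
    print(parse_utterance_name("id00001-a1B-c3D_e5F-00007"))
```

If the real speaker or utterance IDs turn out to contain hyphens, the split points would need to be chosen differently, for example by relying on a fixed video ID length.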

Execute

After downloading the annotation files, follow the guidance in the Repo to build your database from ./video_list and ./timestamp.
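
Before the build scripts can read ./video_tags and ./timestamp, the two archives named under Resource have to be unpacked. The following is a minimal sketch of that step using only the Python standard library. It assumes the archives sit in the current working directory, and the helper name extract_archive is hypothetical and not part of the released scripts.

```python
from __future__ import annotations

import tarfile
from pathlib import Path


def extract_archive(archive: Path, dest: Path) -> list[str]:
    """Unpack a .tar.gz archive into dest and return the member names."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Safer extraction on Python versions that support filters.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names


if __name__ == "__main__":
    # Archive names as given in the Resource section; their location in the
    # current directory is an assumption.
    for name in ("video_tags.tar.gz", "timestamp.tar.gz"):
        archive = Path(name)
        if archive.exists():
            members = extract_archive(archive, Path("."))
            print(f"{name}: extracted {len(members)} entries")
        else:
            print(f"{name}: not found, skipping")
```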

License

The open-source resources and the execution scripts are licensed under CC BY-NC-SA 4.0. Detailed terms can be found in LICENSE. If you have legal concerns about privacy conflicts in using the data, please consult a lawyer in your region. The metadata provided is accurate as of June 2023. We cannot guarantee the future availability of videos on the YouTube platform. YouTube users with concerns about the inclusion of their videos in our dataset may contact us by e-mail: yuke.lin@dukekunshan.edu.cn or ming.li369@dukekunshan.edu.cn.

Acknowledgement

This research is funded in part by the National Natural Science Foundation of China (62171207), the Science and Technology Program of Suzhou City (SYC2022051), and MaShang Consumer Finance Co., Ltd. Many thanks for the computational resources provided by the Advanced Computing East China Sub-Center.