A Large Scale Speaker Verification Dataset on Camera

The word cloud is generated from the videos' tags.


Important Notice: Our released dataset contains annotation data only: the YouTube links, timestamps, and speaker labels. We do not release audio or visual data. It is the user's responsibility to decide whether and how to download the video data, and whether their intended use of the downloaded data is legal in their country.

1.4M/1M+ Utterances

We annotate over 1.4M/1M audio/video segments from short videos on YouTube, encompassing various contexts including podcasts, lives, live streaming highlights, etc.

38K/18K+ Speakers

The speakers in our dataset come from 130+ countries, spanning multiple families of languages.

2100/1400+ Hours

The scenarios covered are more in line with real-life situations since we use short videos as the data source.


Some new features of VoxBlink and VoxBlink-clean!








Please cite the following if you make use of the dataset.

Yuke Lin, Xiaoyi Qin, Ming Cheng, Ning Jiang, Guoqing Zhao, Haiying Wu, Ming Li

VoxBlink: A Large Scale Speaker Verification Dataset on Camera, ICASSP 2024



Build your VoxBlink


         The annotation files and meta-resources can be downloaded here. Apply for access and we will process your request shortly.
         1. Video information is stored in the ./video_tags folder. To obtain it, first decompress video_tags.tar.gz. See video_tag_example for an example of these files.
         2. The timestamp files are stored in the ./timestamp folder. To obtain them, first decompress timestamp.tar.gz. See timestamp_example for an example of a timestamp file.
         3. Two versions of the VoxBlink dataset can be found in ./data. The utterances adhere to the following naming rules:
         4. The lists of videos to be downloaded are saved in ./video_list. We provide three versions:
  • spk2videos_full: Download the complete version of VoxBlink (38K speakers).
  • spk2videos_clean: Download the clean version of VoxBlink (18K speakers).
  • spk2videos_test: Test whether the scripts are runnable (one speaker).
         5. Other meta information (duration, lingual, location, gender) is listed in ./meta.
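
         Steps 1 and 2 above ask you to decompress two archives before use. A minimal Python sketch of that step might look like the following. The archive names come from the list above; the function name extract_archives and the assumption that both archives sit in the same download directory are illustrative and not part of the released scripts.

```python
import tarfile
from pathlib import Path

# Archive names are taken from steps 1 and 2; their location is an assumption.
ARCHIVES = ("video_tags.tar.gz", "timestamp.tar.gz")


def extract_archives(root, archives=ARCHIVES):
    """Extract each archive found directly under `root` into `root`.

    Returns the names of the archives that were actually extracted.
    (Hypothetical helper, not part of the VoxBlink repository.)
    """
    root = Path(root)
    extracted = []
    for name in archives:
        path = root / name
        if not path.is_file():
            # Skip archives that have not been downloaded yet.
            continue
        with tarfile.open(path, "r:gz") as tar:
            tar.extractall(root)
        extracted.append(name)
    return extracted
```

         Calling extract_archives(".") from the download directory would unpack whichever of the two archives are present and report which ones were found.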


         After downloading the annotation files, follow the guidance in the Repo to build your database from ./video_list and ./timestamp.


         The open-source resources and the execution scripts are licensed under CC BY-NC-SA 4.0. Detailed terms can be found in LICENSE. If you have legal concerns about privacy conflicts in using the data, please consult a lawyer in your region. The metadata provided is accurate as of June 2023. We cannot guarantee that the videos will remain available on YouTube in the future. YouTube users with concerns about the inclusion of their videos in our dataset can contact us by e-mail at yuke.lin@dukekunshan.edu.cn or ming.li369@dukekunshan.edu.cn.


         This research is funded in part by the National Natural Science Foundation of China (62171207), the Science and Technology Program of Suzhou City (SYC2022051), and MaShang Consumer Finance Co., Ltd. Many thanks to the Advanced Computing East China Sub-Center for providing computational resources.