The dataset consists of up to 100 utterances of each of 500 different words, spoken by hundreds of different speakers. All videos are 29 frames (1.16 seconds, i.e. 25 frames per second) in length, and the word occurs in the middle of the video. The word duration is given in the metadata; because the word is centred in the clip, the start and end frames can be determined from it.
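As a minimal sketch of how the start and end frames might be recovered, the Python below centres the word in the 29-frame clip and converts its duration to a frame range. It assumes the metadata duration is expressed in seconds and uses the 25 fps rate implied by 29 frames over 1.16 seconds; the function name `word_frame_span` and the half-open `[start, end)` convention are illustrative choices, not part of the dataset.

```python
import math

TOTAL_FRAMES = 29   # every clip is 29 frames long
FPS = 25.0          # implied by 29 frames / 1.16 seconds


def word_frame_span(duration_s, total_frames=TOTAL_FRAMES, fps=FPS):
    """Return (start, end) frame indices for a word centred in the clip.

    The range is half-open: frames start, start + 1, ..., end - 1.
    Assumption: duration_s is the word duration in seconds.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")

    half_width = duration_s * fps / 2.0   # half the word length, in frames
    centre = total_frames / 2.0           # midpoint of the clip, in frames

    # Round outwards so the whole word is covered, then clamp to the clip.
    start = max(0, math.floor(centre - half_width))
    end = min(total_frames, math.ceil(centre + half_width))
    return start, end


if __name__ == "__main__":
    start, end = word_frame_span(0.4)
    print(f"word spans frames {start} to {end - 1} inclusive")
```

Rounding outwards keeps a frame on each side when the word boundary falls mid-frame; round inwards instead if only frames lying wholly inside the word are wanted.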