These off-the-shelf video data sets can be used for computer vision applications such as facial recognition, object detection, and other visual recognition use cases. Don’t see the video data you need? Contact us for a free quote.
Casual Conversations is a large-scale multimodal (video + audio) benchmark dataset built to evaluate and audit computer vision and speech models for accuracy across diverse ages, genders, apparent skin tones, and lighting conditions.
These expressions are produced at two levels of emotional intensity (regular and strong), except for the neutral emotion, which is produced only at regular intensity.
Each video contains a number of procedure steps for completing a recipe. Every procedure segment is temporally localized in the video with a start time and an end time. The distributions of 1) video duration, 2) number of recipe steps per video, 3) recipe segment duration, and 4) number of words per sentence are shown below.
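As a rough illustration of how such annotations could be consumed, here is a minimal Python sketch assuming a hypothetical JSON layout in which each video ID maps to a list of segments with "sentence", "start", and "end" fields; the field names and file structure are assumptions, not the dataset's actual schema.

```python
import json
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RecipeSegment:
    """One temporally localized procedure step within a video."""
    sentence: str      # recipe step description
    start_sec: float   # segment start time in seconds
    end_sec: float     # segment end time in seconds

    @property
    def duration(self) -> float:
        return self.end_sec - self.start_sec


def load_segments(annotation_path: str) -> Dict[str, List[RecipeSegment]]:
    """Load a hypothetical annotation file mapping video IDs to their segments."""
    with open(annotation_path) as f:
        raw = json.load(f)
    videos: Dict[str, List[RecipeSegment]] = {}
    for video_id, entry in raw.items():
        videos[video_id] = [
            RecipeSegment(seg["sentence"], seg["start"], seg["end"])
            for seg in entry["segments"]
        ]
    return videos


if __name__ == "__main__":
    # Example: summarize the per-video step counts and segment durations.
    videos = load_segments("annotations.json")  # hypothetical file name
    for video_id, segments in videos.items():
        avg_dur = sum(s.duration for s in segments) / len(segments)
        print(f"{video_id}: {len(segments)} steps, avg segment {avg_dur:.1f}s")
```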
This corpus contains 15 spontaneous dialogues and multi-participant conversations by deaf signers; 10 were recorded in authentic settings such as a deaf club and a bar, and the remaining 5 were recorded in the lab.
Human lipreading performance increases for longer words, indicating the importance of features capturing temporal context in an ambiguous communication channel.