A dataset for lipreading using sequences of video frames

Video

Published: January 27, 2023

Lipreading is the task of decoding text from the movement of a speaker’s mouth. This dataset is based on LipNet, a model that maps a variable-length sequence of video frames to text. LipNet combines spatiotemporal convolutions, a recurrent network, and the connectionist temporal classification (CTC) loss, and is trained entirely end-to-end.
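To illustrate how a CTC-style output turns a variable-length sequence of per-frame predictions into text, here is a minimal sketch of greedy CTC collapsing. It is not LipNet's implementation: the function name `ctc_greedy_collapse`, the blank symbol `_`, and the sample frame labels are all assumptions made for this example. The rule itself (merge consecutive repeats, then drop blanks) is the standard CTC collapse.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol, chosen for this illustration


def ctc_greedy_collapse(frame_labels):
    """Collapse per-frame labels into text: merge repeats, then drop blanks."""
    # groupby yields one key per run of identical consecutive labels.
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != BLANK)


# Ten hypothetical video frames, one predicted label per frame.
frames = list("bb_ii_nn__")
print(ctc_greedy_collapse(frames))  # prints "bin"

# A blank between two identical labels keeps them as separate characters.
print(ctc_greedy_collapse(list("hh_e_ll_l_oo")))  # prints "hello"
```

Because repeats are merged before blanks are removed, the same word can be recovered from clips of different lengths, which is what lets the model be trained without frame-level alignment between video and text.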

Dataset Technical Specification

Number of files: 100
Total dataset size:
Duration:
Format: wav
Sample rate:
Resolution:

Dataset Demographics

Country: Worldwide
Gender: M/F 50-50%
Age: 18-55
Number of participants:
