Get speech datasets

With our community of 400,000+ skilled professionals, we can provide both voice datasets and transcription services.
Speech datasets for voice recognition
Trusted by leading generative AI teams, public companies, and startups

World languages

We cover 163 languages and thousands of dialects. Our network of 500,000+ freelancers gives us unparalleled scale.

Draw on different languages, dialects, and cultures to reduce bias in your model.
Contact us
Example AI data request

Accurate transcription

For voice datasets, we work with global data collectors using our speech recording software, or with professional voice actors who provide vocal data from their own recording studios.

For transcription, people in our network convert speech to text by hand, producing transcripts you can use to train ML models.
Get started

Here's what our customers say

"Working with Twine enabled us to scale projects quicker than before."
-Josh Bolland
CEO, J B Cole
"Working with Twine AI has been an exceptional experience. Their ability to consistently deliver data and the level of service, professionalism, and dedication to understanding our needs set them apart."
-Ian Sherwin
Head of Data, Hypersurfaces
Trustpilot: 5-star rating, 108 reviews

How we work

1. Technical Meeting

Scope your full data collection or data annotation project.

2. Proof of Concept

We deliver an initial proof of concept to demonstrate feasibility.

3. Full Project Delivery

A dedicated Twine Projects team manages delivery, with flexible monthly billing and QA services.
Book a meeting

Benefits of Twine AI


Quality

Highest-quality audio recordings in uncompressed WAV format at 44kHz, 16-bit. Ideal for training speech recognition models, whether for on-chip or cloud-based software.
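If you want to confirm that delivered files match a format like the one described above, a quick check might look like the sketch below. It uses only Python's standard `wave` module. The function names (`wav_format`, `matches_spec`, `make_silent_wav`) are hypothetical, and reading "44kHz" as 44,100 Hz is an assumption, so the rate is left as a parameter.

```python
import io
import wave


def wav_format(source):
    """Return (channels, bits per sample, sample rate in Hz) for a PCM WAV file.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        return wav.getnchannels(), wav.getsampwidth() * 8, wav.getframerate()


def matches_spec(source, bits=16, rate_hz=44100):
    """Check bit depth and sample rate against the expected values.

    The default rate of 44100 Hz is an assumption about what "44kHz" means.
    """
    _, actual_bits, actual_rate = wav_format(source)
    return actual_bits == bits and actual_rate == rate_hz


def make_silent_wav(bits=16, rate_hz=44100, seconds=0.1):
    """Build a short mono WAV of silence in memory, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(bits // 8)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00" * (bits // 8) * int(rate_hz * seconds))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(wav_format(make_silent_wav()))
    print(matches_spec(make_silent_wav(rate_hz=16000)))
```

In practice you would pass the path of a delivered file to `matches_spec` instead of the in-memory sample.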

Scalability

Twine specializes in building voice datasets in major global languages with thousands of dialects. Whether the script is complex or simple, our participants are vetted through our QA process to help train your models.

Customizable

Participants can follow a script or record conversational audio. You can set requirements for gender balance, accent variety, and specific languages.
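As an illustration of how a balance requirement could be checked once recordings arrive, here is a minimal sketch. It assumes each recording comes with a metadata record containing fields such as `gender` and `accent`. Those field names, the sample records, and the helper functions are hypothetical and not a description of Twine's delivery format.

```python
from collections import Counter


def share_by(records, field):
    """Return the fraction of records holding each value of `field`."""
    counts = Counter(record[field] for record in records)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def meets_minimum_share(records, field, minimum):
    """True if every value listed in `minimum` reaches its required fraction."""
    shares = share_by(records, field)
    return all(shares.get(value, 0.0) >= required
               for value, required in minimum.items())


if __name__ == "__main__":
    # Hypothetical metadata, one record per recording.
    records = [
        {"speaker": "s1", "gender": "female", "accent": "scottish"},
        {"speaker": "s2", "gender": "male", "accent": "scottish"},
        {"speaker": "s3", "gender": "female", "accent": "welsh"},
        {"speaker": "s4", "gender": "male", "accent": "irish"},
    ]
    print(share_by(records, "gender"))
    print(meets_minimum_share(records, "gender", {"female": 0.4, "male": 0.4}))
    print(meets_minimum_share(records, "accent", {"welsh": 0.3}))
```

The same check works for any categorical field, so one helper can cover gender, accent, and language requirements.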

Audio Formats

Files can be delivered as needed for your project, whether split or consolidated. Whether you prefer compressed or uncompressed audio, we can deliver what you need with the relevant metadata.

Project Manager

Have your data collection project run by an experienced Project Manager who can ensure all participants are following instructions and work with you to improve the collection process.

Payments

We take away the headache of paying participants from all over the world with our integrated payment solution.

Contact Us


How do I find out more about Twine AI?

Other kinds of data we provide:

Vision data:
We can work with long-range biometrics, meaning we create video datasets with participants at long distances from the camera. This can be across a wide range of demographics including gender, ethnicity, age, and body size. Alternatively, we can look at facial biometrics by working with participants at a close range. Whilst creating these types of video datasets, the demographics we can look at include gender, ethnicity, age, and facial distinctions (eye colour, glasses, etc).
Our other AI resources:
We like to keep our audience well informed on everything regarding data. Our Twine Blog has its own AI category, and within it, we have listicles of the highest-quality open-source datasets available right now. We have an article on 100+ Open Audio and Video Datasets, 100+ Speech Datasets, and listicles of datasets in almost every language you can think of!

We also have an AI Newsletter, which we send out to our AI/ML audience, providing them with the latest industry news.

Want to be in the loop on LinkedIn? Check out our Twine AI LinkedIn page, where we post our latest dataset listicles and other exciting articles and media from the AI/ML space.