Speech Data Collection for AI Models

Accurate transcription

We source voice datasets from global data collectors using our speech recording software, or from professional voice actors who record in their own studios.

For transcription, our network of human transcribers converts speech to text, producing data you can use to train ML models.

Here's what our customers say

"We're very happy with the videos. The results are great. Twine has exceeded our expectations, and we look forward to the next phase of our collaboration."

"Working with Twine AI has been an exceptional experience. Their ability to consistently deliver data and the level of service, professionalism, and dedication to understanding our needs set them apart."

-Ian Sherwin

Head of Data, Hypersurfaces

108 reviews

How we work

Project Scoping

Define your project goals, data needs, and quality standards with a dedicated Project Manager.

Production & Management

We recruit, vet, and train experts to work on your project. We run quality control workflows and handle secure global payments.

Delivery & Iteration

Your Project Manager ensures on-time delivery with continuous QA and flexible monthly billing, iterating based on your feedback.

Benefits of Twine AI

Quality

Highest-quality audio recordings in uncompressed 44kHz, 16-bit WAV format, ideal for training speech recognition models for either on-chip or cloud-based software.
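If you want to confirm that delivered files match this format before training, a check along the following lines may help. This is a minimal sketch using Python's standard `wave` module, not part of any Twine AI tooling. The function name `check_wav_format` is hypothetical, and the default of 44100 Hz is an assumption about what "44kHz" means here, so adjust it to the rate agreed for your project.

```python
import wave


def check_wav_format(path, expected_rate=44100, expected_bits=16):
    """Return a list of format problems; an empty list means the file matches.

    expected_rate defaults to 44100 Hz, which is an assumption about the
    "44kHz" figure quoted above. Pass a different value if your project
    specifies another sample rate.
    """
    problems = []
    try:
        # The wave module only opens uncompressed PCM WAV files, so a
        # compressed or malformed file raises wave.Error here.
        with wave.open(path, "rb") as wav:
            rate = wav.getframerate()
            bits = wav.getsampwidth() * 8  # sample width is given in bytes
    except wave.Error as exc:
        return [f"not a readable uncompressed WAV file: {exc}"]

    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if bits != expected_bits:
        problems.append(f"bit depth is {bits}-bit, expected {expected_bits}-bit")
    return problems
```

Calling `check_wav_format("recording.wav")` on each delivered file and logging any non-empty result is usually enough to catch a mismatch early.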

Scalability

Twine specializes in building voice datasets in major global languages with thousands of dialects. From complex scripts to simple ones, our participants are vetted through our QA process to help train your models.

Customizable

Participants can follow a script or record conversational audio. You can set requirements for gender balance, accent variety, and specific languages.
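One way to verify a requirement like gender balance or accent variety is to compute each group's share from the participant metadata. The sketch below is illustrative only: the field names (`gender`, `accent`) and the functions `attribute_shares` and `unmet_minimums` are hypothetical, and the real metadata layout would be whatever you agree for your project.

```python
from collections import Counter


def attribute_shares(participants, attribute):
    """Return each value's share (0.0 to 1.0) for one metadata attribute."""
    counts = Counter(p[attribute] for p in participants)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: count / total for value, count in counts.items()}


def unmet_minimums(participants, attribute, minimum_shares):
    """Return the values whose actual share falls below the required minimum.

    minimum_shares maps a value (for example "female") to the smallest
    acceptable share of participants. These thresholds are assumptions
    that you would set per project.
    """
    shares = attribute_shares(participants, attribute)
    return {
        value: shares.get(value, 0.0)
        for value, minimum in minimum_shares.items()
        if shares.get(value, 0.0) < minimum
    }
```

For example, `unmet_minimums(participants, "gender", {"female": 0.4, "male": 0.4})` returns only the groups that are currently under-represented, together with their actual share.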

Audio Formats

Files can be delivered as your project requires, whether split or consolidated. Whether you prefer compressed or uncompressed audio, we can deliver what you need with relevant metadata.

Project Manager

Have your data collection project run by an experienced Project Manager who can ensure all participants are following instructions and work with you to improve the collection process.

Payments

We take away the headache of paying participants from all over the world with our integrated payment solution.

How do I find out more about Twine AI?

Other kinds of data we provide:

Vision data:
We can work with long-range biometrics, meaning we create video datasets with participants at long distances from the camera. These can cover a wide range of demographics, including gender, ethnicity, age, and body size. Alternatively, we can work with facial biometrics by recording participants at close range. For these video datasets, the demographics we can cover include gender, ethnicity, age, and facial distinctions (eye colour, glasses, etc.).

Our other AI resources:
We like to keep our audience well informed on everything regarding data. Our Twine Blog has its own AI category, where we publish listicles of the highest-quality open-source datasets available right now. We have an article on 100+ Open Audio and Video Datasets, another on 100+ Speech Datasets, and listicles of datasets in almost every language you can think of!

We also have an AI Newsletter, which we send out to our AI/ML audience, providing them with the latest industry news.

Want to be in the loop on LinkedIn? Check out our Twine AI LinkedIn page, where we post our latest dataset listicles and other exciting articles and media from the AI/ML space.
