The Best Egyptian Language Datasets of 2022

Egyptian Arabic is one of the most widely spoken varieties of Arabic. That being said, it’s not always easy to find Egyptian language datasets to train your models.

That’s why we’ve done the hard bit for you. We’ve searched high and low here at Twine to find the best Egyptian language datasets.

Are you ready?

Let’s dive in.


Here are our top picks for Egyptian Language datasets:

Egyptian Arabic Segmentation Dataset

This dataset contains 350 tweets with more than 8,000 words (including 3,000 unique words) written in the Egyptian dialect. The tweets are rich in dialectal content and cover most Egyptian phonological, morphological, and syntactic phenomena. The dataset also includes Twitter-specific elements of the text, such as #hashtags, @mentions, emoticons, and URLs.

Access the dataset
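If you plan to work with tweet text like that in the Egyptian Arabic Segmentation Dataset, it can help to separate the Twitter-specific tokens from ordinary words first. Here is a minimal sketch of that idea. The patterns, the function names (tag_tokens, count_kinds), and the sample tweet are our own illustrative assumptions, not part of the dataset or its tooling, and the dataset's actual file format may call for different handling.

```python
import re
from collections import Counter

# Hypothetical patterns for Twitter-specific tokens (an assumption for
# illustration; the dataset does not ship this tokenizer).
TOKEN_PATTERN = re.compile(
    r"(?P<url>https?://\S+)"   # URLs
    r"|(?P<mention>@\w+)"      # @mentions
    r"|(?P<hashtag>#\w+)"      # #hashtags
    r"|(?P<word>\w+)"          # ordinary words (\w covers Arabic letters)
    r"|(?P<other>\S)"          # any other single non-space character
)


def tag_tokens(tweet):
    """Return (token, kind) pairs for one tweet, in order of appearance."""
    return [(m.group(), m.lastgroup) for m in TOKEN_PATTERN.finditer(tweet)]


def count_kinds(tweets):
    """Count how many tokens of each kind appear across a list of tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(kind for _, kind in tag_tokens(tweet))
    return counts


if __name__ == "__main__":
    # Made-up sample tweet, not taken from the dataset.
    sample = "@user انا مبسوط #مصر https://example.com"
    for token, kind in tag_tokens(sample):
        print(kind, token)
```

Note that this sketch splits emoticons such as ":)" into single characters tagged "other", and it does not treat Arabic diacritics as part of a word, so a real pipeline would likely need extra rules for both.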

Egyptian Hieroglyphics Datasets

This dataset is designed for detecting and translating hieroglyphs with a real-time SSD (Single Shot Detector) object detection algorithm, with the aim of helping tourists unveil the mysteries of Ancient Egypt. Contains text files.

Access the dataset

Egyptian Arabic Conversational Speech Corpus

This open-source dataset consists of 5.5 hours of transcribed Egyptian Arabic conversational speech on certain topics, made up of nine conversations between two pairs of speakers.

Access the dataset

BOLT Egyptian Arabic-English Word Alignment Dataset

The BOLT Egyptian Arabic-English Word Alignment Dataset was developed by the Linguistic Data Consortium (LDC) and consists of 349,414 words of Egyptian Arabic and English parallel text enhanced with linguistic tags to indicate word relations. Contains text files.

Access the dataset


Wrapping up

To conclude, here are our top picks for the best Egyptian language datasets for your projects:

  1. Egyptian Arabic Segmentation Dataset
  2. Egyptian Hieroglyphics Datasets
  3. Egyptian Arabic Conversational Speech Corpus
  4. BOLT Egyptian Arabic-English Word Alignment Dataset

We hope that this list has either helped you find a dataset for your project or helped you realize the myriad of options available.

Please let us know if there are any datasets you would like us to add to the list.

If you would like to learn more about how we could help build a custom dataset for your project, don’t hesitate to contact us!

Let us help you do the math – check our AI dataset project calculator.

Twine AI

Harness Twine’s established global community of over 400,000 freelancers from 190+ countries to scale your dataset collection quickly. We have systems to record, annotate and verify custom video datasets at an order of magnitude lower cost than existing methods.