Image classification, the cornerstone of computer vision, unlocks a world of possibilities. Imagine AI systems that diagnose diseases from medical scans, robots navigating environments seamlessly, or self-driving cars recognising traffic signs. These marvels rely on robust datasets, the training ground for AI models. But with countless options available, which dataset is right for you?
This curated list explores 13+ diverse image classification datasets, catering to various project needs and complexities. Whether you need images of everyday objects, people, nature scenes or something more niche, you’ll find some helpful training data here.
Popular Image Classification Datasets
1. MNIST
The MNIST database of handwritten digits is one of the most classic machine learning datasets. With 60,000 training images and 10,000 test images of 0-9 digits (10 classes of digits), MNIST is excellent for benchmarking image classification models. Ideal for testing basic algorithms and understanding image classification fundamentals.
Official dataset page
2. CIFAR-10/100
This dataset is known for its manageability and is composed of 60,000 32×32 color images, neatly divided into 10 classes with 6,000 images per class. Of these, 50,000 serve as the training subset, with the remaining 10,000 earmarked for testing. The CIFAR-10’s moderate size makes it ideal for experiments where computational resources are limited. Perfect for benchmarking and exploring convolutional neural networks (CNNs).
The dataset’s classes are listed below, along with ten randomly selected photos from each:
Similar to its sibling, CIFAR-100 ramps up complexity by offering 100 classes and are grouped into 20 superclasses each containing 600 images. Broken down, that’s 500 images per class for training and a hundred for testing. This increase in categories presents a more challenging scenario for models, sharpening the skills of classification algorithms. The dataset’s classes are listed below:
Superclass | Classes |
aquatic mammals | beaver, dolphin, otter, seal, whale |
fish | aquarium fish, flatfish, ray, shark, trout |
flowers | orchids, poppies, roses, sunflowers, tulips |
food containers | bottles, bowls, cans, cups, plates |
fruit and vegetables | apples, mushrooms, oranges, pears, sweet peppers |
household electrical devices | clock, computer keyboard, lamp, telephone, television |
household furniture | bed, chair, couch, table, wardrobe |
insects | bee, beetle, butterfly, caterpillar, cockroach |
large carnivores | bear, leopard, lion, tiger, wolf |
large man-made outdoor things | bridge, castle, house, road, skyscraper |
large natural outdoor scenes | cloud, forest, mountain, plain, sea |
large omnivores and herbivores | camel, cattle, chimpanzee, elephant, kangaroo |
medium-sized mammals | fox, porcupine, possum, raccoon, skunk |
non-insect invertebrates | crab, lobster, snail, spider, worm |
people | baby, boy, girl, man, woman |
reptiles | crocodile, dinosaur, lizard, snake, turtle |
small mammals | hamster, mouse, rabbit, shrew, squirrel |
trees | maple, oak, palm, pine, willow |
vehicles 1 | bicycle, bus, motorcycle, pickup truck, train |
vehicles 2 | lawn-mower, rocket, streetcar, tank, tractor |
3. ImageNet
The behemoth of image classification, boasting 14 million of hand-annotated images across thousands of categories. The vastness and depth of ImageNet provide a rigorous benchmark for image classification dataset prowess. Ideal for large-scale training and pushing the boundaries of AI.
- Total number of non-empty WordNet synsets: 21841
- Total number of images: 14197122
- Number of images with bounding box annotations: 1,034,908
- Number of synsets with SIFT features: 1000
- Number of images with SIFT features: 1.2 million
4. ObjectNet
Crowdsourcing was used to gather test set photos for the ObjectNet dataset. Because it contains objects in odd places inside realistic, intricate backgrounds, this image set is distinct. Algorithms for object recognition suffer greatly from these crowded environments. For assessing the resilience of an algorithm taught via transfer learning, this makes it perfect.
ObjectNet is distinct from ImageNet and CIFAR-100 because it is intended only for computer vision system testing, not as a training dataset.
Details of the dataset:
- 50,000 test photos with controls for viewpoint, rotation, and backdrop
- 313 distinct object classes, 113 of which have ImageNet overlap
5. Scene Understanding (SUN) dataset
The Scene Categorisation benchmark was created using this dataset, which was made available by Princeton University.
Details of the dataset:
- 108,753 photos, of which 76,128 are training photos
- 10,875 pictures of validation
- 21,750 test pictures
- 397 groups
- 100 JPEG pictures minimum per category
- a maximum of 120,000 pixels per picture
6. Intel Image Classification dataset
The Intel Image Classification dataset, initially compiled by Intel, contains approximately 25,000 images of natural scenes from around the world. The images are divided into categories such as mountains, glaciers, seas, forests, buildings, and streets.
Details of the dataset:
- ~25,000 images are grouped into categories like streets, buildings, woods, mountains, seas, and glaciers.
- 14,000 training images; 3,000 validation; 7,000 test images
Image Classification Datasets for Specialised Domains
7. Open Images V7
Embrace diversity with ~9 million images, annotated with object bounding boxes, object segmentation masks, visual relationships, and localised narratives
The dataset is the largest one currently available with object position annotations, containing a total of 16 million bounding boxes for 600 object classes on 1.9 million photos.
Official dataset page
8. Food-101
Craving image recognition deliciousness? This dataset consists of 101,000 images of diverse dishes for restaurant recommendation systems or dietary analysis. With 750 training and 250 test images for each category, the labels for test images have been manually cleaned. Although the training set does contain some noise
Official dataset page
9. Fashion-MNIST
Dress your AI with 70,000 28×28 fashion images. It is divided into a training set with 60,000 images and a test set with 10,000 images. Each example is a 28 by 28 pixel grayscale image associated with a label from 10 classes. Perfect for e-commerce applications or personal style recommendations.
Official dataset page
Real-World Challenges Image classification datasets
10. COCO (Microsoft Common Objects in Context)
Immerse yourself in 330,000 images, each annotated with 80 object categories with 5 captions describing the scene. Ideal for object detection, segmentation, captioning tasks, and training models to understand complex visual relationships.
Official dataset page
11. Places365
This is a scene recognition dataset which consists of 10 million images comprising 434 scene classes. The dataset comes in two versions: Places365-Standard, which has 1.8 million train and 36000 validation images from K=365 scene classes, and Places365-Challenge-2016, which has 6.2 million extra images in the training set and adds 69 new scene classes (for a total of 8 million train images from 434 scene classes).
Official dataset page
12. CelebA
This dataset is ideal for testing and training facial recognition, emotion detection, and demographic analysis models especially those that identify facial features like brown hair, smiles, and spectacles wearers.
Details:
- 202,599 number of face images of various celebrities
- 10,177 unique identities, but names of identities are not given
- 40 binary attribute annotations per image
- 5 landmark locations
Other datasets for Image Classification
13. DeepGlobe
It is a large-scale geographic dataset that was created for research in deep learning for automated feature extraction from satellite imagery. Details:
- It contains over 1.17 million geographic images extracted from DigitalGlobe satellites covering rural areas, urban areas, mountains, roads, water bodies, and forests across the globe.
- The images are high-resolution with 30 cm per pixel. The images are in RGB-NIR format with 4 channels – red, green, blue and near-infrared.
- The dataset has three main challenges: road extraction, building detection, and land cover classification.
- Road extraction: 8,579 images with pixel-level annotations of roads.
- Building detection: 24,586 building footprint polygon annotations across 1,146 image chips.
- Land cover classification: 1.17 million images categorised into 7 land cover classes – urban, agriculture, rangeland, forest, water, barren and unknown.
14. FGVC Aircraft
This dataset is for fine-grained image classification of aircraft types for classification or detection tasks. It contains 10,200 images spanning 102 different aircraft variants, such as Boeing 747-400, Airbus A320, etc. The images are colour images of varying sizes. Commonly used to evaluate fine-grained visual categorisation (FGVC) models which aim to distinguish between sub-categories within a broader category like aircraft or birds. Here are some key details about it:
Official dataset page
To help make model-building easier, we have put together a list of over 150 Open Audio and Video Datasets.
Remember, the perfect dataset doesn’t exist. Consider factors like task complexity, image quality, annotation detail, and computational resources before diving in.
And if you can’t find the ideal dataset? We’ve got you covered! At Twine AI, we specialise in creating custom image classification datasets tailored to your specific needs. Contact us today to discuss your unique project and unlock the power of custom-built data!