The Best Geospatial Datasets of 2022

Geospatial data is in high demand for training machine learning models. That said, geospatial datasets suitable for training are not always easy to find.

That’s why we’ve done the tricky bit for you. We’ve searched high and low here at Twine to find the best geospatial datasets.

Are you ready?

Let’s dive in.


Here are our top picks for geospatial datasets:

Global Self-consistent, Hierarchical, High-resolution Geography Dataset (GSHHG)

This dataset contains high-resolution geography data, amalgamated from two databases: World Vector Shorelines (WVS) and CIA World Data Bank II (WDBII). The former is the basis for shorelines while the latter is the basis for lakes, although there are instances where differences in coastline representations necessitated adding WDBII islands to GSHHG. 

GSHHG combines the older GSHHS shoreline database with WDBII rivers and borders, and is available in either ESRI shapefile format or a native binary format.

Access the dataset

Global 1-km Consensus Land Cover Dataset

This dataset integrates multiple global remote sensing-derived land-cover products and provides consensus information on the prevalence of 12 land-cover classes at 1-km resolution. It contains 12 data layers, one per land-cover class. All data layers span 90°N to 56°S and 180°W to 180°E at a spatial resolution of 30 arc-seconds per pixel (~1 km per pixel at the equator).
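To get a feel for what those numbers mean in practice, here is a rough sketch of the arithmetic behind the grid. It is not taken from the dataset's documentation: the function names are hypothetical, and the equatorial circumference (about 40,075 km) is an assumption we supply ourselves.

```python
import math

ARC_SECONDS_PER_DEGREE = 3600
RESOLUTION_ARCSEC = 30  # stated resolution of each data layer
# Assumption: approximate equatorial circumference of the Earth, in km.
EARTH_EQUATORIAL_CIRCUMFERENCE_KM = 40075.017

# 3600 / 30 = 120 pixels for every degree of latitude or longitude.
PIXELS_PER_DEGREE = ARC_SECONDS_PER_DEGREE // RESOLUTION_ARCSEC


def grid_shape(lat_north, lat_south, lon_west, lon_east):
    """Return (rows, cols) of a regular grid covering the given extent in degrees."""
    rows = round((lat_north - lat_south) * PIXELS_PER_DEGREE)
    cols = round((lon_east - lon_west) * PIXELS_PER_DEGREE)
    return rows, cols


def pixel_width_km(latitude_deg):
    """Approximate east-west width of one pixel at a given latitude."""
    km_per_degree_at_equator = EARTH_EQUATORIAL_CIRCUMFERENCE_KM / 360
    width_at_equator = km_per_degree_at_equator / PIXELS_PER_DEGREE
    # Meridians converge towards the poles, so pixels get narrower.
    return width_at_equator * math.cos(math.radians(latitude_deg))


# Extent from the description: 90N to 56S, 180W to 180E (south and west are negative).
rows, cols = grid_shape(90, -56, -180, 180)
print(rows, cols)                    # 17520 43200
print(round(pixel_width_km(0), 2))   # 0.93 (km at the equator)
```

That works out to roughly 757 million pixels per layer, so it is worth planning to read the layers in tiles or windows rather than loading everything into memory at once.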

Access the dataset

Satellite Data for Air Quality Database

With support from NASA, the Holloway Group at SAGE has developed a set of user-friendly datasets to support the wider utilization of remote sensing data for air quality and health. This growing inventory of data includes:

  • Shapefiles of NO2 air pollution from satellite for use in GIS platforms, including the EPA’s EJSCREEN platform for environmental justice
  • 12 km x 12 km daily gridded data of NO2 air pollution from satellite for comparison with photochemical grid model output or other data sources

Access the dataset

Greenhouse Gas Emissions on Croplands Dataset

This dataset provides global, crop-specific estimates (circa 2000) of greenhouse gas (GHG) emissions and GHG intensity in high spatial detail, reporting the effects of rice paddy management, peatland draining, and nitrogen (N) fertilizer on CH4, CO2, and N2O emissions.

Access the dataset

World Port Index Dataset

This dataset from the National Geospatial-Intelligence Agency lists approximately 3700 ports across the world, along with each port's location and the facilities it offers. It provides global maritime geospatial intelligence in support of national security objectives, including the safety of navigation, international obligations, and joint military operations.

Access the dataset


Wrapping up

To conclude, here are the top picks for the best geospatial datasets for your projects:

  1. Global Self-consistent, Hierarchical, High-resolution Geography Dataset (GSHHG)
  2. Global 1-km Consensus Land Cover Dataset
  3. Satellite Data for Air Quality Database
  4. Greenhouse Gas Emissions on Croplands Dataset
  5. World Port Index Dataset

We hope that this list has helped you find a dataset for your project, or at least shown you the myriad options available.

Please let us know if there are any datasets you would like us to add to the list.

If you want to learn more about how we could help build a custom dataset for your project, don’t hesitate to contact us!

Let us help you do the math – check our AI dataset project calculator.

Twine AI

Harness Twine’s established global community of over 400,000 freelancers from 190+ countries to scale your dataset collection quickly. We have systems to record, annotate and verify custom video datasets at an order of magnitude lower cost than existing methods.