Find Good Data Sets or Sources

By Simon Späti · 3 min read

Good data sets are always useful for data engineering projects. The sources listed here can help you find one.

# Data Sets

Some sources that offer great datasets:

  • Kaggle Datasets: A platform for data science and machine learning competitions hosting a wide variety of datasets on topics like finance, sports, healthcare, and more.
  • Google Dataset Search: Google’s tool for finding datasets across the web, included in default search as of 2023-03-05. Learn more at the Google AI Blog.
  • New York Taxi Dataset: The well-known NYC taxi datasets, recently updated to the much better Apache Parquet format (CSV is still available as well). See also NYC Taxi Dataset.
  • OpenData: Open-data initiatives that aim to make data openly available.
    • CH: OpenData Swiss: Less relevant unless you’re in Switzerland.
    • US: Data.gov: The US government’s open data repository containing datasets from various agencies and organizations, covering topics including agriculture, climate, education, energy, and more.
  • SeattleDataGuy: Great resources for data engineering projects, including a video suggesting 5 data sources.
  • Data Is Plural: A weekly newsletter of useful and curious datasets with a Markdown archive and RSS/Atom feeds.
  • UCI Machine Learning Repository: A collection of databases, domain theories, and data generators used by the machine learning community for research purposes, containing datasets related to various domains including text, image, and time-series data.
  • Awesome Public Datasets: A GitHub repository containing a curated list of high-quality public datasets on various topics such as agriculture, biology, climate, economics, education, and more.
  • FiveThirtyEight: A website that focuses on opinion poll analysis, politics, economics, and sports blogging with datasets available.
  • CERN Open Data Portal: Open data from CERN’s research and experiments.
  • NOAA Weather data: Weather and climate data from the National Oceanic and Atmospheric Administration.
  • Hugging Face Datasets: A collection of datasets for machine learning and AI research by Hugging Face.
  • Wikipedia Data sets
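
As a rough illustration of why the Parquet versions of the NYC taxi data are convenient, here is a minimal Python sketch that builds the URL of one monthly file and loads it with pandas. The base URL and file-name pattern are assumptions based on how the monthly trip-record files have been published, and `trip_data_url` and `load_trips` are hypothetical helper names; check the NYC TLC trip record page for the current links before relying on them.

```python
from datetime import date

# Assumed URL pattern for the monthly trip-record files; verify it against
# the NYC TLC trip record page, since it may change.
BASE_URL = "https://d37ci6vzurychx.cloudfront.net/trip-data"


def trip_data_url(year: int, month: int, taxi: str = "yellow") -> str:
    """Build the (assumed) URL of one monthly Parquet file."""
    date(year, month, 1)  # raises ValueError for an invalid year or month
    return f"{BASE_URL}/{taxi}_tripdata_{year:04d}-{month:02d}.parquet"


def load_trips(year: int, month: int, taxi: str = "yellow"):
    """Download one month of trips into a pandas DataFrame."""
    # Imported lazily so the URL helper works without pandas installed.
    # Reading Parquet also needs an engine such as pyarrow.
    import pandas as pd

    return pd.read_parquet(trip_data_url(year, month, taxi))
```

Calling `load_trips(2023, 1)` would download a single month of yellow-taxi trips, assuming the file exists at that location. Because Parquet stores column types with the data, the resulting DataFrame needs no manual date or number parsing, which a CSV export would.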

# Real-Estate

# APIs

# Other Lists

# Further Reads


Origin: