Find Good Data Sets or Sources

Last updated by Simon Späti

Good data sets are always useful for data engineering projects. The sources listed below can help you find one.

# Data Sets

Sources that offer great datasets:

  • Kaggle Datasets: A platform for data science and machine learning competitions hosting a wide variety of datasets on topics like finance, sports, healthcare, and more.
  • Google Dataset Search: Google’s tool for finding datasets across the web, now included in default search as of 2023-03-05. Learn more at the Google AI Blog.
  • New York Taxi Dataset: The well-known NYC taxi datasets, now published in the more efficient Apache Parquet format (CSV is still available as well). See also NYC Taxi Dataset.
  • OpenData: Open-data initiatives that aim to make data freely available to the public.
    • CH: OpenData Swiss: Less relevant unless you’re in Switzerland.
    • US: Data.gov: The US government’s open data repository containing datasets from various agencies and organizations, covering topics including agriculture, climate, education, energy, and more.
  • SeattleDataGuy: Great resources for data engineering projects, including a video suggesting 5 data sources.
  • Data Is Plural: A weekly newsletter of useful and curious datasets with a Markdown archive and RSS/Atom feeds.
  • UCI Machine Learning Repository: A collection of databases, domain theories, and data generators used by the machine learning community for research purposes, containing datasets related to various domains including text, image, and time-series data.
  • Awesome Public Datasets: A GitHub repository containing a curated list of high-quality public datasets on various topics such as agriculture, biology, climate, economics, education, and more.
  • FiveThirtyEight: A website that focuses on opinion poll analysis, politics, economics, and sports blogging with datasets available.
  • CERN Open Data Portal: Open data from CERN’s research and experiments.
  • NOAA Weather data: Weather and climate data from the National Oceanic and Atmospheric Administration.
  • Hugging Face Datasets: A collection of datasets for machine learning and AI research by Hugging Face.
  • Wikipedia Data sets
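
Several of the sources above offer CSV downloads, and the NYC taxi entry notes the move to Apache Parquet. One practical difference is that CSV stores every value as text, while Parquet keeps a typed schema alongside the data. The sketch below takes a quick first look at a CSV file using only the Python standard library: it counts rows and guesses a type per column. The sample rows and column names are made up for illustration (they are not the real taxi schema), and `infer_type` and `profile_csv` are hypothetical helper names, not part of any library.

```python
import csv
import io

# Made-up sample rows; the column names are illustrative, not the real taxi schema.
SAMPLE_CSV = """pickup_time,passenger_count,fare_amount
2022-01-01 00:15:00,1,12.5
2022-01-01 00:40:00,2,8.0
2022-01-01 01:05:00,1,30.25
"""


def infer_type(values):
    """Return the narrowest of int, float, or str that fits every value."""
    for caster in (int, float):
        try:
            for value in values:
                caster(value)
            return caster.__name__
        except ValueError:
            continue
    return "str"


def profile_csv(text):
    """Count rows and guess a type for each column of CSV text."""
    # csv.DictReader yields every field as a string, whatever it looks like.
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = rows[0].keys() if rows else []
    types = {col: infer_type([row[col] for row in rows]) for col in columns}
    return len(rows), types


row_count, column_types = profile_csv(SAMPLE_CSV)
print(row_count, column_types)
```

For a downloaded file, the same helper could be fed the result of `open(path).read()`. With large files it would likely be better to sample the first few thousand rows than to load everything into memory.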

# Real-Estate

# APIs

# Other Lists

# Further Reads


Created 2022-05-17