Ever since I started my data science journey, I have struggled to find datasets to practice my skills on. I still remember the day I started my first data science project: it took me 3 hours just to find a dataset on the Internet.
I have since completed a number of data science projects and worked with many different datasets, both publicly available and private. The datasets listed here may also be useful when preparing for a data scientist job interview.
Below is a table containing links to different datasets:
|Data.gov||This website hosts datasets made freely available by the US government. They cover a range of topics, from energy usage to electricity generation. Of course, these datasets do not contain any confidential information.||https://www.data.gov/|
|Kaggle||Kaggle is a kind of search engine for datasets, and it is also a platform where people come together to examine them. On a personal note, being part of the Kaggle community is very beneficial for learning data science, so definitely join.||https://www.kaggle.com/datasets|
|UNICEF||UNICEF is part of the UN and deals with issues affecting children across the globe. It has a number of datasets on children’s health and education.||https://data.unicef.org/resources/dataset/sowc-2019-statistical-tables/|
|Yelp||This website has a lot of datasets about restaurant reviews, product reviews, etc.||https://www.yelp.com/dataset|
|Datahub.io||The goal of many data analysts is to help drive savvy business decisions. As such, using economic or business datasets for your portfolio project might be worth considering. While Datahub covers a variety of topics from climate change to entertainment, it mainly focuses on areas like stock market data, property prices, inflation, and logistics. Because much of the data on the portal is updated monthly (or even daily), you’ll always have something fresh to work with, as well as data that covers broad timescales.||https://datahub.io/collections|
|UCI Machine Learning Repository||This is considered one of the largest repositories of datasets available on the Internet. Its datasets range from plants to healthcare and even flags (hahaha). Fun fact: the first data science project I did used one of the datasets from UCI.|
|Earth Data||This website has a number of datasets related to Earth, made publicly available by NASA.||https://earthdata.nasa.gov/|
|CERN Open Data Portal||CERN, as most people know, is one of the most prominent organizations conducting research in particle physics. The datasets available on this website are quite complex.||http://opendata.cern.ch/|
|Global Health Observatory Data Repository||This portal covers everything from malaria to HIV/AIDS, antimicrobial resistance, and vaccination rates. It even has a nice little feature that lets you preview data tables before downloading them. Not strictly necessary, but definitely nice to have!||https://apps.who.int/gho/data/node.home|
|NYC Taxi Trip Datasets||This website has datasets containing pickup/drop-off times, locations, trip distances, fares, etc. for rides taken in NYC taxis.||https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page|
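
Once you have downloaded a file from one of these sources, a quick first look might go something like the sketch below. It uses only Python's standard library and a tiny in-memory sample that imitates the kind of fields the NYC taxi data is described as having (pickup/drop-off times, trip distance, fare). The column names, the sample rows, and the `summarize_trips` helper are all made up for illustration; the real files will have their own column names and formats, so check them before adapting this.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample rows, invented for this example.
# Real downloads will use different column names and formats.
SAMPLE_CSV = """pickup_time,dropoff_time,trip_distance,fare
2020-01-01 08:00:00,2020-01-01 08:15:00,2.5,11.0
2020-01-01 09:10:00,2020-01-01 09:40:00,6.0,24.5
2020-01-01 10:05:00,2020-01-01 10:10:00,0.8,5.5
"""

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout


def summarize_trips(csv_file):
    """Return (trip count, mean duration in minutes, fare per distance unit).

    Assumes the file has at least one row and a non-zero total distance.
    """
    durations = []
    total_fare = 0.0
    total_distance = 0.0
    for row in csv.DictReader(csv_file):
        start = datetime.strptime(row["pickup_time"], TIME_FORMAT)
        end = datetime.strptime(row["dropoff_time"], TIME_FORMAT)
        durations.append((end - start).total_seconds() / 60)
        total_fare += float(row["fare"])
        total_distance += float(row["trip_distance"])
    mean_duration = sum(durations) / len(durations)
    return len(durations), mean_duration, total_fare / total_distance


# io.StringIO stands in for an opened file; with a real download you
# would pass open("your_file.csv", newline="") instead.
count, mean_minutes, fare_per_unit = summarize_trips(io.StringIO(SAMPLE_CSV))
print(f"{count} trips, {mean_minutes:.1f} min on average, "
      f"{fare_per_unit:.2f} fare per distance unit")
```

Starting with a small summary like this is a cheap way to confirm that the columns parse the way you expect before you spend time on a bigger analysis.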