or
- Winpy (alternative)
- What is Data Science @ O'Reilly
- What is Data Science @ Quora
- The sexiest job of the 21st century
- What is data science
- What is a data scientist
- Wikipedia
- a very short history of #datascience
- An Introduction to Data Science, PDF.
- Data Science Methodology by John Rollins PhD
- A Day in the Life of a Data Scientist by Rutgers University
- Python Data Science Handbook
- The Data Science Handbook
- The Art of Data Usability - Early access
- Think Like a Data Scientist
- R in Action, Second Edition
- Introducing Data Science
- Practical Data Science with R
- Exploring Data Science - free eBook sampler
- Exploring the Data Jungle - free eBook sampler
- Applied Data Science with Python Specialization
- Microsoft Professional Program in Data Science
- Intro to Data Science
- Python Data camp
- Introduction to Python for Data Science
- Intro to Data Science by Microsoft
- What is machine learning?
- Andrew Ng: Deep Learning, Self-Taught Learning and Unsupervised Feature Learning
- Deep Learning: Intelligence from Big Data
- Interview with Google's AI and Deep Learning 'Godfather' Geoffrey Hinton
- Introduction to Deep Learning with Python
- What is machine learning, and how does it work?
- Data School - Data Science Education
- Neural Nets for Newbies by Melanie Warrick (May 2015)
- Neural Networks video series by Hugo Larochelle
- Google DeepMind co-founder Shane Legg - Machine Super Intelligence
- How to Become a Data Scientist
- Introduction to Data Science
- Intro to Data Science for Enterprise Big Data
- How to Interview a Data Scientist
- How to Share Data with a Statistician
- The Science of a Great Career in Data Science
- What Does a Data Scientist Do?
- Building Data Start-Ups: Fast, Big, and Focused
- How to win data science competitions with Deep Learning
This list covers only Python, as many are already familiar with this language. For R, see Data Science tutorials using R.
numpy is a Python library which provides large multidimensional arrays and fast mathematical operations on them.
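As a minimal sketch of what "multidimensional arrays and fast mathematical operations" looks like in practice, the snippet below builds a small 2-D array and applies element-wise arithmetic and reductions to it. The variable names and values are invented for illustration.

```python
import numpy as np

# A 2-D array (3 rows, 4 columns) holding the integers 0..11.
grid = np.arange(12).reshape(3, 4)

# Arithmetic applies element-wise to the whole array, with no explicit loop.
doubled = grid * 2

# Reductions can run over the whole array or along a single axis.
column_sums = grid.sum(axis=0)  # one sum per column
overall_mean = grid.mean()      # mean of all 12 values

print(doubled)
print(column_sums, overall_mean)
```

Working on whole arrays at once, rather than looping in Python, is usually where the speed advantage comes from.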
pandas provides efficient data structures and analysis tools for Python. It is built on top of numpy.
- Introduction to pandas
- DataCamp pandas foundations - Paid course, but 30 free days upon account creation (enough to complete the course).
- Pandas cheatsheet - Quick overview of the most important functions.
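To give a feel for the pandas data structures before following the links above, here is a small sketch using a `DataFrame`. The table contents (`region`, `units`, `price`) are hypothetical toy data, not taken from any of the linked tutorials.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "units": [10, 4, 6, 8],
    "price": [2.5, 3.0, 2.5, 3.0],
})

# Derive a new column from existing ones (computed column-wise).
sales["revenue"] = sales["units"] * sales["price"]

# Aggregate: total revenue per region.
revenue_by_region = sales.groupby("region")["revenue"].sum()

# Filter rows with a boolean mask.
large_orders = sales[sales["units"] >= 8]

print(revenue_by_region)
print(large_orders)
```

Derived columns, group-by aggregation, and boolean filtering cover a large share of everyday data exploration work.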
scikit-learn is the most common library for Machine Learning and Data Science in Python.
- Introduction and first model application
- Rough guide for choosing estimators
- Scikit-learn complete user guide
- Model ensemble: Implementation in Python
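A first model application with scikit-learn typically follows the same split, fit, and score pattern. The sketch below assumes the bundled iris dataset and a k-nearest-neighbours classifier. Both are arbitrary choices for illustration and are not necessarily what the linked tutorials use.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small example dataset that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold out 25% of the samples to measure performance on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Every scikit-learn estimator exposes the same fit / predict / score interface.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)  # fraction of correct predictions
print(f"test accuracy: {accuracy:.2f}")
```

Because all estimators share this interface, swapping in a different algorithm usually means changing only the line that constructs `model`.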
Jupyter Notebook is a web application for easy data visualisation and code presentation.
- Downloading and running first Jupyter notebook
- Example notebook for data exploration
- Seaborn data visualization tutorial - Plot library that works great with Jupyter.
- Supervised vs unsupervised learning - The two most common types of Machine Learning algorithms.
- 9 important Data Science algorithms and their implementation
- Cross validation - Evaluate the performance of your algorithm / model.
- Feature engineering - Modify the data to improve model predictions.
- Scientific introduction to 10 important Data Science algorithms
- Model ensemble: Explanation - Combine multiple models into one for better performance.
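Two of the ideas listed above, cross validation and model ensembles, can be sketched together. The example assumes scikit-learn and its bundled wine dataset, and compares a single decision tree against a random forest (an ensemble of decision trees) using 5-fold cross validation. The dataset and model choices are illustrative assumptions only.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

# A single model and an ensemble that combines many such models.
single_tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=50, random_state=0)

# 5-fold cross validation: train on 4/5 of the data, score on the
# remaining 1/5, and repeat so that every fold is used for scoring once.
tree_scores = cross_val_score(single_tree, X, y, cv=5)
forest_scores = cross_val_score(forest, X, y, cv=5)

print(f"single tree: {tree_scores.mean():.3f}")
print(f"forest:      {forest_scores.mean():.3f}")
```

Averaging over several folds gives a more stable performance estimate than a single train/test split, which helps when comparing models.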
Some data mining competition platforms
- Competency forecasting
- Employee churn analytics
- Employee performance analytics
- Network analytics on employee interactions
- Resume matching, preselection and tagging
- Workforce planning
- Cost analytics
- Fraud detection
- Waste and abuse detection
- Component quality analytics
- Cybercrime detection
- Server performance monitoring and alerting
- Incident management tickets automatic routing and reply or clustering
- Churn/Customer attrition
- Customer segmentation
- Lifetime value
- Personalized advertising
- Product recommendation engines
- Marketing Optimization
- Social Media Analytics
- Text Analytics on customer complaints
- Cross-sell opportunities using propensity models
- Lead scoring
- Price elasticity
- Revenue forecasting
- Demand forecasting
- Gas purchase optimization
- Inventory forecasting
- Optimal routes
- Warehouse location optimization
- Fraud detection
- Litigation prediction
- Pricing using telematics
- Solvency II and ORSA compliance
- Risk analytics
- Design of experiments
- R&D portfolio optimization
- Crime Wave Detection
- Patrolling Suggestions (Preventative Policing)
- Crime Case Resolution Prediction
- Crime Clustering
- Complex/Organised Crime network detection
- Terrorist Cell Identification
- Alerting & Officer Safety
- Criminal Evolution
- Domestic Violence
- Radicalisation prediction
- Mass scale surveillance
- Academic Torrents
- hadoopilluminated.com
- data.gov - The home of the U.S. Government's open data
- United States Census Bureau
- usgovxml.com
- enigma.com - Navigate the world of public data - Quickly search and analyze billions of public records published by governments, companies and organizations.
- datahub.io
- aws.amazon.com/datasets
- databib.org
- datacite.org
- quandl.com - Get the data you need in the form you want; instant download, API or direct to your app.
- figshare.com
- GeoLite Legacy Downloadable Databases
- Quora's Big Datasets Answer
- Public Big Data Sets
- Houston Data Portal
- Kaggle Data Sources
- Kaggle Datasets
- A Deep Catalog of Human Genetic Variation
- A community-curated database of well-known people, places, and things
- Google Public Data
- World Bank Data
- NYC Taxi data
- Open Data Philly - Connecting people with data for Philadelphia
- A list of useful sources - A blog post that includes many data set databases
- grouplens.org - Sample movie (with ratings), book and wiki datasets
- UC Irvine Machine Learning Repository - contains data sets good for machine learning
- research-quality data sets by Hilary Mason
- National Climatic Data Center - NOAA
- ClimateData.us (related: U.S. Climate Resilience Toolkit)
- r/datasets
- MapLight - provides a variety of data free of charge for uses that are freely available to the general public.
- GHDx - Institute for Health Metrics and Evaluation - a catalog of health and demographic datasets from around the world, including IHME results
- St. Louis Federal Reserve Economic Data - FRED
- New Zealand Institute of Economic Research – Data1850
- Dept. of Politics @ New York University
- Open Data Sources
- UNICEF Statistics and Monitoring
- UNICEF Data
- undata
- NASA SocioEconomic Data and Applications Center - SEDAC
- The GDELT Project
- Sweden, Statistics
- Github free data source list
- StackExchange Data Explorer - an open source tool for running arbitrary queries against public data from the Stack Exchange network.
- San Francisco Government Open Data
- IBM Blog about open data
- Open Data Index
- Liver Tumor Segmentation Challenge Dataset
References
- https://github.com/bulutyazilim/awesome-datascience
- https://github.com/JosPolfliet/awesome-datascience-ideas
- https://github.com/siboehm/awesome-learn-datascience