aniketroy/awesome-datascience

 
 

Awesome Data Science

An open-source Data Science repository for learning Data Science and applying it to real-world problems.

Table of contents

Motivation

This part is for beginners who are new to Data Science.

This is a shortcut for starting to study Data Science. Just follow the steps to answer the questions, "What is Data Science and what should I study to learn Data Science?"

First of all, Data Science is one of the hottest topics in computing and on the Internet nowadays. People have been gathering data from applications and systems for years, and now is the time to analyze it. The next steps are producing suggestions from the data and making predictions about the future. Here you can find the biggest question for Data Science and hundreds of answers from experts. Our favorite data scientist is Clare Corthell. She is an expert in data-related systems and a hacker, and has been working at a company as a data scientist. See Clare's blog. This web site helps you understand exactly how to study to become a professional data scientist.

Secondly, our favorite programming language for #DataScience nowadays is Python. Python's Pandas library has full functionality for collecting and analyzing data. We use Anaconda to play with data and to create applications.

This is the guide to beginning a Data Science project.

Infographic

  • A visual guide to Becoming a Data Scientist in 8 Steps by DataCamp
  • Mindmap on required skills
  • Swami Chandrasekaran made a Curriculum via Metro map.
  • by @kzawadz via twitter, MarketingDistillery.com
  • And a male version, from another article by MarketingDistillery.com
  • By Data Science Central
  • From this article by Berkeley Science Review.
  • Data Science Wars: R vs Python
  • How to select statistical or machine learning techniques
  • The Data Science Industry: Who Does What
  • Data Science Venn Diagram
  • Different Data Science Skills and Roles from this article by Springboard

What is Data Science?

Colleges

MOOCs

Data Sets

Bloggers

Facebook Accounts

Twitter Accounts

YouTube Videos & Channels

Toolboxes - Environment

  • Datalab from Google - easily explore, visualize, analyze, and transform data interactively using familiar languages, such as Python and SQL.
  • Hortonworks Sandbox is a personal, portable Hadoop environment that comes with a dozen interactive Hadoop tutorials.
  • R is a free software environment for statistical computing and graphics.
  • RStudio IDE – powerful user interface for R. It’s free and open source, and works on Windows, Mac, and Linux.
  • Python - Pandas - Anaconda: completely free, enterprise-ready Python distribution for large-scale data processing, predictive analytics, and scientific computing.
  • Scikit-Learn - Machine Learning in Python
  • Data Science Toolbox - Coursera Course
  • Data Science Toolbox - Blog
  • Wolfram Data Science Platform - take numerical, textual, image, GIS or other data and give it the Wolfram treatment, carrying out a full spectrum of data science analysis and visualization and automatically generating rich interactive reports, all powered by the revolutionary knowledge-based Wolfram Language.
  • Sense Data Science Development Platform - a new cloud platform for data science and big data analytics. Collaborate on, scale, and deploy data analysis and advanced analytics projects radically faster. Use the most powerful tools (R, Python, JavaScript, Redshift, Hive, Impala, Hadoop, and more), supercharged and integrated in the cloud.
  • Mortardata - solutions, code, and devops for high-scale data science.
  • Variance - build powerful data visualizations for the web without writing JavaScript.
  • Kite Development Kit - the Kite Software Development Kit (Apache License, Version 2.0), or Kite for short, is a set of libraries, tools, examples, and documentation focused on making it easier to build systems on top of the Hadoop ecosystem.
  • Domino Data Labs - run, scale, share, and deploy your models without any infrastructure or setup.
  • Apache Flink - a platform for efficient, distributed, general-purpose data processing.
  • Apache Hama - an Apache Top-Level open source project, allowing you to do advanced analytics beyond MapReduce.
  • Weka - a collection of machine learning algorithms for data mining tasks.
  • Octave - GNU Octave is a high-level interpreted language, primarily intended for numerical computations (a free alternative to MATLAB).
  • Apache Spark - lightning-fast cluster computing.
  • Caffe - deep learning framework.
  • Torch - a scientific computing framework for LuaJIT.
  • Nervana's Python-based deep learning framework
  • Aerosolve - A machine learning package built for humans.
  • Intel framework - Intel® Deep Learning Framework
  • Datawrapper – An open source data visualization platform helping everyone to create simple, correct and embeddable charts. Also at github.com
  • TensorFlow - an open source software library for machine intelligence
  • Natural Language Toolkit
  • nlp-toolkit for node.js
  • Julia – high-level, high-performance dynamic programming language for technical computing.
  • IJulia – a Julia-language backend combined with the Jupyter interactive environment.

Visualization Tools - Environments

Journals, Publications and Magazines

Presentations

Comics

  • Digital Data

Other Awesome Lists
