Data Science Fundamentals - Class1
https://www.guru99.com/data-science-tutorial.html
History of data science
• Peter Naur used the term data science in his 1974 book, Concise
Survey of Computer Methods, describing it as "the science of
dealing with data", though in the context of computer
science, not analytics.
https://www.techtarget.com/searchenterpriseai/definition/data-science
• In 1996, the International Federation of Classification Societies
included data science in the name of the conference it held that
year.
https://www.techtarget.com/searchenterpriseai/definition/data-science
• American computer scientist William S. Cleveland outlined
data science as a full analytics discipline in an article titled
"Data Science: An Action Plan for Expanding the Technical
Areas of Statistics," which was published in 2001 in the
International Statistical Review.
https://www.techtarget.com/searchenterpriseai/definition/data-science
Why Data Science?
• Data is the oil of today’s world. With the right tools, technologies,
and algorithms, we can convert data into a distinct business
advantage.
• Data science can help you detect fraud using advanced machine learning
algorithms.
https://www.guru99.com/data-science-tutorial.html
• Statistics:
• Statistics is the most critical building block of data science. It
is the science of collecting and analyzing numerical
data in large quantities to get useful insights.
• Visualization:
• Visualization techniques present huge amounts of data
as visuals that are easy to understand and digest.
https://www.guru99.com/data-science-tutorial.html
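As a small illustration of the statistics bullet above, the sketch below summarizes a list of numbers using Python's standard `statistics` module. The function name `summarize` and the sample values are assumptions made up for this example, not part of the original slides.

```python
import statistics

def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),      # arithmetic average
        "median": statistics.median(values),  # middle value
        "stdev": statistics.stdev(values),    # sample standard deviation
    }

# Hypothetical daily order counts, invented for illustration.
daily_orders = [12, 15, 11, 18, 14]
summary = summarize(daily_orders)
print(summary)
```

Real analyses usually work on far larger datasets, but the same three summaries are a common first look at any numerical column.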
• Machine Learning:
• Machine learning explores the building and study of
algorithms that learn to make predictions about
unforeseen/future data.
• Deep Learning:
• Deep learning is a newer area of machine learning research
in which the algorithm selects the analysis model to follow.
https://www.guru99.com/data-science-tutorial.html
Key Pillars of Data Science
https://www.geeksforgeeks.org/data-science-fundamentals/
• Domain Knowledge:
– Many people think that domain knowledge is not
important in data science, but it is essential. The foremost
objective of data science is to extract useful insights from
data so that they can benefit the company’s
business.
– One needs to know how to ask the right questions of
the right people in order to obtain the information that is
needed.
Visualization tools used on the business
end, such as Tableau, help display results
or insights in a non-technical format, such as graphs
or pie charts, that business people can understand.
https://www.geeksforgeeks.org/data-science-fundamentals/
• Math Skills:
https://www.geeksforgeeks.org/data-science-fundamentals/
• Computer Science:
https://www.geeksforgeeks.org/data-science-fundamentals/
– Machine Learning:
– It is one of the most vital parts of data science and an
active research area, so new advances are made
each year. At a minimum, one needs to
understand the basic algorithms of supervised and
unsupervised learning.
– Multiple libraries are available in Python and R for
implementing these algorithms.
https://www.geeksforgeeks.org/data-science-fundamentals/
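To make the idea of supervised learning concrete, here is a minimal sketch of one basic algorithm, least-squares linear regression with a single feature, written in plain Python. The function names `fit_line` and `predict` and the study-hours data are hypothetical and chosen only for illustration. In practice one would normally use an existing library rather than hand-written code.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new input."""
    slope, intercept = model
    return slope * x + intercept

# Labeled training data (invented): hours studied -> exam score.
hours = [1.0, 2.0, 3.0, 4.0]
scores = [52.0, 54.0, 56.0, 58.0]

model = fit_line(hours, scores)
prediction = predict(model, 5.0)  # prediction for an unseen input
print(model, prediction)
```

The model is learned from labeled examples and then applied to an input it has not seen, which is the defining pattern of supervised learning. Unsupervised methods work without the labels.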
• Distributed Computing:
• It is also one of the most important skills for handling large
amounts of data, because such volumes cannot be processed on a
single system.
• The most commonly used tools are Apache Hadoop and Spark. The
two major parts of Hadoop are HDFS (Hadoop Distributed
File System), which stores data across a distributed
file system, and MapReduce, which processes that data.
https://www.geeksforgeeks.org/data-science-fundamentals/
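The processing style associated with Hadoop is often described as map and reduce: each machine summarizes its own chunk of data, and the partial results are then merged. The sketch below imitates that pattern on a single machine with a word count. It does not use Hadoop or Spark, and the names `map_chunk` and `reduce_counts` and the sample text are assumptions made for this example.

```python
from collections import Counter
from functools import reduce

def map_chunk(chunk):
    """Map step: count the words in one chunk of text."""
    return Counter(chunk.lower().split())

def reduce_counts(left, right):
    """Reduce step: merge two partial word counts."""
    return left + right

# Each string stands in for a block of data held on a different node.
chunks = ["big data needs big tools", "data tools scale out"]

partials = [map_chunk(c) for c in chunks]           # would run in parallel
total = reduce(reduce_counts, partials, Counter())  # merge partial results
print(total)
```

In a real cluster the map step runs on many machines at once, which is what lets the approach scale beyond a single system.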
• Communication Skill:
https://www.geeksforgeeks.org/data-science-fundamentals/
Skills of a Data scientist
https://www.techtarget.com/searchenterpriseai/definition/data-science
Life cycle
Data Science Job Roles
• Data Scientist
• Data Engineer
• Data Analyst
• Statistician
• Data Architect
• Data Admin
• Business Analyst
• Data/Analytics Manager
https://www.guru99.com/data-science-tutorial.html
• Data Scientist:
• Role: A Data Scientist is a professional who manages
enormous amounts of data to come up with compelling
business visions by using various tools, techniques,
methodologies, algorithms, etc.
• Languages: R, SAS, Python, SQL, Hive, Matlab, Pig, Spark
• Data Engineer:
• Role: A data engineer works with large
amounts of data. They develop, construct, test, and
maintain architectures such as large-scale processing systems
and databases.
• Languages: SQL, Hive, R, SAS, Matlab, Python, Java, Ruby, C++,
and Perl
https://www.guru99.com/data-science-tutorial.html
• Data Analyst:
• Role: A data analyst is responsible for mining vast
amounts of data. They look for relationships, patterns, and
trends in the data, then deliver compelling
reports and visualizations that support
the most viable business decisions.
• Languages: R, Python, HTML, JS, C, C++, SQL
• Statistician:
• Role: The statistician collects, analyses, and understands
qualitative and quantitative data using statistical theories
and methods.
• Languages: SQL, R, Matlab, Tableau, Python, Perl, Spark,
and Hive
https://www.guru99.com/data-science-tutorial.html
• Data Administrator:
• Role: A data administrator ensures that the database is
accessible to all relevant users. They also ensure that it
performs correctly and keep it safe from hacking.
• Languages: Ruby on Rails, SQL, Java, C#, and Python
• Business Analyst:
• Role: This professional works to improve business
processes. They act as an intermediary between the
business executive team and the IT department.
• Languages: SQL, Tableau, Power BI, and Python
https://www.guru99.com/data-science-tutorial.html
Roles and Responsibilities of a Data Scientist
https://www.guru99.com/data-science-tutorial.html
Tools for Data Science
https://www.guru99.com/data-science-tutorial.html
• R: It is open-source software. It is easy to learn
because it is well documented, and it offers strong
statistical capabilities.
• Python: It is another popular open-source
scripting language. It supports libraries such
as NumPy, SciPy, and Matplotlib, which let you
perform a wide range of statistical operations
and build many kinds of models.
• SAS: It is a widely used analytical tool in the
commercial analytics market. It offers a plethora
of statistical functions and a good GUI.
Applications of Data Science
• Internet Search:
• Google search uses Data science technology to search for a
specific result within a fraction of a second
• Recommendation Systems:
• Data science is used to build recommendation systems. For example, “suggested
friends” on Facebook and “suggested videos” on YouTube
are produced with the help of data science.
• Image & Speech Recognition:
• Speech recognition systems like Siri, Google Assistant, and Alexa
run on data science techniques.
• Similarly, Facebook recognizes your friends when you upload a
photo with them, with the help of data science.
https://www.guru99.com/data-science-tutorial.html
• Gaming world:
• EA Sports, Sony, and Nintendo use data science technology
to enhance the gaming experience. Games are now
developed using machine learning techniques, and they can
update themselves as you move to higher levels.
https://www.guru99.com/data-science-tutorial.html
• Banking: loan/credit card approval
– Predict good customers based on old customers
• Targeted marketing
– Identify likely responders to promotions
• Fraud detection:
– From an online stream of events, identify fraudulent events
https://www.guru99.com/data-science-tutorial.html
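One simple way to picture fraud detection on a stream of events is to keep running statistics of what has been seen so far and flag any event that deviates sharply from them. The sketch below does this with a z-score test over transaction amounts. It is a toy baseline, not how production fraud systems work, and the names `RunningStats` and `flag_suspicious`, the threshold, the warm-up length, and the amounts are all assumptions for illustration.

```python
import math

class RunningStats:
    """Mean and standard deviation updated one value at a time (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

def flag_suspicious(amounts, threshold=3.0, warmup=5):
    """Return indexes of amounts more than `threshold` std devs from the running mean."""
    stats = RunningStats()
    flagged = []
    for i, amount in enumerate(amounts):
        if stats.n >= warmup:
            std = stats.std()
            if std > 0 and abs(amount - stats.mean) / std > threshold:
                flagged.append(i)
                continue  # keep the outlier out of the baseline
        stats.update(amount)
    return flagged

# Hypothetical stream of transaction amounts.
amounts = [20.0, 22.0, 19.0, 21.0, 20.0, 23.0, 950.0, 18.0]
flagged = flag_suspicious(amounts)
print(flagged)
```

Because the statistics are updated incrementally, the check needs only constant memory per account, which suits an online stream. Real systems typically combine many features and learned models rather than a single threshold.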
Data Analytics Challenge
DATA
https://www.slideshare.net/hemapani/data-science-in-the-real-world-making-a-difference
INFORMATION
https://www.slideshare.net/hemapani/data-science-in-the-real-world-making-a-difference
Total Information Awareness
• Alas, he discovered that almost all of them had lost their ESP.
• 1000 days.