Foundation for Data Science and Analytics
Prof. Annappa
CSE, National Institute of Technology Karnataka, Surathkal
Outline
● What is Data
● Data Science Foundation
● What is Data Analytics
● Video Analytics
● Evaluation of Data Visualization
● Visual Analytics and Trends
● Future Trends of Data and Video Analytics
● Research Directions
● Tools and Development Resources
What is Data?
● Facts and statistics collected together for reference or analysis.
● The quantities, characters, or symbols on which operations are performed by a
computer, being stored and transmitted in the form of electrical signals and
recorded on magnetic, optical, or mechanical recording media.
Data Information Knowledge
Data All Around
● Lots of data is being collected and warehoused
○ Scientific experiments
○ Internet of Things
○ Web data, e-commerce
○ Financial transactions, bank/credit transactions
○ Online trading and purchasing
○ Social networks
Global data generated annually (data volume in zettabytes)
Big Data
● Big Data are data sets so large or so complex that traditional methods of storing, accessing, and analyzing them break down or are too expensive.
● However, there is a lot of potential value hidden in this data, so organizations are eager to harness it to drive innovation and competitive advantage.
Big Data
● Big Data technologies and approaches are used to drive value out of data-rich environments in ways that traditional analytics tools and methods cannot.
What to do with these data?
● Aggregation and Statistics
○ Data warehousing and OLAP
● Indexing, Searching, and Querying
○ Keyword-based search
○ Pattern matching (XML/RDF)
● Knowledge discovery
○ Data Mining
○ Statistical Modeling
● Data Driven
○ Predictive Analytics
○ Deep Learning
Big Data and Data Science
They are not the “same thing”
● Big data = crude oil. Big data is about extracting “crude oil”, transporting it in “mega tankers”, siphoning it through “pipelines”, and storing it in “massive silos”
● Data science is about refining the “crude oil”
What is Data Science?
● “Data Science, also known as Data-driven Science, is an interdisciplinary field of scientific methods, processes, algorithms and systems to extract knowledge or insights from data in various forms, either structured or unstructured, similar to data mining.”
● Data science (DS) is a multidisciplinary field of study whose goal is to address the challenges in big data
● Data science principles apply to all data – big and small
What is Data Science?
Theories and techniques from many fields and disciplines are used to investigate and analyze a large amount of data to help decision makers in many industries such as science, engineering, economics, politics, finance, and education
● Computer Science
○ Pattern recognition, visualization, data warehousing, high performance computing, databases, AI
● Mathematics
○ Mathematical Modeling
● Statistics
○ Statistical and Stochastic modeling, Probability
Data Science and Data Mining
● Data science is a broad field that includes data mining as one of its many techniques, but also includes other techniques such as statistical analysis, machine learning, and visualization.
● Data mining is a specific technique used to extract patterns and knowledge from large data sets.
Difference Between Data Science and
Data Mining
Data Science and AI
Contrast: Scientific Computing
Contrast: Machine Learning
Data Science Applications
Data Science: Case Study Cancer Research
● Cancer is an incredibly complex disease; a single tumor can have more than 100 billion cells, and each cell can acquire mutations individually. The disease is always changing, evolving, and adapting.
● Employ the power of big data analytics and high-performance computing.
● Leverage sophisticated pattern-recognition and machine learning algorithms to identify patterns that are potentially linked to cancer.
● Huge amount of data processing and recognition
Data Science: Case Study Health Care
Data Science: Case Study Internet of Things (IoT)
Data Science: Case Study Customer Analytics
Essential Points
● Big Data has given rise to Data Science
● Data science is rooted in solid foundations of mathematics and statistics, computer science, and domain knowledge
● Not everything with data or science is Data Science!
● The use cases for Data Science are compelling
Data Science vs Data Analysis vs Big Data
What is Data Analytics?
● Data analytics refers to examining unprocessed databases to draw meaningful and actionable inferences about the content they contain.
● It helps analysts and researchers view trends in the unprocessed data and extrapolate significant knowledge from it.
Data Analytics in Healthcare : Use case
● The healthcare industry generates a tremendous amount of data but struggles to convert that data into insights that improve patient outcomes and operational efficiencies.
● Data analytics in healthcare is intended to help providers overcome obstacles to the widespread application of data-derived intelligence:
○ Making healthcare data easier to share among colleagues and external partners, and easier to visualize for public consumption
○ Providing accurate data-driven forecasts in real time to allow healthcare providers to respond more quickly to changing healthcare markets and environments
○ Enhancing data collaboration and innovation among healthcare organizations
How data analytics is used in healthcare settings
● Research and prediction of disease
● Automation of hospital administrative processes
● Early detection of disease
● Prevention of unnecessary doctor’s visits
● Discovery of new drugs
● More accurate calculation of health insurance rates
● More effective sharing of patient data
Data Analytics in Agriculture : Use case
● The precision agriculture market continues to evolve, allowing farmers to embrace data-driven solutions.
● While the future opportunities for data analytics in agriculture are limitless, there are already strong benefits emerging, such as:
○ Increasing innovation and productivity.
○ Greater understanding of environmental challenges.
○ Reducing waste and improving profits.
Smart Farming by Analytics
Sports Data Analytics: Use case
● Sports Data Analytics is the study of data on players and their performances in order to determine their strengths and weaknesses.
● Sports Data Analytics is now used in a variety of ancillary industries, and there is significant potential for growth. The Sports Data Analytics market is expected to reach upwards of $4.5 billion by 2025.
Predictions That Can Be Made Using Sports Data Analytics
● Injury Predictions
● Player Valuations
● Team Strategy
● Evaluating Ticket Churn
● Ticket Pricing
Data Analytics in Retail: Use case
● Data analytics is useful in determining appropriate retail locations. Analysts believe that the best prospective location for a business is one where the targeted customers spend most of their time. Technologies such as data analytics and machine learning help in deciding on the best possible spot for the business.
● Data analytics use cases in retail spread across the different stages of the retail process: predictions for new product launches, in-store optimization, demand forecasting, and so on.
● Data analytics helps in anticipating consumer demand. This is done by analyzing past and existing purchasing patterns and linking them with the market success of existing products, which helps in developing predictive models for new products.
● When a retailer looks at the mass of its customers, it can see that some are more valuable than others. One potential use case for Big Data analytics in retail is the assessment of customer lifetime value. By assessing purchasing patterns and customer behavior, the retailer identifies its best customers. This helps in shaping marketing strategies with special emphasis on attracting this specific group.
● Data analytics helps in enhancing customer experience within the retail domain. Through personalized marketing, strategic customer interaction, and customized selling propositions, retailers are able to respond to the
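The customer lifetime value use case above can be illustrated with a minimal Python sketch. It is not a method taken from the slides: the purchase log, the customer names, and the simple projection (observed yearly spend multiplied by an assumed number of future years) are all hypothetical choices made for illustration.

```python
from collections import defaultdict

# Hypothetical purchase log: (customer_id, order_value). Illustrative data only.
orders = [
    ("alice", 40.0), ("bob", 15.0), ("alice", 60.0),
    ("carol", 200.0), ("bob", 25.0), ("alice", 50.0),
]


def lifetime_value(order_log, years_observed, expected_years):
    """Project each customer's observed yearly spend over an assumed lifespan."""
    totals = defaultdict(float)
    for customer, value in order_log:
        totals[customer] += value
    return {
        customer: total / years_observed * expected_years
        for customer, total in totals.items()
    }


# Assumption: the log covers one year and customers stay for three years.
clv = lifetime_value(orders, years_observed=1.0, expected_years=3.0)

# Rank customers from most to least valuable under this simple proxy.
ranked = sorted(clv, key=clv.get, reverse=True)
print(ranked)
```

A real assessment would typically also account for margins, churn, and discounting; the ranking step is the part that corresponds to identifying the best customers.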
Descriptive vs Predictive vs Prescriptive Analytics
Descriptive Analytics
● Descriptive analytics is the process of parsing historical data to better understand the changes that occur in a business. Using a range of historic data and benchmarking, decision-makers obtain a holistic view of performance and trends on which to base business strategy.
Examples of Descriptive Analytics
Examples of descriptive analytics exist in every aspect of the business, from finance to production and sales, including the following.
■ Business reports of revenue and expenses, cash flow, accounts receivable and accounts payable, inventory and production.
■ Financial metrics and other business KPIs are examples of descriptive analytics. These include metrics that assess the health and value of a business, such as the price to earnings ratio, current ratio and return on invested capital.
■ Social media engagement: Descriptive analytics generates metrics that help determine the return on social media initiatives, such as growth in followers, engagement rates and revenue attributable to specific social media platforms.
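The financial KPIs named above can be computed directly from a company's reported figures. The following Python sketch uses the standard textbook definitions of the three ratios; the `Financials` record and every number in it are hypothetical and only illustrate the calculation.

```python
from dataclasses import dataclass


@dataclass
class Financials:
    """Hypothetical snapshot of a company's figures (illustrative values)."""
    share_price: float
    earnings_per_share: float
    current_assets: float
    current_liabilities: float
    operating_profit_after_tax: float
    invested_capital: float


def price_to_earnings(f: Financials) -> float:
    # Price to earnings ratio = share price / earnings per share
    return f.share_price / f.earnings_per_share


def current_ratio(f: Financials) -> float:
    # Current ratio = current assets / current liabilities
    return f.current_assets / f.current_liabilities


def return_on_invested_capital(f: Financials) -> float:
    # ROIC = net operating profit after tax / invested capital
    return f.operating_profit_after_tax / f.invested_capital


snapshot = Financials(
    share_price=50.0,
    earnings_per_share=4.0,
    current_assets=300_000.0,
    current_liabilities=200_000.0,
    operating_profit_after_tax=45_000.0,
    invested_capital=300_000.0,
)

print("P/E ratio:", price_to_earnings(snapshot))
print("Current ratio:", current_ratio(snapshot))
print("ROIC:", return_on_invested_capital(snapshot))
```

Each metric only summarizes what has already happened, which is what makes it descriptive rather than predictive.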
Diagnostic Analytics
Diagnostic analytics is a form of advanced analytics that examines data or content to answer the question, “Why did it happen?” It is characterized by techniques such as drill-down, data discovery, data mining and correlations.
■ Data drilling: Drilling down into a dataset can reveal more detailed information about which aspects of the data are driving the observed trends. For example, analysts may drill down into national sales data to determine whether specific regions, customers or retail channels are responsible for increased sales growth.
■ Data mining hunts through large volumes of data to find patterns and associations within the data. For example, data mining might reveal the most common factors associated with a rise in insurance claims. Data mining can be conducted manually or automatically with machine learning technology.
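The drill-down described above can be sketched in a few lines of Python. The sales records, region names, and channel names below are hypothetical; the point is only to show how a national growth figure is broken down by one dimension at a time to see what is driving it.

```python
from collections import defaultdict

# Hypothetical sales records: (region, channel, last_year, this_year).
sales = [
    ("North", "online", 100.0, 150.0),
    ("North", "store", 200.0, 205.0),
    ("South", "online", 80.0, 82.0),
    ("South", "store", 120.0, 118.0),
]


def growth_by(records, key_index):
    """Sum year-over-year growth per value of the chosen dimension."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key_index]] += record[3] - record[2]
    return dict(totals)


# Top level: the national figure that prompted the question.
national_growth = sum(r[3] - r[2] for r in sales)

# Drill down by region (column 0) and by retail channel (column 1).
by_region = growth_by(sales, 0)
by_channel = growth_by(sales, 1)
top_region = max(by_region, key=by_region.get)

print(national_growth, by_region, by_channel, top_region)
```

In this made-up data, the breakdown shows that nearly all of the growth comes from one region's online channel, which is the kind of answer a drill-down is meant to surface.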
Examples of Diagnostic Analytics
Diagnostic analytics can be helpful in any industry, from manufacturing and retail to health care. After applying diagnostic analytics to discover why an event occurred, companies can use that knowledge to create solutions and develop predictive models for the future.
■ Health care: Diagnostic analytics can support many areas of health care, including the core function of diagnosing medical problems
■ Retail: A store that sells eco-friendly products noticed a recent surge in revenue from one state.
■ Manufacturing: A contract manufacturer found that a valuable type of machine started experiencing intermittent failures. By using diagnostic analytics to examine the machines’ logs, the company discovered that routine software updates had been installed the previous day. It identified the update as a likely cause of failure.
■ Human resources: A company’s annual hiring report showed that one department hired more people than any other department, but there was no net increase in the department’s staff because it was losing people as fast as it hired them. Drilling down into the data revealed that
Predictive analytics
Predictive analytics encompasses a variety of statistical
techniques from data mining, predictive modeling, and
machine learning that analyze current and historical facts to
make predictions about future or otherwise unknown events.
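As one small instance of the statistical techniques mentioned above, the sketch below fits a straight line to historical values by ordinary least squares and extrapolates it one step ahead. The monthly demand figures are hypothetical, and a linear trend is an assumption chosen for simplicity rather than a recommendation from the slides.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical historical facts: demand observed in months 1 to 5.
months = [1, 2, 3, 4, 5]
demand = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(months, demand)

# Prediction about a future (not yet observed) month.
forecast_month_6 = slope * 6 + intercept
print(slope, intercept, forecast_month_6)
```

Real predictive models are usually validated on held-out data before their forecasts are trusted; that step is omitted here to keep the sketch short.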
Prescriptive Analytics
Prescriptive Analytics is a form of advanced analytics which examines data or content to answer the question “What should be done?” or “What can we do to make ___ happen?”, and is characterized by techniques such as graph analysis, simulation, complex event processing, neural networks, and recommendation engines.
Cognitive Analytics
Cognitive Analytics applies human-like intelligence to certain tasks, and brings together a number of intelligent technologies, including semantics, artificial intelligence algorithms, deep learning and machine learning.
Tech Stack for Analytics
What is Video Analytics?
● Video analytics are software applications that automatically generate descriptions of what is actually
happening in the video (so-called metadata), which can be used to list persons, cars and other objects
detected in the video stream, as well as their appearance and movements.
● The main goal of video analytics is to automatically recognize temporal and spatial events in videos. A person who moves suspiciously, traffic signs that are not obeyed, the sudden appearance of flames and smoke; these are just a few examples of what a video analytics solution can detect.
Reference Architecture for Video Analytics
Use cases for large-scale video analytics
● Security and surveillance
● Transport monitoring
● Healthcare
○ Health status monitoring
○ Telemedicine
○ Surgical video analysis
● Face Recognition
● Behavior Detection
● Person Tracking
● People Count / People Presence
● Crowd Detection
Security Cameras Equipped with Video Analytics Technology
What are the technologies involved in Analytics?
● Object Detection
● Object Recognition
● Object Tracking
● Real-Time Video Analytics
● Triggering Real-Time Alerts
○ Appearance Similarity Alerting
○ Count-based Alerting
○ Face Recognition Alerting
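Count-based alerting, one of the alert types listed above, can be sketched as follows. The sketch assumes that an upstream object detector (not shown) already yields a count of detected people for each frame; the function name, the threshold, and the requirement of several consecutive frames are hypothetical design choices intended to suppress alerts on single noisy frames.

```python
def count_alerts(frame_counts, threshold, min_consecutive):
    """Return frame indices at which an alert fires.

    An alert fires once per streak, at the frame where the count has been
    at or above `threshold` for `min_consecutive` frames in a row.
    """
    alerts = []
    streak = 0
    for index, count in enumerate(frame_counts):
        streak = streak + 1 if count >= threshold else 0
        if streak == min_consecutive:
            alerts.append(index)
    return alerts


# Hypothetical per-frame people counts from a detector.
people_per_frame = [2, 3, 6, 7, 8, 4, 9, 9, 9, 9]

alerts = count_alerts(people_per_frame, threshold=6, min_consecutive=3)
print(alerts)
```

Firing only once per streak keeps a sustained crowd from generating an alert on every frame.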
What are the Video Analytics Challenges?
● For many years, the amount of data collected from video analysis tools has risen; data storage is becoming a problem with the tremendous volume of data obtained.
● The data obtained by CCTV monitoring systems are only as useful as your team's capacity to handle them, so the systems fall short if the human resources deployed cannot adequately handle that information.
● With rising cases of hacking and internet breaches reported worldwide every day, the security component of the CCTV surveillance system raises a major problem for your company's everyday operations.
A Brief History of Data Visualizations
● The origin of data-driven visualizations and story-telling can be traced back to the beginning of mankind. The early humans sculpted visuals of animals, weapons, and the position of stars on the walls of caves. The Turin Papyrus Map, dated 1150 BC, is the oldest surviving visualization. It illustrates the distribution of geological resources with quarrying information.
One-dimensional line graph
Turin papyrus map
The origins of bar, line & pie charts
● In the 18th century, there was an explosion of interest in visualization culminating with the work of William Playfair, the father of our current notions of data visualization. Playfair developed the line chart, bar chart, area chart, and pie chart. The line chart and bar chart first appeared in 1786, and the pie chart and circle graph in 1801.
Playfair’s comparison of price of wheat to wages
Visualizations in the Modern Era
● In the 19th century, the Industrial Revolution began to develop in earnest. The great visualizers of this era were concerned with specific problems. Their work was enabled by greater numeracy, availability of data, knowledge of statistics, and an interest in perfecting processes. Countries began to publish reams of information about economics, commerce, population, births and deaths, religion, and illnesses.
Map of fate of armies of Napoleon
Florence Nightingale, Rose diagram
Current Data Visualization Trends
● Video Visualization
● Data Democratization
● Real-time Visualization
● Mobile and Social Data Visualization
● Artificial Intelligence and Machine Learning Datavis
5 Key Data Analytics Trends (2023-2026)
● Businesses Emphasize Business Intelligence
● Edge Data Takes Center Stage
● Organizations Favor Cloud-Native Technologies And Self-Service Data Analytics
● Growing Use Cases For Data Management And Analytics
● Democratization Of Data Systems
Emerging Trends In Data Analytics
Applications
Example of Real-Time Analytics: Customer Care
Top Open-Source Data Analytics Tools
Big Data Tools & Technologies – Data Storage and Management
Big Data Tools & Technologies – Data Cleaning
Big Data Tools & Technologies – Data Mining
Big Data Tools & Technologies – Data Visualization
Big Data Tools & Technologies – Data Acquisition
Big Data Tools & Technologies – Data Analytics
Here are some of the top research centers around the world to follow in big data + data science:
● RISE Lab at the University of California, Berkeley, USA
● Doctoral Research Centre in Data Science, The University of Edinburgh, United Kingdom
● Data Science Institute, Columbia University, USA
● The Institute of Data-Intensive Engineering and Science, Johns Hopkins University, USA
● Facebook Data Science research
● Big Data Institute, University of Oxford, United Kingdom
● Center for Big Data Analytics, The University of Texas at Austin, USA
● Center for data science and big data analytics, Oakland University, USA
● Institute for Machine Learning, ETH Zurich, Switzerland
● The Alan Turing Institute, United Kingdom
● IISc Computational and Data Sciences Research
● Data Lab, Carnegie Mellon University, USA
Best Place to Start Big data Research Journey
https://github.com/0xnr/awesome-bigdata
For learning in Big Data, here are my recommendations:
● Coursera Big Data Specialization
● Big data course from the University of California San Diego