Data Science

This document discusses data science applications and use cases. It provides an overview of big data challenges, defines data science and contrasts it with related fields like data engineering and machine learning. The document also presents case studies and examples of data science use at companies like Accenture and in various industries. Key points are made around the need for data scientists and new academic programs to develop pi-shaped researchers with cross-disciplinary skills in science, computation and statistics.

Uploaded by

Deepak Dewangan
Copyright
© © All Rights Reserved
Data Science Applications & Use Cases

Instructor: Ekpe Okorafor


1. Accenture – Big Data Academy
2. Computer Science, African University of Science & Technology
Objectives

• Understand Big Data Challenges
• What exactly is Data Science and what do Data Scientists do
• Data Science contrasted with other disciplines
• Case Study & Use Cases

2
Outline

• Big Data & Challenges


• What is Data Science
• Data Science & Academia
• Data Science & Others
• Case Studies
• Essential points
• Conclusion

3
Data All Around

• Lots of data is being collected and warehoused
– Scientific Experiments
– Internet of Things
– Web data, e-commerce
– Financial transactions, bank/credit transactions
– Online trading and purchasing
– Social Network
– ……many more!

4
Big Data

• Big Data are data sets so large or so complex that traditional methods of storing, accessing, and analyzing them break down or become too expensive. However, there is a lot of potential value hidden in this data, so organizations are eager to harness it to drive innovation and competitive advantage.
• Big Data technologies and approaches are used to derive value from data-rich environments in ways that traditional analytics tools and methods cannot.

5
What To Do With These Data?

• Aggregation and Statistics


– Data warehousing and OLAP
• Indexing, Searching, and Querying
– Keyword based search
– Pattern matching (XML/RDF)
• Knowledge discovery
– Data Mining
– Statistical Modeling
• Data Driven
– Predictive Analytics
– Deep Learning
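
As a small illustration of the "Aggregation and Statistics" item above, the sketch below performs an OLAP-style roll-up: it sums one measure grouped by a chosen dimension. The records, the column layout, and the function name `roll_up` are hypothetical and chosen only for this example; a real data warehouse would do the same with SQL `GROUP BY` or an OLAP engine.

```python
from collections import defaultdict

# Hypothetical transaction records: (region, product, amount).
SALES = [
    ("north", "widget", 120.0),
    ("north", "gadget", 80.0),
    ("south", "widget", 200.0),
    ("south", "widget", 50.0),
]


def roll_up(records, key_index):
    """Sum the amount column (index 2) grouped by one dimension."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key_index]] += record[2]
    return dict(totals)


by_region = roll_up(SALES, 0)   # totals per region
by_product = roll_up(SALES, 1)  # totals per product
print(by_region)
print(by_product)
```

The same records can be rolled up along either dimension, which is the basic idea behind slicing a data cube by region or by product.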

6
Big Data & Data Science

• “… the sexy job in the next 10 years will be statisticians,” Hal Varian, Google Chief Economist
• The U.S. will need 140,000-190,000 predictive analysts and 1.5 million managers/analysts by 2018 (McKinsey Global Institute, June 2011).

• New Data Science institutes are being created or repurposed: NYU, Columbia, Washington, UCB, ...
• New degree programs, courses, boot-camps:
– e.g., at Berkeley: Stats, I-School, CS, Astronomy…
– One proposal (elsewhere) for an MS in “Big Data Science”
– Plans for Data Science Stream at AUST
– RDA-CODATA School of Research Data Science
7
What is Data Science?

• Some definitions link computational, statistical, and substantive expertise.

8
What is Data Science?

• Other definitions focus more on technical skills alone.

9
What is Data Science?

• An area that manages, manipulates, extracts, and interprets knowledge from a tremendous amount of data
• Data science (DS) is a multidisciplinary field of study whose goal is to address the challenges in big data
• Data science principles apply to all data, big and small

10
What is Data Science?

• Theories and techniques from many fields and disciplines are used to investigate and analyze a large amount of data to help decision makers in many industries such as science, engineering, economics, politics, finance, and education
– Computer Science
• Pattern recognition, visualization, data warehousing, high-performance computing, databases, AI
– Mathematics
• Mathematical modeling
– Statistics
• Statistical and stochastic modeling, probability
11
Data Science Vs Analysis Vs Software Delivery

• Tools
– Traditional Analysis: SAS, R, Excel, SQL, in-house tools
– Traditional Software Delivery: Java, source control, Linux, continuous integration, unit testing, bug reports and project management
– Data Science: R, Java, scientific Python libraries, Excel, SQL, Hadoop, Hive, Pig, Mahout and other machine learning libraries, GitHub for source control and issue management
• Analytical Methods
– Traditional Analysis: Regressions, classifications, measuring prediction accuracy and coverage/error, sampling
– Traditional Software Delivery: N/A
– Data Science: Classification, clustering, similarity detection, recommenders, unsupervised and supervised learning, small- and large-scale computations, measuring prediction accuracy and coverage/error
• Team Structure
– Traditional Analysis: Statisticians, Mathematicians, Scientists
– Traditional Software Delivery: Developers, Project Managers, Systems Engineers
– Data Science: Mathematicians, Statisticians, Scientists, Developers, Systems Engineers
• Time Frame
– Traditional Analysis: Either usually on-going research and discovery within a team in the organization, or a specific project to determine answers
– Traditional Software Delivery: Regular software release cycle, continuous delivery, etc.
– Data Science: Either a discovery/learning phase leading to product development, or on-going research and product invention/improvement

12
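
The "Analytical Methods" row of the comparison above mentions measuring prediction accuracy and coverage/error. A minimal sketch of one way to compute these is given below. The slides do not define coverage, so this example assumes it means the fraction of items for which the model produced a prediction at all (a prediction of `None` is treated as an abstention); the labels and the function name are hypothetical.

```python
def accuracy_and_coverage(predictions, labels):
    """Return (accuracy on answered items, fraction of items answered).

    Assumption for this sketch: a prediction of None means the model
    abstained, so it counts against coverage but not against accuracy.
    """
    answered = [(p, y) for p, y in zip(predictions, labels) if p is not None]
    coverage = len(answered) / len(labels) if labels else 0.0
    correct = sum(1 for p, y in answered if p == y)
    accuracy = correct / len(answered) if answered else 0.0
    return accuracy, coverage


# Hypothetical predictions from a spam filter that abstains on one message.
preds = ["spam", None, "ham", "spam", "ham"]
truth = ["spam", "ham", "ham", "ham", "ham"]
acc, cov = accuracy_and_coverage(preds, truth)
error = 1.0 - acc
print(acc, cov, error)
```

Reporting accuracy together with coverage keeps a model from looking good simply by declining to answer the hard cases.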
Contrast: Scientific Computing

[Figure: an image is passed to a general-purpose classifier that labels it "Supernova" or "Not". Credit: Nugent group / C3 LBL]

Scientific Modeling
• Physics-based models
• Problem-structured
• Mostly deterministic, precise
• Run on supercomputer or high-end computing cluster

Data-Driven Approach
• General inference engine replaces model
• Structure not related to problem
• Statistical models handle true randomness and un-modeled complexity
• Run on cheaper computer clusters (EC2)
13
Contrast: Machine Learning

Machine Learning
• Develop new (individual) models
• Prove mathematical properties of models
• Improve/validate on a few, relatively clean, small datasets
• Publish a paper

Data Science
• Explore many models, build and tune hybrids
• Understand empirical properties of models
• Develop/use tools that can handle massive datasets
• Take action!
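
To make "explore many models" and "understand empirical properties of models" concrete, the sketch below fits three toy models and compares them by held-out error. Everything here is an assumption for illustration: the synthetic data (y = 2x + 1 plus noise), the three candidate models, and the every-fourth-point hold-out split are not from the slides.

```python
import random
from statistics import mean


def fit_constant(xs, ys):
    """Baseline model: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Ordinary least-squares line through the training points."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def fit_hybrid(xs, ys):
    """Toy hybrid: average the predictions of the two models above."""
    constant, line = fit_constant(xs, ys), fit_line(xs, ys)
    return lambda x: 0.5 * (constant(x) + line(x))


def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given points."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


# Synthetic data (an assumption for illustration): y = 2x + 1 plus noise.
rng = random.Random(0)
xs = [i / 10 for i in range(100)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in xs]

# Hold out every fourth point for evaluation; train on the rest.
train_x = [x for i, x in enumerate(xs) if i % 4 != 0]
train_y = [y for i, y in enumerate(ys) if i % 4 != 0]
test_x = [x for i, x in enumerate(xs) if i % 4 == 0]
test_y = [y for i, y in enumerate(ys) if i % 4 == 0]

candidates = {"constant": fit_constant, "line": fit_line, "hybrid": fit_hybrid}
scores = {name: mse(fit(train_x, train_y), test_x, test_y)
          for name, fit in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

The comparison is purely empirical: no property of any model is proved, and the choice rests only on measured error over data the models did not see during fitting.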
14
Contrast: Data Engineering

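• Approach
– Data Science: Scientific (Exploration)
– Data Engineering: Engineering (Development)
• Problems
– Data Science: Unbounded
– Data Engineering: Bounded
• Path to Solution
– Data Science: Iterative, exploratory, nonlinear
– Data Engineering: Mostly linear
• Education
– Data Science: More is better (PhDs common)
– Data Engineering: BS and/or self-trained
• Presentation Skills
– Data Science: Important
– Data Engineering: Not as important
• Research Experience
– Data Science: Important
– Data Engineering: Not as important
• Programming Skills
– Data Science: Not as important
– Data Engineering: Important
• Data Skills
– Data Science: Important
– Data Engineering: Important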

15
Data Science & Academia

• In the words of Alex Szalay, data science researchers must be "Pi-shaped" as opposed to the more traditional "T-shaped" researcher. In Szalay's view, a classic PhD program generates T-shaped researchers: scientists with wide-but-shallow general knowledge, but deep skill and expertise in one particular area. The new breed of scientific researchers, the data scientists, must be Pi-shaped: that is, they maintain the same wide breadth, but push deeper both in their own subject area and in the statistical or computational methods that help drive modern research.

16
Data Science & Academia

• A 2014 post by Jake Vanderplas, following a SciFoo discussion on academia and data science, raised the questions below.
• I encourage you to develop your own thoughts on them and come up with your own assessment

– Where does Data Science fit within the current structure of the
university & research institutions?
– What is it that academic data scientists want from their career?
How can academia offer that?
– What drivers might shift academia toward recognizing & rewarding
data scientists in domain fields?
– Recognizing that graduates will go on to work in both academia
and industry, how do we best prepare them for success in both
worlds?
17
Data Science Applications

• Business
– Summary: From car design to insurance to pizza delivery, businesses are using data science to optimize their operations and better meet their customers’ expectations.
– What is happening? Two-Way Street for the Ford Focus Electric Car; Better Fraud Detection Boosts Customer Satisfaction; E-Commerce Insights: Domino’s Secret Sauce
– What is possible: Using Social Data to Select Successful Retail Locations
• Health Care
– Summary: Tomorrow’s healthcare may look more efficient thanks to things like electronic health records. It also may look a lot more effective. Reduced readmissions, better care, and earlier detection are on the horizon.
– What is happening? Reducing Hospital Readmissions; Better Point-of-Care Decisions
– What is possible: Medical Exams by Bathroom Mirrors
• Urban Living
– Summary: For the first time in human history, more people live in cities than in suburban or rural areas. An emerging field called “urban informatics” combines data science with the unique challenges facing the world’s growing cities.
– What is happening? Taking on Megacity Traffic; Fighting Crime with Data ("predictive policing")
– What is possible: Instrumenting cities

18
Contrast: Computational Sciences

• Is there a contrast between Data Science and Computational Science?

19
Data Science: Case Study
Cancer Research
• Cancer is an incredibly complex disease; a single tumor can have
more than 100 billion cells, and each cell can acquire mutations
individually. The disease is always changing, evolving, and adapting.
• Employ the power of big data analytics and high-performance
computing.
• Leverage sophisticated pattern recognition and machine learning algorithms to identify patterns that are potentially linked to cancer
• Huge amount of data processing and recognition

20
Data Science: Case Study
Health Care

• Stanford Medicine, Google team up to harness power of data science for health care
• Stanford Medicine will use the
power, security and scale of
Google Cloud Platform to
support precision health and
more efficient patient care.
• Analyzing genetic data
• Focusing on precision health
• Data as the engine that
drives research

http://med.stanford.edu/news/all-news/2016/08/stanford-medicine-google-team-up-to-harness-power-of-data-science.html

21
Data Science: Case Study
Elections
• The Obama campaigns in 2008 and 2012 are credited with successful use of social media and data mining.
• Micro-targeting in 2012
– http://www.theatlantic.com/politics/archive/2012/04/the-creepiness-factor-how-obama-and-romney-are-getting-to-know-you/255499/
– http://www.mediabizbloggers.com/group-m/How-Data-and-Micro-Targeting-Won-the-2012-Election-for-Obama---Antony-Young-Mindshare-North-America.html
• Micro-profiles were built from multiple sources and accessed through apps, with data updated in real time from door-to-door visits; media buys, e-mails, and Facebook messages were highly targeted.
• 1 million people installed the Obama Facebook app that gave access to info on “friends”.
22
Data Science: Case Study
Internet of Things (IoT)
• The Internet of Things is growing rapidly. It is predicted that more than 25 billion devices will be connected by 2020.

• The Internet of Things (IoT) will soon produce a massive volume and variety of data at unprecedented velocity. If "Big Data" is the product of the IoT, "Data Science" is its soul.

23
Data Science: Case Study
Customer Analytics

24
Essential Points

• Big Data has given rise to Data Science


• Data science is rooted in solid foundations of
mathematics and statistics, computer science, and
domain knowledge
• Sexy profession – Data Scientists
• Not everything with data or science is Data Science!
• The use cases for Data Science are compelling

25
Conclusion

In this section you have learned


• What Big Data Challenges are
• What exactly is Data Science and what do Data
Scientists do
• Data Science contrasted with other disciplines
• Case Study & Use Cases

26
Questions?

27
Thank You!

http://www.ign.com/articles/2015/12/16/star-wars-the-force-awakens-review

28
