Module For Data Science

This contains three modules for Data Science course.

Uploaded by

Michael Manalo

MODULE 2:

MODULE 1: MULTIMEDIA

MODULE 1

Data Science

Learning Competencies
2.1 Identify the organizational standards for assessing "data-driven" maturity.
2.2 Understand the relation of Data Science to Big Data.
2.3 Become familiar with the Data Science processes and how they are conducted.
2.4 Recognize how a Data Science Platform helps businesses turn data into insights faster
and more efficiently.

INTRODUCTION

Data is the universal thread in today’s increasingly technologically advanced world and is
being transformed into valuable knowledge and powerful capabilities by avant-garde businesses.
While these leading businesses increasingly rely on data to make decisions, others struggle to extract
value from it and fail to realize this data-driven ambition. These organizations are facing an increased
need to adapt in order to stay ahead of the competition. But even so, they frequently lack the
information to shift their initial, intuition-based method of operating.

Furthermore, it is said that humans are sometimes irrational. This inhibits decision-making
and results in inferior decisions. With this, it is asserted that computers, due to their information
processing capabilities, potentially play an essential role as decision support systems. The invention of
the computer sparked interest in how computers may aid business processes by quickly converting
information into economic value.

The concept and desire for a data-driven organization did not emerge from thin air. For a long
time, organizations have been conducting data-driven operations in order to make more factual and
economical decisions. Doing so brings many benefits to businesses trying to keep up with the rapidly
changing digitized world, but it also creates challenges in collecting and processing data.

In this session, we will go further into Data Science and answer the following
questions:
• What are the standards for assessing organizational maturity?
• How do data scientists use Big Data?
• How are Data Science procedures carried out?
• How do Data Science Platforms help businesses become more efficient?

DATA INPUT

The DELTA+ Model

In their 2007 book "Competing on Analytics: The New Science of Winning," Thomas
Davenport and Jeanne Harris established the Five Stages of Analytics Maturity. In 2010, in
"Analytics at Work: Smarter Decisions, Better Results," Davenport and Harris were joined by
Robert Morison in presenting the DELTA Model. Davenport and Harris revised both frameworks in
the 2017 edition of "Competing on Analytics." The DELTA+ Model was created by adding two
components to the DELTA Model.

The DELTA+ Model and the Five Stages of Analytics Maturity have become industry-standard
frameworks for measuring corporate analytics maturity. Let us highlight the essential
features of these frameworks to get a sense of an organization's level of analytics maturity.

The DELTA+ Model comprises seven parts that must develop and mature for businesses to be
successful with their Analytics projects.

The first five elements form the original DELTA Model:

• D for data that is integrated, high-quality, and readily available.
• E for the enterprise, across which analytics resources are managed in a centralized
manner.
• L for solid and dedicated leadership that recognizes the value of analytics and continually
promotes its use in decision-making.
• T for the appropriate strategic, organizational targets that will form the
foundation of an analytics roadmap.
• A for analytics professionals: developing high-performing analytics experts.

The "+" in the DELTA+ Model reflects the continuing growth of big data and the
adoption of new analytics methods such as machine learning. It adds two elements:

• T for the technology that will be used to enable analytics throughout the company.
• A for the many analytical techniques available, which range from simple descriptive statistics
to machine learning.

Let's look at each of these elements.


D stands for Data.
It is no secret that many companies struggle with data quality concerns. For effective
analytics to occur, companies must guarantee that high-quality data is structured and available
to the appropriate individuals.

E stands for Enterprise.


Analytical companies push for a unified and consistent view of analytics across the
company. This is achieved by developing an analytics strategy and a plan for implementing
that strategy.

L stands for Leadership.


Analytical businesses are led by executives who completely embrace analytics and
guide the company culture toward data-driven decision-making. Analytics should be
supported by all levels of leadership inside the company.

T stands for Targets.


Analytics initiatives must be tied to specific strategic targets that are
aligned with the organization's goals. At the highest maturity level, these targets are
integrated into the strategic planning process and are seen as business efforts rather than
merely analytics projects.

A stands for Analytics Professionals.


Organizations need analytical talent that spans a variety of skills and positions.
Once the right people are in place, keeping them engaged through new and challenging
initiatives is critical.

T stands for Technology.


As analytics technology advances quickly, an organization's capacity to install and
maintain the underlying infrastructure, tools, and technologies becomes increasingly critical.

A stands for Analytical Techniques.


The analytical techniques that businesses may employ in their decision-making process
have become more sophisticated as technology has quickly evolved. They range
from basic descriptive statistics to machine learning.

WORKING WITH DATA AT SCALE

Big data refers to the massive amount of data generated on a daily basis. It is data that is
extremely large and complex, and it is generated at a very high rate. The process of gathering,
storing, and analyzing data in order to derive insights has been practiced for a long time. However, the
phrase "Big Data" only came into wide use in the late 2000s.

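Big data is commonly characterized by three properties, often called the three Vs:

• Volume: the sheer amount of data.
• Velocity: the speed at which data is generated and must be processed.
• Variety: the range of data types and formats.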

The most meaningful description is that "big data" occurs when the size of the data itself
becomes part of the problem. We're talking about data problems ranging from gigabytes to petabytes.
At some point, traditional approaches to working with data run out of steam. The significance of data, big or
otherwise, rests in what you can do with it rather than in how much of it you have.

For a long time, industries such as manufacturing, retail, oil and gas, telecommunications,
financial services, health care, and other data-centric sectors have possessed massive
databases. And, as storage capacity grows, today's "big" is almost certainly tomorrow's "medium"
and next week's "small."

Let's look at some big data use cases to see how companies are using big data more than ever
before. Each use case demonstrates how businesses are using data insights to enhance
decision-making, penetrate new markets, and provide better consumer experiences. Health care
makes an especially timely example.

Healthcare companies use big data for everything from boosting profitability to
saving lives. Healthcare businesses, hospitals, and researchers collect massive volumes of
data. However, none of this information is helpful on its own. It becomes valuable when the data
is analyzed to identify trends and risks and to develop predictive models.
• Genomic research
Using big data, researchers can uncover disease genes and biomarkers that help
patients identify potential health concerns. The findings may also enable
healthcare organizations to develop tailored therapies.
• Patient experience and outcomes
Healthcare companies strive to deliver better treatment and higher quality care
while keeping costs down. Big data enables them to improve the patient experience in the
most cost-effective way possible. Healthcare companies may use big data to build a
360-degree view of patient care as the patient moves through various therapies and
departments.
• Claims fraud
Each healthcare claim may include hundreds of related reports in a variety of
formats. This makes it exceedingly challenging to verify the integrity of insurance incentive
schemes and to identify trends that suggest fraudulent conduct. Big data helps
healthcare companies detect possible fraud by highlighting specific patterns for
further investigation.
• Healthcare billing analytics
Big data has the potential to boost the bottom line. By examining billing and claims data,
organizations can uncover missed revenue opportunities and areas where payment cash flows
may be improved. This use case requires integrating billing data from
multiple payers, evaluating a massive volume of that data, and then detecting activity
patterns in the billing data.

Big Data can benefit every industry and every organization. But it has no value unless you know
how to put it to work.

DATA SCIENCE PROCESSES

Lesson 1 described data scientists and briefly discussed what they do day to day. Data Science
is a multidimensional field that uses scientific methods, tools, and algorithms to extract knowledge
and insights from structured and unstructured data. In truth, a data scientist does far more than merely
analyze data. The work is all about data, but it also involves a variety of other data-driven
procedures.

Data Science is a multifaceted field. It systematically uses scientific and statistical
methodologies, procedures, algorithm development, and technologies to extract meaningful
information from data. But how do all of these areas interact with one another? To see how,
first understand the data science process and the day-to-day job of a data scientist.

The entire data science process involves the following steps:

Step 1. Ask Questions
In the first stage, try to get a sense of the company's needs and gather information
about them. You start the data science process by asking the right questions to figure out what
the problem is. Take, for example, the most typical concern of a company that sells bags: sales.
To begin analyzing the situation, you must first ask several questions:
• Who is the target market, and who are the customers?
• How are you going to reach the target market?
• What is the present state of the sales process?
• What information do you have about the target market?
• How can we find clients who are more inclined to purchase our product?

Following a meeting with the marketing team, you decide to concentrate on one question: "How
can we identify potential consumers who are more likely to buy our product?" The next stage is for
you to determine what data you have available to answer this question.

Step 2. Collect Data


Now that the business problem has been identified, it is time to collect the data to
support its resolution. Before acquiring data, find out whether the needed information is
already accessible inside the firm. In many cases, you can obtain datasets that were
gathered in earlier studies. For this problem, you need data such as age, gender, and
customers' past transaction history.

The majority of the customer-related data may be found in the company's Customer
Relationship Management (CRM) software, maintained by the sales team. SQL databases,
which include many tables, serve as the backbone of CRM software. Going through the SQL
database, you discover that the system maintains extensive identification, contact, and
demographic information about clients (which they provided to the firm), as well as a record
of their entire sales process.

If the current data is insufficient, make plans to acquire more. For example, display or distribute a
feedback form to your visitors and customers. That is a significant amount of
engineering work that will take time and effort. The information gathered is 'raw data,' which
contains mistakes and missing values. So, before examining the data, you must first clean it.
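
As a rough illustration of pulling customer data out of a CRM's SQL database, the sketch below builds a tiny in-memory database with Python's built-in sqlite3 module and joins demographics with transaction history. The table names, column names, and sample rows are all hypothetical; a real CRM will have its own schema.

```python
import sqlite3

# Build a tiny in-memory stand-in for a CRM database (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, gender TEXT);
    CREATE TABLE transactions (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ana', 34, 'F'), (2, 'Ben', NULL, 'M'), (3, 'Cara', 27, 'F');
    INSERT INTO transactions VALUES (1, 1200.0), (1, 800.0), (3, 450.0);
""")

# One row per customer: demographics plus purchase count and total spend.
# LEFT JOIN keeps customers who have no transactions yet.
query = """
    SELECT c.id, c.name, c.age, c.gender,
           COUNT(t.amount) AS purchases,
           COALESCE(SUM(t.amount), 0) AS total_spent
    FROM customers AS c
    LEFT JOIN transactions AS t ON t.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.id
"""
rows = conn.execute(query).fetchall()
conn.close()

for row in rows:
    print(row)
```

Note that the second customer comes back with a missing age, which is exactly the kind of gap in raw data that has to be handled before analysis.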

Step 3. Explore the Data


Exploring the data is the process of cleansing and arranging it. This procedure
is commonly estimated to consume more than 70% of a data scientist's time. Even after all of
the data has been gathered, it is not yet suitable for use, since raw data frequently contains
anomalies. Check that the data is clean and error-free. This is an important stage in the
procedure, and it demands patience and concentration. Python, R, SQL, and other tools and
approaches are used for this purpose.
Then you begin answering the following questions:
• Are there missing values, such as clients who do not have contact information?
• Are there incorrect values? If so, how can you fix them?
• Is there more than one dataset? Is it a good idea to combine datasets? If so, how
should you combine them?

Once the missing and incorrect values in your data have been identified and dealt with, the data is
ready for analysis. Remember that drawing the wrong insights from data is worse than having no insights at
all.
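
The checks described in this step can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the records, the field names, and the rule that a valid age lies between 0 and 120 are all made up for the example.

```python
# Hypothetical raw customer records, as they might come out of a CRM export.
raw_customers = [
    {"id": 1, "age": 34, "email": "ana@example.com"},
    {"id": 2, "age": None, "email": "ben@example.com"},   # missing age
    {"id": 3, "age": 270, "email": "cara@example.com"},   # implausible age
    {"id": 4, "age": 41, "email": ""},                    # missing contact information
]
# A second, equally made-up dataset keyed by the same customer id.
purchases = {1: 3, 3: 1, 4: 0}


def find_problems(record):
    """Return a list of data-quality problems found in one record."""
    problems = []
    if not record["email"]:
        problems.append("missing email")
    if record["age"] is None:
        problems.append("missing age")
    elif not 0 < record["age"] < 120:
        problems.append("invalid age")
    return problems


clean, rejected = [], []
for record in raw_customers:
    problems = find_problems(record)
    if problems:
        rejected.append((record["id"], problems))
    else:
        # Combine the two datasets; a customer with no purchase entry gets 0.
        clean.append({**record, "purchases": purchases.get(record["id"], 0)})

print("clean:", clean)
print("rejected:", rejected)
```

Here problem records are simply set aside. In practice you might instead fill in missing values or correct them at the source, depending on what the analysis needs.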

Step 4. Model the Data


After exploring the data, you have enough information to build a model that will
answer the question, "How can we identify potential consumers who are more likely to buy
our product?" In this stage, you analyze the data to extract information from it. Analyzing data
requires several activities:
• Create a data model to answer the question.
• Validate the model using the data you have.
• Use visualization tools to present the data.
• Carry out the required algorithms and statistical analyses.
• Compare the results with those of other methodologies and sources.

Carrying out these activities, however, will only provide clues and hypotheses. Data modeling
is a way of representing data as an equation that a computer can work with.
Make predictions based on the model, and try multiple models to find the best fit.
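
To make "try multiple models and keep the best fit" concrete, here is a deliberately simple sketch. Each candidate model is a single rule, "predict a purchase when the customer's past purchases reach a threshold," and the labelled data is invented for the example. Real projects would normally use a statistics or machine learning library rather than a hand-written rule.

```python
# Hypothetical labelled data: (number of past purchases, bought the new product?)
train = [(0, False), (1, False), (2, False), (3, True),
         (4, True), (5, True), (1, True), (4, False)]
held_out = [(0, False), (2, False), (3, True), (6, True), (1, False)]


def predict(past_purchases, threshold):
    """Candidate model: predict a purchase when past purchases reach the threshold."""
    return past_purchases >= threshold


def accuracy(data, threshold):
    """Share of records where the prediction matches the known outcome."""
    correct = sum(predict(x, threshold) == bought for x, bought in data)
    return correct / len(data)


# Try several candidate models (one per threshold) and keep the best fit on the training data.
best_threshold = max(range(0, 7), key=lambda t: accuracy(train, t))

print("best threshold:", best_threshold)
print("training accuracy:", accuracy(train, best_threshold))
# Validate on data the model was not fitted to.
print("held-out accuracy:", accuracy(held_out, best_threshold))
```

Checking accuracy on held-out records, rather than only on the data used to pick the threshold, is what the validation bullet above refers to.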

Step 5. Communicate the Results


Communication skills are a crucial aspect of the job of a data scientist, but they are
frequently undervalued. This will be a challenging aspect of your work since it will
require you to explain your results to the public and to other team members in a way that they
can understand.
To communicate the outcomes of the problem described above effectively:
• Graph or chart the data for presentation using tools such as R, Python, Tableau, and
Excel.
• Use storytelling to frame the findings.
• Respond to the different follow-up questions.
• Present the data in a suitable form, such as a report or a web page.
• Expect answers to elicit new questions, so that the cycle repeats itself.

DATA SCIENCE PLATFORM

The data science platform delivers new capabilities. Many businesses recognized that data
science activity was inefficient, insecure, and impossible to scale without an integrated platform.
This discovery prompted the creation of data science platforms. These platforms serve as software
hubs for all data science activity. According to https://solutionsreview.com/, the Best Data Science
Platforms of 2021 are Altair, Alteryx, Anaconda, Databricks, Dataiku, DataRobot, Domino Data Lab,
Google, H2O, IBM, KNIME, MATLAB, RapidMiner, SAS, and TIBCO.

A good platform mitigates many problems associated with deploying data science and enables
organizations to convert their data into insights more quickly and efficiently. Data scientists may work
in a collaborative environment using their favorite open-source tools on a centralized machine
learning platform. All of their work is synchronized via a version control system.

Benefits of a Data Science Platform


A data science platform eliminates redundancy and promotes creativity by allowing teams to
exchange code, findings, and reports. It removes bottlenecks in the workflow by simplifying
management and implementing best practices.
In general, excellent data science platforms strive to do the following:
• Make data scientists more productive by helping them accelerate and deliver models
with fewer mistakes.
• Make it easy for data scientists to work with vast amounts of data and a wide range of data
types.
• Provide unbiased, auditable, and repeatable enterprise-grade artificial intelligence.

Expert data scientists, citizen data scientists, data engineers, and machine learning engineers
or experts all utilize data science platforms to collaborate. A data science platform, for example, may
allow data scientists to distribute models as APIs, making it simple to integrate them into other
applications. Without having to wait for IT, data scientists may access tools, data, and infrastructure.
The market's need for data science platforms has skyrocketed.
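
To give a feel for what "distributing a model as an API" means, the sketch below wraps a stand-in model in a small HTTP service using only Python's standard library, then calls it the way another application would. The scoring rule, the /predict path, and the JSON field names are assumptions made for the example; a real platform would supply its own deployment tooling.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def score(past_purchases):
    """Stand-in model (hypothetical rule): likely buyer at 3 or more past purchases."""
    return past_purchases >= 3


class ModelHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body and run the model on it.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"likely_buyer": score(payload["past_purchases"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example output quiet


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), ModelHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Another application calls the model over HTTP (proxies disabled for the local call).
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
request = urllib.request.Request(
    f"http://127.0.0.1:{server.server_port}/predict",
    data=json.dumps({"past_purchases": 4}).encode(),
    headers={"Content-Type": "application/json"},
)
with opener.open(request) as response:
    result = json.loads(response.read())

server.shutdown()
server.server_close()
print(result)
```

The calling application only needs to know the URL and the JSON format, not how the model works internally, which is what makes this style of integration simple.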

Criteria for Choosing a Platform


When evaluating data science platforms, here are a few essential features to
consider:

• Select a project-based UI that promotes collaboration. The platform should enable users to
collaborate on a model from conception to completion. It should provide self-service access
to data and resources to all team members.

• Prioritize integration and adaptability. Make sure that the platform supports the most recent
open-source technologies, popular version control providers such as GitHub, GitLab, and
Bitbucket, and tight integration with other resources.

• Include enterprise-grade capabilities. As your team grows, be sure the platform
can scale with it. The platform should be highly available, have robust access controls, and
be able to handle a large number of concurrent users.

• Increase the level of self-service in data science. Look for a platform that relieves IT and
engineering of the load by allowing data scientists to spin up environments immediately, track
all of their work, and push models into production.

• Ensure that model deployment is as simple as possible. Model deployment and
operationalization are two of the most critical phases in the machine learning lifecycle, but
they are frequently overlooked. Make sure the service you choose facilitates model
operationalization, whether by providing APIs or by ensuring that users build models in a way that
allows for easy integration.

When a Data Science Platform Is the Right Move


If a business is experiencing any of the following, it may be ready for a
data science platform:
• Productivity and collaboration are showing signs of strain.
• Machine learning models are not auditable or reproducible.
• Models never make it into production.

A data science platform may add significant value to a company.

DATA CHECK & HANDS ON

You will write a blog entry for this activity. Blogging is the term used to
describe writing, photography, and other forms of media that are self-published online.
Blogging began as a tool for individuals to write diary-style entries, but it has since
been integrated into the websites of many businesses. If you're unfamiliar with blogs,
take a look at the following:
• Careathers, Liz. (2021, June 29). How to Write a Blog Post in 2021: The
Ultimate Guide. https://smartblogger.com/how-to-write-a-blog-post/

The content of your blog will be recorded as your Written Work (15 points). It can cover any
of the topics taught in the lesson. You will need to conduct additional research on your chosen topics,
consider applicable instances from everyday life, and consider unique case studies that your readers
may not be aware of. Look for research, examples, and case studies to which you can link to illustrate
the data science. The way you express your ideas creatively through graphics, illustrations,
and so on, together with navigation and team collaboration, will be graded as your Performance Task.
This will be your second entry in your group's virtual expo.

Please use the rubrics provided below as a guide on how you will be graded.
VIRTUAL EXPO RUBRIC

Content
Exemplary (15): The content is rich, concise, and straightforward. The content is relevant to the discussed topics and thoroughly answers the questions.
Proficient (12): Content is complete and includes relevant detail.
Partially Proficient (9): There is adequate detail. Some extraneous information and minor gaps are included.
Incomplete (5): There is insufficient detail, or detail is irrelevant and extraneous.

Creativity/Visual
Exemplary (15): The expo is visually effective. The use of graphics/images/photographs seamlessly relate well to the content.
Proficient (12): The expo is visually sensible. The use of graphics/images/photographs are included and appropriate.
Partially Proficient (9): The main theme is still discernible, but use of graphics/images/photographs are included but are used randomly.
Incomplete (5): Lacks visual clarity. The graphics/images/photographs are distracting from the content of the expo.

Navigation
Exemplary (15): The document is fully hyperlinked. The index is well organized and easy to navigate.
Proficient (12): Hyperlinks are organized into logical groups.
Partially Proficient (9): Hyperlinks are good but lacks organization. Not all possible features have been employed.
Incomplete (5): There are few links. Some links are "broken".

Team Collaboration
Exemplary (15): The group establishes and documents clear and formal roles for each member and distributes the workload equally.
Proficient (12): The group establishes clear and formal roles for each member and distributes the workload equally.
Partially Proficient (9): The group establishes informal roles for each member. The workload could be distributed more equally.
Incomplete (5): The group does not establish roles for each member and/or the workload is unequally distributed.

MODULE CREATORS

Module Author/Curator : Mrs. Ryanah Ness I. Lalog
Mrs. Floreneth P. Soriano
Template & Layout Designer : Mrs. Jenny P. Macalalad

ANSWER KEY

Answers may vary.
