Unit 3: Data Analytics

MBA/BBA/B.Com/BCA/UGC NET

By
Dr. Anand Vyas
Data Science Project Life Cycle:
• A data science life cycle is the iterative set of steps a team follows to deliver a project or analysis. Because every data science project and team is different, every specific life cycle differs in its details. However, most data science projects tend to flow through the same general sequence of steps.
Exploratory Data Analysis
Business Requirement
• Data requirements definition establishes the process used to identify,
prioritize, precisely formulate, and validate the data needed to achieve
business objectives. When documenting data requirements, data should
be referenced in business language, reusing approved standard business
terms if available. If business terms have not yet been standardized and
approved for the data within scope, the data requirements process
provides the occasion to develop them.
• The data requirements analysis process employs a top-down approach
that emphasizes business-driven needs, so the analysis is conducted to
ensure the identified requirements are relevant and feasible. The process
incorporates data discovery and assessment in the context of explicitly
qualified business data consumer needs.
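To make this concrete, a single data requirement could be captured as a structured record that reuses an approved business term. The sketch below is illustrative only; the field names and the example entry are assumptions, not part of these notes.

```python
from dataclasses import dataclass

@dataclass
class DataRequirement:
    """One documented data requirement, phrased in business language."""
    business_term: str          # approved standard business term, if one exists
    definition: str             # precise formulation of the data needed
    business_objective: str     # the business objective this data supports
    priority: str               # e.g. "high", "medium", "low"
    validated: bool = False     # set True once the business data consumer signs off

# Hypothetical example entry
req = DataRequirement(
    business_term="Monthly Recurring Revenue",
    definition="Sum of active subscription fees billed per calendar month",
    business_objective="Track revenue growth against quarterly targets",
    priority="high",
)
print(req)
```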

Data Acquisition
• Data acquisition is the process of sampling signals that measure real
world physical conditions and converting the resulting samples into
digital numeric values that can be manipulated by a computer. Data
acquisition systems, abbreviated by the initialisms DAS, DAQ, or
DAU, typically convert analog waveforms into digital values for
processing. The components of data acquisition systems include:
• Sensors, to convert physical parameters to electrical signals.
• Signal conditioning circuitry, to convert sensor signals into a form
that can be converted to digital values.
• Analog-to-digital converters, to convert conditioned sensor signals
to digital values
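A minimal software sketch of that sensor → signal conditioning → ADC chain, assuming an 8-bit converter, a ±5 V input range, and a simulated 50 Hz sensor signal (all illustrative values):

```python
import numpy as np

# Simulate the DAQ chain: sensor signal -> conditioning -> analog-to-digital conversion.
fs = 1000                        # sampling rate in Hz (assumed)
t = np.arange(0, 0.05, 1 / fs)   # 50 ms of samples
sensor_mv = 200 * np.sin(2 * np.pi * 50 * t)   # sensor output in millivolts (simulated)

conditioned_v = sensor_mv / 1000 * 10          # signal conditioning: amplify to a +/-2 V swing

# 8-bit analog-to-digital converter over a +/-5 V full-scale range
full_scale, bits = 5.0, 8
levels = 2 ** bits
codes = np.clip(
    np.round((conditioned_v + full_scale) / (2 * full_scale) * (levels - 1)),
    0, levels - 1,
).astype(int)

print(codes[:10])                # digital numeric values a computer can manipulate
```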
Data Preparation
• Data preparation is the process of gathering,
combining, structuring and organizing data so it can be
used in business intelligence (BI), analytics and data
visualization applications. The components of data
preparation include data pre-processing, profiling,
cleansing, validation and transformation; it often also
involves pulling together data from different internal
systems and external sources.
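A minimal sketch of these steps with pandas, assuming two hypothetical internal sources (a CRM extract and a billing extract) and illustrative column names:

```python
import pandas as pd

# Hypothetical internal sources: a CRM extract and a billing extract.
crm = pd.DataFrame({"customer_id": [1, 2, 3], "region": ["North", None, "South"]})
billing = pd.DataFrame({"customer_id": [1, 2, 2], "amount": [120.0, 80.0, 80.0]})

prepared = (
    billing.drop_duplicates()                         # cleansing: remove duplicate rows
           .merge(crm, on="customer_id", how="left")  # combining data from different systems
           .fillna({"region": "Unknown"})             # validation/cleansing of missing values
           .assign(amount_usd=lambda d: d["amount"].round(2))  # transformation
)
print(prepared)
```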

Hypothesis Testing
• Hypothesis testing was introduced by Ronald Fisher, Jerzy Neyman, Karl Pearson and Pearson's son, Egon Pearson. It is a statistical method used to make decisions from experimental data. A hypothesis is essentially an assumption we make about a population parameter; hypothesis testing evaluates whether the data support that assumption.

Important terms
• (i) Null hypothesis: A statistical hypothesis that assumes the observed result is due to chance alone. It is denoted by H0: μ1 = μ2, which states that there is no difference between the two population means.

• (ii) Alternative hypothesis: Contrary to the null hypothesis, the alternative hypothesis states that the observations are the result of a real effect.

• (iii) Level of significance: The probability threshold used to decide whether to reject the null hypothesis. Since 100% certainty is not possible when accepting or rejecting a hypothesis, a level of significance is chosen in advance, usually 5% (0.05).
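A minimal worked sketch of these terms using a two-sample t-test in SciPy, with simulated data and a 5% level of significance (the groups and their means are assumptions for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group1 = rng.normal(loc=50, scale=5, size=30)   # simulated sample from population 1
group2 = rng.normal(loc=53, scale=5, size=30)   # simulated sample from population 2

alpha = 0.05                                    # level of significance (5%)
t_stat, p_value = stats.ttest_ind(group1, group2)   # tests H0: mu1 == mu2

if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: reject the null hypothesis")
else:
    print(f"p = {p_value:.4f} >= {alpha}: fail to reject the null hypothesis")
```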
Importance of Hypothesis testing
• Hypothesis testing is one of the most important concepts in statistics because it is how you decide whether something really happened, whether certain treatments have positive effects, whether groups differ from each other, or whether one variable predicts another. In short, you want to determine whether your results are statistically significant and unlikely to have occurred by chance alone. In essence, a hypothesis test is a test of significance.
Modeling
• Data Modeling is the process of creating a visual representation of
either a whole information system or parts of it to communicate
connections between data points and structures. The goal is to
illustrate the types of data used and stored within the system, the
relationships among these data types, the ways the data can be
grouped and organized and its formats and attributes.

• Data models are built around business needs. Rules and requirements are defined upfront through feedback from business stakeholders so they can be incorporated into the design of a new system or adapted in the iteration of an existing one.
Types of data models
• Conceptual data models. Also referred to as domain models, they offer a big-picture view of what the system will contain, how it will be organized, and which business rules are involved. Conceptual models are usually created as part of gathering initial project requirements. Typically, they include entity classes (defining the types of things that are important for the business to represent in the data model), their characteristics and constraints, the relationships between them, and relevant security and data integrity requirements.
• Logical data models. They are less abstract than conceptual models and provide greater detail about the concepts and relationships in the domain under consideration. One of several formal data modeling notation systems is followed. These indicate data attributes, such as data types and their corresponding lengths, and show the relationships among entities. Logical data models don't specify any technical system requirements. This stage is frequently omitted in agile or DevOps practices, but logical models can be useful in highly procedural implementation environments or for projects that are data-oriented by nature, such as data warehouse design or reporting system development.
• Physical data models. They provide a schema for how the data will be physically stored within a database and are therefore the least abstract of the three. They offer a finalized design that can be implemented as a relational database, including associative tables that illustrate the relationships among entities as well as the primary keys and foreign keys that will be used to maintain those relationships. Physical data models can include database management system (DBMS)-specific properties, including performance tuning.
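As an illustration of the progression from conceptual to physical, the sketch below takes a hypothetical conceptual model (a Customer places Orders) and expresses it as a physical schema with primary and foreign keys in SQLite; the entity and column names are assumptions:

```python
import sqlite3

# Hypothetical conceptual model: a Customer places Orders.
# The physical data model below expresses it as tables with primary and foreign keys.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    placed_at   TEXT NOT NULL
);
""")

conn.execute("INSERT INTO customer VALUES (1, 'Acme Ltd')")
conn.execute("INSERT INTO customer_order VALUES (10, 1, '2024-01-15')")
print(conn.execute("SELECT * FROM customer_order").fetchall())
```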
Evaluation and Interpretation
• Evaluation plans should illustrate how, where, and from what
sources data will be collected. Quantitative (numeric) and
qualitative (narrative or contextual) data should be collected within
a framework that aligns with stakeholder expectations, project
timelines, and program objectives.
• Data interpretation refers to the process of using diverse analytical
methods to review data and arrive at relevant conclusions. The
interpretation of data helps researchers to categorize, manipulate,
and summarize the information in order to answer critical
questions.
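A small sketch of quantitative interpretation, categorizing and summarizing illustrative evaluation data with pandas (the sites and scores are made up for the example):

```python
import pandas as pd

# Illustrative quantitative evaluation data (scores collected by program site)
data = pd.DataFrame({
    "site": ["A", "A", "B", "B", "B"],
    "score": [72, 80, 65, 70, 90],
})

# Categorize, manipulate, and summarize to answer "how did each site perform?"
summary = data.groupby("site")["score"].agg(["count", "mean", "std"])
print(summary)
```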

Deployment
• Deployment in data science refers to applying a model to make predictions on new data. Building a model is generally not the end of the project. Even if the purpose of the model is to increase knowledge of the data, the knowledge gained will need to be organized and presented in a way the customer can use. Depending on the requirements, the deployment phase can be as simple as generating a report or as complex as implementing a repeatable data science process.
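A minimal sketch of the simplest kind of deployment: persisting a trained model and applying it to data that arrives later. It assumes scikit-learn and uses a toy model and toy data:

```python
import pickle
from sklearn.linear_model import LogisticRegression

# Train a toy model (stand-in for the modeling phase)
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0, 0, 1, 1]
model = LogisticRegression().fit(X_train, y_train)

# "Deploy": persist the model so another process can load it and score new data
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

with open("model.pkl", "rb") as f:
    deployed = pickle.load(f)

new_data = [[0.5], [2.5]]              # records arriving after deployment
print(deployed.predict(new_data))      # predictions for the new records
```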
Operations
• Data operations (DataOps) combines the people, processes, and products that enable consistent, automated, and secure data management. It is a delivery system based on joining and analyzing large databases. Because collaboration and teamwork are two keys to a successful business, the term "DataOps" was coined around this idea. Its purpose is to provide a cross-functional way of working across the acquisition, storage, processing, quality monitoring, execution, improvement, and delivery of information to the end user.
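One small piece of that picture, an automated quality-monitoring check of the kind a DataOps pipeline might run before delivering a batch, is sketched below; the rules and column names are assumptions:

```python
import pandas as pd

def quality_report(df: pd.DataFrame) -> dict:
    """Automated checks a pipeline might run before delivery (rules are illustrative)."""
    return {
        "row_count": len(df),
        "null_customer_id": int(df["customer_id"].isna().sum()),
        "negative_amounts": int((df["amount"] < 0).sum()),
        "duplicate_rows": int(df.duplicated().sum()),
    }

# Hypothetical incoming batch with two data quality problems
batch = pd.DataFrame({"customer_id": [1, 2, None], "amount": [10.0, -5.0, 7.5]})
report = quality_report(batch)
print(report)
assert report["null_customer_id"] == 1 and report["negative_amounts"] == 1
```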
Optimization
• Data optimization is the process of preparing the logical schema from the data view schema; it is the counterpart of data de-optimization. Data optimization is an important aspect of database management in particular and of data warehouse management in general. It is most commonly known as a non-specific technique used by several applications when fetching data from a data source, so that the data can be used in data view tools and applications such as those used in statistical reporting.
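One common form of optimization when fetching data from a source is to push filtering and aggregation down to the database rather than pulling every row into the application. A self-contained sketch with SQLite (table and column names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("North", 120.0), ("North", 80.0), ("South", 200.0)])

# Unoptimized: fetch every row, then aggregate in the application
rows = conn.execute("SELECT region, amount FROM sales").fetchall()

# Optimized fetch: let the data source aggregate, returning only what the report needs
report = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"
).fetchall()
print(report)   # e.g. [('North', 200.0), ('South', 200.0)]
```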
