Data Analytics I Unit Notes

Uploaded by jishithashannu
Life Cycle Phases of Data Analytics

Data Analytics Lifecycle:


The data analytics lifecycle is designed for Big Data problems
and data science projects. The cycle is iterative to reflect how real
projects unfold. To address the distinct requirements of performing
analysis on Big Data, a step-by-step methodology is needed to
organize the activities and tasks involved in acquiring,
processing, analyzing, and repurposing data.
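As a rough illustration of the iterative structure, the Python sketch below models the six phases as an ordered enumeration and walks through them, stepping back one phase whenever a check fails. The check function `is_ready` and the "step back exactly one phase" rule are hypothetical simplifications; in practice a team may return to any earlier phase.

```python
from enum import IntEnum


class Phase(IntEnum):
    """The six lifecycle phases, in their nominal order."""
    DISCOVERY = 1
    DATA_PREPARATION = 2
    MODEL_PLANNING = 3
    MODEL_BUILDING = 4
    COMMUNICATE_RESULTS = 5
    OPERATIONALIZE = 6


def run_lifecycle(is_ready, max_steps=50):
    """Walk the phases in order, stepping back one phase whenever the
    hypothetical check is_ready(phase) reports that the phase's output is
    not yet good enough. max_steps guards against looping forever.
    Returns the sequence of phases visited."""
    visited = []
    phase = Phase.DISCOVERY
    for _ in range(max_steps):
        visited.append(phase)
        if not is_ready(phase):
            # Iterate: return to the previous phase (never before Discovery).
            phase = Phase(max(phase - 1, Phase.DISCOVERY))
            continue
        if phase == Phase.OPERATIONALIZE:
            break
        phase = Phase(phase + 1)
    return visited


if __name__ == "__main__":
    attempts = {}

    def is_ready(phase):
        # Hypothetical check: Model Planning uncovers a data gap on its first pass.
        attempts[phase] = attempts.get(phase, 0) + 1
        return not (phase == Phase.MODEL_PLANNING and attempts[phase] == 1)

    print([p.name for p in run_lifecycle(is_ready)])
```

In the demo, the first pass through Model Planning fails, so the walk returns to Data Preparation before moving forward again to Operationalize.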
• Phase 1: Discovery –
• The data science team learns about and investigates the problem.
• The team develops context and understanding.
• The team identifies the data sources needed and available for the
project.
• The team formulates initial hypotheses that can later be tested
with data.
• Phase 2: Data Preparation –
• Steps to explore, preprocess, and condition data before modeling and
analysis.
• This phase requires an analytic sandbox; the team performs extract,
load, and transform (ELT) operations to get data into the sandbox.
• Data preparation tasks are likely to be performed multiple times and
in no predefined order.
• Tools commonly used in this phase include Hadoop, Alpine
Miner, and OpenRefine.
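A minimal sketch of the ELT pattern, assuming an in-memory SQLite database stands in for the analytic sandbox. The table and column names (`raw_sales`, `sales`, `id`, `region`, `amount`) and the sample rows are hypothetical. Raw data is loaded unchanged first, then conditioned inside the sandbox with SQL, so the transform can be rerun each time preparation is repeated.

```python
import sqlite3

# Hypothetical raw extract: messy strings as they might arrive from a source system.
RAW_ROWS = [
    (" 1 ", "north", "120.5"),
    ("2", "South ", ""),      # missing amount
    ("3", " north", "80"),
    ("3", " north", "80"),    # exact duplicate
]


def prepare_in_sandbox(raw_rows):
    """ELT sketch: load raw rows untouched, then clean them inside the sandbox."""
    sandbox = sqlite3.connect(":memory:")  # stands in for the analytic sandbox
    # Extract + Load: keep the raw data as-is so the transform can be rerun.
    sandbox.execute("CREATE TABLE raw_sales (id TEXT, region TEXT, amount TEXT)")
    sandbox.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", raw_rows)
    # Transform: trim, normalise case, cast types, drop incomplete rows and duplicates.
    sandbox.execute(
        """
        CREATE TABLE sales AS
        SELECT DISTINCT
            CAST(TRIM(id) AS INTEGER) AS id,
            LOWER(TRIM(region))       AS region,
            CAST(amount AS REAL)      AS amount
        FROM raw_sales
        WHERE TRIM(amount) <> ''
        """
    )
    return sandbox


if __name__ == "__main__":
    db = prepare_in_sandbox(RAW_ROWS)
    for row in db.execute("SELECT id, region, amount FROM sales ORDER BY id"):
        print(row)
```

Keeping `raw_sales` untouched suits the repeated, unordered nature of data preparation: each pass can rebuild `sales` from it without re-extracting from the source system.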
• Phase 3: Model Planning –
• The team explores the data to learn about relationships between
variables and then selects the key variables and the most
suitable models.
• The team also plans the methods and workflow for the next phase,
in which data sets for training, testing, and production are developed
and the models are built and executed.
• Tools commonly used in this phase include Matlab and
STATISTICA.
• Phase 4: Model Building –
• The team develops data sets for testing, training, and production
purposes, then builds and executes the models.
• The team also considers whether its existing tools will suffice for running
the models or whether a more robust environment is needed for executing
them.
• Free or open-source tools – R and PL/R, Octave, WEKA.
• Commercial tools – Matlab and STATISTICA.
• Phase 5: Communicate Results –
• After executing the models, the team compares the modeling outcomes
with the criteria established for success and failure.
• The team considers how best to articulate findings and outcomes to
various team members and stakeholders, taking into account
caveats and assumptions.
• The team should identify key findings, quantify business value, and
develop a narrative to summarize and convey findings to stakeholders.
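Comparing outcomes with success and failure criteria can be made explicit by recording the criteria as data and checking each result against them. The metric names, thresholds, and message format below are hypothetical examples, not criteria taken from these notes.

```python
# Hypothetical success criteria agreed with stakeholders:
# each metric maps to (comparison, threshold).
CRITERIA = {
    "mae":        ("<=", 15.0),   # mean absolute error, in sales units
    "r_squared":  (">=", 0.80),
    "uplift_pct": (">=", 2.0),    # expected business uplift, in percent
}

CHECKS = {
    "<=": lambda value, limit: value <= limit,
    ">=": lambda value, limit: value >= limit,
}


def assess(results, criteria=CRITERIA):
    """Compare each modeled outcome with its criterion; return a verdict per metric."""
    verdicts = {}
    for metric, (op, limit) in criteria.items():
        value = results.get(metric)
        if value is None:
            verdicts[metric] = "not measured"
        else:
            verdicts[metric] = "met" if CHECKS[op](value, limit) else "not met"
    return verdicts


def summarize(results, criteria=CRITERIA):
    """One plain-language line per criterion, as raw material for the narrative."""
    verdicts = assess(results, criteria)
    return [
        f"{metric}: {results.get(metric)} (target {op} {limit}) -> {verdicts[metric]}"
        for metric, (op, limit) in criteria.items()
    ]


if __name__ == "__main__":
    for line in summarize({"mae": 12.3, "r_squared": 0.76}):
        print(line)
```

Reporting "not measured" separately from "not met" keeps gaps in the evidence visible to stakeholders instead of silently counting them as failures.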
• Phase 6: Operationalize –
• The team communicates the benefits of the project more broadly and sets up
a pilot project to deploy the work in a controlled way before broadening it
to the full enterprise of users.
• This approach enables the team to learn about the performance and related
constraints of the model in a production environment on a small scale
and to make adjustments before full deployment.
• The team delivers final reports, briefings, and code.
• Free or open source tools – Octave, WEKA, SQL, MADlib.
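A pilot deployment often exposes the new model to only a small, stable slice of users. The sketch below assigns users to the pilot by hashing their IDs, an assumed technique (often called a percentage or canary rollout) rather than one prescribed by these notes; `in_pilot`, `score`, the salt value, and the 5% default are hypothetical.

```python
import hashlib


def in_pilot(user_id: str, rollout_fraction: float, salt: str = "model-v1") -> bool:
    """Deterministically decide whether a user is in the pilot group.

    Hashing salt + user_id gives each user a stable bucket in [0, 10000),
    so the same user always gets the same answer, and raising
    rollout_fraction only adds users to the pilot, never removes them."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 10_000
    return bucket < rollout_fraction * 10_000


def score(user_id, features, pilot_model, current_process, rollout_fraction=0.05):
    """Route pilot users to the new model and everyone else to the existing process."""
    if in_pilot(user_id, rollout_fraction):
        return "pilot", pilot_model(features)
    return "current", current_process(features)


if __name__ == "__main__":
    pilot_users = sum(in_pilot(f"user{i}", 0.05) for i in range(10_000))
    print(f"{pilot_users} of 10000 users in the 5% pilot")
```

Because the bucket depends only on the salt and user ID, widening the rollout from 5% to, say, 25% keeps the original pilot users in the pilot, so results from the small-scale phase stay comparable with the broader deployment.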
