Modern Data Science - Best Practices For Predictive Analytics
PeerPaper Report
Modern Data Science: Best Practices for Predictive Analytics ©2018, IT Central Station
CONTENTS
Introduction
Conclusion
INTRODUCTION
Data science and machine learning provide the basis for business growth, cost and risk reduction and even new
business model creation. Implementing predictive analytics does present some challenges, however. The process
can be complex, and it can be difficult to find data scientists and analysts with the right mix of skills.
A drag-and-drop, visual data science tool, exemplified by IBM SPSS Modeler, enables rapid creation of machine
learning models while making it easy for data science and analytics teams as a whole to collaborate. IBM SPSS
Modeler also extends into the open-source environment for data scientists who code in R and Python, where new
innovations and custom algorithms can be built. In this paper, members of IT Central Station who use IBM SPSS
Modeler share their experiences and offer insights and recommended best practices for data science and machine
learning.
Data Science and Machine Learning Overview
The term “data science” refers to a collection of practices that leverage computing power to extract knowledge or
insights from data. Businesses can harness predictive analytics, based on data science, to model behavior based on
patterns. Done right, data science delivers value to businesses by enabling them to improve their understanding of
operations, sales growth, customer experience and more.

For example, on IT Central Station, a Quantitative Researcher at a financial services firm with more than 10,000
employees described how his company used predictive analytics to estimate the lifetime value of customers. With data
science, they can determine optimal approaches to customer acquisition, retention, cross-sell and up-sell as well as
segmentation.

Other examples of data science benefiting businesses include:
• Sentiment analysis: analyzing unstructured data in social media threads and product reviews to improve product
  messaging and merchandise selection.
• Purchase intent: predicting who will buy a specific product and when the purchase will occur, based on past
  purchase behavior, browsing behavior, sentiment analysis, demographics and so forth.
• Fraud detection: inspecting transactions and related data, like IP addresses of user devices, to determine if fraud
  or other improper activities are taking place.
• Predictive maintenance in industrial operations: using data on past repairs and part replacements to predict when a
  part will wear out, and replacing it before there’s a breakdown in operations or an accident.

Challenges to Data Preparation, Model Development and Training, and Deployment
A successful predictive analytics project doesn’t just happen. It’s the result of a series of process steps, each of
which can be difficult and time-consuming. These include data preparation and development of the predictive
analytics model, followed by the “training” of the model. The data in its raw form may not be usable. Without
effective data preparation, model development and training, the predictive analytics may not work at all.

A number of challenges arise in the predictive analytics execution process, depicted in Figure 1. These range from
foundation-level deficiencies in the platform and the organization to practical issues in the actual implementation of
a model. For one thing, there’s the not-so-simple matter of deploying the predictive analytics model in the real
world. The scale and scope of data analytics in a production environment may require further tuning of the model as
well as changes to the compute configuration. All of this takes people, who are increasingly hard to find.

Figure 1 - The predictive analytics execution process (diagram labels: Pre-existing data, Machine learning,
Experiment, Theoretical model, Model verification, Domain knowledge, Model construction).

IT Central Station members highlighted the following issues that can form obstacles to success with data science and
machine learning projects:
• Being unable to hire and retain data scientists: this is one of the most serious and pervasive challenges facing
  organizations interested in doing predictive analytics.
• Non-intuitive user interfaces (UIs), which slow down data science project implementation.
• Overly long project implementation times: the pace of the predictive analytics lifecycle drags on without the right
  tooling, people and processes in place. Getting from the starting line to a working prototype often takes too long,
  and iterations are overly time-consuming. Then, getting from prototype to production may suffer from delays due to
  technology and process. Legal, security and governance issues, as well as human resources problems, tend to
  exacerbate the situation.
• The need to involve people with diverse backgrounds, who don’t usually know how to write code or create data models.
• Scale and complexity: getting bogged down in a complex predictive analytics model with an overly ambitious scope, or
  lacking the preparation to scale and meet service level requirements.
• Integration with multiple data sources: for example, getting blocked from accessing, aggregating and exploiting data
  and software assets in a multi-cloud environment.
• Vendor deficits: a lack of expertise in setup and ongoing support for data science workloads.
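The predictive maintenance example and the data preparation and "training" steps described in this section can be made more concrete with a minimal Python sketch that uses only the standard library. It illustrates the idea under stated assumptions and does not show how IBM SPSS Modeler works. The repair-log layout, the part names, the mean-lifetime "model" and the 0.8 safety factor are all hypothetical. A real project would use richer data and a properly validated model.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

# Hypothetical raw repair log: (part type, installed on, replaced on).
# Raw data is rarely clean, so one row is deliberately malformed.
RAW_LOG = [
    ("pump-A", "2017-01-05", "2017-04-15"),
    ("pump-A", "2017-04-15", "2017-07-20"),
    ("pump-A", "2017-07-20", "not recorded"),
    ("pump-B", "2017-02-01", "2017-06-01"),
    ("pump-B", "2017-06-01", "2017-09-29"),
]


def prepare(raw_rows):
    """Data preparation: parse dates and drop unparseable or inconsistent rows."""
    records = []
    for part, installed_text, replaced_text in raw_rows:
        try:
            installed = date.fromisoformat(installed_text)
            replaced = date.fromisoformat(replaced_text)
        except ValueError:
            continue  # skip records whose dates cannot be parsed
        if replaced > installed:  # discard rows with impossible date order
            records.append((part, installed, replaced))
    return records


def train(records):
    """A deliberately simple stand-in for training: mean observed lifetime (days) per part type."""
    lifetimes = defaultdict(list)
    for part, installed, replaced in records:
        lifetimes[part].append((replaced - installed).days)
    return {part: mean(days) for part, days in lifetimes.items()}


def schedule_replacement(model, part, installed_on, safety_factor=0.8):
    """Plan a replacement before the expected wear-out date.

    safety_factor is a hypothetical policy setting: 0.8 means
    "replace at 80% of the mean observed lifetime".
    """
    return installed_on + timedelta(days=round(model[part] * safety_factor))


if __name__ == "__main__":
    model = train(prepare(RAW_LOG))
    print(model)
    print(schedule_replacement(model, "pump-A", date(2017, 7, 20)))
```

With this sample log, preparation drops the malformed row. The mean lifetime comes out at 98 days for pump-A and 120 days for pump-B. A pump-A unit installed on 2017-07-20 is therefore scheduled for replacement 78 days later, on 2017-10-06. Replacing at a fixed fraction of the mean lifetime is the simplest way to act "before there's a breakdown". A survival or regression model would be a natural next step.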
Solving the Problem: 10 Tips for Visual Data Science
IT Central Station members have shared tips that help organizations overcome the challenges in effective data
preparation, model development and training. Based on their use of IBM SPSS Modeler, they recommend a visual data
science approach that takes advantage of tools and techniques that speed up the data science lifecycle. Many of their
tips deal with empowering non-data scientists to accomplish sophisticated analytic tasks through solutions like IBM
SPSS Modeler, which are designed for the business or IT generalist.

… use with the guarantee of robust modeling techniques and trustworthy accuracy.”

A Director of Engineering at a logistics company used SPSS Modeler to create analytical models for use cases ranging
from pricing to just-in-time inventory management. He was pleased that SPSS Modeler allowed his team to put 10 models
into production, quickly transforming and moving existing models into the SPSS environment. As he noted, “We saw
increases in accuracy resulting from this. Therefore, we are running faster and more accurately.”
… the user interface is the best part of it. The ability to customize some of my streams with R and Python has been
very useful to me. I’ve automated a few things with that.”

3. SEEK ROI BY SPEEDING UP THE END-TO-END DATA SCIENCE LIFECYCLE
A machine learning model isn’t earning any Return on Investment (ROI) until it’s in production and working properly.
This issue figured into the comments made by a Unit Manager at an insurance company with over 1,000 employees. He
described his group’s capabilities with SPSS Modeler as high, adding, “They no longer waste time on modeling and
algorithms, meaning they are not coding anymore. For example, segmentation projects now take one to three months,
rather than six months to a year, as [they did] before.”

For the Enterprise Analytics Manager at the healthcare company, the advantage of SPSS Modeler was that “it minimizes
coding.” This meant, “Our go-live process has been slightly enhanced compared to the previous programmatic process.
There is now a faster time to production from the business end.” The VP, Data and Analytics at the financial
services firm also saw his go-live process speed up. He noted, “It’s not just the time to go-live but it’s also the
process itself. The improvement in terms of performance and maintenance is also important. I would say it has saved
us a lot of time, about 20% or 30% of our time.”

4. EMPOWER PEOPLE WITH VARYING LEVELS OF SKILL WITH AN INTUITIVE USER INTERFACE
Predictive analytics and machine learning projects deliver business results a lot faster when people with varying
skill levels can produce them. This matters given how hard it is to find and retain data science professionals and
experienced coders. Also, because predictive analytics usually brings together stakeholders from multiple
backgrounds, a non-technical person familiar with the business issues may actually be a better candidate to execute
the project than someone who has mostly technical skills.

From this perspective, it makes sense that the Quantitative Researcher at the financial services firm would describe
SPSS Modeler as “a great tool even for an individual with no or basic predictive modeling experience.” A Product Team
member at a healthcare company said he would recommend SPSS to someone who has just started trying to run a lot of
modeling. “It’s a good starting point,” he said. “It is very easy to use and will do the basics.”

An Associate Product Manager at a financial services firm with over 1,000 employees simply said, “IBM was chosen
because of usability. It’s point and click, whereas the other out-of-the-box solution, or open-source solutions,
require full-on programming and a much higher skill level.” He framed the idea by reversing the perspective. He said,
“If you’re hiring a data scientist, you don’t need IBM SPSS Modeler. If you only have an MBA who needs to be running
proofs of concept, then buy IBM SPSS Modeler.”

Figure 2 - Recommended solution parameters for a fast ROI with predictive analytics.

5. EXPLOIT A MULTI-CLOUD APPROACH
Machine learning models need to be able to consume data from virtually any source. Today, that means multiple cloud
environments in addition to traditional on-premises databases. Figure 2 references this capability. IT Central
Station members recommend selecting a machine learning solution with multi-cloud capabilities. As an Analyst at a
transportation company with more than 10,000 employees described, “We have a private cloud, which is our corporate
cloud. Everything is done off of a shared server.” The Director of Engineering at the logistics company shared that
his organization was “using a public Azure cloud. We are not deploying apps, but we are doing the analytics. We are
pulling the data in with it. Then, we are writing the tables.”

6. PROTOTYPE AND ITERATE QUICKLY
Rapid prototyping and iteration of machine learning models contribute to faster time to value. Users of SPSS Modeler
appreciate this aspect of the solution. The Director of Engineering at the logistics company, for example, praised
its “ability to quickly prototype,” while the Associate Product Manager at the financial services firm liked its
“rapid prototyping, [and] pre-production of models before roll out.” A Clinical Assistant Professor added, “I use it
for quick prototyping. It is just a lot faster. So you do not have to write a bunch of code, you can throw that stuff
on there pretty quickly and do prototyping quickly.”

7. INTEGRATE INTO ENVIRONMENTS TO DEPLOY REAL-TIME AND NEAR-REAL-TIME
Machine learning and predictive analytics solutions do not run in isolation. When they can be integrated into broader
IT and data management environments, they can get to business value faster through real-time or near-real-time
deployment. The Director of Engineering at the logistics company spoke to the benefits of integration, observing that
he had saved time in deployment “based off the ability to build codes quicker.”

He described how, with SPSS Modeler, his team “put them into production because we have [Collaboration and
Deployment Services], which is another analytic solution from IBM, so we are able to productionalize the models and
manage the models from this environment.” He concluded, “Altogether, this saves us a lot of time versus if we want a
programmatic solution and had to have developers write C# and Java around it. Overall, it is a huge increase to time
savings.”

Other IT Central Station members benefiting from SPSS Modeler’s integration capabilities include the IT Specialist at
the small government agency, who said, “We have integration where you can write third-party apps. This sort of
feature opens it up to being able to do anything you want.” The Director of Engineering at the logistics company
praised SPSS Modeler for its “integration into all the existing environments.”

Integration with other business intelligence tools is of particular importance, as the Analyst at the transportation
company shared. “We are putting seven machine learning models in production to start. We may expand up to 10,” he
said. “This is real-time, as we are pulling data out of Cognos BI server every morning. We manipulate and reload the
data throughout the day based on parameters that come in from the field. Then, that gets put back into the system and
refreshed for the next day.”

The IT Specialist at the small government agency also discussed the value of BI integration in terms of his future
machine learning plans. He said, “We’re doing real-time right now, but we are doing batch once we get the server
product up and going. In terms of models, we are getting it off the ground. We have been using it for about six
months, and we have been just playing with getting our models up and going, so we actually have the whole PureData and
Hortonworks analytics products that we are going to be deploying in the analytics environment. That’s where our
server product will go. Then we will have all of the governance pieces in place to start doing production deployment.
So, we are almost there.”

8. START SMALL AND SCALE THE SOLUTION UP AND OUT
IT Central Station members advise new adopters of machine learning to start small. Even with intuitive, GUI-based
tools, the analytical processes involved are challenging enough to make overly ambitious early projects unwise. As an
Analyst at a transportation company with over 1,000 employees put it, “Give it a try. Start with a proof of concept
and see where it leads. Right now, I think we have about five or six different machine learning proofs of concept,
using real-time data. We’re running them on Bluemix / IBM Cloud.” The Founding Partner at the tech services company
advised, “Do not dive into the server directly. It is very hefty for just doing calculations that can already be done
by SQL Server R or Oracle. Maximize the utilization of the desktop tool first.”

… experienced data scientists and the subjective, complex nature of the terrain, the right vendor can make a major
difference in predictive analytics outcomes. To this point, the Director of Engineering at the logistics company
shared, “I chose IBM SPSS because of their experience with the solution, what they brought to bear, and their
relationships.” The Analyst at the transportation company added, “The most important criteria when selecting a vendor
[is] ease of use. They should be able to handle our unique situation. We have many branches with many moving parts,
and also a lot of internal customers.”

Further to the theme of vendor experience, the Business Intelligence Manager at the manufacturing company explained,
“What’s most important when selecting a vendor is the proven practice of the product. [It’s useful] knowing that the
product has had success for numerous other customers in the past for similar use cases, for similar types of
customers. I think knowing that there are a variety of partners out there with expertise in the product is a very
strong selling point for me. I don’t like going to things where I can’t get help if I get stuck.”
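Under tip 7, the Analyst at the transportation company describes a refresh cycle. Data is extracted every morning, reloaded as new parameters arrive from the field, and written back for the next day. That cycle can be pictured with the minimal Python sketch below. It does not use Cognos or SPSS Modeler APIs: an in-memory SQLite database stands in for the BI server. The `customers` and `scores` tables, their columns, the scoring rule and the `threshold` parameter are all hypothetical assumptions made for illustration.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory stand-in for the BI server (hypothetical schema and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, "
        "recent_orders INTEGER, days_since_contact INTEGER)"
    )
    conn.execute("CREATE TABLE scores (id INTEGER PRIMARY KEY, score REAL, flagged INTEGER)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, 6, 0), (2, 2, 60), (3, 0, 10)],
    )
    conn.commit()
    return conn


def refresh_scores(conn, threshold):
    """Extract source rows, score them, and write the results back.

    `threshold` plays the role of a parameter arriving from the field;
    re-running with a new value reloads the scores without touching the source table.
    """
    rows = conn.execute(
        "SELECT id, recent_orders, days_since_contact FROM customers"
    ).fetchall()
    scored = []
    for customer_id, orders, days in rows:
        # Placeholder scoring rule, NOT a trained model: recent orders discounted by staleness.
        score = orders / (1 + days / 30)
        scored.append((customer_id, score, int(score >= threshold)))
    # INSERT OR REPLACE overwrites the previous row for the same id (id is the primary key).
    conn.executemany("INSERT OR REPLACE INTO scores VALUES (?, ?, ?)", scored)
    conn.commit()
    return len(scored)


def flagged_ids(conn):
    """Return the ids currently flagged by the latest refresh, in ascending order."""
    return [row[0] for row in conn.execute("SELECT id FROM scores WHERE flagged = 1 ORDER BY id")]


if __name__ == "__main__":
    conn = make_demo_db()
    refresh_scores(conn, threshold=1.0)  # morning load
    print(flagged_ids(conn))
    refresh_scores(conn, threshold=0.5)  # a new field parameter arrives later in the day
    print(flagged_ids(conn))
```

Running it prints `[1]` after the morning load and `[1, 2]` after the lower threshold arrives. Because `id` is the primary key, `INSERT OR REPLACE` overwrites each customer's previous score instead of appending duplicates.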
CONCLUSION
Getting to success with machine learning and predictive analytics requires a mix of people, processes and tools.
IT Central Station members shared their experiences with IBM SPSS Modeler to highlight tips for getting the most
out of an investment in machine learning. Their insights emphasize the benefits of solutions that enable non-coders
and non-data scientists to build and deploy models for enterprise data science projects.
More broadly, they recommend solutions that make it possible for machine learning projects to advance quickly by
streamlining the processes of data preparation and model creation.
According to IT Central Station members, effective machine learning and predictive analytics flow from visual data
science solutions that deploy quickly through the use of GUI-based machine learning algorithms. The goal is to
prototype and iterate quickly. Open-source compatibility, especially with R and Python, further accelerates the
modeling process. Integration with a variety of environments, coupled with a multi-cloud approach, facilitates access
to data resources in multiple locations. With an experienced vendor and the right solution, it is possible to achieve
the desired business outcomes and realize strong ROI with data science and machine learning in a business context.
ABOUT IT CENTRAL STATION
User reviews, candid discussions, and more for enterprise technology professionals.
The Internet has completely changed the way we make buying decisions. We now use ratings and review sites to
see what other real users think before we buy electronics, book a hotel, visit a doctor or choose a restaurant. But in
the world of enterprise technology, most of the information online and in your inbox comes from vendors, but what
you really want is objective information from other users. IT Central Station provides technology professionals with
a community platform to share information about enterprise solutions.
IT Central Station is committed to offering user-contributed information that is valuable, objective and relevant. We
validate all reviewers with a triple authentication process, and protect your privacy by providing an environment
where you can post anonymously and freely express your views. As a result, the community becomes a valuable
resource, ensuring you get access to the right information and connect to the right people, whenever you need it.
www.itcentralstation.com
IT Central Station does not endorse or recommend any products or services. The views and opinions of reviewers quoted in
this document, IT Central Station websites, and IT Central Station materials do not reflect the opinions of IT Central Station.
IBM SPSS Modeler is available by subscription, perpetual license or as part of IBM Data Science Experience. To
learn more, please visit: https://www.ibm.com/products/spss-modeler