MODERN DATA SCIENCE:

BEST PRACTICES FOR PREDICTIVE ANALYTICS

PeerPaper Report

10 TIPS FROM REAL USERS OF IBM SPSS MODELER


ABSTRACT
Data science and machine learning provide the basis for business
growth, cost and risk reduction and even new business model creation.
Implementing predictive analytics does present some challenges,
however. The process can be complex, and it can be difficult to find
data scientists and analysts with the right mix of skillsets. A
drag-and-drop visual data science tool, exemplified by IBM SPSS Modeler,
enables rapid creation of machine learning models while making it easy
to collaborate with data science and analytics teams as a whole. In this
paper, members of IT Central Station who use IBM SPSS Modeler share
their experiences and offer insights and recommended best practices
for data science and machine learning.

Modern Data Science: Best Practices for Predictive Analytics ©2018, IT Central Station
CONTENTS

Introduction
Data Science And Machine Learning Overview
Challenges To Data Preparation, Model Development And Training, And Deployment
Solving The Problem: 10 Tips For Visual Data Science
1. Deploy Quickly By Using GUI-Based Machine Learning Algorithms
2. Take Advantage Of Open Source-Based Innovation Including R Or Python
3. Seek ROI By Speeding Up The End-To-End Data Science Lifecycle
4. Empower People With Varying Levels Of Skill With An Intuitive User Interface
5. Exploit A Multi-Cloud Approach
6. Prototype And Iterate Quickly
7. Integrate Into Environments To Deploy Real-Time And Near-Real-Time
8. Start Small And Scale The Solution Up And Out
9. Leverage Online Documentation
10. Look For Proven Experience And Expertise In A Vendor
Conclusion

INTRODUCTION

Data science and machine learning provide the basis for business growth, cost and risk reduction and even new
business model creation. Implementing predictive analytics does present some challenges, however. The process
can be complex, and it can be difficult to find data scientists and analysts with the right mix of skillsets.

A drag-and-drop visual data science tool, exemplified by IBM SPSS Modeler, enables rapid creation of machine
learning models while making it easy to collaborate with data science and analytics teams as a whole. In particular,
IBM SPSS Modeler extends to the open source environment for data scientists who code in R and Python, where
new innovations and custom algorithms can be built. In this paper, members of IT Central Station who use IBM SPSS
Modeler share their experiences and offer insights and recommended best practices for data science and machine
learning.

Data Science and Machine Learning Overview

The term “data science” refers to a collection of practices that leverage computer power to extract knowledge or insights from data. Businesses can harness predictive analytics, based on data science, to model behavior based on patterns. Done right, data science delivers value to businesses by enabling them to improve their understanding of operations, sales growth, customer experience and more.

For example, on IT Central Station, a Quantitative Researcher at a financial services firm with more than 10,000 employees described how his company used predictive analytics to estimate the lifetime value of customers. With data science, they can determine optimal approaches to customer acquisition, retention, cross-sell and up-sell as well as segmentation.

Other examples of data science benefiting businesses include:

• Sentiment analysis: Analyzing unstructured data in social media threads and product reviews to improve product messaging and merchandise selection.
• Purchase intent: Predicting who will buy a specific product and when the purchase will occur, based on past purchase behavior, browsing behavior, sentiment analysis, demographics and so forth.
• Fraud detection: Inspecting transactions and related data, like IP addresses of user devices, to
determine if fraud or other improper activities are taking place.
• Predictive maintenance in industrial operations: Using data on past repairs and part replacements to predict when a part will wear out, and replacing it before there’s a breakdown in operations or an accident.

Challenges to Data Preparation, Model Development and Training, and Deployment

A successful predictive analytics project doesn’t just happen. It’s the result of a series of process steps, each of which can be difficult and time-consuming. These include data preparation and development of the predictive analytics model, followed by the “training” of the model. The data in its raw form may not be usable. Without effective data preparation, model development and training, the predictive analytics may not work at all.

A number of challenges arise in the predictive analytics execution process, depicted in Figure 1. These range from foundation-level deficiencies in platform and the organization to practical issues in the actual implementation of a model. For one thing, there’s the not so simple matter of deploying the predictive analytics model in the real world. The scale and scope of data analytics in a production environment may require further tuning of the model as well as changes to the compute configuration. All of this takes people, who are increasingly hard to find.

Figure 1 - High level flow of the predictive analytics modeling process. (Diagram labels: pre-existing data, domain knowledge, theoretical model, model construction, machine learning experiment, model verification.)

IT Central Station members highlighted the following issues that can form obstacles to success with data science and machine learning projects:

• Being unable to hire and retain data scientists: This is one of the most serious and pervasive challenges facing organizations interested in doing predictive analytics.
• Non-intuitive User Interfaces (UIs), which slow down data science project implementation.
• Overly long project implementation times: The pace of the predictive analytics lifecycle drags on without the right tooling, people and processes in place. Getting from the starting line to a working prototype often takes too long, and iterations are overly time-consuming. Then, getting from prototype to production may suffer from delays due to technology and process. Legal, security and governance issues, along with human resources problems, tend to exacerbate the situation.
• The need to involve people with diverse backgrounds: people who don’t usually know how to write code or create data models.
• Scale and complexity: getting bogged down in a complex predictive analytics model with an overly ambitious scope, or lacking the preparation to scale and meet service level requirements.
• Integration with multiple data sources, e.g. getting blocked from accessing, aggregating and exploiting data and software assets in a multi-cloud environment.
• Vendor deficits: a lack of expertise in setup and ongoing support for data science workloads.
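
For readers who do code, the process steps named in this section (data preparation, model training and verification on held-out data) can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not a description of how IBM SPSS Modeler works internally: the records, the field names (part_age_months, repairs) and the choice of a one-variable least-squares model are all hypothetical.

```python
"""Toy predictive analytics flow: prepare data, train a model, verify it."""
from statistics import mean

# Hypothetical raw records: age of a part (months) and repairs logged so far.
# Some rows are incomplete, as raw data often is.
RAW_RECORDS = [
    {"part_age_months": 1, "repairs": 1},
    {"part_age_months": 2, "repairs": 2},
    {"part_age_months": 3, "repairs": None},   # missing target value
    {"part_age_months": 4, "repairs": 3},
    {"part_age_months": None, "repairs": 3},   # missing input value
    {"part_age_months": 5, "repairs": 5},
    {"part_age_months": 6, "repairs": 5},
    {"part_age_months": 7, "repairs": 7},
    {"part_age_months": 8, "repairs": 8},
]


def prepare(records):
    """Data preparation: drop incomplete rows and return (x, y) pairs."""
    return [
        (float(r["part_age_months"]), float(r["repairs"]))
        for r in records
        if r["part_age_months"] is not None and r["repairs"] is not None
    ]


def split(pairs, holdout_every=3):
    """Hold out every n-th row so the model is verified on unseen data."""
    train = [p for i, p in enumerate(pairs) if (i + 1) % holdout_every != 0]
    test = [p for i, p in enumerate(pairs) if (i + 1) % holdout_every == 0]
    return train, test


def train_linear(pairs):
    """Training: ordinary least squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mean_absolute_error(model, pairs):
    """Verification: average absolute gap between prediction and actual."""
    slope, intercept = model
    return mean(abs((slope * x + intercept) - y) for x, y in pairs)


clean = prepare(RAW_RECORDS)
train_rows, test_rows = split(clean)
model = train_linear(train_rows)
holdout_error = mean_absolute_error(model, test_rows)
print(f"rows kept: {len(clean)}, holdout MAE: {holdout_error:.3f}")
```

Even at this scale, the preparation and verification steps take more code than the model fit itself, which is consistent with the point above that each process step carries its own effort.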

Solving the Problem: 10 Tips for Visual Data Science

IT Central Station members have shared tips that help organizations overcome the challenges in effective data preparation, model development and training. With a visual data science approach based on their use of IBM SPSS Modeler, they recommend taking advantage of tools and techniques that speed up the data science lifecycle. Many of their tips deal with empowering non-data scientists to accomplish sophisticated analytic tasks through solutions like IBM SPSS Modeler, which are designed for the business or IT generalist.

1. DEPLOY QUICKLY BY USING GUI-BASED MACHINE LEARNING ALGORITHMS
Some machine learning tools help speed up deployment of algorithms by enabling their creation through Graphical User Interfaces (GUIs) rather than a standard coding process. This is a feature of IBM SPSS Modeler admired by an IT Specialist at a small government agency. He said, “It gives you a GUI interface, which is a lot more user-friendly and easier to use compared to writing R scripts or Python, like some Anaconda type code. It makes it more open and accessible to users that are not as familiar with programming.”

An Enterprise Analytics Manager at a healthcare company with over 1,000 employees chose IBM SPSS Modeler for machine learning because of its drag-and-drop algorithm building capabilities. He commented, “Most of our business analysts are non-technical, so this was attractive to them.” A Founding Partner at a tech services company praised SPSS Modeler for its automated data preparation capabilities. Previously, he had many analytics jobs “stuck in Excel due to huge numbers of rows.” Now, he can tackle them rapidly, noting, “The automated modeling process helps us to get going so quickly.”

Analytics with visual modeling capabilities are what drove the interest of a VP, Data and Analytics at a financial services firm with over 1,000 employees. For a Senior Operations Manager at a manufacturing company with more than 10,000 employees, the best feature of SPSS Modeler was “quickness and ease of use with the guarantee of robust modeling techniques and trustworthy accuracy.”

A Director of Engineering at a logistics company used SPSS Modeler to create analytical models for use cases ranging from pricing to just-in-time inventory management. He was pleased that SPSS Modeler allowed his team to put 10 models into production, quickly transforming and moving existing models into the SPSS environment. As he noted, “We saw increases in accuracy resulting from this. Therefore, we are running faster and more accurately.”

2. TAKE ADVANTAGE OF OPEN SOURCE-BASED INNOVATION INCLUDING R OR PYTHON
Open source components often accelerate implementation of machine learning models, as an Analytics Product & Services Manager at a manufacturing company with over 1,000 employees explained. He said that SPSS Modeler’s “performance has been great.” He then added, “I’ve used it for about eight years or so. [It offers] lots of flexibility. It continues to be a very flexible platform, so that it handles R and Python and other types of technology. It seems to be growing with [the] additional open-source movement out there on different platforms.”

He advised, “If you’re considering that open-source solution, definitely consider [SPSS] Modeler as well. Put together some kind of proposal that allows you to figure out how much time it’s going to take individual people to create those models, versus being able to have an out-of-the-box solution that gets your team going more immediately.”

The Quantitative Researcher at the financial services firm found SPSS Modeler “extremely easy to use” because “it offers a generous selection of proprietary machine learning algorithms with advanced tuning capabilities and integration with Python.” Similarly, a Business Intelligence Manager at a manufacturing company with over 1,000 employees added further color by commenting, “I think the ease of use in
the user interface is the best part of it. The ability to customize some of my streams with R and Python has been very useful to me. I’ve automated a few things with that.”

3. SEEK ROI BY SPEEDING UP THE END-TO-END DATA SCIENCE LIFECYCLE
A machine learning model isn’t earning any Return on Investment (ROI) until it’s in production and working properly. This issue figured into the comments made by a Unit Manager at an insurance company with over 1,000 employees. He described his group’s capabilities with SPSS Modeler as high, adding, “They no longer waste time on modeling and algorithms, meaning they are not coding anymore. For example, segmentation projects now take one to three months, rather than six months to a year, as [they did] before.”

For the Enterprise Analytics Manager at the healthcare company, the advantage of SPSS Modeler was that “it minimizes coding.” This meant, “Our go-live process has been slightly enhanced compared to the previous programmatic process. There is now a faster time to production from the business end.” The VP, Data and Analytics at the financial services firm also experienced a speeding up of his go-live process. He noted, “It’s not just the time to go-live but it’s also the process itself. The improvement in terms of performance and maintenance is also important. I would say it has saved us a lot of time, about 20% or 30% of our time.”

4. EMPOWER PEOPLE WITH VARYING LEVELS OF SKILL WITH AN INTUITIVE USER INTERFACE
Predictive analytics and machine learning projects deliver business results a lot faster when they’re produced by people with varying skill levels. This is a reality given how hard it is to find and retain data science professionals along with experienced coders. Also, given how predictive analytics usually brings together stakeholders from multiple backgrounds, a non-technical person familiar with the business issues may actually be a better candidate to execute the project than someone who has mostly technical skills.

From this perspective, it makes sense that the Quantitative Researcher at the financial services firm would describe SPSS Modeler as “a great tool even for an individual with no or basic predictive modeling experience.” A Product Team at a healthcare company said he would recommend SPSS to someone who has just started trying to run a lot of modeling. “It’s a good starting point,” he said. “It is very easy to use and will do the basics.”

An Associate Product Manager at a financial services firm with over 1,000 employees simply said, “IBM was chosen because of usability. It’s point and click, whereas the other out-of-the-box solution, or open-source solutions, require full-on programming and a much higher skill level.” He framed the idea by reversing the perspective. He said, “If you’re hiring a data scientist, you don’t need IBM SPSS Modeler. If you only have an MBA who needs to be running proofs of concept, then buy IBM SPSS Modeler.”

5. EXPLOIT A MULTI-CLOUD APPROACH
Machine learning models need to be able to consume data from virtually any source. Today, that means multiple cloud environments in addition to traditional on-premises databases. Figure 2 references this capability. IT Central Station members recommend selecting a machine learning solution with multi-cloud capabilities. As an Analyst at a transportation company with more than 10,000 employees described, “We have a private cloud, which is our corporate cloud. Everything is done off of a shared server.” The Director of Engineering at the logistics company shared that his organization was “using a public Azure cloud. We are not deploying apps, but we are doing the analytics. We are pulling the data in with it. Then, we are writing the tables.”

Figure 2 - Recommended solution parameters for a fast ROI with predictive analytics. (Diagram labels: drag and drop/non-code modeling, rapid prototyping, open source, GUI based learning algorithms, cloud, data source.)

6. PROTOTYPE AND ITERATE QUICKLY
Rapid prototyping and iterating of machine learning models contribute to faster time to value. Users of SPSS Modeler appreciate this aspect of the solution. The Director of Engineering at the logistics company, for example, praised its “ability to quickly prototype,” while the Associate Product Manager at the financial services firm liked its “rapid prototyping, [and] pre-production of models before roll out.” A Clinical Assistant Professor added, “I use it for quick prototyping. It is just a lot faster. So you do not have to write a bunch of code, you can throw that stuff on there pretty quickly and do prototyping quickly.”

7. INTEGRATE INTO ENVIRONMENTS TO DEPLOY REAL-TIME AND NEAR-REAL-TIME
Machine learning and predictive analytics solutions do not run in isolation. When they can be integrated into broader IT and data management environments, they can get to business value faster through real-time or near real-time deployment. The Director of Engineering at the logistics company spoke to the benefits of integration, observing that he had saved time in deployment “based off the ability to build codes quicker.”

He described how, with SPSS Modeler, his team “put them into production because we have collaboration employment services, which is another analytic solution from IBM, so we are able to productionalize the models and manage the models from this environment.” He concluded, “Altogether, this saves us a lot of time versus if we want a programmatic solution and had to have developers write C# and Java around it. Overall, it is a huge increase to time savings.”

Other IT Central Station members benefiting from SPSS Modeler’s integration capabilities include the IT Specialist at the small government agency, who said, “We have integration where you can write third-party apps. This sort of feature opens it up to being able to do anything you want.” The Director of Engineering at the logistics company praised SPSS Modeler for its “integration into all the existing environments.”

Integration with other business intelligence tools is of particular importance, as the Analyst at the transportation company shared. “We are putting seven machine learning models in production to start. We may expand up to 10,” he said. “This is real-time, as we are pulling data out of Cognos BI server every morning. We manipulate and reload the data throughout the day based on parameters that come in from the field. Then, that gets put back into the system and refreshed for the next day.”

The IT Specialist at the small government agency also discussed the value of BI integration in terms of his future machine learning plans. He said, “We’re doing real-time right now, but we are doing batch once we get the server product up and going. In terms of models, we are getting it off the ground. We have been using it for about six months, and we have been just playing with getting our models up and going, so we actually have the whole pure data and Hortonworks analytics products that we are going to be deploying in the analytics environment. That’s
where our server product will go. Then we will have all of the governance pieces in place to start doing production deployment. So, we are almost there.”

8. START SMALL AND SCALE THE SOLUTION UP AND OUT
IT Central Station members advise new adopters of machine learning to start small. Even with intuitive, GUI-based tools, the analytical processes involved are sufficiently challenging to make overly ambitious early projects an unwise idea. As an Analyst at a transportation company with over 1,000 employees put it, “Give it a try. Start with a proof of concept and see where it leads. Right now, I think we have about five or six different machine learning proofs of concept, using real-time data. We’re running them on Bluemix / IBM Cloud.” The Founding Partner at the tech services company advised, “Do not dive into the server directly. It is very hefty for just doing calculations that can already be done by SQL Server R or Oracle. Maximize the utilization of the desktop tool first.”

9. LEVERAGE ONLINE DOCUMENTATION
Machine learning practitioners are becoming members of a large community. Many are learning that others have previously tackled the same kinds of difficult predictive analytics challenges they are facing now. With the right vendor, useful solution ideas show up in documentation. As the Quantitative Researcher at the financial services firm described, “[With] the very detailed online documentation and examples that IBM SPSS Modeler provides, even a novice employee can start using the tool and become productive in a short period of time.”

10. LOOK FOR PROVEN EXPERIENCE AND EXPERTISE IN A VENDOR
Vendor choice, important for success in any IT scenario, is distinctly relevant for the successful adoption of machine learning. With the paucity of experienced data scientists and the subjective, complex nature of the terrain, the right vendor can make a major difference in predictive analytics outcomes. To this point, the Director of Engineering at the logistics company shared, “I chose IBM SPSS because of their experience with the solution, what they brought to bear, and their relationships.” The Analyst at the transportation company added, “The most important criteria when selecting a vendor [is] ease of use. They should be able to handle our unique situation. We have many branches with many moving parts, and also a lot of internal customers.”

Further to the theme of vendor experience, the Business Intelligence Manager at the manufacturing company explained, “What’s most important when selecting a vendor is the proven practice of the product. [It’s useful] knowing that the product has had success for numerous other customers in the past for similar use cases, for similar types of customers. I think knowing that there are a variety of partners out there with expertise in the product is a very strong selling point for me. I don’t like going to things where I can’t get help if I get stuck.”

CONCLUSION
Getting to success with machine learning and predictive analytics requires a mix of people, processes and tools.
IT Central Station members shared their experiences with IBM SPSS Modeler to highlight tips for getting the
most out of an investment in machine learning. Their insights emphasize the benefits of solutions that enable
non-coders and non-data scientists to build models for data science projects and deploy them across the enterprise.
More broadly, they recommend solutions that make it possible for machine learning projects to advance quickly by
streamlining the processes of data preparation and model creation.

According to IT Central Station members, effective machine learning and predictive analytics flow from visual data
science solutions that deploy quickly through the use of GUI-based machine learning algorithms. The goal is to
prototype and iterate quickly. Open source compatibility, especially with R and Python, further accelerates the modeling
process. Integration with a variety of environments, coupled with a multi-cloud approach, facilitates access to data
resources in multiple locations. With an experienced vendor and the right solution, it is possible to achieve the desired
business outcomes and realize strong ROI with data science and machine learning in a business context.

ABOUT IT CENTRAL STATION
User reviews, candid discussions, and more for enterprise technology professionals.

The Internet has completely changed the way we make buying decisions. We now use ratings and review sites to
see what other real users think before we buy electronics, book a hotel, visit a doctor or choose a restaurant. But in
the world of enterprise technology, most of the information online and in your inbox comes from vendors, when what
you really want is objective information from other users. IT Central Station provides technology professionals with
a community platform to share information about enterprise solutions.

IT Central Station is committed to offering user-contributed information that is valuable, objective and relevant. We
validate all reviewers with a triple authentication process, and protect your privacy by providing an environment
where you can post anonymously and freely express your views. As a result, the community becomes a valuable
resource, ensuring you get access to the right information and connect to the right people, whenever you need it.

www.itcentralstation.com

IT Central Station does not endorse or recommend any products or services. The views and opinions of reviewers quoted in
this document, IT Central Station websites, and IT Central Station materials do not reflect the opinions of IT Central Station.

ABOUT IBM SPSS MODELER


IBM SPSS Modeler is a leading visual data science and machine learning solution. It helps enterprises accelerate
time to value and reach desired outcomes by speeding up operational tasks for data scientists. Leading organizations
worldwide rely on IBM for data discovery, predictive analytics, model management and deployment, and machine
learning to monetize data assets. IBM SPSS Modeler empowers organizations to tap data assets and modern
applications with 40+ out-of-the-box algorithms and models, suited for hybrid, multi-cloud environments with a
robust governance and security posture.

IBM SPSS Modeler empowers organizations to:

• Take advantage of open source-based innovation including R or Python
• Empower data scientists of all skill levels, programmatic and visual
• Exploit a hybrid cloud approach: on-prem, public or private clouds
• Start small and scale to enterprise

IBM SPSS Modeler is available by subscription, perpetual license or as part of IBM Data Science Experience. To
learn more, please visit: https://www.ibm.com/products/spss-modeler
