Predictive modeling is a statistical technique that uses machine learning and data mining to analyze historical and current data to generate models that predict future outcomes. Common predictive models include classification, clustering, forecasting, outlier detection, and time series models. Popular predictive algorithms are random forest, generalized linear models, gradient boosted models, K-means clustering, and Prophet for time series forecasting. The goal of predictive modeling is to analyze patterns in historical data to forecast future trends and events.
Download as DOCX, PDF, TXT or read online on Scribd
0 ratings0% found this document useful (0 votes)
74 views
Predictive Modeling Lecture Notes 1
Predictive modeling is a statistical technique that uses machine learning and data mining to analyze historical and current data to generate models that predict future outcomes. Common predictive models include classification, clustering, forecasting, outlier detection, and time series models. Popular predictive algorithms are random forest, generalized linear models, gradient boosted models, K-means clustering, and Prophet for time series forecasting. The goal of predictive modeling is to analyze patterns in historical data to forecast future trends and events.
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 11
lOMoARcPSD|30147487
Predictive Modeling - Lecture notes 1
lOMoARcPSD|30147487
PREDICTIVE MODELING
Predictive modeling is a commonly used statistical technique to
predict future behavior. Predictive modeling solutions are a form of data-mining technology that works by analyzing historical and current data and generating a model to help predict future outcomes. In predictive modeling, data is collected, a statistical model is formulated, predictions are made, and the model is validated (or revised) as additional data becomes available. For example, risk models can be created to combine member information in complex ways with demographic and lifestyle information from external sources to improve underwriting accuracy. Predictive models analyze past performance to assess how likely a customer is to exhibit a specific behavior in the future. This category also encompasses models that seek out subtle data patterns to answer questions about customer performance, such as fraud detection models. Predictive models often perform calculations during live transactions—for example, to evaluate the risk or opportunity of a given customer or transaction to guide a decision. If health insurers could accurately predict secular trends (for example, utilization), premiums would be set appropriately, profit targets would be met with more consistency, and health insurers would be more competitive in the marketplace.
Predictive modeling is a method of predicting future outcomes by
using data modeling. It’s one of the premier ways a business can see its path forward and make plans accordingly. While not foolproof, this method tends to have high accuracy rates, which is why it is so commonly used.
Predictive modelling uses statistics to predict outcomes. Most often
the event one wants to predict is in the future, but predictive modelling can be applied to any type of unknown event, regardless of when it occurred. For example, predictive models are often used to detect crimes and identify suspects, after the crime has taken place. lOMoARcPSD|30147487
In many cases the model is chosen on the basis of detection theory to
try to guess the probability of an outcome given a set amount of input data, for example given an email determining how likely that it is spam. Models can use one or more classifiers in trying to determine the probability of a set of data belonging to another set. For example, a model might be used to determine whether an email is spam or "ham" (non-spam). Depending on definitional boundaries, predictive modelling is synonymous with, or largely overlapping with, the field of machine learning, as it is more commonly referred to in academic or research and development contexts. When deployed commercially, predictive modelling is often referred to as predictive analytics. Predictive modelling is often contrasted with causal modelling/analysis. In the former, one may be entirely satisfied to make use of indicators of, or proxies for, the outcome of interest. In the latter, one seeks to determine true cause-and-effect relationships. This distinction has given rise to a burgeoning literature in the fields of research methods and statistics and to the common statement that "correlation does not imply causation".
What Is Predictive Modeling?
In short, predictive modeling is a statistical technique using machine learning and data mining to predict and forecast likely future outcomes with the aid of historical and existing data. It works by analyzing current and historical data and projecting what it learns on a model generated to forecast likely outcomes. Predictive modeling can be used to predict just about anything, from TV ratings and a customer’s next purchase to credit risks and corporate earnings.
A predictive model is not fixed; it is validated or revised regularly to
incorporate changes in the underlying data. In other words, it’s not a one-and-done prediction. Predictive models make assumptions based on what has happened in the past and what is happening now. If incoming, new data shows changes in what is happening now, the impact on the likely future outcome must be recalculated, too. For lOMoARcPSD|30147487
example, a software company could model historical sales data
against marketing expenditures across multiple regions to create a model for future revenue based on the impact of the marketing spend.
Most predictive models work fast and often complete their
calculations in real time. That’s why banks and retailers can, for example, calculate the risk of an online mortgage or credit card application and accept or decline the request almost instantly based on that prediction.
Some predictive models are more complex, such as those used
in computational biology and quantum computing; the resulting outputs take longer to compute than a credit card application but are done much more quickly than was possible in the past thanks to advances in technological capabilities, including computing power.
Top 5 Types of Predictive Models
Fortunately, predictive models don’t have to be created from scratch
for every application. Predictive analytics tools use a variety of vetted models and algorithms that can be applied to a wide spread of use cases.
Predictive modeling techniques have been perfected over time. As we
add more data, more muscular computing, AI and machine learning and see overall advancements in analytics, we’re able to do more with these models.
The top five predictive analytics models are:
1. Classification model: Considered the simplest model, it
categorizes data for simple and direct query response. An example use case would be to answer the question “Is this a fraudulent transaction?” 2. Clustering model: This model nests data together by common attributes. It works by grouping things or people with shared characteristics or behaviors and plans strategies for each group lOMoARcPSD|30147487
at a larger scale. An example is in determining credit risk for a
loan applicant based on what other people in the same or a similar situation did in the past. 3. Forecast model: This is a very popular model, and it works on anything with a numerical value based on learning from historical data. For example, in answering how much lettuce a restaurant should order next week or how many calls a customer support agent should be able to handle per day or week, the system looks back to historical data. 4. Outliers model: This model works by analyzing abnormal or outlying data points. For example, a bank might use an outlier model to identify fraud by asking whether a transaction is outside of the customer’s normal buying habits or whether an expense in a given category is normal or not. For example, a $1,000 credit card charge for a washer and dryer in the cardholder’s preferred big box store would not be alarming, but $1,000 spent on designer clothing in a location where the customer has never charged other items might be indicative of a breached account. 5. Time series model: This model evaluates a sequence of data points based on time. For example, the number of stroke patients admitted to the hospital in the last four months is used to predict how many patients the hospital might expect to admit next week, next month or the rest of the year. A single metric measured and compared over time is thus more meaningful than a simple average.
Common Predictive Algorithms
Predictive algorithms use one of two things: machine learning or deep
learning. Both are subsets of artificial intelligence (AI). Machine learning (ML) involves structured data, such as spreadsheet or machine data. Deep learning (DL) deals with unstructured data such as video, audio, text, social media posts and images—essentially the stuff that humans communicate with that are not numbers or metric reads. lOMoARcPSD|30147487
Some of the more common predictive algorithms are:
1. Random Forest: This algorithm is derived from a combination
of decision trees, none of which are related, and can use both classification and regression to classify vast amounts of data. 2. Generalized Linear Model (GLM) for Two Values: This algorithm narrows down the list of variables to find “best fit.” It can work out tipping points and change data capture and other influences, such as categorical predictors, to determine the “best fit” outcome, thereby overcoming drawbacks in other models, such as a regular linear regression. 3. Gradient Boosted Model: This algorithm also uses several combined decision trees, but unlike Random Forest, the trees are related. It builds out one tree at a time, thus enabling the next tree to correct flaws in the previous tree. It’s often used in rankings, such as on search engine outputs. 4. K-Means: A popular and fast algorithm, K-Means groups data points by similarities and so is often used for the clustering model. It can quickly render things like personalized retail offers to individuals within a huge group, such as a million or more customers with a similar liking of lined red wool coats. 5. Prophet: This algorithm is used in time-series or forecast models for capacity planning, such as for inventory needs, sales quotas and resource allocations. It is highly flexible and can easily accommodate heuristics and an array of useful assumptions.
Predictive modeling is a technique that uses mathematical and
computational methods to predict an event or outcome. A mathematical approach uses an equation-based model that describes the phenomenon under consideration. The model is used to forecast an outcome at some future state or time based upon changes to the model inputs. The model parameters help explain how model inputs influence the outcome. Examples include time-series lOMoARcPSD|30147487
regression models for predicting airline traffic volume or predicting
fuel efficiency based on a linear regression model of engine speed versus load. The computational predictive modeling approach differs from the mathematical approach because it relies on models that are not easy to explain in equation form and often require simulation techniques to create a prediction. This approach is often called “black box” predictive modeling because the model structure does not provide insight into the factors that map model input to outcome. Examples include using neural networks to predict which winery a glass of wine originated from or bagged decision trees for predicting the credit rating of a borrower.
Predictive modeling is often performed using curve and surface
fitting, time series regression, or machine learning approaches. Regardless of the approach used, the process of creating a predictive model is the same across methods. The steps are:
1. Clean the data by removing outliers and treating missing data
2. Identify a parametric or nonparametric predictive modeling approach to use 3. Preprocess the data into a form suitable for the chosen modeling algorithm 4. Specify a subset of the data to be used for training the model 5. Train, or estimate, model parameters from the training data set 6. Conduct model performance or goodness-of-fit tests to check model adequacy 7. Validate predictive modeling accuracy on data not used for calibrating the model 8. Use the model for prediction if satisfied with its performance lOMoARcPSD|30147487
Predictive Modeling and Data Analytics
Predictive modeling is also known as predictive analytics. Generally,
the term “predictive modeling” is favored in academic settings, while “predictive analytics” is the preferred term for commercial applications of predictive modeling.
Successful use of predictive analytics depends heavily on unfettered
access to sufficient volumes of accurate, clean and relevant data. While predictive models can be extraordinarily complex, such as those using decision trees and k-means clustering, the most complex part is always the neural network; that is, the model by which computers are trained to predict outcomes. Machine learning uses a neural network to find correlations in exceptionally large data sets and “to learn” and identify patterns within the data.
Benefits of Predictive Modeling
In a nutshell, predictive analytics reduce time, effort and costs in
forecasting business outcomes. Variables such as environmental factors, competitive intelligence, regulation changes and market conditions can be factored into the mathematical calculation to render more complete views at relatively low costs.
Examples of specific types of forecasting that can benefit businesses
include demand forecasting, headcount planning, churn analysis, external factors, competitive analysis, fleet and IT hardware maintenance and financial risks.
Challenges of Predictive Modeling
It’s essential to keep predictive analytics focused on producing useful
business insights because not everything this technology digs up is useful. Some mined information is of value only in satisfying a lOMoARcPSD|30147487
curious mind and has few or no business implications. Getting side-
tracked is a distraction few businesses can afford.
Also, being able to use more data in predictive modeling is an
advantage only to a point. Too much data can skew the calculation and lead to a meaningless or an erroneous outcome. For example, more coats are sold as the outside temperature drops. But only to a point. People do not buy more coats when it’s -20 degrees Fahrenheit outside than they do when it’s -5 degrees below freezing. At a certain point, cold is cold enough to spur the purchase of coats and more frigid temps no longer appreciably change that pattern.
And with the massive volumes of data involved in predictive
modeling, maintaining security and privacy will also be a challenge. Further challenges rest in machine learning’s limitations.
Limitations of Predictive Modeling
According to a McKinsey report, common limitations and their “best
fixes” include:
1. Errors in data labeling: These can be overcome
with reinforcement learning or generative adversarial networks (GANs). 2. Shortage of massive data sets needed to train machine learning: Apossible fix is “one-shot learning,” wherein a machine learns from a small number of demonstrations rather than on a massive data set. 3. The machine’s inability to explain what and why it did what it did: Machines do not “think” or “learn” like humans. Likewise, their computations can be so exceptionally complex that humans have trouble finding, let alone following, the logic. All this makes it difficult for a machine to explain its work, or for humans to do so. Yet model transparency is necessary for a number of reasons, with human safety chief among them. lOMoARcPSD|30147487
explanations (LIME) and attention techniques. 4. Generalizability of learning, or rather lack thereof: Unlike humans, machines have difficulty carrying what they’ve learned forward. In other words, they have trouble applying what they’ve learned to a new set of circumstances. Whatever it has learned is applicable to one use case only. This is largely why we need not worry about the rise of AI overlords anytime soon. For predictive modeling using machine learning to be reusable—that is, useful in more than one use case—a possible fix is transfer learning. 5. Bias in data and algorithms: Non-representation can skew outcomes and lead to mistreatment of large groups of humans. Further, baked-in biases are difficult to find and purge later. In other words, biases tend to self-perpetuate. This is a moving target, and no clear fix has yet been identified.
The Future of Predictive Modeling
Predictive modeling, also known as predictive analytics, and machine learning are still young and developing technologies, meaning there is much more to come. As techniques, methods, tools and technologies improve, so will the benefits to businesses and societies. However, these are not technologies that businesses can afford to adopt later, after the tech reaches maturity and all the kinks are worked out. The near-term advantages are simply too strong for a late adopter to overcome and remain competitive. Our advice: Understand and deploy the technology now and then grow the business benefits alongside subsequent advances in the technologies. Predictive Modeling in Platforms For all but the largest companies, reaping the benefits of predictive analytics is most easily achieved by using ERP systems that have the technologies built-in and contain pretrained machine learning. For example, planning, forecasting and budgeting features may provide a statistical model engine to lOMoARcPSD|30147487
rapidly model multiple scenarios that deal with changing market
conditions. As another example, a supply planning or supply capacity function can similarly predict potentially late deliveries, purchase or sales orders and other risks or impacts. Alternate suppliers can also be represented on the dashboard to enable companies to pivot to meet manufacturing or distribution requirements. Financial modeling and planning and budgeting are key areas to reap the many benefits of using these advanced technologies without overwhelming your team.