PREDICTIVE MODELLING –
LINEAR REGRESSION
SHOWTIME – OTT SERVICES
APRIL 20, 2025
PGPDSBA
Sham Solanki
Contents
Problem Statement
Business Context
Objective
Data Description
Data Overview
Displaying the first 5 rows of dataset
Displaying the shape of the dataset
Displaying the Datatypes of the Columns
Displaying the Statistical Summary
Checking the missing values
Exploratory Data Analysis
Univariate Analysis
Bivariate Analysis
KEY QUESTIONS
DATA PROCESSING
1. Duplicates and Missing values
2. Feature Engineering
3. Outlier Detection and Treatment
4. Data Preparation for Modelling
Model Building
Mean Squared Error (MSE): 0.0025
Interpretation
Coefficients
Testing Assumptions Of Linear Regression Model
Model Performance Evaluation Based On Different Metrics
Actionable Insights & Business Recommendation
Overall Assessment
Key Takeaways For Business
Problem Statement
Business Context
An over-the-top (OTT) media service is a media service offered directly to viewers via
the internet. The term is most synonymous with subscription-based video-on-demand
services that offer access to film and television content, including existing series
acquired from other producers, as well as original content produced specifically for the
service. They are typically accessed via websites on personal computers, apps on
smartphones and tablets, or televisions with integrated Smart TV platforms.
Presently, OTT services are at a relatively nascent stage and are widely accepted as a
trending technology across the globe. With customers' social behavior increasingly
shifting from traditional broadcast subscriptions to OTT on-demand video and music
subscriptions every year, OTT streaming is expected to grow at a very fast pace. The
global OTT market size was valued at $121.61 billion in
2019 and is projected to reach $1,039.03 billion by 2027, growing at a CAGR of 29.4%
from 2020 to 2027. The shift from television to OTT services for entertainment is driven
by benefits such as on-demand services, ease of access, and access to better networks
and digital connectivity.
With the outbreak of COVID-19, OTT services are striving to meet the growing
entertainment appetite of viewers, with some platforms already experiencing a 46%
increase in consumption and subscriber count as viewers seek fresh content. With
innovations and advanced transformations, which will enable the customers to access
everything they want in a single space, OTT platforms across the world are expected to
increasingly attract subscribers on a concurrent basis.
Objective
ShowTime is an OTT service provider and offers a wide variety of content (movies, web
shows, etc.) for its users. They want to determine the driver variables for first-day
content viewership so that they can take necessary measures to improve the viewership
of the content on their platform. Some of the reasons for the decline in viewership of
content would be the decline in the number of people coming to the platform, decreased
marketing spend, content timing clashes, weekends and holidays, etc. They have hired
you as a Data Scientist, shared the data of the current content in their platform, and
asked you to analyze the data and come up with a linear regression model to determine
the driving factors for first-day viewership.
Data Description
1. visitors: Average number of visitors, in millions, to the platform in the past week
2. ad_impressions: Number of ad impressions, in millions, across all ad campaigns
for the content (running and completed)
3. major_sports_event: Any major sports event on the day
4. genre: Genre of the content
5. dayofweek: Day of the release of the content
6. season: Season of the release of the content
7. views_trailer: Number of views, in millions, of the content trailer
8. views_content: Number of first-day views, in millions, of the content
Data Overview
The initial steps to get an overview of any dataset are to:
• observe the first few rows of the dataset, to check whether the dataset has been
loaded properly or not
• get information about the number of rows and columns in the dataset
• find out the data types of the columns to ensure that data is stored in the
preferred format and the value of each property is as expected.
• check the statistical summary of the dataset to get an overview of the numerical
columns of the data
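These checks can be sketched in pandas. The DataFrame below is a small synthetic stand-in for the real ShowTime dataset (which the report loads from a file); the column names follow the data description above:

```python
import pandas as pd

# Small stand-in for the real dataset (the report loads it via pd.read_csv)
df = pd.DataFrame({
    "visitors": [1.67, 1.46, 1.55],
    "genre": ["Horror", "Thriller", "Thriller"],
    "views_content": [0.51, 0.32, 0.39],
})

print(df.head())          # first rows - sanity-check the load
print(df.shape)           # (rows, columns)
df.info()                 # column dtypes and non-null counts
print(df.describe())      # statistical summary of the numeric columns
print(df.isnull().sum())  # missing values per column
```

The same five calls, run on the full dataset, produce the outputs shown in the subsections that follow.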
Displaying the first 5 rows of dataset
Displaying the shape of the dataset
The data has 1000 rows and 8 columns.
(1000, 8)
Displaying the Datatypes of the Columns
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 visitors 1000 non-null float64
1 ad_impressions 1000 non-null float64
2 major_sports_event 1000 non-null int64
3 genre 1000 non-null object
4 dayofweek 1000 non-null object
5 season 1000 non-null object
6 views_trailer 1000 non-null float64
7 views_content 1000 non-null float64
dtypes: float64(4), int64(1), object(3)
memory usage: 62.6+ KB
Displaying the Statistical Summary
Checking the missing values
visitors 0
ad_impressions 0
major_sports_event 0
genre 0
dayofweek 0
season 0
views_trailer 0
views_content 0
dtype: int64
Exploratory Data Analysis
Univariate Analysis
Visitors
Observation
• Shape: Approximately normal distribution with a slight right skew
• Range: Primarily between 1.2 and 2.2 (likely in millions)
• Peak: Highest frequency around 1.7-1.8
Key Observations:
• Most content receives between 1.6-1.8 million visitors
• There's a gradual decline in frequency for both very low (<1.4) and very high (>2.0) visitor counts
• The relatively symmetrical distribution suggests stable platform traffic patterns
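Distributions like this are typically inspected with a histogram; a minimal matplotlib sketch, using synthetic visitor counts as a stand-in for the real column:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
# Synthetic stand-in: roughly normal visitor counts centred near 1.7 million
visitors = rng.normal(loc=1.7, scale=0.15, size=1000)

counts, edges = np.histogram(visitors, bins=20)  # bin counts for inspection
plt.hist(visitors, bins=20, edgecolor="black")
plt.xlabel("Visitors (millions)")
plt.ylabel("Frequency")
plt.title("Distribution of weekly visitors")
plt.savefig("visitors_hist.png")
```

The same call, pointed at each numeric column in turn, produces the univariate plots discussed in this section.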
Ad Impressions
Observation
• Shape: Bimodal distribution (two peaks)
• Range: 1000-2400 impressions
• Primary Peak: Around 1300-1400 impressions
• Secondary Peak: Around 1600-1700 impressions
Key Observations:
• The bimodal nature suggests two distinct advertising strategies or campaigns
• The majority of content receives between 1200-1800 ad impressions
• A long right tail indicates some content getting significantly higher advertising exposure
• This could indicate targeted marketing for certain content types
Views Trailers
Observation
• Shape: Highly right-skewed with a sharp peak
• Range: 25-200 views
• Peak: Concentrated around 50 views
Key Observations:
• Most trailers receive around 50 views
• Very few trailers receive more than 75 views
• The sharp peak suggests consistency in trailer viewing behavior
• A long right tail indicates a few highly successful trailer campaigns
Views Content
Observation
• Shape: Slightly right-skewed with a clear central tendency
• Range: 0.2-0.9 (likely in millions)
• Peak: Around 0.4-0.5
Key Observations:
• Most content receives between 0.4-0.5 million views
• More symmetric than trailer views, suggesting more predictable content viewing patterns
• A gradual decline in higher view counts indicates natural viewing limits
Genres
Observations
Genre Distribution Pattern:
• The distribution is multimodal, with several peaks across different genres
• The "Others" category shows the highest frequency, with around 250 entries
• Most other genres have frequencies between 100-125 entries
Genre-specific Frequencies:
• The "Others" category is significantly overrepresented compared to specific genres
• Horror, Thriller, and Sci-Fi show similar frequencies (around 100 entries each)
• Comedy and Romance also show comparable frequencies to each other
• Drama and Action have slightly lower frequencies than other specific genres
Category Groupings:
• Thriller and Sci-Fi show similar frequencies, suggesting comparable popularity
• Comedy and Romance have nearly identical frequencies, possibly indicating viewer overlap
• Horror maintains a moderate presence in the collection
Balance Analysis:
• Excluding the "Others" category, there's a relatively even distribution across genres
• The specific genres show a balanced representation in the content library
• No specific genre (excluding "Others") dominates or is significantly underrepresented
Bivariate Analysis
Correlation Heatmap
Observation
1. Strong Positive Correlation:
• The strongest correlation (0.75) exists between views_trailer and views_content
• This suggests that people who watch trailers are likely to watch the actual content
• The relationship makes intuitive sense, as interest in trailers often translates to content viewing
2. Weak Correlations:
• Visitors and ad_impressions show almost no correlation (0.03)
• Ad_impressions has very weak correlations with all other metrics: only 0.01 with views_trailer and 0.05 with views_content
• This suggests ad impressions operate somewhat independently of the other metrics
3. Moderate Correlations:
• Visitors has a weak positive correlation (0.26) with views_content, indicating that higher visitor numbers somewhat translate to more content views, but the relationship isn't very strong
• Visitors has a slight negative correlation (-0.03) with views_trailer
4. Pattern Insights:
• The effectiveness of ads seems questionable given their weak correlations with viewing metrics
• Trailer views are a much better predictor of content viewing than visitor numbers or ad impressions
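A correlation matrix like the one discussed here comes straight from pandas; a sketch with synthetic stand-in columns, where views_content is deliberately built to track views_trailer:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
trailer = rng.normal(56, 10, 500)
df = pd.DataFrame({
    "visitors": rng.normal(1.7, 0.15, 500),
    "ad_impressions": rng.normal(1400, 300, 500),
    "views_trailer": trailer,
    # Content views constructed to correlate strongly with trailer views
    "views_content": 0.005 * trailer + rng.normal(0.2, 0.03, 500),
})

corr = df.corr()  # pairwise Pearson correlations
print(corr.round(2))
```

On this synthetic data, views_trailer vs views_content dominates while the independent columns stay near zero, mirroring the pattern the report found; the report's heatmap is typically drawn by passing a matrix like `corr` to a plotting library.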
1. Visitors vs All Other Variables
Visitors vs Genre
Observation
Distribution Patterns:
• Action shows the highest median visitors (~1.75)
• Romance and Comedy follow closely
• Horror, Thriller, and Sci-Fi show similar visitor patterns
• The 'Others' category shows the highest variability
Key Insights:
• Action content drives more platform traffic
• Traditional genres (Horror, Thriller) show consistent but lower visitor numbers
• All genres maintain baseline visitors between 1.4-1.8 million
Outlier Treatment
Visitors vs Day of Week
Outlier Treatment
Observation
Temporal Patterns:
• Saturday shows the highest median visitors
• Weekdays (Tuesday-Thursday) show relatively consistent patterns
• Weekend days (Friday-Sunday) show slightly higher visitor numbers
• Monday shows moderate visitor numbers
Before vs After Comparison:
• More consistent patterns in the "After" period
• Reduced outliers in recent data
• Slightly higher median values across most days
• Weekend preference remains consistent
Visitors vs Season
Outlier Treatment
Observation
From the original seasonal distribution (Image 1):
Seasonal Patterns:
• All seasons show similar median visitor numbers (around 1.7)
• The interquartile ranges (box sizes) are relatively consistent across seasons
• Each season has some outliers at higher visitor numbers (around 2.2)
• The distribution appears fairly symmetric for all seasons
Looking at the Before/After outlier treatment (Image 2):
Changes After Treatment:
• The extreme outliers at the top (around 2.3-2.4) have been removed
• The overall shape and central tendency of the distributions remain largely unchanged
• The whiskers (vertical lines extending from the boxes) are now more uniform across seasons
• The boxes (interquartile ranges) maintain their original positions
Impact of Treatment:
• The outlier treatment has preserved the underlying seasonal patterns
• The data is now more consistent across seasons
• The treatment appears to have been conservative, maintaining the natural variation while removing only extreme values
• Winter and Summer show slightly higher medians after treatment, but the difference is minimal
Overall, this suggests that:
• Visitor numbers are relatively stable across seasons, with no strong seasonal effect
• The outlier treatment successfully removed extreme values while preserving the natural seasonal patterns
• The data cleaning has made the distributions more comparable across seasons without losing important trends
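A conservative treatment like the one described is commonly done by capping values at the boxplot whisker bounds. A sketch on synthetic data; the 1.5×IQR rule here is an assumption about the method used, not something the report states explicitly:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Synthetic visitor counts with a few extreme values appended
visitors = pd.Series(
    np.append(rng.normal(1.7, 0.15, 995), [2.6, 2.5, 2.45, 2.4, 0.8])
)

q1, q3 = visitors.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Cap (winsorize) rather than drop, so the row count is preserved
treated = visitors.clip(lower=lower, upper=upper)
print(f"bounds: [{lower:.2f}, {upper:.2f}], values capped: {(visitors != treated).sum()}")
```

Capping keeps every observation while pulling the extremes to the whisker bounds, which matches the observed behavior: medians and boxes unchanged, only the most extreme points affected.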
Visitors vs Views Content
Observation
Relationship Pattern:
• There's a positive correlation between visitors and content views
• The relationship appears to be somewhat linear, but with considerable scatter
• The density of points is highest in the middle range (around 0.4-0.6 content views)
Distribution Features:
• Most data points cluster between 1.4 and 2.0 visitors
• Content views mostly fall between 0.3 and 0.7
• There are some outlier points, particularly in the upper right quadrant
• The scatter becomes more dispersed as content views increase
Notable Insights:
• Higher content views generally correspond to higher visitor numbers
• There's significant variation in visitor numbers for any given level of content views
• The relationship seems to have more variability at higher content view levels
2. Views Content vs Other Variables
Views Content vs Day of Week
Observation
Daily Patterns:
• Wednesday shows the highest median content views and the largest interquartile range
• Friday appears to have the lowest median content views
• The distribution is relatively consistent across the other days
• All days show several outliers (dots) above the whiskers, indicating some unusually high viewing periods
Distribution Characteristics:
• Each day shows a similar interquartile range (box size), suggesting consistent variability
• The medians (horizontal lines in the boxes) are fairly stable across days, mostly between 0.4 and 0.5
• There's a slight positive skew across all days, shown by the higher position of outliers
Outlier Treatment
Views Content vs Season
Observation
• The median content views are fairly consistent across seasons, hovering around 0.45-0.5
• Winter appears to have a slightly higher median and larger spread
• There are several outliers in all seasons, particularly in Summer, with content views reaching up to 0.8-0.9
• Summer and Winter show slightly more variability in their distributions compared to Spring and Fall
Views Content vs Views Trailer
Observation
• There's a clear positive correlation between trailer views and content views
• The relationship appears to be non-linear, with the correlation becoming stronger as trailer views increase
• There's a dense cluster of data points around 50 trailer views, suggesting this might be a common baseline
• The spread of content views increases with higher trailer views, indicating more variability in content performance for videos with popular trailers
• The maximum content views (around 0.9) are associated with higher trailer views (in the 150-200 range)
Views Content vs Genre
Observation
• The median content views are relatively consistent across genres, typically between 0.4-0.5
• Romance and Comedy show slightly higher variability in their distributions
• The Sci-Fi and Thriller genres have several high-performing outliers (reaching 0.8+ views)
• The "Others" category shows the most compact distribution, suggesting more consistent but moderate performance
3. Ad Impressions
Ad Impressions vs Views Trailer
Observation
• There's a clear concentration of trailer views around the 50-view mark across all ad impression levels
• There's a scattered distribution of higher trailer views (75-200) that doesn't show a strong correlation with ad impressions
• Ad impressions range primarily from 1000 to 2400, with the densest cluster between 1000-1800
• The relationship doesn't appear to be strongly linear: increasing ad impressions doesn't necessarily lead to proportionally higher trailer views
Ad Impressions vs Views Content
Observation
• Content views are fairly dispersed between 0.3 and 0.8, regardless of ad impression count
• There's no clear correlation between ad impressions and content views
• The highest content views (0.8-0.9) appear across various ad impression levels
• The density of points is highest in the 0.4-0.6 content views range
KEY QUESTIONS
1. What does the distribution of the content views look like?
The distribution is approximately normal with a slight right skew, and the mean values lie between 0.4 and 0.5 million.
2. What does the distribution of genres look like?
The "Others" category has the highest content release, followed by the Comedy and Thriller genres.
3. The day of the week on which content is released generally plays a key role in the viewership. How does the viewership vary with the day of release?
Wednesday and Saturday have the highest content release. Friday has the least viewership, while the other days show similar average views on release.
4. How does the viewership vary with the season of release?
Season has little impact on the viewership and release of content. Summer has the highest viewership, followed by Winter.
5. What is the correlation between trailer views and content views?
There is a positive relationship between trailer views and content views: as trailer views increase, content views increase correspondingly.
DATA PROCESSING
1. Duplicates and Missing values
There are no duplicate or missing values.
2. Feature Engineering
1. Data Copy:
Making a copy of the input data to keep the original dataset unchanged.
2. Categorical Encoding:
Converting the genre column into multiple binary columns (one-hot encoding)
and turning the dayofweek and season columns into numeric codes. This
makes all categorical data usable for modeling.
3. Cyclical Feature Creation:
To capture repeating patterns, transform the numeric day-of-week and season
values using sine and cosine functions, allowing the model to recognize cycles in
the data.
4. Numerical Transformations:
Applying a logarithmic transformation to features like ad impressions, content
views, and trailer views. This reduces the effect of extreme values and helps
normalize their distributions.
5. Interaction Features:
Creating new features by multiplying content views with both the day-of-week
code and the genre code. These capture how the effect of content views might
change depending on the day or genre.
6. Derived Features:
Calculate engagement per visitor (content views divided by visitors) and the
difference between content views and trailer views, providing more insight into
user behavior.
7. Merging Encoded Features:
The one-hot encoded genre columns are added back to the main dataset,
ensuring all relevant information is included.
8. Feature Scaling:
Standardize all important numerical features so they have a mean of zero and a
standard deviation of one, which helps many machine learning models perform
better.
9. Output:
The function returns the processed dataset, now containing a rich set of
engineered features that are ready for use in predictive modeling.
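A condensed sketch of several of the steps above (categorical encoding, cyclical features, log transforms, the trailer-to-content feature, and merging the one-hot columns back). The data values are synthetic stand-ins; the column names follow the report's processed-feature list:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "genre": ["Horror", "Thriller", "Sci-Fi"],
    "dayofweek": ["Wednesday", "Friday", "Sunday"],
    "ad_impressions": [1200.0, 1500.0, 1800.0],
    "views_trailer": [56.0, 52.0, 49.0],
    "views_content": [0.51, 0.32, 0.39],
})
fe = df.copy()  # keep the original dataset unchanged

# Categorical encoding: numeric day codes plus one-hot genre columns
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
fe["dayofweek_num"] = fe["dayofweek"].map({d: i for i, d in enumerate(days)})
dummies = pd.get_dummies(fe["genre"], prefix="genre")

# Cyclical features so Sunday (6) sits next to Monday (0) in feature space
fe["dayofweek_sin"] = np.sin(2 * np.pi * fe["dayofweek_num"] / 7)
fe["dayofweek_cos"] = np.cos(2 * np.pi * fe["dayofweek_num"] / 7)

# Log transforms to damp the effect of extreme values
for col in ["ad_impressions", "views_trailer", "views_content"]:
    fe[f"log_{col}"] = np.log1p(fe[col])

# Derived feature: difference between content and trailer interest
fe["conversion_trailer_to_content"] = fe["views_content"] - fe["views_trailer"]

# Merge the one-hot encoded genre columns back into the main dataset
fe = pd.concat([fe, dummies], axis=1)
print(fe.columns.tolist())
```

The season features, interaction terms, engagement-per-visitor feature, and standard scaling follow the same pattern and are omitted here for brevity.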
Processed features:
['visitors', 'ad_impressions', 'major_sports_event', 'genre', 'dayofweek',
'season', 'views_trailer', 'views_content', 'dayofweek_num', 'season_num',
'dayofweek_sin', 'dayofweek_cos', 'season_sin', 'season_cos',
'log_ad_impressions', 'log_views_content', 'log_views_trailer',
'interaction_views_day', 'interaction_views_genre', 'engagement_per_visitor',
'conversion_trailer_to_content', 'genre_Comedy', 'genre_Drama', 'genre_Horror',
'genre_Others', 'genre_Romance', 'genre_Sci-Fi', 'genre_Thriller']
Sample of processed data:
visitors ad_impressions major_sports_event genre dayofweek season \
0 -0.147893 -1.108892 0 Horror Wednesday Spring
1 -1.053623 0.220110 1 Thriller Friday Fall
2 -1.010493 -1.228523 1 Thriller Wednesday Fall
3 0.628448 -0.317711 1 Sci-Fi Friday Fall
4 -1.053623 0.220110 0 Sci-Fi Sunday Winter
views_trailer views_content dayofweek_num season_num ... \
0 -0.292011 0.345735 6 1 ...
1 -0.406636 -1.449064 0 0 ...
2 -0.519546 -0.787822 6 0 ...
3 -0.488961 -0.315507 0 0 ...
4 -0.316880 -0.126581 3 3 ...
interaction_views_genre engagement_per_visitor \
0 -0.146168 0.370693
1 0.489149 -0.929328
2 0.927607 -0.233740
3 0.847074 -0.647948
4 0.954451 0.516651
conversion_trailer_to_content genre_Comedy genre_Drama genre_Horror \
0 0.293727 False False True
1 0.403170 False False False
2 0.518344 False False False
3 0.489121 False False False
4 0.317220 False False False
genre_Others genre_Romance genre_Sci-Fi genre_Thriller
0 False False False False
1 False False False True
2 False False False True
3 False False True False
4 False False True False
[5 rows x 28 columns]
Notable Insights:
• The data suggests different viewing behaviors across different days and seasons
• There appear to be varying levels of engagement between trailers and content
• The conversion rates from trailer to content views vary significantly
• The genre distribution shows a mix of different content types
3. Outlier Detection and Treatment
We have carried out some outlier treatment earlier and further
treatment is unnecessary at this point.
4. Data Preparation for Modelling
• The target variable is views_content
• The categorical features are major_sports_event, genre, season, and dayofweek
Model Building
Model Performance:
Mean Squared Error: 0.0025
R^2 Score: 0.7743
Model Coefficients:
Feature Coefficient
0 visitors 0.128909
12 dayofweek_Saturday 0.052561
16 dayofweek_Wednesday 0.049532
11 dayofweek_Monday 0.045065
18 season_Summer 0.044605
13 dayofweek_Sunday 0.038818
15 dayofweek_Tuesday 0.032412
19 season_Winter 0.026532
17 season_Spring 0.023201
14 dayofweek_Thursday 0.019637
10 genre_Thriller 0.011518
5 genre_Drama 0.010636
9 genre_Sci-Fi 0.010008
6 genre_Horror 0.009434
7 genre_Others 0.004984
4 genre_Comedy 0.004389
3 views_trailer 0.002311
1 ad_impressions 0.000008
8 genre_Romance -0.001385
2 major_sports_event -0.059559
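Metrics of this kind come from fitting an ordinary least-squares model and scoring held-out data; a scikit-learn sketch on synthetic two-feature data (the report itself fits on the full set of processed ShowTime features, so the numbers differ):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(7)
n = 1000
visitors = rng.normal(1.7, 0.15, n)
trailer = rng.normal(56, 10, n)
X = np.column_stack([visitors, trailer])
# Synthetic target with a known linear signal plus noise
y = 0.13 * visitors + 0.005 * trailer + rng.normal(0, 0.05, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)
model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

mse = mean_squared_error(y_test, pred)
r2 = r2_score(y_test, pred)
print(f"MSE: {mse:.4f}, R^2: {r2:.4f}")
print("Coefficients:", model.coef_)
```

The fitted `model.coef_` values play the role of the coefficient table above: each entry is the weight the model assigns to one feature.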
Mean Squared Error (MSE): 0.0025
Interpretation: This indicates the average squared difference between predicted and
actual values. A lower MSE reflects better model performance.
Comment: While the MSE is low, its meaningfulness depends on the scale of the target
variable (views_content). Since views_content takes small values (roughly between 0.2
and 0.9 million), this MSE is acceptable.
R² Score: 0.7743
Interpretation:
The R² score explains how much of the variance in the target variable is
explained by the model. Here, approximately 77.43% of the variation in
views_content is explained by the independent variables.
Comment: An R² of 0.77 is generally considered a good fit for linear regression.
However, it also suggests there is still 22.57% of unexplained variance, potentially
due to missing features, non-linear relationships, or noise in the data.
Coefficients
The coefficients indicate the contribution (weight) of each independent variable to the
prediction of the target (views_content). Key observations:
i. Positive Impact: visitors has the highest positive coefficient (0.1289), suggesting it is
the most significant predictor of views_content. Several dayofweek and season variables
(e.g., dayofweek_Saturday, season_Summer) also positively impact the target.
ii. Negative Impact: major_sports_event has the most significant negative impact
(-0.0596), indicating that the presence of major sports events might reduce
views_content. genre_Romance has a slight negative impact, potentially due to lower
engagement for this genre compared to others.
iii. Minimal Impact: Variables like ad_impressions and views_trailer have coefficients
close to zero, suggesting they might have negligible influence on views_content. These
might need further exploration or even exclusion.
Testing Assumptions Of Linear Regression Model
Shapiro-Wilk Test: Test Statistic = 0.9979, p-value = 0.4379
Residuals appear to be normally distributed.
Variance Inflation Factor (VIF):
Feature VIF
0 visitors 26.065912
1 ad_impressions 19.731307
2 major_sports_event 1.742907
3 views_trailer 4.511708
4 genre_Comedy 1.939941
5 genre_Drama 2.068384
6 genre_Horror 2.037733
7 genre_Others 3.261380
8 genre_Romance 2.038509
9 genre_Sci-Fi 1.937147
10 genre_Thriller 2.010787
11 dayofweek_Monday 1.076586
12 dayofweek_Saturday 1.228074
13 dayofweek_Sunday 1.202715
14 dayofweek_Thursday 1.281478
15 dayofweek_Tuesday 1.084141
16 dayofweek_Wednesday 1.897971
17 season_Spring 2.011645
18 season_Summer 2.001957
19 season_Winter 2.088696
Observations
1. Linearity (Residuals vs Fitted Plot):
The residuals plot shows points scattered around the horizontal zero line (red dashed
line), with no clear pattern in the residuals. This suggests the linearity assumption is
met.
2. Normality of Residuals (Q-Q Plot and Shapiro-Wilk Test):
The Q-Q plot shows points following the diagonal red line quite closely. The Shapiro-Wilk
test gives statistic = 0.9979, p-value = 0.4379. Since the p-value > 0.05, we fail to reject
the null hypothesis that the residuals are normally distributed. Both the visual and
statistical tests strongly support the normality assumption.
3. Multicollinearity (VIF values):
Most variables show acceptable VIF values (< 5), except visitors (VIF = 26.07) and
ad_impressions (VIF = 19.73), which show very high multicollinearity. This is a concern,
as it could affect coefficient stability and interpretation. Consider removing one of
these variables or combining them into a single metric.
4. Homoscedasticity (Residuals vs Fitted Plot):
The spread of residuals appears consistent across fitted values, with no clear funnel
or fan shape. This suggests the homoscedasticity assumption is met.
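A VIF table like the one above is often produced with `statsmodels`' `variance_inflation_factor`; the underlying calculation can also be sketched from scratch, regressing each feature on all the others, since VIF_j = 1 / (1 − R²_j). Synthetic data below, with one deliberately collinear pair:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def vif(X):
    """VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
    column j of X on all the remaining columns."""
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
        out.append(1.0 / (1.0 - r2))
    return out

rng = np.random.default_rng(3)
a = rng.normal(size=500)
b = a + rng.normal(scale=0.2, size=500)  # strongly collinear with a
c = rng.normal(size=500)                 # independent of both
X = np.column_stack([a, b, c])
print([round(v, 2) for v in vif(X)])     # first two VIFs are large, third is near 1
```

This mirrors the pattern in the report's table: the collinear pair (analogous to visitors and ad_impressions there) blows past the usual VIF < 5 threshold while the independent feature stays near 1.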
Model Performance Evaluation Based On Different Metrics
Model Performance Metrics:
Train Mean Absolute Error (MAE): 0.0389
Test Mean Absolute Error (MAE): 0.0399
Train Mean Squared Error (MSE): 0.0024
Test Mean Squared Error (MSE): 0.0025
Train Root Mean Squared Error (RMSE): 0.0489
Test Root Mean Squared Error (RMSE): 0.0500
Train R^2 Score: 0.7868
Test R^2 Score: 0.7743
Adjusted R^2 for Test Set: 0.7661
Observations
1. Mean Absolute Error (MAE) Train MAE: 0.0389 Test MAE: 0.0399 The MAE
values for both train and test sets are very close, indicating that the model
generalizes well and is not overfitting or underfitting. The average prediction
error is approximately 0.04, meaning the model predicts values with minimal
error.
2. Mean Squared Error (MSE) Train MSE: 0.0024 Test MSE: 0.0025 The MSE
values for train and test sets are also very close, further confirming that the
model performs similarly on both sets. The small MSE indicates that the squared
errors (and thus large prediction deviations) are minimal.
3. Root Mean Squared Error (RMSE) Train RMSE: 0.0489 Test RMSE: 0.0500
RMSE gives a sense of the average magnitude of the prediction error in the same
scale as the target variable. A small and comparable RMSE for train and test sets
suggests a good fit and no significant overfitting.
4. R-squared: Train R²: 0.7868, Test R²: 0.7743. R² indicates how well the
model explains the variability of the dependent variable. Approximately 78.68%
of the variance in the training data and 77.43% of the variance in the test data
are explained by the model. The slight drop in R² from train to test is expected
and acceptable, indicating no major overfitting.
5. Adjusted R-squared: Adjusted R² for the test set is 0.7661. Adjusted R² accounts
for the number of predictors in the model and penalizes unnecessary predictors.
The small difference between R² and adjusted R² on the test set (0.7743 vs.
0.7661) suggests that the predictors in the model are relevant and contribute
meaningfully to explaining the target variable.
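Adjusted R² is derived from R² by penalizing the predictor count via Adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1). A sketch; the sample size n and predictor count p below are illustrative, not the report's actual values, so the result differs from the 0.7661 reported:

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1),
    where n is the number of observations and p the number of predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Illustrative: 200 test rows, 20 predictors
print(round(adjusted_r2(0.7743, 200, 20), 4))  # → 0.7491
```

The penalty grows with p relative to n, which is why a small gap between R² and adjusted R² signals that the predictors are pulling their weight.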
Actionable Insights & Business Recommendation
Overall Assessment
The model performs well on both the train and test datasets, with comparable errors
and variance explained. The small difference in performance metrics between train and
test sets indicates that the model generalizes well and is not overfitting. The high R²
and adjusted R² values suggest that the model explains a significant proportion of
the variance in the target variable.
1. Visitors (Average visitors in the past week): High significance: The number of
visitors to the platform in the past week is a strong predictor of first-day
viewership. A higher visitor count indicates a larger engaged audience, directly
translating to increased content viewership. Business Insight: Boosting platform
traffic (e.g., through promotions, partnerships, or improved user experience) in
the days leading up to content releases can significantly improve first-day
viewership.
2. Ad Impressions (Number of ad impressions): Low significance: the near-zero
model coefficient and the weak correlations observed earlier suggest that ad
impressions, as currently deployed, do little to drive first-day viewership.
Business Insight: Rather than simply increasing impression volume, re-evaluate
ad targeting and creative quality, and optimize ad spend to maximize returns.
3. Major Sports Event: Negative correlation: The presence of a major sports event
reduces viewership, likely due to competition for audience attention. Business
Insight: Avoid scheduling major content releases on days when significant sports
events are happening. Use such periods for less critical content or alternative
marketing strategies.
4. Genre:
Significant variation: Certain genres perform better than others (e.g., action, drama,
comedy). This indicates audience preferences for specific types of content. Business
Insight: Tailor content production and acquisition strategies toward high-performing
genres. For lower-performing genres, consider niche marketing to target specific
audience segments.
5. Day of the Week:
Significant impact: Content released on weekends and holidays typically garners higher
first-day viewership, likely due to increased free time and leisure activities. Business
Insight: Strategically schedule major content releases on weekends and public holidays
to maximize viewership.
6. Season of Release:
Moderate significance: Seasonal factors (e.g., holiday seasons or summer vacations)
affect viewership trends. Content released during popular seasons tends to perform
better. Business Insight: Capitalize on seasonal trends by aligning content releases with
periods of high platform activity.
7. Trailer Views (Number of trailer views):
Highly significant: Trailer views are directly proportional to first-day viewership,
indicating that effective trailers create anticipation and engagement. Business Insight:
Invest in creating high-quality, engaging trailers to build excitement for upcoming
content. Promote trailers across multiple platforms to maximize visibility.
Key Takeaways For Business
1. Boost Platform Traffic:
Increase platform visitors by using strategic campaigns, offering free trials, discounts, or
limited-time access to premium content to attract more users.
2. Optimize Marketing Spend:
Focus ad budgets on campaigns for genres and content types that show high potential
for viewership. Target platforms and times that align with audience habits to maximize
ad impressions.
3. Strategic Scheduling:
Avoid content clashes with major sports events or other high-viewership activities.
Release high-priority content on weekends, holidays, or during seasons with higher
engagement.
4. Focus on High-Performing Genres:
Double down on producing or acquiring content in genres that consistently perform
well. For less popular genres, identify niche audiences and personalize marketing
efforts.
5. Enhance Trailer Effectiveness:
Trailers are critical for driving excitement and engagement. Ensure trailers are
compelling, well-produced, and promoted on both traditional and social media
channels.
6. Seasonal Campaign Planning:
Align major content releases with seasonal periods of higher activity (e.g., holiday
seasons or summer vacations) to maximize reach and viewership.
7. Competitive Landscape:
Monitor competitor activity and schedule content releases strategically to avoid head-
to-head clashes with major sports events, blockbuster releases, or other high-profile
launches.