Project Report MBA 944
Learning Techniques
Project Report
IIT Kanpur
Table Of Contents
Content
Overview
Key Observations
Findings
Dataset Description
Justification from Literature
Time Series Analysis
Distribution
Heatmap
Decomposition
Rolling Mean
Autocorrelation
Change Point Detection
Volatility
Time Series Forecasting Regression Modelling
Regressors
Model Used
Key Advantages of Prophet Over SARIMAX
Additional Dataset Used for Realistic Modelling
Impact of Revised MSP on Coconut Prices
Impact of Festivals on Coconut Prices
Importance of Copra Lag & Difference Features
Prophet Model Settings
Regression Equation for the model Used
Summary of Model Efficiency & Forecast Accuracy
References
Overview
This dataset contains monthly average price records (in Rs/Ql) of Milling Copra and Coconut Oil
from the Kangayam market between January 2012 and March 2025.
Dataset has three columns:
• Product (‘Coconut Oil’, ‘Milling Copra’)
• Month
• Monthly Average Price
Let's perform a comprehensive EDA to understand the price trends and patterns.
Key observations:
• The dataset contains 310 monthly average price records
• Date range: 1st January 2012 to 1st March 2025
• Product: Coconut Oil and Milling Copra
• Single market: "Kangayam"
• Price column: "Kangayam-Price(Rs/Ql)"
Findings:
• No missing values found
• No duplicate records found
• Date format standardized
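The three checks above can be sketched in pandas as follows. This is a minimal illustration, not the report's own code: the column names follow the dataset description, but the four sample rows are made-up values used only to make the snippet runnable.

```python
import pandas as pd

# Illustrative rows only (assumed values, not the report's data).
df = pd.DataFrame({
    "Product": ["Coconut Oil", "Milling Copra", "Coconut Oil", "Milling Copra"],
    "Month": ["2012-01", "2012-01", "2012-02", "2012-02"],
    "Monthly Average Price": [10000.0, 6500.0, 10250.0, 6400.0],
})

# Standardize the date column to month-start timestamps.
df["Month"] = pd.to_datetime(df["Month"], format="%Y-%m")

# Count missing cells across the whole frame.
missing_values = int(df.isna().sum().sum())

# A duplicate here means the same product recorded twice for the same month.
duplicate_rows = int(df.duplicated(subset=["Product", "Month"]).sum())

print(f"missing={missing_values}, duplicates={duplicate_rows}")
```

Checking duplicates on (Product, Month) rather than on whole rows is a design choice: it also catches two conflicting prices for the same product and month.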
Dataset Description:
Statistic   Monthly Average Price
Count       311.000000
Mean        10282.399254
Std         3655.568401
Min         3624.193548
25%         7827.822581
50%         9848.387097
75%         12639.950000
Max         22658.096774
Justification from Literature:
• Box & Jenkins (2008): Support ARIMAX for causal time series.
• Chand & Raju (2010): Highlight MSP as a key price control mechanism.
• Both series show stationary seasonality (the seasonal variation does not increase with the trend), so the
additive model is the better choice for both.
Rolling Mean:
• After smoothing out random noise, a clear upward trend is visible in both Coconut Oil and Milling Copra.
• Coconut Oil shows sudden market surges in 2014, 2018, and 2021, whereas Copra surges around 2015 and 2018.
• Both products show a major dip from around 2022 to 2024.
• The market spikes again from 2024.
Justification for the 6-month rolling mean
• A shorter window (3-month) can react too quickly to changes and exaggerate small fluctuations.
• A longer window (12-month) can over-smooth the series and hide important fluctuations.
• Many agricultural commodities (including coconut/copra) have seasonal patterns tied to harvest cycles.
• A 6-month window can help detect pre- and post-harvest price effects or mid-year supply shocks.
• Many governments and financial agencies use semi-annual or quarterly indicators for agricultural pricing.
• A 6-month rolling mean aligns well with economic understanding and policy evaluations.
Why the rolling mean curve for Coconut Oil is mostly above the original curve, but not for Milling Copra
• Milling Copra prices are more stable or show cyclical fluctuations, which is why the rolling mean moves above and
below the original line more naturally.
• I have used center = True, so that each rolling mean takes the 3 past months, the current month, and the 2 future
months, which helps in getting more balanced values.
• The red line (rolling mean) is mostly above the blue line (original) for Coconut Oil because the series contains
major price spikes, which raise the average prices during 2013 to 2014, 2017 to 2018, and from 2023 onwards.
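The centered 6-month window described above can be sketched with pandas. This is an illustration under stated assumptions: the series values are placeholders (0 to 9), not the report's prices, and the variable names are hypothetical.

```python
import pandas as pd

# Placeholder monthly series (assumed values, chosen so the averages are easy to check).
prices = pd.Series(
    [float(v) for v in range(10)],
    index=pd.date_range("2012-01-01", periods=10, freq="MS"),
)

# With an even window and center=True, pandas averages the 3 previous months,
# the current month, and the 2 following months.
rolling_6m = prices.rolling(window=6, center=True).mean()

# The first 3 and last 2 months have incomplete windows and are NaN.
print(rolling_6m)
```

Because the window looks two months ahead, the last two months of any series have no centered rolling mean; this matters when the smoothed curve is compared with the most recent prices.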
Autocorrelation:
• The gradual exponential decay across lags in the ACF plots of both Coconut Oil and Copra indicates strong
autocorrelation.
• This shows that current prices are affected by prices in previous months.
• The PACF plots for both Coconut Oil and Copra show a sharp drop after lag 1.
• This indicates that the previous month's price has a direct impact on the current month's price.
Formula for ACF:
$$\rho_k = \frac{\sum_{t=k+1}^{n} (x_t - \bar{x})(x_{t-k} - \bar{x})}{\sum_{t=1}^{n} (x_t - \bar{x})^2}$$
Formula for PACF:
• The PACF at lag k is the regression coefficient β_k of the variable x_{t−k}, where the current value x_t is
regressed on its past k values.
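As a worked illustration of the ACF formula, the following pure-Python sketch computes the lag-k autocorrelation directly from the definition. The function name `acf` and the sample series are assumptions for illustration; a library routine would normally be used on the real data.

```python
def acf(x, k):
    """Lag-k sample autocorrelation of the sequence x, computed from the definition."""
    n = len(x)
    if not 0 <= k < n:
        raise ValueError("lag k must satisfy 0 <= k < len(x)")
    mean = sum(x) / n
    # Numerator: products of deviations for each value and the value k steps earlier.
    numerator = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    # Denominator: total sum of squared deviations.
    denominator = sum((value - mean) ** 2 for value in x)
    return numerator / denominator


# Placeholder series (assumed values): a steady upward trend.
series = [1.0, 2.0, 3.0, 4.0, 5.0]
print(acf(series, 1))
```

For this short trending series the lag-1 value is 4/10 = 0.4: the numerator is (-1)(-2) + (0)(-1) + (1)(0) + (2)(1) = 4 and the denominator is 10.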
Volatility:
• For both products, you can clearly see three major volatility phases:
o 2014
o 2017–2018
o 2024–2025
• In 2014 we see the first major spike, where the price regime changes (confirmed by change point detection).
• In 2017–2018 we again see peak volatility, caused by market disruption (possibly a policy, weather,
or demand shock).
• In 2024–2025 volatility is rising again, possibly due to renewed uncertainty and large price surges.
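The report does not state how volatility was measured. One common choice, shown here purely as an assumption, is the rolling standard deviation of monthly percentage returns; the function name, window length, and sample prices are all hypothetical.

```python
import pandas as pd


def rolling_volatility(prices: pd.Series, window: int = 6) -> pd.Series:
    """Rolling standard deviation of month-over-month returns (assumed definition)."""
    returns = prices.pct_change()  # first value is NaN: no previous month
    return returns.rolling(window=window).std()


# Placeholder prices (assumed values) that alternate +10% and -10%.
prices = pd.Series(
    [100.0, 110.0, 99.0, 108.9, 98.01],
    index=pd.date_range("2014-01-01", periods=5, freq="MS"),
)
volatility = rolling_volatility(prices, window=3)
print(volatility)
```

Working on returns rather than price levels keeps the measure comparable between low-price and high-price periods, which matters for a series that more than doubles over the sample.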
Why these regressors play a vital role in predicting coconut oil prices
Using Milling Copra, MSP revisions, Monsoon, Festival, and Global Coconut Oil Prices as regressors in your forecasting
model makes sense because each factor directly influences coconut oil prices in a different way. Here’s why they matter:
• Endogenous Variable:
o Target variable — the one you're forecasting:
▪ Coconut Oil Price (₹/Quintal)
• Exogenous Variables
o copra_diff:
▪ Monthly price difference or percentage change in Milling Copra prices.
▪ Strong economic link to Coconut Oil as copra is its raw input.
o msp_revised:
▪ MSP (Minimum Support Price) revisions — typically for copra.
▪ Policy-driven, affects supply chain behavior and floor prices.
o monsoon:
▪ Likely a binary flag or rainfall index for the monsoon season.
▪ Impacts coconut cultivation, hence oil production and price.
o global_price_pct_change:
▪ Percentage change in global coconut oil prices.
▪ Reflects trade dynamics and international demand-supply conditions.
o festival_flag:
▪ Binary indicator (0/1) for festival months like Onam, Dussehra, etc.
▪ Captures price surges due to seasonal demand.
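The two change-based regressors listed above can be derived as in the following sketch. The output column names `copra_diff` and `global_price_pct_change` come from the list; the input column names and all values are assumptions for illustration.

```python
import pandas as pd

# Placeholder inputs (assumed values and input column names).
features = pd.DataFrame(
    {
        "copra_price": [9000.0, 9450.0, 9261.0],
        "global_price": [100.0, 110.0, 99.0],
    },
    index=pd.date_range("2024-01-01", periods=3, freq="MS"),
)

# Month-over-month change in Milling Copra price, in Rs/Ql.
features["copra_diff"] = features["copra_price"].diff()

# Month-over-month change in the global coconut oil price, in percent.
features["global_price_pct_change"] = features["global_price"].pct_change() * 100

# The first month has no previous value, so both features are NaN there.
features = features.dropna()
print(features[["copra_diff", "global_price_pct_change"]])
```

Dropping the first row keeps the regressor matrix free of missing values, which Prophet requires for added regressors.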
Both Prophet and SARIMAX are powerful time-series forecasting models, but the following comparison between the two
gives a reasonable justification for using Prophet.
• Monsoon_Data_2020_2025.csv
▪ Columns:
▪ Year: The respective monsoon year.
▪ Onset_Date: The start date of monsoon for that year.
▪ Onset_DOY: The day-of-year (DOY) when monsoon began (e.g., 153 is June 1 in a leap year and June 2 otherwise).
▪ Withdrawal_DOY: The DOY when the monsoon withdrew (missing for 2025).
▪ Jun_pct_LPA, Jul_pct_LPA, Aug_pct_LPA, Sep_pct_LPA: Monthly rainfall as a percentage of the Long Period
Average (LPA), indicating how wet or dry each month was.
▪ Season_pct_LPA: The overall monsoon season rainfall as a percentage of LPA.
▪ Early_Onset: Binary flag (1 = early monsoon onset, 0 = normal/late onset).
▪ Late_Withdrawal: Binary flag (1 = prolonged withdrawal, 0 = normal exit).
▪ August_Deficit: Binary flag (1 = significant dry spell in August, 0 = normal rainfall).
▪ Stall_Flag: Flag for mid-season monsoon stall periods affecting crop productivity.
▪ June_Heavy: Flag for above-normal early-season rainfall, impacting soil saturation & planting.
▪ Insights for Forecasting Coconut Oil Prices
▪ Early Onset & Heavy June Rainfall: May increase coconut yield, stabilizing prices.
▪ August Deficit or Stall Events: May reduce productivity, leading to price volatility.
▪ Late Withdrawal: Prolonged rains can impact drying & harvesting, possibly delaying supply chains.
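Because Onset_DOY and Withdrawal_DOY are stored as day-of-year numbers, a small helper makes them easier to read. The sketch below uses only the standard library; the function name `doy_to_date` is hypothetical.

```python
from datetime import date, timedelta


def doy_to_date(year: int, day_of_year: int) -> date:
    """Convert a 1-based day-of-year into a calendar date for the given year."""
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)


# The same DOY maps to different dates in leap and non-leap years.
print(doy_to_date(2020, 153))  # leap year
print(doy_to_date(2021, 153))  # non-leap year
```

Converting per year avoids a one-day shift when onset dates from leap years (2020, 2024) are compared with the other years in the file.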
• International_CoconutOil.csv
▪ Columns:
▪ Month: The period for which the price is recorded.
▪ Price (Rs/MT): Monthly price of coconut oil per metric ton, showing fluctuations over time.
▪ Change (%): Month-over-month percentage price change, indicating growth or decline.
▪ Insights for Forecasting Coconut Oil Prices
▪ Sharp Price Surges (Nov 2020 & Oct 2021)
o Nov 2020: Price jumped by 23.73%, possibly due to seasonal demand or supply shortages.
o Oct 2021: A massive 31.75% price increase, likely driven by global commodity trends or export
restrictions.
▪ Major Declines (Aug-Sep 2022)
o Prices dropped by 10.2% (Aug) and 9.07% (Sep), indicating oversupply or weak demand during that
season.
▪ Steady Growth from Mid-2023 Onward
o From Aug 2023, prices consistently increased, peaking in Mar-Apr 2024 with rises of 9.97% and 11.21%,
reflecting global market recovery.
▪ Potential Seasonal Effects
o Festive season (Oct-Nov) typically sees price hikes due to increased demand.
o Post-monsoon dips in Aug-Sep may be due to excess supply hitting the market.
• festival_flag
▪ To incorporate seasonal demand fluctuations, a festival flag was introduced in the forecasting model.
▪ This flag accounts for major consumption peaks that typically occur due to cultural and economic activities
during June, July, August, and September.
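A flag of this kind can be built from the month of each observation. The sketch below assumes the flag is 1 for June through September, as stated above, and 0 otherwise; the constant and variable names are hypothetical.

```python
import pandas as pd

# Months treated as festival months, as described above (June to September).
FESTIVAL_MONTHS = [6, 7, 8, 9]

# One calendar year of month-start dates, used only for illustration.
months = pd.date_range("2024-01-01", periods=12, freq="MS")

# 1 when the observation falls in a festival month, else 0.
festival_flag = pd.Series(
    months.month.isin(FESTIVAL_MONTHS).astype(int),
    index=months,
    name="festival_flag",
)
print(festival_flag)
```

A fixed month list is the simplest encoding; festivals whose dates shift between years would need a per-year calendar instead.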
Coconut oil price trends from 2012 to 2024 (Rs per quintal):
o Prices show fluctuations over time, with notable peaks and declines.
o The price range spans from 5,000 to 22,500 Rs per quintal.
o Festival periods appear to coincide with price changes, likely due to increased demand.
o This suggests a correlation between cultural events and price fluctuations.
Importance of Copra Difference Features
Prophet Model Settings:
• Growth = 'linear'
o Assumes steady price movements without exponential trends.
o Best when coconut oil prices fluctuate but do not explode upward indefinitely.
• Changepoint Prior Scale = 1.5
o Increases sensitivity to sharp price surges or declines.
o Helps capture major policy shifts (e.g., MSP revisions) or market shocks.
• Seasonality Mode = 'multiplicative'
o Adjusts seasonal effects proportionally to price levels.
o Useful when festival-driven price increases are relative rather than fixed.
• Monthly Seasonality with Fourier Order = 6
o Adds monthly patterns to detect regular price fluctuations.
o Fourier order controls complexity—higher values capture finer seasonality changes.
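The settings above map onto Prophet's constructor and its `add_seasonality` and `add_regressor` methods roughly as follows. This is a sketch, not the report's code: the monthly period of 30.5 days is an assumption (the report gives only the Fourier order), and the regressor list is taken from the exogenous variables named earlier.

```python
# Settings as listed in the report.
PROPHET_SETTINGS = {
    "growth": "linear",
    "changepoint_prior_scale": 1.5,
    "seasonality_mode": "multiplicative",
}

# period=30.5 days is an assumed value for a "monthly" cycle.
MONTHLY_SEASONALITY = {"name": "monthly", "period": 30.5, "fourier_order": 6}

REGRESSORS = [
    "copra_diff",
    "msp_revised",
    "monsoon",
    "global_price_pct_change",
    "festival_flag",
]


def build_model():
    """Create an unfitted Prophet model with the settings above."""
    from prophet import Prophet  # imported lazily; requires the `prophet` package

    model = Prophet(**PROPHET_SETTINGS)
    model.add_seasonality(**MONTHLY_SEASONALITY)
    for name in REGRESSORS:
        model.add_regressor(name)
    return model
```

Calling `build_model().fit(df)` would then expect a frame with `ds`, `y`, and one column per regressor name.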
Why Multiplicative Over Additive?
Where:
• y(t): Observed value (coconut oil price at time t)
• g(t): Trend function (piecewise linear by default)
• βi: Coefficients for each regressor (learned by Prophet)
• s(t): Seasonal component using multiplicative mode (Fourier series for monthly seasonality).
• h(t): Holiday effects (no Prophet holidays were added explicitly; a festival_flag regressor is used instead)
• xi(t): Regressors (copra_diff, msp_revised, monsoon, etc.)
• ϵt : Error term representing residuals.
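For reference, the terms defined above combine in the following general form under Prophet's multiplicative mode. This is a sketch that assumes the extra regressors inherit the multiplicative mode (Prophet's default when the seasonality mode is multiplicative); the exact form fitted in this report may differ in detail.

```latex
y(t) = g(t)\,\Bigl(1 + s(t) + h(t) + \sum_{i} \beta_i\, x_i(t)\Bigr) + \epsilon_t
```

Under this form, seasonal and regressor effects scale with the trend level g(t), so the same festival effect produces a larger rupee change when prices are high.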
• Omitted Variable Bias: Missing influential factors (e.g., supply chain disruptions, government trade policies)
could skew forecasts.
• Collinearity: If regressors are highly correlated (e.g., copra_diff and global prices), it may be difficult to
distinguish individual effects.
• Incorrect Seasonality Specification: Prophet assumes regular seasonal patterns — if your series has irregular
events (like shifting festivals), these may be misestimated.
• Changepoint Sensitivity Risk: A high changepoint prior scale (1.5) may exaggerate price jumps or trend shifts
without sufficient justification.
• Residual Autocorrelation: Prophet doesn’t directly model autocorrelation in residuals, which can violate
assumptions and reduce accuracy.
• Policy Shocks Not Accounted for: Unexpected government interventions (MSP hikes, import-export policies)
could cause deviations.
• Global Price Volatility Not Fully Captured: International coconut oil price fluctuations may not always impact
domestic prices in a predictable way.
• Climate Variability Beyond Monsoon: Extreme weather events (cyclones, floods) could disrupt normal
monsoon-season predictions.
• Consumer Behavior Shifts: Changing dietary trends or substitutes (e.g., rising palm oil use) could reduce
coconut oil demand, affecting accuracy.
• Unforeseen Economic Crises: Economic downturns or inflation spikes could create price shocks that historical
trends fail to predict.
Observations:
• There’s a strong upward trend in coconut oil prices from late 2024 to mid-2025.
• Peak forecasted price in June 2025 (₹31,390).
• Some volatility is captured in the confidence interval (shaded region), especially after mid-2025.
• The forecast appears to incorporate global price trends, monsoon influence, and festival demand.
Forecast vs. Actual Comparison (Mar–Jun 2025):
Forecasted Coconut Oil Prices (Next 6 Months) with Global & Seasonal Influence:
Conclusion
• Overall, your Prophet model has strong predictive power and performs well even on unseen data.
• With a mean absolute percentage error (MAPE) of approximately 6.71%, it shows high reliability for
decision-making.
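MAPE, the accuracy measure quoted above, is the average of the absolute percentage errors. The sketch below shows the calculation on made-up numbers; the function name and the values are illustrative and are not the report's forecasts.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("actual and forecast must be non-empty and equal in length")
    # Each term is the absolute error relative to the actual value.
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Placeholder values (assumed): both forecasts are off by 10% of the actual.
print(mape([100.0, 200.0], [110.0, 180.0]))
```

MAPE is undefined when an actual value is zero, which is not a concern for price series like these.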
Interpretation:
• The Prophet model is robust, showing excellent in-sample fit with high R² and low error metrics.
• The inclusion of global price trends and monsoon variables adds meaningful explanatory power.
• Forecast trends align with historical seasonality and external economic signals.
• Omnibus/Jarque-Bera tests suggest residuals are reasonably normal.
Scenario/Stress Testing Code:
• Blue Line (Base Forecast): Smoothly rising until June–July, then dips slightly by August.
• Red Dotted Line (Stress Scenario): Mimics base shape, but shifted consistently ~7% lower across months.
• Shaded Area: Base scenario confidence interval – the stress scenario mostly stays within the lower bound of the base
case, indicating the stress case is plausible but on the conservative end.
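A stress path of the shape described above can be sketched by scaling the base forecast down by a fixed proportion. This is a simplifying assumption: the report's stress scenario may instead have been produced by shocking the regressors and re-forecasting. The 7% figure comes from the text; the names and sample values are hypothetical.

```python
# Proportional downward shock, taken from the "~7% lower" description.
STRESS_SHOCK = 0.07


def apply_stress(base_forecast, shock=STRESS_SHOCK):
    """Return the base forecast scaled down by a constant proportional shock."""
    return [round(value * (1.0 - shock), 2) for value in base_forecast]


# Placeholder base forecast (assumed values, Rs/Ql).
base = [100.0, 200.0]
stressed = apply_stress(base)
print(stressed)
```

A constant proportional shift preserves the shape of the base curve, which matches the description of the red dotted line mimicking the base forecast.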
Interpretation:
• Stress scenario still predicts rising prices, but lower amplitude compared to base, indicating demand/market resilience
even under stress.
• Impact magnitude: 6–7% dip under stress suggests:
o External shocks (e.g., weaker global demand, poor monsoon, policy impact) have moderate downward pressure.
o Prices remain elevated overall despite the shock.
• Base vs. Stress Balance:
o The base case provides a growth expectation.
o Stress scenario provides a risk-buffered outlook useful for inventory planning, procurement hedging, or
investment decisions.
Conclusion
• Your scenario forecasting setup effectively models price vulnerability under shocks.
• The stress scenario offers a realistic, lower-bound forecast without being overly pessimistic.
• With only 7% downside, your model suggests strong market fundamentals, and stress-adjusted prices still
remain above historical averages.
Machine Learning Model Used:
Preferred Model: Ensemble of Prophet with XGBoost
• Prophet captures:
o Seasonality (daily, weekly, yearly)
o Holidays or events (like South Indian festivals)
o Trend changes
• XGBoost captures:
o Non-linear interactions
o Lagged effects and autoregressive terms
o Residual patterns Prophet may miss
• Leverages Strengths of Both Models:
o Prophet: excels at modeling seasonality, trend, and events like festivals.
o XGBoost: excels at modeling complex, nonlinear relationships, lags, and interactions.
o Together, they handle both global structure (via Prophet) and local variability/noise (via XGBoost).
• Improved Generalization:
o Ensemble methods tend to be more robust and less prone to overfitting, especially when the models
learn different aspects of the data.
Where:
From Prophet
From XGBoost
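The two labels above refer to the two parts of the hybrid forecast. A sketch of the additive form, assuming XGBoost is trained on Prophet's residuals (as the later points about residual fitting suggest):

```latex
\hat{y}(t) = \hat{y}_{\text{Prophet}}(t) + \hat{r}_{\text{XGB}}(t),
\qquad
\hat{r}_{\text{XGB}}(t) \approx y(t) - \hat{y}_{\text{Prophet}}(t)
```

Here the first term is the Prophet forecast and the second is XGBoost's prediction of the residual that Prophet leaves unexplained.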
• Omitted Variable Bias: Missing influential factors (e.g., supply chain disruptions, government trade policies)
could skew forecasts.
• Correlated Predictors: Highly correlated features can confuse the model or inflate variable importance.
• Residual Model Overfitting: XGBoost might fit noise in Prophet residuals, especially on small datasets.
• Incorrect Assumptions by Prophet: If trend or seasonality is wrongly specified, residuals may contain structure
not easily corrected by XGBoost.
• Data Drift: If the distribution of regressors (like monsoon, global prices) changes significantly, XGBoost may fail.
• Concept Drift: If relationships between features and coconut oil prices evolve (e.g., policy change), coefficients
and patterns may no longer hold.
• Overfitting to Training Period: Particularly in small datasets, the hybrid model might capture quirks specific to
the historical period.
• Forecast Horizon Inaccuracy: If future regressor values are estimated (not known), small errors can compound.
Summary of Model Efficiency & Forecast Accuracy
Insights:
• R² = 0.001, Adjusted R² = -0.088: Residuals are almost completely uncorrelated with these features.
• p-values > 0.9 for all predictors: No statistical significance.
• Residuals are white noise (random), validating the XGBoost layer's effectiveness.
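The R² quoted above comes from regressing the final residuals on the external features. A minimal sketch of that check with ordinary least squares in NumPy is shown below; the function name and the sample arrays are assumptions, and the report's own figures come from a fuller regression summary.

```python
import numpy as np


def r_squared(features: np.ndarray, residuals: np.ndarray) -> float:
    """R-squared from an OLS regression of residuals on features (with intercept)."""
    design = np.column_stack([np.ones(len(residuals)), features])
    coefficients, *_ = np.linalg.lstsq(design, residuals, rcond=None)
    fitted = design @ coefficients
    ss_res = float(np.sum((residuals - fitted) ** 2))
    ss_tot = float(np.sum((residuals - residuals.mean()) ** 2))
    return 1.0 - ss_res / ss_tot


# Placeholder data (assumed): a feature that is orthogonal to the residuals.
feature = np.array([[1.0], [-1.0], [1.0], [-1.0]])
residuals = np.array([1.0, 1.0, -1.0, -1.0])
print(r_squared(feature, residuals))
```

An R² near zero, as in this toy case, means the features explain none of the remaining variation, which is the pattern the insights above describe.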
Residual Diagnostics:
Residual Distribution
• Excellent Fit: Very close tracking of historical prices with near-zero residuals.
• Robust Forecast: Consistent and economically plausible future trend.
• Effective Ensemble: Prophet handles temporal structure; XGBoost captures residual non-linearities.
• Minimal Overfitting: Despite high R², residuals are noise-like, and generalization appears intact.
Considerations:
• The sharp residual spike in 2025 might need domain-specific explanation (e.g., policy change, supply shock).
• External regressors like MSP, festivals, and global prices do not contribute significantly after XGBoost
correction—suggesting either their effects are already captured or were not significant during this period.
Strengths:
• Forecast Direction Matches Market Trend: Both forecasted and actual prices rise month by month (March to
June).
• Low Error in June (1.3%): Model aligns very closely with real prices toward the later months.
• Average % Error (across 4 months): 6.2%, which is highly acceptable in volatile commodity markets.
• Could be due to regional market dynamics, transportation, local demand-supply constraints, or events not
captured in the national dataset.
Restated Hypothesis:
"Seasonal events (like Onam), government interventions (such as MSP), agro-climatic conditions (e.g., Monsoon), and
international trade variables (e.g., exports/imports) have a significant causal effect on Milling Copra prices in the
Kangayam market."
• Statistical modeling
2. Coconut Development Board (CDB). (2024). Copra and Coconut Oil Price Statistics. Ministry of Agriculture,
Government of India. https://www.coconutboard.gov.in
4. Reserve Bank of India (RBI). (2024). Consumer Price Index and Inflation Reports. https://www.rbi.org.in
5. Directorate General of Foreign Trade (DGFT). (2024). Export Data for Copra and Coconut Oil.
https://www.dgft.gov.in
6. Box, G. E. P., Jenkins, G. M., Reinsel, G. C., & Ljung, G. M. (2015). Time Series Analysis: Forecasting and Control
(5th ed.). John Wiley & Sons.
7. Hyndman, R. J., & Athanasopoulos, G. (2021). Forecasting: Principles and Practice (3rd ed.). OTexts.
https://otexts.com/fpp3/
8. Chatfield, C. (2016). The Analysis of Time Series: An Introduction (7th ed.). CRC Press.
9. Mroz, T. A. (1987). The Sensitivity of an Empirical Model of Married Women's Hours of Work to Economic and
Statistical Assumptions. Econometrica, 55(4), 765–799.
10. Kumar, D., & Shinoj, P. (2016). Price Volatility in Coconut Markets of India: An Econometric Analysis. Indian
Journal of Agricultural Economics, 71(3), 370–383.
11. Narayan, P. K., & Smyth, R. (2009). Modeling the Determinants of Commodity Prices: A Review and Empirical
Application to World Market Prices of Copra. Journal of Applied Economics, 12(1), 27–46.
12. Brockwell, P. J., & Davis, R. A. (2016). Introduction to Time Series and Forecasting (3rd ed.). Springer.
13. Montgomery, D. C., Jennings, C. L., & Kulahci, M. (2015). Introduction to Time Series Analysis and Forecasting
(2nd ed.). Wiley.