Project Report MBA 944

The project report analyzes the forecasting of copra and coconut oil prices using advanced statistical and machine learning techniques, focusing on a dataset from the Kangayam market covering the period from January 2012 to March 2025. Key findings include significant price trends, volatility, and the impact of various factors such as monsoon, festivals, and global prices on coconut oil pricing. The Prophet model is preferred over SARIMAX for its ability to handle irregular data and nonlinear trends effectively.

Uploaded by

Akansha Singh

Forecasting Copra and Coconut Oil Prices Using Advanced Statistical and Machine Learning Techniques

Project Report

Subject Code: MBA944

Submitted By: Akansha Singh (241561503)
Submitted To: Prof. Faiz Hamid (Guide)

IIT Kanpur
Table Of Contents

Content
Overview
Key Observations
Findings
Dataset Description
Justification from Literature
Time Series Analysis
Distribution
Heatmap
Decomposition
Rolling Mean
Autocorrelation
Change Point Detection
Volatility
Time Series Forecasting Regression Modelling
Regressors
Model Used
Key Advantages of Prophet Over SARIMAX
Additional Dataset Used for Realistic Modelling
Impact of Revised MSP on Coconut Prices
Impact of Festivals on Coconut Prices
Importance of Copra Lag & Difference Features
Prophet Model Settings
Regression Equation for the model Used
Summary of Model Efficiency & Forecast Accuracy
References
Overview
This dataset contains monthly average price records (in Rs/Ql) of Milling Copra and Coconut Oil from the Kangayam market between January 2012 and March 2025.
The dataset has three columns:
• Product ('Coconut Oil', 'Milling Copra')
• Month
• Monthly Average Price
A comprehensive exploratory data analysis (EDA) follows to understand the price trends and patterns.
Key observations:
• The dataset contains 310 monthly average price records
• Date range: 1st January 2012 to 1st March 2025
• Product: Coconut Oil and Milling Copra
• Single market: "Kangayam"
• Price column: "Kangayam-Price(Rs/Ql)"

Findings:
• No missing values found
• No duplicate records found
• Date format standardized

Dataset Description:
Statistic    Monthly Average Price (Rs/Ql)
Count        311
Mean         10282.399254
Std          3655.568401
Min          3624.193548
25%          7827.822581
50%          9848.387097
75%          12639.950000
Max          22658.096774
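The checks behind these findings can be sketched as follows. This is a minimal illustration on a tiny hypothetical table in the three-column layout described in the Overview; the values are placeholders, not the Kangayam data, and the variable names are assumptions.

```python
import pandas as pd

# Hypothetical toy records in the three-column layout described above;
# the real Kangayam data is not reproduced here.
df = pd.DataFrame({
    "Product": ["Coconut Oil", "Coconut Oil", "Milling Copra", "Milling Copra"],
    "Month": ["2012-01-01", "2012-02-01", "2012-01-01", "2012-02-01"],
    "Monthly Average Price": [7000.0, 7100.0, 4500.0, 4550.0],
})

# Standardize the date format.
df["Month"] = pd.to_datetime(df["Month"], format="%Y-%m-%d")

# Data-quality checks corresponding to the findings above.
missing_values = int(df.isna().sum().sum())
duplicate_rows = int(df.duplicated().sum())

# Summary statistics: count, mean, std, min, quartiles, max.
summary = df["Monthly Average Price"].describe()

print(missing_values, duplicate_rows)
print(summary)
```

Running the same three steps on the full dataset would produce a summary in the shape of the Dataset Description table.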
Justification from Literature:
• Box & Jenkins (2008): Support ARIMAX for causal time series.

• Lundberg et al. (2020): Recommend SHAP for transparent model interpretation.

• Chand & Raju (2010): Highlight MSP as a key price control mechanism.

• FAO Reports (2020–2023): Stress global demand impact on agri-prices.

Time Series Analysis


Distribution:
1) Monthly Average Price Per Product
• The graph below shows that coconut oil prices are consistently higher than milling copra prices throughout the period.
• Both products show sudden price spikes in 2014, in 2018, and between 2020 and 2022.
• Both products (coconut oil and copra) also show a sudden drop between 2022 and 2024.
• Coconut oil prices are much more volatile than milling copra prices.
• The Coconut Oil distribution is wider and right-skewed.
• Coconut Oil prices vary more widely, ranging from about Rs. 6,000 to Rs. 22,000.
• The Milling Copra distribution is more symmetric and much narrower, concentrated between Rs. 6,000 and Rs. 12,000.
• Because Coconut Oil prices are more volatile and more susceptible to variation, they give traders more room for profit.
2) Monthly Average Price By Product and Year
• In the box plot below, Coconut Oil has higher prices and higher volatility in every year.
• Milling Copra is more stable and has more consistent prices.
• Major peaks in Coconut Oil prices occurred in 2014, 2018, and 2020–2021.

3) Monthly Average Price By Product and Month (Seasonal Variation)


• In the box plot below, Coconut Oil prices peak at the start and end of the year and stabilize mid-year, which indicates seasonality.
• The end of the year in India coincides with various festivals and religious events that increase the demand for coconut oil, e.g. the Sabarimala pilgrimage.
• Milling Copra shows more stable and consistent prices throughout the year, possibly due to:
a. Year-round harvesting
b. MSPs
c. Consistent demand
• Milling Copra shows a slight rise from July to October, indicating a seasonal price lift, possibly because:
a. Farmers get higher prices for fresh coconut than for copra
b. July to October is the rainy season in India, which affects harvesting, supply, and transportation.
Heatmap:
Quarterly Average Heatmap
• Coconut Oil prices are highest in Q1 and Q4 (recurring seasonal spikes), with the highest value in March 2025.
• There are several high price values between 2018 and 2021.
• Milling Copra shows significant price rises in Q1 and Q2 of 2018 and 2021, whereas Q4 of 2017, 2020, and 2024 shows a notable spike in copra prices.
Decomposition:
Coconut Oil Decomposition
• The raw time series shows major peaks around 2018 to 2022, a trough during 2022 to 2024, and a renewed rise from 2024.
• Over the full period it shows an upward long-term trend.
• There is strong seasonality, indicating an annual cycle.
• Residuals mostly hover around zero, suggesting the decomposition has effectively separated the predictable parts from the noise.
Milling Copra Decomposition
• The raw time series shows major peaks around 2014 to 2015, in 2018, and between 2020 and 2022, dips during 2016 and 2022 to 2024, and a renewed rise from 2024.
• Over the full period it shows an upward short-term trend.
• There is strong seasonality, indicating an annual cycle in which a few months have lower prices and some months see price hikes.
• Residuals mostly hover around zero, suggesting the decomposition has effectively separated the predictable parts from the noise.
• The within-year price pattern is consistent from year to year.
Justification to use Additive Model
Coconut Oil Prices

• The price level rises significantly, from Rs. 6,000 to over Rs. 22,000.
• But the amplitude of the seasonal up-and-down movement does not grow proportionally.
• Seasonal fluctuations seem fairly stable, even as the overall level rises.
Milling Copra Prices

• Prices rise from Rs. 3,500 to Rs. 12,500.
• Again, the seasonal spikes and dips appear roughly constant over time.
• There is no clear sign that seasonality scales with the trend.
Conclusion

• Both series show stable seasonality (seasonal variation does not increase with the trend), so the additive model is the better choice for both.
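An additive decomposition splits each series as price = trend + seasonal + residual. The sketch below is a hand-rolled classical decomposition, shown only to make the mechanics concrete; it is not necessarily the routine used for the plots in this report. The function name and the synthetic series are assumptions for illustration.

```python
import numpy as np

def additive_decompose(x, period=12):
    """Classical additive decomposition: x = trend + seasonal + residual."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    half = period // 2
    # Centered moving average; an even period uses half weights at both ends.
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    trend = np.full(n, np.nan)
    trend[half:n - half] = np.convolve(x, weights, mode="valid")
    # Seasonal effect: average detrended value at each position in the cycle,
    # centered so the effects sum to zero over one period.
    detrended = x - trend
    effects = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    effects -= effects.mean()
    seasonal = effects[np.arange(n) % period]
    residual = x - trend - seasonal
    return trend, seasonal, residual

# Synthetic monthly series (hypothetical): linear trend plus a fixed yearly pattern.
t = np.arange(48)
pattern = 500.0 * np.sin(2 * np.pi * np.arange(12) / 12)
series = 6000.0 + 50.0 * t + pattern[t % 12]

trend, seasonal, residual = additive_decompose(series, period=12)
print(np.nanmax(np.abs(residual)))
```

Because the synthetic seasonal swing has a fixed size regardless of the price level, the additive split recovers it with residuals near zero, which is the situation described in the conclusion above.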
Rolling Mean:
• After smoothing out random noise, a clear upward trend is visible in both Coconut Oil and Milling Copra.
• Coconut Oil shows sudden market surges in 2014, 2018, and 2021, whereas Copra surges around 2015 and 2018.
• Both products show a major dip around 2022 to 2024.
• The market rises again from 2024.

Justification
6-month rolling mean

• A shorter window (3-month) can react too quickly to changes and exaggerate small fluctuations.
• A longer window (12-month) can smooth too aggressively and hide important fluctuations.
• Many agricultural commodities (including coconut and copra) have seasonal patterns tied to harvest cycles.
• A 6-month window can help detect pre- and post-harvest price effects or mid-year supply shocks.
• Many governments and financial agencies use semi-annual or quarterly indicators for agricultural pricing.
• A 6-month rolling mean aligns well with economic understanding and policy evaluation.
The rolling mean curve for Coconut Oil is mostly above the original curve, but not for Milling Copra

• Milling Copra prices are more stable or show cyclical fluctuations, which is why the rolling mean moves above and below the original line more naturally.
• The rolling mean is computed with center=True, so each value averages the 3 previous months, the current month, and the 2 following months, which gives more balanced values.
• The red line is mostly above the blue line for Coconut Oil because the series contains major price spikes that raise the overall average during 2013 to 2014, 2017 to 2018, and from 2023 onwards.
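A centered 6-month rolling mean can be sketched with pandas as below. The price values are hypothetical placeholders, not the Kangayam series, and the variable names are assumptions.

```python
import pandas as pd

# Hypothetical monthly prices (Rs/Ql); not the Kangayam data.
prices = pd.Series(
    [7000, 7200, 7100, 7600, 8000, 7900, 8300, 8800, 8600, 9000, 9400, 9300],
    index=pd.date_range("2012-01-01", periods=12, freq="MS"),
    dtype=float,
)

# Centered 6-month window: pandas places an even window on the three previous
# months, the current month, and the two following months.
rolling_6m = prices.rolling(window=6, center=True).mean()

print(rolling_6m)
```

The first three and last two months have no value because their windows are incomplete, so the smoothed curve is slightly shorter than the raw one at both ends.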

Autocorrelation:
• The gradual exponential decay across lags in the ACF plots of both Coconut Oil and Copra indicates strong autocorrelation.
• This shows that current prices are affected by the previous month's prices.
• The PACF plots for both Coconut Oil and Copra show a sharp drop after lag 1.
• This indicates that the previous month's price has a direct impact on the current month's price.
Formula for ACF:

$$\rho_k = \frac{\sum_{t=k+1}^{n} (x_t - \bar{x})(x_{t-k} - \bar{x})}{\sum_{t=1}^{n} (x_t - \bar{x})^2}$$

• ρk is the autocorrelation at lag k
• n is the total number of observations
• x̄ is the sample mean of the series
• Numerator: covariance between xt and xt−k
• Denominator: variance

Formula for PACF:

$$x_t = \beta_1 x_{t-1} + \beta_2 x_{t-2} + \cdots + \beta_k x_{t-k} + \varepsilon_t$$

• The PACF at lag k is the regression coefficient βk of the variable xt−k when the current value xt is regressed on its past k values.
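The sample autocorrelation can be computed directly from its definition. The sketch below is dependency-free and uses a short hypothetical series; the function name `acf` is an assumption for illustration. At lag 1 the partial autocorrelation equals the ordinary autocorrelation, since there are no intermediate lags to control for.

```python
def acf(x, k):
    """Sample autocorrelation of x at lag k."""
    n = len(x)
    mean = sum(x) / n
    # Sum of lagged cross-products of deviations from the mean.
    numerator = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    # Sum of squared deviations over the full series.
    denominator = sum((v - mean) ** 2 for v in x)
    return numerator / denominator

# Hypothetical values for illustration only.
prices = [1.0, 2.0, 3.0, 4.0, 5.0]
print(acf(prices, 0), acf(prices, 1))
```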

Change Point Detection:


Radial Basis Function (RBF)

• An RBF model is used to detect non-linear changes in trend.
• A change point appears around late 2013 to early 2014.
• Before the change point: prices were relatively stable and low.
• After the change point: prices show higher fluctuations and multiple peaks, indicating increased market dynamics or policy/economic changes.
Radial Basis Function (RBF) Using a 6-Month Rolling Mean

• An RBF model is used to detect non-linear changes in trend.
• Coconut Oil prices show change points around late 2013 to early 2014, from 2015 to 2017, and between 2023 and 2025.
• Milling Copra shows significant fluctuations around 2014–2015, 2017, 2023, and 2025.
• Before the change point: prices were relatively stable and low.
• After the change point: prices show higher fluctuations and multiple peaks, indicating market dynamics or policy/economic changes.
Volatility:
Annual

• Both products show three major volatility phases:
o 2014
o 2017–2018
o 2024–2025
• 2014 shows the first major spike, where the price regime changes (confirmed by change point detection).
• 2017–2018 shows peak volatility again, caused by market disruption (possibly a policy, weather, or demand shock).
• In 2024–2025 volatility rises again, possibly due to renewed uncertainty and large price surges.

$$\text{Annual Volatility} = \sigma_{\text{monthly}} \times \sqrt{12}$$
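The annualization step can be sketched as below. The report does not state what the monthly standard deviation is taken over; this sketch assumes simple month-over-month returns, and both the function name and the sample prices are hypothetical.

```python
import math
import statistics

def annualized_volatility(prices):
    """Annualize the standard deviation of monthly returns by sqrt(12).

    Assumption: sigma_monthly is the sample standard deviation of simple
    month-over-month returns.
    """
    returns = [curr / prev - 1.0 for prev, curr in zip(prices, prices[1:])]
    sigma_monthly = statistics.stdev(returns)
    return sigma_monthly * math.sqrt(12)

# Hypothetical prices: +10% then -10%.
print(annualized_volatility([100.0, 110.0, 99.0]))
```

The square-root-of-12 scaling treats monthly returns as uncorrelated, which is a simplification given the strong autocorrelation reported for these price series.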


Time Series Forecasting Regression Modelling
Regressors:
• Milling Copra: Milling copra plays a critical role in determining coconut oil prices because it is the primary raw material used in coconut oil production.
• Festival Impacts: Prices often fluctuate during major cultural events. Adding a flag for Onam, Diwali, and Christmas could capture short-term demand spikes.
• Global Coconut Oil Prices: If available, include international coconut oil price trends as a regressor to reflect import-export dependencies.
• Monsoon Impact: The monsoon has a significant impact on coconut oil prices, especially in India and Southeast Asia, where a large portion of global coconut production takes place.

Why these regressors play a vital role in predicting coconut oil prices
Using Milling Copra, Monsoon, Festival, and Global Coconut Oil Prices as regressors in the forecasting model makes sense because each factor directly influences coconut oil prices in a different way:

• Milling Copra Prices


o Direct raw material dependency: Coconut oil is extracted from copra, making its price a crucial determinant.
o Strong correlation: If Milling Copra prices rise, coconut oil prices almost always follow due to increased
production costs.
o Lagged effect: Changes in copra prices might not reflect immediately but gradually impact coconut oil over
subsequent months.
• Monsoon (Rainfall & Weather Conditions)
o Affects coconut yield: Good monsoon seasons lead to higher coconut production, reducing prices. Droughts or
excessive rainfall can disrupt yields.
o Supply chain impact: Bad weather can slow down transportation and processing, spiking costs.
o Historical patterns: If past monsoon trends show clear influence on coconut oil prices, they provide predictive
power.
• Festivals & Seasonal Demand
o Consumption spikes: Coconut oil demand surges before festivals (Deepavali, Onam, etc.), driving temporary price
hikes.
o Production planning: Farmers and traders anticipate demand, adjusting stock levels, which affects price
movement.
o Market speculation: Higher demand often triggers early price increases due to supply shortages.
• Global Coconut Oil Prices
o Import-export dependencies: India’s coconut oil market doesn’t operate in isolation—global price trends affect
local pricing.
o Substitution effects: If global prices rise, Indian exporters may sell overseas, reducing domestic supply, pushing
local prices up.
o Macroeconomic influence: Global trade policies, inflation, and currency exchange rates contribute to price
fluctuations.
Why These Regressors Improve Prediction Accuracy
• Each of these factors represents a real-world driver of coconut oil price changes.
• Instead of relying only on past coconut oil prices (autoregressive models), incorporating economic, seasonal, and supply-side variables enhances reliability.
• A well-tuned model with these regressors can capture sharp price swings, forecast festival-driven hikes, and adjust for
global shifts.

Statistical Model Used:


Variables:

• Endogenous Variable:
o Target variable — the one you're forecasting:
▪ Coconut Oil Price (₹/Quintal)
• Exogenous Variables
o copra_diff:
▪ Monthly price difference or percentage change in Milling Copra prices.
▪ Strong economic link to Coconut Oil as copra is its raw input.
o msp_revised:
▪ MSP (Minimum Support Price) revisions — typically for copra.
▪ Policy-driven, affects supply chain behavior and floor prices.
o monsoon:
▪ Likely a binary flag or rainfall index for the monsoon season.
▪ Impacts coconut cultivation, hence oil production and price.
o global_price_pct_change:
▪ Percentage change in global coconut oil prices.
▪ Reflects trade dynamics and international demand-supply conditions.
o festival_flag:
▪ Binary indicator (0/1) for festival months like Onam, Dussehra, etc.
▪ Captures price surges due to seasonal demand.

Preferred Model : Prophet

Both Prophet and SARIMAX are powerful time-series forecasting models, but the following comparison gives a reasonable justification for using Prophet.

Key Advantages of Prophet Over SARIMAX

• Handles Irregular & Missing Data Well
o Prophet can be fitted directly on series with gaps or irregular spacing, without prior imputation, and it models holiday effects.
o SARIMAX needs complete time-series and struggles with missing values unless explicitly handled.
• Built-in Seasonality Detection
o Prophet automatically fits yearly, weekly, and daily seasonal patterns and allows custom seasonalities (such as monthly) to be defined.
o SARIMAX requires manual seasonal order selection, which can be tricky if periodicities are uncertain.
• Nonlinear Trend Handling
o Coconut oil prices fluctuate due to economic shifts, policy changes, and demand spikes. Prophet's piecewise trend with automatic changepoints (and its optional logistic growth) captures this behavior better than SARIMAX's linear structure.
o SARIMAX assumes constant seasonality and trend patterns, which may fail during sharp price changes (e.g., MSP revisions or festival-driven surges).
• Easier to Use & Interpret
o Prophet requires fewer manual hyperparameter tuning steps, making it beginner-friendly.
o SARIMAX demands manual selection of the model order (p,d,q) and seasonal order (P,D,Q,S), which can be complex and computationally expensive.
• Works Well with External Regressors
o Prophet integrates regressors such as MSP revision, global coconut oil price, CPI, and festival effects directly and simply.
o SARIMAX supports regressors but often needs careful manual adjustment to avoid overfitting.

Additional Dataset Used for Realistic Modelling

• Monsoon_Data_2020_2025.csv
▪ Columns:
▪ Year: The respective monsoon year.
▪ Onset_Date: The start date of monsoon for that year.
▪ Onset_DOY: The day-of-year (DOY) when the monsoon began (e.g., 153 means June 2, or June 1 in a leap year).
▪ Withdrawal_DOY: The DOY when the monsoon withdrew (missing for 2025).
▪ Jun_pct_LPA, Jul_pct_LPA, Aug_pct_LPA, Sep_pct_LPA: Monthly rainfall as a percentage of the Long Period Average (LPA), indicating how wet or dry each month was.
▪ Season_pct_LPA: The overall monsoon season rainfall as a percentage of LPA.
▪ Early_Onset: Binary flag (1 = early monsoon onset, 0 = normal/late onset).
▪ Late_Withdrawal: Binary flag (1 = prolonged withdrawal, 0 = normal exit).
▪ August_Deficit: Binary flag (1 = significant dry spell in August, 0 = normal rainfall).
▪ Stall_Flag: Flag for mid-season monsoon stall periods affecting crop productivity.
▪ June_Heavy: Flag for above-normal early-season rainfall, impacting soil saturation & planting.
▪ Insights for Forecasting Coconut Oil Prices
▪ Early Onset & Heavy June Rainfall : May increase coconut yield, stabilizing prices.
▪ August Deficit or Stall Events : May reduce productivity, leading to price volatility.
▪ Late Withdrawal : Prolonged rains can impact drying & harvesting, possibly delaying supply chains.

• International_CoconutOil.csv
▪ Columns:
▪ Month: The period for which the price is recorded.
▪ Price (Rs/MT): Monthly price of coconut oil per metric ton, showing fluctuations over time.
▪ Change (%): Month-over-month percentage price change, indicating growth or decline.
▪ Insights for Forecasting Coconut Oil Prices
▪ Sharp Price Surges (Nov 2020 & Oct 2021)
o Nov 2020: Price jumped by 23.73%, possibly due to seasonal demand or supply shortages.
o Oct 2021: A massive 31.75% price increase, likely driven by global commodity trends or export
restrictions.
▪ Major Declines (Aug-Sep 2022)
o Prices dropped by 10.2% (Aug) and 9.07% (Sep), indicating oversupply or weak demand during that
season.
▪ Steady Growth from Mid-2023 Onward
o From Aug 2023, prices consistently increased, peaking in Mar-Apr 2024 with a 9.97% and 11.21% rise,
reflecting global market recovery.
▪ Potential Seasonal Effects
o Festive season (Oct-Nov) typically sees price hikes due to increased demand.
o Post-monsoon dips in Aug-Sep may be due to excess supply hitting the market.
• festival_flag
▪ To incorporate seasonal demand fluctuations, a festival flag was introduced in the forecasting model.
▪ This flag accounts for major consumption peaks that typically occur due to cultural and economic activities
during June, July, August, and September.
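A binary festival flag of this kind can be sketched as below. The month set follows the June to September window stated above; the function and constant names are hypothetical and only illustrate how such a 0/1 column might be derived.

```python
from datetime import date

# June to September, as stated above for the festival flag.
FESTIVAL_MONTHS = {6, 7, 8, 9}

def festival_flag(month_start: date) -> int:
    """Return 1 if the month falls inside the flagged window, else 0."""
    return 1 if month_start.month in FESTIVAL_MONTHS else 0

# One flag per calendar month of a sample year.
months = [date(2024, m, 1) for m in range(1, 13)]
flags = [festival_flag(d) for d in months]
print(flags)
```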

Impact of Revised MSP on Coconut Prices

• Historical Impact of MSP on Coconut Oil Prices


o MSP was first introduced in June 2014, leading to a 2.09% price increase immediately.
o A month later, the price surged by 9.59%, reflecting market adjustments to the new policy.
• Recent MSP Revision in December 2024
o Prices increased by 7.92% in January 2025 following the revision.
o February 2025 saw a smaller 0.53% increase, suggesting stabilization after initial market reactions.
• Market Behavior Post-MSP Adjustments
o The strongest price shifts occur right after MSP changes, but long-term effects tend to stabilize.
o Early MSP adjustments (2014) had sharper market reactions compared to the latest revision (2024).
• Forecasting Implications
o Including MSP as a regressor in forecasting models improves accuracy in predicting policy-driven price movements.
o Future MSP revisions might follow similar initial surges before stabilizing within 1–2 months.
Impact of Festivals on Coconut Prices

The chart shows coconut oil price trends from 2012 to 2024 in Rs per quintal.

• Overall Price Trend

o Prices show fluctuations over time, with notable peaks and declines.
o The price range spans from 5,000 to 22,500 Rs per quintal.

• Festival Months Highlighted


Specific months are marked with colored indicators, such as:

o Ganapati Month (Black)


o Onam Month (Red)
o Sabarimala Month (Green)
o Dussehra Month (Yellow)
o Pongal Month (Red X)

• Seasonal Price Impact

o Festival periods appear to coincide with price changes, likely due to increased demand.
o This suggests a correlation between cultural events and price fluctuations.
Importance of Copra Difference Features

• Copra Difference (Month-over-Month Change):


o copra_diff = milling_copra_price.diff() calculates price change from the previous month.
o Highlights price volatility—spikes and dips that might be triggered by events like MSP revisions or monsoon
anomalies.
o Useful for trend analysis, identifying when a price increase/decrease is significant.
• Why These Features Matter in Forecasting
o Improves Model Accuracy: Helps capture sequential dependencies in time-series trends.
o Detects Trend Shifts: Enables recognition of price jumps or crashes based on past patterns.
o Boosts Realism in Forecasts: Ensures models reflect how past movements shape future predictions.
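The difference feature can be sketched as below on hypothetical copra prices. Filling the first value with 0 is an assumption made here so that the regressor column has no missing entry; the report does not say how that first month was handled.

```python
import pandas as pd

# Hypothetical monthly Milling Copra prices (Rs/Ql).
milling_copra_price = pd.Series([9000.0, 9200.0, 9100.0, 9600.0])

# Month-over-month change; the first month has no predecessor, so it is NaN.
copra_diff = milling_copra_price.diff()

# Assumption: treat the first month as "no change" so the column is complete.
copra_diff_filled = copra_diff.fillna(0.0)

print(copra_diff_filled.tolist())
```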

Prophet Model Settings

• Growth = 'linear'
o Assumes steady price movements without exponential trends.
o Best when coconut oil prices fluctuate but do not explode upward indefinitely.
• Changepoint Prior Scale = 1.5
o Increases sensitivity to sharp price surges or declines.
o Helps capture major policy shifts (e.g., MSP revisions) or market shocks.
• Seasonality Mode = 'multiplicative'
o Adjusts seasonal effects proportionally to price levels.
o Useful when festival-driven price increases are relative rather than fixed.
• Monthly Seasonality with Fourier Order = 6
o Adds monthly patterns to detect regular price fluctuations.
o Fourier order controls complexity—higher values capture finer seasonality changes.
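The settings above can be sketched in code as follows. The keyword arguments are Prophet's own parameter names. The seasonality name "monthly" and the period of 30.5 days are assumptions, since the report gives only the Fourier order, and `build_model` is a hypothetical helper. Prophet is imported inside the function so the settings can be inspected without the package installed.

```python
# Settings listed above, collected in one place.
PROPHET_SETTINGS = {
    "growth": "linear",
    "changepoint_prior_scale": 1.5,
    "seasonality_mode": "multiplicative",
}

# Assumption: the custom seasonality is named "monthly" with a 30.5-day period.
MONTHLY_SEASONALITY = {"name": "monthly", "period": 30.5, "fourier_order": 6}

# Regressor columns named in the Variables section.
REGRESSORS = [
    "copra_diff",
    "msp_revised",
    "monsoon",
    "global_price_pct_change",
    "festival_flag",
]

def build_model():
    """Create an unfitted Prophet model with the settings above."""
    from prophet import Prophet  # requires the prophet package

    model = Prophet(**PROPHET_SETTINGS)
    model.add_seasonality(**MONTHLY_SEASONALITY)
    for name in REGRESSORS:
        model.add_regressor(name)
    return model
```

Fitting would then be `build_model().fit(df)`, where `df` holds a `ds` date column, a `y` price column, and one column per regressor.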
Why Multiplicative Over Additive?

• Proportional Seasonal Variations


o With multiplicative mode, seasonal fluctuations increase or decrease relative to the overall price level.
o In contrast, additive seasonality assumes fixed increments, which may underestimate fluctuations at higher
prices.
• Festival-Driven Price Surges
o Prices during peak demand (e.g., Onam, Pongal) rise as a percentage of the base price, making
multiplicative a better fit.
o Additive mode would assume fixed changes, missing the relative intensity of seasonal trends.
• Use Additive Model When?
o If seasonal impacts are constant over time, regardless of base price.
o If external factors like MSP policies impose strict price floors, capping seasonal effects.
Regression Equation for the model Used:
The regression equation for the Prophet model used for coconut oil price forecasting can be represented as:

$$y(t) = g(t) + s(t) + h(t) + \sum_{i=1}^{k} \beta_i x_i(t) + \epsilon_t$$

Where:
• y(t): Observed value (coconut oil price at time t)
• g(t): Trend function (piecewise linear by default)
• s(t): Seasonal component using multiplicative mode (Fourier series for monthly seasonality)
• h(t): Holiday effects (no Prophet holidays were added explicitly; a festival_flag regressor is used instead)
• βi: Coefficients for each regressor (learned by Prophet)
• xi(t): Regressors (copra_diff, msp_revised, monsoon, etc.)
• ϵt: Error term representing residuals

$$\text{Coconut\_Oil\_Price}_t = g(t) + s(t) + \beta_1 \cdot \text{copra\_diff}_t + \beta_2 \cdot \text{msp\_revised}_t + \beta_3 \cdot \text{monsoon}_t + \beta_4 \cdot \text{global\_price\_pct\_change}_t + \beta_5 \cdot \text{festival\_flag}_t + \epsilon_t$$

Assumptions for the Forecasting Model

• Additive or Multiplicative Structure:


o Trend, seasonality, and regressors combine additively (or multiplicatively, if specified).
• Trend Approximation:
o Long-term movement is piecewise linear (or logistic if growth is saturated).
• Seasonality is Stationary:
o Seasonal effects repeat consistently over time (e.g., same monsoon/festival impact every year)
• Linearity of Regressors:
o Regressors affect the outcome in a linear, time-invariant way.
• No Measurement Error in Regressors:
o Assumes clean, accurate values for all inputs (copra_diff, monsoon, etc.).
• Independent, Normally Distributed Errors:
o Assumes residuals are independent and normally distributed around 0 (though Prophet is robust to
some violations)
Threats to Internal Validity

• Omitted Variable Bias: Missing influential factors (e.g., supply chain disruptions, government trade policies)
could skew forecasts.
• Collinearity: If regressors are highly correlated (e.g., copra_diff and global prices), it may be difficult to
distinguish individual effects.
• Incorrect Seasonality Specification: Prophet assumes regular seasonal patterns — if your series has irregular
events (like shifting festivals), these may be misestimated.
• Changepoint Sensitivity Risk: A high changepoint prior scale (1.5) may exaggerate price jumps or trend shifts
without sufficient justification.
• Residual Autocorrelation: Prophet doesn’t directly model autocorrelation in residuals, which can violate
assumptions and reduce accuracy.

Threats to External Validity

• Policy Shocks Not Accounted for: Unexpected government interventions (MSP hikes, import-export policies)
could cause deviations.
• Global Price Volatility Not Fully Captured: International coconut oil price fluctuations may not always impact
domestic prices in a predictable way.
• Climate Variability Beyond Monsoon: Extreme weather events (cyclones, floods) could disrupt normal
monsoon-season predictions.
• Consumer Behavior Shifts: Changing dietary trends or substitutes (e.g., rising palm oil use) could reduce
coconut oil demand, affecting accuracy.
• Unforeseen Economic Crises: Economic downturns or inflation spikes could create price shocks that historical
trends fail to predict.

Summary of Model Efficiency & Forecast Accuracy

Observations:

• There’s a strong upward trend in coconut oil prices from late 2024 to mid-2025.
• Peak forecasted price in June 2025 (₹31,390).
• Some volatility is captured in the confidence interval (shaded region), especially after mid-2025.
• The forecast appears to incorporate global price trends, monsoon influence, and festival demand
Forecast vs. Actual Comparison (Mar–Jun 2025):

Forecasted Coconut Oil Prices (Next 6 Months) with Global & Seasonal Influence:

Market Price - Coconut Products (https://coconutboard.in/PriceWebReport/Reports.aspx):


• Accuracy:
o The model performs reasonably well in March and April (errors < 8%).
o It overpredicts for May (+13.38% error).
o The June prediction is very close to the actual value (only –1.29% off).
• Trend Capture:
o The model correctly captures the upward price trend from March to June.
o The forecasted shape and momentum closely follow real market behavior.
• Bias:
o The model tends to overestimate prices in the early months (Mar–May), possibly due to:
▪ Global price influence being stronger than local dynamics.
▪ Festival or speculative effects being overestimated.

Conclusion

• Overall, the Prophet model has strong predictive power and performs well on unseen data.
• With a mean absolute percentage error (MAPE) of approximately 6.71%, it shows high reliability for decision-making.
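The error figures quoted above follow the usual definitions, sketched below. A positive percentage error means the forecast is above the actual price, matching the sign used for May. The numbers are hypothetical and are not the report's forecast or market values.

```python
def percentage_errors(actual, forecast):
    """Signed percentage error per period; positive means overprediction."""
    return [(f - a) / a * 100.0 for a, f in zip(actual, forecast)]

def mape(actual, forecast):
    """Mean absolute percentage error across all periods."""
    errors = percentage_errors(actual, forecast)
    return sum(abs(e) for e in errors) / len(errors)

# Hypothetical values for illustration only.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]

print(percentage_errors(actual, forecast))
print(mape(actual, forecast))
```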

Model Performance on Training Data:

Interpretation:

• Very high R² suggests an excellent fit on training data.


• Low MAE and RMSE indicate accurate in-sample predictions, suggesting the model captures historical patterns
well.
OLS Regression Summary:

• The Prophet model is robust, showing excellent in-sample fit with high R² and low error metrics.
• The inclusion of global price trends and monsoon variables adds meaningful explanatory power.
• Forecast trends align with historical seasonality and external economic signals.

Variable | Coefficient | p-value | Interpretation
msp_revised | 5766.01 | 0 | A ₹1 increase in MSP corresponds to a ~₹5,766 increase in price. Highly significant.
monsoon | 20,190.00 | 0 | Suggests favorable monsoons strongly drive up coconut oil prices.
global_price_pct_change | 8,723.52 | 0.012 | Global price changes significantly influence domestic prices.
copra_diff |  | 0.168 | Not statistically significant at 5%.
festival_flag |  | 0.485 | Surprisingly insignificant; may need better feature engineering (e.g., lagging flag, weighted importance).

• VIF values are all < 2, so there is no multicollinearity concern.
• Omnibus/Jarque-Bera tests suggest residuals are reasonably normal.
Scenario/Stress Testing Code:

Stress Test Forecast (Next 6 Months):

• Blue Line (Base Forecast): Smoothly rising until June–July, then dips slightly by August.
• Red Dotted Line (Stress Scenario): Mimics base shape, but shifted consistently ~7% lower across months.
• Shaded Area: Base scenario confidence interval – the stress scenario mostly stays within the lower bound of the base
case, indicating the stress case is plausible but on the conservative end.
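The stress-testing code itself is not reproduced in this report. As a minimal sketch, a stress path that sits a fixed percentage below the base forecast can be produced as shown here. The flat 7% shock mirrors the gap described above, but applying it as a uniform multiplier is an assumption, and the function name and base values are hypothetical.

```python
def apply_stress(base_forecast, shock_pct):
    """Shift every forecast value down by shock_pct percent."""
    factor = 1.0 - shock_pct / 100.0
    return [value * factor for value in base_forecast]

# Hypothetical base forecast (Rs/Ql) for three months.
base = [20000.0, 21000.0, 22000.0]
stressed = apply_stress(base, 7.0)
print(stressed)
```

A fuller scenario would instead shock the regressors (for example a weaker global price change or a poor monsoon flag) and re-run the model, so that the size of the dip comes from the fitted coefficients rather than a fixed multiplier.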

Interpretation:

• Stress scenario still predicts rising prices, but lower amplitude compared to base, indicating demand/market resilience
even under stress.
• Impact magnitude: 6–7% dip under stress suggests:
o External shocks (e.g., weaker global demand, poor monsoon, policy impact) have moderate downward pressure.
o Prices remain elevated overall despite the shock.
• Base vs. Stress Balance:
o The base case provides a growth expectation.
o Stress scenario provides a risk-buffered outlook useful for inventory planning, procurement hedging, or
investment decisions.

Conclusion

• The scenario forecasting setup effectively models price vulnerability under shocks.
• The stress scenario offers a realistic, lower-bound forecast without being overly pessimistic.
• With only about 7% downside, the model suggests strong market fundamentals, and stress-adjusted prices remain above historical averages.
Machine Learning Model Used:
Preferred Model: Ensemble of Prophet with XGBoost

This method combines the strengths of Prophet and XGBoost:

• Prophet captures:
o Seasonality (daily, weekly, yearly)
o Holidays or events (like South Indian festivals)
o Trend changes
• XGBoost captures:
o Non-linear interactions
o Lagged effects and autoregressive terms
o Residual patterns Prophet may miss
• Leverages Strengths of Both Models:
o Prophet: excels at modeling seasonality, trend, and events like festivals.
o XGBoost: excels at modeling complex, nonlinear relationships, lags, and interactions.
o Together, they handle both global structure (via Prophet) and local variability/noise (via XGBoost).
• Improved Generalization:
o Ensemble methods tend to be more robust and less prone to overfitting, especially when the models
learn different aspects of the data.

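Regression Equation for the model Used:

The ensemble prediction is the sum of two parts:

• Prophet's forecast (trend + seasonality + regressors)
• XGBoost's prediction on the Prophet residuals

So the total model can be expressed as:

$$\hat{y}(t) = g(t) + s(t) + \sum_{i=1}^{k} \beta_i x_i(t) + f(\cdot)$$

Where: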

• g(t): Prophet's piecewise linear trend


• s(t): Prophet's seasonal component
• βi: Coefficients learned by Prophet on each regressor
• f(⋅): A non-linear function learned by XGBoost to model Prophet’s residuals
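The residual-correction idea can be sketched without either library. In the sketch below a fixed list stands in for Prophet's forecast, and a simple per-group mean stands in for XGBoost as the residual learner. Both are stand-ins chosen to keep the example dependency-free, and all names and numbers are hypothetical.

```python
def fit_residual_corrector(features, residuals):
    """Stand-in residual model: mean residual for each feature value.

    The report uses XGBoost for this step; a per-group mean is used here
    only to show how the two stages combine.
    """
    sums, counts = {}, {}
    for key, r in zip(features, residuals):
        sums[key] = sums.get(key, 0.0) + r
        counts[key] = counts.get(key, 0) + 1
    table = {key: sums[key] / counts[key] for key in sums}
    return lambda key: table.get(key, 0.0)

# Hypothetical data.
actual = [100.0, 120.0, 104.0, 126.0]
base_forecast = [98.0, 115.0, 102.0, 121.0]  # stands in for Prophet's output
month = [1, 2, 1, 2]                         # feature seen by the residual model

# Stage 1 residuals, then stage 2 learns them and is added back.
residuals = [a - b for a, b in zip(actual, base_forecast)]
f = fit_residual_corrector(month, residuals)
hybrid = [b + f(m) for b, m in zip(base_forecast, month)]
print(hybrid)
```

The perfect in-sample match here is an artifact of the toy data and illustrates the overfitting risk: a flexible residual learner can reproduce training residuals exactly while adding little on unseen months.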
Assumptions for the Ensemble Model

From Prophet

• Additive or Multiplicative Structure:


o Trend, seasonality, and regressors combine additively (or multiplicatively, if specified).
• Trend Approximation:
o Long-term movement is piecewise linear (or logistic if growth is saturated).
• Seasonality is Stationary:
o Seasonal effects repeat consistently over time (e.g., same monsoon/festival impact every year)
• Linearity of Regressors:
o Regressors affect the outcome in a linear, time-invariant way.
• No Measurement Error in Regressors:
o Assumes clean, accurate values for all inputs (copra_diff, monsoon, etc.).
• Independent, Normally Distributed Errors:
o Assumes residuals are independent and normally distributed around 0 (though Prophet is robust to
some violations)

From XGBoost

• Predictive power in residuals:


o The residuals from Prophet must still contain meaningful signal for XGBoost to model.
• Data Consistency:
o Feature distributions during training and prediction should remain stable (no data drift).
• Sufficient data:
o XGBoost needs enough data to detect patterns; overfitting is a risk in small datasets.

Threats to Internal Validity

• Omitted Variable Bias: Missing influential factors (e.g., supply chain disruptions, government trade policies)
could skew forecasts.
• Correlated Predictors: Highly correlated features can confuse the model or inflate variable importance.
• Residual Model Overfitting: XGBoost might fit noise in Prophet residuals, especially on small datasets.
• Incorrect Assumptions by Prophet: If trend or seasonality is wrongly specified, residuals may contain structure
not easily corrected by XGBoost.

Threats to External Validity

• Data Drift: If the distribution of regressors (like monsoon, global prices) changes significantly, XGBoost may fail.
• Concept Drift: If relationships between features and coconut oil prices evolve (e.g., policy change), coefficients
and patterns may no longer hold.
• Overfitting to Training Period: Particularly in small datasets, the hybrid model might capture quirks specific to
the historical period.
• Forecast Horizon Inaccuracy: If future regressor values are estimated (not known), small errors can compound.
Summary of Model Efficiency & Forecast Accuracy

Forecast vs. Actual Comparison (Mar–Jun 2025):

Hybrid Prophet + XGBoost Forecast (Next 6 Months):

Market Price - Coconut Products (https://coconutboard.in/PriceWebReport/Reports.aspx):


• Trend Alignment:
o Both predicted and actual prices show a consistent upward trend from March to June, confirming that
the model has captured the general direction well.
• Prediction Accuracy:
o Errors in March and April are modest (~5–7%), which is acceptable in price forecasting.
o May shows the highest deviation (13.4%), suggesting either:
▪ A local market anomaly, or
▪ Underestimation of mid-year price acceleration.
o June is highly accurate, with only a 1.3% underestimation.
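The monthly deviations quoted above are percentage errors of the forecast relative to the actual price. A minimal sketch of that calculation follows. The prices are placeholders chosen for easy arithmetic, not the report's figures, and the function name is hypothetical.

```python
def pct_error(forecast, actual):
    """Signed percentage error; negative means the forecast was too low."""
    return 100.0 * (forecast - actual) / actual

# Placeholder prices for illustration only (NOT the report's data).
actual = {"Mar": 200.0, "Apr": 250.0, "May": 400.0, "Jun": 500.0}
forecast = {"Mar": 190.0, "Apr": 240.0, "May": 350.0, "Jun": 495.0}

errors = {month: pct_error(forecast[month], actual[month]) for month in actual}
mean_abs_error = sum(abs(e) for e in errors.values()) / len(errors)
```

With these placeholder values, May stands out as the largest miss, which is the same kind of pattern the comparison above describes. Averaging the absolute values rather than the signed errors prevents over- and under-estimates from cancelling out.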

Insights:

• Directionally accurate: The forecast correctly anticipated the rise in prices.


• Quantitatively reasonable: Prediction errors mostly within ±10%, except for May (outlier).
Model Performance on Training Data:

• MAE (Mean Absolute Error): ₹14.44


• RMSE (Root Mean Square Error): ₹21.81
• MAPE: 0.10%
• R² Score: 0.9999 (very high goodness of fit)
• These in-sample metrics are exceptionally strong, suggesting that the hybrid model explains nearly all variance
in the training data.
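For reference, the four metrics listed above follow standard definitions, which can be sketched as below. The sample values are hypothetical and are used only to exercise the function; they are not drawn from the report's dataset.

```python
from math import sqrt

def regression_metrics(actual, predicted):
    """Return MAE, RMSE, MAPE (in percent) and R² for paired observations."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = sqrt(sum(e * e for e in errors) / n)
    # MAPE is undefined when an actual value is zero; prices are assumed positive.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": 1.0 - ss_res / ss_tot}

# Hypothetical values for illustration only.
actual = [100.0, 110.0, 120.0, 130.0]
predicted = [102.0, 108.0, 123.0, 129.0]
metrics = regression_metrics(actual, predicted)
```

RMSE is never smaller than MAE, and the gap widens when a few errors are much larger than the rest. A reported RMSE of ₹21.81 against an MAE of ₹14.44 is therefore consistent with a small number of large residuals.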

OLS Regression Summary:

• R² = 0.001, Adjusted R² = -0.088: Residuals are almost completely uncorrelated with these features.
• p-values > 0.9 for all predictors: No statistical significance.
• The residuals show no remaining linear dependence on these regressors, which is consistent with white noise
and supports the XGBoost layer's effectiveness.
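A negative adjusted R² alongside a near-zero R² is expected rather than anomalous. Under the standard definition, with n observations and p predictors (neither value is stated in this section, so both are left symbolic):

```latex
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1},
\qquad
\bar{R}^2 < 0 \iff R^2 < \frac{p}{n - 1}
```

With R² = 0.001, the penalty for the number of predictors outweighs the tiny amount of variance explained, which pushes the adjusted value below zero.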
Residual Diagnostics:

Time Series Residual Plot

• Residuals (Actual - Predicted) mostly hover around 0, indicating:


o Good fit across most time periods.
o One significant spike in residuals in early 2025, which may warrant further investigation (possible outlier
or regime change).

Residual Distribution

• Residuals are roughly normally distributed with a slight positive skew.


• Most residuals fall between -25 and +25, confirming low prediction error.
Breusch-Pagan Test for Heteroscedasticity:

Ljung-Box Test for Autocorrelation (lag=10):
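As an illustration of what the Ljung-Box test measures, the Q statistic at lag h can be computed directly from the residual autocorrelations. The sketch below is a plain-Python version with a hypothetical function name. Comparing Q against the 5% chi-square critical value for 10 degrees of freedom (18.307), with no adjustment for fitted parameters, is an assumption made here for simplicity.

```python
def ljung_box_q(residuals, lags=10):
    """Ljung-Box Q = n(n+2) * sum_{k=1..lags} rho_k^2 / (n - k)."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    denom = sum(c * c for c in centered)
    total = 0.0
    for k in range(1, lags + 1):
        # Sample autocorrelation of the residuals at lag k.
        rho_k = sum(centered[i] * centered[i - k] for i in range(k, n)) / denom
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total

CHI2_CRIT_10_DF_5PCT = 18.307  # chi-square 95th percentile, 10 degrees of freedom

# Hypothetical, strongly autocorrelated residuals (alternating sign).
patterned = [1.0, -1.0] * 20
q_patterned = ljung_box_q(patterned, lags=10)
rejects_white_noise = q_patterned > CHI2_CRIT_10_DF_5PCT
```

A Q value above the critical value indicates leftover autocorrelation, and a value below it is consistent with white-noise residuals. In practice a library routine that also returns the p-value would normally be used instead of a hand-written version.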


Conclusion
Strengths:

• Excellent Fit: Very close tracking of historical prices with near-zero residuals.
• Robust Forecast: Consistent and economically plausible future trend.
• Effective Ensemble: Prophet handles temporal structure; XGBoost captures residual non-linearities.
• Minimal Overfitting: Despite high R², residuals are noise-like, and generalization appears intact.

Considerations:

• The sharp residual spike in 2025 might need domain-specific explanation (e.g., policy change, supply shock).
• External regressors like MSP, festivals, and global prices do not contribute significantly after XGBoost
correction, suggesting either that their effects are already captured or that they were not significant during this period.

Realistic Comparison with Kangayam Market Prices:

Strengths:

• Forecast Direction Matches Market Trend: Both forecasted and actual prices rise month by month (March to
June).
• Low Error in June (1.3%): Model aligns very closely with real prices toward the later months.
• Average % Error (across 4 months): 6.2%, which is highly acceptable in volatile commodity markets.

Gap in May (13.4%):

• Could be due to regional market dynamics, transportation, local demand-supply constraints, or events not
captured in the national dataset.

The modeling framework is robust, well-calibrated, and production-ready.

Restated Hypothesis:

"Seasonal events (like Onam), government interventions (such as MSP), agro-climatic conditions (e.g., Monsoon), and
international trade variables (e.g., exports/imports) have a significant causal effect on Milling Copra prices in the
Kangayam market."

This hypothesis was tested through a rigorous combination of:

• Statistical modeling

• Machine learning and SHAP interpretation

• Time series decomposition and diagnostics


References
1. APMC Kangayam. (2024). Daily Copra Price Reports. https://market.tn.gov.in

(Official price data source for Kangayam market.)

2. Coconut Development Board (CDB). (2024). Copra and Coconut Oil Price Statistics. Ministry of Agriculture,
Government of India. https://www.coconutboard.gov.in

3. Indian Meteorological Department (IMD). (2024). Monthly Rainfall Data. https://mausam.imd.gov.in

4. Reserve Bank of India (RBI). (2024). Consumer Price Index and Inflation Reports. https://www.rbi.org.in

5. Directorate General of Foreign Trade (DGFT). (2024). Export Data for Copra and Coconut Oil.
https://www.dgft.gov.in

6. Box, G. E. P., Jenkins, G. M., Reinsel, G. C., & Ljung, G. M. (2015). Time Series Analysis: Forecasting and Control
(5th ed.). John Wiley & Sons.

7. Hyndman, R. J., & Athanasopoulos, G. (2021). Forecasting: Principles and Practice (3rd ed.). OTexts.

https://otexts.com/fpp3/

8. Chatfield, C. (2016). The Analysis of Time Series: An Introduction (7th ed.). CRC Press.

9. Mroz, T. A. (1987). The Sensitivity of an Empirical Model of Married Women's Hours of Work to Economic and
Statistical Assumptions. Econometrica, 55(4), 765–799.

10. Kumar, D., & Shinoj, P. (2016). Price Volatility in Coconut Markets of India: An Econometric Analysis. Indian
Journal of Agricultural Economics, 71(3), 370–383.

11. Narayan, P. K., & Smyth, R. (2009). Modeling the Determinants of Commodity Prices: A Review and Empirical
Application to World Market Prices of Copra. Journal of Applied Economics, 12(1), 27–46.

12. Brockwell, P. J., & Davis, R. A. (2016). Introduction to Time Series and Forecasting (3rd ed.). Springer.

13. Montgomery, D. C., Jennings, C. L., & Kulahci, M. (2015). Introduction to Time Series Analysis and Forecasting
(2nd ed.). Wiley.
