How accurate is EnergyMap's LightGBM demand model?
We run a per-state LightGBM nowcast for the 11 Indian states with mature realtime SLDC feeds. Below are the verified per-state MAPE, R², and bias against actuals, including the states where the model isn't yet production-ready.
What this model is for
27 of 36 Indian states and UTs do not publish realtime electricity demand to a stable public API. For grid-wide nowcasting and analytics, we need a uniform 15-minute demand series across every state — even where there is no live SLDC feed. We model demand for those states.
For the 11 states that do publish a usable realtime feed (Andhra Pradesh, Delhi, Gujarat, Himachal Pradesh, Karnataka, Kerala, Punjab, Rajasthan, Tamil Nadu, Telangana, West Bengal), we still maintain a parallel modelled curve. That sounds redundant — until you join across states. Each SLDC publishes at a different cadence, drops samples, and has its own timezone quirks. The modelled layer guarantees a gap-free 96-block grid every day, every state. The actuals stay the canonical source of truth where they exist; the model is the safety net.
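A minimal sketch of the safety-net idea described above, assuming a hypothetical in-memory layout (dicts keyed by block start time). The function names are illustrative and not taken from the EnergyMap codebase. It builds the 96 fifteen-minute blocks for one day and prefers the actual reading wherever one exists.

```python
from datetime import datetime, timedelta

BLOCK = timedelta(minutes=15)

def day_blocks(day_start):
    """Return the 96 block start times covering one day."""
    return [day_start + i * BLOCK for i in range(96)]

def gap_free_series(day_start, actuals, modelled):
    """Prefer the actual reading; fall back to the modelled value.

    Both inputs are hypothetical dicts mapping block start -> demand in MW.
    Returns a list of (block_start, demand_mw, source) tuples.
    """
    rows = []
    for ts in day_blocks(day_start):
        if ts in actuals:
            rows.append((ts, actuals[ts], "actual"))
        else:
            rows.append((ts, modelled[ts], "modeled_ml_v1"))
    return rows

day = datetime(2026, 5, 1)  # naive timestamp, assumed to be local time
modelled = {ts: 10_000.0 for ts in day_blocks(day)}      # made-up flat curve
actuals = {ts: 10_250.0 for ts in day_blocks(day)[::2]}  # every other block dropped
series = gap_free_series(day, actuals, modelled)
print(len(series), sum(1 for row in series if row[2] == "actual"))
```

Keeping the source label on every row means a downstream join can still tell observed blocks from modelled ones.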
The current production model is a per-state LightGBM gradient boosting regressor with the source label modeled_ml_v1. There is also a 7-day-ahead recursive forecast variant called modeled_ml_forecast_v1 using the same boosters. Code lives in scripts/train_demand_model.py (training, weekly cron) and scripts/run_demand_forecast.py (inference, every 15 minutes).
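To make "recursive" concrete, here is a rough sketch of how a one-step model can be rolled forward over a 7-day horizon. It is an assumption about the general technique, not a copy of scripts/run_demand_forecast.py: `predict` is a hypothetical stand-in for a trained booster, and the sketch leaves out the national-demand and weather inputs, which would need forecasts of their own.

```python
from datetime import datetime, timedelta

BLOCK = timedelta(minutes=15)

def recursive_forecast(history, start, horizon_days, predict):
    """Roll a one-step model forward, feeding predictions back in as lags.

    history: dict of block start -> observed demand (MW), covering at least
             the 7 days before `start`.
    predict: hypothetical callable taking a feature dict, returning MW.
    """
    known = dict(history)  # observations plus the predictions made so far
    forecast = {}
    for i in range(horizon_days * 96):
        ts = start + i * BLOCK
        features = {
            "hour": ts.hour,
            "day_of_week": ts.weekday(),
            # Beyond the first forecast day this lag is itself a prediction.
            "demand_lag_24h": known[ts - timedelta(hours=24)],
            # Within a 7-day horizon this lag is always an observation.
            "demand_lag_7d": known[ts - timedelta(days=7)],
        }
        forecast[ts] = known[ts] = predict(features)
    return forecast

def toy_predict(features):
    # Placeholder model: average of the two lags.
    return 0.5 * features["demand_lag_24h"] + 0.5 * features["demand_lag_7d"]

start = datetime(2026, 5, 1)
history = {start - (i + 1) * BLOCK: 10_000.0 for i in range(7 * 96)}  # made-up data
fc = recursive_forecast(history, start, 7, toy_predict)
print(len(fc), "blocks forecast")
```

Because later blocks consume earlier predictions through the 24-hour lag, errors can compound as the horizon grows.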
What the LightGBM trains on
One booster per state. Same feature set across states, but each state learns its own response.
- hour, day_of_week, day_of_year, month
- is_weekend, is_holiday (India national + per-state regional via the holidays Python lib — AP gets the regional Telugu calendar, Punjab gets Sikh festivals)
- Cyclic encodings: hour_sin/cos, dow_sin/cos, doy_sin/cos so the gradient-boosted tree can split on smooth wrap-around features
These are pure structural features that the model gets for free.
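The cyclic encodings in the list above can be sketched as follows. The helper names, the fractional hour, and the 365.25-day period are assumptions made for illustration; the feature names are the ones listed above.

```python
import math
from datetime import datetime

def cyclic(value, period):
    """Map a periodic value onto the unit circle as (sin, cos)."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

def calendar_features(ts):
    """Build the calendar and cyclic features for one timestamp."""
    hour_sin, hour_cos = cyclic(ts.hour + ts.minute / 60, 24)
    dow_sin, dow_cos = cyclic(ts.weekday(), 7)
    doy = ts.timetuple().tm_yday
    doy_sin, doy_cos = cyclic(doy, 365.25)
    return {
        "hour": ts.hour,
        "day_of_week": ts.weekday(),
        "day_of_year": doy,
        "month": ts.month,
        "is_weekend": ts.weekday() >= 5,
        "hour_sin": hour_sin, "hour_cos": hour_cos,
        "dow_sin": dow_sin, "dow_cos": dow_cos,
        "doy_sin": doy_sin, "doy_cos": doy_cos,
    }

def hour_distance(a, b):
    """Euclidean distance between two timestamps in (hour_sin, hour_cos) space."""
    fa, fb = calendar_features(a), calendar_features(b)
    return math.hypot(fa["hour_sin"] - fb["hour_sin"], fa["hour_cos"] - fb["hour_cos"])

late = datetime(2026, 5, 1, 23, 45)
midnight = datetime(2026, 5, 2, 0, 0)
noon = datetime(2026, 5, 2, 12, 0)
print(hour_distance(late, midnight) < hour_distance(midnight, noon))
```

In the raw hour column 23:45 and 00:00 sit at opposite ends of the range, while in the encoded space they are neighbours, which is the wrap-around property the encoding is meant to provide.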
- all_india_demand_mw (the NPP MERIT national curve at the same timestamp)
- demand_lag_24h (own-state demand 24 h ago)
- demand_lag_7d (own-state demand 7 days ago)
all_india_demand_mw is empirically the highest-importance feature — it captures 60–80% of the variance before any state-specific signal is added.
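A small sketch of the two own-state lags, assuming a hypothetical dict of block start to demand. Looking lags up by timestamp rather than by row offset is a choice made here so that dropped samples cannot silently shift a lag; the production code may do this differently. Missing history becomes NaN, which LightGBM can handle natively.

```python
import math
from datetime import datetime, timedelta

def lag_features(demand, ts):
    """Look up own-state demand 24 h and 7 days before `ts`.

    demand: hypothetical dict mapping block start -> demand in MW.
    A missing reading yields NaN instead of raising.
    """
    nan = float("nan")
    return {
        "demand_lag_24h": demand.get(ts - timedelta(hours=24), nan),
        "demand_lag_7d": demand.get(ts - timedelta(days=7), nan),
    }

ts = datetime(2026, 5, 1, 18, 0)
demand = {ts - timedelta(hours=24): 10_900.0}  # made-up value; no 7-day-old reading
feats = lag_features(demand, ts)
print(feats["demand_lag_24h"], math.isnan(feats["demand_lag_7d"]))
```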
- temperature_c at the state capital
- humidity_pct at the state capital
- precipitation_mm at the state capital
Joined via merge_asof with a 30-min tolerance. Captures the AC-load spike that calendar + national demand alone cannot.
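The weather join can be illustrated with pandas. The column names, the sample values, and the use of direction="nearest" are assumptions; the text above only states that merge_asof is used with a 30-minute tolerance.

```python
import pandas as pd

# Made-up 15-minute demand rows and hourly weather rows.
demand = pd.DataFrame({
    "ts": pd.to_datetime([
        "2026-05-01 10:00", "2026-05-01 10:15",
        "2026-05-01 10:45", "2026-05-01 13:00",
    ]),
    "demand_mw": [9800.0, 9850.0, 9950.0, 10400.0],
})
weather = pd.DataFrame({
    "ts": pd.to_datetime(["2026-05-01 10:00", "2026-05-01 11:00"]),
    "temperature_c": [34.0, 35.5],
})

# Both frames must be sorted on the key before an as-of join.
joined = pd.merge_asof(
    demand.sort_values("ts"),
    weather.sort_values("ts"),
    on="ts",
    direction="nearest",              # assumption: closest reading on either side
    tolerance=pd.Timedelta("30min"),  # readings further away are left as NaN
)
print(joined)
```

The 13:00 block has no weather reading within 30 minutes, so its temperature stays missing instead of borrowing a stale value.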
Targets are state demand_mw rows from the official SLDC scrape. Train/test split is rolling-time (no leakage). Validation is per-state with seasonal cross-validation on month boundaries.
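One way to picture a rolling-time split on month boundaries is sketched below. The fold construction (train on everything before a month, test on that month) and the `min_train_months` parameter are assumptions for illustration, not the exact scheme in scripts/train_demand_model.py.

```python
from datetime import datetime, timedelta

def month_start(ts):
    """First instant of the calendar month containing `ts`."""
    return datetime(ts.year, ts.month, 1)

def rolling_month_folds(timestamps, min_train_months=2):
    """Yield (train_idx, test_idx) pairs, one test month per fold.

    Training rows are strictly earlier than the test month, so no future
    observation leaks into training.
    """
    months = sorted({month_start(t) for t in timestamps})
    for m in months[min_train_months:]:
        train = [i for i, t in enumerate(timestamps) if t < m]
        test = [i for i, t in enumerate(timestamps) if month_start(t) == m]
        yield train, test

# One made-up row per day, 1 September to 31 December 2025.
timestamps = [datetime(2025, 9, 1) + timedelta(days=d) for d in range(122)]
folds = list(rolling_month_folds(timestamps))
for train, test in folds:
    print(len(train), "train rows ->", len(test), "test rows")
```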
Headline numbers
We pair every 15-min model prediction with the closest actual SLDC reading within ±5 minutes, drop predictions without a paired actual, then aggregate. Last 30 days, all hours.
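The pairing rule can be sketched in plain Python. The tuple layout and function name are hypothetical; the logic follows the description above: take the closest actual, keep the pair only if it is within 5 minutes, and drop the prediction otherwise.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

def pair_with_actuals(predictions, actuals, tolerance=timedelta(minutes=5)):
    """Pair each prediction with the closest actual within `tolerance`.

    predictions, actuals: lists of (timestamp, mw); actuals sorted by timestamp.
    Returns a list of (timestamp, predicted_mw, actual_mw).
    """
    actual_ts = [t for t, _ in actuals]
    pairs = []
    for ts, pred in predictions:
        i = bisect_left(actual_ts, ts)
        # The closest actual is either just before or just after `ts`.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(actuals)]
        if not candidates:
            continue
        j = min(candidates, key=lambda k: abs(actual_ts[k] - ts))
        if abs(actual_ts[j] - ts) <= tolerance:
            pairs.append((ts, pred, actuals[j][1]))
    return pairs

d = datetime(2026, 5, 1)
predictions = [(d.replace(hour=10, minute=m), 10_000.0) for m in (0, 15, 30)]
actuals = [(d.replace(hour=10, minute=m), 10_100.0) for m in (2, 13, 40)]  # made-up
pairs = pair_with_actuals(predictions, actuals)
print(len(pairs), "of", len(predictions), "predictions paired")
```

Here the 10:30 prediction is dropped because its nearest actual is 10 minutes away.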
| State | Samples | MAPE | RMSE | Bias | R² | Typical demand |
|---|---|---|---|---|---|---|
| Andhra Pradesh | 568 | 2.51% | 374 MW | -94 MW | 0.907 | 10.8 GW |
| Gujarat | 942 | 3.77% | 1001 MW | -444 MW | 0.836 | 19.6 GW |
| Delhi | 509 | 5.79% | 371 MW | -278 MW | 0.866 | 5.2 GW |
| Punjab | 828 | 6.89% | 581 MW | +7 MW | 0.782 | 6.5 GW |
Four states cluster in the production-quality range — single-digit MAPE, R² above 0.78, and bias that's small relative to absolute load. Andhra Pradesh at 2.5% MAPE on a 10.8 GW typical load with R² = 0.91 is the best-performing state. Punjab is the most calibrated, with a mean bias of just +7 MW on a ~6.5 GW load — the model is essentially unbiased on average.
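For reference, the four table metrics can be computed from paired values as in this sketch. It uses the standard definitions and assumes bias follows the same sign convention as the residuals on this page (predicted minus actual), so a negative bias means under-prediction. The sample numbers are made up.

```python
import math

def demand_metrics(predicted, actual):
    """Return MAPE (%), RMSE (MW), mean bias (MW), and R² for paired samples."""
    n = len(actual)
    errors = [p - a for p, a in zip(predicted, actual)]  # predicted minus actual
    mape = 100 * sum(abs(e) / a for e, a in zip(errors, actual)) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    bias = sum(errors) / n
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot
    return {"mape_pct": mape, "rmse_mw": rmse, "bias_mw": bias, "r2": r2}

actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
m = demand_metrics(predicted, actual)
print({k: round(v, 3) for k, v in m.items()})
```

Note that MAPE divides by the actual value, so the same absolute error weighs more heavily at low load.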
Andhra Pradesh deep dive — the 2.5% MAPE state
A single day, three layers — actual SLDC reading, the LightGBM nowcast for the same instant, and the LightGBM's 7-day-ahead forecast issued before the day started:
Daily MAPE for AP, last 21 days — this is the stability story we care about. A model that's 2% one day and 12% the next isn't useful no matter what its mean is.
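A per-day breakdown of this kind can be derived from the paired samples with a simple group-by. The tuple layout is a hypothetical one and the numbers are made up to mirror the "2% one day, 12% the next" case.

```python
from collections import defaultdict
from datetime import datetime

def daily_mape(pairs):
    """Group (timestamp, predicted, actual) tuples by date; return MAPE % per day."""
    by_day = defaultdict(list)
    for ts, pred, actual in pairs:
        by_day[ts.date()].append(abs(pred - actual) / actual)
    return {day: 100 * sum(v) / len(v) for day, v in sorted(by_day.items())}

pairs = [
    (datetime(2026, 5, 1, 10, 0), 102.0, 100.0),
    (datetime(2026, 5, 1, 10, 15), 98.0, 100.0),
    (datetime(2026, 5, 2, 10, 0), 112.0, 100.0),
]
per_day = daily_mape(pairs)
values = list(per_day.values())
print([round(v, 1) for v in values], "spread:", round(max(values) - min(values), 1))
```

Reporting the spread alongside the mean makes an unstable model visible even when its average looks acceptable.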
Predicted vs actual scatter, coloured by hour-of-day — shows where the model tracks the diagonal and where it diverges:
Where it misses
MAPE broken out by hour of day for AP — this answers “when should I trust the model and when should I cross-check?”:
Residual distribution — predicted minus actual:
States not yet production-ready
Two of the six states where the booster is currently emitting rows are flagged as not production-ready. We're keeping them visible — the data is there if you want to inspect it — but we don't advertise them as a shipped product. The other five OFFICIAL_FEED_STATES (Karnataka, Rajasthan, Tamil Nadu, Telangana, West Bengal) have boosters trained but not yet deployed to production inference.
| State | Samples | MAPE | Why we hold it back |
|---|---|---|---|
| Himachal Pradesh | 965 | 220.8% | Calibration error — small absolute load amplifies any over-prediction. Booster needs per-state target normalisation. |
| Kerala | 11 | 24.2% | Insufficient paired observations — upstream feed publishes daily-summary, not 15-min |
Himachal Pradesh in particular is interesting: the booster trained, the inference pipeline runs, but the absolute load is so small (~310 MW peak) that any over-prediction looks catastrophic in percentage terms. The fix is per-state target normalisation, queued as part of the next training run.
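The amplification effect is easy to see with arithmetic: a hypothetical 300 MW over-prediction is a small percentage error on Andhra Pradesh's 10.8 GW typical load but close to 100% on a 310 MW load. The scaler below is one possible reading of "per-state target normalisation" (dividing the target by the state's median training demand); that choice is an assumption, not the queued fix itself.

```python
from statistics import median

def ape(predicted, actual):
    """Absolute percentage error for a single sample."""
    return 100 * abs(predicted - actual) / actual

error_mw = 300.0  # hypothetical absolute over-prediction
large = ape(10_800.0 + error_mw, 10_800.0)  # AP typical load from the table
small = ape(310.0 + error_mw, 310.0)        # HP peak load quoted above
print(round(large, 2), round(small, 2))

class TargetScaler:
    """Scale a state's target so every booster trains on values near 1.0."""

    def fit(self, y):
        self.scale = median(y)
        return self

    def transform(self, y):
        return [v / self.scale for v in y]

    def inverse(self, y_norm):
        return [v * self.scale for v in y_norm]

train_mw = [180.0, 220.0, 260.0, 310.0, 240.0]  # made-up small-state history
scaler = TargetScaler().fit(train_mw)
normalised = scaler.transform(train_mw)
restored = scaler.inverse(normalised)
print(scaler.scale, [round(v, 3) for v in normalised])
```

Predictions made in the normalised space would be multiplied back by the same scale before being written out in MW.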
Reproduce these numbers
You can reproduce every number on this page two ways: pull our metrics snapshot directly, or hit the live API and re-run the comparison yourself. The code samples below copy-paste cleanly into a fresh terminal; set API_KEY first.
```python
# Pull the published metrics snapshot and print per-state MAPE and R².
# requires: pip install requests
import requests

r = requests.get(
    "https://www.energymap.in/blog/lightgbm/metrics.json",
    headers={"User-Agent": "energymap-readme"},
    timeout=10,
)
r.raise_for_status()
data = r.json()
print(f"window_days={data['window_days']}, hero_state={data['hero_state']}")
for s in data["states"]:
    # States without paired samples carry a null MAPE; skip them.
    if s["mape_pct"] is not None:
        print(f"{s['state']:<18} MAPE={s['mape_pct']:>5.2f}% R²={s['r2']:.3f}")
```

```python
# Fetch the modelled nowcast blocks (modeled_ml_v1) for one state.
# requires: pip install requests
import os
import requests

API_KEY = os.environ["API_KEY"]
r = requests.get(
    "https://api.energymap.in/api/intelligence/state-demand",
    params={"state": "andhra-pradesh", "source": "modeled_ml_v1"},
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=15,
)
r.raise_for_status()
print(r.json()["count"], "blocks returned")
```

```python
# Fetch the 7-day-ahead forecast for one state.
# requires: pip install requests
import os
import requests

API_KEY = os.environ["API_KEY"]
r = requests.get(
    "https://api.energymap.in/api/intelligence/state-demand-forecast",
    params={"state": "andhra-pradesh", "horizon_days": 7},
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=15,
)
r.raise_for_status()
print("first block:", r.json()["forecast"][0])
```

Or grab the raw artefacts that back this post (the per-state metrics, the JSON variant, and the Cloudinary plot manifest):

- lightgbm-demand-metrics.csv (CSV, ~2 KB, 11 states × 8 metrics)
- lightgbm-demand-metrics.json (JSON, ~6 KB)
- lightgbm-cloudinary-manifest.json (JSON, ~3 KB, 6 plots)

Want to re-run the full evaluation script against the database? It is scripts/eval_demand_models.py in the atlas backend repo:

```bash
# Quote the extra so shells such as zsh do not expand the brackets.
pip install "psycopg[binary]" sqlalchemy matplotlib pandas numpy
export DATABASE_URL='postgresql://USER:PASS@HOST:25060/grid?sslmode=require'
python scripts/eval_demand_models.py --out eval-output/ --days 30
```

Limitations & open work
- Historical depth. Most clean SLDC data starts September 2025. The booster has seen one summer and one winter. Expect material seasonal-bias correction once we have a full annual cycle plus a year of holdout for validation.
- No real-time signal in the model. demand_lag_24h and demand_lag_7d are observations from the past. Anything unusual today (heatwave, regional outage, tariff change) only enters via weather and the calendar, not directly. The model under-predicts on high-stress days and we have an open ticket to add a 1-hour lag with proper feature freshness handling.
- Hourly weather resolution. Open-Meteo gives 1-hour temperature; AP morning ramp can swing 500+ MW within a single hour. We're effectively low-pass filtering temperature. A 15-min weather feed would close most of the daytime MAPE gap.
- No cross-state spillover signal. Each state's model knows All-India demand and its own lags, but not what's happening live in neighbouring states. Pooled multi-state models are planned for the next iteration.
- 5 of 11 trained states not yet emitting predictions. Karnataka, Rajasthan, Tamil Nadu, Telangana, and West Bengal have trained boosters but the inference cron is not yet writing rows for them. Tracked internally; expect them in production by the end of the next sprint.
Computed from 3,823 matched 15-minute samples over the last 30 days. Snapshot generated 3 May 2026. Code: scripts/eval_demand_models.py.