TSF Forecast Pipeline Methodology

Preamble: The Temporal Firewall Principle

Every component of the TSF forecast pipeline observes a strict temporal separation: the inputs to any forecast for date t are derived exclusively from data that existed before date t. No future prices, no current-season accuracy metrics, and no contemporaneous actuals ever enter the computation of a forecast or its associated confidence interval. This document traces the exact data provenance at each stage to make that separation explicit.

1. Base Forecasts

What this step produces

Three time-series price forecasts (SES, Holt’s Damped Trend, ARIMA) for each trading day, generated from a rolling training window that ends strictly before the forecast date.

Backward-looking inputs (all that exist)

Training window: For each target week, the system defines a 90-calendar-day lookback window ending the day before the target week begins. Only closing prices within this historical window are used. For a forecast targeting the week of March 10, the training window spans December 10 through March 9 — no price from March 10 or later is visible to the model.
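The window arithmetic can be sketched with standard-library dates (the function name and signature here are illustrative, not taken from the codebase):

```python
from datetime import date, timedelta

def training_window(target_week_start: date, lookback_days: int = 90):
    """Return (start, end) of the historical window for a target week.

    The window ends the day before the target week begins, so no price
    from the target week or any later date is visible to the model.
    """
    end = target_week_start - timedelta(days=1)
    start = end - timedelta(days=lookback_days - 1)  # inclusive 90-day span
    return start, end

# A forecast targeting the week of March 10, 2025 trains on
# December 10, 2024 through March 9, 2025.
start, end = training_window(date(2025, 3, 10))
```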

Monthly and quarterly mean values (MMV, QMV): Computed as the mean closing price within each completed calendar month and quarter. These are grouped by period and assigned to all days within that period. They are backward-looking aggregates of realized prices.

What is explicitly NOT available to this step

The actual closing price on the forecast date. No price from the target week or any future date enters the training window. The 90-day window is defined to end one day before the target week starts.

How the computation works

The closing prices within the 90-day lookback window are z-score normalized (subtract mean, divide by standard deviation). All three models are fitted on this standardized historical series:

SES (Simple Exponential Smoothing) fits a smoothing model with automatically optimized parameters and no trend component. Its point forecast is flat: an exponentially weighted average of the historical prices, with the most recent observations weighted most heavily. On failure, it falls back to the mean of the last ten historical observations.

HWES (Holt’s Damped Trend) adds a linear trend component that decays toward zero at longer horizons, preventing the forecast from extrapolating historical trends indefinitely. Parameters are estimated via maximum likelihood on the historical training data.

ARIMA first runs a seasonality detection routine on the historical training data, testing autocorrelation at lags of 7, 30, and 365 days. If meaningful periodicity is detected, a seasonal ARIMA is fitted; otherwise, a non-seasonal specification is used. The auto_arima function conducts a stepwise search over model orders (constrained to max_p=2, max_q=2, max_P=1, max_Q=1) using corrected AIC on the historical training data as the selection criterion.

All three forecasts are inverse-transformed back to the original price scale and clipped to a plausibility range (Q1 − 10×IQR to Q3 + 10×IQR of the historical training data).
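The standardize → fit → inverse-transform → clip flow can be illustrated with a minimal pure-Python sketch, substituting a hand-rolled SES recursion for the library fit (the function name, the fixed smoothing parameter, and the explicit fallback wiring are illustrative assumptions, not the production code):

```python
import statistics

def ses_base_forecast(prices, alpha=0.3):
    """Sketch of one base forecast: z-score, smooth, invert, clip."""
    mu = statistics.mean(prices)
    sigma = statistics.stdev(prices)
    z = [(p - mu) / sigma for p in prices]      # z-score normalization

    try:
        level = z[0]
        for obs in z[1:]:                       # SES recursion: an exponentially
            level = alpha * obs + (1 - alpha) * level  # weighted recent average
        fcst_z = level
    except Exception:                           # mirrors the documented fallback:
        fcst_z = statistics.mean(z[-10:])       # mean of last ten observations

    fcst = fcst_z * sigma + mu                  # back to the price scale

    # Clip to the plausibility range Q1 - 10*IQR .. Q3 + 10*IQR.
    q1, _, q3 = statistics.quantiles(prices, n=4)
    iqr = q3 - q1
    return max(q1 - 10 * iqr, min(q3 + 10 * iqr, fcst))
```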

For live forecasts targeting future dates, the same procedure is applied using the most recent 90 days of known historical prices as the training window.

2. Seasonal Relatives

What this step produces

Multiplicative seasonal adjustment factors (FMSR values) that modulate the base forecasts, derived entirely from prior-cycle seasonal behavior.

Backward-looking inputs (all that exist)

Seasonal lens matrices: Each of the ~30 proprietary lens matrices maps every calendar date to a season label and provides keys pointing to the same seasonal position one, two, and three full cycles back. These are fixed structural definitions, not derived from the ticker’s price data.

Historical closing prices: The actual realized closing prices for all dates in the ticker’s history, used to compute the Seasonal Mean Value (SMV) within each completed historical season.

Monthly mean values (MMV): The backward-looking monthly price averages computed in Stage 1, used as the denominator in the seasonal relative ratio.

Prior-cycle MSR values: The seasonal relatives from one, two, and three complete cycles ago. These are historical ratios computed from historical prices in historical seasons. For example, if the current season maps to “Q2-2025 in Lens X,” the prior-cycle keys retrieve the MSR values from Q2-2024, Q2-2023, and Q2-2022 — all fully completed seasons with known prices.

What is explicitly NOT available to this step

The current season’s MSR. The FMSR for the current season is assembled exclusively from the MSR values of the same seasonal position in previous cycles; the system never uses the season being forecasted to compute its own adjustment factor.

How the computation works

The Seasonal Mean Value (SMV) is the average closing price within each historical season. The Monthly Seasonal Relative (MSR) is the ratio SMV / MMV — how much the historical seasonal average deviates from the historical monthly baseline. An MSR of 1.05 means prices in that historical season averaged 5% above their monthly mean.

The system retrieves the MSR from one cycle ago (p1_msr), two cycles ago (p2_msr), and three cycles ago (p3_msr) using the lens matrix’s prior-season keys. From these three backward-looking values, six FMSR series are constructed:

A0: No seasonal adjustment (FMSR = 1.0). The pure base forecast, serving as the experimental control.

A1: Uses only the most recent prior cycle’s MSR. One backward-looking input.

A2: Simple average of the two most recent prior cycles’ MSRs. Two backward-looking inputs, equal-weighted.

A2W: Recency-weighted: 75% weight on one cycle ago, 25% on two cycles ago. Two backward-looking inputs, recency-favored.

A3: Simple average of the three most recent prior cycles’ MSRs. Three backward-looking inputs, equal-weighted.

A3W: Recency-weighted: 50% on one cycle ago, 30% on two, 20% on three. Three backward-looking inputs, recency-favored.

Every FMSR value is derived entirely from completed prior-cycle seasonal behavior. The current season contributes nothing to its own adjustment factor.
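Under the definitions above, assembling the six FMSR values from the three prior-cycle MSRs is a few lines (the function and key names are illustrative):

```python
def fmsr_series(p1_msr: float, p2_msr: float, p3_msr: float) -> dict:
    """Six seasonal adjustment factors, all from completed prior cycles."""
    return {
        "A0": 1.0,                                             # control: no adjustment
        "A1": p1_msr,                                          # most recent cycle only
        "A2": (p1_msr + p2_msr) / 2,                           # equal weight, 2 cycles
        "A2W": 0.75 * p1_msr + 0.25 * p2_msr,                  # recency-weighted
        "A3": (p1_msr + p2_msr + p3_msr) / 3,                  # equal weight, 3 cycles
        "A3W": 0.50 * p1_msr + 0.30 * p2_msr + 0.20 * p3_msr,  # recency-weighted
    }
```

For example, with p1_msr = 1.05, p2_msr = 1.00, p3_msr = 0.95, A2W comes out to 1.0375 — tilted toward the most recent cycle, while A3 averages out to exactly 1.0.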

3. Generate Forecasts and Accuracy Metrics

What this step produces

The full candidate forecast matrix (~540 forecasts per ticker per date) and point-level / season-level accuracy measurements computed by comparing historical forecasts against historical actuals.

Backward-looking inputs (all that exist)

Base forecasts from Stage 1: Generated from the 90-day historical lookback window (backward-looking).

FMSR values from Stage 2: Derived from prior-cycle MSR values (backward-looking).

Historical actual closing prices: Used after the fact to measure how accurate the historical forecasts were. These are the realized prices that the forecasts were attempting to predict. They are used only for accuracy measurement, never as inputs to the forecasts themselves.

The critical distinction: forecasted vs. actual

This step is where the forecasted values and the historical actuals first meet, but strictly for the purpose of measuring accuracy. The forecast value FV for each historical date was computed in isolation from that date’s actual close. The actual close is then compared to the FV to produce error metrics. This is a retrospective accuracy assessment: “how close was the forecast to what actually happened?” The actual close never feeds back into the forecast that predicted it.

How the computation works

For each combination of base model (ARIMA, SES, HWES) × FMSR series (A0, A1, A2, A2W, A3, A3W) × seasonal lens (~30 matrices), the forecast value is computed as:

FV = base_forecast × FMSR_value

Both inputs are backward-looking: the base forecast was trained on historical prices preceding the forecast date, and the FMSR value was derived from prior-cycle seasonal relatives. The FV exists independently of the actual close on that date.

Accuracy metrics are then computed by comparing FV against the historical actual close:

FV Error: |actual_close − FV| — absolute error in price units.

FV MAE: Identical to FV Error at the point level; the two diverge only in how they are aggregated at the season level.

sMAPE: 2 × |actual_close − FV| / (|actual_close| + |FV|) — symmetric percentage error bounded between 0 and 2.

These point-level metrics are aggregated within each completed historical season to produce season-level sMAPE (mean of daily values), season-level MAE, and season-level RMSE (root of mean squared daily errors). These season-level accuracy scores describe how the model performed across each historical season and become the inputs to the next stage.
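The FV product and the error metrics defined above can be written out directly (a pure-Python sketch; function names are illustrative):

```python
def forecast_value(base_forecast: float, fmsr: float) -> float:
    """FV = base_forecast x FMSR -- both inputs backward-looking."""
    return base_forecast * fmsr

def smape(actual: float, fv: float) -> float:
    """Symmetric percentage error, bounded between 0 and 2."""
    return 2 * abs(actual - fv) / (abs(actual) + abs(fv))

def season_metrics(actuals, fvs):
    """Aggregate point-level errors into season-level sMAPE, MAE, RMSE."""
    errors = [abs(a - f) for a, f in zip(actuals, fvs)]
    n = len(errors)
    return {
        "smape": sum(smape(a, f) for a, f in zip(actuals, fvs)) / n,
        "mae": sum(errors) / n,
        "rmse": (sum(e * e for e in errors) / n) ** 0.5,
    }
```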

For a live forecast targeting a future date, no actual close exists yet. The FV is computed from the same backward-looking inputs (base forecast from the last 90 days of known prices, FMSR from prior-cycle seasonals), and the accuracy metrics from completed historical seasons are carried forward to inform the confidence bands.

4. Historical Patterns of Accuracy

What this step produces

Running track records for each forecast candidate: cumulative accuracy statistics computed from all prior completed seasons, with the current season explicitly excluded.

Backward-looking inputs (all that exist)

Season-level accuracy metrics from Stage 3: The sMAPE, MAE, and RMSE scores from each completed historical season. These are the results of comparing historical forecasts (backward-looking) against historical actuals (realized) for seasons that have already ended.

What is explicitly NOT available to this step

The current season’s accuracy metrics. The rolling means are computed by accumulating all prior seasons’ scores and then subtracting the current season’s contribution from both the numerator (sum) and denominator (count). This is implemented in the code as: prior_sum = cumulative_sum − current_season_value; prior_count = cumulative_count − 1; rolling_mean = prior_sum / prior_count. The current season cannot inflate or deflate its own accuracy estimate.

How the computation works

Rolling mean sMAPE, MAE, and RMSE: For each model/lens/FMSR group, the system sorts all completed seasons chronologically and computes an expanding cumulative mean of each accuracy metric, always excluding the current season. The rolling mean entering season N is the average of season-level scores from seasons 1 through N−1. This is a walk-forward expanding window — the track record grows as each season completes, and it never includes the season being evaluated.

Season count (fv_mean_smape_c, etc.): The number of prior completed seasons that contributed to the rolling mean. This is stored alongside every rolling mean as a reliability indicator. A rolling mean based on 20 prior seasons carries more weight than one based on 3.

Accuracy comparison flags: For each seasonally adjusted forecast (A1 through A3W), the system tests whether it outperforms its corresponding A0 control on the same base model and lens. A ‘Y’ flag requires both conditions to hold simultaneously: (a) the current season’s sMAPE is lower than A0’s current-season sMAPE, and (b) the rolling mean sMAPE from all prior seasons is lower than A0’s rolling mean sMAPE from all prior seasons. Both the current-season metric and the historical track record must favor the adjusted model.

Lagged accuracy counts (best_smape_count, etc.): A running tally of how many prior completed seasons each model has earned a ‘Y’ on the accuracy comparison. These counts exclude the current season — they answer the question “in how many historical seasons has this model’s seasonal adjustment beaten the unadjusted baseline?” A model entering season 20 with best_smape_count = 15 has beaten A0 in 15 of its 19 prior completed seasons.
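The prior-only accumulation, the comparison flag, and the lagged count can all be sketched in a few lines; the subtract-the-current-season formulation in the text is algebraically equivalent to the running sum used here (names are illustrative):

```python
def prior_season_means(season_scores):
    """Expanding mean that always excludes the current season.

    Returns, per season, the mean of all earlier seasons' scores
    (None for the first season, which has no track record yet).
    """
    means, cum_sum = [], 0.0
    for i, score in enumerate(season_scores):
        means.append(cum_sum / i if i > 0 else None)  # seasons 1..i-1 only
        cum_sum += score   # current season joins the record for later seasons
    return means

def accuracy_flag(adj_curr, adj_prior_mean, a0_curr, a0_prior_mean):
    """'Y' only when the adjusted model beats A0 on BOTH the current
    season's sMAPE and the prior-season rolling mean sMAPE."""
    return "Y" if adj_curr < a0_curr and adj_prior_mean < a0_prior_mean else "N"

def lagged_best_count(flags):
    """Running tally of prior-season 'Y' flags, excluding the current season."""
    counts, total = [], 0
    for f in flags:
        counts.append(total)      # the count *entering* this season
        total += f == "Y"
    return counts
```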

5. Confidence Bands

What this step produces

Calibrated prediction intervals (CI85, CI90, CI95) whose widths are determined entirely by the model’s own demonstrated historical accuracy, with coverage rates computed from all prior completed seasons only.

Backward-looking inputs (all that exist)

Forecast value (FV): The seasonally adjusted forecast from Stage 3, computed from backward-looking base forecasts and prior-cycle FMSR values.

Rolling mean sMAPE (fv_mean_smape): The cumulative average of season-level sMAPE from all prior completed seasons, excluding the current season, computed in Stage 4. This is the model’s historical accuracy track record.

Historical actual closing prices: Used to test whether historical forecasts’ bands captured the realized price. These are the same actuals used in Stage 3’s accuracy measurement. They are used here solely to compute empirical hit rates for band calibration.

What is explicitly NOT available to this step

The actual closing price on the date being forecasted (for live forecasts, it does not exist yet). The current season’s band hit rates (excluded from coverage computation by the same prior-season-only methodology used for rolling means). No new data enters this step — the bands and their calibration are built entirely from the model’s historical forecast-vs-actual track record.

How the computation works

Step 5a: Band construction.

The system generates ten concentric bands around each forecast value using multipliers [1.25, 1.50, 1.75, 2.00, 2.25, 2.50, 2.75, 3.00, 3.25, 3.50]. For each multiplier k:

Upper = FV + (FV × k × rolling_mean_sMAPE)

Lower = FV − (FV × k × rolling_mean_sMAPE)

Both FV and rolling_mean_sMAPE are backward-looking. The lower bound is floored at zero. A model with superior historical accuracy (lower rolling_mean_sMAPE) produces tighter bands — the band width is directly proportional to the model’s own demonstrated imprecision, calibrated from its prior-season track record.
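The band construction can be sketched as follows (the multiplier list comes from the text; the function name is illustrative):

```python
MULTIPLIERS = [1.25, 1.50, 1.75, 2.00, 2.25, 2.50, 2.75, 3.00, 3.25, 3.50]

def build_bands(fv: float, rolling_mean_smape: float):
    """Ten concentric (lower, upper) bands whose width scales with the
    model's own prior-season imprecision; lower bound floored at zero."""
    return [
        (max(0.0, fv - fv * k * rolling_mean_smape),
         fv + fv * k * rolling_mean_smape)
        for k in MULTIPLIERS
    ]
```

A model with rolling_mean_sMAPE of 0.02 forecasting FV = 100 gets a 2.00× band of (96.0, 104.0); a less accurate model with 0.04 would get (92.0, 108.0) at the same multiplier.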

Step 5b: Historical hit testing.

For each historical date where the actual close is known, the system checks whether that actual close falls within each of the ten bands. This produces a Y/N hit flag per band per historical date. This is a retrospective test: “when this model forecasted in the past, did the band capture the actual price?” The forecast was computed before the actual was known; the hit test is applied after the fact.

Step 5c: Coverage rate computation.

Hit flags are averaged within each completed historical season to produce a season-level hit rate per band (e.g., “in this historical season, the 2.00× band captured the actual close on 88% of trading days”). These season-level hit rates are then accumulated across all prior completed seasons using the same prior-only methodology: the cumulative sum and count are computed, then the current season’s contribution is subtracted. The resulting coverage rate entering any season reflects only the band’s performance in all seasons that came before it.
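The season-level hit rate for a single band can be sketched as below; the prior-only accumulation across seasons then follows the same exclude-the-current-season pattern used for the rolling means (names are illustrative):

```python
def season_hit_rate(actuals, daily_bands):
    """Fraction of days in one season where the actual close landed
    inside the band. daily_bands holds one (low, high) pair per day
    for a single band multiplier."""
    hits = sum(low <= a <= high for a, (low, high) in zip(actuals, daily_bands))
    return hits / len(actuals)
```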

Step 5d: CI band selection.

From the ten backward-looking coverage rates, the system selects three confidence intervals:

CI85: The narrowest band whose prior-season coverage rate is ≥ 85%.

CI90: The narrowest band wider than CI85 whose prior-season coverage rate is ≥ 90%.

CI95: The narrowest band wider than CI90 whose prior-season coverage rate is ≥ 95%.

If no band achieves the target coverage, the widest available band is used as the default for that interval. The selected bands’ upper and lower bounds become the final CI values (ci85_low/ci85_high, etc.). These intervals are empirically calibrated: their widths are not assumed or imposed by a distributional model but are earned by the model’s demonstrated ability to capture the actual close within each band across its entire prior history.
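Given the ten prior-season coverage rates ordered from narrowest to widest band, the selection rule can be sketched as (the function name and fallback wiring are illustrative):

```python
def select_ci_bands(coverage_rates):
    """Return indices of the CI85, CI90, CI95 bands.

    coverage_rates: prior-season coverage per band, narrow -> wide.
    Falls back to the widest band when no band meets a target.
    """
    def narrowest(threshold, start):
        for i in range(start, len(coverage_rates)):
            if coverage_rates[i] >= threshold:
                return i
        return len(coverage_rates) - 1          # default: widest available

    ci85 = narrowest(0.85, 0)
    ci90 = narrowest(0.90, ci85 + 1)            # must be wider than CI85
    ci95 = narrowest(0.95, ci90 + 1)            # must be wider than CI90
    return ci85, ci90, ci95
```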

Step 5e: Variance measurement.

For historical dates where the actual close falls outside the CI85 band, the system measures how far outside it landed, expressed as a percentage of the band boundary. This is averaged within each season to produce fv_variance_mean — a measure of tail behavior describing how bad the misses are when the model is wrong. This metric uses the historical actual and the historical CI85 boundaries, both of which are backward-looking.
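The tail-behavior measure can be sketched as the mean relative overshoot on miss days (the function name and the handling of a zero lower boundary are illustrative assumptions):

```python
def fv_variance_mean(actuals, ci85_bands):
    """Mean distance outside the CI85 band, as a fraction of the
    breached boundary, averaged over the days the band missed."""
    misses = []
    for a, (low, high) in zip(actuals, ci85_bands):
        if a > high:
            misses.append((a - high) / high)
        elif a < low and low > 0:
            misses.append((low - a) / low)
    return sum(misses) / len(misses) if misses else 0.0
```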

6. Final Forecast Selection

What this step produces

A single best forecast per ticker per date, selected from the ~540 candidates based on demonstrated historical accuracy.

Backward-looking inputs (all that exist)

Rolling mean sMAPE (fv_mean_smape): The cumulative prior-season accuracy track record from Stage 4. This is the primary selection criterion.

Season count (fv_mean_smape_c): The number of prior completed seasons that contributed to the rolling mean. Used as a tiebreaker and minimum threshold.

What is explicitly NOT available to this step

The actual close on the forecast date. The current season’s accuracy metrics. Any information about whether this particular forecast will turn out to be correct. Selection is based entirely on the model’s historical track record — how well it has performed across all prior completed seasons.

How the computation works

Filtering: Candidates must have at least five prior completed seasons of accuracy data (fv_mean_smape_c ≥ 5). This threshold ensures no forecast surfaces based on a thin historical track record. Forecasts with null or zero FV values are excluded, and only dates from 2020 onward are retained for the output tables.

Selection: Among eligible candidates for each ticker-date, the system selects the forecast with the lowest rolling mean sMAPE — the model/lens/FMSR combination that has demonstrated the best cumulative accuracy across the most prior seasons. Ties are broken by season count (higher count preferred). This is a pure backward-looking accuracy-first selection: the model with the best historical track record wins, regardless of what it is currently forecasting.
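Filtering and selection together reduce to a min over eligible candidates (the dict keys follow the column names in the text; the candidate structure itself is an illustrative assumption):

```python
def select_best(candidates, min_seasons=5):
    """Accuracy-first selection: lowest prior-season rolling mean sMAPE
    wins; a higher season count breaks ties. Null/zero FVs and thin
    track records are filtered out first."""
    eligible = [
        c for c in candidates
        if c["fv"] and c["fv_mean_smape_c"] >= min_seasons
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda c: (c["fv_mean_smape"], -c["fv_mean_smape_c"]))
```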

Output: The selected forecasts are written to seven tables. The primary table (swing_all_forecasts_wk) receives the single best forecast across all base models, all FMSR series, and all lenses. Six additional tables receive the best ARIMA-based forecast for each FMSR series independently, enabling downstream analysis of how each level of seasonal adjustment performs relative to the A0 control.

Each output row carries the complete provenance: which lens produced it, which base model, which FMSR series, the point forecast, all error metrics, all ten band boundaries and hit flags, the three calibrated CI bands, the comparison flags against A0, the cumulative accuracy counts, and the variance measure. Every number in this provenance trail traces back through the stages above to backward-looking historical data.

Summary: Data Flow and Temporal Separation

The complete data flow for a live forecast on a future date is:

Stage 1 (Base Forecasts): Trained on the most recent 90 days of known historical closing prices. No future data.

Stage 2 (Seasonal Relatives): FMSR values derived from prior-cycle seasonal behavior (1, 2, or 3 completed cycles back). No current-season data.

Stage 3 (Forecast Generation): FV = base_forecast × FMSR. Both inputs are backward-looking. Accuracy metrics computed retrospectively against historical actuals for completed seasons only.

Stage 4 (Historical Accuracy): Rolling means and comparison counts accumulated from all prior completed seasons, with the current season explicitly excluded from every computation.

Stage 5 (Confidence Bands): Band widths proportional to prior-season rolling mean sMAPE. Coverage rates computed from prior-season hit testing only. CI band selection based on prior-season coverage rates only. No new data enters.

Stage 6 (Selection): Best model chosen by lowest prior-season rolling mean sMAPE, with a minimum five-season track record requirement.

At no point in this chain does the system use the actual closing price on the date being forecasted, the current season’s accuracy metrics, or any data that postdates the forecast. The temporal firewall is maintained from the first model fit through the final selection.