Golden Bracket: A Multi-Stage Ensemble Approach to NCAA Tournament Prediction

Golden Bracket Research Team

March 2026


Abstract

We present Golden Bracket, a multi-stage ensemble system for predicting NCAA Division I Men's Basketball Tournament outcomes. The system integrates efficiency-based team ratings, pairwise comparison via the Log5 model, contextual adjustments for geography, player health, and matchup dynamics, historical seed-line calibration, and prediction market signals into a unified probabilistic framework. Tournament-level forecasts are generated through Monte Carlo simulation with Bayesian in-simulation learning. We describe each algorithmic component with full mathematical formulation, discuss design decisions grounded in basketball analytics literature, and outline the system architecture that enables real-time prediction serving. The approach achieves principled uncertainty quantification by blending model-derived probabilities with efficient-market signals, while maintaining graceful degradation when data sources are unavailable.


1. Introduction

1.1 Problem Statement

The NCAA Division I Men's Basketball Tournament — colloquially known as "March Madness" — is a 68-team tournament that has captivated American sports culture since 1939. Its single-elimination format, combined with the inherent variance in college basketball, makes it one of the most challenging prediction problems in sports analytics. Warren Buffett's famous billion-dollar bracket challenge underscores the near-impossibility of a perfect bracket: the naive probability of correctly predicting all 63 games in the main draw is approximately 1 in 2^63 ≈ 9.2 × 10^18.

The prediction task can be decomposed into two levels: (1) pairwise matchup probability estimation — given two teams, what is the probability that team A defeats team B? — and (2) tournament simulation — given a bracket of 64 teams, what is the probability distribution over all possible outcomes?

1.2 Related Work

Several prominent systems address this problem, most notably Ken Pomeroy's adjusted-efficiency ratings (Pomeroy, 2002-present), whose tempo-free efficiency framework underpins the team-quality metrics used throughout this work.

1.3 Contributions

Golden Bracket makes the following contributions:

  1. A ten-factor composite rating with z-score normalization that addresses multicollinearity between efficiency metrics (Section 3).
  2. Contextual weight adjustment that amplifies intangible factors in close matchups where the talent gap is small (Section 3.4).
  3. A player health proxy using game-leader absence detection as a heuristic for injury impact (Section 5.2).
  4. A Bayesian in-simulation learning mechanism that updates team ratings within each Monte Carlo iteration based on the information revealed by simulated tournament results (Section 7.2).
  5. Market ensemble blending with graceful degradation, combining model-derived probabilities with prediction market signals (Section 6).

2. Data Sources and Feature Engineering

2.1 Primary Data Sources

Golden Bracket ingests data from three complementary sources:

| Source | Data Type | Update Frequency | Cache TTL |
|--------|-----------|------------------|-----------|
| CollegeBasketballData (CBBD) | Adjusted efficiency metrics, Four Factors | Daily | 6 hours |
| ESPN API | Real-time scores, schedules, game leaders, box scores | Per-game | 5 minutes |
| Polymarket | Championship odds, matchup-level prediction markets | Continuous | 30 minutes |

2.2 Dean Oliver's Four Factors

The system builds on Dean Oliver's "Four Factors of Basketball Success" (Oliver, 2004), which decompose team performance into four orthogonal dimensions. For each factor, we compute both offensive and defensive variants:

Effective Field Goal Percentage (eFG%):

\text{eFG\%} = \frac{\text{FGM} + 0.5 \times \text{3PM}}{\text{FGA}}

This weights three-point field goals at 1.5x to reflect their higher point value per attempt.

Turnover Rate (TOV%):

\text{TOV\%} = \frac{\text{TOV}}{\text{FGA} + 0.44 \times \text{FTA} + \text{TOV}}

The denominator approximates possessions. The coefficient 0.44 on free throw attempts accounts for and-one plays, technical free throws, and three-shot fouls, which do not each consume a full possession.

Offensive Rebounding Percentage (ORB%):

\text{ORB\%} = \frac{\text{ORB}}{\text{ORB} + \text{Opponent DRB}}

Free Throw Rate (FTR):

\text{FTR} = \frac{\text{FTA}}{\text{FGA}}

This measures a team's ability to get to the free throw line relative to its field goal attempts.

2.3 Feature Extraction Pipeline

Raw statistics undergo a sanitization pipeline before entering the model:

  1. Null/NaN coercion: All numeric fields default to 0 when missing.
  2. Derived field recovery: When AdjEM is null but AdjOE and AdjDE are available, we compute AdjEM = AdjOE − AdjDE.
  3. Cross-source reconciliation: Team identifiers are mapped across ESPN, CBBD, and market data using a canonical ID registry.
  4. Caching: Tiered cache TTLs (6h/30min/5min) balance freshness against API rate limits.

3. Composite Team Rating Model

3.1 Ten-Factor Weighted Z-Score Formulation

Each team is assigned a composite rating R_i computed as a weighted sum of z-score-normalized factors across K = 10 dimensions. Let \mathbf{f}_i = (f_{i,1}, \ldots, f_{i,K}) denote the raw factor vector for team i and let \mathcal{T} be the set of all tournament teams. The composite rating is:

R_i = \sum_{k=1}^{K} w_k \cdot z_{i,k}

where the z-score for factor k is:

z_{i,k} = \frac{f_{i,k} - \mu_k}{\sigma_k}

with population statistics computed over the tournament field:

\mu_k = \frac{1}{|\mathcal{T}|} \sum_{j \in \mathcal{T}} f_{j,k}, \qquad \sigma_k = \sqrt{\frac{1}{|\mathcal{T}|} \sum_{j \in \mathcal{T}} (f_{j,k} - \mu_k)^2}

The composite z-score is then mapped to a [0, 100] scale via:

R_i^{\text{scaled}} = \frac{\text{clamp}(R_i, -3, 3) + 3}{6} \times 100
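The normalization and scaling steps above can be sketched as follows (an illustrative TypeScript fragment, not the production implementation; the factor ordering and the guard for a zero-variance factor are our own assumptions):

```typescript
type FactorVector = number[]; // one raw value per factor, fixed order

const clamp = (x: number, lo: number, hi: number): number =>
  Math.min(hi, Math.max(lo, x));

// Population (not sample) mean and standard deviation over the field.
function populationStats(values: number[]): { mu: number; sigma: number } {
  const mu = values.reduce((a, b) => a + b, 0) / values.length;
  const variance =
    values.reduce((a, v) => a + (v - mu) ** 2, 0) / values.length;
  return { mu, sigma: Math.sqrt(variance) };
}

// R_i = sum_k w_k * z_{i,k}, clamped to [-3, 3] and mapped to [0, 100].
function compositeRating(
  team: FactorVector,
  field: FactorVector[], // all tournament teams, same factor order
  weights: number[]
): number {
  let r = 0;
  for (let k = 0; k < weights.length; k++) {
    const { mu, sigma } = populationStats(field.map((f) => f[k]));
    const z = sigma > 0 ? (team[k] - mu) / sigma : 0; // degenerate-factor guard
    r += weights[k] * z;
  }
  return ((clamp(r, -3, 3) + 3) / 6) * 100;
}
```

A team sitting exactly at the field mean on every factor lands at 50 on the scaled axis, which matches the intended interpretation of the midpoint.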

3.2 Weight Vector

The weight vector w was derived through iterative analysis addressing multicollinearity between efficiency metrics:

| Factor k | Symbol | Weight w_k | Description |
|----------|--------|------------|-------------|
| 1 | adjEM | 0.32 | Adjusted efficiency margin (primary quality signal) |
| 2 | oEFG | 0.05 | Offensive effective FG% |
| 3 | dEFG | 0.05 | Defensive effective FG% (negated: lower is better) |
| 4 | tov | 0.07 | Turnover differential (−oTOV + dTOV) |
| 5 | orb | 0.06 | Offensive rebounding differential (oORB − dORB) |
| 6 | ftr | 0.05 | Free throw rate differential (oFTR − dFTR) |
| 7 | sos | 0.10 | Strength of schedule |
| 8 | recentForm | 0.10 | Recency-weighted performance (Section 3.5.3) |
| 9 | coaching | 0.10 | Coaching tournament pedigree (Section 3.5.1) |
| 10 | variance | 0.10 | Performance consistency (Section 3.5.2) |

Multicollinearity analysis. Adjusted efficiency margin (AdjEM) is, by construction, highly correlated with the Four Factors (eFG%, TOV%, ORB%, FTR). Naively weighting all factors equally double-counts shared variance. Our approach assigns AdjEM a dominant weight (w_1 = 0.32) as the primary quality signal while reducing oEFG and dEFG from an initial 0.09 each to 0.05, preserving their marginal discriminative power without redundancy. The freed weight budget (0.08) is redistributed to the intangible factors (coaching, variance) that capture orthogonal information.

Constraint: \sum_{k=1}^{K} w_k = 1.0.

3.3 Z-Score Normalization Across Tournament Population

Z-score normalization is performed over the full tournament population \mathcal{T} (typically 68 teams) rather than the entire D-I population (approximately 360 teams). This design choice reflects the fact that tournament teams constitute a truncated distribution — comparing a 1-seed's efficiency margin to the D-I average would compress the meaningful range. By normalizing within the tournament field, we maximize the discriminative power of each factor among the teams that actually compete.

3.4 Contextual Weight Adjustment

In closely seeded matchups where the talent gap is small, intangible factors (coaching experience, recent momentum, consistency) become relatively more decisive. We model this with a smooth boost function:

\beta(\Delta s) = \max\!\left(0,\; 1 - \frac{|\Delta s|}{8}\right) \times 0.5

where Δs = s_A − s_B is the seed differential. For the intangible factor indices k ∈ {coaching, recentForm, variance}:

w_k^{*} = w_k \cdot (1 + \beta(\Delta s))

All weights are then renormalized:

\hat{w}_k = \frac{w_k^{*}}{\sum_{j=1}^{K} w_j^{*}}

This yields a maximum 50% boost to intangible weights for equal seeds (e.g., 8 vs. 9), tapering linearly to zero for seed differentials ≥ 8.
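A minimal sketch of this boost-and-renormalize step (illustrative TypeScript; the function and parameter names are ours):

```typescript
// Boost intangible weights in close seed matchups, then renormalize so
// the adjusted weights again sum to 1.
function adjustWeights(
  weights: number[],
  intangibleIdx: number[], // positions of coaching, recentForm, variance
  seedA: number,
  seedB: number
): number[] {
  // beta(Δs) = max(0, 1 - |Δs|/8) * 0.5, i.e. up to +50% for equal seeds.
  const beta = Math.max(0, 1 - Math.abs(seedA - seedB) / 8) * 0.5;
  const boosted = weights.map((w, k) =>
    intangibleIdx.includes(k) ? w * (1 + beta) : w
  );
  const total = boosted.reduce((a, b) => a + b, 0);
  return boosted.map((w) => w / total); // renormalize to sum to 1
}
```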

3.5 Sub-Models

3.5.1 Coaching Pedigree Score

The coaching score C quantifies a coach's historical tournament success:

C = C_{\text{FF}} + C_{\text{champ}} + C_{\text{apps}} + C_{\text{penalty}}

where:

C_{\text{FF}} = \begin{cases} 3.0 & \text{if FF appearances} \geq 4 \\ 2.0 & \text{if FF appearances} \geq 2 \\ 1.0 & \text{if FF appearances} \geq 1 \\ 0 & \text{otherwise} \end{cases}

C_{\text{apps}} = \begin{cases} 1.5 & \text{if appearances} \geq 15 \\ 1.0 & \text{if appearances} \geq 8 \\ 0.5 & \text{if appearances} \geq 3 \\ 0 & \text{otherwise} \end{cases}

The raw coaching score (range approximately [−1, 7]) is z-score normalized against the tournament population alongside all other factors.

3.5.2 Performance Variance

Performance consistency is measured as the negative standard deviation of game-by-game scoring margins:

V_i = -\sqrt{\frac{1}{n}\sum_{g=1}^{n}(m_g - \bar{m})^2}

where m_g = score_g − oppScore_g is the margin for game g and \bar{m} is the mean margin. Negation ensures that lower variance (more consistent teams) yields a higher factor value after z-score normalization. A minimum of 3 games is required; teams with insufficient data receive V_i = 0.

3.5.3 Recency-Weighted Form

Recent performance is weighted using an exponential decay kernel with a 30-day half-life:

F_i = \frac{\sum_{g=1}^{n} \tilde{m}_g \cdot e^{-\lambda \cdot d_g}}{\sum_{g=1}^{n} e^{-\lambda \cdot d_g}}

where \tilde{m}_g is the scoring margin of game g, d_g is the number of days elapsed since game g was played, and λ = ln(2)/30 implements the 30-day half-life.

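Under the assumption that the stated 30-day half-life fixes λ = ln(2)/30, the recency weighting can be sketched as follows (illustrative TypeScript; the Game shape is hypothetical):

```typescript
interface Game {
  margin: number; // score - oppScore for this game
  daysAgo: number; // days elapsed since the game was played
}

// Exponentially weighted mean margin: weight halves every `halfLifeDays`.
function recencyWeightedForm(games: Game[], halfLifeDays = 30): number {
  const lambda = Math.log(2) / halfLifeDays;
  let num = 0;
  let den = 0;
  for (const g of games) {
    const w = Math.exp(-lambda * g.daysAgo);
    num += g.margin * w;
    den += w;
  }
  return den > 0 ? num / den : 0; // empty schedule falls back to 0
}
```

A win from today counts twice as much as an identical win from 30 days ago, which is exactly the half-life semantics described above.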

4. Pairwise Matchup Probability

4.1 Log5 Formula

Golden Bracket uses the Log5 method (James, 1981; Miller, 2006) to convert team ratings into pairwise win probabilities. Given team A's latent strength p_A (expressed as a probability of beating an average team) and team B's strength p_B, the probability that A defeats B is:

P(A > B) = \frac{p_A(1 - p_B)}{p_A(1 - p_B) + p_B(1 - p_A)}

This is equivalent to the Bradley-Terry model (Bradley and Terry, 1952) under the transformation π_i = p_i / (1 − p_i):

P(A > B) = \frac{\pi_A}{\pi_A + \pi_B}

Implementation detail: Input probabilities are clamped to [0.01, 0.99] to prevent division by zero and degenerate outputs:

p_A' = \text{clamp}(p_A, 0.01, 0.99), \qquad p_B' = \text{clamp}(p_B, 0.01, 0.99)
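The clamped Log5 computation is small enough to state in full (illustrative TypeScript):

```typescript
// Clamp a probability into the safe operating range [0.01, 0.99].
const clampP = (p: number): number => Math.min(0.99, Math.max(0.01, p));

// Log5: P(A beats B) from each team's strength vs. an average opponent.
function log5(pA: number, pB: number): number {
  const a = clampP(pA);
  const b = clampP(pB);
  return (a * (1 - b)) / (a * (1 - b) + b * (1 - a));
}
```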

4.2 Mathematical Properties

The Log5 formula satisfies several desirable properties:

  1. Symmetry: P(A > B) + P(B > A) = 1
  2. Monotonicity: P(A > B) is strictly increasing in p_A and strictly decreasing in p_B
  3. Identity: When p_A = p_B, P(A > B) = 0.5
  4. Non-transitivity: The model does not guarantee transitivity — P(A > B) > 0.5 and P(B > C) > 0.5 do not imply P(A > C) > 0.5 when matchup adjustments are applied

4.3 Post-Hoc Matchup Adjustments

After computing the base Log5 probability, we apply additive adjustments for four matchup-specific factors:

Tempo Mismatch

When one team plays at an extremely fast pace and the opponent plays extremely slowly, the faster team gains an advantage by imposing their preferred style:

\Delta_{\text{tempo}} = \begin{cases} +0.02 & \text{if } \text{tempo}_A > 70 \text{ and } \text{tempo}_B < 64 \\ -0.02 & \text{if } \text{tempo}_B > 70 \text{ and } \text{tempo}_A < 64 \\ 0 & \text{otherwise} \end{cases}

The thresholds (70 = top-20 fast, 64 = bottom-20 slow) are derived from the empirical distribution of D-I tempos.

Rebounding Edge

Significant offensive rebounding differentials (|ORB%_A − ORB%_B| > 5%) produce second-chance opportunities that compound over a 40-minute game:

\Delta_{\text{orb}} = \text{sgn}(\text{ORB\%}_A - \text{ORB\%}_B) \times 0.015 \quad \text{if } |\text{ORB\%}_A - \text{ORB\%}_B| > 0.05

Turnover Battle

Teams with significantly lower turnover rates (|TOV%_B − TOV%_A| > 3%) enjoy extra possessions:

\Delta_{\text{tov}} = \text{sgn}(\text{TOV\%}_B - \text{TOV\%}_A) \times 0.015 \quad \text{if } |\text{TOV\%}_B - \text{TOV\%}_A| > 0.03

Note the sign convention: lower TOV% is better for the offense, so the comparison subtracts team A's rate from team B's.

Three-Point Volatility Penalty

Teams heavily dependent on three-point shooting face elevated tournament variance. A three-point dependency index is computed as:

D_{3} = \frac{\text{eFG\%} - \text{FG\%}}{0.5 \times \text{3P\%}}

Teams with D_3 > 0.40 receive a penalty Δ_3 = −0.01.

4.4 Adjustment Cap

The total matchup adjustment for each team is capped at ±5% to prevent extreme distortions:

\Delta_{\text{total}} = \text{clamp}\!\left(\Delta_{\text{tempo}} + \Delta_{\text{orb}} + \Delta_{\text{tov}} + \Delta_3,\; -0.05,\; 0.05\right)

The adjusted probability is:

P'(A > B) = \text{clamp}\!\left(P(A > B) + \Delta_{\text{total}}^A + \Delta_{\text{total}}^B,\; 0.01,\; 0.99\right)
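Putting Sections 4.3 and 4.4 together, one team's capped adjustment can be sketched as follows (illustrative TypeScript; the MatchupStats field names are hypothetical):

```typescript
interface MatchupStats {
  tempo: number; // possessions per 40 minutes
  orbPct: number; // offensive rebounding percentage, 0-1
  tovPct: number; // turnover rate, 0-1
  threeDependency: number; // D_3 three-point dependency index
}

const clamp = (x: number, lo: number, hi: number): number =>
  Math.min(hi, Math.max(lo, x));

// Signed adjustment from team A's perspective, capped at ±5%.
function matchupAdjustment(a: MatchupStats, b: MatchupStats): number {
  let delta = 0;
  // Tempo mismatch: a very fast team imposes its style on a very slow one.
  if (a.tempo > 70 && b.tempo < 64) delta += 0.02;
  else if (b.tempo > 70 && a.tempo < 64) delta -= 0.02;
  // Rebounding edge beyond a 5-point ORB% gap.
  if (Math.abs(a.orbPct - b.orbPct) > 0.05)
    delta += Math.sign(a.orbPct - b.orbPct) * 0.015;
  // Turnover battle: lower TOV% is better, so compare B's rate minus A's.
  if (Math.abs(b.tovPct - a.tovPct) > 0.03)
    delta += Math.sign(b.tovPct - a.tovPct) * 0.015;
  // Three-point volatility penalty.
  if (a.threeDependency > 0.4) delta -= 0.01;
  return clamp(delta, -0.05, 0.05);
}
```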

5. Contextual Probability Modifiers

5.1 Geographic Location Advantage

Tournament games are played at neutral-site venues, but teams playing closer to the venue benefit from crowd support and reduced travel fatigue. We model this using the Haversine formula and a tiered proximity scoring function.

Haversine distance between two geographic coordinates (lat_1, lng_1) and (lat_2, lng_2):

d = 2R \cdot \arcsin\!\left(\sqrt{\sin^2\!\left(\frac{\Delta\phi}{2}\right) + \cos\phi_1 \cos\phi_2 \sin^2\!\left(\frac{\Delta\lambda}{2}\right)}\right)

where R = 3958.8 miles (Earth's radius), Δφ = φ_2 − φ_1, and Δλ = λ_2 − λ_1, with all angles in radians.

Proximity scoring maps distance to a discrete advantage score:

\text{prox}(d) = \begin{cases} 1.00 & \text{if } d < 100 \text{ mi (strong home-crowd effect)} \\ 0.60 & \text{if } 100 \leq d < 300 \text{ mi (regional advantage)} \\ 0.25 & \text{if } 300 \leq d < 600 \text{ mi (slight edge)} \\ 0.00 & \text{if } d \geq 600 \text{ mi (neutral)} \end{cases}

Differential adjustment: The location advantage is inherently relative — only the difference in proximity matters:

\Delta_{\text{loc}} = (\text{prox}(d_A) - \text{prox}(d_B)) \times 0.03

The maximum location adjustment is ±3%, reflecting the empirical finding that neutral-site effects in college basketball are smaller than true home-court advantage (≈ 3.5%) but non-negligible.
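The distance, proximity, and differential steps can be sketched as follows (illustrative TypeScript):

```typescript
const EARTH_RADIUS_MI = 3958.8;

// Great-circle distance in miles via the Haversine formula.
function haversineMiles(
  lat1: number, lng1: number, lat2: number, lng2: number
): number {
  const toRad = (deg: number) => (deg * Math.PI) / 180;
  const dPhi = toRad(lat2 - lat1);
  const dLambda = toRad(lng2 - lng1);
  const s =
    Math.sin(dPhi / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLambda / 2) ** 2;
  return 2 * EARTH_RADIUS_MI * Math.asin(Math.sqrt(s));
}

// Tiered proximity score from distance to the venue.
function proximityScore(d: number): number {
  if (d < 100) return 1.0;
  if (d < 300) return 0.6;
  if (d < 600) return 0.25;
  return 0.0;
}

// Differential adjustment: only the proximity gap matters, max ±3%.
function locationAdjustment(dA: number, dB: number): number {
  return (proximityScore(dA) - proximityScore(dB)) * 0.03;
}
```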

5.2 Player Health Assessment

Injuries to key players significantly impact team performance, but reliable injury data is often unavailable or delayed. Golden Bracket uses a leader-absence heuristic: if a player who has historically led their team in a statistical category suddenly stops appearing in recent game leader data, they are likely injured or suspended.

Absence ratio for player j in category c:

a_{j,c} = \text{clamp}\!\left(1 - \frac{r_{j,c}}{e_{j,c}},\; 0,\; 1\right)

where r_{j,c} is the player's rate of appearing as a game leader in category c over the recent window and e_{j,c} is the expected leader rate established over the full season.

Category-weighted impact: Each category carries a weight reflecting its importance to team performance:

\text{impact}_{j,c} = a_{j,c} \times w_c \times \text{clamp}(1.5 \times \text{dominance}_{j,c},\; 0,\; 1)

| Category | Weight w_c |
|----------|------------|
| Points | 0.55 |
| Rebounds | 0.25 |
| Assists | 0.20 |

Team health score:

H_i = \text{clamp}\!\left(1 - \sum_{j,c} \text{impact}_{j,c},\; 0,\; 1\right)

Player status classification:

| Status | Condition |
|--------|-----------|
| likely_out | a_{j,c} ≥ 0.80 |
| questionable | 0.40 ≤ a_{j,c} < 0.80 |
| available | a_{j,c} < 0.40 |

Team-level status:

| Status | Condition |
|--------|-----------|
| healthy | H_i ≥ 0.95 |
| minor_concern | 0.80 ≤ H_i < 0.95 |
| degraded | 0.60 ≤ H_i < 0.80 |
| significant_concern | H_i < 0.60 |

Differential health adjustment:

\Delta_{\text{health}} = (H_A - H_B) \times 0.08

The ±8% maximum adjustment reflects analysis showing that the absence of a team's leading scorer can shift win probability by 10-15%. Graceful degradation: when either team has insufficient game leader data (fewer than 8 games), the adjustment is zero.
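A sketch of the health pipeline under these definitions (illustrative TypeScript; the LeaderAbsence shape is hypothetical, with the per-category ratios assumed precomputed upstream):

```typescript
interface LeaderAbsence {
  expected: number; // e_{j,c}: expected leader rate over the season
  recent: number; // r_{j,c}: leader rate in the recent window
  categoryWeight: number; // w_c: 0.55 points, 0.25 rebounds, 0.20 assists
  dominance: number; // dominance_{j,c} in [0, 1]
}

const clamp = (x: number, lo: number, hi: number): number =>
  Math.min(hi, Math.max(lo, x));

// H_i = clamp(1 - sum of category-weighted impacts, 0, 1).
function teamHealthScore(absences: LeaderAbsence[]): number {
  let totalImpact = 0;
  for (const a of absences) {
    const absenceRatio = clamp(1 - a.recent / a.expected, 0, 1);
    totalImpact +=
      absenceRatio * a.categoryWeight * clamp(1.5 * a.dominance, 0, 1);
  }
  return clamp(1 - totalImpact, 0, 1);
}

// Differential adjustment; |H_A - H_B| <= 1 bounds this at ±0.08.
const healthAdjustment = (hA: number, hB: number): number => (hA - hB) * 0.08;
```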

5.3 Seed-Line Historical Calibration

Pure model-based probabilities can be poorly calibrated against historical base rates. For example, 12-over-5 upsets historically occur ≈ 35% of the time, but many models systematically underpredict this rate.

We apply a Bayesian-style calibration that blends the model probability with the historical win rate for the specific seed matchup:

P_{\text{cal}} = (1 - \lambda) \cdot P_{\text{model}} + \lambda \cdot P_{\text{hist}}

where λ = 0.25 is the calibration strength. This is a linear opinion pool (Stone, 1961) with 75% weight on the model and 25% on the historical base rate.

Historical win rates (favored seed), aggregated from 2001-2025 NCAA tournaments:

| Matchup | Favored Win Rate | Upset Rate |
|---------|------------------|------------|
| 1 vs 16 | 99.3% | 0.7% |
| 2 vs 15 | 93.8% | 6.2% |
| 3 vs 14 | 85.3% | 14.7% |
| 4 vs 13 | 79.3% | 20.7% |
| 5 vs 12 | 64.9% | 35.1% |
| 6 vs 11 | 62.8% | 37.2% |
| 7 vs 10 | 60.7% | 39.3% |
| 8 vs 9 | 52.0% | 48.0% |

Later-round historical rates (Round of 32, Sweet 16, Elite 8) are also incorporated where sufficient data exists, including matchups such as 1 vs 8 (79.7%), 2 vs 7 (66.7%), 1 vs 4 (62.8%), and 1 vs 2 (53.8%).

The output is clamped to [0.01, 0.99] to maintain well-defined probabilities.

Rationale for λ = 0.25: The calibration is intentionally gentle. Too much weight on historical rates would override the model's team-specific analysis (e.g., a historically dominant 12-seed should not be dragged toward the generic 35% upset rate). The 25/75 blend nudges the model toward empirically validated base rates while preserving team-specific discriminative power.
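The blend itself is a one-liner once the historical table is keyed by seed pairing (illustrative TypeScript; the key format and the fall-back-to-model behavior for unknown pairings are our assumptions):

```typescript
// Round-of-64 favored-seed win rates from the table above.
const HIST_FAVORED_WIN_RATE: Record<string, number> = {
  "1v16": 0.993, "2v15": 0.938, "3v14": 0.853, "4v13": 0.793,
  "5v12": 0.649, "6v11": 0.628, "7v10": 0.607, "8v9": 0.52,
};

// P_cal = (1 - lambda) * P_model + lambda * P_hist, clamped to [0.01, 0.99].
function calibrate(
  pModel: number, // model probability that the favored seed wins
  favoredSeed: number,
  underdogSeed: number,
  lambda = 0.25
): number {
  const hist = HIST_FAVORED_WIN_RATE[`${favoredSeed}v${underdogSeed}`];
  if (hist === undefined) return pModel; // no base rate for this pairing
  const blended = (1 - lambda) * pModel + lambda * hist;
  return Math.min(0.99, Math.max(0.01, blended));
}
```

For instance, a model that likes a 5-seed at 80% gets pulled modestly toward the 64.9% historical rate rather than overridden by it.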


6. Prediction Market Ensemble

6.1 Market Data Integration

Prediction markets (e.g., Polymarket) aggregate diverse information — including injury news, lineup changes, and collective wisdom — that may not be captured by statistical models alone. Golden Bracket integrates market-derived win probabilities as an ensemble signal.

Two types of market data are ingested:

  1. Championship futures odds: Long-range probabilities that a team wins the entire tournament, converted to implied head-to-head probabilities via odds ratios.
  2. Direct matchup odds: When available, pairwise market probabilities for specific tournament games.

6.2 Linear Ensemble Blend

The final probability is a linear combination of the model-derived probability and the market-implied probability:

P_{\text{final}} = \alpha \cdot P_{\text{model}} + (1 - \alpha) \cdot P_{\text{market}}

where α = 0.70 (70% model, 30% market). The output is clamped to [0.01, 0.99].

6.3 Efficient Markets Hypothesis Justification

The 30% market weight reflects several considerations:

  1. Information aggregation: Prediction markets efficiently aggregate dispersed private information (Hayek, 1945; Wolfers and Zitzewitz, 2004). Market participants incorporate injury reports, travel conditions, and emotional factors that pure statistical models miss.
  2. Historical calibration: Prediction markets have been shown to be well-calibrated for sporting events — when a market assigns 70% probability, the event occurs approximately 70% of the time (Tetlock and Gardner, 2015).
  3. Complementarity: Markets and models fail in different ways. Markets can exhibit herding behavior and public-bias (e.g., overvaluing blue-blood programs), while models can miss qualitative factors. Blending exploits this complementarity.
  4. The 70/30 split: The model retains majority weight because (a) our model incorporates most of the same information markets use, and (b) market prices in the NCAA tournament can reflect recreational bettor biases.

6.4 Graceful Degradation

When market data is unavailable (API failure, no market exists for a matchup, or data is stale), the system falls back to model-only prediction with α_effective = 1.0. This ensures the system never produces null predictions:

P_{\text{final}} = \begin{cases} \alpha \cdot P_{\text{model}} + (1 - \alpha) \cdot P_{\text{market}} & \text{if market data available} \\ P_{\text{model}} & \text{otherwise} \end{cases}
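This fallback logic reduces to the following (illustrative TypeScript, with null standing in for unavailable or stale market data):

```typescript
// 70/30 model/market blend with model-only fallback; output in [0.01, 0.99].
function ensembleBlend(
  pModel: number,
  pMarket: number | null, // null when market data is unavailable or stale
  alpha = 0.7
): number {
  const p =
    pMarket === null ? pModel : alpha * pModel + (1 - alpha) * pMarket;
  return Math.min(0.99, Math.max(0.01, p));
}
```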

7. Monte Carlo Tournament Simulation

7.1 Simulation Architecture

To generate tournament-level forecasts (e.g., "probability of reaching the Final Four"), we simulate the entire bracket N = 10,000 times using Monte Carlo methods. Each iteration proceeds as follows:

  1. Initialize the bracket with all 64 teams in their seeded positions.
  2. For each round (Round of 64 through Championship):
     a. Compute the pairwise win probability P(A > B) for each matchup using the full pipeline (Log5 + matchup adjustments + location + seed calibration).
     b. Draw a Bernoulli random variable X ~ Bernoulli(P(A > B)) to determine the winner.
     c. Apply Bayesian in-simulation updates (Section 7.2).
  3. Record milestone achievements (Sweet 16, Elite 8, Final Four, Championship) for surviving teams.
  4. After all iterations, convert counts to percentages:
P_i(\text{milestone}) = \frac{\text{count}_i(\text{milestone})}{N} \times 100\%

7.2 Bayesian In-Simulation Learning

A key insight is that tournament results reveal information about team quality. If a 12-seed upsets a 5-seed, this outcome should update our belief about the 12-seed's strength for subsequent rounds within the same simulation path.

We implement a lightweight surprise-based Bayesian update. After team W defeats team L in a simulated game:

\text{surprise} = \frac{p_L}{p_W + p_L}

p_W^{\text{new}} = \text{clamp}\!\left(p_W + \eta \cdot (\text{surprise} - 0.5),\; 0.01,\; 0.99\right)

where η = 0.03 is the learning rate. The surprise term measures how unexpected the outcome was: it exceeds 0.5 exactly when the winner entered the game as the underdog (p_W < p_L), producing a positive update, while an expected win produces a small negative, regressive update.

Learning rate choice: η = 0.03 produces updates of at most ±0.015 per game, ensuring that (a) genuine Cinderella runs are rewarded with increasing survival probability, while (b) no single upset result can radically distort the remaining bracket.
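The update can be sketched as follows (illustrative TypeScript):

```typescript
// Surprise-based in-simulation update applied to the winner's strength.
function updateWinner(pW: number, pL: number, eta = 0.03): number {
  // surprise > 0.5 iff the winner entered the game as the underdog.
  const surprise = pL / (pW + pL);
  const updated = pW + eta * (surprise - 0.5);
  return Math.min(0.99, Math.max(0.01, updated)); // clamp to [0.01, 0.99]
}
```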

7.3 Deep-Copy Isolation

Each simulation iteration operates on an independent deep copy of the precomputed team data. This isolation guarantee ensures that Bayesian updates within one simulation path do not leak across iterations, preserving the statistical independence required for valid Monte Carlo estimation.

for each iteration i in 1..N:
    iterData ← deepCopy(basePrecomputed)  // isolated copy
    simulate bracket using iterData        // Bayesian updates modify iterData only
    record milestones

7.4 Round Detection and Milestone Tracking

The simulation tracks four milestone thresholds, mapped from the round index:

| Rounds from End | Round Name | Milestone |
|-----------------|------------|-----------|
| 3 | Sweet 16 | sweetSixteenPct |
| 2 | Elite 8 | eliteEightPct |
| 1 | Final Four | finalFourPct |
| 0 | Championship | championshipPct |

For a standard 64-team bracket (log_2(64) = 6 rounds), the Sweet 16 corresponds to round 3, the Elite 8 to round 4, the Final Four to round 5, and the Championship to round 6.


8. System Architecture and Implementation

8.1 Prediction Pipeline

The prediction API (/api/predict) executes the following sequential stages for a single matchup:

┌─────────────────────────────────────────────────────────┐
│ Stage 1: Data Acquisition (concurrent)                  │
│   ├── Fetch tournament teams + stats     [CBBD API]     │
│   ├── Fetch team schedules (×2)          [ESPN API]     │
│   └── Fetch market odds                  [Polymarket]   │
├─────────────────────────────────────────────────────────┤
│ Stage 2: Feature Engineering                            │
│   ├── Sanitize stats (NaN guards)                       │
│   ├── Calculate coaching scores (all teams)             │
│   ├── Calculate variance scores (target teams)          │
│   └── Build z-score normalization context               │
├─────────────────────────────────────────────────────────┤
│ Stage 3: Composite Rating                               │
│   └── 10-factor weighted z-score → R₁, R₂              │
├─────────────────────────────────────────────────────────┤
│ Stage 4: Log5 Base Probability                          │
│   └── P(A>B) = Log5(R₁/100, R₂/100)                   │
├─────────────────────────────────────────────────────────┤
│ Stage 5: Matchup Adjustments (+/- 5% cap)               │
│   ├── Tempo mismatch                                    │
│   ├── Rebounding edge                                   │
│   ├── Turnover battle                                   │
│   └── 3PT volatility penalty                            │
├─────────────────────────────────────────────────────────┤
│ Stage 6: Location Advantage (+/- 3% cap)                │
│   └── Haversine distance → proximity differential       │
├─────────────────────────────────────────────────────────┤
│ Stage 7: Health Assessment (+/- 8% cap)                 │
│   └── Leader-absence heuristic → health differential    │
├─────────────────────────────────────────────────────────┤
│ Stage 8: Seed Calibration (λ = 0.25)                    │
│   └── Bayesian blend with historical upset rates        │
├─────────────────────────────────────────────────────────┤
│ Stage 9: Market Ensemble (α = 0.70)                     │
│   └── Linear blend: 70% model + 30% market              │
├─────────────────────────────────────────────────────────┤
│ Stage 10: Output                                        │
│   └── P_final ∈ [0.01, 0.99], confidence label          │
└─────────────────────────────────────────────────────────┘

8.2 Concurrent Data Fetching

Data acquisition is parallelized using Promise.all to minimize latency.
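A minimal sketch of the concurrent stage (illustrative TypeScript; the fetcher names are hypothetical and stubbed here so the fragment is self-contained — the real handlers call the CBBD, ESPN, and Polymarket clients):

```typescript
// Hypothetical fetchers, stubbed for illustration.
const fetchTournamentTeams = async () => [{ id: "A" }, { id: "B" }];
const fetchSchedule = async (teamId: string) => ({ teamId, games: [] as number[] });
const fetchMarketOdds = async (): Promise<number> => 0.55;

async function acquireData(teamAId: string, teamBId: string) {
  // All four requests start immediately; the stage completes when the
  // slowest one resolves, not after the sum of all latencies.
  const [teams, scheduleA, scheduleB, marketOdds] = await Promise.all([
    fetchTournamentTeams(),
    fetchSchedule(teamAId),
    fetchSchedule(teamBId),
    fetchMarketOdds().catch(() => null), // graceful degradation (Section 6.4)
  ]);
  return { teams, scheduleA, scheduleB, marketOdds };
}
```

Note that the market fetch catches its own failure and resolves to null, so a Polymarket outage degrades to model-only prediction instead of rejecting the whole Promise.all.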

This design reduces the critical path from the sequential sum t_teams + t_stats + t_markets + 2 × t_schedule to approximately max(t_teams+stats, t_markets, t_schedule).

8.3 NaN Guards and Probability Clamping

The system employs defensive programming at every probability computation boundary:

  1. Input sanitization: All stat fields default to 0 when null/undefined.
  2. AdjEM recovery: Derived from AdjOE − AdjDE when the primary value is missing.
  3. Post-Log5 NaN check: If Log5 produces NaN (from degenerate inputs), the probability resets to 0.5.
  4. Pre-ensemble NaN check: A final guard before market blending catches any accumulated NaN.
  5. Universal clamping: All probabilities are clamped to [0.01, 0.99] after every additive adjustment.

8.4 Caching Architecture

| Data Source | Cache TTL | Rationale |
|-------------|-----------|-----------|
| Team statistics (CBBD) | 6 hours | Updated daily; stale data is acceptable within a game day |
| Market odds (Polymarket) | 30 minutes | Markets move continuously; shorter TTL captures line movement |
| Live scores (ESPN) | 5 minutes | Required for in-progress game awareness |

9. Validation and Limitations

9.1 Distributional Assumptions

The model makes several implicit distributional assumptions:

  1. Gaussian factor distribution: Z-score normalization assumes that each factor is approximately normally distributed across the tournament population. For metrics like AdjEM, this is well-supported empirically. For coaching scores (a discrete, heavily right-skewed distribution), the assumption is weaker.
  2. Exponential decay for recency: The e^{-λd} weighting assumes that information decays exponentially with time. This is a reasonable first-order approximation but may underweight the signal from early-season games against strong opponents.
  3. Linear blending: Both the seed calibration (λ = 0.25) and market ensemble (α = 0.70) use linear opinion pools rather than logarithmic pools or Bayesian model averaging. Linear pools are suboptimal when one source is consistently better-calibrated but offer simplicity and interpretability.

9.2 Independence Assumptions and Violations

The Monte Carlo simulation assumes:

  1. Game independence: The outcome of one game does not affect another, conditional on team ratings. This is violated by fatigue accumulation, momentum effects, and schedule density.
  2. Bernoulli outcomes: Each game is modeled as a single Bernoulli trial. In reality, game outcomes are influenced by within-game dynamics (foul trouble, pace control, late-game situations) that create correlated risk.
  3. Static team quality: Apart from the Bayesian in-simulation update (η = 0.03), team ratings are treated as fixed throughout the tournament. Injuries, suspensions, and "getting hot" are not dynamically modeled beyond the health heuristic.

9.3 Known Limitations

  1. No player-level modeling: The health assessment uses a leader-absence heuristic rather than individual player impact metrics (e.g., box plus-minus, win shares). A player's absence in game leader data may reflect a coaching decision rather than injury.
  2. Static coaching factor: The coaching score captures career-level pedigree but not in-season tactical adaptations, game-planning ability, or timeout usage patterns.
  3. Market bias propagation: Prediction markets for college basketball are influenced by public sentiment and fan loyalty, which can create systematic biases (e.g., overpricing traditional powerhouses). The 30% market weight partially imports these biases.
  4. Small-sample calibration: Later-round historical seed matchup rates (e.g., 1 vs 4 in the Sweet 16) are based on relatively few observations (n ≈ 25 per matchup), introducing calibration noise.
  5. No conference-level effects: The model does not account for systematic conference strength beyond what is captured by strength of schedule.

9.4 Calibration Analysis Framework

A proper calibration analysis would bin predicted probabilities into deciles and compare predicted vs. observed win rates (calibration curve). The expected calibration error (ECE) is:

\text{ECE} = \sum_{b=1}^{B} \frac{n_b}{N} \left| \bar{p}_b - \bar{y}_b \right|

where \bar{p}_b is the mean predicted probability in bin b and \bar{y}_b is the observed win rate in that bin. This analysis requires historical backtesting data that is reserved for future work.
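Once backtest data exists, the analysis reduces to a short routine (illustrative TypeScript; fixed-width probability bins are our assumption):

```typescript
// Expected calibration error over B fixed-width probability bins.
function expectedCalibrationError(
  predicted: number[], // p_i in [0, 1]
  observed: number[], // y_i in {0, 1}
  numBins = 10
): number {
  const bins = Array.from({ length: numBins }, () => ({ p: 0, y: 0, n: 0 }));
  predicted.forEach((p, i) => {
    const b = Math.min(numBins - 1, Math.floor(p * numBins)); // p = 1 edge case
    bins[b].p += p;
    bins[b].y += observed[i];
    bins[b].n += 1;
  });
  const total = predicted.length;
  // Weighted |mean predicted - observed rate| across non-empty bins.
  return bins.reduce(
    (ece, b) =>
      b.n > 0 ? ece + (b.n / total) * Math.abs(b.p / b.n - b.y / b.n) : ece,
    0
  );
}
```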


10. Conclusion and Future Work

Golden Bracket demonstrates that a multi-stage ensemble approach — combining efficiency metrics, pairwise comparison models, contextual adjustments, historical calibration, and market signals — can produce well-calibrated and informative NCAA tournament predictions. The system's modular architecture allows each component to be independently validated and improved.

Future Extensions

  1. Player-level impact modeling: Integrate individual player metrics (BPM, OBPM, usage rate) to replace the leader-absence heuristic with a proper player impact model.
  2. Dynamic α scheduling: Vary the model/market blend weight based on market liquidity, time-to-tipoff, and historical accuracy of each source for specific matchup types.
  3. Historical backtesting: Evaluate model performance against past tournaments (2015-2025) using log-loss, Brier score, and bracket scoring.
  4. Bayesian optimization of weights: Replace manual weight specification with Bayesian hyperparameter optimization over historical tournament data.
  5. Conference tournament integration: Use conference tournament results as a real-time signal for team quality updates in the days immediately preceding the NCAA tournament.
  6. Fatigue modeling: Account for back-to-back game effects and travel distance accumulation across rounds.

References

Bradley, R. A. and Terry, M. E. (1952). "Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons." Biometrika, 39(3/4), 324-345.

Hayek, F. A. (1945). "The Use of Knowledge in Society." American Economic Review, 35(4), 519-530.

James, B. (1981). The Bill James Baseball Abstract. Self-published. (Origin of the Log5 method.)

Miller, S. J. (2006). "A Derivation of the Log5 Formula in Baseball and Other Sports." arXiv:math/0609157.

Oliver, D. (2004). Basketball on Paper: Rules and Tools for Performance Analysis. Potomac Books.

Pomeroy, K. (2002-present). "KenPom.com: College Basketball Ratings." https://kenpom.com.

Stone, M. (1961). "The Opinion Pool." Annals of Mathematical Statistics, 32(4), 1339-1342.

Tetlock, P. E. and Gardner, D. (2015). Superforecasting: The Art and Science of Prediction. Crown.

Wolfers, J. and Zitzewitz, E. (2004). "Prediction Markets." Journal of Economic Perspectives, 18(2), 107-126.


Appendix A: Complete Weight Vector

| Factor | Key | Weight | Raw Factor Formula | Data Source |
|---|---|---|---|---|
| Adj. Efficiency Margin | adjEM | 0.32 | AdjEM (pre-computed) | CBBD |
| Offensive eFG% | oEFG | 0.05 | oEFG | CBBD |
| Defensive eFG% | dEFG | 0.05 | -dEFG (negated: lower is better) | CBBD |
| Turnover Differential | tov | 0.07 | -oTOV + dTOV | CBBD |
| Rebounding Differential | orb | 0.06 | oORB - dORB | CBBD |
| Free Throw Rate Diff. | ftr | 0.05 | oFTR - dFTR | CBBD |
| Strength of Schedule | sos | 0.10 | SOS (pre-computed) | CBBD |
| Recent Form | recentForm | 0.10 | Exponential decay weighted margin (Section 3.5.3) | ESPN |
| Coaching Pedigree | coaching | 0.10 | Tiered scoring (Section 3.5.1) | Manual |
| Performance Variance | variance | 0.10 | -σ(margins) (Section 3.5.2) | ESPN |
| Total | | 1.00 | | |
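The ten weights above combine normalized factor values into a single composite score by weighted sum. A minimal sketch, assuming factors are already normalized and sign conventions (e.g., the negated dEFG) are applied upstream; the exact signature of `calculateCompositeRating` in lib/algorithm/composite-rating.ts is an assumption.

```typescript
// Weight vector from Appendix A; keys mirror the table's Key column.
const WEIGHTS: Record<string, number> = {
  adjEM: 0.32, oEFG: 0.05, dEFG: 0.05, tov: 0.07, orb: 0.06,
  ftr: 0.05, sos: 0.10, recentForm: 0.10, coaching: 0.10, variance: 0.10,
};

// Composite score as the weighted sum of normalized factor values.
// Missing factors contribute 0, degrading gracefully when a data
// source (ESPN, CBBD, manual entry) is unavailable.
function compositeRating(factors: Record<string, number>): number {
  let score = 0;
  for (const [key, w] of Object.entries(WEIGHTS)) {
    score += w * (factors[key] ?? 0);
  }
  return score;
}
```

Because the weights sum to 1.00, a team with all factors normalized to 1 scores exactly 1, which keeps composite ratings on the same scale as the individual factors.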

Appendix B: Historical Seed Upset Rates

Round of 64

| Matchup | Favored Seed Win Rate | Upset Rate | Key | n (approx.) |
|---|---|---|---|---|
| 1 vs 16 | 0.993 | 0.007 | 1-16 | 100 |
| 2 vs 15 | 0.938 | 0.062 | 2-15 | 100 |
| 3 vs 14 | 0.853 | 0.147 | 3-14 | 100 |
| 4 vs 13 | 0.793 | 0.207 | 4-13 | 100 |
| 5 vs 12 | 0.649 | 0.351 | 5-12 | 100 |
| 6 vs 11 | 0.628 | 0.372 | 6-11 | 100 |
| 7 vs 10 | 0.607 | 0.393 | 7-10 | 100 |
| 8 vs 9 | 0.520 | 0.480 | 8-9 | 100 |

Round of 32

| Matchup | Favored Seed Win Rate | Key |
|---|---|---|
| 1 vs 8 | 0.797 | 1-8 |
| 1 vs 9 | 0.838 | 1-9 |
| 2 vs 7 | 0.667 | 2-7 |
| 2 vs 10 | 0.618 | 2-10 |
| 3 vs 6 | 0.571 | 3-6 |
| 3 vs 11 | 0.577 | 3-11 |
| 4 vs 5 | 0.545 | 4-5 |
| 4 vs 12 | 0.591 | 4-12 |

Sweet 16 and Beyond

| Matchup | Favored Seed Win Rate | Key |
|---|---|---|
| 1 vs 4 | 0.628 | 1-4 |
| 1 vs 5 | 0.714 | 1-5 |
| 2 vs 3 | 0.535 | 2-3 |
| 1 vs 12 | 0.750 | 1-12 |
| 2 vs 6 | 0.600 | 2-6 |
| 2 vs 11 | 0.647 | 2-11 |
| 1 vs 2 | 0.538 | 1-2 |
| 1 vs 3 | 0.583 | 1-3 |
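The keyed rates above suggest how a seed-line calibration lookup could work. A minimal sketch with an illustrative subset of the table; the 50/50 blend weight, the function signature, and the fallback behavior are assumptions, and the actual `applySeedCalibration` in lib/algorithm/seed-calibration.ts may, for example, weight by sample size instead.

```typescript
// Illustrative subset of Appendix B, keyed "favoredSeed-underdogSeed".
const SEED_WIN_RATES: Record<string, number> = {
  "1-16": 0.993, "5-12": 0.649, "8-9": 0.520, "1-8": 0.797,
};

// Blend a model probability toward the historical seed-line base rate.
function seedCalibrated(
  modelProb: number, // model's P(favored seed wins)
  favSeed: number,
  dogSeed: number,
  blend = 0.5,       // illustrative weight on the model probability
): number {
  const base = SEED_WIN_RATES[`${favSeed}-${dogSeed}`];
  if (base === undefined) return modelProb; // no history: pass through
  return blend * modelProb + (1 - blend) * base;
}
```

Pulling model probabilities toward historical base rates tempers overconfident matchup estimates, at the cost of the small-sample calibration noise noted in Section 9.3.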

Appendix C: Source Code Reference

| Algorithm Component | Source File | Key Exports |
|---|---|---|
| Composite Rating (10-factor) | lib/algorithm/composite-rating.ts | calculateCompositeRating, WEIGHTS, getContextualWeights |
| Log5 Pairwise Model | lib/algorithm/log5.ts | log5 |
| Four Factors Derivation | lib/algorithm/four-factors.ts | calculateFourFactors |
| Matchup Adjustments | lib/algorithm/matchup.ts | calculateMatchupAdjustment |
| Location Advantage | lib/algorithm/location.ts | calculateLocationAdvantage, haversineDistance, proximityAdvantage |
| Health Assessment | lib/algorithm/health.ts | assessTeamHealth, calculateHealthAdjustment |
| Seed-Line Calibration | lib/algorithm/seed-calibration.ts | applySeedCalibration, HISTORICAL_WIN_RATES |
| Recency Weighting | lib/algorithm/recency.ts | calculateRecentForm, getDecayWeight |
| Market Ensemble Blend | lib/algorithm/ensemble.ts | ensembleBlend |
| Monte Carlo Simulation | lib/algorithm/monte-carlo.ts | simulateBracket, bayesianUpdate |
| Prediction API Pipeline | app/api/predict/route.ts | POST handler (orchestrates full pipeline) |
| Algorithm Type Definitions | lib/algorithm/types.ts | CompositeFactors, CompositeWeights, TeamRating, SimulationResult |