Golden Bracket: A Multi-Stage Ensemble Approach to NCAA Tournament Prediction

Golden Bracket Research Team

March 2026


Abstract

We present Golden Bracket, a multi-stage ensemble system for predicting NCAA Division I Men's Basketball Tournament outcomes. The system integrates efficiency-based team ratings, pairwise comparison via the Log5 model, contextual adjustments for geography, player health, and matchup dynamics, historical seed-line calibration, and prediction market signals into a unified probabilistic framework. Tournament-level forecasts are generated through Monte Carlo simulation with Bayesian in-simulation learning. We describe each algorithmic component with full mathematical formulation, discuss design decisions grounded in basketball analytics literature, and outline the system architecture that enables real-time prediction serving. The approach achieves principled uncertainty quantification by blending model-derived probabilities with efficient-market signals, while maintaining graceful degradation when data sources are unavailable.


1. Introduction

1.1 Problem Statement

The NCAA Division I Men's Basketball Tournament — colloquially known as "March Madness" — is a 68-team tournament that has captivated American sports culture since 1939. Its single-elimination format, combined with the inherent variance in college basketball, makes it one of the most challenging prediction problems in sports analytics. Warren Buffett's famous billion-dollar bracket challenge underscores the near-impossibility of a perfect bracket: the naive probability of correctly predicting all 63 games in the main draw is approximately 1 in 2^63 ≈ 9.2 × 10^18.

The prediction task can be decomposed into two levels: (1) pairwise matchup probability estimation — given two teams, what is the probability that team A defeats team B? — and (2) tournament simulation — given a bracket of 64 teams, what is the probability distribution over all possible outcomes?

1.2 Related Work

Several prominent systems address this problem, most notably Ken Pomeroy's adjusted-efficiency ratings (Pomeroy, 2002-present), whose tempo-free efficiency framework underpins the team-quality metrics used throughout this work.

1.3 Contributions

Golden Bracket makes the following contributions:

  1. A ten-factor composite rating with z-score normalization that addresses multicollinearity between efficiency metrics (Section 3).
  2. Contextual weight adjustment that amplifies intangible factors in close matchups where the talent gap is small (Section 3.4).
  3. A player health proxy using game-leader absence detection as a heuristic for injury impact (Section 5.2).
  4. A Bayesian in-simulation learning mechanism that updates team ratings within each Monte Carlo iteration based on the information revealed by simulated tournament results (Section 7.2).
  5. Market ensemble blending with graceful degradation, combining model-derived probabilities with prediction market signals (Section 6).

2. Data Sources and Feature Engineering

2.1 Primary Data Sources

Golden Bracket ingests data from three complementary sources:

| Source | Data Type | Update Frequency | Cache TTL |
|--------|-----------|------------------|-----------|
| CollegeBasketballData (CBBD) | Adjusted efficiency metrics, Four Factors | Daily | 6 hours |
| ESPN API | Real-time scores, schedules, game leaders, box scores | Per-game | 5 minutes |
| Polymarket | Championship odds, matchup-level prediction markets | Continuous | 30 minutes |

2.2 Dean Oliver's Four Factors

The system builds on Dean Oliver's "Four Factors of Basketball Success" (Oliver, 2004), which decompose team performance into four orthogonal dimensions. For each factor, we compute both offensive and defensive variants:

Effective Field Goal Percentage (eFG%):

\text{eFG\%} = \frac{\text{FGM} + 0.5 \times \text{3PM}}{\text{FGA}}

This weights three-point field goals at 1.5x to reflect their higher point value per attempt.

Turnover Rate (TOV%):

\text{TOV\%} = \frac{\text{TOV}}{\text{FGA} + 0.44 \times \text{FTA} + \text{TOV}}

The denominator approximates possessions. The coefficient 0.44 on free throw attempts accounts for and-one plays, technical free throws, and three-shot fouls, which do not each consume a full possession.

Offensive Rebounding Percentage (ORB%):

\text{ORB\%} = \frac{\text{ORB}}{\text{ORB} + \text{Opponent DRB}}

Free Throw Rate (FTR):

\text{FTR} = \frac{\text{FTA}}{\text{FGA}}

This measures a team's ability to get to the free throw line relative to its field goal attempts.

2.3 Feature Extraction Pipeline

Raw statistics undergo a sanitization pipeline before entering the model:

  1. Null/NaN coercion: All numeric fields default to 0 when missing.
  2. Derived field recovery: When AdjEM is null but AdjOE and AdjDE are available, we compute AdjEM = AdjOE − AdjDE.
  3. Cross-source reconciliation: Team identifiers are mapped across ESPN, CBBD, and market data using a canonical ID registry.
  4. Caching: Tiered cache TTLs (6h/30min/5min) balance freshness against API rate limits.

3. Composite Team Rating Model

3.1 Ten-Factor Weighted Z-Score Formulation

Each team is assigned a composite rating R_i computed as a weighted sum of z-score-normalized factors across K = 10 dimensions. Let \mathbf{f}_i = (f_{i,1}, \ldots, f_{i,K}) denote the raw factor vector for team i and let \mathcal{T} be the set of all tournament teams. The composite rating is:

R_i = \sum_{k=1}^{K} w_k \cdot z_{i,k}

where the z-score for factor k is:

z_{i,k} = \frac{f_{i,k} - \mu_k}{\sigma_k}

with population statistics computed over the tournament field:

\mu_k = \frac{1}{|\mathcal{T}|} \sum_{j \in \mathcal{T}} f_{j,k}, \qquad \sigma_k = \sqrt{\frac{1}{|\mathcal{T}|} \sum_{j \in \mathcal{T}} (f_{j,k} - \mu_k)^2}

The composite z-score is then mapped to a [0, 100] scale via:

R_i^{\text{scaled}} = \frac{\text{clamp}(R_i, -3, 3) + 3}{6} \times 100
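The normalization and scaling steps above can be sketched as follows (an illustrative TypeScript fragment, not the production implementation; the factor ordering and the guard for a zero-variance factor are our own assumptions):

```typescript
type FactorVector = number[]; // one raw value per factor, fixed order

const clamp = (x: number, lo: number, hi: number): number =>
  Math.min(hi, Math.max(lo, x));

// Population (not sample) mean and standard deviation over the field.
function populationStats(values: number[]): { mu: number; sigma: number } {
  const mu = values.reduce((a, b) => a + b, 0) / values.length;
  const variance =
    values.reduce((a, v) => a + (v - mu) ** 2, 0) / values.length;
  return { mu, sigma: Math.sqrt(variance) };
}

// R_i = sum_k w_k * z_{i,k}, clamped to [-3, 3] and mapped to [0, 100].
function compositeRating(
  team: FactorVector,
  field: FactorVector[], // all tournament teams, same factor order
  weights: number[]
): number {
  let r = 0;
  for (let k = 0; k < weights.length; k++) {
    const { mu, sigma } = populationStats(field.map((f) => f[k]));
    const z = sigma > 0 ? (team[k] - mu) / sigma : 0; // degenerate-factor guard
    r += weights[k] * z;
  }
  return ((clamp(r, -3, 3) + 3) / 6) * 100;
}
```

A team sitting exactly at the field mean on every factor lands at 50 on the scaled axis, which matches the intended interpretation of the midpoint.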

3.2 Weight Vector

The weight vector w was derived through iterative analysis addressing multicollinearity between efficiency metrics:

| Factor k | Symbol | Weight w_k | Description |
|----------|--------|------------|-------------|
| 1 | adjEM | 0.32 | Adjusted efficiency margin (primary quality signal) |
| 2 | oEFG | 0.05 | Offensive effective FG% |
| 3 | dEFG | 0.05 | Defensive effective FG% (negated: lower is better) |
| 4 | tov | 0.07 | Turnover differential (−oTOV + dTOV) |
| 5 | orb | 0.06 | Offensive rebounding differential (oORB − dORB) |
| 6 | ftr | 0.05 | Free throw rate differential (oFTR − dFTR) |
| 7 | sos | 0.10 | Strength of schedule |
| 8 | recentForm | 0.10 | Recency-weighted performance (Section 3.5.3) |
| 9 | coaching | 0.10 | Coaching tournament pedigree (Section 3.5.1) |
| 10 | variance | 0.10 | Performance consistency (Section 3.5.2) |

Multicollinearity analysis. Adjusted efficiency margin (AdjEM) is, by construction, highly correlated with the Four Factors (eFG%, TOV%, ORB%, FTR). Naively weighting all factors equally double-counts shared variance. Our approach assigns AdjEM a dominant weight (w_1 = 0.32) as the primary quality signal while reducing oEFG and dEFG from an initial 0.09 each to 0.05, preserving their marginal discriminative power without redundancy. The freed weight budget (0.08) is redistributed to the intangible factors (coaching, variance) that capture orthogonal information.

Constraint: \sum_{k=1}^{K} w_k = 1.0.

3.3 Z-Score Normalization Across Tournament Population

Z-score normalization is performed over the full tournament population \mathcal{T} (typically 68 teams) rather than the entire D-I population (approximately 360 teams). This design choice reflects the fact that tournament teams constitute a truncated distribution — comparing a 1-seed's efficiency margin to the D-I average would compress the meaningful range. By normalizing within the tournament field, we maximize the discriminative power of each factor among the teams that actually compete.

3.4 Contextual Weight Adjustment

In closely seeded matchups where the talent gap is small, intangible factors (coaching experience, recent momentum, consistency) become relatively more decisive. We model this with a smooth boost function:

\beta(\Delta s) = \max\!\left(0,\; 1 - \frac{|\Delta s|}{8}\right) \times 0.5

where Δs = s_A − s_B is the seed differential. For the intangible factor indices k ∈ {coaching, recentForm, variance}:

w_k^{*} = w_k \cdot (1 + \beta(\Delta s))

All weights are then renormalized:

\hat{w}_k = \frac{w_k^{*}}{\sum_{j=1}^{K} w_j^{*}}

This yields a maximum 50% boost to intangible weights for equal seeds (e.g., 8 vs. 9), tapering linearly to zero for seed differentials ≥ 8.
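A minimal sketch of this boost-and-renormalize step (illustrative TypeScript; the function and parameter names are ours):

```typescript
// Boost intangible weights in close seed matchups, then renormalize so
// the adjusted weights again sum to 1.
function adjustWeights(
  weights: number[],
  intangibleIdx: number[], // positions of coaching, recentForm, variance
  seedA: number,
  seedB: number
): number[] {
  // beta(Δs) = max(0, 1 - |Δs|/8) * 0.5, i.e. up to +50% for equal seeds.
  const beta = Math.max(0, 1 - Math.abs(seedA - seedB) / 8) * 0.5;
  const boosted = weights.map((w, k) =>
    intangibleIdx.includes(k) ? w * (1 + beta) : w
  );
  const total = boosted.reduce((a, b) => a + b, 0);
  return boosted.map((w) => w / total); // renormalize to sum to 1
}
```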

3.5 Sub-Models

3.5.1 Coaching Pedigree Score

The coaching score C quantifies a coach's historical tournament success:

C = C_{\text{FF}} + C_{\text{champ}} + C_{\text{apps}} + C_{\text{penalty}}

where:

C_{\text{FF}} = \begin{cases} 3.0 & \text{if FF appearances} \geq 4 \\ 2.0 & \text{if FF appearances} \geq 2 \\ 1.0 & \text{if FF appearances} \geq 1 \\ 0 & \text{otherwise} \end{cases}

C_{\text{apps}} = \begin{cases} 1.5 & \text{if appearances} \geq 15 \\ 1.0 & \text{if appearances} \geq 8 \\ 0.5 & \text{if appearances} \geq 3 \\ 0 & \text{otherwise} \end{cases}

The raw coaching score (range approximately [−1, 7]) is z-score normalized against the tournament population alongside all other factors.

3.5.2 Performance Variance

Performance consistency is measured as the negative standard deviation of game-by-game scoring margins:

V_i = -\sqrt{\frac{1}{n}\sum_{g=1}^{n}(m_g - \bar{m})^2}

where m_g = score_g − oppScore_g is the margin for game g and \bar{m} is the mean margin. Negation ensures that lower variance (more consistent teams) yields a higher factor value after z-score normalization. A minimum of 3 games is required; teams with insufficient data receive V_i = 0.

3.5.3 Recency-Weighted Form

Recent performance is weighted using an exponential decay kernel with a 30-day half-life:

F_i = \frac{\sum_{g=1}^{n} \tilde{m}_g \cdot e^{-\lambda \cdot d_g}}{\sum_{g=1}^{n} e^{-\lambda \cdot d_g}}

where \tilde{m}_g is the scoring margin of game g, d_g is the number of days elapsed since game g was played, and λ = ln(2)/30 implements the 30-day half-life.

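Under the assumption that the stated 30-day half-life fixes λ = ln(2)/30, the recency weighting can be sketched as follows (illustrative TypeScript; the Game shape is hypothetical):

```typescript
interface Game {
  margin: number; // score - oppScore for this game
  daysAgo: number; // days elapsed since the game was played
}

// Exponentially weighted mean margin: weight halves every `halfLifeDays`.
function recencyWeightedForm(games: Game[], halfLifeDays = 30): number {
  const lambda = Math.log(2) / halfLifeDays;
  let num = 0;
  let den = 0;
  for (const g of games) {
    const w = Math.exp(-lambda * g.daysAgo);
    num += g.margin * w;
    den += w;
  }
  return den > 0 ? num / den : 0; // empty schedule falls back to 0
}
```

A win from today counts twice as much as an identical win from 30 days ago, which is exactly the half-life semantics described above.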

4. Pairwise Matchup Probability

4.1 Log5 Formula

Golden Bracket uses the Log5 method (James, 1981; Miller, 2006) to convert team ratings into pairwise win probabilities. Given team A's latent strength p_A (expressed as a probability of beating an average team) and team B's strength p_B, the probability that A defeats B is:

P(A > B) = \frac{p_A(1 - p_B)}{p_A(1 - p_B) + p_B(1 - p_A)}

This is equivalent to the Bradley-Terry model (Bradley and Terry, 1952) under the transformation π_i = p_i / (1 − p_i):

P(A > B) = \frac{\pi_A}{\pi_A + \pi_B}

Implementation detail: Input probabilities are clamped to [0.01, 0.99] to prevent division by zero and degenerate outputs:

p_A' = \text{clamp}(p_A, 0.01, 0.99), \qquad p_B' = \text{clamp}(p_B, 0.01, 0.99)
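The clamped Log5 computation is small enough to state in full (illustrative TypeScript):

```typescript
// Clamp a probability into the safe operating range [0.01, 0.99].
const clampP = (p: number): number => Math.min(0.99, Math.max(0.01, p));

// Log5: P(A beats B) from each team's strength vs. an average opponent.
function log5(pA: number, pB: number): number {
  const a = clampP(pA);
  const b = clampP(pB);
  return (a * (1 - b)) / (a * (1 - b) + b * (1 - a));
}
```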

4.2 Mathematical Properties

The Log5 formula satisfies several desirable properties:

  1. Symmetry: P(A > B) + P(B > A) = 1
  2. Monotonicity: P(A > B) is strictly increasing in p_A and strictly decreasing in p_B
  3. Identity: When p_A = p_B, P(A > B) = 0.5
  4. Non-transitivity: The model does not guarantee transitivity — P(A > B) > 0.5 and P(B > C) > 0.5 do not imply P(A > C) > 0.5 when matchup adjustments are applied

4.3 Post-Hoc Matchup Adjustments

After computing the base Log5 probability, we apply additive adjustments for four matchup-specific factors:

Tempo Mismatch

When one team plays at an extremely fast pace and the opponent plays extremely slowly, the faster team gains an advantage by imposing their preferred style:

\Delta_{\text{tempo}} = \begin{cases} +0.02 & \text{if } \text{tempo}_A > 70 \text{ and } \text{tempo}_B < 64 \\ -0.02 & \text{if } \text{tempo}_B > 70 \text{ and } \text{tempo}_A < 64 \\ 0 & \text{otherwise} \end{cases}

The thresholds (70 = top-20 fast, 64 = bottom-20 slow) are derived from the empirical distribution of D-I tempos.

Rebounding Edge

Significant offensive rebounding differentials (|ORB%_A − ORB%_B| > 5%) produce second-chance opportunities that compound over a 40-minute game:

\Delta_{\text{orb}} = \text{sgn}(\text{ORB\%}_A - \text{ORB\%}_B) \times 0.015 \quad \text{if } |\text{ORB\%}_A - \text{ORB\%}_B| > 0.05

Turnover Battle

Teams with significantly lower turnover rates (|TOV%_B − TOV%_A| > 3%) enjoy extra possessions:

\Delta_{\text{tov}} = \text{sgn}(\text{TOV\%}_B - \text{TOV\%}_A) \times 0.015 \quad \text{if } |\text{TOV\%}_B - \text{TOV\%}_A| > 0.03

Note the sign convention: lower TOV% is better for the offense, so the comparison subtracts team A's rate from team B's.

Three-Point Volatility Penalty

Teams heavily dependent on three-point shooting face elevated tournament variance. A three-point dependency index is computed as:

D_{3} = \frac{\text{eFG\%} - \text{FG\%}}{0.5 \times \text{3P\%}}

Teams with D_3 > 0.40 receive a penalty Δ_3 = −0.01.

4.4 Adjustment Cap

The total matchup adjustment for each team is capped at ±5% to prevent extreme distortions:

\Delta_{\text{total}} = \text{clamp}\!\left(\Delta_{\text{tempo}} + \Delta_{\text{orb}} + \Delta_{\text{tov}} + \Delta_3,\; -0.05,\; 0.05\right)

The adjusted probability is:

P'(A > B) = \text{clamp}\!\left(P(A > B) + \Delta_{\text{total}}^A + \Delta_{\text{total}}^B,\; 0.01,\; 0.99\right)
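Putting Sections 4.3 and 4.4 together, one team's capped adjustment can be sketched as follows (illustrative TypeScript; the MatchupStats field names are hypothetical):

```typescript
interface MatchupStats {
  tempo: number; // possessions per 40 minutes
  orbPct: number; // offensive rebounding percentage, 0-1
  tovPct: number; // turnover rate, 0-1
  threeDependency: number; // D_3 three-point dependency index
}

const clamp = (x: number, lo: number, hi: number): number =>
  Math.min(hi, Math.max(lo, x));

// Signed adjustment from team A's perspective, capped at ±5%.
function matchupAdjustment(a: MatchupStats, b: MatchupStats): number {
  let delta = 0;
  // Tempo mismatch: a very fast team imposes its style on a very slow one.
  if (a.tempo > 70 && b.tempo < 64) delta += 0.02;
  else if (b.tempo > 70 && a.tempo < 64) delta -= 0.02;
  // Rebounding edge beyond a 5-point ORB% gap.
  if (Math.abs(a.orbPct - b.orbPct) > 0.05)
    delta += Math.sign(a.orbPct - b.orbPct) * 0.015;
  // Turnover battle: lower TOV% is better, so compare B's rate minus A's.
  if (Math.abs(b.tovPct - a.tovPct) > 0.03)
    delta += Math.sign(b.tovPct - a.tovPct) * 0.015;
  // Three-point volatility penalty.
  if (a.threeDependency > 0.4) delta -= 0.01;
  return clamp(delta, -0.05, 0.05);
}
```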

5. Contextual Probability Modifiers

5.1 Geographic Location Advantage

Tournament games are played at neutral-site venues, but teams playing closer to the venue benefit from crowd support and reduced travel fatigue. We model this using the Haversine formula and a tiered proximity scoring function.

Haversine distance between two geographic coordinates (lat_1, lng_1) and (lat_2, lng_2):

d = 2R \cdot \arcsin\!\left(\sqrt{\sin^2\!\left(\frac{\Delta\phi}{2}\right) + \cos\phi_1 \cos\phi_2 \sin^2\!\left(\frac{\Delta\lambda}{2}\right)}\right)

where R = 3958.8 miles (Earth's radius), Δφ = φ_2 − φ_1, and Δλ = λ_2 − λ_1, with all angles in radians.

Proximity scoring maps distance to a discrete advantage score:

\text{prox}(d) = \begin{cases} 1.00 & \text{if } d < 100 \text{ mi (strong home-crowd effect)} \\ 0.60 & \text{if } 100 \leq d < 300 \text{ mi (regional advantage)} \\ 0.25 & \text{if } 300 \leq d < 600 \text{ mi (slight edge)} \\ 0.00 & \text{if } d \geq 600 \text{ mi (neutral)} \end{cases}

Differential adjustment: The location advantage is inherently relative — only the difference in proximity matters:

\Delta_{\text{loc}} = (\text{prox}(d_A) - \text{prox}(d_B)) \times 0.03

The maximum location adjustment is ±3%, reflecting the empirical finding that neutral-site effects in college basketball are smaller than true home-court advantage (≈ 3.5%) but non-negligible.
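The distance, proximity, and differential steps can be sketched as follows (illustrative TypeScript):

```typescript
const EARTH_RADIUS_MI = 3958.8;

// Great-circle distance in miles via the Haversine formula.
function haversineMiles(
  lat1: number, lng1: number, lat2: number, lng2: number
): number {
  const toRad = (deg: number) => (deg * Math.PI) / 180;
  const dPhi = toRad(lat2 - lat1);
  const dLambda = toRad(lng2 - lng1);
  const s =
    Math.sin(dPhi / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLambda / 2) ** 2;
  return 2 * EARTH_RADIUS_MI * Math.asin(Math.sqrt(s));
}

// Tiered proximity score from distance to the venue.
function proximityScore(d: number): number {
  if (d < 100) return 1.0;
  if (d < 300) return 0.6;
  if (d < 600) return 0.25;
  return 0.0;
}

// Differential adjustment: only the proximity gap matters, max ±3%.
function locationAdjustment(dA: number, dB: number): number {
  return (proximityScore(dA) - proximityScore(dB)) * 0.03;
}
```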

5.2 Player Health Assessment

Injuries to key players significantly impact team performance, but reliable injury data is often unavailable or delayed. Golden Bracket uses a leader-absence heuristic: if a player who has historically led their team in a statistical category suddenly stops appearing in recent game leader data, they are likely injured or suspended.

Absence ratio for player j in category c:

a_{j,c} = \text{clamp}\!\left(1 - \frac{r_{j,c}}{e_{j,c}},\; 0,\; 1\right)

where r_{j,c} is the player's rate of appearing as a game leader in category c over the recent window and e_{j,c} is the expected leader rate established over the full season.

Category-weighted impact: Each category carries a weight reflecting its importance to team performance:

\text{impact}_{j,c} = a_{j,c} \times w_c \times \text{clamp}(1.5 \times \text{dominance}_{j,c},\; 0,\; 1)

| Category | Weight w_c |
|----------|------------|
| Points | 0.55 |
| Rebounds | 0.25 |
| Assists | 0.20 |

Team health score:

H_i = \text{clamp}\!\left(1 - \sum_{j,c} \text{impact}_{j,c},\; 0,\; 1\right)

Player status classification:

| Status | Condition |
|--------|-----------|
| likely_out | a_{j,c} ≥ 0.80 |
| questionable | 0.40 ≤ a_{j,c} < 0.80 |
| available | a_{j,c} < 0.40 |

Team-level status:

| Status | Condition |
|--------|-----------|
| healthy | H_i ≥ 0.95 |
| minor_concern | 0.80 ≤ H_i < 0.95 |
| degraded | 0.60 ≤ H_i < 0.80 |
| significant_concern | H_i < 0.60 |

Differential health adjustment:

\Delta_{\text{health}} = (H_A - H_B) \times 0.08

The ±8% maximum adjustment reflects analysis showing that the absence of a team's leading scorer can shift win probability by 10-15%. Graceful degradation: when either team has insufficient game leader data (fewer than 8 games), the adjustment is zero.
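A sketch of the health pipeline under these definitions (illustrative TypeScript; the LeaderAbsence shape is hypothetical, with the per-category ratios assumed precomputed upstream):

```typescript
interface LeaderAbsence {
  expected: number; // e_{j,c}: expected leader rate over the season
  recent: number; // r_{j,c}: leader rate in the recent window
  categoryWeight: number; // w_c: 0.55 points, 0.25 rebounds, 0.20 assists
  dominance: number; // dominance_{j,c} in [0, 1]
}

const clamp = (x: number, lo: number, hi: number): number =>
  Math.min(hi, Math.max(lo, x));

// H_i = clamp(1 - sum of category-weighted impacts, 0, 1).
function teamHealthScore(absences: LeaderAbsence[]): number {
  let totalImpact = 0;
  for (const a of absences) {
    const absenceRatio = clamp(1 - a.recent / a.expected, 0, 1);
    totalImpact +=
      absenceRatio * a.categoryWeight * clamp(1.5 * a.dominance, 0, 1);
  }
  return clamp(1 - totalImpact, 0, 1);
}

// Differential adjustment; |H_A - H_B| <= 1 bounds this at ±0.08.
const healthAdjustment = (hA: number, hB: number): number => (hA - hB) * 0.08;
```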

5.3 Seed-Line Historical Calibration

Pure model-based probabilities can be poorly calibrated against historical base rates. For example, 12-over-5 upsets historically occur ≈ 35% of the time, but many models systematically underpredict this rate.

We apply a Bayesian-style calibration that blends the model probability with the historical win rate for the specific seed matchup:

P_{\text{cal}} = (1 - \lambda) \cdot P_{\text{model}} + \lambda \cdot P_{\text{hist}}

where λ = 0.25 is the calibration strength. This is a linear opinion pool (Stone, 1961) with 75% weight on the model and 25% on the historical base rate.

Historical win rates (favored seed), aggregated from 2001-2025 NCAA tournaments:

| Matchup | Favored Win Rate | Upset Rate |
|---------|------------------|------------|
| 1 vs 16 | 99.3% | 0.7% |
| 2 vs 15 | 93.8% | 6.2% |
| 3 vs 14 | 85.3% | 14.7% |
| 4 vs 13 | 79.3% | 20.7% |
| 5 vs 12 | 64.9% | 35.1% |
| 6 vs 11 | 62.8% | 37.2% |
| 7 vs 10 | 60.7% | 39.3% |
| 8 vs 9 | 52.0% | 48.0% |

Later-round historical rates (Round of 32, Sweet 16, Elite 8) are also incorporated where sufficient data exists, including matchups such as 1 vs 8 (79.7%), 2 vs 7 (66.7%), 1 vs 4 (62.8%), and 1 vs 2 (53.8%).

The output is clamped to [0.01, 0.99] to maintain well-defined probabilities.

Rationale for λ = 0.25: The calibration is intentionally gentle. Too much weight on historical rates would override the model's team-specific analysis (e.g., a historically dominant 12-seed should not be dragged toward the generic 35% upset rate). The 25/75 blend nudges the model toward empirically validated base rates while preserving team-specific discriminative power.
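The blend itself is a one-liner once the historical table is keyed by seed pairing (illustrative TypeScript; the key format and the fall-back-to-model behavior for unknown pairings are our assumptions):

```typescript
// Round-of-64 favored-seed win rates from the table above.
const HIST_FAVORED_WIN_RATE: Record<string, number> = {
  "1v16": 0.993, "2v15": 0.938, "3v14": 0.853, "4v13": 0.793,
  "5v12": 0.649, "6v11": 0.628, "7v10": 0.607, "8v9": 0.52,
};

// P_cal = (1 - lambda) * P_model + lambda * P_hist, clamped to [0.01, 0.99].
function calibrate(
  pModel: number, // model probability that the favored seed wins
  favoredSeed: number,
  underdogSeed: number,
  lambda = 0.25
): number {
  const hist = HIST_FAVORED_WIN_RATE[`${favoredSeed}v${underdogSeed}`];
  if (hist === undefined) return pModel; // no base rate for this pairing
  const blended = (1 - lambda) * pModel + lambda * hist;
  return Math.min(0.99, Math.max(0.01, blended));
}
```

For instance, a model that likes a 5-seed at 80% gets pulled modestly toward the 64.9% historical rate rather than overridden by it.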


6. Prediction Market Ensemble

6.1 Market Data Integration

Prediction markets (e.g., Polymarket) aggregate diverse information — including injury news, lineup changes, and collective wisdom — that may not be captured by statistical models alone. Golden Bracket integrates market-derived win probabilities as an ensemble signal.

Two types of market data are ingested:

  1. Championship futures odds: Long-range probabilities that a team wins the entire tournament, converted to implied head-to-head probabilities via odds ratios.
  2. Direct matchup odds: When available, pairwise market probabilities for specific tournament games.

6.2 Linear Ensemble Blend

The final probability is a linear combination of the model-derived probability and the market-implied probability:

P_{\text{final}} = \alpha \cdot P_{\text{model}} + (1 - \alpha) \cdot P_{\text{market}}

where α = 0.70 (70% model, 30% market). The output is clamped to [0.01, 0.99].

6.3 Efficient Markets Hypothesis Justification

The 30% market weight reflects several considerations:

  1. Information aggregation: Prediction markets efficiently aggregate dispersed private information (Hayek, 1945; Wolfers and Zitzewitz, 2004). Market participants incorporate injury reports, travel conditions, and emotional factors that pure statistical models miss.
  2. Historical calibration: Prediction markets have been shown to be well-calibrated for sporting events — when a market assigns 70% probability, the event occurs approximately 70% of the time (Tetlock and Gardner, 2015).
  3. Complementarity: Markets and models fail in different ways. Markets can exhibit herding behavior and public-bias (e.g., overvaluing blue-blood programs), while models can miss qualitative factors. Blending exploits this complementarity.
  4. The 70/30 split: The model retains majority weight because (a) our model incorporates most of the same information markets use, and (b) market prices in the NCAA tournament can reflect recreational bettor biases.

6.4 Graceful Degradation

When market data is unavailable (API failure, no market exists for a matchup, or data is stale), the system falls back to model-only prediction with α_effective = 1.0. This ensures the system never produces null predictions:

P_{\text{final}} = \begin{cases} \alpha \cdot P_{\text{model}} + (1 - \alpha) \cdot P_{\text{market}} & \text{if market data available} \\ P_{\text{model}} & \text{otherwise} \end{cases}
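This fallback logic reduces to the following (illustrative TypeScript, with null standing in for unavailable or stale market data):

```typescript
// 70/30 model/market blend with model-only fallback; output in [0.01, 0.99].
function ensembleBlend(
  pModel: number,
  pMarket: number | null, // null when market data is unavailable or stale
  alpha = 0.7
): number {
  const p =
    pMarket === null ? pModel : alpha * pModel + (1 - alpha) * pMarket;
  return Math.min(0.99, Math.max(0.01, p));
}
```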

7. Monte Carlo Tournament Simulation

7.1 Simulation Architecture

To generate tournament-level forecasts (e.g., "probability of reaching the Final Four"), we simulate the entire bracket N = 10,000 times using Monte Carlo methods. Each iteration proceeds as follows:

  1. Initialize the bracket with all 64 teams in their seeded positions.
  2. For each round (Round of 64 through Championship):
     a. Compute the pairwise win probability P(A > B) for each matchup using the full pipeline (Log5 + matchup adjustments + location + seed calibration).
     b. Draw a Bernoulli random variable X ~ Bernoulli(P(A > B)) to determine the winner.
     c. Apply Bayesian in-simulation updates (Section 7.2).
  3. Record milestone achievements (Sweet 16, Elite 8, Final Four, Championship) for surviving teams.
  4. After all iterations, convert counts to percentages:
P_i(\text{milestone}) = \frac{\text{count}_i(\text{milestone})}{N} \times 100\%

7.2 Bayesian In-Simulation Learning

A key insight is that tournament results reveal information about team quality. If a 12-seed upsets a 5-seed, this outcome should update our belief about the 12-seed's strength for subsequent rounds within the same simulation path.

We implement a lightweight surprise-based Bayesian update. After team W defeats team L in a simulated game:

\text{surprise} = \frac{p_L}{p_W + p_L}

p_W^{\text{new}} = \text{clamp}\!\left(p_W + \eta \cdot (\text{surprise} - 0.5),\; 0.01,\; 0.99\right)

where η = 0.03 is the learning rate. The surprise term measures how unexpected the outcome was: it exceeds 0.5 exactly when the winner entered the game as the underdog (p_W < p_L), producing a positive update, while an expected win produces a small negative, regressive update.

Learning rate choice: η = 0.03 produces updates of at most ±0.015 per game, ensuring that (a) genuine Cinderella runs are rewarded with increasing survival probability, while (b) no single upset result can radically distort the remaining bracket.
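The update can be sketched as follows (illustrative TypeScript):

```typescript
// Surprise-based in-simulation update applied to the winner's strength.
function updateWinner(pW: number, pL: number, eta = 0.03): number {
  // surprise > 0.5 iff the winner entered the game as the underdog.
  const surprise = pL / (pW + pL);
  const updated = pW + eta * (surprise - 0.5);
  return Math.min(0.99, Math.max(0.01, updated)); // clamp to [0.01, 0.99]
}
```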

7.3 Deep-Copy Isolation

Each simulation iteration operates on an independent deep copy of the precomputed team data. This isolation guarantee ensures that Bayesian updates within one simulation path do not leak across iterations, preserving the statistical independence required for valid Monte Carlo estimation.

for each iteration i in 1..N:
    iterData ← deepCopy(basePrecomputed)  // isolated copy
    simulate bracket using iterData        // Bayesian updates modify iterData only
    record milestones

7.4 Round Detection and Milestone Tracking

The simulation tracks four milestone thresholds, mapped from the round index:

| Rounds from End | Round Name | Milestone |
|-----------------|------------|-----------|
| 3 | Sweet 16 | sweetSixteenPct |
| 2 | Elite 8 | eliteEightPct |
| 1 | Final Four | finalFourPct |
| 0 | Championship | championshipPct |

For a standard 64-team bracket (log_2(64) = 6 rounds), the Sweet 16 corresponds to round 3, the Elite 8 to round 4, the Final Four to round 5, and the Championship to round 6.


8. System Architecture and Implementation

8.1 Prediction Pipeline

The prediction API (/api/predict) executes the following sequential stages for a single matchup:

┌─────────────────────────────────────────────────────────┐
│ Stage 1: Data Acquisition (concurrent)                  │
│   ├── Fetch tournament teams + stats     [CBBD API]     │
│   ├── Fetch team schedules (×2)          [ESPN API]     │
│   └── Fetch market odds                  [Polymarket]   │
├─────────────────────────────────────────────────────────┤
│ Stage 2: Feature Engineering                            │
│   ├── Sanitize stats (NaN guards)                       │
│   ├── Calculate coaching scores (all teams)             │
│   ├── Calculate variance scores (target teams)          │
│   └── Build z-score normalization context               │
├─────────────────────────────────────────────────────────┤
│ Stage 3: Composite Rating                               │
│   └── 10-factor weighted z-score → R₁, R₂              │
├─────────────────────────────────────────────────────────┤
│ Stage 4: Log5 Base Probability                          │
│   └── P(A>B) = Log5(R₁/100, R₂/100)                   │
├─────────────────────────────────────────────────────────┤
│ Stage 5: Matchup Adjustments (+/- 5% cap)               │
│   ├── Tempo mismatch                                    │
│   ├── Rebounding edge                                   │
│   ├── Turnover battle                                   │
│   └── 3PT volatility penalty                            │
├─────────────────────────────────────────────────────────┤
│ Stage 6: Location Advantage (+/- 3% cap)                │
│   └── Haversine distance → proximity differential       │
├─────────────────────────────────────────────────────────┤
│ Stage 7: Health Assessment (+/- 8% cap)                 │
│   └── Leader-absence heuristic → health differential    │
├─────────────────────────────────────────────────────────┤
│ Stage 8: Seed Calibration (λ = 0.25)                    │
│   └── Bayesian blend with historical upset rates        │
├─────────────────────────────────────────────────────────┤
│ Stage 9: Market Ensemble (α = 0.70)                     │
│   └── Linear blend: 70% model + 30% market              │
├─────────────────────────────────────────────────────────┤
│ Stage 10: Output                                        │
│   └── P_final ∈ [0.01, 0.99], confidence label          │
└─────────────────────────────────────────────────────────┘

8.2 Concurrent Data Fetching

Data acquisition is parallelized using Promise.all to minimize latency.
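A minimal sketch of the concurrent stage (illustrative TypeScript; the fetcher names are hypothetical and stubbed here so the fragment is self-contained — the real handlers call the CBBD, ESPN, and Polymarket clients):

```typescript
// Hypothetical fetchers, stubbed for illustration.
const fetchTournamentTeams = async () => [{ id: "A" }, { id: "B" }];
const fetchSchedule = async (teamId: string) => ({ teamId, games: [] as number[] });
const fetchMarketOdds = async (): Promise<number> => 0.55;

async function acquireData(teamAId: string, teamBId: string) {
  // All four requests start immediately; the stage completes when the
  // slowest one resolves, not after the sum of all latencies.
  const [teams, scheduleA, scheduleB, marketOdds] = await Promise.all([
    fetchTournamentTeams(),
    fetchSchedule(teamAId),
    fetchSchedule(teamBId),
    fetchMarketOdds().catch(() => null), // graceful degradation (Section 6.4)
  ]);
  return { teams, scheduleA, scheduleB, marketOdds };
}
```

Note that the market fetch catches its own failure and resolves to null, so a Polymarket outage degrades to model-only prediction instead of rejecting the whole Promise.all.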

This design reduces the critical path from the sequential sum t_teams + t_stats + t_markets + 2 × t_schedule to approximately max(t_teams+stats, t_markets, t_schedule).

8.3 NaN Guards and Probability Clamping

The system employs defensive programming at every probability computation boundary:

  1. Input sanitization: All stat fields default to 0 when null/undefined.
  2. AdjEM recovery: Derived from AdjOE − AdjDE when the primary value is missing.
  3. Post-Log5 NaN check: If Log5 produces NaN (from degenerate inputs), the probability resets to 0.5.
  4. Pre-ensemble NaN check: A final guard before market blending catches any accumulated NaN.
  5. Universal clamping: All probabilities are clamped to [0.01, 0.99] after every additive adjustment.

8.4 Caching Architecture

| Data Source | Cache TTL | Rationale |
|-------------|-----------|-----------|
| Team statistics (CBBD) | 6 hours | Updated daily; stale data is acceptable within a game day |
| Market odds (Polymarket) | 30 minutes | Markets move continuously; shorter TTL captures line movement |
| Live scores (ESPN) | 5 minutes | Required for in-progress game awareness |

9. Validation and Limitations

9.1 Distributional Assumptions

The model makes several implicit distributional assumptions:

  1. Gaussian factor distribution: Z-score normalization assumes that each factor is approximately normally distributed across the tournament population. For metrics like AdjEM, this is well-supported empirically. For coaching scores (a discrete, heavily right-skewed distribution), the assumption is weaker.
  2. Exponential decay for recency: The e^{-λd} weighting assumes that information decays exponentially with time. This is a reasonable first-order approximation but may underweight the signal from early-season games against strong opponents.
  3. Linear blending: Both the seed calibration (λ = 0.25) and market ensemble (α = 0.70) use linear opinion pools rather than logarithmic pools or Bayesian model averaging. Linear pools are suboptimal when one source is consistently better-calibrated but offer simplicity and interpretability.

9.2 Independence Assumptions and Violations

The Monte Carlo simulation assumes:

  1. Game independence: The outcome of one game does not affect another, conditional on team ratings. This is violated by fatigue accumulation, momentum effects, and schedule density.
  2. Bernoulli outcomes: Each game is modeled as a single Bernoulli trial. In reality, game outcomes are influenced by within-game dynamics (foul trouble, pace control, late-game situations) that create correlated risk.
  3. Static team quality: Apart from the Bayesian in-simulation update (η = 0.03), team ratings are treated as fixed throughout the tournament. Injuries, suspensions, and "getting hot" are not dynamically modeled beyond the health heuristic.

9.3 Known Limitations

  1. No player-level modeling: The health assessment uses a leader-absence heuristic rather than individual player impact metrics (e.g., box plus-minus, win shares). A player's absence in game leader data may reflect a coaching decision rather than injury.
  2. Static coaching factor: The coaching score captures career-level pedigree but not in-season tactical adaptations, game-planning ability, or timeout usage patterns.
  3. Market bias propagation: Prediction markets for college basketball are influenced by public sentiment and fan loyalty, which can create systematic biases (e.g., overpricing traditional powerhouses). The 30% market weight partially imports these biases.
  4. Small-sample calibration: Later-round historical seed matchup rates (e.g., 1 vs 4 in the Sweet 16) are based on relatively few observations (n ≈ 25 per matchup), introducing calibration noise.
  5. No conference-level effects: The model does not account for systematic conference strength beyond what is captured by strength of schedule.

9.4 Calibration Analysis Framework

A proper calibration analysis would bin predicted probabilities into deciles and compare predicted vs. observed win rates (calibration curve). The expected calibration error (ECE) is:

\text{ECE} = \sum_{b=1}^{B} \frac{n_b}{N} \left| \bar{p}_b - \bar{y}_b \right|

where \bar{p}_b is the mean predicted probability in bin b and \bar{y}_b is the observed win rate in that bin. This analysis requires historical backtesting data that is reserved for future work.
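Once backtest data exists, the analysis reduces to a short routine (illustrative TypeScript; fixed-width probability bins are our assumption):

```typescript
// Expected calibration error over B fixed-width probability bins.
function expectedCalibrationError(
  predicted: number[], // p_i in [0, 1]
  observed: number[], // y_i in {0, 1}
  numBins = 10
): number {
  const bins = Array.from({ length: numBins }, () => ({ p: 0, y: 0, n: 0 }));
  predicted.forEach((p, i) => {
    const b = Math.min(numBins - 1, Math.floor(p * numBins)); // p = 1 edge case
    bins[b].p += p;
    bins[b].y += observed[i];
    bins[b].n += 1;
  });
  const total = predicted.length;
  // Weighted |mean predicted - observed rate| across non-empty bins.
  return bins.reduce(
    (ece, b) =>
      b.n > 0 ? ece + (b.n / total) * Math.abs(b.p / b.n - b.y / b.n) : ece,
    0
  );
}
```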


10. Conclusion and Future Work

Golden Bracket demonstrates that a multi-stage ensemble approach — combining efficiency metrics, pairwise comparison models, contextual adjustments, historical calibration, and market signals — can produce well-calibrated and informative NCAA tournament predictions. The system's modular architecture allows each component to be independently validated and improved.

Future Extensions

  1. Player-level impact modeling: Integrate individual player metrics (BPM, OBPM, usage rate) to replace the leader-absence heuristic with a proper player impact model.
  2. Dynamic α scheduling: Vary the model/market blend weight based on market liquidity, time-to-tipoff, and historical accuracy of each source for specific matchup types.
  3. Historical backtesting: Evaluate model performance against past tournaments (2015-2025) using log-loss, Brier score, and bracket scoring.
  4. Bayesian optimization of weights: Replace manual weight specification with Bayesian hyperparameter optimization over historical tournament data.
  5. Conference tournament integration: Use conference tournament results as a real-time signal for team quality updates in the days immediately preceding the NCAA tournament.
  6. Fatigue modeling: Account for back-to-back game effects and travel distance accumulation across rounds.

References

Bradley, R. A. and Terry, M. E. (1952). "Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons." Biometrika, 39(3/4), 324-345.

Hayek, F. A. (1945). "The Use of Knowledge in Society." American Economic Review, 35(4), 519-530.

James, B. (1981). The Bill James Baseball Abstract. Self-published. (Origin of the Log5 method.)

Miller, S. J. (2006). "A Derivation of the Log5 Formula in Baseball and Other Sports." arXiv:math/0609157.

Oliver, D. (2004). Basketball on Paper: Rules and Tools for Performance Analysis. Potomac Books.

Pomeroy, K. (2002-present). "KenPom.com: College Basketball Ratings." https://kenpom.com.

Stone, M. (1961). "The Opinion Pool." Annals of Mathematical Statistics, 32(4), 1339-1342.

Tetlock, P. E. and Gardner, D. (2015). Superforecasting: The Art and Science of Prediction. Crown.

Wolfers, J. and Zitzewitz, E. (2004). "Prediction Markets." Journal of Economic Perspectives, 18(2), 107-126.


Appendix A: Complete Weight Vector

| Factor | Key | Weight | Raw Factor Formula | Data Source |
|---|---|---|---|---|
| Adj. Efficiency Margin | adjEM | 0.32 | AdjEM (pre-computed) | CBBD |
| Offensive eFG% | oEFG | 0.05 | oEFG | CBBD |
| Defensive eFG% | dEFG | 0.05 | -dEFG (negated: lower is better) | CBBD |
| Turnover Differential | tov | 0.07 | -oTOV + dTOV | CBBD |
| Rebounding Differential | orb | 0.06 | oORB - dORB | CBBD |
| Free Throw Rate Diff. | ftr | 0.05 | oFTR - dFTR | CBBD |
| Strength of Schedule | sos | 0.10 | SOS (pre-computed) | CBBD |
| Recent Form | recentForm | 0.10 | Exponential decay weighted margin (Section 3.5.3) | ESPN |
| Coaching Pedigree | coaching | 0.10 | Tiered scoring (Section 3.5.1) | Manual |
| Performance Variance | variance | 0.10 | -σ(margins) (Section 3.5.2) | ESPN |
| Total | | 1.00 | | |
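The ten weights above combine normalized factor values into a single composite score by weighted sum. A minimal sketch, assuming factors are already normalized and sign conventions (e.g., the negated dEFG) are applied upstream; the exact signature of `calculateCompositeRating` in lib/algorithm/composite-rating.ts is an assumption.

```typescript
// Weight vector from Appendix A; keys mirror the table's Key column.
const WEIGHTS: Record<string, number> = {
  adjEM: 0.32, oEFG: 0.05, dEFG: 0.05, tov: 0.07, orb: 0.06,
  ftr: 0.05, sos: 0.10, recentForm: 0.10, coaching: 0.10, variance: 0.10,
};

// Composite score as the weighted sum of normalized factor values.
// Missing factors contribute 0, degrading gracefully when a data
// source (ESPN, CBBD, manual entry) is unavailable.
function compositeRating(factors: Record<string, number>): number {
  let score = 0;
  for (const [key, w] of Object.entries(WEIGHTS)) {
    score += w * (factors[key] ?? 0);
  }
  return score;
}
```

Because the weights sum to 1.00, a team with all factors normalized to 1 scores exactly 1, which keeps composite ratings on the same scale as the individual factors.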

Appendix B: Historical Seed Upset Rates

Round of 64

| Matchup | Favored Seed Win Rate | Upset Rate | Key | n (approx.) |
|---|---|---|---|---|
| 1 vs 16 | 0.993 | 0.007 | 1-16 | 100 |
| 2 vs 15 | 0.938 | 0.062 | 2-15 | 100 |
| 3 vs 14 | 0.853 | 0.147 | 3-14 | 100 |
| 4 vs 13 | 0.793 | 0.207 | 4-13 | 100 |
| 5 vs 12 | 0.649 | 0.351 | 5-12 | 100 |
| 6 vs 11 | 0.628 | 0.372 | 6-11 | 100 |
| 7 vs 10 | 0.607 | 0.393 | 7-10 | 100 |
| 8 vs 9 | 0.520 | 0.480 | 8-9 | 100 |

Round of 32

| Matchup | Favored Seed Win Rate | Key |
|---|---|---|
| 1 vs 8 | 0.797 | 1-8 |
| 1 vs 9 | 0.838 | 1-9 |
| 2 vs 7 | 0.667 | 2-7 |
| 2 vs 10 | 0.618 | 2-10 |
| 3 vs 6 | 0.571 | 3-6 |
| 3 vs 11 | 0.577 | 3-11 |
| 4 vs 5 | 0.545 | 4-5 |
| 4 vs 12 | 0.591 | 4-12 |

Sweet 16 and Beyond

| Matchup | Favored Seed Win Rate | Key |
|---|---|---|
| 1 vs 4 | 0.628 | 1-4 |
| 1 vs 5 | 0.714 | 1-5 |
| 2 vs 3 | 0.535 | 2-3 |
| 1 vs 12 | 0.750 | 1-12 |
| 2 vs 6 | 0.600 | 2-6 |
| 2 vs 11 | 0.647 | 2-11 |
| 1 vs 2 | 0.538 | 1-2 |
| 1 vs 3 | 0.583 | 1-3 |
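The keyed rates above suggest how a seed-line calibration lookup could work. A minimal sketch with an illustrative subset of the table; the 50/50 blend weight, the function signature, and the fallback behavior are assumptions, and the actual `applySeedCalibration` in lib/algorithm/seed-calibration.ts may, for example, weight by sample size instead.

```typescript
// Illustrative subset of Appendix B, keyed "favoredSeed-underdogSeed".
const SEED_WIN_RATES: Record<string, number> = {
  "1-16": 0.993, "5-12": 0.649, "8-9": 0.520, "1-8": 0.797,
};

// Blend a model probability toward the historical seed-line base rate.
function seedCalibrated(
  modelProb: number, // model's P(favored seed wins)
  favSeed: number,
  dogSeed: number,
  blend = 0.5,       // illustrative weight on the model probability
): number {
  const base = SEED_WIN_RATES[`${favSeed}-${dogSeed}`];
  if (base === undefined) return modelProb; // no history: pass through
  return blend * modelProb + (1 - blend) * base;
}
```

Pulling model probabilities toward historical base rates tempers overconfident matchup estimates, at the cost of the small-sample calibration noise noted in Section 9.3.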

Appendix C: Source Code Reference

| Algorithm Component | Source File | Key Exports |
|---|---|---|
| Composite Rating (10-factor) | lib/algorithm/composite-rating.ts | calculateCompositeRating, WEIGHTS, getContextualWeights |
| Log5 Pairwise Model | lib/algorithm/log5.ts | log5 |
| Four Factors Derivation | lib/algorithm/four-factors.ts | calculateFourFactors |
| Matchup Adjustments | lib/algorithm/matchup.ts | calculateMatchupAdjustment |
| Location Advantage | lib/algorithm/location.ts | calculateLocationAdvantage, haversineDistance, proximityAdvantage |
| Health Assessment | lib/algorithm/health.ts | assessTeamHealth, calculateHealthAdjustment |
| Seed-Line Calibration | lib/algorithm/seed-calibration.ts | applySeedCalibration, HISTORICAL_WIN_RATES |
| Recency Weighting | lib/algorithm/recency.ts | calculateRecentForm, getDecayWeight |
| Market Ensemble Blend | lib/algorithm/ensemble.ts | ensembleBlend |
| Monte Carlo Simulation | lib/algorithm/monte-carlo.ts | simulateBracket, bayesianUpdate |
| Prediction API Pipeline | app/api/predict/route.ts | POST handler (orchestrates full pipeline) |
| Algorithm Type Definitions | lib/algorithm/types.ts | CompositeFactors, CompositeWeights, TeamRating, SimulationResult |