How Ratings Work

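Every player rating in XvO Football is derived from real NFL stats using an era-adjusted z-score system. Skill ratings have no manual overrides and no subjective rankings, just math applied to data; the one curated exception is a set of speed floors for notable historical players, described under Speed.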

The Rating Scale

1–39
40–59
60–74
75–89
90–99

50 is league average for that position in that season. A rating of 75 means "significantly above average" — roughly top 15% of starters. A rating of 90+ is elite, reserved for the best handful of players at their position that year.

KEY INSIGHT
A 75-rated QB in 2024 was compared to 2024 QBs, not all-time. Ratings are relative to the era — a 70 in 1975 and a 70 in 2020 both mean "well above their contemporaries."

The Formula

For each skill: Rating = 50 + (z-score × 22), clamped 1–99

The z-score measures how many standard deviations a player is above or below the league average for their position group that season. The multiplier (22) controls how spread out ratings are — it's tuned so that roughly 2 standard deviations above average yields a 94.

EXAMPLE
If the average QB throws for 250 passing yards per game with a standard deviation of 47, a QB averaging 344 yd/g is +2.0 std devs above the mean. That gives a rating of 50 + (2.0 × 22) = 94.
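
The formula and example above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's actual code: the function name `skill_rating`, the rounding to a whole number, and the handling of a zero standard deviation are assumptions.

```python
def skill_rating(value: float, league_mean: float, league_std: float,
                 multiplier: float = 22.0) -> int:
    """Map a per-game stat to a 1-99 rating via its z-score."""
    if league_std <= 0:
        # Assumption: with no spread in the baseline, treat everyone as average.
        return 50
    z = (value - league_mean) / league_std
    raw = 50 + z * multiplier
    # Clamp to the 1-99 scale; rounding to an integer is an assumption.
    return int(max(1, min(99, round(raw))))


# The worked example: 344 yd/g against a 250 yd/g mean with std dev 47.
print(skill_rating(344, 250, 47))  # 94
```

For stats where lower is better (such as interceptions), the sign of the z-score would be flipped before applying the same mapping.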

Per-Game Normalization

All counting stats (yards, touchdowns, sacks, tackles, etc.) are converted to per-game rates before z-scoring. Both the league baselines and the player values are divided by games played, so the comparison is always rate-vs-rate.

This means a QB who plays 6 games due to injury is rated on his per-game production, not his season totals. A 6-game QB averaging 280 yards per game rates the same as a 17-game QB averaging 280 yards per game.

WHAT'S EXCLUDED
Stats that are already rates are not divided by games: completion percentage over expected (CPOE), target share, field goal percentage, and per-attempt proxies like yards per carry. Peak stats like longest field goal are also excluded. Only counting stats — things that accumulate over a season — get the per-game treatment.
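
A rough sketch of the normalization rule, assuming stats arrive as a simple name-to-value mapping. The stat keys and the `per_game` helper are hypothetical names chosen for illustration.

```python
from typing import Dict

# Hypothetical stat keys: already-rate stats and peak stats pass through untouched.
RATE_STATS = {"cpoe", "target_share", "fg_pct", "yards_per_carry"}
PEAK_STATS = {"longest_fg"}


def per_game(stats: Dict[str, float], games: int) -> Dict[str, float]:
    """Divide counting stats by games played; leave rate and peak stats alone."""
    if games <= 0:
        raise ValueError("games must be positive")
    normalized = {}
    for name, value in stats.items():
        if name in RATE_STATS or name in PEAK_STATS:
            normalized[name] = value
        else:
            normalized[name] = value / games
    return normalized


# A 6-game season and a 17-game season with the same per-game production.
short_season = per_game({"pass_yards": 1680, "cpoe": 2.5}, games=6)
full_season = per_game({"pass_yards": 4760, "cpoe": 2.5}, games=17)
print(short_season["pass_yards"], full_season["pass_yards"])  # 280.0 280.0
```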

Position Breakdowns

Each position has its own set of skills, each mapped to one or more real stats. Here's what drives every rating.

Offense

Quarterback

Skill | Stat(s) | Pre-1999 Proxy | Notes
Arm Strength | Air yards/g | Yards per attempt | Downfield passing distance per game
Accuracy | CPOE or EPA/g | Completion % or TD % | Takes the higher rating; protects efficient short passers
Decision Making | INTs/g + TDs/g + Yards/g | – | Three-way blend; INTs inverted (fewer = better)
Mobility | Rushing yards/g | – | Scrambling and designed runs per game
Poise | Sacks/g + Yards/g + EPA/g | Sacks/g + Yards/g | Only averages components with data; avoids padding with 50
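
Two of the combination rules in the QB table, "takes the higher rating" (Accuracy) and "only averages components with data" (Poise), might look like the sketch below. The helper names `best_of` and `blend` are illustrative, and returning 50 when nothing is available is an assumption based on how no-data skills are treated elsewhere on this page.

```python
from typing import Optional


def best_of(*ratings: Optional[float]) -> float:
    """Accuracy-style rule: keep the higher of the available component ratings."""
    available = [r for r in ratings if r is not None]
    return max(available) if available else 50.0


def blend(*ratings: Optional[float]) -> float:
    """Poise-style rule: average only the components that have data."""
    available = [r for r in ratings if r is not None]
    return sum(available) / len(available) if available else 50.0


print(best_of(62, 71))      # 71
print(blend(80, 60, None))  # 70.0, not (80 + 60 + 50) / 3
```

Skipping missing components rather than filling them with 50 keeps a strong two-component player from being dragged toward average.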

Running Back

Skill | Stat | Pre-1999 Proxy | Notes
Power | Rushing yards/g | – | Ground production per game
Elusiveness | Rushing first downs/g | Yards per carry | No broken tackles data; YPC captures elusiveness pre-1999
Vision | Rushing EPA/g | Rushing TDs/g | Efficiency per game; TDs proxy for finding the end zone
Receiving | Receiving yards/g | – | Pass-catching contribution per game

Wide Receiver

Skill | Stat | Pre-1999 Proxy | Notes
Route Running | Target share | Receptions per game | How often the offense looks to this receiver
Hands | Receptions/g | – | Catches per game
Release | Receiving first downs/g | Receiving TDs/g | Ability to win at the line and convert
YAC | Yards after catch/g | Yards per reception | Production after the catch per game

Tight End

Skill | Stat | Pre-1999 Proxy | Notes
Blocking | No data | – | Always rated 50; no individual blocking stats available
Receiving | Receiving yards/g | – | Pass-catching production per game
YAC | Yards after catch/g | Yards per reception | Production after the catch per game

Offensive Line

Skill | Stat | Notes
Pass Blocking | No data | Always rated 50
Run Blocking | No data | Always rated 50
Pull/Screen | No data | Always rated 50

LIMITATION
No publicly available dataset tracks individual OL performance. All offensive linemen are rated 50 across the board. Future PFF or similar data could improve this.

Defense

Defensive Line — EDGE (DE) & IDL (DT, NT)

Edge rushers and interior defensive linemen use the same four skills but are rated against separate baseline pools and use different overall weights. DEs are compared only to other DEs, and DTs/NTs only to other DTs/NTs. This prevents the higher sack production of edge rushers from diluting interior linemen's baselines (and vice versa).

Skill | Stat | Notes
Pass Rush | Sacks/g | Quarterback takedowns per game
Run Stuffing | Total tackles/g | Solo + assisted tackles per game
Power | Tackles for loss/g | Plays behind the line per game
Quickness | QB hits/g | Pressures per game

EDGE overall weights: Pass Rush (1.0), Quickness (0.8), Power (0.6), Speed (0.4), Run Stuffing (0.4). Edge rushers live and die by their ability to get to the quarterback.

IDL overall weights: Run Stuffing (1.0), Power (0.9), Pass Rush (0.6), Quickness (0.5), Speed (0.2). Interior linemen anchor the run defense first; pass-rush production is a bonus.
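
To see how the two weight tables pull the same player in different directions, here is a small sketch. The dictionary keys are illustrative names, and dividing by the sum of the weights is an assumption consistent with the overall rating being described as a weighted average.

```python
from typing import Dict

# Weights as listed above; key names are hypothetical.
EDGE_WEIGHTS = {"pass_rush": 1.0, "quickness": 0.8, "power": 0.6,
                "speed": 0.4, "run_stuffing": 0.4}
IDL_WEIGHTS = {"run_stuffing": 1.0, "power": 0.9, "pass_rush": 0.6,
               "quickness": 0.5, "speed": 0.2}


def weighted_overall(ratings: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of skill ratings (normalized by total weight)."""
    total_weight = sum(weights.values())
    return sum(ratings[skill] * w for skill, w in weights.items()) / total_weight


# A pass-rush specialist rates higher under EDGE weights than IDL weights.
specialist = {"pass_rush": 90, "quickness": 80, "power": 60,
              "speed": 70, "run_stuffing": 50}
print(weighted_overall(specialist, EDGE_WEIGHTS))
print(weighted_overall(specialist, IDL_WEIGHTS))
```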

OLBs (including hybrid pass rushers like T.J. Watt) remain in the Linebacker group because they export with LB skill keys in the game engine.

Linebacker

Skill | Stat | Notes
Tackling | Total tackles/g | Solo + assisted tackles per game
Coverage | Passes defended/g | Batted or broken-up passes per game
Blitz | Sacks/g | Rush production per game
Play Recognition | Tackles for loss/g | Reading and stopping plays behind the line

Defensive Back

Skill | Stat | Notes
Man Coverage | Passes defended/g | One-on-one coverage ability per game
Zone Awareness | (PD + INTs)/g | Combined coverage production per game
Ball Skills | Interceptions/g | Ability to come away with the ball
Tackling | Solo tackles/g | Run support and open-field tackling per game

Special Teams

Kicker

Skill | Stat | Notes
Leg Strength | Longest FG made | Maximum proven distance
Accuracy | FG percentage | Overall make rate

Punter

Skill | Stat | Notes
Leg Strength | No data | Always rated 50
Placement | No data | Always rated 50

Pre-1999 Proxy Stats

Advanced metrics like EPA, CPOE, YAC, and target share weren't tracked before 1999. Rather than defaulting every affected skill to 50, the pipeline uses proxy stats — derived values from available counting data that approximate what the advanced stat measures.

HOW IT WORKS
Each proxy is computed as a per-season z-score, just like primary stats. For example, a 1978 RB's elusiveness is rated against 1978 yards-per-carry baselines instead of 2024 rushing-first-down baselines. Proxies only activate when the primary stat's baseline is absent — 1999+ ratings are completely unchanged.
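
The activation rule can be sketched as a baseline lookup: use the primary stat when its season baseline exists, otherwise fall back to the proxy. This is an illustrative sketch only; the data layout, the stat keys, and the `rate_skill` name are assumptions rather than the pipeline's real structures.

```python
from typing import Dict, Optional, Tuple

Baseline = Tuple[float, float]  # (league mean, league standard deviation)


def rate_skill(baselines: Dict[str, Baseline], player: Dict[str, float],
               primary: str, proxy: Optional[str] = None) -> float:
    """Rate against the primary stat; use the proxy only if the primary baseline is absent."""
    stat = primary if primary in baselines else proxy
    if stat is None or stat not in baselines or stat not in player:
        return 50.0  # no usable data: default to league average
    mean, std = baselines[stat]
    if std <= 0:
        return 50.0
    z = (player[stat] - mean) / std
    return max(1.0, min(99.0, 50.0 + 22.0 * z))


# 1978: no rushing-first-down baseline exists, so yards per carry is used.
baselines_1978 = {"yards_per_carry": (4.0, 0.5)}
rb_1978 = {"yards_per_carry": 5.0}
print(rate_skill(baselines_1978, rb_1978, "rush_first_downs_pg", "yards_per_carry"))  # 94.0
```

The baseline numbers in the example are made up for illustration.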

Defensive stats, kicker stats, and punter stats don't exist before 1999 at all, so those positions remain at 50 for pre-1999 seasons.

Universal Attributes

Every player receives three universal ratings that apply across all positions.

Speed

Speed is derived from real NFL Combine 40-yard dash times when available (7,800+ players since 2000), with three fallback paths for players without combine data.

Path | Who | How It Works
Combine | 2000+ players with 40 time | Linear conversion: 4.3s = 100, 4.5s = 85, 5.0s = 46. Apply age decay and a small performance nudge (capped at ±5) from speed-correlated production.
Proxy | Pre-2000 skill positions | Compute a z-score from the player's YPC (RB), Y/R (WR/TE), or rush yd/game (QB), then map it onto the known combine 40 distribution for that position. A pre-2000 RB with top-tier YPC gets the same speed as a combine-era RB who ran 2 standard deviations faster than average.
Fallback | Defenders, OL, K/P | Position default with weight adjustment (lighter = faster). For speed positions (CB, S, WR, RB), career All-Pro selections and Hall of Fame status provide a small boost (up to +6).

Age decay is applied on all paths: no penalty before age 27, then -1/year through 29, -2/year through 32, and -3/year beyond 32. A player who ran 4.4s (base 92) at the combine rates around 83 by age 32.
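
The decay schedule can be written as a cumulative penalty. The sketch below assumes the first −1 lands at age 27, which is the reading that reproduces the 92-to-83 example exactly; the function names are illustrative.

```python
def age_penalty(age: int) -> int:
    """Cumulative speed penalty: -1/yr for ages 27-29, -2/yr for 30-32, -3/yr after."""
    penalty = 0
    for year in range(27, age + 1):
        if year <= 29:
            penalty += 1
        elif year <= 32:
            penalty += 2
        else:
            penalty += 3
    return penalty


def decayed_speed(base_speed: int, age: int) -> int:
    """Apply age decay to a base speed rating, keeping the result on the 1-99 scale."""
    return max(1, min(99, base_speed - age_penalty(age)))


print(decayed_speed(92, 26))  # 92 (no penalty before 27)
print(decayed_speed(92, 32))  # 83
```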

HISTORICAL OVERRIDES
For 85 notable players (1970–2025), a curated database provides per-season award data and speed floors based on scouting reports and known athleticism. This ensures players like Deion Sanders (4.27s at Florida State, pre-combine era) and Bo Jackson rate appropriately even when the algorithm has limited data. The override acts as a floor: if the algorithm produces a higher rating, it wins.

Stamina & Intelligence

Attribute | How It's Calculated
Stamina | Position default + games played adjustment (more games = higher stamina)
Intelligence | Position default + EPA per game bonus for QBs; other positions use baseline

Overall Rating

The overall rating is a weighted average of position skills and universal attributes. Each position has its own weight table emphasizing the most important skills.

For example, QB accuracy and intelligence are weighted heavily, while a RB's power and elusiveness carry more weight than receiving. Defensive linemen use separate weight tables for EDGE (pass-rush emphasis) and IDL (run-stuffing emphasis). The weights ensure that a player's overall number reflects what matters most for their role on the field.

XvO Factor

Beyond stat-based ratings, we measure each player's on-field impact using Regularized Adjusted Plus-Minus (RAPM) — a ridge regression technique adapted from basketball analytics — then normalize it to a 1–99 scale.

How It Works

For every scrimmage play from 2020–2024, we know all 22 players on the field and the play's Expected Points Added (EPA). A ridge regression estimates each player-season's marginal EPA contribution per snap, controlling for every other player on the field simultaneously.

The raw EPA/snap values are converted to XvO Factor using z-scores within each position group per season: 50 + (z × multiplier), clamped 1–99. Crucially, the multiplier varies by position — see Reliability-Scaled Spread below.

Detail | Value
Plays analyzed | ~178,000 scrimmage plays (2020–2024)
Player-seasons | ~7,000 (offense + defense, ≥100 snaps)
Method | Ridge regression with 5-fold cross-validation
Models | Separate offense (EPA added) and defense (EPA prevented)
Scale | 1–99, spread scaled by position reliability
ELITE badge | One of several paths; see The ELITE Badge below

Reliability-Scaled Spread

Not all positions are equally measurable. RAPM reliably differentiates quarterbacks — the player who touches the ball every snap — but struggles to separate a running back's contribution from his offensive line's. To keep XvO Factor honest, we scale each position's spread by how much signal the model actually carries.

We measure this with split-half reliability: split all plays into odd-week and even-week halves, run the regression independently on each, and correlate the player estimates. High correlation means the model is measuring something real. Low correlation means the differences are mostly noise.
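
Once the two half-sample regressions have produced per-player estimates, the reliability figure is a plain Pearson correlation over the players present in both halves. The sketch below covers only that last step (not the ridge regression itself) and assumes the estimates are held in dictionaries keyed by a player-season identifier; all names are illustrative.

```python
from math import sqrt
from typing import Dict, List


def pearson_r(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


def split_half_reliability(odd_weeks: Dict[str, float],
                           even_weeks: Dict[str, float]) -> float:
    """Correlate odd-week and even-week estimates for players found in both halves."""
    shared = sorted(odd_weeks.keys() & even_weeks.keys())
    return pearson_r([odd_weeks[p] for p in shared],
                     [even_weeks[p] for p in shared])
```

A value near 1 means both halves rank players the same way; a value near 0 means the per-player differences do not replicate.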

Position | Split-Half r | Multiplier | Typical XF Range
QB | 0.56 | 12.4 | 25–90+
OL | 0.48 | 10.5 | 30–85
WR | 0.42 | 9.2 | 30–83
RB | 0.36 | 7.8 | 35–77
TE | 0.32 | 7.0 | 36–75
DL | 0.21 | 4.5 | 40–69
DB | 0.19 | 4.1 | 42–63
LB | 0.17 | 3.7 | 43–63

The formula is multiplier = 22 × reliability. A QB with reliability 0.56 gets a multiplier of 12.4, spreading scores across a wide range. A linebacker with reliability 0.17 gets a multiplier of 3.7, honestly compressing scores around 50. This means a QB XvO Factor of 85 and a LB XvO Factor of 60 both represent approximately the same confidence: “among the best we can measure at this position.”
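
A short sketch of the reliability-scaled spread, using the multipliers from the table. Note that 22 × the displayed r does not always reproduce the tabulated multiplier to the decimal (22 × 0.56 = 12.32 versus 12.4), presumably because r is shown rounded; that explanation is a guess. Function names are illustrative.

```python
def xvo_multiplier(reliability: float, base: float = 22.0) -> float:
    """Spread multiplier for a position: base multiplier scaled by split-half r."""
    return base * reliability


def xvo_factor(z: float, multiplier: float) -> float:
    """Map a within-position z-score to the 1-99 XvO Factor scale."""
    return max(1.0, min(99.0, 50.0 + z * multiplier))


# The same z-score lands in very different places for a QB and a LB.
print(round(xvo_factor(2.8, 12.4), 1))  # 84.7 (QB multiplier from the table)
print(round(xvo_factor(2.8, 3.7), 1))   # 60.4 (LB multiplier from the table)
```

This is why an 85 at QB and a 60 at LB can describe a similar standing within the position.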

WHY OL STANDS OUT
Offensive linemen have no individual box score stats, so their traditional ratings stay near 50. XvO Factor is the only metric in our system that differentiates offensive linemen — and it turns out to be our second most reliable signal (r = 0.48), ahead of wide receivers. When an OL shows an elite XvO Factor, the data is telling you something meaningful about that player's impact on the offense.

Limitations

Constraint | Details
Coverage | 2020–2024 only (NFL participation data starts 2016, reliable from 2020)
Pre-2020 | XvO Factor column shows “–” for seasons without participation data
Defense | Defensive XF clusters tightly around 50 because individual defensive impact is hard to isolate from 11v11 data. The ELITE badge rarely appears for defenders; this is the model being honest, not a bug.
System effects | RAPM isolates individual contribution better than raw stats but cannot fully separate a player from their scheme, coaching, and teammates

The ELITE Badge

The ELITE badge identifies players who were among the very best at their position in a given season. It draws from three independent sources, layered by era:

Source | Era | Criteria
XvO Factor | 2020–2024 | Top 10% of position group AND XvO Factor ≥ 65
AP All-Pro | 1970–2025 | 1st or 2nd Team All-Pro (Associated Press)
Major Awards | 1970–2025 | MVP, OPOY, DPOY, Offensive or Defensive Rookie of the Year

A player only needs to qualify through one of these paths to earn the badge. For 2020+ seasons, all three sources overlap and reinforce each other. For earlier eras, the badge relies on All-Pro selections and major awards.
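
The three paths combine with a simple OR. In the sketch below, `position_percentile` is a hypothetical input (the share of the position group at or below the player, so 0.90 or higher means top 10%), and the function name is illustrative.

```python
from typing import Optional


def earns_elite_badge(season: int,
                      xvo_factor: Optional[float] = None,
                      position_percentile: Optional[float] = None,
                      all_pro: bool = False,
                      major_award: bool = False) -> bool:
    """Return True if any one of the three ELITE paths is satisfied."""
    xvo_path = (
        2020 <= season <= 2024
        and xvo_factor is not None
        and position_percentile is not None
        and position_percentile >= 0.90  # top 10% of position group
        and xvo_factor >= 65
    )
    return xvo_path or all_pro or major_award


print(earns_elite_badge(2023, xvo_factor=72, position_percentile=0.95))  # True
print(earns_elite_badge(2023, xvo_factor=62, position_percentile=0.97))  # False
print(earns_elite_badge(1985, all_pro=True))                             # True
```

The second case shows why the badge is rare at compressed positions: a player can be in the top 10% of the group and still fall short of the 65 floor.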

WHY ALL-PRO?
We use Associated Press All-Pro selections rather than Pro Bowl appearances. The Pro Bowl was a meaningful honor through the mid-1990s, but fan voting (introduced in 1995) and opt-outs have eroded its reliability. AP All-Pro selections are voted on by a panel of sportswriters who watch every game, making them a more consistent standard across eras. All-Pro data is sourced from Wikipedia's comprehensive season-by-season records (1,155 players, 2,390 player-seasons from 1970 to 2025).

Data & Limitations

All stats come from nflverse (nflfastR/nflreadpy), licensed under CC-BY 4.0. The dataset covers 1970–2025.

Constraint | Details
Minimum games | Players need 8+ games to establish reliable baselines
Volume thresholds | QB: 150 attempts, RB: 50 carries, WR/TE: 30/15 targets (falls back to 20/10 receptions pre-1999), K: 10 FG attempts
OL & Punters | No individual stats available; always rated 50
Pre-1999 seasons | Advanced metrics (EPA, CPOE, YAC, target share) don't exist; proxy stats derived from counting data are used instead (see "Pre-1999 Proxy" column above)
Defense pre-1999 | No defensive stats available before 1999; all defensive skills default to 50
Game count | 14 games/team through 1977, 16 games/team from 1978 through 2020, 17 games/team from 2021 onward

OPEN SOURCE
The rating pipeline code and all data transformations are available in the project repository. Every rating can be traced back to the underlying stats.

Team Ratings

Team ratings combine individual player evaluations with actual game results to produce four distinct measures per team-season.

Talent Rating (1–99)

A weighted composite of the 8 unit ratings (Pass Game, OL Pass, Run Game, OL Run, Pass Rush, Run Defense, Secondary, Special Teams). The raw weighted average is z-scored and mapped to 1–99. Weights shift across four strategic eras to reflect how the game was actually played:

Unit | Smashmouth (70–78) | Dynasty D (79–89) | West Coast (90–04) | Modern Pass (05–25)
Pass Game | 10% | 14% | 18% | 24%
OL Pass | 10% | 10% | 14% | 16%
Run Game | 18% | 14% | 12% | 8%
OL Run | 12% | 12% | 10% | 8%
Pass Rush | 12% | 14% | 14% | 18%
Run Defense | 16% | 14% | 12% | 6%
Secondary | 12% | 14% | 14% | 16%
Special Teams | 10% | 8% | 6% | 4%

ERA BOUNDARIES
1978 belongs to the Smashmouth era because rosters were still built for the pre-Mel Blount rules. The 1979 season is when teams adapted to the new passing game. The 2005 boundary reflects the Ty Law rule, expanded roughing the passer, and defenseless receiver protections that structurally favored passing offenses.

TWO-POOL NORMALIZATION
Pre-1999 teams have only 4 of 8 units that vary (Pass Game, Run Game, Secondary, and Special Teams); the other 4 (OL Pass, OL Run, Pass Rush, Run Defense) are fixed at 50 due to data limitations. To prevent this compressed range from producing artificially narrow ratings, pre-1999 and 1999+ teams are normalized in separate pools. Each pool is independently z-scored with the same multiplier (15), so an 80 Talent in 1985 represents the same relative standing among its contemporaries as an 80 in 2020.
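
A minimal sketch of two-pool normalization, assuming each team-season arrives with a raw weighted-average score. The tuple layout, the function names, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

TeamKey = Tuple[str, int]  # (team, season)


def normalize_pool(raw: Dict[TeamKey, float],
                   multiplier: float = 15.0) -> Dict[TeamKey, float]:
    """Z-score one pool and map it to the 1-99 scale."""
    if not raw:
        return {}
    values = list(raw.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {key: 50.0 for key in raw}
    return {key: max(1.0, min(99.0, 50.0 + multiplier * (v - mu) / sigma))
            for key, v in raw.items()}


def talent_ratings(teams: List[Tuple[str, int, float]]) -> Dict[TeamKey, float]:
    """Normalize pre-1999 and 1999+ team-seasons in separate pools."""
    pre = {(name, season): raw for name, season, raw in teams if season < 1999}
    post = {(name, season): raw for name, season, raw in teams if season >= 1999}
    return {**normalize_pool(pre), **normalize_pool(post)}
```

Because each pool is scaled by its own standard deviation, a compressed pre-1999 spread (say 48 to 52) and a wide modern spread (30 to 70) can produce the same rating for teams with the same relative standing.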

Results Rating (1–99)

Derived from actual game outcomes, z-scored within each season. For 1999–2025, game data comes from nflverse. For 1978–1998, historical schedule data (scores and playoff results) is sourced from the devstopfix/nfl_results dataset (Public Domain). Pre-1978 seasons have no schedule data available.

Where play-by-play data is not available (1978–1998), the EPA per play component naturally drops out (all teams z-score to 0), so Results depends on point differential, win percentage, and playoff depth.

Input | Weight | Description
Point differential / game | 30% | Best single predictor of team quality
Win percentage | 20% | Regular season only; ties count as 0.5 wins
EPA per play | 25% | Combined offensive + defensive expected points added, per play
Playoff multiplier | 25% | Bonus or penalty based on postseason depth (see below)

Playoff multiplier values: Missed playoffs (−0.5), Wild Card loss (0), Divisional loss (+0.3), Conf. Championship loss (+0.6), Super Bowl loss (+0.4), Super Bowl win (+2.5). The large gap between winning and losing the Super Bowl ensures that regular season dominance alone cannot produce the highest ratings.

Clutch Factor

Clutch = Results − Talent. Measures whether game outcomes exceeded roster quality. Displayed as a signed number with a label:

Range | Label
+5 or higher | Overachiever
−4 to +4 | As Expected
−5 to −14 | Underachiever
−15 or lower | Flameout
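
The label table translates directly into a threshold check. This sketch assumes Results and Talent are whole numbers, as the ranges in the table imply; the function name is illustrative.

```python
from typing import Tuple


def clutch_label(results: int, talent: int) -> Tuple[int, str]:
    """Return the signed Clutch value (Results - Talent) and its display label."""
    clutch = results - talent
    if clutch >= 5:
        label = "Overachiever"
    elif clutch >= -4:
        label = "As Expected"
    elif clutch >= -14:
        label = "Underachiever"
    else:
        label = "Flameout"
    return clutch, label


print(clutch_label(85, 78))  # (7, 'Overachiever')
print(clutch_label(50, 70))  # (-20, 'Flameout')
```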

Playoff Fate

A separate measure of whether a team's postseason matched the expectations set by their regular season. A 14-2 team losing in the conference championship is a different kind of disappointment than a 9-7 team losing in the same round — Playoff Fate captures that distinction.

Teams are classified into a record tier based on their win-percentage rank within their season, then matched against their playoff result to produce a label. Ranking teams relative to their peers each year, rather than using fixed win totals, adapts naturally to different schedule lengths and competitive landscapes: a 12-4 record in a 16-game season is treated as elite when it leads the league.

Record Tier | Missed | WC Loss | DIV Loss | CON Loss | SB Loss | SB Win
Elite (top 12.5%) | Collapsed | Collapsed | Collapsed | Stunned | Heartbreak | Dominant
Strong (top 25%) | Collapsed | Upset | Upset | Contender | Heartbreak | Dark Horse
Good (top 50%) | Snubbed | Expected | Solid Run | Overachiever | Heartbreak | Cinderella
Average (top 81%) | Expected | Expected | Surprise | Surprise | Stunned | Cinderella
Poor (bottom 19%) | Expected | Surprise / Cinderella

REAL EXAMPLES
1979 Steelers (12-4, best record in 28-team league) = Dominant. 2011 Giants (9-7, ranked 10th of 32) = Cinderella. 2007 Patriots (16-0, Super Bowl loss) = Heartbreak. 2005 Steelers (11-5, 6th seed Super Bowl win) = Dark Horse.
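
The tier-and-result lookup can be sketched as a small table. Two points here are assumptions: the percentile is computed as rank divided by number of teams, and any playoff appearance by a Poor-tier team returns the combined "Surprise / Cinderella" label, since the table does not break that row out by round. All names are illustrative.

```python
# Tier cutoffs as cumulative share of the league, best record first.
TIER_CUTOFFS = [(0.125, "Elite"), (0.25, "Strong"), (0.50, "Good"), (0.81, "Average")]
RESULTS = ["Missed", "WC Loss", "DIV Loss", "CON Loss", "SB Loss", "SB Win"]
FATE = {
    "Elite":   ["Collapsed", "Collapsed", "Collapsed", "Stunned", "Heartbreak", "Dominant"],
    "Strong":  ["Collapsed", "Upset", "Upset", "Contender", "Heartbreak", "Dark Horse"],
    "Good":    ["Snubbed", "Expected", "Solid Run", "Overachiever", "Heartbreak", "Cinderella"],
    "Average": ["Expected", "Expected", "Surprise", "Surprise", "Stunned", "Cinderella"],
}


def record_tier(rank: int, n_teams: int) -> str:
    """Tier from win-percentage rank (1 = best record) within the season."""
    share = rank / n_teams  # assumption: percentile is rank / league size
    for cutoff, tier in TIER_CUTOFFS:
        if share <= cutoff:
            return tier
    return "Poor"


def playoff_fate(rank: int, n_teams: int, result: str) -> str:
    """Look up the Playoff Fate label for a record rank and postseason result."""
    tier = record_tier(rank, n_teams)
    if tier == "Poor":
        return "Expected" if result == "Missed" else "Surprise / Cinderella"
    return FATE[tier][RESULTS.index(result)]


print(playoff_fate(1, 28, "SB Win"))   # Dominant (1979 Steelers)
print(playoff_fate(10, 32, "SB Win"))  # Cinderella (2011 Giants)
```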

Overall Rating (1–99)

Overall = (Talent × 0.45) + (Results × 0.50) + clamp(Clutch × 0.10, −5, +5)

The Clutch modifier is clamped at ±5 points so extreme over- or underperformance nudges the composite without dominating it.

Pre-1999 adjustments: For 1978–1998 seasons, Talent is measured from fewer differentiating units (4 of 8), making it less reliable. The Overall formula shifts weight toward Results: T × 0.30 + R × 0.65 + clutch. This ensures that great teams with unmeasured defensive talent (like the 1985 Bears) are properly rated through their dominant game results.

Pre-1978 (no schedule data): Overall regresses Talent toward league average: Talent × 0.70 + 50 × 0.30. This caps talent-only teams around 79, reflecting the inherent uncertainty of roster ratings without any game outcomes to validate them.
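
The three era variants can be gathered into one function. This is a sketch under assumptions: the pre-1999 "+ clutch" term is taken to be the same clamped modifier as in the main formula, no final clamp to 1–99 is applied, and the function name is illustrative.

```python
from typing import Optional


def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def team_overall(season: int, talent: float, results: Optional[float] = None) -> float:
    """Overall team rating using the era-specific blend of Talent and Results."""
    if season < 1978 or results is None:
        # No game outcomes: regress Talent toward the league average of 50.
        return talent * 0.70 + 50 * 0.30
    clutch_term = clamp((results - talent) * 0.10, -5.0, 5.0)
    if season < 1999:
        # Talent is measured from fewer units, so Results carries more weight.
        return talent * 0.30 + results * 0.65 + clutch_term
    return talent * 0.45 + results * 0.50 + clutch_term


print(team_overall(2020, 80, 90))  # about 82.0
print(team_overall(1985, 60, 95))  # about 83.25
print(team_overall(1972, 90))      # about 78.0
```

The 1985 case shows the intent of the pre-1999 weights: a modest Talent score is largely overridden by dominant results.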

Draft Value Analysis

The Draft Value page answers a specific question: given a draft position and a player position, what does historical data say about the value of that pick? It analyzes every draft pick in rounds 1–3 from 2000–2023 (~2,300 players) across 11 positional groups.

First-Contract Production

For each drafted player, we sum their per-season XvO player ratings (1–99) over the rookie contract window:

Draft Round | Contract Window | Rationale
Round 1 | 5 years | Includes 5th-year option
Rounds 2–3 | 4 years | Standard rookie deal

An elite pick might total 350+ over the contract window; a bust might total under 150. Players with fewer seasons get their actual total — no imputation. Busts, injuries, and early exits are real outcomes reflected in the data.

Contract Cost

Two eras require different cost models, both normalized to a 0–100 scale:

Era | Cost Source | Details
CBA Era (2011–2024) | Rookie wage scale | Approximate guaranteed money by pick number, from public NFL data
Pre-CBA (2000–2010) | Jimmy Johnson chart | Traditional draft value chart as cost proxy; actual contracts were negotiated and wildly variable

Surplus Value & Grading

Surplus = Production (normalized 0–100) − Cost (normalized 0–100)

Positive surplus means the player outproduced their draft slot cost; negative means they underperformed relative to the investment. The surplus is mapped to a letter grade:

Grade | Surplus Range
A+ to A− | +35 and above
B+ to B− | +5 to +34
C+ to C− | −25 to +4
D+ to D− | −55 to −26
F | Below −55
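
A sketch of the surplus calculation and grade bands. Only the letter band is returned because the cutoffs for the plus and minus variants within each band are not given here; the function name is illustrative.

```python
from typing import Tuple


def surplus_grade(production_norm: float, cost_norm: float) -> Tuple[float, str]:
    """Surplus = normalized production - normalized cost, mapped to a letter band."""
    surplus = production_norm - cost_norm
    if surplus >= 35:
        band = "A"
    elif surplus >= 5:
        band = "B"
    elif surplus >= -25:
        band = "C"
    elif surplus >= -55:
        band = "D"
    else:
        band = "F"
    return surplus, band


print(surplus_grade(80, 40))  # (40, 'A')
print(surplus_grade(50, 50))  # (0, 'C')
print(surplus_grade(10, 90))  # (-80, 'F')
```

Note that a pick producing exactly at its cost (surplus 0) lands in the C band, so B and above require clearly outproducing the draft slot.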

Outcome Classification

Each pick is classified into one of three outcomes based on first-contract production:

Outcome | Criteria
Elite | Production total ≥ 320 OR any All-Pro selection during contract
Starter | Production total 150–319
Bust | Production total below 150
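
Putting the contract window and the outcome thresholds together, a rough sketch might look like this. It assumes a player's per-season ratings are supplied in chronological order starting with the rookie year; the function names are illustrative.

```python
from typing import List


def contract_window(draft_round: int) -> int:
    """Rookie contract length in seasons: 5 for round 1, 4 for rounds 2-3."""
    if draft_round not in (1, 2, 3):
        raise ValueError("only rounds 1-3 are analyzed")
    return 5 if draft_round == 1 else 4


def first_contract_production(draft_round: int, season_ratings: List[int]) -> int:
    """Sum per-season ratings inside the contract window; missing seasons add nothing."""
    return sum(season_ratings[:contract_window(draft_round)])


def classify_outcome(production: int, all_pro_in_window: bool = False) -> str:
    """Elite / Starter / Bust from first-contract production and All-Pro status."""
    if production >= 320 or all_pro_in_window:
        return "Elite"
    if production >= 150:
        return "Starter"
    return "Bust"


total = first_contract_production(1, [70, 75, 80, 82, 78, 90])
print(total, classify_outcome(total))  # 385 Elite (the sixth season is outside the window)
```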

Elite Impact Multiplier

Measures how much having a top-rated player at a position correlates with team win improvement. For each position, we compare the average win-percentage residual (actual wins minus expected wins from talent rating) for teams with an elite player at that position versus teams without. Granular positions are resolved from the database using depth chart data (e.g., DE vs DT, T vs G) rather than the generic DL/OL categories in roster files, giving each positional group its own distinct signal.

KEY INSIGHT
QB elite impact (+39.6%) dwarfs every other position. TE (+13.2%), CB (+10.6%), RB (+10.3%), and S (+10.1%) cluster in the mid-range. WR (+8.9%), LB (+7.0%), EDGE (+5.5%), and IDL (+2.3%) follow. Edge rushers and interior linemen are rated against separate baseline pools with position-specific overall weights — EDGE prioritizes pass rush, IDL prioritizes run stuffing. Tackle (T) elite impact cannot be measured because OL individual ratings lack variance. The elite impact is shown in the Pick Evaluator when a specific position is selected.

Positional Groups

11 groups, granular where sample size supports it, merged where it doesn't:

Group | Positions Merged
QB, RB, WR, TE | Standalone
T | T, OT
G/C | G, C, OL, OG
EDGE | DE, OLB, EDGE
IDL | DT, NT, DL
LB | LB, ILB, MLB
CB | CB
S | S, SS, FS, DB, SAF

Draft Tiers

Six tiers aligned to CBA-era rookie wage scale cost cliffs:

Tier | Picks | CBA-Era Guaranteed $
Elite | 1–3 | ~$35–40M
Premium | 4–10 | ~$20–35M
Mid-First | 11–20 | ~$12–20M
Late First | 21–32 | ~$8–12M
Early Second | 33–48 | ~$5–8M
Mid-to-Late 2nd/3rd | 49–100 | ~$2–5M

Data Scope & Limitations

Constraint | Details
Coverage | Rounds 1–3, picks 1–100, draft years 2000–2023 (~2,300 players). 2024–2025 classes excluded because picks with fewer than two seasons of data produce misleadingly low surplus values.
Match rate | 99.2% of draft picks successfully matched to XvO player IDs
Incomplete contracts | Recent picks (2020–2023) have incomplete contract windows; production reflects only seasons played so far. Incomplete picks are included in aggregate statistics but excluded from best/worst pick lists.
OL ratings | Offensive linemen (T, G/C) are rated ~50 across all skills due to no individual stats, so their draft value analysis relies primarily on surplus calculations. Tackle (T) elite impact cannot be measured; G/C barely registers (+0.7%).
Rounds 4–7 | Excluded: too noisy, and near-minimum contracts make surplus analysis less meaningful