How Ratings Work — XvO Football

The Rating Scale

Rating bands: 1–39, 40–59, 60–74, 75–89, 90–99.

50 is league average for that position in that season. A rating of 75 means "significantly above average" — roughly top 15% of starters. A rating of 90+ is elite, reserved for the best handful of players at their position that year.

KEY INSIGHT

A 75-rated QB in 2024 was compared to 2024 QBs, not all-time. Ratings are relative to the era — a 70 in 1975 and a 70 in 2020 both mean "well above their contemporaries."

The Formula

For each skill: Rating = 50 + (z-score × 22), clamped 1–99

The z-score measures how many standard deviations a player is above or below the league average for their position group that season. The multiplier (22) controls how spread out ratings are — it's tuned so that roughly 2 standard deviations above average yields a 94.

EXAMPLE

If the average QB throws for 250 passing yards per game with a standard deviation of 47, a QB averaging 344 yd/g is +2.0 std devs above the mean. That gives a rating of 50 + (2.0 × 22) = 94.
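The formula and worked example above can be sketched in a few lines. This is a minimal illustration, not the pipeline's code: the function name `skill_rating` and the rounding to a whole number are assumptions, since the source only states the formula and the 1–99 clamp.

```python
def skill_rating(value, league_mean, league_std, multiplier=22.0):
    """Map a per-game stat onto the 1-99 scale via its z-score."""
    z = (value - league_mean) / league_std
    raw = 50 + z * multiplier
    # Rounding to an integer is an assumption; the clamp comes from the formula.
    return max(1, min(99, round(raw)))


# The QB example: 344 yd/g against a 250 yd/g average with std dev 47.
print(skill_rating(344, 250, 47))  # 94
```

For stats where lower is better (interceptions, for example), the z-score would be negated before it is scaled.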

Per-Game Normalization

All counting stats (yards, touchdowns, sacks, tackles, etc.) are converted to per-game rates before z-scoring. Both the league baselines and the player values are divided by games played, so the comparison is always rate-vs-rate.

This means a QB who plays 6 games due to injury is rated on his per-game production, not his season totals. A 6-game QB averaging 280 yards per game rates the same as a 17-game QB averaging 280 yards per game.

WHAT'S EXCLUDED

Stats that are already rates are not divided by games: completion percentage over expected (CPOE), target share, field goal percentage, and per-attempt proxies like yards per carry. Peak stats like longest field goal are also excluded. Only counting stats — things that accumulate over a season — get the per-game treatment.
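A small sketch of the normalization step, assuming season stats arrive as a name-to-value mapping. The stat names and the `COUNTING_STATS` set are hypothetical placeholders, not the pipeline's actual column names.

```python
# Hypothetical stat names; only counting stats are divided by games played.
COUNTING_STATS = {"passing_yards", "touchdowns", "sacks", "tackles"}


def to_rates(season_stats, games_played):
    """Convert counting stats to per-game rates; leave rate stats untouched."""
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    return {
        name: value / games_played if name in COUNTING_STATS else value
        for name, value in season_stats.items()
    }


injured = to_rates({"passing_yards": 1680, "cpoe": 2.5}, games_played=6)
healthy = to_rates({"passing_yards": 4760, "cpoe": 2.5}, games_played=17)
# Both quarterbacks end up at 280 yards per game, so they rate the same.
```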

Position Breakdowns

Each position has its own set of skills, each mapped to one or more real stats. Here's what drives every rating.

Offense

Quarterback

Skill	Stat(s)	Pre-1999 Proxy	Notes
Arm Strength	Air yards/g	Yards per attempt	Downfield passing distance per game
Accuracy	CPOE or EPA/g	Completion % or TD %	Takes the higher rating — protects efficient short passers
Decision Making	INTs/g + TDs/g + Yards/g	—	Three-way blend; INTs inverted (fewer = better)
Mobility	Rushing yards/g	—	Scrambling and designed runs per game
Poise	Sacks/g + Yards/g + EPA/g	Sacks/g + Yards/g	Only averages components with data — avoids padding with 50
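Two rows in the table combine component ratings in ways that are easy to get wrong: Accuracy takes the higher of two ratings, and Poise averages only the components that have data. A hedged sketch follows, assuming each component has already been converted to a 1–99 rating and that missing data is represented as `None`. The function names and the fallback to 50 when nothing is available are assumptions.

```python
def accuracy_rating(cpoe_rating, epa_rating):
    """Take the higher of the two component ratings that exist."""
    candidates = [r for r in (cpoe_rating, epa_rating) if r is not None]
    return max(candidates) if candidates else 50


def poise_rating(component_ratings):
    """Average only the components with data, rather than padding with 50."""
    available = [r for r in component_ratings if r is not None]
    if not available:
        return 50
    return round(sum(available) / len(available))


# With EPA missing, the average is taken over the two remaining components.
print(poise_rating([80, 70, None]))  # 75
```

Padding the missing component with 50 would have produced 67 in this example, which is the distortion the Notes column describes.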

Running Back

Skill	Stat	Pre-1999 Proxy	Notes
Power	Rushing yards/g	—	Ground production per game
Elusiveness	Rushing first downs/g	Yards per carry	No broken tackles data; YPC captures elusiveness pre-1999
Vision	Rushing EPA/g	Rushing TDs/g	Efficiency per game; TDs proxy for finding the end zone
Receiving	Receiving yards/g	—	Pass-catching contribution per game

Wide Receiver

Skill	Stat	Pre-1999 Proxy	Notes
Route Running	Target share	Receptions per game	How often the offense looks to this receiver
Hands	Receptions/g	—	Catches per game
Release	Receiving first downs/g	Receiving TDs/g	Ability to win at the line and convert
YAC	Yards after catch/g	Yards per reception	Production after the catch per game

Tight End

Skill	Stat	Pre-1999 Proxy	Notes
Blocking	No data	—	Always rated 50 — no individual blocking stats available
Receiving	Receiving yards/g	—	Pass-catching production per game
YAC	Yards after catch/g	Yards per reception	Production after the catch per game

Offensive Line

Skill	Stat	Notes
Pass Blocking	No data	Always rated 50
Run Blocking	No data	Always rated 50
Pull/Screen	No data	Always rated 50

LIMITATION

No publicly available dataset tracks individual OL performance. All offensive linemen are rated 50 across the board. Future PFF or similar data could improve this.

Defense

Defensive Line — EDGE (DE) & IDL (DT, NT)

Edge rushers and interior defensive linemen use the same four skills but are rated against separate baseline pools and use different overall weights. DEs are compared only to other DEs, and DTs/NTs only to other DTs/NTs. This keeps the higher sack production of edge rushers from inflating the baselines that interior linemen are measured against (and vice versa).

Skill	Stat	Notes
Pass Rush	Sacks/g	Quarterback takedowns per game
Run Stuffing	Total tackles/g	Solo + assisted tackles per game
Power	Tackles for loss/g	Plays behind the line per game
Quickness	QB hits/g	Hits on the quarterback per game

EDGE overall weights: Pass Rush (1.0), Quickness (0.8), Power (0.6), Speed (0.4), Run Stuffing (0.4). Speed is the universal attribute described under Universal Attributes, not a fifth position skill. Edge rushers live and die by their ability to get to the quarterback.

IDL overall weights: Run Stuffing (1.0), Power (0.9), Pass Rush (0.6), Quickness (0.5), Speed (0.2). Interior linemen anchor the run defense first; pass-rush production is a bonus.
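To show how the two weight tables change the result for the same player, here is a sketch of a weighted average using only the weights listed above. The real weight tables may include other universal attributes (Stamina, Intelligence) that are not listed here, so treat the output as illustrative.

```python
EDGE_WEIGHTS = {"pass_rush": 1.0, "quickness": 0.8, "power": 0.6,
                "speed": 0.4, "run_stuffing": 0.4}
IDL_WEIGHTS = {"run_stuffing": 1.0, "power": 0.9, "pass_rush": 0.6,
               "quickness": 0.5, "speed": 0.2}


def weighted_overall(ratings, weights):
    """Weighted average of skill ratings, normalized by the total weight."""
    total_weight = sum(weights.values())
    return sum(ratings[skill] * w for skill, w in weights.items()) / total_weight


# A pass-rush specialist who is weak against the run (made-up ratings).
ratings = {"pass_rush": 90, "quickness": 80, "power": 60,
           "speed": 70, "run_stuffing": 50}
as_edge = weighted_overall(ratings, EDGE_WEIGHTS)
as_idl = weighted_overall(ratings, IDL_WEIGHTS)
# The same skills grade higher under the EDGE table than the IDL table.
```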

OLBs (including hybrid pass rushers like T.J. Watt) remain in the Linebacker group because they export with LB skill keys in the game engine.

Linebacker

Skill	Stat	Notes
Tackling	Total tackles/g	Solo + assisted tackles per game
Coverage	Passes defended/g	Batted or broken-up passes per game
Blitz	Sacks/g	Rush production per game
Play Recognition	Tackles for loss/g	Reading and stopping plays behind the line

Defensive Back

Skill	Stat	Notes
Man Coverage	Passes defended/g	One-on-one coverage ability per game
Zone Awareness	(PD + INTs)/g	Combined coverage production per game
Ball Skills	Interceptions/g	Ability to come away with the ball
Tackling	Solo tackles/g	Run support and open-field tackling per game

Special Teams

Kicker

Skill	Stat	Notes
Leg Strength	Longest FG made	Maximum proven distance
Accuracy	FG percentage	Overall make rate

Punter

Skill	Stat	Notes
Leg Strength	No data	Always rated 50
Placement	No data	Always rated 50

Pre-1999 Proxy Stats

Advanced metrics like EPA, CPOE, YAC, and target share weren't tracked before 1999. Rather than defaulting every affected skill to 50, the pipeline uses proxy stats — derived values from available counting data that approximate what the advanced stat measures.

HOW IT WORKS

Each proxy is computed as a per-season z-score, just like primary stats. For example, a 1978 RB's elusiveness is rated against 1978 yards-per-carry baselines instead of 2024 rushing-first-down baselines. Proxies only activate when the primary stat's baseline is absent — 1999+ ratings are completely unchanged.

Defensive stats, kicker stats, and punter stats don't exist before 1999 at all, so those positions remain at 50 for pre-1999 seasons.
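The activation rule ("proxies only activate when the primary stat's baseline is absent") amounts to a simple fallback. A sketch under assumptions: season baselines are a mapping keyed by stat name, and the names `choose_stat`, `rushing_first_downs_pg`, and `yards_per_carry` are hypothetical.

```python
def choose_stat(season_baselines, primary, proxy=None):
    """Pick which stat to z-score for a skill in a given season.

    Returns the primary stat when its baseline exists, otherwise the proxy,
    otherwise None (the caller would then leave the skill at 50).
    """
    if primary in season_baselines:
        return primary
    if proxy is not None and proxy in season_baselines:
        return proxy
    return None


# A 1978 season has no first-down baseline, so the proxy is used.
baselines_1978 = {"yards_per_carry": (4.0, 0.6)}  # (mean, std), made-up values
stat_1978 = choose_stat(baselines_1978, "rushing_first_downs_pg", "yards_per_carry")
```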

Universal Attributes

Every player receives three universal ratings that apply across all positions.

Speed

Speed is derived from real NFL Combine 40-yard dash times when available (7,800+ players since 2000), with two fallback paths for players without combine data, for three paths in total.

Path	Who	How It Works
Combine	2000+ players with 40 time	Linear conversion: 4.3s = 100, 4.5s = 85, 5.0s = 46. Apply age decay and a small performance nudge (capped at ±5) from speed-correlated production.
Proxy	Pre-2000 skill positions	Compute a z-score from the player's YPC (RB), Y/R (WR/TE), or rush yd/game (QB), then map it onto the known combine 40 distribution for that position. A pre-2000 RB with top-tier YPC gets the same speed as a combine-era RB who ran 2 standard deviations faster than average.
Fallback	Defenders, OL, K/P	Position default with weight adjustment (lighter = faster). For speed positions (CB, S, WR, RB), career All-Pro selections and Hall of Fame status provide a small boost (up to +6).

Age decay is applied on all paths: no penalty before age 27, then -1/year through 29, -2/year through 32, and -3/year beyond 32. A player who ran 4.4s (base 92) at the combine rates around 83 by age 32.
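The age-decay schedule can be written as a cumulative penalty. The sketch below assumes the penalty starts accruing at age 27 and that each age adds its own yearly step, which is the reading that reproduces the worked example (base 92 falling to about 83 by age 32). The function name is illustrative.

```python
def age_decay(age):
    """Cumulative speed penalty at a given age.

    Ages 27-29 cost 1 point each, ages 30-32 cost 2 each, and every
    age beyond 32 costs 3.
    """
    penalty = 0
    for a in range(27, age + 1):
        if a <= 29:
            penalty += 1
        elif a <= 32:
            penalty += 2
        else:
            penalty += 3
    return penalty


print(92 - age_decay(32))  # 83
```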

HISTORICAL OVERRIDES

For 85 notable players (1970–2025), a curated database provides per-season award data and speed floors based on scouting reports and known athleticism. This ensures players like Deion Sanders (4.27s at Florida State, pre-combine era) and Bo Jackson rate appropriately even when the algorithm has limited data. The override acts as a floor: if the algorithm produces a higher rating, it wins.

Stamina & Intelligence

Attribute	How It's Calculated
Stamina	Position default + games played adjustment (more games = higher stamina)
Intelligence	Position default + EPA per game bonus for QBs; other positions use baseline

Overall Rating

The overall rating is a weighted average of position skills and universal attributes. Each position has its own weight table emphasizing the most important skills.

For example, QB accuracy and intelligence are weighted heavily, while a RB's power and elusiveness carry more weight than receiving. Defensive linemen use separate weight tables for EDGE (pass-rush emphasis) and IDL (run-stuffing emphasis). The weights ensure that a player's overall number reflects what matters most for their role on the field.

XvO Factor

Beyond stat-based ratings, we measure each player's on-field impact using Regularized Adjusted Plus-Minus (RAPM) — a ridge regression technique adapted from basketball analytics — then normalize it to a 1–99 scale.

How It Works

For every scrimmage play from 2020–2024, we know all 22 players on the field and the play's Expected Points Added (EPA). A ridge regression estimates each player-season's marginal EPA contribution per snap, controlling for every other player on the field simultaneously.

The raw EPA/snap values are converted to XvO Factor using z-scores within each position group per season: 50 + (z × multiplier), clamped 1–99. Crucially, the multiplier varies by position — see Reliability-Scaled Spread below.

Detail	Value
Plays analyzed	~178,000 scrimmage plays (2020–2024)
Player-seasons	~7,000 (offense + defense, ≥100 snaps)
Method	Ridge regression with 5-fold cross-validation
Models	Separate offense (EPA added) and defense (EPA prevented)
Scale	1–99, spread scaled by position reliability
ELITE badge	One of several paths — see The ELITE Badge below
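The ridge step described above can be sketched on toy data. This assumes each play is reduced to the list of player indices on the field plus its EPA. The name `ridge_fit`, the toy plays, and the fixed penalty `lam` are illustrative. The source selects the penalty with 5-fold cross-validation, which is omitted here, and a real pipeline would use a sparse linear algebra library rather than this hand-rolled solver.

```python
def ridge_fit(plays, epa, n_players, lam=1.0):
    """Closed-form ridge: solve (X^T X + lam * I) beta = X^T y.

    plays[i] lists the player indices on the field for play i; each one
    is an indicator of 1 in row i of the design matrix X.
    """
    n = n_players
    # Build the normal equations directly from the sparse indicator rows.
    A = [[lam if i == j else 0.0 for j in range(n)] for i in range(n)]
    b = [0.0] * n
    for on_field, y in zip(plays, epa):
        for p in on_field:
            b[p] += y
            for q in on_field:
                A[p][q] += 1.0
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    beta = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(A[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (b[i] - tail) / A[i][i]
    return beta


# Three players, three plays; every pair shares the field exactly once.
toy_plays = [[0, 1], [0, 2], [1, 2]]
toy_epa = [0.5, 0.3, -0.1]
light = ridge_fit(toy_plays, toy_epa, n_players=3, lam=1.0)
heavy = ridge_fit(toy_plays, toy_epa, n_players=3, lam=10.0)
```

A larger penalty pulls every estimate toward zero, which is what keeps low-snap players from receiving extreme values.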

Reliability-Scaled Spread

Not all positions are equally measurable. RAPM reliably differentiates quarterbacks — the player who touches the ball every snap — but struggles to separate a running back's contribution from his offensive line's. To keep XvO Factor honest, we scale each position's spread by how much signal the model actually carries.

We measure this with split-half reliability: split all plays into odd-week and even-week halves, run the regression independently on each, and correlate the player estimates. High correlation means the model is measuring something real. Low correlation means the differences are mostly noise.
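A sketch of the split-half check, assuming the two independent regressions have already produced per-player estimates keyed by player ID. The Pearson correlation is written out by hand to keep the example self-contained. The function names and the choice to correlate only players present in both halves are assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # no variation means nothing to correlate
    return cov / (spread_x * spread_y)


def split_half_reliability(odd_week_estimates, even_week_estimates):
    """Correlate player estimates from the odd-week and even-week fits."""
    shared = sorted(set(odd_week_estimates) & set(even_week_estimates))
    return pearson_r(
        [odd_week_estimates[p] for p in shared],
        [even_week_estimates[p] for p in shared],
    )
```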

Position	Split-Half r	Multiplier	Typical XF Range
QB	0.56	12.4	25–90+
OL	0.48	10.5	30–85
WR	0.42	9.2	30–83
RB	0.36	7.8	35–77
TE	0.32	7.0	36–75
DL	0.21	4.5	40–69
DB	0.19	4.1	42–63
LB	0.17	3.7	43–63

The formula is multiplier = 22 × reliability. A QB with reliability 0.56 gets a multiplier of 12.4, spreading scores across a wide range. A linebacker with reliability 0.17 gets a multiplier of 3.7, compressing scores around 50. (The multipliers in the table appear to be computed from unrounded reliabilities, so 22 × the displayed r can differ by about 0.1.) This means a QB XvO Factor of 85 and a LB XvO Factor of 60 sit at a similar distance from their position's average, roughly 2.7 to 2.8 standard deviations: “among the best we can measure at this position.”
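The reliability scaling can be sketched as follows. The function name and the rounding to a whole number are assumptions. The reliabilities passed in are the rounded values from the table, so the computed multipliers can differ slightly from the table's Multiplier column.

```python
BASE_MULTIPLIER = 22.0


def xvo_factor(z, reliability):
    """50 + z * (22 * reliability), clamped to 1-99."""
    multiplier = BASE_MULTIPLIER * reliability
    return max(1, min(99, round(50 + z * multiplier)))


# The same z-score lands in very different places by position.
qb = xvo_factor(2.0, reliability=0.56)
lb = xvo_factor(2.0, reliability=0.17)
```

A quarterback two standard deviations above his peers reaches the mid-70s, while a linebacker with the same z-score stays in the high 50s.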

WHY OL STANDS OUT

Offensive linemen have no individual box score stats, so their traditional ratings stay near 50. XvO Factor is the only metric in our system that differentiates offensive linemen — and it turns out to be our second most reliable signal (r = 0.48), ahead of wide receivers. When an OL shows an elite XvO Factor, the data is telling you something meaningful about that player's impact on the offense.

Limitations

Constraint	Details
Coverage	2020–2024 only (NFL participation data starts 2016, reliable from 2020)
Pre-2020	XvO Factor column shows “–” for seasons without participation data
Defense	Defensive XF clusters tightly around 50 because individual defensive impact is hard to isolate from 11v11 data. The ELITE badge rarely appears for defenders — this is the model being honest, not a bug.
System effects	RAPM isolates individual contribution better than raw stats but cannot fully separate a player from their scheme, coaching, and teammates

The ELITE Badge

The ELITE badge identifies players who were among the very best at their position in a given season. It draws from three independent sources, layered by era:

Source	Era	Criteria
XvO Factor	2020–2024	Top 10% of position group AND XvO Factor ≥ 65
AP All-Pro	1970–2025	1st or 2nd Team All-Pro (Associated Press)
Major Awards	1970–2025	MVP, OPOY, DPOY, Offensive or Defensive Rookie of the Year

A player only needs to qualify through one of these paths to earn the badge. For 2020+ seasons, all three sources overlap and reinforce each other. For earlier eras, the badge relies on All-Pro selections and major awards.
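Because any single path is enough, the badge logic reduces to an OR across the three sources. A minimal sketch with hypothetical parameter names; for seasons without an XvO Factor, `xf` is passed as `None`.

```python
def is_elite(xf=None, xf_top_10_pct=False, all_pro=False, major_award=False):
    """True if the player-season qualifies through any one of the three paths."""
    # XvO path needs both conditions: top 10% of the position group AND XF >= 65.
    xf_path = xf is not None and xf_top_10_pct and xf >= 65
    return xf_path or all_pro or major_award


# A linebacker can lead his position at XF 62 and still miss the XvO path.
print(is_elite(xf=62, xf_top_10_pct=True))                 # False
print(is_elite(xf=62, xf_top_10_pct=True, all_pro=True))   # True
```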

WHY ALL-PRO?

We use Associated Press All-Pro selections rather than Pro Bowl appearances. The Pro Bowl was a meaningful honor through the mid-1990s, but fan voting (introduced in 1995) and opt-outs have eroded its reliability. AP All-Pro selections are voted on by a panel of sportswriters who watch every game, making them a more consistent standard across eras. All-Pro data is sourced from Wikipedia's comprehensive season-by-season records (1,155 players, 2,390 player-seasons from 1970 to 2025).

Data & Limitations

All stats come from nflverse (nflfastR/nflreadpy), licensed under CC-BY 4.0. The dataset covers 1970–2025.

Constraint	Details
Minimum games	Players need 8+ games to establish reliable baselines
Volume thresholds	QB: 150 attempts, RB: 50 carries, WR/TE: 30/15 targets (falls back to 20/10 receptions pre-1999), K: 10 FG attempts
OL & Punters	No individual stats available — always rated 50
Pre-1999 seasons	Advanced metrics (EPA, CPOE, YAC, target share) don't exist — proxy stats derived from counting data are used instead (see "Pre-1999 Proxy" column above)
Defense pre-1999	No defensive stats available before 1999 — all defensive skills default to 50
Game count	14 games/team through 1977, 16 games/team for 1978–2020, 17 games/team from 2021 onward (the strike-shortened 1982 and 1987 seasons were shorter)

OPEN SOURCE

The rating pipeline code and all data transformations are available in the project repository. Every rating can be traced back to the underlying stats.

Team Ratings

Team ratings combine individual player evaluations with actual game results to produce four distinct measures per team-season.

Talent Rating (1–99)

A weighted composite of the 8 unit ratings (Pass Game, OL Pass, Run Game, OL Run, Pass Rush, Run Defense, Secondary, Special Teams). The raw weighted average is z-scored and mapped to 1–99. Weights shift across four strategic eras to reflect how the game was actually played:

Unit	Smashmouth (70–78)	Dynasty D (79–89)	West Coast (90–04)	Modern Pass (05–25)
Pass Game	10%	14%	18%	24%
OL Pass	10%	10%	14%	16%
Run Game	18%	14%	12%	8%
OL Run	12%	12%	10%	8%
Pass Rush	12%	14%	14%	18%
Run Defense	16%	14%	12%	6%
Secondary	12%	14%	14%	16%
Special Teams	10%	8%	6%	4%

ERA BOUNDARIES

1978 belongs to the Smashmouth era because rosters were still built for the pre-Mel Blount rules. The 1979 season is when teams adapted to the new passing game. The 2005 boundary reflects the Ty Law rule, expanded roughing the passer, and defenseless receiver protections that structurally favored passing offenses.

TWO-POOL NORMALIZATION

Pre-1999 teams have only 4 of 8 units that vary (Pass Game, Run Game, Secondary, and Special Teams); the other 4 (OL Pass, OL Run, Pass Rush, Run Defense) are fixed at 50 due to data limitations. To prevent this compressed range from producing artificially narrow ratings, pre-1999 and 1999+ teams are normalized in separate pools. Each pool is independently z-scored with the same multiplier (15), so an 80 Talent in 1985 represents the same relative standing among its contemporaries as an 80 in 2020.
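A sketch of the two-pool idea, assuming raw weighted averages keyed by (team, season). Centering at 50, using the population standard deviation, and rounding are assumptions borrowed from the player formula; the source only states that each pool is z-scored with a multiplier of 15.

```python
from statistics import mean, pstdev


def normalize_pool(raw_by_key, multiplier=15.0):
    """Z-score one pool and map it onto the 1-99 scale."""
    if not raw_by_key:
        return {}
    values = list(raw_by_key.values())
    mu, sigma = mean(values), pstdev(values)
    ratings = {}
    for key, raw in raw_by_key.items():
        z = (raw - mu) / sigma if sigma > 0 else 0.0
        ratings[key] = max(1, min(99, round(50 + z * multiplier)))
    return ratings


def talent_ratings(raw_by_team_season):
    """Normalize pre-1999 and 1999+ team-seasons in separate pools."""
    pre = {k: v for k, v in raw_by_team_season.items() if k[1] < 1999}
    post = {k: v for k, v in raw_by_team_season.items() if k[1] >= 1999}
    return {**normalize_pool(pre), **normalize_pool(post)}


# Made-up raw values: the pre-1999 pool is far more compressed.
raw = {("A", 1985): 52, ("B", 1985): 50, ("C", 1985): 48,
       ("D", 2020): 70, ("E", 2020): 50, ("F", 2020): 30}
talent = talent_ratings(raw)
```

Team A's raw value is only two points above its pool average, yet it receives the same Talent rating as team D, because each stands in the same relative position within its own pool.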

Results Rating (1–99)

Derived from actual game outcomes, z-scored within each season. For 1999–2025, game data comes from nflverse. For 1978–1998, historical schedule data (scores and playoff results) is sourced from the devstopfix/nfl_results dataset (Public Domain). Pre-1978 seasons have no schedule data available.

Where play-by-play data is not available (1978–1998), the EPA per play component naturally drops out (all teams z-score to 0), so Results depends on point differential, win percentage, and playoff depth.

Input	Weight	Description
Point differential / game	30%	Best single predictor of team quality
Win percentage	20%	Regular season only; ties count as 0.5 wins
EPA per play	25%	Combined offensive + defensive expected points added, per play
Playoff multiplier	25%	Bonus or penalty based on postseason depth (see below)

Playoff multiplier values: Missed playoffs (−0.5), Wild Card loss (0), Divisional loss (+0.3), Conf. Championship loss (+0.6), Super Bowl loss (+0.4), Super Bowl win (+2.5). The large gap between winning and losing the Super Bowl ensures that regular season dominance alone cannot produce the highest ratings.
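One plausible way to combine the four inputs is sketched below. This is an assumption about how the pieces fit together: the source gives the weights and the playoff values but not the exact combination rule or the final mapping to 1–99, so the code stops at a weighted composite. Missing EPA is treated as a z-score of 0, matching the note about seasons without play-by-play data.

```python
PLAYOFF_VALUE = {
    "missed": -0.5, "wc_loss": 0.0, "div_loss": 0.3,
    "conf_loss": 0.6, "sb_loss": 0.4, "sb_win": 2.5,
}


def results_composite(z_point_diff, z_win_pct, z_epa, playoff_result):
    """Weighted sum of the Results inputs (additive combination is assumed)."""
    if z_epa is None:
        z_epa = 0.0  # no play-by-play data: the component drops out
    return (0.30 * z_point_diff
            + 0.20 * z_win_pct
            + 0.25 * z_epa
            + 0.25 * PLAYOFF_VALUE[playoff_result])


champion_1985_style = results_composite(1.0, 1.0, None, "sb_win")
```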

Clutch Factor

Clutch = Results − Talent. Measures whether game outcomes exceeded roster quality. Displayed as a signed number with a label:

Range	Label
+5 or higher	Overachiever
−4 to +4	As Expected
−5 to −14	Underachiever
−15 or lower	Flameout
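The label table maps directly onto a threshold check. In this sketch the difference is rounded to a whole number first so that every value falls into exactly one band; that rounding is an assumption.

```python
def clutch_label(results, talent):
    """Return (clutch, label) for a team-season."""
    clutch = round(results - talent)
    if clutch >= 5:
        label = "Overachiever"
    elif clutch >= -4:
        label = "As Expected"
    elif clutch >= -14:
        label = "Underachiever"
    else:
        label = "Flameout"
    return clutch, label


print(clutch_label(85, 78))  # (7, 'Overachiever')
```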

Playoff Fate

A separate measure of whether a team's postseason matched the expectations set by their regular season. A 14-2 team losing in the conference championship is a different kind of disappointment than a 9-7 team losing in the same round — Playoff Fate captures that distinction.

Teams are classified into a record tier based on their win-percentage rank within their season, then matched against their playoff result to produce a label. Ranking teams relative to their peers each year, rather than using fixed win totals, adapts naturally to different schedule lengths and competitive landscapes: a 12-4 record is treated as elite in a season when it leads the league.

Record Tier	Missed	WC Loss	DIV Loss	CON Loss	SB Loss	SB Win
Elite (top 12.5%)	Collapsed	Collapsed	Collapsed	Stunned	Heartbreak	Dominant
Strong (top 25%)	Collapsed	Upset	Upset	Contender	Heartbreak	Dark Horse
Good (top 50%)	Snubbed	Expected	Solid Run	Overachiever	Heartbreak	Cinderella
Average (top 81%)	Expected	Expected	Surprise	Surprise	Stunned	Cinderella
Poor (bottom 19%)	Expected	Surprise / Cinderella

REAL EXAMPLES

1979 Steelers (12-4, best record in 28-team league) = Dominant. 2011 Giants (9-7, ranked 10th of 32) = Cinderella. 2007 Patriots (16-0, Super Bowl loss) = Heartbreak. 2005 Steelers (11-5, 6th seed Super Bowl win) = Dark Horse.
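The tier-plus-result lookup can be sketched as a table. The percentile is assumed to be rank divided by the number of teams, and the result keys are hypothetical names. The Poor row is transcribed as the source shows it, with a single combined label for any playoff appearance.

```python
RESULT_COLUMNS = ["missed", "wc_loss", "div_loss", "conf_loss", "sb_loss", "sb_win"]
FATE = {
    "Elite": ["Collapsed", "Collapsed", "Collapsed", "Stunned", "Heartbreak", "Dominant"],
    "Strong": ["Collapsed", "Upset", "Upset", "Contender", "Heartbreak", "Dark Horse"],
    "Good": ["Snubbed", "Expected", "Solid Run", "Overachiever", "Heartbreak", "Cinderella"],
    "Average": ["Expected", "Expected", "Surprise", "Surprise", "Stunned", "Cinderella"],
}


def record_tier(rank, n_teams):
    """Tier from win-percentage rank within the season (1 = best record)."""
    pct = rank / n_teams
    if pct <= 0.125:
        return "Elite"
    if pct <= 0.25:
        return "Strong"
    if pct <= 0.50:
        return "Good"
    if pct <= 0.81:
        return "Average"
    return "Poor"


def playoff_fate(rank, n_teams, result):
    tier = record_tier(rank, n_teams)
    if tier == "Poor":
        return "Expected" if result == "missed" else "Surprise / Cinderella"
    return FATE[tier][RESULT_COLUMNS.index(result)]


print(playoff_fate(1, 28, "sb_win"))    # 1979 Steelers: Dominant
print(playoff_fate(10, 32, "sb_win"))   # 2011 Giants: Cinderella
```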

Overall Rating (1–99)

Overall = (Talent × 0.45) + (Results × 0.50) + clamp(Clutch × 0.10, −5, +5)

The Clutch modifier is clamped at ±5 points so extreme over- or underperformance nudges the composite without dominating it.

Pre-1999 adjustments: For 1978–1998 seasons, Talent is measured from fewer differentiating units (4 of 8), making it less reliable. The Overall formula shifts weight toward Results: T × 0.30 + R × 0.65 + clutch. This ensures that great teams with unmeasured defensive talent (like the 1985 Bears) are properly rated through their dominant game results.

Pre-1978 (no schedule data): Overall regresses Talent toward league average: Talent × 0.70 + 50 × 0.30. This caps talent-only teams around 79, reflecting the inherent uncertainty of roster ratings without any game outcomes to validate them.
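The three era-specific formulas can be collected into one function. This sketch assumes the pre-1999 "clutch" term uses the same ±5 clamp as the modern formula and that a missing Results rating is passed as `None`; both are assumptions, as is the function name.

```python
def clamp(x, lo, hi):
    return max(lo, min(hi, x))


def team_overall(talent, results, season):
    """Overall rating under the era-specific weights described above."""
    if results is None or season < 1978:
        # No schedule data: regress Talent toward the league average of 50.
        return talent * 0.70 + 50 * 0.30
    clutch_mod = clamp((results - talent) * 0.10, -5, 5)
    if season < 1999:
        return talent * 0.30 + results * 0.65 + clutch_mod
    return talent * 0.45 + results * 0.50 + clutch_mod


modern = team_overall(80, 90, 2020)   # 36 + 45 + 1.0
legacy = team_overall(80, 90, 1985)   # 24 + 58.5 + 1.0
```

The same Talent and Results inputs score higher under the pre-1999 weights because more of the total comes from the stronger Results number.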

Draft Value Analysis

The Draft Value page answers a specific question: given a draft position and a player position, what does historical data say about the value of that pick? It analyzes every draft pick in rounds 1–3 from 2000–2023 (~2,300 players) across 11 positional groups.

First-Contract Production

For each drafted player, we sum their per-season XvO player ratings (1–99) over the rookie contract window:

Draft Round	Contract Window	Rationale
Round 1	5 years	Includes 5th-year option
Rounds 2–3	4 years	Standard rookie deal

An elite pick might total 350+ over the contract window; a bust might total under 150. Players with fewer seasons get their actual total — no imputation. Busts, injuries, and early exits are real outcomes reflected in the data.

Contract Cost

Two eras require different cost models, both normalized to a 0–100 scale:

Era	Cost Source	Details
CBA Era (2011–2023)	Rookie wage scale	Approximate guaranteed money by pick number, from public NFL data
Pre-CBA (2000–2010)	Jimmy Johnson chart	Traditional draft value chart as cost proxy; actual contracts were negotiated and wildly variable

Surplus Value & Grading

Surplus = Production (normalized 0–100) − Cost (normalized 0–100)

Positive surplus means the player outproduced their draft slot cost; negative means they underperformed relative to the investment. The surplus is mapped to a letter grade:

Grade	Surplus Range
A+ to A−	+35 and above
B+ to B−	+5 to +34
C+ to C−	−25 to +4
D+ to D−	−55 to −26
F	Below −55
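A sketch of the surplus-to-grade mapping. It returns only the letter band, because the source does not say how the plus and minus variants are assigned within a band. The thresholds treat the table's integer ranges as lower bounds so fractional surpluses also land in a band, which is an assumption.

```python
def surplus_grade(production_norm, cost_norm):
    """Return (surplus, letter band) from normalized production and cost."""
    surplus = production_norm - cost_norm
    if surplus >= 35:
        band = "A"
    elif surplus >= 5:
        band = "B"
    elif surplus >= -25:
        band = "C"
    elif surplus >= -55:
        band = "D"
    else:
        band = "F"
    return surplus, band


print(surplus_grade(80, 40))  # (40, 'A')
```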

Outcome Classification

Each pick is classified into one of three outcomes based on first-contract production:

Outcome	Criteria
Elite	Production total ≥ 320 OR any All-Pro selection during contract
Starter	Production total 150–319
Bust	Production total below 150
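Production totals and outcome classes can be sketched together. The function names are illustrative, and `season_ratings` is assumed to be the player's per-season ratings in order, starting with the rookie year.

```python
def contract_window(draft_round):
    """Round 1 picks get 5 years (5th-year option); rounds 2-3 get 4."""
    return 5 if draft_round == 1 else 4


def classify_pick(season_ratings, draft_round, all_pro_in_window=False):
    """Return (production total, outcome) over the rookie contract window."""
    window = season_ratings[:contract_window(draft_round)]
    total = sum(window)  # fewer seasons played means a smaller total, no imputation
    if total >= 320 or all_pro_in_window:
        outcome = "Elite"
    elif total >= 150:
        outcome = "Starter"
    else:
        outcome = "Bust"
    return total, outcome


ratings = [70, 75, 80, 85, 90]  # made-up per-season ratings
first_rounder = classify_pick(ratings, draft_round=1)
second_rounder = classify_pick(ratings, draft_round=2)
```

The identical career grades out differently by round in this example, because the second-round window drops the fifth season.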

Elite Impact Multiplier

Measures how much having a top-rated player at a position correlates with team win improvement. For each position, we compare the average win-percentage residual (actual wins minus expected wins from talent rating) for teams with an elite player at that position versus teams without. Granular positions are resolved from the database using depth chart data (e.g., DE vs DT, T vs G) rather than the generic DL/OL categories in roster files, giving each positional group its own distinct signal.

KEY INSIGHT

QB elite impact (+39.6%) dwarfs every other position. TE (+13.2%), CB (+10.6%), RB (+10.3%), and S (+10.1%) cluster in the mid-range. WR (+8.9%), LB (+7.0%), EDGE (+5.5%), and IDL (+2.3%) follow. Edge rushers and interior linemen are rated against separate baseline pools with position-specific overall weights — EDGE prioritizes pass rush, IDL prioritizes run stuffing. Tackle (T) elite impact cannot be measured because OL individual ratings lack variance. The elite impact is shown in the Pick Evaluator when a specific position is selected.

Positional Groups

11 groups, granular where sample size supports it, merged where it doesn't:

Group	Positions Merged
QB, RB, WR, TE	Standalone
T	T, OT
G/C	G, C, OL, OG
EDGE	DE, OLB, EDGE
IDL	DT, NT, DL
LB	LB, ILB, MLB
CB	CB
S	S, SS, FS, DB, SAF

Draft Tiers

Six tiers aligned to CBA-era rookie wage scale cost cliffs:

Tier	Picks	CBA-Era Guaranteed $
Elite	1–3	~$35–40M
Premium	4–10	~$20–35M
Mid-First	11–20	~$12–20M
Late First	21–32	~$8–12M
Early Second	33–48	~$5–8M
Mid-to-Late 2nd/3rd	49–100	~$2–5M

Data Scope & Limitations

Constraint	Details
Coverage	Rounds 1–3, picks 1–100, draft years 2000–2023 (~2,300 players). 2024–2025 classes excluded because picks with fewer than two seasons of data produce misleadingly low surplus values.
Match rate	99.2% of draft picks successfully matched to XvO player IDs
Incomplete contracts	Recent picks (2020–2023) have incomplete contract windows; production reflects only seasons played so far. Incomplete picks are included in aggregate statistics but excluded from best/worst pick lists.
OL ratings	Offensive linemen (T, G/C) are rated ~50 across all skills due to no individual stats — their draft value analysis relies primarily on surplus calculations. Tackle (T) elite impact cannot be measured; G/C barely registers (+0.7%).
Rounds 4–7	Excluded — too noisy and near-minimum contracts make surplus analysis less meaningful