How Ratings Work
Every player rating in XvO Football is derived from real NFL stats using an era-adjusted z-score system. No manual overrides, no subjective rankings — just math applied to data.
The Rating Scale
50 is league average for that position in that season. A rating of 75 means "significantly above average" — roughly top 15% of starters. A rating of 90+ is elite, reserved for the best handful of players at their position that year.
The Formula
Rating = 50 + (z × 22)
The z-score measures how many standard deviations a player is above or below the league average for their position group that season. The multiplier (22) controls how spread out ratings are: it is tuned so that 2 standard deviations above average yields a 94 (50 + 2 × 22).
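As a rough illustration of the scale described above, the sketch below computes a z-score against a position pool and maps it to a rating. The function names, the sample pool, the rounding, and the 1–99 clamp are assumptions for illustration; only the baseline of 50 and the multiplier of 22 come from the text.

```python
from statistics import mean, pstdev

MULTIPLIER = 22  # spread constant stated in the text


def z_score(value, pool):
    """Standard deviations above or below the pool average."""
    mu = mean(pool)
    sigma = pstdev(pool)  # population std dev; the pipeline's choice is not documented
    if sigma == 0:
        return 0.0  # no spread in the pool: treat the player as average
    return (value - mu) / sigma


def rating(z):
    """Map a z-score to the rating scale (rounding and 1-99 clamp are assumed)."""
    return max(1, min(99, round(50 + MULTIPLIER * z)))


# Hypothetical per-game passing yards for one position pool
pool = [200, 220, 240, 260, 280]
print(rating(z_score(240, pool)))  # an average player
print(rating(2.0))                 # two standard deviations above average
```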
Per-Game Normalization
All counting stats (yards, touchdowns, sacks, tackles, etc.) are converted to per-game rates before z-scoring. Both the league baselines and the player values are divided by games played, so the comparison is always rate-vs-rate.
This means a QB who plays 6 games due to injury is rated on his per-game production, not his season totals. A 6-game QB averaging 280 yards per game rates the same as a 17-game QB averaging 280 yards per game.
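The example above can be checked with a few lines. The helper name `per_game` and the season totals are made up for illustration; they are chosen so both quarterbacks average 280 yards per game.

```python
def per_game(total, games):
    """Convert a season counting stat to a per-game rate."""
    if games <= 0:
        raise ValueError("games must be positive")
    return total / games


injured_qb = per_game(1680, 6)        # 6 games played
full_season_qb = per_game(4760, 17)   # 17 games played
print(injured_qb, full_season_qb)     # both rates feed the same z-score
```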
Position Breakdowns
Each position has its own set of skills, each mapped to one or more real stats. Here's what drives every rating.
Quarterback
| Skill | Stat(s) | Pre-1999 Proxy | Notes |
|---|---|---|---|
| Arm Strength | Air yards/g | Yards per attempt | Downfield passing distance per game |
| Accuracy | CPOE or EPA/g | Completion % or TD % | Takes the higher rating — protects efficient short passers |
| Decision Making | INTs/g + TDs/g + Yards/g | — | Three-way blend; INTs inverted (fewer = better) |
| Mobility | Rushing yards/g | — | Scrambling and designed runs per game |
| Poise | Sacks/g + Yards/g + EPA/g | Sacks/g + Yards/g | Only averages components with data — avoids padding with 50 |
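Two of the notes in the table describe small combination rules: Accuracy takes the higher of two component ratings, and Poise averages only the components that have data. A minimal sketch of both, assuming missing components are represented as `None` and that a skill with no data at all falls back to 50 (both are assumptions, as are the function names):

```python
def accuracy_rating(primary_rating, secondary_rating):
    """Take the higher of the two component ratings (e.g. CPOE vs EPA/g)."""
    return max(primary_rating, secondary_rating)


def average_available(components):
    """Average only the components that have data, so missing ones are not padded with 50."""
    present = [c for c in components if c is not None]
    if not present:
        return 50.0  # assumed default when nothing is available
    return sum(present) / len(present)


print(accuracy_rating(62, 71))
print(average_available([60, 70, None]))  # pre-1999 style: EPA component missing
```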
Running Back
| Skill | Stat | Pre-1999 Proxy | Notes |
|---|---|---|---|
| Power | Rushing yards/g | — | Ground production per game |
| Elusiveness | Rushing first downs/g | Yards per carry | No broken tackles data; YPC captures elusiveness pre-1999 |
| Vision | Rushing EPA/g | Rushing TDs/g | Efficiency per game; TDs proxy for finding the end zone |
| Receiving | Receiving yards/g | — | Pass-catching contribution per game |
Wide Receiver
| Skill | Stat | Pre-1999 Proxy | Notes |
|---|---|---|---|
| Route Running | Target share | Receptions per game | How often the offense looks to this receiver |
| Hands | Receptions/g | — | Catches per game |
| Release | Receiving first downs/g | Receiving TDs/g | Ability to win at the line and convert |
| YAC | Yards after catch/g | Yards per reception | Production after the catch per game |
Tight End
| Skill | Stat | Pre-1999 Proxy | Notes |
|---|---|---|---|
| Blocking | No data | — | Always rated 50 — no individual blocking stats available |
| Receiving | Receiving yards/g | — | Pass-catching production per game |
| YAC | Yards after catch/g | Yards per reception | Production after the catch per game |
Offensive Line
| Skill | Stat | Notes |
|---|---|---|
| Pass Blocking | No data | Always rated 50 |
| Run Blocking | No data | Always rated 50 |
| Pull/Screen | No data | Always rated 50 |
Defensive Line — EDGE (DE) & IDL (DT, NT)
Edge rushers and interior defensive linemen use the same four skills but are rated against separate baseline pools and use different overall weights. DEs are compared only to other DEs, and DTs/NTs only to other DTs/NTs. This prevents the higher sack production of edge rushers from skewing the baselines for interior linemen (and vice versa).
| Skill | Stat | Notes |
|---|---|---|
| Pass Rush | Sacks/g | Quarterback takedowns per game |
| Run Stuffing | Total tackles/g | Solo + assisted tackles per game |
| Power | Tackles for loss/g | Plays behind the line per game |
| Quickness | QB hits/g | Pressures per game |
EDGE overall weights: Pass Rush (1.0), Quickness (0.8), Power (0.6), Speed (0.4), Run Stuffing (0.4). Edge rushers live and die by their ability to get to the quarterback.
IDL overall weights: Run Stuffing (1.0), Power (0.9), Pass Rush (0.6), Quickness (0.5), Speed (0.2). Interior linemen anchor the run defense first; pass-rush production is a bonus.
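To show how the two weight sets pull the same skill profile in different directions, here is a sketch of a weighted average using only the weights listed above. The function name and the sample skill ratings are hypothetical, and any attributes the real weight tables include beyond these five are left out.

```python
EDGE_WEIGHTS = {"pass_rush": 1.0, "quickness": 0.8, "power": 0.6,
                "speed": 0.4, "run_stuffing": 0.4}
IDL_WEIGHTS = {"run_stuffing": 1.0, "power": 0.9, "pass_rush": 0.6,
               "quickness": 0.5, "speed": 0.2}


def weighted_overall(skills, weights):
    """Weighted average of skill ratings, normalized by the total weight."""
    total_weight = sum(weights.values())
    return sum(skills[name] * w for name, w in weights.items()) / total_weight


# Hypothetical pass-rush specialist
skills = {"pass_rush": 90, "quickness": 80, "power": 60,
          "speed": 70, "run_stuffing": 50}
print(weighted_overall(skills, EDGE_WEIGHTS))
print(weighted_overall(skills, IDL_WEIGHTS))
```

With these inputs the same player grades noticeably higher under the EDGE weights than under the IDL weights.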
OLBs (including hybrid pass rushers like T.J. Watt) remain in the Linebacker group because they export with LB skill keys in the game engine.
Linebacker
| Skill | Stat | Notes |
|---|---|---|
| Tackling | Total tackles/g | Solo + assisted tackles per game |
| Coverage | Passes defended/g | Batted or broken-up passes per game |
| Blitz | Sacks/g | Rush production per game |
| Play Recognition | Tackles for loss/g | Reading and stopping plays behind the line |
Defensive Back
| Skill | Stat | Notes |
|---|---|---|
| Man Coverage | Passes defended/g | One-on-one coverage ability per game |
| Zone Awareness | (PD + INTs)/g | Combined coverage production per game |
| Ball Skills | Interceptions/g | Ability to come away with the ball |
| Tackling | Solo tackles/g | Run support and open-field tackling per game |
Kicker
| Skill | Stat | Notes |
|---|---|---|
| Leg Strength | Longest FG made | Maximum proven distance |
| Accuracy | FG percentage | Overall make rate |
Punter
| Skill | Stat | Notes |
|---|---|---|
| Leg Strength | No data | Always rated 50 |
| Placement | No data | Always rated 50 |
Pre-1999 Proxy Stats
Advanced metrics like EPA, CPOE, YAC, and target share are not available in the source data before 1999. Rather than defaulting every affected skill to 50, the pipeline uses proxy stats: derived values from available counting data that approximate what the advanced stat measures.
Defensive, kicker, and punter stats are not available before 1999 at all, so those positions remain at 50 for pre-1999 seasons.
Universal Attributes
Every player receives three universal ratings that apply across all positions.
Speed
Speed is derived from real NFL Combine 40-yard dash times when available (7,800+ players since 2000), with three fallback paths for players without combine data.
| Path | Who | How It Works |
|---|---|---|
| Combine | Players from 2000 onward with a 40 time | Linear conversion: 4.3s = 100, 4.5s = 85, 5.0s = 46. Age decay and a small performance nudge (capped at ±5) from speed-correlated production are then applied. |
| Proxy | Pre-2000 skill positions | Compute a z-score from the player's YPC (RB), Y/R (WR/TE), or rush yd/game (QB), then map it onto the known combine 40 distribution for that position. A pre-2000 RB whose YPC is 2 standard deviations above average gets the same speed as a combine-era RB who ran 2 standard deviations faster than average. |
| Fallback | Defenders, OL, K/P | Position default with weight adjustment (lighter = faster). For speed positions (CB, S, WR, RB), career All-Pro selections and Hall of Fame status provide a small boost (up to +6). |
Age decay is applied on all paths: no penalty before age 27, then -1/year through 29, -2/year through 32, and -3/year beyond 32. A player who ran 4.4s (base 92) at the combine rates around 83 by age 32.
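The combine path and the age decay can be sketched as follows. The exact line used by the pipeline is not given, so this version fits a straight line through the two quoted endpoints (4.3s = 100 and 5.0s = 46), which reproduces the other quoted values after rounding. It also assumes the first point of decay is lost at age 27, the reading that matches the 92-to-83 example. The performance nudge is omitted, and all names are hypothetical.

```python
# Anchor points quoted in the text: 4.3s -> 100 and 5.0s -> 46
FAST_TIME, FAST_RATING = 4.3, 100.0
SLOW_TIME, SLOW_RATING = 5.0, 46.0
SLOPE = (SLOW_RATING - FAST_RATING) / (SLOW_TIME - FAST_TIME)  # about -77 points per second


def base_speed(forty_time):
    """Linear conversion from a 40-yard dash time to a base speed rating."""
    return FAST_RATING + (forty_time - FAST_TIME) * SLOPE


def age_penalty(age):
    """Cumulative decay: 1/year at 27-29, 2/year at 30-32, 3/year after 32 (assumed reading)."""
    penalty = 0
    for year in range(27, age + 1):
        if year <= 29:
            penalty += 1
        elif year <= 32:
            penalty += 2
        else:
            penalty += 3
    return penalty


def speed_at_age(forty_time, age):
    """Base speed minus age decay, rounded; the performance nudge is not modeled."""
    return round(base_speed(forty_time) - age_penalty(age))


print(round(base_speed(4.4)), speed_at_age(4.4, 32))
```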
Stamina & Intelligence
| Attribute | How It's Calculated |
|---|---|
| Stamina | Position default + games played adjustment (more games = higher stamina) |
| Intelligence | Position default + EPA per game bonus for QBs; other positions use baseline |
Overall Rating
The overall rating is a weighted average of position skills and universal attributes. Each position has its own weight table emphasizing the most important skills.
For example, QB accuracy and intelligence are weighted heavily, while a RB's power and elusiveness carry more weight than receiving. Defensive linemen use separate weight tables for EDGE (pass-rush emphasis) and IDL (run-stuffing emphasis). The weights ensure that a player's overall number reflects what matters most for their role on the field.
XvO Factor
Beyond stat-based ratings, we measure each player's on-field impact using Regularized Adjusted Plus-Minus (RAPM) — a ridge regression technique adapted from basketball analytics — then normalize it to a 1–99 scale.
How It Works
For every scrimmage play from 2020–2024, we know all 22 players on the field and the play's Expected Points Added (EPA). A ridge regression estimates each player-season's marginal EPA contribution per snap, controlling for every other player on the field simultaneously.
The raw EPA/snap values are converted to XvO Factor using z-scores within each position group per season: 50 + (z × multiplier), clamped 1–99. Crucially, the multiplier varies by position — see Reliability-Scaled Spread below.
| Detail | Value |
|---|---|
| Plays analyzed | ~178,000 scrimmage plays (2020–2024) |
| Player-seasons | ~7,000 (offense + defense, ≥100 snaps) |
| Method | Ridge regression with 5-fold cross-validation |
| Models | Separate offense (EPA added) and defense (EPA prevented) |
| Scale | 1–99, spread scaled by position reliability |
| ELITE badge | One of several paths — see The ELITE Badge below |
Reliability-Scaled Spread
Not all positions are equally measurable. RAPM reliably differentiates quarterbacks — the player who touches the ball every snap — but struggles to separate a running back's contribution from his offensive line's. To keep XvO Factor honest, we scale each position's spread by how much signal the model actually carries.
We measure this with split-half reliability: split all plays into odd-week and even-week halves, run the regression independently on each, and correlate the player estimates. High correlation means the model is measuring something real. Low correlation means the differences are mostly noise.
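A minimal sketch of the correlation step, assuming each half of the season has already been fit separately and yields a dictionary of per-player estimates keyed by a player ID. The regression itself is not shown, and the function names and sample values are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # no variation in one half: no measurable signal
    return cov / sqrt(var_x * var_y)


def split_half_reliability(odd_week, even_week):
    """Correlate estimates for players who appear in both halves."""
    shared = sorted(set(odd_week) & set(even_week))
    return pearson_r([odd_week[p] for p in shared],
                     [even_week[p] for p in shared])


# Hypothetical EPA/snap estimates from two independent fits
odd = {"a": 0.10, "b": 0.20, "c": 0.30, "d": 0.05}
even = {"a": 0.20, "b": 0.40, "c": 0.60}
print(split_half_reliability(odd, even))
```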
| Position | Split-Half r | Multiplier | Typical XF Range |
|---|---|---|---|
| QB | 0.56 | 12.4 | 25–90+ |
| OL | 0.48 | 10.5 | 30–85 |
| WR | 0.42 | 9.2 | 30–83 |
| RB | 0.36 | 7.8 | 35–77 |
| TE | 0.32 | 7.0 | 36–75 |
| DL | 0.21 | 4.5 | 40–69 |
| DB | 0.19 | 4.1 | 42–63 |
| LB | 0.17 | 3.7 | 43–63 |
The formula is multiplier = 22 × reliability. A QB with reliability 0.56 gets a multiplier of 12.4, spreading scores across a wide range. A linebacker with reliability 0.17 gets a multiplier of 3.7, compressing scores around 50. This means a QB XvO Factor of 85 and a LB XvO Factor of 60 represent approximately the same standing, about 2.7 to 2.8 standard deviations above the position average: “among the best we can measure at this position.”
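The effect of the position multiplier is easy to see by running the same z-score through two positions. This sketch uses the multipliers from the table directly; the function name and the rounding to a whole number are assumptions.

```python
# Multipliers copied from the table above
MULTIPLIERS = {"QB": 12.4, "OL": 10.5, "WR": 9.2, "RB": 7.8,
               "TE": 7.0, "DL": 4.5, "DB": 4.1, "LB": 3.7}


def xvo_factor(z, position):
    """50 + z * position multiplier, clamped to 1-99 (rounding is assumed)."""
    raw = 50 + z * MULTIPLIERS[position]
    return max(1, min(99, round(raw)))


# The same standing (z = 2.8) at two positions
print(xvo_factor(2.8, "QB"), xvo_factor(2.8, "LB"))
```

A z-score of 2.8 lands at 85 for a quarterback and 60 for a linebacker, which is the comparison made above.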
Limitations
| Constraint | Details |
|---|---|
| Coverage | 2020–2024 only (NFL participation data starts 2016, reliable from 2020) |
| Pre-2020 | XvO Factor column shows “–” for seasons without participation data |
| Defense | Defensive XF clusters tightly around 50 because individual defensive impact is hard to isolate from 11v11 data. The ELITE badge rarely appears for defenders — this is the model being honest, not a bug. |
| System effects | RAPM isolates individual contribution better than raw stats but cannot fully separate a player from their scheme, coaching, and teammates |
The ELITE Badge
The ELITE badge identifies players who were among the very best at their position in a given season. It draws from three independent sources, layered by era:
| Source | Era | Criteria |
|---|---|---|
| XvO Factor | 2020–2024 | Top 10% of position group AND XvO Factor ≥ 65 |
| AP All-Pro | 1970–2025 | 1st or 2nd Team All-Pro (Associated Press) |
| Major Awards | 1970–2025 | MVP, OPOY, DPOY, Offensive or Defensive Rookie of the Year |
A player only needs to qualify through one of these paths to earn the badge. For 2020+ seasons, all three sources overlap and reinforce each other. For earlier eras, the badge relies on All-Pro selections and major awards.
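The three paths combine with a simple "any of" rule. A sketch under stated assumptions: the function and parameter names are hypothetical, `top_fraction` is the player's rank within the position group expressed as a fraction (0.10 = top 10%), and the award codes OROY and DROY stand for Offensive and Defensive Rookie of the Year.

```python
MAJOR_AWARDS = {"MVP", "OPOY", "DPOY", "OROY", "DROY"}


def earns_elite_badge(season, xvo_factor=None, top_fraction=None,
                      all_pro_team=None, awards=()):
    """Return True if any one of the three sources qualifies the player-season."""
    via_xvo = (
        2020 <= season <= 2024
        and xvo_factor is not None
        and top_fraction is not None
        and top_fraction <= 0.10
        and xvo_factor >= 65
    )
    in_award_era = 1970 <= season <= 2025
    via_all_pro = in_award_era and all_pro_team in (1, 2)
    via_award = in_award_era and bool(MAJOR_AWARDS & set(awards))
    return via_xvo or via_all_pro or via_award


print(earns_elite_badge(2022, xvo_factor=70, top_fraction=0.05))
print(earns_elite_badge(1985, all_pro_team=1))
```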
Data & Limitations
All stats come from nflverse (nflfastR/nflreadpy), licensed under CC-BY 4.0. The dataset covers 1970–2025.
| Constraint | Details |
|---|---|
| Minimum games | Players need 8+ games to establish reliable baselines |
| Volume thresholds | QB: 150 attempts, RB: 50 carries, WR/TE: 30/15 targets (falls back to 20/10 receptions pre-1999), K: 10 FG attempts |
| OL & Punters | No individual stats available — always rated 50 |
| Pre-1999 seasons | Advanced metrics (EPA, CPOE, YAC, target share) don't exist — proxy stats derived from counting data are used instead (see "Pre-1999 Proxy" column above) |
| Defense pre-1999 | No defensive stats available before 1999 — all defensive skills default to 50 |
| Game count | 14 games/team through 1977, 16 games/team for 1978–2020 (strike-shortened 1982 and 1987 aside), 17 games/team from 2021 onward |
Team Ratings
Team ratings combine individual player evaluations with actual game results to produce four distinct measures per team-season.
Talent Rating (1–99)
A weighted composite of the 8 unit ratings (Pass Game, OL Pass, Run Game, OL Run, Pass Rush, Run Defense, Secondary, Special Teams). The raw weighted average is z-scored and mapped to 1–99. Weights shift across four strategic eras to reflect how the game was actually played:
| Unit | Smashmouth (70–78) | Dynasty D (79–89) | West Coast (90–04) | Modern Pass (05–25) |
|---|---|---|---|---|
| Pass Game | 10% | 14% | 18% | 24% |
| OL Pass | 10% | 10% | 14% | 16% |
| Run Game | 18% | 14% | 12% | 8% |
| OL Run | 12% | 12% | 10% | 8% |
| Pass Rush | 12% | 14% | 14% | 18% |
| Run Defense | 16% | 14% | 12% | 6% |
| Secondary | 12% | 14% | 14% | 16% |
| Special Teams | 10% | 8% | 6% | 4% |
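The era lookup and weighted composite can be sketched as below, using the weights from the table. Only the raw weighted average is shown; the later z-scoring and mapping to 1–99 are not. The unit keys and function name are hypothetical.

```python
UNITS = ["pass_game", "ol_pass", "run_game", "ol_run",
         "pass_rush", "run_defense", "secondary", "special_teams"]

# (first season, last season, weights in percent, in UNITS order)
ERAS = [
    (1970, 1978, [10, 10, 18, 12, 12, 16, 12, 10]),  # Smashmouth
    (1979, 1989, [14, 10, 14, 12, 14, 14, 14, 8]),   # Dynasty D
    (1990, 2004, [18, 14, 12, 10, 14, 12, 14, 6]),   # West Coast
    (2005, 2025, [24, 16, 8, 8, 18, 6, 16, 4]),      # Modern Pass
]


def raw_talent(season, unit_ratings):
    """Weighted average of the 8 unit ratings using the season's era weights."""
    for first, last, weights in ERAS:
        if first <= season <= last:
            return sum(unit_ratings[u] * w for u, w in zip(UNITS, weights)) / 100
    raise ValueError(f"no era defined for season {season}")


# Hypothetical team: elite passing game, average everywhere else
team = {u: 50 for u in UNITS}
team["pass_game"] = 90
print(raw_talent(1975, team), raw_talent(2020, team))
```

The same roster scores higher in the Modern Pass era than in the Smashmouth era because the passing game carries more weight.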
Results Rating (1–99)
Derived from actual game outcomes, z-scored within each season. For 1999–2025, game data comes from nflverse. For 1978–1998, historical schedule data (scores and playoff results) is sourced from the devstopfix/nfl_results dataset (Public Domain). Pre-1978 seasons have no schedule data available.
Where play-by-play data is not available (1978–1998), the EPA per play component naturally drops out (all teams z-score to 0), so Results depends on point differential, win percentage, and playoff depth.
| Input | Weight | Description |
|---|---|---|
| Point differential / game | 30% | Best single predictor of team quality |
| Win percentage | 20% | Regular season only; ties count as 0.5 wins |
| EPA per play | 25% | Combined offensive + defensive expected points added, per play |
| Playoff multiplier | 25% | Bonus or penalty based on postseason depth (see below) |
Playoff multiplier values: Missed playoffs (−0.5), Wild Card loss (0), Divisional loss (+0.3), Conf. Championship loss (+0.6), Super Bowl loss (+0.4), Super Bowl win (+2.5). The large gap between winning and losing the Super Bowl ensures that regular season dominance alone cannot produce the highest ratings.
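One plausible way to combine these inputs is a weighted sum of the three z-scores plus the weighted playoff value. The text does not spell out how the playoff value enters the composite or how the result is mapped to 1–99, so the combination below is an assumption, as are the names and dictionary keys.

```python
# Playoff values quoted in the text
PLAYOFF_VALUE = {"missed": -0.5, "wc_loss": 0.0, "div_loss": 0.3,
                 "conf_loss": 0.6, "sb_loss": 0.4, "sb_win": 2.5}


def results_composite(z_point_diff, z_win_pct, z_epa_per_play, playoff_result):
    """Assumed combination: weighted z-scores plus the weighted playoff value."""
    return (0.30 * z_point_diff
            + 0.20 * z_win_pct
            + 0.25 * z_epa_per_play
            + 0.25 * PLAYOFF_VALUE[playoff_result])


# Pre-1999 season: the EPA z-score is 0 for every team, so the term drops out
print(results_composite(1.5, 1.2, 0.0, "sb_win"))
```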
Clutch Factor
Clutch = Results − Talent. Measures whether game outcomes exceeded roster quality. Displayed as a signed number with a label:
| Range | Label |
|---|---|
| +5 or higher | Overachiever |
| −4 to +4 | As Expected |
| −5 to −14 | Underachiever |
| −15 or lower | Flameout |
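The label lookup is a straightforward threshold check. This sketch rounds the difference to a whole number first, since the table uses integer bounds; that rounding and the function name are assumptions.

```python
def clutch_label(results, talent):
    """Return the signed Clutch value and its display label."""
    clutch = round(results - talent)
    if clutch >= 5:
        label = "Overachiever"
    elif clutch >= -4:
        label = "As Expected"
    elif clutch >= -14:
        label = "Underachiever"
    else:
        label = "Flameout"
    return clutch, label


print(clutch_label(80, 70))
print(clutch_label(40, 60))
```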
Playoff Fate
A separate measure of whether a team's postseason matched the expectations set by their regular season. A 14-2 team losing in the conference championship is a different kind of disappointment than a 9-7 team losing in the same round — Playoff Fate captures that distinction.
Teams are classified into a record tier based on their win-percentage rank within their season, then matched against their playoff result to produce a label. Because teams are ranked relative to their peers each year rather than by fixed win totals, the tiers adapt naturally to different schedule lengths and competitive landscapes: a 12-2 record in a 14-game era is treated as elite when it leads the league.
| Record Tier | Missed | WC Loss | DIV Loss | CON Loss | SB Loss | SB Win |
|---|---|---|---|---|---|---|
| Elite (top 12.5%) | Collapsed | Collapsed | Collapsed | Stunned | Heartbreak | Dominant |
| Strong (top 25%) | Collapsed | Upset | Upset | Contender | Heartbreak | Dark Horse |
| Good (top 50%) | Snubbed | Expected | Solid Run | Overachiever | Heartbreak | Cinderella |
| Average (top 81%) | Expected | Expected | Surprise | Surprise | Stunned | Cinderella |
| Poor (bottom 19%) | Expected | Surprise / Cinderella | Surprise / Cinderella | Surprise / Cinderella | Surprise / Cinderella | Surprise / Cinderella |
Overall Rating (1–99)
Overall = (Talent × 0.45) + (Results × 0.50) + clamp(Clutch × 0.10, −5, +5)
The Clutch modifier is clamped at ±5 points so extreme over- or underperformance nudges the composite without dominating it.
Pre-1999 adjustments: For 1978–1998 seasons, Talent is measured from fewer differentiating units (4 of 8), making it less reliable. The Overall formula shifts weight toward Results: (Talent × 0.30) + (Results × 0.65) + the Clutch modifier. This ensures that great teams with unmeasured defensive talent (like the 1985 Bears) are properly rated through their dominant game results.
Pre-1978 (no schedule data): Overall regresses Talent toward league average: Talent × 0.70 + 50 × 0.30. This caps talent-only teams around 79, reflecting the inherent uncertainty of roster ratings without any game outcomes to validate them.
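The three era branches can be put side by side in one function. This is a sketch: it assumes the pre-1999 formula uses the same clamped Clutch modifier as the modern one, and that a missing Results rating is passed as `None`. The function names are hypothetical.

```python
def clamp(value, low, high):
    """Limit value to the range [low, high]."""
    return max(low, min(high, value))


def team_overall(season, talent, results=None):
    """Overall team rating using the era-specific blend described above."""
    if season < 1978 or results is None:
        # No schedule data: regress Talent toward the league average of 50
        return talent * 0.70 + 50 * 0.30
    clutch_modifier = clamp((results - talent) * 0.10, -5, 5)
    if season <= 1998:
        # Talent is less reliable here, so Results carries more weight
        return talent * 0.30 + results * 0.65 + clutch_modifier
    return talent * 0.45 + results * 0.50 + clutch_modifier


print(team_overall(2020, 80, 90))
print(team_overall(1985, 70, 99))
print(team_overall(1975, 90))
```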
Draft Value Analysis
The Draft Value page answers a specific question: given a draft position and a player position, what does historical data say about the value of that pick? It analyzes every draft pick in rounds 1–3 from 2000–2023 (~2,300 players) across 11 positional groups.
First-Contract Production
For each drafted player, we sum their per-season XvO player ratings (1–99) over the rookie contract window:
| Draft Round | Contract Window | Rationale |
|---|---|---|
| Round 1 | 5 years | Includes 5th-year option |
| Rounds 2–3 | 4 years | Standard rookie deal |
An elite pick might total 350+ over the contract window; a bust might total under 150. Players with fewer seasons get their actual total — no imputation. Busts, injuries, and early exits are real outcomes reflected in the data.
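The production total is a capped sum over the contract window. In this sketch the per-season ratings are assumed to arrive as a list in chronological order starting with the rookie year; the names and sample ratings are hypothetical.

```python
def contract_window(draft_round):
    """Round 1 picks get 5 years (5th-year option); rounds 2-3 get 4."""
    return 5 if draft_round == 1 else 4


def first_contract_production(draft_round, season_ratings):
    """Sum ratings over the rookie window; seasons never played add nothing."""
    window = contract_window(draft_round)
    return sum(season_ratings[:window])


print(first_contract_production(1, [70, 72, 75, 74, 71, 80]))  # sixth season ignored
print(first_contract_production(2, [60, 62]))                  # early exit, no imputation
```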
Contract Cost
Two eras require different cost models, both normalized to a 0–100 scale:
| Era | Cost Source | Details |
|---|---|---|
| CBA Era (2011–2024) | Rookie wage scale | Approximate guaranteed money by pick number, from public NFL data |
| Pre-CBA (2000–2010) | Jimmy Johnson chart | Traditional draft value chart as cost proxy; actual contracts were negotiated and wildly variable |
Surplus Value & Grading
Surplus = Production (normalized 0–100) − Cost (normalized 0–100)
Positive surplus means the player outproduced their draft slot cost; negative means they underperformed relative to the investment. The surplus is mapped to a letter grade:
| Grade | Surplus Range |
|---|---|
| A+ to A− | +35 and above |
| B+ to B− | +5 to +34 |
| C+ to C− | −25 to +4 |
| D+ to D− | −55 to −26 |
| F | Below −55 |
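A sketch of the surplus calculation and the letter band lookup. The cut points for plus and minus within each letter are not given, so only the letter band is returned. The function name is hypothetical.

```python
def surplus_grade(production_norm, cost_norm):
    """Return (surplus, letter band) from normalized production and cost."""
    surplus = production_norm - cost_norm
    if surplus >= 35:
        band = "A"
    elif surplus >= 5:
        band = "B"
    elif surplus >= -25:
        band = "C"
    elif surplus >= -55:
        band = "D"
    else:
        band = "F"
    return surplus, band


print(surplus_grade(80, 40))  # outproduced the slot cost
print(surplus_grade(10, 90))  # expensive pick, little production
```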
Outcome Classification
Each pick is classified into one of three outcomes based on first-contract production:
| Outcome | Criteria |
|---|---|
| Elite | Production total ≥ 320 OR any All-Pro selection during contract |
| Starter | Production total 150–319 |
| Bust | Production total below 150 |
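The classification reduces to two thresholds plus the All-Pro override. A minimal sketch with a hypothetical function name:

```python
def classify_outcome(production_total, all_pro_during_contract=False):
    """Classify a pick from its first-contract production total."""
    if production_total >= 320 or all_pro_during_contract:
        return "Elite"
    if production_total >= 150:
        return "Starter"
    return "Bust"


print(classify_outcome(362))
print(classify_outcome(280, all_pro_during_contract=True))
print(classify_outcome(122))
```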
Elite Impact Multiplier
Measures how much having a top-rated player at a position correlates with team win improvement. For each position, we compare the average win-percentage residual (actual wins minus expected wins from talent rating) for teams with an elite player at that position versus teams without. Granular positions are resolved from the database using depth chart data (e.g., DE vs DT, T vs G) rather than the generic DL/OL categories in roster files, giving each positional group its own distinct signal.
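The comparison described above amounts to a difference of two group means. In this sketch each team-season is a pair of (has an elite player at the position, win-percentage residual); the function name and sample values are hypothetical, and the function returns `None` when one of the groups is empty and nothing can be measured.

```python
from statistics import mean


def elite_impact(team_seasons):
    """Mean residual for teams with an elite player minus mean for teams without."""
    with_elite = [resid for has_elite, resid in team_seasons if has_elite]
    without = [resid for has_elite, resid in team_seasons if not has_elite]
    if not with_elite or not without:
        return None  # one group is empty, so no comparison is possible
    return mean(with_elite) - mean(without)


sample = [(True, 0.05), (True, 0.03), (False, -0.01), (False, 0.01)]
print(elite_impact(sample))
```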
Positional Groups
11 groups, granular where sample size supports it, merged where it doesn't:
| Group | Positions Merged |
|---|---|
| QB, RB, WR, TE | Standalone |
| T | T, OT |
| G/C | G, C, OL, OG |
| EDGE | DE, OLB, EDGE |
| IDL | DT, NT, DL |
| LB | LB, ILB, MLB |
| CB | CB |
| S | S, SS, FS, DB, SAF |
Draft Tiers
Six tiers aligned to CBA-era rookie wage scale cost cliffs:
| Tier | Picks | CBA-Era Guaranteed $ |
|---|---|---|
| Elite | 1–3 | ~$35–40M |
| Premium | 4–10 | ~$20–35M |
| Mid-First | 11–20 | ~$12–20M |
| Late First | 21–32 | ~$8–12M |
| Early Second | 33–48 | ~$5–8M |
| Mid-to-Late 2nd/3rd | 49–100 | ~$2–5M |
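Mapping a pick number to its tier is a lookup against the upper bound of each range in the table. The function name is hypothetical.

```python
# (last pick in tier, tier name), in draft order
DRAFT_TIERS = [
    (3, "Elite"),
    (10, "Premium"),
    (20, "Mid-First"),
    (32, "Late First"),
    (48, "Early Second"),
    (100, "Mid-to-Late 2nd/3rd"),
]


def draft_tier(pick):
    """Return the tier name for an overall pick number between 1 and 100."""
    if pick < 1:
        raise ValueError("pick numbers start at 1")
    for last_pick, name in DRAFT_TIERS:
        if pick <= last_pick:
            return name
    raise ValueError("picks after 100 are outside the analyzed range")


print(draft_tier(1), draft_tier(15), draft_tier(64))
```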
Data Scope & Limitations
| Constraint | Details |
|---|---|
| Coverage | Rounds 1–3, picks 1–100, draft years 2000–2023 (~2,300 players). 2024–2025 classes excluded because picks with fewer than two seasons of data produce misleadingly low surplus values. |
| Match rate | 99.2% of draft picks successfully matched to XvO player IDs |
| Incomplete contracts | Recent picks (2020–2023) have incomplete contract windows; production reflects only seasons played so far. Incomplete picks are included in aggregate statistics but excluded from best/worst pick lists. |
| OL ratings | Offensive linemen (T, G/C) are rated ~50 across all skills due to no individual stats — their draft value analysis relies primarily on surplus calculations. Tackle (T) elite impact cannot be measured; G/C barely registers (+0.7%). |
| Rounds 4–7 | Excluded — too noisy and near-minimum contracts make surplus analysis less meaningful |