    Defense in Scoresheet
 Greg Hardy -- The Annie Savoy Memorial
    American League
 
 Updated 17 Feb 99  
 Baseball HQ's recent experiment in Scoresheet
    strategies got me thinking, once again, about the most important factors in building a
    winning Scoresheet baseball team. While we can debate draft strategies at length, I
    decided to take a look at stats after the fact. I wanted to know which stats were
    most important in determining whether a Scoresheet team was successful or
    unsuccessful.   I looked at 20 league seasons of Scoresheet results, from
    1990 to 1998. I had on hand nine seasons in which I have participated, and I took the rest
    from leagues that posted their results on the Internet. Thanks to all the webguys who took
    the time to make their results available -- the NLTAL, NL114, NL No Glory, NL Over the
    Wire, NL Mad Chatters, NL113, AL Sloane, AIL 1998, AL NAIL 98, 1998 Big Hurt, and
    AL-Worrall. Many of the leagues were one-year leagues, but there were some perpetual ones
    as well. They ranged in size from 8 to 12 teams. All told, I looked at 8 NLs and 12 ALs.
      I went through the final standings and
    statistics for each season. I picked out the top three and bottom three teams in terms of
    won-loss records and looked over specific stats for each team in relation to the rest of
    the league.   The question I asked of each league is: How many
    of the three most successful teams finished in their league's top three in terms of
    statistic X, and how many of the losing teams finished in the bottom three of statistic X?
    The higher the correlation of each -- for example, if the top three teams had the three
    best ERAs in the league, and the bottom three teams had the three worst ERAs in the league
    -- the more likely it would be that the statistic in question was important to building a
    successful Scoresheet team.   Or at least that's what I'm assuming; I'm no
    statistician. Not that this fact will stop me from making some broad generalizations about
    the numbers that crop up. But I wanted to see how these factors compared to each other
    in a relative sense and draw conclusions from there.   This is not an effort to decide which specific
    players should be drafted. Rather, it should show which statistics are most decisive in
    building a successful team. Using this information, one can then develop draft lists
    focused on which players will contribute positively to the key statistics that result in
    winning teams. This is like using real data to make your draft selections!   In terms of methodology, this study yielded
    more than 60 "winning" teams and more than 60 "losing" teams -- there
    were some seasons in which two teams tied for the third-best or third-worst record. Some
    of the "winning" records were not all that great; 84 wins were sometimes enough
    to qualify a team as one of the top three in its league. Also, some of the
    "losing" teams were not so bad; two of the four "losing" teams from
    the 1998 ASMAL season actually went 82-80.   The most successful team in the study went 112-50
    in a regular 162-game season. The least successful team went through a ghastly 39-123
    campaign.   Finally, I dealt in absolutes. Teams were ranked
    against their peers, and it did not matter to me if only a single percentage point
    separated Team Y's on-base percentage from that of Team Z. The top three were the top
    three, no matter how slim the margin. The same went for the bottom three.   So what happened?   Well, Scoresheet says that Run Differential
    is usually a good indicator of a team's success, and there are many studies that support
    that theory in real-life baseball, so I figured out the run differentials for the 20
    seasons. How often did the three winningest (is that a word?) teams enjoy the league's top
    three run differentials? 75 percent of the time. And how often did the worst three teams
    suffer through the worst run differentials? 80 percent of the time. Run Differential
    successfully predicted the top and bottom three teams a combined 78 percent of the time.
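The top-three and bottom-three tally behind those percentages can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure actually used for the study: the eight-team league, its win totals, and its run differentials are invented, the function names are hypothetical, and ties for third place (which did occur in the study) are ignored. It also prints the share one would expect by chance, 3 divided by the league size.

```python
# Illustrative sketch only: the league below is invented, not taken from the study.

def best_n(stat, n=3, higher_is_better=True):
    """Names of the n teams with the best values for a statistic."""
    ranked = sorted(stat, key=stat.get, reverse=higher_is_better)
    return set(ranked[:n])


def worst_n(stat, n=3, higher_is_better=True):
    """Names of the n teams with the worst values for a statistic."""
    ranked = sorted(stat, key=stat.get, reverse=not higher_is_better)
    return set(ranked[:n])


def overlap_pct(teams_by_record, teams_by_stat):
    """Percent of the teams picked by record that were also picked by the stat."""
    return 100.0 * len(teams_by_record & teams_by_stat) / len(teams_by_record)


def chance_pct(league_size, n=3):
    """Percent expected if the stat had no relation to the standings."""
    return 100.0 * n / league_size


# Hypothetical eight-team league: wins and run differential per team.
wins = {"A": 100, "B": 95, "C": 90, "D": 85, "E": 80, "F": 75, "G": 70, "H": 53}
run_diff = {"A": 150, "B": 60, "C": 120, "D": 90, "E": -10, "F": -80, "G": -130, "H": -200}

best_match = overlap_pct(best_n(wins), best_n(run_diff))
worst_match = overlap_pct(worst_n(wins), worst_n(run_diff))

print(f"best three matched:  {best_match:.0f}%")
print(f"worst three matched: {worst_match:.0f}%")
print(f"chance level for 8 teams: {chance_pct(len(wins)):.1f}%")
```

For a statistic where a lower number is better, such as ERA or WHIP, pass `higher_is_better=False` so that the "best" group is the one with the smallest values.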
      Well, sure. We all know that to win, you
    have to outscore your opponent. But which individual Scoresheet stats most often
    determined the success or failure of a ballclub?   I broke the study down into three main
    areas: pitching, offense, and defense.   Under pitching, I grouped ERA, complete games
    (CG), shutouts (Sho), saves (Sv), and WHIP, or Walks plus Hits allowed per Inning Pitched.
    I was curious to see if these numbers would give us an idea as to whether starting
    pitching or relief pitching was more important. I don't think anything conclusive came
    out of this, but the info is there to absorb.   Under defense, I looked at Scoresheet's
    Outstanding Plays (OP), which is tied directly to range factors, and Errors (E).
    Scoresheet started including OP in 1992, so those numbers reflect results from 1992-1998.
      Under offense, I looked at Runs Scored (RS),
    on-base percentage (OBA), slugging percentage (Slg), OPS (OBA plus Slg), home runs (HR),
    and total stolen bases (SB). I realize that SB success rate may have been a better
    measure, but didn't want to take the time to hem and haw over that.   One more quick note on methodology: Since I
    was pulling three teams for each category from leagues ranging in size from 8 to 12 teams,
    I'm working under the assumption that results ranging from 25% (3 of 12) to 38% (3 of 8)
    indicate a correlation that is no better than random selection; that is, it's a stat that
    has no indicative value in terms of success or failure. I could be wrong on this, and if
    anyone can explain it to me without using the word "coefficient" more than once,
    please do. Regardless of whether this assumption is correct or not, we will still have a
    relative ranking of factors.   So here, in terms of percentage, is how
    often the best three teams posted the best stats in each category, and the worst three
    teams posted the worst stats in each category. I then combined the two numbers to come up
    with an overall percentage and ranked each stat according to the apparent influence it has
    on building a successful team, relative to the other stats in the study:  
          
    
      
        | Scoresheet Success Stats |  
        | Statistic Measured | Best 3 Pct | Worst 3 Pct | Combined Pct | Rank |  
        | Run Differential | 75 | 80 | 78 | 1 |  
        | Pitching |  
        | ERA | 60 | 68 | 64 | 3 |  
        | Complete Games | 44 | 50 | 47 | 11 |  
        | Shutouts | 55 | 52 | 53 | 9 |  
        | Saves | 56 | 57 | 56 | 8 |  
        | WHIP | 61 | 61 | 61 | 4T |  
        | Defense |  
        | Outstanding Plays | 38 | 47 | 43 | 12 |  
        | Errors | 33 | 43 | 38 | 14 |  
        | Offense |  
        | On-Base Pct | 57 | 63 | 60 | 6 |  
        | Slugging Pct | 54 | 62 | 58 | 7 |  
        | Home Runs | 48 | 54 | 51 | 10 |  
        | Total Stolen Bases | 38 | 43 | 41 | 13 |  
        | Runs Scored | 62 | 74 | 68 | 2 |  
        | OPS | 57 | 64 | 61 | 4T |  

    Let's put those stats in order of relative
    importance, or how they rank against each other: 
 
          
    
      
        | Scoresheet Success Stats |  
        | Statistic Measured | Best 3 Pct | Worst 3 Pct | Combined Pct | Rank |  
        | Run Differential | 75 | 80 | 78 | 1 |  
        | Runs Scored | 62 | 74 | 68 | 2 |  
        | ERA | 60 | 68 | 64 | 3 |  
        | OPS | 57 | 64 | 61 | 4T |  
        | WHIP | 61 | 61 | 61 | 4T |  
        | On-Base Pct | 57 | 63 | 60 | 6 |  
        | Slugging Pct | 54 | 62 | 58 | 7 |  
        | Saves | 56 | 57 | 56 | 8 |  
        | Shutouts | 55 | 52 | 53 | 9 |  
        | Home Runs | 48 | 54 | 51 | 10 |  
        | Complete Games | 44 | 50 | 47 | 11 |  
        | Outstanding Plays | 38 | 47 | 43 | 12 |  
        | Total Stolen Bases | 38 | 43 | 41 | 13 |  
        | Errors | 33 | 43 | 38 | 14 |  

    Conclusions

    Assuming my methodology is legitimate, one can
    draw some conclusions from this exercise. 
 
      
        Of the statistics measured, Run Differential is
        indeed the best predictor of a team's success or lack thereof. Not surprising. However,
        this is a stat that grows out of other stats, and it's not like you can draft a guy who
        provides "good run differential." 
        In my first survey of just 10 leagues, ERA was the
        most accurate determinant of a team's success or lack thereof in terms of individual
        stats. However, in this larger sample, Runs Scored is now ranked higher. Note that there
        is a solid balance of offensive and pitching stats at the top of the chart. This segues
        directly into the discussion on how much more difficult it is to predict how pitchers will
        perform than how hitters will perform. Of course, I picked Frank Thomas -- a consistently
        great hitter -- with the third overall pick in 1998 to form the cornerstone of my AL
        continuing league team, and he promptly went into the tank, relatively speaking. I'll stop
        whining now. 
        I thought Saves would rank much higher on the list,
        since more wins would mean more save opportunities and therefore more saves. But this stat
        finished in the middle of the pack. 
        The numbers for the worst three teams were higher
        than those of the best three teams in every category except Shutouts; in WHIP the two
        were tied. I guess this means
        that there was more competition and/or balance at the top of the standings than at the
        bottom, where the three worst teams were clearly inferior to their league counterparts,
        but I welcome any thoughts on the matter. 
        The two defensive statistics, Outstanding Plays
        (Range) and Errors, ranked 12th and 14th out of 14 stats measured. If the methodology is
        correct, then one could reach the conclusion that fielding a good or bad defensive team
        seems to have little effect on one's chances of putting together a winning season. As I
        understand it, Baseball HQ's 1998 Scoresheet Exhibition also underscored the apparent lack
        of effect of Scoresheet defensive range numbers. Scoresheet has responded by prohibiting
        some out-of-position moves and plans to make their range factors more drastic in both
        directions, giving good defensive players even greater range and poor players even less.
        It will be interesting to see if these moves have any effect. I beat the defensive numbers
        drum even more below. 
        Total Stolen Bases came in 13th on the list, just a
        couple of points above randomville. So, um, if defense and stolen bases are the least
        indicative of success, do Brian Hunter, Rey Ordonez, Deivi Cruz, and their ilk have any
        value at all in Scoresheet? I'll continue this discussion on defense below.

    In view of the fact that Run Differential is
    clearly the best indicator of success, I decided to take a look at which stats have the
    greatest effect on Runs Scored and Runs Allowed. I had some time on my hands one weekend
    -- a rarity -- so I went through the same 20 Scoresheet leagues and used the same
    methodology. Instead of picking out the three teams with the most wins and the three teams
    with the most losses in each league, I looked at the three teams that scored the most
    runs, the three teams that scored the fewest runs, the three teams that gave up the fewest
    runs, and the three teams that gave up the most runs.   In terms of Runs Scored, I checked out
    correlations with HR, BA, Slg, OBA, OPS, BB, strikeouts (K), Sac Hits, GIDP, SB, and LOB.
    I looked at the positive and negative sides of each stat. For example, a team that scored
    a lot of runs would be expected to be in the top three in HR, BA, Slg, OBA, OPS, BB, Sac
    Hits, and SB, and should be in the bottom three in terms of K, GIDP, and LOB. On the other
    hand, the three teams per league that scored the fewest runs should be expected to score
    in the bottom three in HR, BA, Slg, OBA, OPS, BB, Sac hits, and SB, but would be in the
    top three in terms of Ks, GIDP, and LOB.   I knew ahead of time that some of these
    stats were worthless indicators, but I ran all of them anyway to get some kind of a sanity
    check on the numbers that should mean something. Happily, many of my assumptions turned
    out to be correct.   Here's how the numbers crunched, in order of
    highest correlation with Runs Scored: 
 
          
    
      
        | Runs Scored |  
        | Statistic Measured | Best 3 Pct | Worst 3 Pct | Combined Pct | Rank |  
        | OPS | 75 | 79 | 77 | 1 |  
        | OBA | 69 | 76(1) | 73 | 2 |  
        | Slg | 68 | 70 | 69 | 3 |  
        | BA | 70 | 66 | 68 | 4 |  
        | HR | 54 | 61 | 57 | 5 |  
        | BB | 45 | 62(1) | 53 | 6 |  
        | Total SB | 35 | 48 | 41(3) | 7 |  
        | GIDP | 37 | 31 | 34(2) | 8 |  
        | Sacrifice Hits | 30 | 36 | 33(2,3) | 9 |  
        | Strikeouts | 32 | 27 | 29(2) | 10 |  
        | Left on Base | 15 | 15 | 15(4) | 11 |  
      
        (1) There seems to be a huge difference between the
        "best" and "worst" numbers for OBA and BB. The inability to get on
        base through hits or walks seems to be a real killer. 

        (2) GIDP, Ks, and Sacs seem to fall into the category
        of statistically insignificant. Not surprising for the first two, but since a sacrifice is
        designed specifically to produce a run, one would think that Sac Hits would score higher
        than 33%. See note 3 for further discussion on this. 

        (3) Curious about NL vs. AL statistical differences? I
        was, so I separated the 8 NLs from the 12 ALs and then took a look at Stolen Bases and Sac
        Hits. Here's what came up:

            | NL vs. AL |  
            | Statistic | NL Best 3 Pct | NL Worst 3 Pct | AL Best 3 Pct | AL Worst 3 Pct |  
            | Stolen Bases | 41 | 54 | 30 | 42 |  
            | Sacrifice Hits | 41 | 50 | 21 | 24 |  

        It seems that the ability -- or lack thereof -- to
        steal bases and lay down sacrifices has some effect on Runs Scored in the NL, or at least
        a much greater effect than in the AL. Note also that the AL sacrifice numbers fall below
        the random 25-38%. I take this as a counter-indicator; the use of the sacrifice in the AL
        may actually lead to fewer Runs Scored rather than to more. Food for thought, certainly.

        (4) Speaking of counter-indicators, the LOB numbers
        jump out too. For the sake of the study I assumed that high LOB numbers were a bad thing,
        a failure to score runs. In reality, however, higher-scoring teams seem to leave more men
        on base than their counterparts, a by-product of their higher OBA, I guess.

    But wait, there's more!

    The other half of getting a good Run Differential
    number is preventing runs from scoring. So I looked at the three teams from each league
    that gave up the fewest runs and the three teams that gave up the most runs. How many of
    the three most successful pitching staffs and defenses finished in their league's top
    three in terms of statistic X, and how many of the three least successful finished in the bottom three of
    statistic X? Which stats had the highest correlation with success or failure at preventing
    runs?   On the pitching side, I looked at ERA, WHIP, CG,
    Shutouts, Hits allowed, BA against, BB allowed, and Ks. Theoretically, teams that allow
    fewer runs should have lower ERAs, WHIPs, Hits Allowed, BA Against, and BB allowed, and
    high numbers of CG, Shutouts, and Ks. Teams that were in the bottom tier of runs allowed
    should have higher ERAs, WHIP, Hits Allowed, BA against, and BB allowed, and lower CGs,
    Shutouts, and Ks. You can see how often that happened below.   On the defensive side, I looked at Outstanding
    Plays, Errors Committed, and the Opponent Caught Stealing (OCS) numbers. I threw the
    latter in to get an idea of whether poor-hitting catchers with strong arms (Charles
    Johnson comes to mind) had any usefulness in Scoresheet. In real life a poor-hitting
    catcher can have other useful qualities -- calling a good game, handling the pitching
    staff, those intangible leadership merits -- but in Scoresheet, the numbers tell the whole
    story. Anyway, you can make your own judgment.   Here are the numbers in rank order: 
 
 
          
    
      
        | Runs Allowed |  
        | Statistic Measured | Best 3 Pct | Worst 3 Pct | Combined Pct | Rank |  
        | ERA | 90 | 92 | 91 | 1 |  
        | BA Against | 79 | 81 | 80 | 2 |  
        | Hits Allowed | 75 | 84 | 79 | 3 |  
        | WHIP | 80 | 76 | 78 | 4 |  
        | Shutouts | 58 | 58 | 58 | 5 |  
        | BB Allowed | 62 | 52 | 57 | 6 |  
        | Strikeouts | 58 | 50 | 54 | 7 |  
        | Complete Games | 48 | 56 | 52 | 8 |  
        | Outstanding Plays | 36 | 44 | 40 | 9 |  
        | Errors Committed | 33 | 39 | 36 | 10 |  
        | OCS | 27 | 20 | 23 | 11 |  

    A couple of points to make on this last chart.
    First, note the dramatic dropoff in correlation numbers after WHIP -- from 78 to 58%. All
    the highest-ranked numbers have to do with Hits Allowed, to a much greater extent than BB
    Allowed. So often we hear announcers chide a pitcher after a leadoff walk, and they point
    out that "a leadoff walk comes around to score X percent of the time" -- I can
    never remember the exact number. But I'm thinking -- and this study seems to support --
    that a leadoff single is just as bad, and a leadoff extra-base hit is even worse. I
    desperately avoid pitchers who walk more than an average number of batters, but I may have
    to reexamine that philosophy.   Finally, another note on Scoresheet defense. Once
    again, defensive statistics ended up at the bottom of the chart, waaay below even Ks and
    CG. So now we can see that range and errors seem to have little effect on building a
    winning Scoresheet team, and they seem to have little effect on preventing runs, at least
    in comparison to most other pitching/defensive statistics.   Is this a surprise? It probably shouldn't
    be. Scoresheet clearly states in its draft packet that "a difference of .10 in range
    is equal to .1 (a tenth) of a hit per 9 innings." Well, it takes a couple fairly
    rangy players to achieve that .10 range advantage, especially since most teams have some
    subpar range players at other positions.   Just for fun, I went through the 1999 AL
    Player List and picked out the rangiest guys at all eight field positions. If you somehow
    managed to draft all eight of these guys -- essentially including three CF types -- you
    would enjoy a range rating of +.56. If I'm doing the math right, that means this squad of
    defensive whizzes would save your pitching staff .56 hits per game, or about 91 hits per
    season (about two weeks of work for Tim Belcher). I assume that .56/game advantage would
    work out over the course of the season.   However, in my Scoresheet AL experience, a
    great fielding team may manage a range of +.25. That's still pretty high, but we can use
    it for the sake of comparison. This defense will save the pitching staff .25 hits per
    game, or about 41 hits per season. Is that a significant number? Is it worth carrying a
    couple of weak bats?   Well, teams usually give up between 1300 and
    1700 hits in a 162-game season, depending on how good their pitchers are. Saving 41 hits
    for a team of great pitchers -- which would give up 1300 hits in the season -- represents
    a 3.2% improvement. Saving 41 hits for a bad pitching staff -- the 1700-hit group --
    represents a 2.4% improvement. Saving 41 hits for an average staff -- 1500 hits allowed --
    is a 2.7% improvement. The improvement numbers for the top-ranked defensive team that
    saves 91 hits per season would be 7.0 (1300), 5.4 (1700), and 6.1 (1500) percent.   For the sake of comparison: A survey of the
    20 leagues I used to create this page indicates that the teams that gave up the fewest
    hits in each league beat their nearest competitor in Hits Allowed by an average of 50 hits
    per season. The average gap between the team that gave up the fewest hits and the team in
    the league that gave up the most hits was 267.   Let's look at a couple of positions. In a one-year
    league, if you have the choice of Mike Bordick (4.77) or Mike Caruso (4.72), who do you
    take? Caruso makes a lot of errors, but that doesn't seem to have a great effect on
    winning or losing or even preventing runs. Bordick's range advantage of .05 translates to
    him saving you 8 hits per year. Well, I'm pretty sure Caruso will make up those 8 hits
    with the stick, despite the fact that he starts hacking when he steps off the team bus.
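The range arithmetic in the last few paragraphs is easy to check with a short sketch. It assumes, as the draft packet quote implies, that each +1.00 of team range is worth one hit per nine-inning game over a 162-game season; the function names are made up for illustration and are not anything Scoresheet publishes.

```python
# Sketch under the article's assumption that +1.00 of team range is worth
# one hit per nine-inning game; function names are illustrative.

GAMES = 162  # games in a regular Scoresheet season, per the article


def hits_saved(range_edge, games=GAMES):
    """Hits saved over a season by a team range advantage."""
    return range_edge * games


def pct_improvement(saved, hits_allowed):
    """Hits saved as a percentage of a staff's season hits allowed."""
    return 100.0 * saved / hits_allowed


for label, edge in [
    ("all eight rangiest 1999 AL fielders", 0.56),
    ("great fielding team", 0.25),
    ("Bordick over Caruso", 0.05),
]:
    print(f"{label}: {hits_saved(edge):.1f} hits per season")

# The article rounds the first two results to 91 and 41 hits.
for saved in (41, 91):
    for staff_hits in (1300, 1500, 1700):
        pct = pct_improvement(saved, staff_hits)
        print(f"{saved} hits saved vs {staff_hits} allowed: {pct:.1f}%")
```

The loop prints one decimal place instead of rounding to whole hits, so the small gap between the raw products and the rounded figures quoted in the text stays visible.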
      A more extreme example would be choosing
    between Jose Canseco (2.01) and Brian Hunter (2.21) to play a corner OF position. Hunter
    will theoretically get to 32 more balls than Canseco in the OF, but then you have to watch
    him bat. This is about the most extreme example I could think of without delving into
    playing guys out of position, and it boils down to 32 hits over the course of the season.
    Since Canseco will probably produce 75 more RUNS than Hunter over the same season, I think
    you gotta let the big guy wear a glove. And maybe a hard hat.   Oh, there is a more extreme example.
    Scoresheet points out that one position at which range makes a great difference is CF,
    which is 1.6 times more important than the other OF spots. I wonder if that means that
    playing Canseco in CF instead of Hunter would mean an extra 51 (32x1.6) hits falling in?
    That seems like a lot, despite the 75 extra runs produced. Plus your pitchers might get
    irritated.   Does defense matter? I think it can
    marginally improve your pitching. A great fielding team may even be able to cancel out the
    offensive "All-Star factor" that balloons real-life ERAs up by .25 runs or so.
    But even a simply good team defensive range -- +.25 -- is only going to reduce your hits
    allowed by 2 to 3 percent, and the correlations between Range and Wins/Losses and Range
    and Preventing Runs are quite low. Is it worth the accompanying loss of offense that these
    trade-offs often require?   In the past I've always drafted like defense
    did matter, but those differences of .03 or .04 that I've worried about boil
    down to 5 or 6 hits prevented per season. It's not like Scoresheet has concealed this
    fact; it's all right there in the packet. In the future, I'll draft the better hitter and
    hope to lead the league in Runs Scored.   
 If for some reason you would like to comment on
    this study or the methodology, please send me a note at this address. I hope you
    found it at least somewhat interesting.