Scoresheet Notes -  Defense

 

    
  Defense in Scoresheet
  Greg Hardy -- The Annie Savoy Memorial American League


Updated 17 Feb 99


Baseball HQ's recent experiment in Scoresheet strategies got me thinking, once again, about the most important factors in building a winning Scoresheet baseball team. While we can debate draft strategies at length, I decided to take a look at stats after the fact. I wanted to know which stats were most important in determining whether a Scoresheet team was successful or unsuccessful. 

I looked at 20 seasons of Scoresheet results, from 1990 to 1998. I had on hand nine seasons in which I have participated, and I took the rest from leagues that posted their results on the Internet. Thanks to all the webguys who took the time to make their results available -- the NLTAL, NL114, NL No Glory, NL Over the Wire, NL Mad Chatters, NL113, AL Sloane, AIL 1998, AL NAIL 98, 1998 Big Hurt, and AL-Worrall. Many of the leagues were one-year leagues, but there were some perpetual ones as well. They ranged in size from 8 to 12 teams. All told, I looked at 8 NLs and 12 ALs.

 I went through the final standings and statistics for each season. I picked out the top three and bottom three teams in terms of won-loss records and looked over specific stats for each team in relation to the rest of the league. 

The question I asked of each league is: How many of the three most successful teams finished in their league's top three in terms of statistic X, and how many of the losing teams finished in the bottom three of statistic X? The higher the correlation of each -- for example, if the top three teams had the three best ERAs in the league, and the bottom three teams had the three worst ERAs in the league -- the more likely it would be that the statistic in question was important to building a successful Scoresheet team.

 Or at least that's what I'm assuming; I'm no statistician. Not that this fact will stop me from making some broad generalizations about the numbers that crop up. But I wanted to see how these factors compared to each other in a relative sense and draw conclusions from there. 

This is not an effort to decide which specific players should be drafted. Rather, it should show which statistics are most decisive in building a successful team. Using this information, one can then develop draft lists focused on which players will contribute positively to the key statistics that result in winning teams. This is like using real data to make your draft selections!

 In terms of methodology, this study yielded more than 60 "winning" teams and more than 60 "losing" teams -- there were some seasons in which two teams tied for the third-best or third-worst record. Some of the "winning" records were not all that great; 84 wins were sometimes enough to qualify a team as one of the top three in its league. Also, some of the "losing" teams were not so bad; two of the four "losing" teams from the 1998 ASMAL season actually went 82-80. 

The most successful team in the study went 112-50 in a regular 162-game season. The least successful team went through a ghastly 39-123 campaign. 

Finally, I dealt in absolutes. Teams were ranked against their peers, and it did not matter to me if only a single percentage point separated Team Y's on-base percentage from that of Team Z. The top three were the top three, no matter how slim the margin. The same went for the bottom three.
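For anyone who wants to repeat the count, here is a minimal Python sketch of the comparison described above. The league, team names, and numbers are made up for illustration, and the sketch breaks ties by sort order rather than adding a fourth team, as the study did when records were tied.

```python
from typing import Dict, Set


def top_n(values: Dict[str, float], n: int = 3, highest: bool = True) -> Set[str]:
    """Return the n teams with the highest (or lowest) value for one stat."""
    ranked = sorted(values, key=lambda team: values[team], reverse=highest)
    return set(ranked[:n])


def overlap_pct(record_group: Set[str], stat_group: Set[str]) -> float:
    """Percentage of the record group that also appears in the stat group."""
    return 100.0 * len(record_group & stat_group) / len(record_group)


# Hypothetical 8-team league: wins and team ERA (invented numbers).
wins = {"A": 98, "B": 93, "C": 90, "D": 84, "E": 80, "F": 75, "G": 70, "H": 58}
era = {"A": 3.60, "B": 4.10, "C": 3.85, "D": 3.70,
       "E": 4.30, "F": 4.60, "G": 4.25, "H": 5.10}

best_teams = top_n(wins, highest=True)    # three best records
worst_teams = top_n(wins, highest=False)  # three worst records
best_era = top_n(era, highest=False)      # a low ERA is good
worst_era = top_n(era, highest=True)      # a high ERA is bad

best_pct = overlap_pct(best_teams, best_era)
worst_pct = overlap_pct(worst_teams, worst_era)
print(f"Best 3 Pct: {best_pct:.0f}  Worst 3 Pct: {worst_pct:.0f}")
```

In this invented league two of the three best teams also have a top-three ERA, and two of the three worst teams have a bottom-three ERA. The study's percentages are the same kind of count, pooled over all 20 seasons.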

 So what happened?

 Well, Scoresheet says that Run Differential is usually a good indicator of a team's success, and there are many studies that support that theory in real-life baseball, so I figured out the run differentials for the 20 seasons. How often did the three winningest (is that a word?) teams enjoy the league's top three run differentials? 75 percent of the time. And how often did the worst three teams suffer through the worst run differentials? 80 percent of the time. Run Differential successfully predicted the top and bottom three teams a combined 78 percent of the time.

 Well, sure. We all know that to win, you have to outscore your opponent. But which individual Scoresheet stats most often determined the success or failure of a ballclub?

 I broke the study down into three main areas: pitching, offense, and defense. 

Under pitching, I grouped ERA, complete games (CG), shutouts (Sho), saves (Sv), and WHIP, or Walks plus Hits allowed per Inning Pitched. I was curious to see if these numbers would give us an idea as to whether starting pitching or relief pitching was more important. I don't think anything conclusive came out of this, but the info is there to absorb.

 Under defense, I looked at Scoresheet's Outstanding Plays (OP), which is tied directly to range factors, and Errors (E). Scoresheet started including OP in 1992, so those numbers reflect results from 1992-1998.

 Under offense, I looked at Runs Scored (RS), on-base percentage (OBA), slugging percentage (Slg), OPS (OBA plus Slg), home runs (HR), and total stolen bases (SB). I realize that SB success rate may have been a better measure, but didn't want to take the time to hem and haw over that.

 One more quick note on methodology: Since I was pulling three teams for each category from leagues ranging in size from 8 to 12 teams, I'm working under the assumption that results ranging from 25% (3 of 12) to 38% (3 of 8) indicate a correlation that is no better than random selection; that is, it's a stat that has no indicative value in terms of success or failure. I could be wrong on this, and if anyone can explain it to me without using the word "coefficient" more than once, please do. Regardless of whether this assumption is correct or not, we will still have a relative ranking of factors.
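The 25%-to-38% baseline can be checked by brute force. The sketch below is only an illustration, and it assumes that a stat's ranking is unrelated to the standings and that there are no ties. It lists every possible set of three teams that could lead a stat and counts how many of the three best teams land in it on average. The function name is made up for this example.

```python
from fractions import Fraction
from itertools import combinations


def random_baseline(n_teams: int, k: int = 3) -> Fraction:
    """Expected share of the k best teams that fall in a stat's top k by chance."""
    best_teams = set(range(k))  # label the k best teams 0..k-1
    total_hits = 0
    n_draws = 0
    # Every k-team subset is equally likely to be the stat's top k.
    for stat_top in combinations(range(n_teams), k):
        total_hits += len(best_teams & set(stat_top))
        n_draws += 1
    return Fraction(total_hits, n_draws * k)


for n in (8, 10, 12):
    share = random_baseline(n)
    print(f"{n} teams: {share} = {float(share):.1%}")
```

Under those assumptions the chance share works out to exactly 3/n: 3/8 (37.5%) for an 8-team league and 1/4 (25%) for a 12-team league, which matches the range used in the text.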

 So here, in terms of percentage, is how often the best three teams posted the best stats in each category, and the worst three teams posted the worst stats in each category. I then combined the two numbers to come up with an overall percentage and ranked each stat according to the apparent influence it has on building a successful team, relative to the other stats in the study:

Scoresheet Success Stats

Statistic Measured      Best 3 Pct   Worst 3 Pct   Combined Pct   Rank

Run Differential            75           80             78         1

Pitching
  ERA                       60           68             64         3
  Complete Games            44           50             47         11
  Shutouts                  55           52             53         9
  Saves                     56           57             56         8
  WHIP                      61           61             61         4T

Defense
  Outstanding Plays         38           47             43         12
  Errors                    33           43             38         14

Offense
  On-Base Pct               57           63             60         6
  Slugging Pct              54           62             58         7
  Home Runs                 48           54             51         10
  Total Stolen Bases        38           43             41         13
  Runs Scored               62           74             68         2
  OPS                       57           64             61         4T

Let's put those stats in order of relative importance, or how they rank against each other:
 

Scoresheet Success Stats

Statistic Measured      Best 3 Pct   Worst 3 Pct   Combined Pct   Rank

Run Differential            75           80             78         1
Runs Scored                 62           74             68         2
ERA                         60           68             64         3
OPS                         57           64             61         4T
WHIP                        61           61             61         4T
On-Base Pct                 57           63             60         6
Slugging Pct                54           62             58         7
Saves                       56           57             56         8
Shutouts                    55           52             53         9
Home Runs                   48           54             51         10
Complete Games              44           50             47         11
Outstanding Plays           38           47             43         12
Total Stolen Bases          38           43             41         13
Errors                      33           43             38         14
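The Rank column can be reproduced from the Combined Pct column alone. The short Python sketch below is an illustration, not the method actually used to build the chart. It takes the published combined percentages as input, gives tied stats the same rank with a "T" suffix, and skips the next rank after a tie.

```python
from collections import Counter
from typing import Dict

# Combined Pct values copied from the chart above.
combined = {
    "Run Differential": 78, "Runs Scored": 68, "ERA": 64, "OPS": 61,
    "WHIP": 61, "On-Base Pct": 60, "Slugging Pct": 58, "Saves": 56,
    "Shutouts": 53, "Home Runs": 51, "Complete Games": 47,
    "Outstanding Plays": 43, "Total Stolen Bases": 41, "Errors": 38,
}


def rank_with_ties(scores: Dict[str, int]) -> Dict[str, str]:
    """Rank from highest score down; tied scores share a rank marked 'T'."""
    counts = Counter(scores.values())
    ranks = {}
    for name, score in scores.items():
        # A stat's rank is one more than the number of stats that beat it.
        place = 1 + sum(1 for other in scores.values() if other > score)
        ranks[name] = f"{place}T" if counts[score] > 1 else str(place)
    return ranks


for name, rank in rank_with_ties(combined).items():
    print(f"{rank:>3}  {name}")
```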

Conclusions

Assuming my methodology is legitimate, one can draw some conclusions from this exercise.
 

  • Of the statistics measured, Run Differential is indeed the best predictor of a team's success or lack thereof. Not surprising. However, this is a stat that grows out of other stats, and it's not like you can draft a guy who provides "good run differential." 

  • In my first survey of just 10 leagues, ERA was the most accurate determinant of a team's success or lack thereof in terms of individual stats. However, in this larger sample, Runs Scored is now ranked higher. Note that there is a solid balance of offensive and pitching stats at the top of the chart. This segues directly into the discussion on how much more difficult it is to predict how pitchers will perform than how hitters will perform. Of course, I picked Frank Thomas -- a consistently great hitter -- with the third overall pick in 1998 to form the cornerstone of my AL continuing league team, and he promptly went into the tank, relatively speaking. I'll stop whining now. 

  • I thought Saves would rank much higher on the list, since more wins would mean more save opportunities and therefore more saves. But this stat finished in the middle of the pack. 

  • The numbers for the worst three teams were higher than those of the best three teams in every category except Shutouts and WHIP (where the two numbers were equal). I guess this means that there was more competition and/or balance at the top of the standings than at the bottom, where the three worst teams were clearly inferior to their league counterparts, but I welcome any thoughts on the matter. 

  • The two defensive statistics, Outstanding Plays (Range) and Errors, ranked 12th and 14th out of 14 stats measured. If the methodology is correct, then one could reach the conclusion that fielding a good or bad defensive team seems to have little effect on one's chances of putting together a winning season. As I understand it, Baseball HQ's 1998 Scoresheet Exhibition also underscored the apparent lack of effect of Scoresheet defensive range numbers. Scoresheet has responded by prohibiting some out-of-position moves and plans to make their range factors more drastic in both directions, giving good defensive players even greater range and poor players even less. It will be interesting to see if these moves have any effect. I beat the defensive numbers drum even more below. 

  • Total Stolen Bases came in 13th on the list, just a couple of points above randomville. So, um, if defense and stolen bases are the least indicative of success, do Brian Hunter, Rey Ordonez, Deivi Cruz, and their ilk have any value at all in Scoresheet? I'll continue this discussion on defense below. 

In view of the fact that Run Differential is clearly the best indicator of success, I decided to take a look at which stats have the greatest effect on Runs Scored and Runs Allowed. I had some time on my hands one weekend -- a rarity -- so I went through the same 20 Scoresheet leagues and used the same methodology. Instead of picking out the three teams with the most wins and the three teams with the most losses in each league, I looked at the three teams that scored the most runs, the three teams that scored the fewest runs, the three teams that gave up the fewest runs, and the three teams that gave up the most runs.

 In terms of Runs Scored, I checked out correlations with HR, BA, Slg, OBA, OPS, BB, strikeouts (K), Sac Hits, GIDP, SB, and LOB. I looked at the positive and negative sides of each stat. For example, a team that scored a lot of runs would be expected to be in the top three in HR, BA, Slg, OBA, OPS, BB, Sac Hits, and SB, and should be in the bottom three in terms of K, GIDP, and LOB. On the other hand, the three teams per league that scored the fewest runs should be expected to score in the bottom three in HR, BA, Slg, OBA, OPS, BB, Sac hits, and SB, but would be in the top three in terms of Ks, GIDP, and LOB.

 I knew ahead of time that some of these stats were worthless indicators, but I ran all of them anyway to get some kind of a sanity check on the numbers that should mean something. Happily, many of my assumptions turned out to be correct.

 Here's how the numbers crunched, in order of highest correlation with Runs Scored:
 

Runs Scored

Statistic Measured      Best 3 Pct   Worst 3 Pct   Combined Pct   Rank

OPS                         75           79             77         1
OBA                         69           76 (1)         73         2
Slg                         68           70             69         3
BA                          70           66             68         4
HR                          54           61             57         5
BB                          45           62 (1)         53         6
Total SB                    35           48             41 (3)     7
GIDP                        37           31             34 (2)     8
Sacrifice Hits              30           36             33 (2,3)   9
Strikeouts                  32           27             29 (2)     10
Left on Base                15           15             15 (4)     11

Numbers in parentheses refer to the notes that follow.

  1. There seems to be a huge difference between the "best" and "worst" numbers for OBA and BB. The inability to get on base through hits or walks seems to be a real killer. 

  2. GIDP, Ks, and Sacs seem to fall into the category of statistically insignificant. Not surprising for the first two, but since a sacrifice is designed specifically to produce a run, one would think that Sac Hits would score higher than 33%. See note 3 for further discussion on this. 

  3. Curious about NL vs. AL statistical differences? I was, so I separated the 8 NLs from the 12 ALs and then took a look at Stolen Bases and Sac Hits. Here's what came up:

    NL vs. AL

                            NL                          AL
                        Best 3 Pct   Worst 3 Pct    Best 3 Pct   Worst 3 Pct
    Stolen Bases            41           54             30           42
    Sacrifice Hits          41           50             21           24

    It seems that the ability -- or lack thereof -- to steal bases and lay down sacrifices has some effect on Runs Scored in the NL, or at least a much greater effect than in the AL. Note also that the AL sacrifice numbers fall below the random 25-38%. I take this as a counter-indicator; the use of the sacrifice in the AL may actually lead to fewer Runs Scored rather than to more. Food for thought, certainly.

  4. Speaking of counter-indicators, the LOB numbers jump out too. For the sake of the study I assumed that high LOB numbers were a bad thing, a failure to score runs. In reality, however, higher-scoring teams seem to leave more men on base than their counterparts, a by-product of their higher OBA, I guess.

But wait, there's more! 

The other half of getting a good Run Differential number is preventing runs from scoring. So I looked at the three teams from each league that gave up the fewest runs and the three teams that gave up the most runs. How many of the three most successful pitching staffs and defenses finished in their league's top three in terms of statistic X, and how many of the three least successful finished in the bottom three of statistic X? Which stats had the highest correlation with success or failure at preventing runs? 

On the pitching side, I looked at ERA, WHIP, CG, Shutouts, Hits allowed, BA against, BB allowed, and Ks. Theoretically, teams that allow fewer runs should have lower ERAs, WHIPs, Hits Allowed, BA Against, and BB allowed, and high numbers of CG, Shutouts, and Ks. Teams that were in the bottom tier of runs allowed should have higher ERAs, WHIP, Hits Allowed, BA against, and BB allowed, and lower CGs, Shutouts, and Ks. You can see how often that happened below. 

On the defensive side, I looked at Outstanding Plays, Errors Committed, and the Opponent Caught Stealing (OCS) numbers. I threw the latter in to get an idea of whether poor-hitting catchers with strong arms (Charles Johnson comes to mind) had any usefulness in Scoresheet. In real life a poor-hitting catcher can have other useful qualities -- calling a good game, handling the pitching staff, those intangible leadership merits -- but in Scoresheet, the numbers tell the whole story. Anyway, you can make your own judgment.

 Here are the numbers in rank order:
 
 

Runs Allowed

Statistic Measured      Best 3 Pct   Worst 3 Pct   Combined Pct   Rank

ERA                         90           92             91         1
BA Against                  79           81             80         2
Hits Allowed                75           84             79         3
WHIP                        80           76             78         4
Shutouts                    58           58             58         5
BB Allowed                  62           52             57         6
Strikeouts                  58           50             54         7
Complete Games              48           56             52         8
Outstanding Plays           36           44             40         9
Errors Committed            33           39             36         10
OCS                         27           20             23         11

A couple of points to make on this last chart. First, note the dramatic dropoff in correlation numbers after WHIP -- from 78 to 58%. All the highest-ranked numbers have to do with Hits Allowed, to a much greater extent than BB Allowed. So often we hear announcers chide a pitcher after a leadoff walk, and they point out that "a leadoff walk comes around to score X percent of the time" -- I can never remember the exact number. But I'm thinking -- and this study seems to support -- that a leadoff single is just as bad, and a leadoff extra-base hit is even worse. I desperately avoid pitchers who walk more than an average number of batters, but I may have to reexamine that philosophy. 

Finally, another note on Scoresheet defense. Once again, defensive statistics ended up at the bottom of the chart, waaay below even Ks and CG. So now we can see that range and errors seem to have little effect on building a winning Scoresheet team, and they seem to have little effect on preventing runs, at least in comparison to most other pitching/defensive statistics.

 Is this a surprise? It probably shouldn't be. Scoresheet clearly states in its draft packet that "a difference of .10 in range is equal to .1 (a tenth) of a hit per 9 innings." Well, it takes a couple fairly rangy players to achieve that .10 range advantage, especially since most teams have some subpar range players at other positions.

 Just for fun, I went through the 1999 AL Player List and picked out the rangiest guys at all eight field positions. If you somehow managed to draft all eight of these guys -- essentially including three CF types -- you would enjoy a range rating of +.56. If I'm doing the math right, that means this squad of defensive whizzes would save your pitching staff .56 hits per game, or about 91 hits per season (about two weeks of work for Tim Belcher). I assume that .56/game advantage would work out over the course of the season.

 However, in my Scoresheet AL experience, a great fielding team may manage a range of +.25. That's still pretty high, but we can use it for the sake of comparison. This defense will save the pitching staff .25 hits per game, or about 41 hits per season. Is that a significant number? Is it worth carrying a couple of weak bats?

 Well, teams usually give up between 1300 and 1700 hits in a 162-game season, depending on how good their pitchers are. Saving 41 hits for a team of great pitchers -- which would give up 1300 hits in the season -- represents a 3.2% improvement. Saving 41 hits for a bad pitching staff -- the 1700-hit group -- represents a 2.4% improvement. Saving 41 hits for an average staff -- 1500 hits allowed -- is a 2.7% improvement. The improvement numbers for the top-ranked defensive team that saves 91 hits per season would be 7.0 (1300), 5.4 (1700), and 6.1 (1500) percent.
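The arithmetic in the last few paragraphs is easy to redo for other range numbers. The Python sketch below follows the same assumptions as the text: a range edge of .10 is worth .1 hits per nine innings, every game counts as nine innings, and a season is 162 games. The function names are invented for this example.

```python
GAMES_PER_SEASON = 162


def hits_saved(range_edge: float, games: int = GAMES_PER_SEASON) -> int:
    """Hits saved per season for a team range edge, rounded half up."""
    return int(range_edge * games + 0.5)


def pct_fewer_hits(saved: int, hits_allowed: int) -> float:
    """Hits saved as a percentage of a staff's season hits allowed."""
    return round(100.0 * saved / hits_allowed, 1)


improvement = {}
for edge in (0.25, 0.56):
    saved = hits_saved(edge)
    for staff_hits in (1300, 1500, 1700):
        improvement[(edge, staff_hits)] = pct_fewer_hits(saved, staff_hits)
        print(f"range +{edge:.2f}: {saved} hits saved, "
              f"{improvement[(edge, staff_hits)]}% of {staff_hits} hits allowed")
```

The sketch rounds half up so that .25 x 162 = 40.5 comes out as 41, the figure used in the text; Python's built-in round() would give 40.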

 For the sake of comparison: A survey of the 20 leagues I used to create this page indicates that the teams that gave up the fewest hits in each league beat their nearest competitor in Hits Allowed by an average of 50 hits per season. The average gap between the team that gave up the fewest hits and the team in the league that gave up the most hits was 267. 

Let's look at a couple of positions. In a one-year league, if you have the choice of Mike Bordick (4.77) or Mike Caruso (4.72), who do you take? Caruso makes a lot of errors, but that doesn't seem to have a great effect on winning or losing or even preventing runs. Bordick's range advantage of .05 translates to him saving you 8 hits per year. Well, I'm pretty sure Caruso will make up those 8 hits with the stick, despite the fact that he starts hacking when he steps off the team bus.

 A more extreme example would be choosing between Jose Canseco (2.01) and Brian Hunter (2.21) to play a corner OF position. Hunter will theoretically get to 32 more balls than Canseco in the OF, but then you have to watch him bat. This is about the most extreme example I could think of without delving into playing guys out of position, and it boils down to 32 hits over the course of the season. Since Canseco will probably produce 75 more RUNS than Hunter over the same season, I think you gotta let the big guy wear a glove. And maybe a hard hat.

 Oh, there is a more extreme example. Scoresheet points out that one position at which range makes a great difference is CF, which is 1.6 times more important than the other OF spots. I wonder if that means that playing Canseco in CF instead of Hunter would mean an extra 51 (32x1.6) hits falling in? That seems like a lot, despite the 75 extra runs produced. Plus your pitchers might get irritated.
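The player comparisons above use the same conversion, applied to the gap between two range ratings. The Python sketch below is only a sketch. It treats the 1.6 CF factor as a straight multiplier on the rounded corner-outfield figure, which is the guess made in the text, not a confirmed Scoresheet rule.

```python
def hits_gap(better_range: float, worse_range: float, games: int = 162) -> int:
    """Extra hits per season prevented by the rangier of two players."""
    return round((better_range - worse_range) * games)


# Range ratings quoted in the text.
bordick_vs_caruso = hits_gap(4.77, 4.72)   # shortstops
hunter_vs_canseco = hits_gap(2.21, 2.01)   # corner outfield

# Assumption: the 1.6 CF weight scales the corner-OF gap directly.
CF_WEIGHT = 1.6
hunter_vs_canseco_cf = round(hunter_vs_canseco * CF_WEIGHT)

print(bordick_vs_caruso, hunter_vs_canseco, hunter_vs_canseco_cf)
```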

 Does defense matter? I think it can marginally improve your pitching. A great fielding team may even be able to cancel out the offensive "All-Star factor" that balloons real-life ERAs up by .25 runs or so. But even a simply good team defensive range -- +.25 -- is only going to reduce your hits allowed by 2 to 3 percent, and the correlations between Range and Wins/Losses and Range and Preventing Runs are quite low. Is it worth the accompanying loss of offense that these trade-offs often require?

 In the past I've always drafted like defense did matter, but those differences of .03 or .04 that I've worried about in the past boil down to 5 or 6 hits prevented per season. It's not like Scoresheet has concealed this fact; it's all right there in the packet. In the future, I'll draft the better hitter and hope to lead the league in Runs Scored. 


If for some reason you would like to comment on this study or the methodology, please send me a note at this address. I hope you found it at least somewhat interesting.