Tim-Reigns wrote: Fair enough and makes sense. Seems odd to spend so much time on other stats' accuracy then use range factor for defense. I understand the time thing, but the DRA data is on Seamheads, and I have "gleaned" it many times for personal use! DRA, despite the limited 5-point defensive scale of the game, is going to be superior in every way to range factor, but you know that...

What impact do defensive stats have on Digital Diamond?

I would LOVE to incorporate ALL of the data on Seamheads into DDBBLCT. Unfortunately, nothing on that site beyond the ballpark data (which I've included and make use of) is available electronically. If you're volunteering to go into Seamheads and prepare a spreadsheet for me that has every player's DRA from 1951 to present, at every position, then you will be a hero to sim communities everywhere. I will gladly use that data, as I imagine lots of others would like to. Unfortunately, I don't expect that ANYONE (and especially not myself) has the time on their hands to do this ... so it has to be calculated and extracted from the data that is available electronically (i.e., the Retrosheet PBP files), and that is a bit of work. Much more so than ARF/9 (which actually doesn't require the PBP data at all).

The key here is that I'm not trying to populate ratings for a single library (for which taking the data from Seamheads seems entirely appropriate). I'm trying to create a tool that facilitates this for ANY library (inside of the range of dates that the data covers). That's a huge difference!

I also cannot agree with the statement that, on a 5-point defensive scale, DRA is going to be superior to ARF/9 "in every way".

At its core, DRA starts by throwing out misleading statistics (putouts for infielders, for example) ... but in MY ARF/9 calculation, I'm also doing this. It then corrects for a number of factors, among them Balls In Play (but I'm doing this also). It goes much farther than that, introducing and correcting for ballpark effects at each position, evaluating the handedness of batters faced by each team, etc., then using linear weights to express the initial calculation as a number of runs saved, and finally aligning each fielder's contribution to a team total, such that the number of runs saved for all players on a team equals the number of runs that the team saved as a whole. It's this latter part of the analysis that requires significant digging into the PBP files to come up with all of those team and ballpark coefficients. It's very doable, but it's tantamount to the work that I did to extract the VSL batting statistics, times 9, times the number of MLB franchises. That's a lot. In lieu of this, I finish with an additional correction meant to account for pitching staffs that are more prone to give up fly balls vs ground balls or vice versa ... and stop.
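As a rough illustration of the first two steps only (dropping infielder putouts and correcting for balls in play), here is a minimal Python sketch. It is not the actual DDBBLCT calculation: the function name, the parameters, and the simple league-to-team ratio used for the balls-in-play correction are all assumptions, and the fly ball vs ground ball correction is left out.

```python
def adjusted_rf9(assists, putouts, innings, team_bip, league_bip, infielder):
    """Hypothetical adjusted range factor per nine innings (illustration only)."""
    # Step 1: throw out the misleading stat. Infielders are credited with
    # assists only; outfielders keep putouts plus assists.
    plays = assists if infielder else assists + putouts
    rf9 = plays * 9 / innings

    # Step 2: correct for balls in play. ASSUMPTION: a plain ratio of the
    # league-average team total to this team's total. A fielder behind a
    # staff that allows more balls in play gets scaled down.
    return rf9 * (league_bip / team_bip)


# Made-up season line for a shortstop on a team with league-average BIP.
shortstop = adjusted_rf9(assists=450, putouts=250, innings=1350,
                         team_bip=4400, league_bip=4400, infielder=True)
print(round(shortstop, 2))
```

Note that nothing here needs the PBP files: assists, putouts, and innings are ordinary season totals.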

Now back to our five-point scale ... in light of the fact that ARF/9 (not RF/9 ... which still includes infielder putouts and corrects for nothing, big difference) starts from the same principles, but just doesn't go as far, I'll argue that on a 5-point scale, it's really not going to produce a difference for that many fielders. You may disagree, but having gone through the process both ways for a number of fielders as an example, I'm just not seeing it. DRA can tell you that Fielder A saved 5 runs, Fielder B saved 6 runs and Fielder C saved 7 runs ... but on a five-point scale (good, above average, average, below average, bad) they're all simply "above average". You just don't have enough granularity to differentiate between them.
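The granularity problem is easy to see in a small Python sketch. The cutoffs below (10 and 3 runs) are invented purely for illustration and are not taken from DRA, ARF/9, or DDBB; the only thing borrowed from the game is that 1 is the best rating and 5 the worst.

```python
LABELS = {1: "good", 2: "above average", 3: "average",
          4: "below average", 5: "bad"}


def range_rating(runs_saved, big=10.0, small=3.0):
    """Collapse a runs-saved figure onto a 1-5 scale (hypothetical cutoffs)."""
    if runs_saved >= big:
        return 1
    if runs_saved >= small:
        return 2
    if runs_saved > -small:
        return 3
    if runs_saved > -big:
        return 4
    return 5


# Fielders A, B and C from the example: 5, 6 and 7 runs saved.
for name, runs in [("A", 5), ("B", 6), ("C", 7)]:
    rating = range_rating(runs)
    print(name, runs, rating, LABELS[rating])
```

Wherever the cutoffs are placed, any five-bucket mapping has to put fielders a run or two apart into the same bucket most of the time.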

But the most important part of all of this is your final question, ie: What impact do defensive stats (and I'll assume that you mean the range ratings in particular) have on DDBB?

So let's examine that, and perhaps you can appreciate my 5-point scale point a little better:

We talked about this elsewhere in another post, but it's probably more appropriate here, so I'll repeat it. In DDBB there is a single setting that controls the effect that the range ratings (if you turn them on) have on the game. Specifically, it is the number of times (in 1000 chances) that a "RANGE 2" or "RANGE 4" player will turn an out into a hit, or vice versa. The default setting is 25. This means that "2's" and "4's" have a 2.5% chance of changing the outcome of a play based on their range rating, and that "1's" and "5's" have a 5% chance of doing the same. Personally, I play with this number closer to 50, but that's beside the point. The point is I have this one setting, for all fielders in the game, regardless of position ... and the ability to set each fielder to a rating of 1 (2x chance to take hits away), 2 (1x chance to take hits away), 3 (no effect), 4 (1x chance to turn an out into a hit), or 5 (2x chance to turn an out into a hit). You can certainly attempt to perform some complex analysis to bring linear weights into play here, and translate the number of hits added or taken away into a positive or negative number of runs saved ... fitting perfectly with a system like DRA and making that idea look much more attractive. However ... you'll be felled by three things:

1) There is one setting for all positions. If you've studied DRA at all, you know that the contributions from the different fielding positions are nowhere near equal. We'd need a calibration setting for each position.

2) A five point scale isn't anywhere near large enough to differentiate between levels of fielding expertise.

3) The simulation is based on "real" stats (i.e., those that are published everywhere online that we all know and love). But "real" stats already have all of the effects of defense (along with everything else) "baked in". If you're seriously going to incorporate defensive statistics (whether they're based on DRA, or anything else) into a game, then you need to first "subtract out" the effects of defense through a normalization process. In other words, you need to work backwards from the real stats and the team and ballpark defensive effects to calculate what each player's numbers would have looked like in the context of perfectly average fielders. Then, the defensive numbers can be added back in (by turning the ratings on). This is similar in concept to what I've attempted to do with the ballpark effects ... but that's light years easier, by comparison.

Fail to do this, and just like using ballpark effects without normalization ... you're double dipping. Good and bad fielders already have their effects accounted for in the un-normalized statistics. Anything else you give them after the fact amounts to a "bonus". This is the primary reason why the defensive system in DDBB is meant to have only a slight impact. If you drive the sim with un-normalized statistics (which are all that are readily available) then it will skew the results.
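To put numbers on the single range setting described above, here is a minimal Python sketch of the arithmetic. It only models the probabilities as stated (setting per 1000 chances, doubled for 1's and 5's); the function names and the way "chances" are counted are assumptions, not DDBB's actual internals.

```python
# Multiplier applied to the range setting for each rating, as described above.
RANGE_MULTIPLIER = {1: 2, 2: 1, 3: 0, 4: 1, 5: 2}


def flip_probability(rating, setting=25):
    """Chance that a fielder's range rating changes the outcome of a play."""
    return RANGE_MULTIPLIER[rating] * setting / 1000


def expected_flips(rating, chances, setting=25):
    """Expected number of plays flipped over a number of chances.

    Negative means hits taken away (ratings 1-2), positive means outs
    turned into hits (ratings 4-5). ASSUMPTION: every chance is eligible.
    """
    sign = -1 if rating < 3 else 1
    return sign * RANGE_MULTIPLIER[rating] * setting * chances / 1000


for rating in (1, 2, 3, 4, 5):
    print(rating, flip_probability(rating), expected_flips(rating, 400))
```

Because the underlying statistics are un-normalized, every one of those flipped plays comes on top of defense that is already baked into the numbers, which is exactly the bonus described above. Raising the setting from 25 to 50 doubles it.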

If the game were more conducive to this (i.e., we had something other than a 5-point scale, and calibration settings for every position), and someone were nice enough to publish the DRA numbers electronically so that I didn't have to calculate them, then I might be inclined to go through all of the trouble of working backwards via linear weights to come up with an algorithm to normalize the stats AND come up with calculations to accurately set the calibration settings. This is all very doable, just a bit time-consuming. And that level of importance in the game engine might just merit the extra work. But in its current state (and without the DRA numbers being electronically available) ... I stand by my conclusion. The differences between ARF/9 (which can be calculated automatically in a very simple series of calculations, using numbers that are already available electronically) and DRA, with respect to DDBB's defensive rating system, are negligible. The extra work required (which is considerable) would be wasted.

Should someone publish the DRA numbers electronically ... then I'll probably be the first one to lobby Mark to expand the rating system so that things can be taken farther. It's actually something I'd like to see. But "Open Source" doesn't mean "Readily Available". The Lahman and Retrosheet PBP files are a great start, but more data needs to be accessible (ie. downloadable) in some electronic format so that it can be easily utilized. DRA is just one example of this.