Library Creation Tool

Use this forum to discuss or share your custom player libraries and ballpark images.

Re: Library Creation Tool

Post by DDBBAdmin » Thu May 04, 2017 8:54 pm

Great discussion, guys!

"Ratings" like range and arm are already built into each player's stats. For example, a player's batting average already includes the outs they made against fielders with excellent range, and the cheap hits they got against fielders with poor range.

The custom ratings were added to DDBB to make the game more fun -- they add strategy and excitement. The default settings in DDBB with respect to custom ratings are gentle. They do not have a major impact on game play -- just enough to add some fun, but not so much that accuracy takes a hit. These settings can be adjusted to suit your preference. However, if your goal is to be as accurate as possible, the recommendation is to turn them off.

When the custom ratings are turned off, the only thing that impacts fielding is a player's error rate, which is objective and is calculated directly from their real-life fielding stats.

I hope this helps!
Site Admin
Posts: 2791
Joined: Wed Feb 02, 2011 7:19 pm


Post by johnnybravo17 » Thu May 04, 2017 9:03 pm

Tim-Reigns wrote: Fair enough and makes sense. Seems odd to spend so much time on other stats' accuracy then use range factor for defense. I understand the time thing, but the DRA data is on Seamheads, and I have "gleaned" it many times for personal use! DRA, despite the limited 5-point defensive scale of the game, is going to be superior in every way to range factor, but you know that...

What impact do defensive stats have on Digital Diamond?

I would LOVE to incorporate ALL of the data on Seamheads into DDBBLCT. Unfortunately ... nothing on that site beyond the ballpark data (which I've included and make use of) is available electronically. If you're volunteering to go into Seamheads and prepare a spreadsheet for me that has every player's DRA from 1951 to present, at every position, then you will be a hero to sim communities everywhere. I will gladly use that data, as I imagine lots of others would like to. Unfortunately, I don't expect that ANYONE (and especially not myself) has the time on their hands to do this ... so it has to be calculated and extracted from the data which is available electronically (ie, the Retrosheet PBP files) and that is a bit of work. Much more so than ARF/9 (which actually doesn't require the PBP data at all).

The key here is that I'm not trying to populate ratings for a single library (for which taking the data from Seamheads seems entirely appropriate). I'm trying to create a tool that facilitates this for ANY library (inside of the range of dates that the data covers). That's a huge difference!

I also cannot agree with the statement that, on a 5-point defensive scale, DRA is going to be superior to ARF/9 "in every way".

At its core, DRA starts by throwing out misleading statistics (ie. putouts for infielders) ... but in MY ARF/9 calculation, I'm also doing this. It then corrects for a number of factors, among them Balls In Play (but I'm doing this also). It goes much farther than that, introducing and correcting for ballpark effects at each position, evaluating the handedness of batters faced by each team, etc., then using linear weights to express the initial calculation as a number of runs saved, and finally aligning each fielder's contribution to a team total, such that the number of runs saved by all players on a team equals the number of runs that the team saved as a whole. It's this latter part of the analysis that requires significant digging into the PBP files to come up with all of those team and ballpark coefficients. It's very doable, but it's tantamount to the work that I did to extract the VSL batting statistics, times 9, times the number of MLB franchises. That's a lot. In lieu of this, I finish with an additional correction meant to account for pitching staffs that are more prone to give up fly balls vs ground balls or vice-versa ... and stop.
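To make that chain of adjustments a little more concrete, here is a minimal sketch in Python. The exact ARF/9 formula isn't spelled out in this post, so everything below is an assumption for illustration only: the function name, the argument names, and the idea of applying the Balls In Play and ground ball/fly ball corrections as simple league-to-team ratios.

```python
def adjusted_rf9(putouts, assists, innings, is_infielder,
                 team_bip, lg_bip, team_share, lg_share):
    """Hypothetical ARF/9-style calculation (NOT the actual DDBBLCT formula).

    team_share / lg_share are assumed to be GB% for infielders and
    FB% for outfielders, for the fielder's team and the league.
    """
    # Step 1: throw out the misleading stat (putouts for infielders).
    plays = assists if is_infielder else putouts + assists
    raw = plays * 9.0 / innings

    # Step 2: correct for Balls In Play. A team that allows more balls in
    # play than the league average hands its fielders extra chances.
    bip_factor = lg_bip / team_bip

    # Step 3: final correction for a fly-ball or ground-ball prone staff.
    staff_factor = lg_share / team_share

    return raw * bip_factor * staff_factor


# An infielder on a perfectly league-average team: no adjustment applied.
neutral = adjusted_rf9(250, 450, 1350.0, True, 4400, 4400, 0.44, 0.44)

# Same fielder behind a staff that allows 10% more balls in play.
busy = adjusted_rf9(250, 450, 1350.0, True, 4840, 4400, 0.44, 0.44)
```

With league-average inputs the ratios are 1.0 and the result is just the raw per-9 figure; a team that allows more balls in play pulls the number down.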

Now back to our five point scale ... in light of the fact that ARF/9 (not RF/9 ... which still includes infielder putouts and corrects for nothing, big difference) starts from the same principles, but just doesn't go as far, I'll argue that on a 5-point scale, it's really not going to produce a difference for that many fielders. You may disagree, but having gone through the process both ways for a number of fielders as an example, I'm just not seeing it. DRA can tell you that Fielder A saved 5 runs, Fielder B saved 6 runs and Fielder C saved 7 runs ... but on a five point scale (good, above average, average, below average, bad) they're all simply "above average". You just don't have enough granularity to differentiate between them.
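A quick sketch of the granularity problem, in Python. The cutoffs below are made up purely for illustration (the post gives no actual run thresholds), but any set of four cutoffs will behave the same way for fielders whose totals are close together.

```python
# Hypothetical cutoffs, in runs saved. These values are assumptions,
# not numbers taken from DDBB, DRA, or the Library Creation Tool.
CUTOFFS = [
    (15.0, "good"),
    (5.0, "above average"),
    (-5.0, "average"),
    (-15.0, "below average"),
]


def five_point_label(runs_saved):
    """Collapse a runs-saved figure onto the five-point scale."""
    for floor, label in CUTOFFS:
        if runs_saved >= floor:
            return label
    return "bad"


# Fielders A, B and C from the example: 5, 6 and 7 runs saved.
labels = [five_point_label(r) for r in (5, 6, 7)]
```

All three fielders come out with the same label, which is the point: the extra precision in the runs-saved number is thrown away as soon as it is squeezed into five buckets.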

But the most important part of all of this is your final question, ie: What impact do defensive stats (and I'll assume that you mean the range ratings in particular) have on DDBB?

So let's examine that, and perhaps you can appreciate my 5-point scale point a little better:

We talked about this in another post, but it's probably more appropriate here, so I'll repeat it. In DDBB there is a single setting that controls the effect that the range ratings (if you turn them on) have on the game. Specifically, it is the number of times (in 1000 chances) that a "RANGE 2" or "RANGE 4" player will turn an out into a hit, or vice versa. The default setting is 25. This means that "2's" and "4's" have a 2.5% chance of changing the outcome of a play based on their range rating, and that "1's" and "5's" have a 5% chance of doing the same. Personally, I play with this number closer to 50, but that's beside the point. The point is I have this one setting, for all fielders in the game, regardless of position ... and the ability to set each fielder to a rating of 1 (2x chance to take hits away), 2 (1x chance to take hits away), 3 (no effect), 4 (1x chance to turn an out into a hit), or 5 (2x chance to turn an out into a hit). You can certainly attempt to perform some complex analysis to bring linear weights into play here, and translate the number of hits added or taken away into a positive or negative number of runs saved ... fitting perfectly with a system like DRA and making that idea look much more attractive. However ... you'll be felled by three things:

1) There is one setting for all positions. If you've studied DRA at all, you know that the contributions from the different fielding positions are nowhere near equal. We'd need a calibration setting for each position.

2) A five point scale isn't anywhere near large enough to differentiate between levels of fielding expertise.

3) The simulation is based on "real" stats (ie. those that are published everywhere online that we all know and love). But "real" stats already have all of the effects of defense (along with everything else) "baked in". If you're seriously going to incorporate defensive statistics (whether they're based on DRA, or anything else) into a game, then you need to first "subtract out" the effects of defense through a normalization process. In other words, you need to work backwards from the real stats and the team and ballpark defensive effects to calculate what each player's numbers would have looked like in the context of perfectly average fielders. Then, the defensive numbers can be added back in (by turning the ratings on). This is similar in concept to what I've attempted to do with the ballpark effects ... but that's light years easier, by comparison.

Fail to do this, and just like using ballpark effects without normalization ... you're double dipping. Good and bad fielders already have their effects accounted for in the un-normalized statistics. Anything else you give them after the fact amounts to a "bonus". This is the primary reason why the defensive system in DDBB is meant to have only a slight impact. If you drive the sim with un-normalized statistics (which are all that are readily available) then it will skew the results.
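For anyone who wants to see the range setting described earlier in this post as arithmetic, here is a minimal Python sketch. The function names are mine and this is not how DDBB is actually coded internally; it only restates the rule as given: the setting is a number of chances in 1000, "2's" and "4's" get it once, "1's" and "5's" get it twice, and "3's" get nothing.

```python
def range_swing_probability(range_rating, setting=25):
    """Chance that a fielder's range rating flips the outcome of a play.

    `setting` is the single DDBB range setting (chances per 1000);
    25 is the default mentioned in the post. The function itself is a
    hypothetical illustration, not DDBB's real code.
    """
    if range_rating not in (1, 2, 3, 4, 5):
        raise ValueError("range rating must be 1 through 5")
    base = setting / 1000.0
    multiplier = abs(range_rating - 3)  # 0 for a 3, 1 for 2/4, 2 for 1/5
    return base * multiplier


def range_swing_direction(range_rating):
    """Which way the flip goes: low ratings take hits away."""
    if range_rating < 3:
        return "hit becomes out"
    if range_rating > 3:
        return "out becomes hit"
    return "no effect"


default_two = range_swing_probability(2)            # 2.5% at the default
default_five = range_swing_probability(5)           # 5% at the default
doubled_one = range_swing_probability(1, setting=50)
```

Note that there is no position argument anywhere in this sketch, which mirrors point 1 above: a shortstop and a first baseman with the same rating get exactly the same swing.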

If the game were more conducive to this (ie. we had something other than a 5-point scale, and calibration settings for every position), and if someone were nice enough to electronically publish the DRA numbers so that I didn't have to calculate them, then I might be inclined to go through all of the trouble of working backwards via linear weights to come up with an algorithm to normalize the stats AND come up with calculations to accurately set the calibration settings. This is all very doable, just a bit time-consuming. And that level of importance in the game engine might just merit the extra work. But in its current state (and without the DRA numbers being electronically available) ... I stand by my conclusion. The differences between ARF/9 (which can be calculated automatically in a very simple series of calculations, using numbers that are already available electronically) and DRA, with respect to DDBB's defensive rating system, are extremely small. The extra work required (which is considerable) would be wasted.

Should someone publish the DRA numbers electronically ... then I'll probably be the first one to lobby Mark to expand the rating system so that things can be taken farther. It's actually something I'd like to see. But "Open Source" doesn't mean "Readily Available". The Lahman and Retrosheet PBP files are a great start, but more data needs to be accessible (ie. downloadable) in some electronic format so that it can be easily utilized. DRA is just one example of this.
Posts: 843
Joined: Thu Apr 04, 2013 9:54 am
Location: Somewhere on the outskirts of sanity, and perhaps in your head, as the unofficial 'voice' of DDBB

Re: Library Creation Tool

Post by Tim-Reigns » Thu May 04, 2017 9:49 pm

Thanks again, JB, for that informative answer! Sorry, didn't know about your own ARF stat.

Humphreys' biggest beefs with WAR and other proprietary stats are that they are not open source AND they are not readily available. Does this help? ... ppendices/
Posts: 15
Joined: Fri Apr 28, 2017 5:36 am

Re: Library Creation Tool

Post by johnnybravo17 » Thu May 04, 2017 10:03 pm

That, my friend ... is an awesome reference!

You have answered the call that I put out earlier, which was:

If anyone knows of anything else in some electronic format that I can somehow work into DDBBLCT, please let me know.

I'll look this over and see just how it fits ... but it's definitely going in. It does, however, stop at 2009. So I still need a source for the years 2010-2016, and (most importantly) for the years beyond, so that I can plan to maintain this thing going forward. But I can easily make the tool return numbers derived from these spreadsheets in addition to my ARF/9 calculations for those years where the data is available. That way, one can interpret the data to go from criteria to ratings, in any way they wish. That's in keeping with the spirit of DDBB, which always leans toward giving end-users the control of choice.
Last edited by johnnybravo17 on Thu May 04, 2017 11:43 pm, edited 1 time in total.
Posts: 843
Joined: Thu Apr 04, 2013 9:54 am
Location: Somewhere on the outskirts of sanity, and perhaps in your head, as the unofficial 'voice' of DDBB

Re: Library Creation Tool

Post by Tim-Reigns » Thu May 04, 2017 10:31 pm

Yeah, the more I read your post, the more I had a hunch you were not aware of Wizardry's specific website with appendices!

A couple of things:

1. I corresponded with Mr. Humphreys via email a few years ago. He's a serious baseball scholar. I do not know if he would provide the 2010-2016 data. His formula has been tweaked a bit since the book. I sent the man a list of questions and he gave me a 13-page response...He estimates that good fielders with long careers (10,000 plus innings) were undervalued in Wizardry by about 50%, or should have been shown to have saved 300 instead of 200 runs over a career.

2. As far as comparing DRA to any range factor, there are good "test cases" to see if they are on the same page. Johnny Bench, Omar Vizquel, Ozzie, Davey Concepcion, Dewey Evans, Thurman Munson, and a bunch of center fielders I can't remember off the top of my head. I.e., pretty much every other fielding stat under/over rates certain guys that DRA gets right.

3. Coors and Fenway. Left fielders in Fenway and outfielders in Coors need special tweaking above and beyond. Most metrics miss this.

4. Thanks for the hard work...if we have 1893 to 2009, I would *include* DRA-capability into your tool with the assumption that the remaining 6% of history will fall into database form somewhat easily... ;)
Posts: 15
Joined: Fri Apr 28, 2017 5:36 am

Re: Library Creation Tool

Post by skrabec1 » Sat May 20, 2017 2:39 am

JB - just checking in, how's version 2.0 coming? Neil
Posts: 51
Joined: Mon Feb 17, 2014 3:11 am

Re: Library Creation Tool

Post by johnnybravo17 » Sat May 20, 2017 8:07 am

I'm still at it.

You wouldn't believe how much trouble the Lahman's omission of defensive innings for Outfielders from 2000-present causes when you're trying to center around that as the "base" for a half dozen different data sources. May is also typically a slow month for "me" projects like this, because quite a few of the womenfolk in my family have birthdays this month ... so my time to work on this stuff on weekends is very limited.

What I can tell you is everything that's going to be included in the next release ... as I'm pretty firm on that now. I flip-flopped and left the BLOB stuff (stadium pics and logos) alone for the moment. That will come later. Instead I focused on gathering all of the raw data that I intend to utilize as ratings criteria and trying to neatly organize it into a single database.

So we'll have the following:

Baseball Prospectus Baserunning Runs (probably the closest thing relating to how the speed numbers are used in DDBB)
Baseball Prospectus Team Defense (useful for determining each team's Balls In Play vs lg Avg Balls In Play, and also FB% vs lg FB% and GB% vs lg GB%, since I utilize these ratios heavily to adjust the raw RF/9 numbers)

Baseball Reference Advanced Fielding: LF (Kill%, Held%, and also the individual breakdowns for this information for each situation)
Baseball Reference Advanced Fielding: CF
Baseball Reference Advanced Fielding: RF

(I actually got this data from the Retrosheet PBP files ... but it's easy to look at the Advanced Fielding tables to get a feel for the type of data I'm accumulating here.)

Baseball Reference Pitcher Fielding (useful for calculating HOLD%, or each pitcher's tendency to prevent runners from getting good leads ... as used by the rating of the same name in DDBB)
Baseball Reference WAR (no one knows exactly how this is calculated, but the results for WAR, oWAR, dWAR and all of the intermediate factors involved are all updated daily and available in a nifty spreadsheet, if you know where to look!)

Bill James Speed Rating (I can't decide ... I may still prefer this to baserunning runs, so I'll include them both).

Hall Of Stats Individual Season Totals (not directly useful for any DDBB rating, as of yet, but a very interesting site nonetheless, and one worth checking out if you're not familiar with it. Plus, they put all of their data into a downloadable spreadsheet, so why not throw it in?)

Lahman Batting
Lahman Fielding
Lahman Pitching

What If Sports Ratings (They use a Grading system, ie A-F, that is interesting because it is akin to the 5 point scale in DDBB. No one knows how their ratings are determined, and the scale actually has well over 5 points because it includes + and - steps, but since they rate every one of the things that we're rating in DDBB, it's fun to look at them side by side).

The DRA ratings from the Wizardry appendix (which, regrettably, stop at 2009 ... but I'm holding out that we'll see an update at some point and it will be easy to update this database if that ever comes to pass).

Practically every one of these sources has its own funky idea about how to identify the team a player played for each season ... and that's driving me nuts trying to align all of this into a single database. That, and the missing defensive innings for 1951-53 in the Lahman (and also for outfielders only from 2000-present), are the two big hurdles. If not for those two things ... I would have been done with this a long time ago. But I'm getting there. It also would have been easy to do a simple search and replace to solve the team ID dilemma, but instead I'm attempting to create a pivot table that I can use now and forever for any funky team ID system that comes down the pike going forward. This whole thing is being developed with the idea that I (or someone) can easily update it from year to year just by ingesting the latest annual data set from each of these sources as soon as it's released at the conclusion of each season. It will also side-step the Lahman OF issue by including the defensive innings in the Advanced Fielding tables. So even if they never correct whatever caused that to start happening 2-3 years ago, THIS database won't suffer because of it.
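The team ID idea boils down to a lookup table keyed on the source and that source's own code, rather than a one-off search and replace. Here is a rough Python sketch of it. The source names, the choice of canonical ID, and the handful of sample codes are illustrative assumptions (check them against each data set before relying on them); the real table lives in the LCT database and covers far more than this.

```python
# (source, source's team code) -> one canonical team ID.
# Sample rows only; both the codes and the canonical IDs are assumptions
# made for this example.
TEAM_ID_MAP = {
    ("lahman", "CHN"): "CHC",
    ("bbref", "CHC"): "CHC",
    ("lahman", "NYA"): "NYY",
    ("bbref", "NYY"): "NYY",
}


def canonical_team_id(source, team_id):
    """Translate a source-specific team code into the canonical one."""
    key = (source.lower(), team_id.upper())
    if key not in TEAM_ID_MAP:
        # Fail loudly so an unmapped code never slips into the database.
        raise KeyError(f"no mapping for team ID {team_id!r} from {source!r}")
    return TEAM_ID_MAP[key]


# Two sources, two different codes, one team.
same_team = canonical_team_id("lahman", "CHN") == canonical_team_id("bbref", "CHC")
```

The payoff is in the yearly update: when a new source shows up with yet another ID scheme, it only needs new rows in the table, and nothing downstream has to change.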

Bottom Line: This is work that only needs to be done once.

Most of the spreadsheets that form the basis for the tables that I'm adding to the existing LCT database are done. It's really just the Advanced Fielding and that humongous Baseball Reference WAR spreadsheet that I'm still wrestling with. Once I'm done with those, I'll post the entire set of spreadsheets on my website for anyone who's interested. I think there's some pretty good bits there for all sorts of baseball research ... if you're into that sort of thing.
Posts: 843
Joined: Thu Apr 04, 2013 9:54 am
Location: Somewhere on the outskirts of sanity, and perhaps in your head, as the unofficial 'voice' of DDBB

