Sunday, April 1, 2012

Playoff Odds

Baseball Prospectus publishes a set of playoff odds that's interesting for many reasons, the most obvious of which is that they're generated by smart people yet are very obviously wrong. To take just two examples: Detroit is listed at 57.9% to win the AL Central, when most people think they're nearly a lock. (I actually think the Tigers are not nearly the lock they're made out to be, but that's a different issue--the point is that 57.9% is probably crazy.) And Texas, just a few days ago, was listed at 93.4% to win the AL West, which is also a very bad line, given how good the Angels are and how much luck there is in baseball.

As far as I know--and I could definitely be wrong here--these odds are generated by running a simulation, or maybe a huge set of simulations, of the whole season, using the actual MLB schedule. For each game, a team's chances of winning are a function of its Expected Win Percentage ("EW%") and its opponent's.
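To make that concrete, here is a minimal sketch of what such a simulation might look like. Everything in it is an assumption for illustration: the team names, the EW% values, the toy schedule, and especially `game_win_prob`, which uses an odds-ratio formula as a stand-in because I don't know what BP actually uses. It also ignores home-field advantage and starting pitchers.

```python
import random

def game_win_prob(ew_a, ew_b):
    """Assumed per-game model: odds-ratio combination of the two EW% values."""
    return ew_a * (1 - ew_b) / (ew_a * (1 - ew_b) + ew_b * (1 - ew_a))

def simulate_division_odds(ew, schedule, n_seasons=10_000, seed=0):
    """Estimate each team's chance of finishing first by replaying the schedule."""
    rng = random.Random(seed)
    titles = {team: 0.0 for team in ew}
    for _ in range(n_seasons):
        wins = {team: 0 for team in ew}
        for team_a, team_b in schedule:
            if rng.random() < game_win_prob(ew[team_a], ew[team_b]):
                wins[team_a] += 1
            else:
                wins[team_b] += 1
        best = max(wins.values())
        leaders = [t for t, w in wins.items() if w == best]
        for t in leaders:
            titles[t] += 1 / len(leaders)  # split tied seasons evenly
    return {team: titles[team] / n_seasons for team in ew}

# Hypothetical four-team division; these EW% values are made up.
ew = {"A": 0.56, "B": 0.52, "C": 0.48, "D": 0.44}
teams = list(ew)
# Toy schedule: every pair meets 18 times (not the real MLB schedule).
schedule = [(a, b) for i, a in enumerate(teams) for b in teams[i + 1:] for _ in range(18)]

odds = simulate_division_odds(ew, schedule, n_seasons=2_000)
print(odds)
```

The thing to notice is that the output depends on only two ingredients, the EW% inputs and the per-game formula, which is why the suspects below are the ones they are.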

All this means that at least one of the following is true:

(1) The widely accepted betting odds on baseball are wildly, totally wrong.

(2) Our best methods for estimating a team's winning chances given its EW% and its opponent's are terrible.

(3) A team's winning chances are not determined by, and cannot even be well approximated from, the two relevant EW%'s.

(4) Something has gone very wrong in calculating the EW%'s Baseball Prospectus is using.

I have a lot to say about these, but only have time to write a little bit. Briefly:

Re: (1): It's almost impossible that this is true. This gets into deep and interesting issues in epistemology, but I take it that this isn't controversial. If you disagree with me, feel free to try to make your fortune betting this baseball season--you have a huge edge, if you're right and everyone else is wrong. (But you're not.)

Re: (2): This seems possible to me, if by "best methods" I mean "the methods that baseball bloggers tend to use." I had some pretty serious issues with some playoff-series odds that (if memory serves) @RationalPastime (who is a very interesting and friendly guy, in my experience) was releasing. After some poking around, I discovered (I think) that it is standard to use something called the log5 method. It seemed obvious to me last year that this method couldn't be all that good, but I can't remember whether I had a priori or just empirical reasons to think so. I also think that there must be a better method out there somewhere, given the state of human knowledge about estimating things statistically--are baseball bloggers so into following Bill James's lead (the link says that the log5 method is James's preferred technique) that they're neglecting better techniques?

I hope some reader can point me to literature on estimating head-to-head winrates from the overall winrates of the two competitors.
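For reference, the log5 formula itself is short enough to write down. Given overall winrates p_a and p_b, it estimates A's chance of beating B as (p_a - p_a*p_b) / (p_a + p_b - 2*p_a*p_b). A small sketch, with made-up winrates:

```python
def log5(p_a, p_b):
    """Probability that A beats B under log5, given each side's overall winrate."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)

# Sanity checks on made-up winrates.
print(round(log5(0.600, 0.500), 3))  # vs. an average team: 0.6
print(round(log5(0.600, 0.600), 3))  # vs. an equal team: 0.5
print(round(log5(0.600, 0.400), 3))  # .600 team vs. .400 team: 0.692
```

Algebraically this is the same as giving each team a strength of p / (1 - p) and setting A's chance to strength_a / (strength_a + strength_b), which is the form used by Bradley-Terry paired-comparison models, so that literature may be one place to start looking. Whether the formula is actually a good fit for baseball is exactly the open question.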

Re: (3): This is a fun theory, but I doubt it's true. That said, I do think that part of the problem here could be that (what I take to be) BP's system fails to adjust for things like the way talent is distributed among a team's starting pitchers.

Re: (4): Also very possible, but here I am out of my depth--I simply don't know much about estimating a team's winrate from the expected performances of all its players (which BP generates each year).

Any thoughts?