Sunday, April 1, 2012

Playoff Odds

Baseball Prospectus publishes a set of playoff odds that's interesting for many reasons, the most obvious of which is that they're generated by smart people yet are very obviously wrong. To take just two examples: Detroit is listed as <58% to win the AL Central, when most people think they're nearly a lock. (I actually think the Tigers are not nearly the lock they're made out to be, but that's a different issue--the point is that 57.9% is probably crazy.) And Texas, just a few days ago, was listed as 93.4% to win the AL West, which is also a very bad line, given how good the Angels are and how much luck there is in baseball.

As far as I know--and I could definitely be wrong here--these odds are generated by running a simulation, or maybe a huge set of simulations, of the whole season, using the actual MLB schedule. For each game, a team's chances of winning are a function of its Expected Win Percentage ("EW%") and its opponent's.
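A minimal sketch of that kind of simulation might look like the following. It is a guess at the general shape, not BP's actual code: the function names, the toy three-team schedule, the made-up EW% values, and the per-game rule (a log5-style odds ratio) are all assumptions for illustration.

```python
import random

def game_win_prob(ew_a, ew_b):
    """Assumed per-game rule: log5-style odds ratio of the two EW% values."""
    return ew_a * (1 - ew_b) / (ew_a * (1 - ew_b) + ew_b * (1 - ew_a))

def simulate_division_odds(ew, schedule, n_sims=10000, seed=0):
    """Estimate each team's chance of finishing first.

    ew       -- dict mapping team name to expected win percentage (0..1)
    schedule -- list of (team_a, team_b) pairs, one pair per game
    Ties for first place are split evenly among the tied teams.
    """
    rng = random.Random(seed)
    titles = {team: 0.0 for team in ew}
    for _ in range(n_sims):
        wins = {team: 0 for team in ew}
        for a, b in schedule:
            if rng.random() < game_win_prob(ew[a], ew[b]):
                wins[a] += 1
            else:
                wins[b] += 1
        best = max(wins.values())
        leaders = [team for team in wins if wins[team] == best]
        for team in leaders:
            titles[team] += 1 / len(leaders)
    return {team: titles[team] / n_sims for team in ew}

if __name__ == "__main__":
    # Hypothetical three-team division; every pair meets 18 times.
    toy_ew = {"A": 0.56, "B": 0.52, "C": 0.46}
    toy_schedule = [("A", "B")] * 18 + [("A", "C")] * 18 + [("B", "C")] * 18
    print(simulate_division_odds(toy_ew, toy_schedule, n_sims=2000))
```

A real version would use the full MLB schedule and would need to handle games outside the division, but even a toy like this shows how sensitive the final odds are to the EW% inputs and to the per-game rule.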

All this means that at least one of the following is true:

(1) The widely accepted betting odds on baseball are wildly, totally wrong.

(2) Our best methods for estimating a team's winning chances given its EW% and its opponent's are terrible.

(3) A team's winning chances are not a function of, and cannot even be well approximated from, the two relevant EW%'s.

(4) Something has gone very wrong in calculating the EW%'s Baseball Prospectus is using.

I have a lot to say about these, but only have time to write a little bit. Briefly:

Re: (1): It's almost impossible that this is true. This gets into deep and interesting issues in epistemology, but I take it that this isn't controversial. If you disagree with me, feel free to try to make your fortune betting this baseball season--you have a huge edge, if you're right and everyone else is wrong. (But you're not.)

Re: (2): This seems possible to me, if by "best methods" I mean "the methods that baseball bloggers tend to use." I had some pretty serious issues with some playoff-series odds that (if memory serves) @RationalPastime (who is a very interesting and friendly guy, in my experience) was releasing. After some poking around, I discovered (I think) that it is standard to use something called the log5 method. It seemed obvious to me last year that this method couldn't be all that good, but I can't remember whether I had a priori or just empirical reasons to think so. I also think that there must be a better method out there somewhere, given the state of human knowledge about estimating things statistically--are baseball bloggers so into following Bill James's lead (the link says that the log5 method is James's preferred technique) that they're neglecting better techniques?

I hope some reader can point me to literature on estimating head-to-head winrates from the overall winrates of the two competitors.
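For reference, the log5 formula as it is usually stated is short enough to write down. The sketch below is just that formula; the function name and the sample inputs are my own choices. One thing worth noting: it is algebraically the same as a Bradley-Terry model in which each team's strength is its odds p/(1-p), so the paired-comparison literature may be one place to start looking.

```python
def log5(p_a, p_b):
    """Chance that team A beats team B, given each team's overall win rate.

    Equivalent to comparing the two teams' odds: with s = p / (1 - p),
    the result equals s_a / (s_a + s_b).
    """
    return p_a * (1 - p_b) / (p_a * (1 - p_b) + p_b * (1 - p_a))

# Against a .500 opponent, a team just gets its own win rate back.
print(round(log5(0.6, 0.5), 3))  # 0.6
# A .600 team against a .400 team.
print(round(log5(0.6, 0.4), 3))  # 0.692
```

The formula takes the two overall winrates as the only inputs, so anything those numbers fail to capture (who is pitching, for instance) is invisible to it.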

Re: (3): This is a fun theory, but I doubt it's true. That said, I do think that part of the problem here could be the way that (what I take to be) BP's system fails to adjust, e.g., for the way that talent is distributed among a team's starting pitchers.

Re: (4): Also very possible, but here I am out of my depth--I simply don't know much about estimating team winrates from the expected performances of all its players (which BP generates each year).

Any thoughts?

2 comments:

  1. Interestingly, there used to be three odds reports on Baseball Prospectus: one was based on expected win percentages (based off of runs scored/against), another was PECOTA-informed, and the last was an Elo rating system. It's all simulated some 10,000 times to come up with the probabilities. BP decided to scrap the first and the third last year for simplicity.

    I wonder how well Vegas' expected win percentages for each team would match up with their odds posted of winning the division, etc. (using log5 or some other method).

    I don't have great thoughts about this right now, but I'd be interested in seeing a histogram of win totals for each team. If there are abnormalities, perhaps you could detect it there. I'd also be interested in just seeing the standard deviation of wins for each team (and calculating the coefficient of variation).

    To me, the biggest issue is always injuries. If you're not adequately accounting for them in your simulation, I'm guessing your variance will be too low. Not sure if this would increase or decrease the final odds of the teams you point out.

  2. Sean--

    Thanks for commenting! I'm pretty sure Vegas matches up poorly with the Playoff Odds division %'s, but that's something I ought to check out in more detail.

    PECOTA is doing more to project for injuries, I think? (Am I just making that up?) I'm a bit more sensitive to questions about how willing teams are to win now--surely DET isn't going to be stingy with an extra $4M or so if Smyly flops or another starter gets hurt (before Turner is ready), for example. Different situation in CLE and KC.
