Statistical models for betting predictions
Most current models for tennis forecasting use hierarchical stochastic expressions based on Markov chains.
Below is an overview of the concepts behind them.
Klaassen and Magnus tested the assumption that points in tennis are independent and identically distributed (IID) and showed that it does not strictly hold. However, they also showed that the deviations from IID are small enough that the assumption often yields a good approximation. We can therefore treat the outcome of every point in a match as independent of the previous points. Assume further that we know the probability of winning a point on each player's serve: let p be the probability that player A wins a point on his serve, and q the probability that player B wins a point on his serve. Using the IID assumption and these point-winning probabilities, we can construct a Markov chain describing the probability of a player winning a game.
Formally, a Markov chain is a system of transitions between states in a state space. An important property of the system is that it has no memory, i.e. the next state depends only on the current state, not on the sequence of states that preceded it. Taking the game scores as the state space, and the probabilities of player A winning or losing a point as the transition probabilities, we obtain a Markov chain representing the stochastic progression of the game score. The figure below shows the state transition diagram for one game with player A serving. With p denoting the probability that player A wins a point on serve, and assuming IID, every transition representing a point won by player A has probability p, and every transition representing a point lost has probability 1-p.
Due to the hierarchical structure of a tennis match, additional Markov chains are constructed to model the progression of points in tie-breaks, sets and matches. For example, in a match model there will be two outgoing transitions from each non-final state, labeled with the probabilities of a player winning and losing an individual set. Diagrams of such models can be seen in.
Based on the idea of modelling tennis matches using Markov chains, Barnett and Clarke, as well as O'Malley, developed hierarchical expressions for the probability of a particular player winning the entire match.
In these expressions, p is the probability of player A winning a point when serving, and x and y are the numbers of points won so far in the game by players A and B respectively. The expression for a game corresponds exactly to the Markov chain in the figure above.
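The expressions themselves are not reproduced in this text, so the following is only a minimal Python sketch of a recursion consistent with the description: the probability of winning the game from score (x, y) is p times the probability from (x+1, y) plus (1-p) times the probability from (x, y+1). The function name game_win_probability, the encoding of scores as points won (0 to 4), and the closed form used at deuce are assumptions made for illustration, not the authors' notation.

```python
from functools import lru_cache


def game_win_probability(p: float) -> float:
    """Probability that the server wins a game, assuming IID points.

    p is the probability that the server wins any single point.
    """
    q = 1.0 - p

    @lru_cache(maxsize=None)
    def win_from(x: int, y: int) -> float:
        # x, y: points won so far by the server and the receiver.
        if x == 4:
            return 1.0  # server reached four points before deuce
        if y == 4:
            return 0.0  # receiver reached four points before deuce
        if x == 3 and y == 3:
            # Deuce: the server must win two points in a row before the
            # receiver does, which sums to p^2 / (p^2 + q^2).
            return p * p / (p * p + q * q)
        # One transition of the chain: point won (p) or point lost (q).
        return p * win_from(x + 1, y) + q * win_from(x, y + 1)

    return win_from(0, 0)


if __name__ == "__main__":
    print(round(game_win_probability(0.6), 4))
```

Under these assumptions, a server who wins 60% of points wins roughly 73.6% of service games, which shows how a small edge per point is amplified at game level.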
Barnett and Clarke also define a similar expression for calculating the probability of winning a set from the probabilities of winning individual games and tie-breaks (which in turn depend on the probabilities of winning a point on serve). Finally, the probability of winning a match can be calculated from the previously defined expressions. It turns out that the final expression for the probability of winning the match depends only on the probability of winning a point on each player's serve.
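To illustrate how the levels stack, here is a hedged Python sketch that chains game, tie-break, set and match probabilities so that the result depends only on p and q. It is not the authors' formulation. It assumes IID points, tie-break sets at 6-6 (first to 7, win by two), player A serving first in every set, and 0 < p, q < 1. All function names are hypothetical.

```python
from functools import lru_cache


def game_win(p: float) -> float:
    """Closed form for the server winning a game with point probability p."""
    q = 1.0 - p
    deuce = p * p / (p * p + q * q)
    return p**4 * (1 + 4 * q + 10 * q * q) + 20 * p**3 * q**3 * deuce


def tiebreak_win(p: float, q: float) -> float:
    """P(A wins a tie-break), A serving the first point."""
    @lru_cache(maxsize=None)
    def t(x: int, y: int) -> float:
        if x == 7:
            return 1.0
        if y == 7:
            return 0.0
        if x == 6 and y == 6:
            # From 6-6 the next two points are one on each serve.
            a_both = p * (1.0 - q)
            b_both = (1.0 - p) * q
            return a_both / (a_both + b_both)
        # Serve order: A, B, B, A, A, B, B, ...
        a_serves = ((x + y + 1) // 2) % 2 == 0
        w = p if a_serves else 1.0 - q
        return w * t(x + 1, y) + (1.0 - w) * t(x, y + 1)

    return t(0, 0)


def set_win(p: float, q: float) -> float:
    """P(A wins a tie-break set), A serving the first game."""
    hold_a, hold_b, tb = game_win(p), game_win(q), tiebreak_win(p, q)

    @lru_cache(maxsize=None)
    def s(a: int, b: int) -> float:
        if a >= 6 and a - b >= 2:
            return 1.0
        if b >= 6 and b - a >= 2:
            return 0.0
        if a == 6 and b == 6:
            return tb
        # Servers alternate every game; A serves when a + b is even.
        w = hold_a if (a + b) % 2 == 0 else 1.0 - hold_b
        return w * s(a + 1, b) + (1.0 - w) * s(a, b + 1)

    return s(0, 0)


def match_win(p: float, q: float, sets_to_win: int = 2) -> float:
    """P(A wins the match), using the same set probability for every set."""
    s_a = set_win(p, q)

    @lru_cache(maxsize=None)
    def m(a: int, b: int) -> float:
        if a == sets_to_win:
            return 1.0
        if b == sets_to_win:
            return 0.0
        return s_a * m(a + 1, b) + (1.0 - s_a) * m(a, b + 1)

    return m(0, 0)


if __name__ == "__main__":
    print(match_win(0.65, 0.60), match_win(0.65, 0.60, sets_to_win=3))
```

Each level only consumes the probability produced by the level below it, which is why the only free inputs to the final result are the two point-on-serve probabilities.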
Estimating the Probability of Winning a Point on Serve
The question remains how to estimate these probabilities of winning a point on serve for matches that have not yet been played. Barnett and Clarke give a method for estimating them from historical player statistics.
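The details of that method are not given here. As a rough illustration of how serve statistics could be turned into a point-on-serve probability, the Python sketch below uses an additive adjustment: the server's rate is raised or lowered according to how the receiver's return rate compares with the average. The formula, the function names and the parameter names are all assumptions for illustration and may differ from what the authors actually use. The numbers in the usage example are made up.

```python
def serve_point_rate(first_in: float, first_won: float, second_won: float) -> float:
    """Share of service points won, from first and second serve statistics.

    first_in:   fraction of first serves that land in
    first_won:  fraction of points won when the first serve lands in
    second_won: fraction of points won on the second serve
    """
    return first_in * first_won + (1.0 - first_in) * second_won


def adjusted_serve_probability(
    f_server: float,
    g_receiver: float,
    f_baseline: float,
    f_avg: float,
    g_avg: float,
) -> float:
    """Assumed additive adjustment (illustrative, not a confirmed formula).

    f_server:   server's historical share of service points won
    g_receiver: receiver's historical share of return points won
    f_baseline: baseline share of service points won (e.g. at the event)
    f_avg, g_avg: average serve and return shares across players
    """
    estimate = f_baseline + (f_server - f_avg) - (g_receiver - g_avg)
    # Keep the result a valid probability.
    return min(1.0, max(0.0, estimate))


if __name__ == "__main__":
    # Hypothetical statistics, for illustration only.
    f_a = serve_point_rate(first_in=0.60, first_won=0.75, second_won=0.50)
    p = adjusted_serve_probability(f_a, g_receiver=0.40,
                                   f_baseline=0.62, f_avg=0.62, g_avg=0.38)
    print(round(f_a, 3), round(p, 3))
```

The resulting value would play the role of p (or q, with the players swapped) in the hierarchical expressions.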
Modern models of tennis forecasting are based on the hierarchical stochastic expressions described above. Knottenbelt refined Barnett's models by using only matches against opponents that both players have faced (common opponents), instead of all past opponents, to calculate the probability of winning a point when serving. This approach reduces the bias that arises when the two players have faced opponents of different levels in the past.
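The filtering step behind the common-opponent idea can be sketched in Python as follows. This shows only the selection of common opponents and a simple averaging of serve rates against them. It is not Knottenbelt's published procedure, and the record layout and function name are hypothetical. It also assumes every record has at least one service point played.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

# Hypothetical record layout:
# (player, opponent, service points won, service points played)
Record = Tuple[str, str, int, int]


def common_opponent_serve_rates(
    records: Iterable[Record], player_a: str, player_b: str
) -> Tuple[float, float, List[str]]:
    """Average serve-point rates of A and B against shared opponents only."""
    totals = defaultdict(lambda: [0, 0])  # (player, opponent) -> [won, played]
    for player, opponent, won, played in records:
        totals[(player, opponent)][0] += won
        totals[(player, opponent)][1] += played

    faced_by_a = {opp for (pl, opp) in totals if pl == player_a}
    faced_by_b = {opp for (pl, opp) in totals if pl == player_b}
    # Keep only opponents both players have met, excluding each other.
    common = sorted((faced_by_a & faced_by_b) - {player_a, player_b})
    if not common:
        raise ValueError("the players have no common opponents")

    def average_rate(player: str) -> float:
        rates = [totals[(player, opp)][0] / totals[(player, opp)][1]
                 for opp in common]
        return sum(rates) / len(rates)

    return average_rate(player_a), average_rate(player_b), common


if __name__ == "__main__":
    # Made-up data for illustration.
    data = [
        ("A", "C", 60, 100), ("A", "C", 20, 50), ("A", "D", 70, 100),
        ("B", "C", 55, 100), ("B", "E", 50, 100),
    ]
    print(common_opponent_serve_rates(data, "A", "B"))
```

In this toy data only C has played both A and B, so the matches against D and E are ignored when the two serve rates are estimated.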
Madurska further extended Knottenbelt's common-opponent model by using different probabilities of winning a point on serve for different sets. In doing so, the author abandoned the IID assumption, so that her model reflects the accumulation of physical fatigue in a player as the match progresses.
Knottenbelt's common-opponent model and Madurska's set-by-set model are the most advanced statistical models, and the authors claim a return on investment (ROI) of 6.8% and 19.6% respectively for their models against the betting market on 2011 WTA Grand Slam matches. The common-opponent model was also tested on a larger and more diverse sample of 2,173 ATP matches from 2011 and showed an ROI of 3.8%.