The nature of the MMR hell (Matlab analysis)
To those who are unfamiliar, MMR hell refers to the long streak of losses many players experienced in season 2. It’s a feedback loop: reduced MMR leads to reduced teammate quality, which leads to further reduced MMR. The same loop applies to win streaks. Some people do not believe this happens, and think everyone who has this problem is just bad (ROFL).
I have written a Matlab simulation of the Season 1 (and unranked) and Season 2 algorithms to find out exactly why this happened and what factors were involved. I made some educated guesses about the distribution of the player population and the implementation of the matchmaker. The purpose is not to publish in Nature, but to reproduce the symptoms people are experiencing. If you are interested in the assumptions, please see the figures at the end of the post and the comments in the source code. The source code is available to anyone interested in running the simulation. (http://www.ssnt.org/personal/pvp/pvp_sim.rar)
For the complete post with inline figures, see:
(http://www.ssnt.org/personal/pvp/pvp_analysis.pdf)
For full-sized figures, see:
(http://www.ssnt.org/personal/pvp/figs.rar)
So without further delay, the pretty pictures:
http://i.imgur.com/eFxtmkR.png
Unlike real players, simulated players have their true skill levels assigned. This let me look at how well each matchmaking algorithm captures real skill. Both algorithms were terrible, with S2 a little better but with the added “benefit” of vastly inflating or deflating the ratings of some players. In this simulation about 4.5% of players were stuck, and that number did not decrease as the season progressed.
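For reference, here is a minimal sketch of how rating error can be scored when true skill is known. The variable names and the “stuck” cutoff below are illustrative, not the exact ones from pvp_sim:

    % Dummy data so the snippet runs standalone; pvp_sim produces
    % these vectors from the actual simulated season instead.
    N = 1000;
    trueSkill = 1500 + 300*randn(N,1);        % assigned true skill
    mmr = trueSkill + 200*randn(N,1);         % matchmaker's imperfect estimate
    ratingError = abs(mmr - trueSkill);       % per-player rating error
    stuck = ratingError > 500;                % illustrative "stuck" cutoff
    fprintf('Stuck players: %.1f%%\n', 100*mean(stuck));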
So why is this happening? Let’s look at a few factors.
Login Randomness
Eliminating the login randomness not only kills the MMR hell, it also allows the S2 algorithm to vastly outperform S1. If everyone who played less frequently only logged in while the people who played more often were online (in other words, if the number of players online were a perfect sinusoid), there would be no snowball hell. This is probably why Anet thought the season 2 algorithm was better on paper. But reality doesn’t work like that. People have jobs.
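To make the sinusoid point concrete, here is a toy sketch of the two login models. The period, amplitude, and noise level are placeholders, not the simulation’s actual constants:

    t = linspace(0, 24, 97);                        % hour of day
    base = 100; peak = 500;                         % assumed population sizes
    onlinePure  = base + peak*(1 + sin(2*pi*(t - 6)/24))/2;  % perfect sinusoid
    onlineNoisy = max(onlinePure + 100*randn(size(t)), 0);   % login randomness
    plot(t, onlinePure, t, onlineNoisy);
    xlabel('Hour of day'); ylabel('Players online');
    legend('Nonrandom login', 'Random login');

With the pure sinusoid the off-peak pool shrinks but stays predictable, so even games can still be made; with randomness, some hours simply don’t have enough comparable players online.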
http://i.imgur.com/O1uJLQI.png
This happens because people can’t find even games during off-hours. Looking over the data for the stuck players, I realized they overwhelmingly had very few close games. But before we get to that, there is another factor to consider.
Within-team MMR Tolerance
http://i.imgur.com/eZmb03K.png
Removing the limit on the MMR difference between teammates improved the situation dramatically. It did not eliminate the MMR hell for S2 (about 3.5% of players were stuck), but it did allow MMR to bear some resemblance to actual skill. The reason is an increased number of off-peak games, which raised the chance of finding even games during those hours. This leads us to the heart of the problem.
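In code terms, the change is just a wider tolerance in the teammate filter. A hypothetical sketch (the real queue logic in pvp_sim is more involved):

    % Hypothetical teammate filter: return queue indices whose MMR is
    % within tol of the anchor player. (Save as teammatePool.m.)
    % Baseline run: tol = 100. Relaxed run: tol = 1000.
    function pool = teammatePool(queueMMR, anchorMMR, tol)
        pool = find(abs(queueMMR - anchorMMR) <= tol);
    end

With tol = 100, an off-peak queue of a few dozen players often returns an empty pool, and the matchmaker falls back to making lopsided matches instead.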
Rating Error and Close games
A: http://i.imgur.com/O0T1eDm.png
B: http://i.imgur.com/eEz1DP8.png
C: http://i.imgur.com/0RX0EEw.png
In every case (A: baseline, B: nonrandom login, C: no MMR limit within team), the maximum rating error is a decreasing function of the number of CLOSE games (team MMR difference less than 300). This can be seen from the Glicko equations: a close game between teams of identical MMR changes MMR by five times the amount of a game where the team rating difference is 600. As far as Glicko is concerned, games between teams with vastly different ratings are not meaningful. The team with the higher rating gains almost no MMR, and the team with the lower rating almost never wins.
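You can verify the five-fold figure from the published Glicko formulas: the rating change is proportional to g(RD)*(s − E), where E is the expected score against the opposing rating. A quick check (RD = 350 is my assumption here; the exact ratio depends on it):

    q  = log(10)/400;                          % Glicko constant
    RD = 350;                                  % assumed opponent rating deviation
    g  = 1/sqrt(1 + 3*q^2*RD^2/pi^2);          % attenuation factor g(RD)
    E  = @(d) 1/(1 + 10^(-g*d/400));           % expected score at rating gap d
    evenGain     = g*(1 - E(0));               % winning an even game:   ~0.33
    favoriteGain = g*(1 - E(600));             % winning from 600 ahead: ~0.06
    fprintf('Even game moves MMR %.1fx more\n', evenGain/favoriteGain);

With these numbers the ratio comes out around 5.5, and it only grows as RD shrinks.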
The job of the Glicko algorithm is to estimate a player’s true skill from CLOSE games. Without them, Glicko cannot rate players. While these streaks of wins or losses are subjectively satisfying or frustrating, they don’t provide any meaningful information about the players’ true skill. The job of the matchmaker is to provide Glicko with games whose wins and losses true skill can actually be inferred from.
By removing the limit on the MMR difference between the red and blue teams, the Season 2 matchmaker is wasting everyone’s time with strings of pointless games. It’s less efficient, with no benefit. In less technical terms, the season 2 algorithm is dumber than two bags of rocks. Relaxing the teammate MMR requirement on the old algorithm would do a better job.
Assumptions
http://i.imgur.com/R2GV77G.png
This is the distribution of player models used. It makes the following assumptions:
- Most players are casual and play a couple of hours a day, but a small number of players …
- Online time is modeled as the sum of a sinusoid and a random variable, and is strongly correlated (R squared ~= 0.5) with a pure sinusoid.
- Player skill is normally distributed and is strongly correlated (R squared ~= 0.45) with play time. The more you play, the better you get. (A rough sketch of this population model follows the list.)
- Most players are okay, but there aren’t many above a 2500 rating, and 2% of players have a true skill rating of zero, representing AFKs, brand-new players, and people who intentionally throw matches.
- Most players are pretty consistent between games. A player’s performance in each game is a Gaussian random variable whose deviation is drawn from a log-normal distribution. A deviation of 100 means nearly all (99.7%) performances fall within a 600-point range (plus or minus three deviations).
- Ratings were adjusted after each game assuming each player had played a series of games in that time interval against every player on the other team. Email communication with Dr. Glickman indicates the algorithm was never intended for a sport where teams can rearrange between matches. I also tested the alternative interpretation where each team is adjusted as a single player, but it made no difference. Please check team_adjust in the source code for this implementation.
- The teammate MMR requirement was 100 for the baseline and 1000 for the relaxed case.
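For the curious, the population model boils down to something like the following sketch. The constants are placeholders chosen to match the stated correlations, not the exact values from the source:

    N = 10000;
    playTime = abs(2 + randn(N,1));                    % hours/day, mostly casual
    z = (playTime - mean(playTime)) / std(playTime);   % standardized play time
    skill = 1500 + 300*(0.67*z + 0.74*randn(N,1));     % R squared ~ 0.45 with play time
    skill(rand(N,1) < 0.02) = 0;                       % 2% zero-true-skill players
    sigma = exp(log(100) + 0.3*randn(N,1));            % log-normal per-game deviation
    perf  = skill + sigma.*randn(N,1);                 % one game's performance draw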