Showing Posts For Ponidis The Psycho.8612:
Ponidus,
There is a HUGE amount of noise in the data.
Yes, Team A won. But why?
…
Yet the MMRs are all adjusted based on ONE match between the two teams. So, just as in a double-blind controlled study, you need LOTS of data points: n=100,000 is good, n=10 is useless, because of the confounding variables and the noise that you need to sift out over many, many cases (i.e. matches).
The actual, accurate MMR values will not emerge from the game data very quickly, and I think that is the MMR problem.
As I suggested, and as the Glicko-2 rating system actually does, it revises both the average rating (skill) and the variance (uncertainty surrounding the average skill). So the system does as you suggest, actually requiring a fair number of matches before it concludes. Don’t be confused by the fact that the average skill is updated after every match: until the uncertainty has dropped sufficiently low, the system knows that it has not quantified your skill with precision.
Imagine the sequence of skill ratings (skill~uncertainty):
1200~100 —> 1231~98 —> 1235~94 —> 1229~92 —> … —> 1197~21
The important thing here is not only the movement of the rating from 1200 to 1235 and then to 1197, but also the decrease of the uncertainty from 100 to 21. Bayesian reasoning can be hard to grasp intuitively, but the end result is equivalent to what you are suggesting (based on a frequentist approach).
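To make this more concrete, here is a minimal toy sketch of a sequential Gaussian update. This is not Glicko-2 itself (which also models volatility, among other refinements), and the per-match “observations” and the noise level below are made up purely for illustration:

```python
import math

def update(mu, sd, obs, obs_sd):
    """Conjugate Gaussian update: combine the prior belief N(mu, sd^2)
    with one noisy observation N(obs, obs_sd^2) of the player's skill."""
    prior_prec = 1.0 / sd**2           # precision = 1 / variance
    obs_prec = 1.0 / obs_sd**2
    post_prec = prior_prec + obs_prec  # precisions add up
    post_mu = (mu * prior_prec + obs * obs_prec) / post_prec
    return post_mu, math.sqrt(1.0 / post_prec)

# Start at 1200~100, as in the sequence above.
mu, sd = 1200.0, 100.0
# Pretend each match yields a noisy estimate of the player's skill
# (e.g. inferred from the opponents' ratings and the result).
for obs in [1290, 1260, 1180, 1210, 1230]:
    mu, sd = update(mu, sd, obs, obs_sd=250.0)
    print(f"{mu:.0f}~{sd:.0f}")
```

Note that the uncertainty shrinks after every single match, whichever way the mean moves, and that the update happens one match at a time; this is also why, as I note below, there is no need to wait for a batch of 10 matches before revising a rating.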
Ten players; you need to find out who is the best via 5v5 matches, adjusting MMR and having fun.
match 1 —> A wins.
match 2 —> three 1-0 players and two 0-1 players vs two 1-0 players and three 0-1 players: A wins.
match 3 —> mix again, trying to have the same number of wins/losses on every team. Whoever has the best record (MMR) at the end is the winner.
Glicko-2 in this particular case is not so important: if the system can manage to make perfect 50/50 matches, you can base everything on the win/loss ratio.
You are very close to what I suggested, because a 50-50% match is, on average, the most informative match. However, there are situations where 50-50% matches are not feasible (they would require very long waiting times), so the system should be stable even if the probability fluctuates a little.
The other thing to consider is that you can (and do) calculate MMR continuously, so you don’t need to have people finish 10 matches before revising their skill rating. The whole point of a Bayesian approach is that you can do this continuously.
Hello everyone, thanks for the input and the constructive discussion!
I have read a bit on the subject and will try to add some more information on how I believe things can be improved.
First, let me be clear on what Glicko (and Glicko-2) does: it takes a series of matches and assigns a rating and a variance to every player, based on a Bayesian approach. The specifics are not important, so we can treat Glicko, and any derived or similar rating algorithm, as a black box.
What this means, in simple terms, is that every player gets a rating, say 1000, plus an “uncertainty” associated with that rating, say 1000~200 (I use ~ instead of + or – because it does not mess up the formatting). The classification problem is simply how to put every player in the appropriate place/division, and this requires two things that must happen in parallel:
1. Get the rating into the “true” range, for example 1000 —> 1600
2. Reduce the uncertainty, so that we know the skill of the player with precision: 1000~200 —> 1600~30
Now, it is very important to realize that Glicko does not decide who plays against whom. Glicko will rate based on any match result you throw at it, so this is a decision made by the programmers. However, not all matches are equally useful for rating.
There is an information-theoretic lower bound on the number of matches needed to classify players, and it is quite low. The argument is the same as for comparison sorting: ordering N players takes about N·log2(N) pairwise comparisons in total, i.e. log2(N) per player, so 1 million players could in principle be classified if everyone plays on the order of log2(1,000,000) ≈ 20 matches. This is a theoretical limit, but we can do quite well in practice if we choose the matches wisely.
Thankfully, there is a whole branch of mathematics concerned with “optimal experiment design”. Here, “experiment” simply means testing whether team A is better than team B, so instead of “experiment” you can say “PvP match” and it makes perfect sense. A reasonable objective is to obtain the maximum amount of information from a minimum number of matches (this is called D-optimal design).
Now, I’m not a specialist in that domain, but reading a few pages from a book by a supra-intelligent dude called Fedorov (Theory of Optimal Experiments), I see that he suggests something similar to what I said before: if you cannot have all players play against all players (clearly not practical), the best alternative is to choose the matches where the ratings are quite close relative to the uncertainty (variance) around the skills.
To give a few examples (a rough numerical sketch follows the list), let’s say 1000 is emerald and 2000 is diamond, ~200 is a big uncertainty and ~10 a small one:
1. 1000~10 vs 1600~10: these teams are known to be quite different; no significant information will be gained from the match.
2. 1000~300 vs 1500~300: these teams look very different, but we are not really certain they belong where we placed them (big uncertainty). This match is probably worth playing.
3. 1010~20 vs 1030~20: these teams look very similar, but we can’t really say which one is better. This match is probably worth playing.
4. 1600~300 vs 1580~300: we really have no idea who is going to win. This match is worth playing.
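To put rough numbers on these four examples, here is a toy way to score a candidate match by the entropy of its predicted outcome. This is my own illustration, not Glicko-2 and not anything GW2 does: the logistic scale constant is borrowed from Elo, and letting the uncertainties flatten the prediction is a deliberate simplification.

```python
import math

def win_prob(r1, s1, r2, s2, scale=400 / math.log(10)):
    """Toy predicted win probability for team 1: a logistic curve on
    the rating gap, flattened when the uncertainties s1, s2 are large."""
    spread = math.sqrt(scale**2 + s1**2 + s2**2)
    return 1.0 / (1.0 + math.exp(-(r1 - r2) / spread))

def informativeness(r1, s1, r2, s2):
    """Binary entropy of the predicted outcome: 1 bit when p = 0.5
    (pure coin flip), near 0 bits for a foregone conclusion."""
    p = win_prob(r1, s1, r2, s2)
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# The four examples above, in order:
for (r1, s1), (r2, s2) in [((1000, 10), (1600, 10)),
                           ((1000, 300), (1500, 300)),
                           ((1010, 20), (1030, 20)),
                           ((1600, 300), (1580, 300))]:
    print(f"{r1}~{s1} vs {r2}~{s2}: {informativeness(r1, s1, r2, s2):.2f} bits")
```

With these (arbitrary) constants, example 1 yields about 0.2 bits while examples 2-4 yield 0.8-1.0 bits, which matches the intuition in the list.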
The problem with the current approach implemented in Guild Wars 2 is that excessive weight has been put into discriminating high vs low skill, while not enough effort has been invested in classifying players of low (or high) skill among themselves. Based on the design, we expect this to happen once people get into their “appropriate” division and start playing against people of (hopefully) similar skill. In the process, everyone wastes time in useless (non-informative) matches.
Obviously, using a different match-making design would mean separating the division/tier system (which is a reward system) from the skill-rating and match-making system. Now, I’m not saying that rewards should be independent of skill, but rather that our priority should be to determine skill; then we can see how to give out rewards.
As an example, if someone provably has skill much higher than amber, there is no point in having him farm all the pips individually. Maybe he should just get his reward and move on. The community should suggest how and when rewards get distributed.
I would intuitively suggest the following (a toy sketch in code follows the list):
- Every player that has a sufficiently precise rating (low or high) gets out of amber. This simply means that the player has played a certain number of matches, say a minimum of 10, and has earned his “tier 1” reward because he is now a PvP contender.
- Every player gets a reward based on his rating: high rating —> high reward at every match.
- There is a bonus for winning.
- There is a bonus for the losing team if the win probability was <40%.
- Every Monday, people get a badge based on their rating at that time. Amber and emerald badges cannot be lost once they have been gained.
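Purely to make the proposal concrete, here is a toy sketch of these reward rules; every threshold, payout and badge cutoff below is an arbitrary placeholder, not a calibrated suggestion:

```python
def out_of_amber(matches_played, sd):
    """Leave amber once the rating is sufficiently precise (whether it
    is low or high), with a minimum number of matches played."""
    return matches_played >= 10 and sd < 50.0

def match_reward(rating, won, win_prob):
    """Per-match reward: scales with rating, plus a bonus for winning
    and a consolation bonus for losing a match you were expected to lose."""
    reward = rating / 100.0
    if won:
        reward += 10.0
    elif win_prob < 0.40:
        reward += 5.0
    return reward

BADGE_CUTOFFS = [("diamond", 2000), ("emerald", 1000), ("amber", 0)]

def monday_badge(rating):
    """Weekly badge from the current rating; making amber and emerald
    permanent just means never downgrading below a badge already earned."""
    for name, cutoff in BADGE_CUTOFFS:
        if rating >= cutoff:
            return name
```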
These are simple suggestions, but if the match-making process is solid, any reward system should work.
I intend to add one more message to explain how and why the current system creates low-MMR hell, but I’m still working a bit on the specifics (notably, how many people can reside in low-MMR hell before they get out of it).
The MMR hell exists and I will try to show it to you mathematically in a future post in a thread I opened recently.
Thanks for the tip Salamander! Indeed, Glicko-2 is very similar to what I was thinking about. I guess I was re-inventing the wheel.
I do suspect that the GW2 implementation has a fundamental theoretical flaw, but since I’m not a specialist in Bayesian statistics, I’ll discuss this with a friend who has a PhD in that domain (Bayesian statistics, not GW2; although she might also like GW2… who knows).
If I have a point, I might spend some time generating some test data (imaginary players playing matches) and seeing how and when Glicko-2 converges based on some assumptions.
I’ll get back to you.
Like many other PvPers, I’ve been thinking about the new algorithm that has subjected me to countless ridiculous losses (the funniest I had was something like 500 vs 24) and equally ridiculous wins.
As I said in a previous post, the matching algorithm should accomplish two objectives:
1. give rank based on actual skill (i.e. discriminate/classify players)
2. make for fun matches
The current algorithm probably accomplishes #1, at least judging by some very competent players who got endless winning streaks and climbed very quickly. However, I feel that matches are less fun because they are either very easy wins or not even worth fighting. The situation has improved slightly (I feel) over the last 24-48h, but I don’t know whether this will continue or how it will end.
My input, as a PhD bioinformatician, would be to consider two additional elements (both sketched in code after the list):
1. Matches that are very well balanced are the most informative in the end: if your prediction algorithm is worth anything at all, then a match ending 500 vs 100 is probably a huge waste of time for both teams; the algorithm already knew, more or less, what was going to happen, and gained no significant knowledge from the match. However, if team A vs team B is rated at a 50% win probability, then the actual outcome of the match matters, because the algorithm has no idea what is going to happen (that is what predicting a 50% win probability means, i.e. a coin flip). Obviously, one should integrate some degree of luck/flexibility, meaning that a win at 500 vs 498 is not the same as a win at 500 vs 440 (a clear win, but a balanced match). So my argument is that closely balanced matches will be fun AND provide more information where it matters, i.e. between teams that are quite close in skill.
2. The evolution of the skill measure could be based on a Bayesian “learning” framework, with the divisions built a posteriori, once the skill has been computed with reasonable confidence. This is a bit technical, but the basic idea is that players start with a broad range of expected skill, say an MMR of 2000+/-2000 (i.e. anywhere from complete newb to pro player). Every match contributes to refining our understanding of a player’s skill, as an average but also as a variance: what is the “minimum” skill we expect this player to have at 99% probability, and what is the “maximum” skill he could possibly have? Progressively, this narrows to something like 1650+/-500, then 1973+/-30, etc. To calculate divisions, you just give bonuses and “tiers” every time the minimum at 95% (i.e. the average minus the uncertainty margin at 95%) goes over some preset value or over some quantile (percentage) of the population. Also, you can give out a “PvP present” every day based on where the player stands, so that, for example, low-ranking players get some daily bonuses after their first daily match, or every time their average increases (or their variance decreases). The advantage of this solution is that there is no need to fix tournament dates: a Bayesian framework progressively improves the quality of the prediction over time, so you can reset it to zero if you want, or just keep a “perpetual” leaderboard and distribute titles or whatever every month.
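Here is a rough sketch of both points, under made-up constants (the margin scaling in particular is my own simplification; proper margin-of-victory rating models exist in the literature, but this is not one of them):

```python
def soft_outcome(score_winner, score_loser, max_score=500):
    """Point 1: turn the final score into a soft result in (0.5, 1.0],
    so a 500 vs 498 squeaker moves the ratings much less than 500 vs 100."""
    margin = (score_winner - score_loser) / max_score
    return 0.5 + 0.5 * margin

def conservative_skill(mu, sd, z=1.96):
    """Point 2: the 'minimum' skill we credit a player with at ~95%
    confidence (mean minus the uncertainty margin)."""
    return mu - z * sd

TIER_CUTOFFS = [1200, 1400, 1600, 1800]  # made-up division boundaries

def tier(mu, sd):
    """A tier is awarded each time the conservative skill clears a cutoff,
    so tiers are gained by raising the mean OR by shrinking the variance."""
    return sum(conservative_skill(mu, sd) >= c for c in TIER_CUTOFFS)

print(soft_outcome(500, 498))  # ~0.50: barely more than a coin flip
print(soft_outcome(500, 440))  # 0.56: a clear win, but a balanced match
print(tier(2000, 2000))        # 0: a brand-new player has proven nothing yet
print(tier(1973, 30))          # 4: 1973+/-30 clears every cutoff
```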
Anyway, there is a rich literature on machine learning for classification and on Bayesian models, but in the end I think it should be feasible to make tight matches between teams of comparable skill AND put every player where he belongs.
That aside, let me say that GW2 is a great game. I hope you’ll find a fun and fair solution!
I was wondering what exactly the purpose of the MMR is:
Get people to the “appropriate” division as quickly as possible
OR
Organize meaningful and enjoyable matches
Many seem to want the former, which is what the current MMR implements. However, what I have witnessed over the last few days is an alternating stream of ridiculous losses (500 vs 100) and equally ridiculous wins. Where are the 500 vs 480 matches that made the game enjoyable?
Don’t get me wrong: rank and actual skill should go together, but the process of separating players based on skill must be a little more fun.
GW1 has some astonishing pvp. It shouldn’t be that hard to introduce something like Hall of Heroes or at least Team Arenas. Game mechanics are excellent in GW2, but as a hardcore PvP player in GW1, I am expecting more.
I was thinking the same. I can’t find where I am standing compared to other players. I hope I am not missing something really obvious!
As a side note, it would be rather useful to see one’s ranking over a selected period, i.e. who was the best player of the day/week/month/all time/whatever.
Thanks for your time.