Computer science approach to matching?

Computer science approach to matching?

in PvP

Posted by: Ponidis The Psycho.8612

Ponidis The Psycho.8612

Like many other PvPers, I’ve been thinking about the new algorithm that has subjected me to countless ridiculous losses (the funniest I had was something like 500v24) and equally ridiculous wins.

As I said in a previous post, the matching algorithm should accomplish two objectives:
1. give rank based on actual skill (ie discriminate/classify players)
2. make for fun matches

The current algorithm probably accomplishes #1, at least judging by the very competent players who got endless winning streaks and climbed very quickly. However, I feel that matches are less fun because they are either very easy to win or not even worth fighting. The situation has improved slightly (I feel) over the last 24-48h, but I don’t know if this is going to continue or how it is going to end.

My input, as a PhD bioinformatician, would be to consider two additional elements:
1. Matches that are very well balanced are the most informative in the end: if your prediction algorithm is worth anything at all, then a match ending 500vs100 is probably a huge waste of time for both teams. The algorithm probably already knew what was going to happen and did not gain significant knowledge from this match. However, if team A vs team B is rated at a 50% win probability, then the actual outcome of the match will matter, because the algorithm has no idea what is going to happen (this is what it means to predict 50% win probability, i.e. a coin flip). Obviously, one should allow some degree of luck/flexibility, meaning that a win at 500vs498 is not the same as a win at 500vs440 (a clear win, but a balanced match). So, my argument is that closely balanced matches will be fun AND provide more information where it matters, i.e. between teams that are quite close in skill.

2. The evolution of skill measures could be based on a Bayesian “learning” framework, with the divisions built a posteriori after skill has been computed with reasonable confidence. This is a bit technical, but the basic idea is that players start with a broad range of expected skill. Say their MMR is 2000 +/- 2000 (i.e. anywhere from complete newb to pro player). Every match refines our understanding of a player’s skill as an average but also as a variance: what is the “minimum” skill we expect this player to have at 99% probability, and what is the “maximum” skill he could possibly have? So, progressively, this becomes finer, to something like 1650 +/- 500, then 1973 +/- 30, etc. To calculate divisions you just give bonuses and “tiers” every time the 95% minimum (i.e. the average minus the uncertainty at 95%) goes over some preset value or over some quantile (percentage) of the population. Also, you can give out a “PvP present” every day based on where the player stands, for example, so that low-ranking players get some daily bonuses after their first daily match or every time their average increases (or their variance decreases). The advantage of this solution is that there is no need to fix tournament dates: a Bayesian framework progressively improves the quality of the prediction over time, so you can reset it to zero if you want, or just keep a “perpetual” leaderboard and distribute titles or whatever every month.
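
To make point 2 concrete, here is a minimal Python sketch of the kind of update I have in mind (this is not ArenaNet's implementation; the observation noise, the per-match skill estimates, and the tier thresholds are all made-up numbers for illustration). Each match result is treated as a noisy observation of skill, a precision-weighted update narrows the uncertainty, and a tier is granted whenever the conservative estimate (average minus twice the uncertainty) clears a preset threshold:

```python
def update(mean, dev, observed_skill, obs_dev=250.0):
    """Combine the prior belief (mean, dev) with one noisy per-match observation."""
    prior_prec = 1.0 / dev ** 2
    obs_prec = 1.0 / obs_dev ** 2
    new_prec = prior_prec + obs_prec
    new_mean = (mean * prior_prec + observed_skill * obs_prec) / new_prec
    return new_mean, new_prec ** -0.5

def tier(mean, dev, thresholds=(1200, 1500, 1800)):
    """Highest tier whose threshold the conservative estimate has cleared."""
    lower_bound = mean - 2 * dev          # roughly the 95% lower bound
    return sum(lower_bound >= t for t in thresholds)

mean, dev = 2000.0, 2000.0                # "complete newb to pro player"
for observed in (1550, 1700, 1620, 1660, 1640):   # made-up per-match skill estimates
    mean, dev = update(mean, dev, observed)
    print(f"rating {mean:6.0f} ~ {dev:4.0f}  ->  tier {tier(mean, dev)}")
```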

Anyway, there is a rich literature on machine learning for classification and on Bayesian models, but in the end I think it should be feasible to make tight matches between teams of comparable skill AND put every player where he belongs.

That aside, let me say that GW2 is a great game. I hope you’ll find a fun and fair solution!

Computer science approach to matching?

in PvP

Posted by: Ravenmoon.5318

Ravenmoon.5318

Nobody at ANet cares. They don’t even read their own forum, man. You have better chances posting this to Reddit; they seem more active there.

The forums are for plebs with free time.

And in case ANet does read this, I would like to know the metrics they use on my account to match me with apparent first-timers in soloQ against legendary players from last season. That’d be great.

Every single time I’ve received a pug, he/she has been worse than the last pug, even though I am winning. I keep getting scrubs, and quite frankly I’m tired of carrying even with a simple-to-carry class like mesmer. I’ll get to ruby tonight and I’m done with all of this. Might even go play some BDO for a change. cringe

(edited by Ravenmoon.5318)

Computer science approach to matching?

in PvP

Posted by: brannigan.9831

brannigan.9831

The matches were not mostly competitive, nor was league placement an indication of skill in season 1, so by those metrics season one was even worse. You just got more wins forced for you with MMR averaging and the 50/50 win/loss BS.

Computer science approach to matching?

in PvP

Posted by: Salamander.2504

Salamander.2504

Have you taken a look at the Glicko-2 algorithm that ANet adapted for their use? It sounds pretty similar to what you’re describing, with the exception that Glicko-2 only factors in a win vs. a loss rather than assigning different ratings for close games. I’m curious what your thoughts are on the math behind the current system:

http://www.glicko.net/glicko/glicko2.pdf

Computer science approach to matching?

in PvP

Posted by: Ponidis The Psycho.8612

Ponidis The Psycho.8612

Thanks for the tip, Salamander! Indeed, Glicko-2 is very similar to what I was thinking about. I guess I was re-inventing the wheel.

I do suspect that the GW2 implementation has a fundamental theoretical flaw, but since I’m not a specialist in Bayesian statistics, I’ll discuss this with a friend who has a PhD in that domain (Bayesian statistics, not GW2; although she might also like GW2… who knows).

If I have a point, I might spend some time generating some test data (imaginary players playing matches) and seeing how and when Glicko-2 converges based on some assumptions.

I’ll get back to you.

Computer science approach to matching?

in PvP

Posted by: Jack Daniels.5270

Jack Daniels.5270

Random might be better than this. If an algorithm doesn’t give you good games more than 50% of the time, random matching tends to do better.

Computer science approach to matching?

in PvP

Posted by: Jourdelune.7456

Jourdelune.7456

Since you are working to get a SKILL based RANKING system, please +1 this petition.

https://forum-en.gw2archive.eu/forum/game/pvp/Skill-based-or-Grind-based-rank-system/first#post6032785

I did quote your idea on this thread, to make sure your idea would be read in that petition.

Thank you,

Dal Aï Lhama (Tempest), Dal Lahu Akbar (DH), Lord Dhal of Dharma (Scrapper) 12k+ spvp games.
Former Team Captain of ggwp (ESL weekly), GLHF (AG), MIST[CORE] spvp alliance guild.
https://www.reddit.com/r/GuildWars2PvPTeams/

Computer science approach to matching?

in PvP

Posted by: vulneraria.4865

vulneraria.4865

I don’t think you need a PhD to understand that this system cannot work.

10 players: you need to find out who is the best across 5vs5 matches while adjusting MMR and having fun.
match 1 --> A wins.
match 2 --> three 1-0 players and two 0-1 players vs two (1-0) and three (0-1): A wins.
match 3 --> mix again, trying to have the same number of wins/losses on every team.

Whoever has the best record (MMR) at the end is the winner.

Glicko-2 in this particular case is not so important; if the system can manage to create perfect 50/50 matches, you can base everything on W/L ratio.

Glicko-2 is useful in full-team matches where the system cannot get them to 50/50, so there you need to weight the MMR outcome based on the pre-match predicted chance.

sUk Clan

Computer science approach to matching?

in PvP

Posted by: Salamander.2504

Salamander.2504

Thanks for the tip, Salamander! Indeed, Glicko-2 is very similar to what I was thinking about. I guess I was re-inventing the wheel.

I do suspect that the GW2 implementation has a fundamental theoretical flaw, but since I’m not a specialist in Bayesian statistics, I’ll discuss this with a friend who has a PhD in that domain (Bayesian statistics, not GW2; although she might also like GW2… who knows).

If I have a point, I might spend some time generating some test data (imaginary players playing matches) and seeing how and when Glicko-2 converges based on some assumptions.

I’ll get back to you.

That sounds awesome, looking forward to it.

Computer science approach to matching?

in PvP

Posted by: Laserbolt.6731

Laserbolt.6731

Ponidis,

There is a HUGE amount of noise in the data.

Yes, Team A won. But why?

There is the mix of classes in the match, and it matters how they work together to effect a win or a loss. Class interactions matter.

There are decisions made by each team member that affect the win or loss. Some can be disastrous based on the situation when it was made.

Yet the MMR is adjusted after a single match. A single case. A single data point.

If you had the matched teams play each other 20 matches in a row, you would not get the same number of wins per team! Each player has a “range of skill” and they do not always play their best. In one set of 20 games Team A might win 15, and in another series with the same players, Team A might win 10.

That’s a reason why the World Series is seven games; not one.

Yet the MMRs are all adjusted based on ONE match of the two teams.

So, just as in a double blind controlled case study, you need LOTS of data points. n=100,000 is good. n=10 is useless. Because of the confounding variables and the noise that you need to sift out over many, many cases (i.e. matches).
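
As a toy illustration of how noisy a single series is (the 65% win chance is an arbitrary assumption, not real data), replaying the same 20-game series a few times gives noticeably different win counts:

```python
import random

random.seed(1)
TRUE_P = 0.65   # assumed "true" chance that Team A beats Team B in any one game

# Replay the same 20-game series several times: the win count swings noticeably,
# so a single match (a series of length 1) carries very little signal on its own.
for series in range(5):
    wins = sum(random.random() < TRUE_P for _ in range(20))
    print(f"series {series + 1}: Team A wins {wins}/20")
```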

The actual, accurate MMR values will not emerge from game data very quickly. And I think that is the MMR problem.

There is a low PvP population, and they need to throw together matches in a few minutes. They get bad data from such matches about individual player skill. And then they use that bad data to generate MMR adjustments.

Scrapper: “Frank from Research”

(edited by Laserbolt.6731)

Computer science approach to matching?

in PvP

Posted by: Fluffball.8307

Fluffball.8307

There is the mix of classes in the match, and it matters how they work together to effect a win or a loss. Class interactions matter.

Obviously skill is overwhelmingly more important than professions (5 dragons on 5 thieves will beat 5 newbies on any profession), but it would be beyond amazing if anet could somehow quantify profession mixes into MMR. I.e. if you win with 5 thieves on your team it moves you up more, and if you lose with 5 thieves it moves you down less or even not at all.

Computer science approach to matching?

in PvP

Posted by: Crinn.7864

Crinn.7864

but it would be beyond amazing if anet could somehow quantify profession mixes into MMR.

Impractical. The meta is too dynamic. Not to mention such a system would be fundamentally biased.

Plus we don’t even know if the meta is correct. (let’s be honest, half the meta is based on preconceptions, and the other half of the meta is based on blind faith in proleague players)

Sanity is for the weak minded.
YouTube

Computer science approach to matching?

in PvP

Posted by: Laserbolt.6731

Laserbolt.6731

I was just describing the actual reality, not suggesting a solution.

I am nearly certain there is no budget for any kind of big rewrite of the matchmaking programming. I could be wrong, but I don’t see evidence of a lot of resources going into PvP. And the names have kept changing over the past two years. Justin O’Dell was pretty good, and talked to us on the forum. Evan seems like a good guy. But I sense he may be the only one on PvP duty.

Scrapper: “Frank from Research”

(edited by Laserbolt.6731)

Computer science approach to matching?

in PvP

Posted by: Crinn.7864

Crinn.7864

I was just describing the actual reality, not suggesting a solution.

I am nearly certain there is no budget for any kind of big rewrite of the matchmaking programming. I could be wrong, but I don’t see evidence of a lot of resources going into PvP. And the names have kept changing over the past two years. Justin O’Dell was pretty good, and talked to us on the forum.

MMR is fine; the problem is that they keep mucking with it in the name of pips. The old pre-leagues system of pure Glicko-2 matchmaking was fairly accurate.

Sanity is for the weak minded.
YouTube

Computer science approach to matching?

in PvP

Posted by: Ensign.2189

Ensign.2189

GW2 uses Glicko2 ratings, which perform reasonably. What you’re describing with a Bayesian approach is more or less Microsoft’s Trueskill algorithm; you just need a step to inject some uncertainty between each match so it doesn’t arbitrarily converge to zero uncertainty without the possibility of further refinements.

That ends up working like a Kalman filter tracking a fundamentally unknown, moving target, which a player’s skill is.
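
As a rough sketch of that Kalman-filter view (the constants are invented for illustration and are not TrueSkill's or ArenaNet's actual parameters): skill is treated as a hidden, slowly drifting quantity, each match is a noisy measurement of it, and the injected process noise keeps the uncertainty from collapsing to zero:

```python
PROCESS_NOISE_VAR = 30.0 ** 2    # skill drift injected before each match
MEASUREMENT_VAR = 200.0 ** 2     # how noisy a single match result is

def step(mean, var, measured_skill):
    # 1. Predict: the player's true skill may have drifted since the last match.
    var = var + PROCESS_NOISE_VAR
    # 2. Update: blend the prediction with the new, noisy measurement.
    gain = var / (var + MEASUREMENT_VAR)
    mean = mean + gain * (measured_skill - mean)
    var = (1.0 - gain) * var
    return mean, var

mean, var = 1500.0, 400.0 ** 2
for measured in (1580, 1620, 1550, 1700, 1690):
    mean, var = step(mean, var, measured)
    print(f"estimate {mean:6.1f} +/- {var ** 0.5:5.1f}")
```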

There are of course improvements that can be made to the above, especially for a team game (how do you combine 5 player ratings into a team rating?), but any of the above is sufficient for what GW2 wants to accomplish.

The issues we’re seeing with the ladder are unrelated to accurate matchmaking, but are instead a product of the desire for the ladder to be both a rating system and a progression system. The former more or less has to be compromised to fit in the latter, and how effective the latter is, well, there’s not a good statistical measure for that.

Computer science approach to matching?

in PvP

Posted by: Fluffball.8307

Fluffball.8307

but it would be beyond amazing if anet could somehow quantify profession mixes into MMR.

Impractical

I think I agree… maybe. Anything that generates as many numbers as GW2 PvP matches can be quantified. If you look at 1,000,000 games and utterly ignore everything but compositions, you can start to pick out optimal team setups for solo q.

We can all name situations where 5 thieves would win, but we just ignore that and see whether they would usually win, and against what comps they win and lose. I’m sure you could weight that in the MMR.
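
Roughly, something like this (the match records below are invented purely for illustration): tally wins and losses per composition while ignoring everything else, and with a million real games the per-comp win rates would fall out of it:

```python
from collections import Counter

# Each record: (sorted team composition, did that team win?). Data is made up.
games = [
    (("dh", "ele", "mesmer", "rev", "scrapper"), True),
    (("thief", "thief", "thief", "thief", "thief"), False),
    (("dh", "ele", "mesmer", "rev", "scrapper"), True),
    (("thief", "thief", "thief", "thief", "thief"), True),
]

played, won = Counter(), Counter()
for comp, did_win in games:
    played[comp] += 1
    won[comp] += did_win

for comp in played:
    print(f"{'/'.join(comp)}: {won[comp]}/{played[comp]} wins")
```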

Computer science approach to matching?

in PvP

Posted by: Laserbolt.6731

Laserbolt.6731

We talk about people’s MMR.

I think you can only capture a range of what it might be, with the middle having the highest confidence. Yes, there is the RD number that captures uncertainty.

With more games the range supposedly gets less fuzzy and more like a solid line at the middle.

I don’t think that can really happen accurately in the current system. Why? Because of the current low player population and the need to put matches together in a reasonable wait time.


Scrapper: “Frank from Research”

Computer science approach to matching?

in PvP

Posted by: Phaeton.9582

Phaeton.9582

Like many other PvPers, I’ve been thinking about the new algorithm that has subjected me to countless ridiculous losses (the funniest I had was something like 500v24) and equally ridiculous wins.

As I said in a previous post, the matching algorithm should accomplish two objectives:
1. give rank based on actual skill (ie discriminate/classify players)
2. make for fun matches

The current algorithm probably accomplishes #1, at least judging by the very competent players who got endless winning streaks and climbed very quickly. However, I feel that matches are less fun because they are either very easy to win or not even worth fighting. The situation has improved slightly (I feel) over the last 24-48h, but I don’t know if this is going to continue or how it is going to end.

My input, as a PhD bioinformatician, would be to consider two additional elements:
1. Matches that are very well balanced are the most informative in the end: if your prediction algorithm is worth anything at all, then a match ending 500vs100 is probably a huge waste of time for both teams. The algorithm probably already knew what was going to happen and did not gain significant knowledge from this match. However, if team A vs team B is rated at a 50% win probability, then the actual outcome of the match will matter, because the algorithm has no idea what is going to happen (this is what it means to predict 50% win probability, i.e. a coin flip). Obviously, one should allow some degree of luck/flexibility, meaning that a win at 500vs498 is not the same as a win at 500vs440 (a clear win, but a balanced match). So, my argument is that closely balanced matches will be fun AND provide more information where it matters, i.e. between teams that are quite close in skill.

2. The evolution of skill measures could be based on a Bayesian “learning” framework, with the divisions built a posteriori after skill has been computed with reasonable confidence. This is a bit technical, but the basic idea is that players start with a broad range of expected skill. Say their MMR is 2000 +/- 2000 (i.e. anywhere from complete newb to pro player). Every match refines our understanding of a player’s skill as an average but also as a variance: what is the “minimum” skill we expect this player to have at 99% probability, and what is the “maximum” skill he could possibly have? So, progressively, this becomes finer, to something like 1650 +/- 500, then 1973 +/- 30, etc. To calculate divisions you just give bonuses and “tiers” every time the 95% minimum (i.e. the average minus the uncertainty at 95%) goes over some preset value or over some quantile (percentage) of the population. Also, you can give out a “PvP present” every day based on where the player stands, for example, so that low-ranking players get some daily bonuses after their first daily match or every time their average increases (or their variance decreases). The advantage of this solution is that there is no need to fix tournament dates: a Bayesian framework progressively improves the quality of the prediction over time, so you can reset it to zero if you want, or just keep a “perpetual” leaderboard and distribute titles or whatever every month.

Anyway, there is a rich literature on machine learning for classification and on Bayesian models, but in the end I think it should be feasible to make tight matches between teams of comparable skill AND put every player where he belongs.

That aside, let me say that GW2 is a great game. I hope you’ll find a fun and fair solution!

This is a very good post.

However, you’re overlooking the fact that the division system is there to prevent weaker players from moving up, as well as to keep good ones from being pushed down. It divides.

If anything is to be ‘improved’, it will be by looking at the objectives in terms of player experience – the matchmaking algorithm is secondary here, unless it can reflect not just even matches but also the way players acquire pips/alternate progression.


Phaatonn, London UK

(edited by Phaeton.9582)

Computer science approach to matching?

in PvP

Posted by: Jourdelune.7456

Jourdelune.7456

MMR is fine; the problem is that they keep mucking with it in the name of pips. The old pre-leagues system of pure Glicko-2 matchmaking was fairly accurate.

It was a great system with some downsides that could have been tweaked, namely rank decay and a threshold on the number of matches played, or giving more weight to people with more matches played. (Many new accounts were able to be in the top 100 with fewer than 100 games…)

Dal Aï Lhama (Tempest), Dal Lahu Akbar (DH), Lord Dhal of Dharma (Scrapper) 12k+ spvp games.
Former Team Captain of ggwp (ESL weekly), GLHF (AG), MIST[CORE] spvp alliance guild.
https://www.reddit.com/r/GuildWars2PvPTeams/

Computer science approach to matching?

in PvP

Posted by: Ponidis The Psycho.8612

Ponidis The Psycho.8612

Hello everyone, thanks for the input and the constructive discussion!

I have read a bit on the subject and will try to add some more information on how I believe things can be improved.

First, let me be clear on what Glicko (and Glicko-2) does: it takes a series of matches and assigns a rating and a variance to every player, based on a Bayesian approach. The specifics are not important, so we can treat Glicko, and any derived or similar rating algorithm as a black box.

What this means, in simple terms, is that every player gets a rating, say 1000, plus an “uncertainty” associated with that rating, say 1000~200 (I use ~ instead of +/- because it does not mess up the formatting). The classification problem is simply how to put every player in the appropriate place/division, and this requires two things that must happen in parallel:
1. Get the rating in the “true” range, for example 1000 —> 1600
2. Reduce the uncertainty, so that we know with precision the skill of the player 1000~200 --> 1600~30

Now, it is very important to realize that Glicko does not decide who plays against whom. Glicko will rate based on any match result you throw at it, so this is a decision made by the programmers. However, not all matches are equally useful for rating.

There is an information-theoretic lower bound on the number of matches needed to classify players, but it is quite low. Approximately, and without going into specifics, we can classify 1 million players accurately if everyone plays something like a few dozen matches (on the order of log2(N) ~ 20 matches per player). This is a theoretical limit, but we can do quite well in practice if we choose the matches wisely.
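
For the back-of-the-envelope number above (this is just the rough comparison-style bound, not a precise derivation):

```python
import math

players = 1_000_000
print(math.ceil(math.log2(players)))   # about 20 pairwise "comparisons" per player
```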

Thankfully, there is a whole branch of mathematics concerned with “optimal experiment design”. Here “experiment” simply means testing whether team A is better than team B, so instead of “experiment” you can say “PvP match” and it makes perfect sense. A reasonable objective is to try to obtain the maximum amount of information with the minimum number of matches (this is called D-optimal design).

Now, I’m not a specialist in that domain, but reading a few pages from a book by a supra-intelligent dude called Fedorov (Theory of Optimal Experiments), I see that he suggests something similar to what I said before: if you cannot have all players play against all players (clearly not practical), the best alternative is to choose the matches where ratings are quite close relative to the uncertainty (variance) around the players’ skill.

To give a few examples, let’s say 1000 is emerald and 2000 is diamond, ~200 is a big uncertainty and ~10 a small uncertainty:
1. 1000~10 vs 1600~10: these teams are known to be quite different, no significant information will be gained by the match.
2. 1000~300 vs 1500~300: these teams look very different, but we are not really certain if they belong where they belong (big uncertainty). This match is probably worth playing.
3. 1010~20 vs 1030~20: these teams look very similar, but we can’t really say which one is better. This match is probably worth playing.
4. 1600~300 vs 1580~300: we really have no idea who is going to win. This match is worth playing.
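
To put rough numbers on the four examples above, here is a small Python sketch (the performance-noise constant and the use of a normal model are my own assumptions for illustration, not the game's actual formula). The predicted win probability accounts for both the rating gap and the uncertainties, and the binary entropy of that probability measures how informative the match is (1 bit for a coin flip, near 0 for a foregone conclusion):

```python
import math

BETA = 200.0   # assumed per-match performance noise; purely illustrative

def win_probability(r_a, s_a, r_b, s_b):
    """P(A beats B), accounting for both teams' rating uncertainty."""
    scale = math.sqrt(2 * BETA ** 2 + s_a ** 2 + s_b ** 2)
    return 0.5 * (1 + math.erf((r_a - r_b) / (scale * math.sqrt(2))))

def informativeness(p):
    """Binary entropy in bits: 1.0 for a coin flip, ~0 for a sure thing."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

examples = [
    ("1. 1000~10  vs 1600~10 ", 1000, 10, 1600, 10),
    ("2. 1000~300 vs 1500~300", 1000, 300, 1500, 300),
    ("3. 1010~20  vs 1030~20 ", 1010, 20, 1030, 20),
    ("4. 1600~300 vs 1580~300", 1600, 300, 1580, 300),
]
for label, r_a, s_a, r_b, s_b in examples:
    p = win_probability(r_a, s_a, r_b, s_b)
    print(f"{label}: P(A wins) = {p:.2f}, information = {informativeness(p):.2f} bits")
```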

The problem with the current approach implemented in Guild Wars 2 is that excessive weight has been put on discriminating high vs low skill, but not enough effort has been invested in classifying players of low (or high) skill among themselves. Based on the design, we expect this to happen once people get into their “appropriate” division and start playing against people of (hopefully) similar skill. In the process, everyone wastes time in useless (non-informative) matches.

Obviously, using a different match making design would mean separating the division/tier system (which is a reward system) from the skill-rating and match-making system. Now, I’m not saying that rewards should be independent of skill, but rather that our priority should be to determine skill, then we can see how to give out rewards.

As an example, if someone provably has a skill much higher than amber, there is no point in having him farm all the pips individually. Maybe he should just get his reward and move on. The community should suggest how and when rewards get distributed.

I would intuitively suggest the following:
- Every player that has a sufficiently precise rating (low or high) gets out of amber. This simply means that this player has played a certain number of matches, say minimum 10 and is worth his “tier 1” reward because he is now a PvP contender.
- Every player gets a reward based on his rating: high rating —> high reward at every match.
- There is a bonus for winning.
- There is a bonus for the losing team if the win probability was <40%.
- Every Monday, people get a badge based on their rating at that time. Amber and emerald badges cannot be lost once they have been gained.
These are simple suggestions, but if the match-making process is solid, any reward system should work.

I intend to add one more message to explain how and why the current system creates low-MMR hell, but I’m still working a bit on the specifics (notably, how many people can reside in low-MMR hell before they get out of it).

Computer science approach to matching?

in PvP

Posted by: Ponidis The Psycho.8612

Ponidis The Psycho.8612

10 players: you need to find out who is the best across 5vs5 matches while adjusting MMR and having fun.
match 1 --> A wins.
match 2 --> three 1-0 players and two 0-1 players vs two (1-0) and three (0-1): A wins.
match 3 --> mix again, trying to have the same number of wins/losses on every team.

Whoever has the best record (MMR) at the end is the winner.

Glicko-2 in this particular case is not so important; if the system can manage to create perfect 50/50 matches, you can base everything on W/L ratio.

You are very close to what I suggested, because a 50-50 match is the most informative match on average. However, there are situations where 50-50 matches are not feasible (they would require very long waiting times), so the system should be stable even if the probability fluctuates a little bit.

The other thing to consider is that you can (and do) calculate MMR continuously, so you don’t need to have people finish 10 matches before revising their skill rating. The whole point of a Bayesian approach is that you can do this continuously.

Computer science approach to matching?

in PvP

Posted by: Ponidis The Psycho.8612

Ponidis The Psycho.8612

Ponidis,

There is a HUGE amount of noise in the data.

Yes, Team A won. But why?

Yet the MMRs are all adjusted based on ONE match of the two teams.

So, just as in a double blind controlled case study, you need LOTS of data points. n=100,000 is good. n=10 is useless. Because of the confounding variables and the noise that you need to sift out over many, many cases (i.e. matches).

The actual, accurate MMR values will not emerge from game data very quickly. And I think that is the MMR problem.

As I suggested, and as the Glicko-2 rating system actually does, it revises both the average rating (skill) and the variance (the uncertainty surrounding that average). So the system does as you suggest, actually requiring a fair number of matches to reach a conclusion. Don’t be confused by the fact that the average skill is updated after every match: until the uncertainty has gone sufficiently low, the system knows that it has not quantified your skill with precision.

Imagine the sequence of skill ratings (skill~uncertainty):
1200~100 —> 1231~98 —> 1235~94 —> 1229~92 —> … —> 1197~21

The important thing here is not only the movement of the rating from 1200 to 1235 and then to 1197, but also the decrease in uncertainty from 100 to 21. Bayesian reasoning can be hard to grasp intuitively, but the end result is equivalent to what you are suggesting (based on a frequentist approach).
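
As a tiny illustration of reading such a trace (the cutoff below is a made-up number), the system could consider a player "classified" only once the uncertainty drops below some value, regardless of how much the rating itself has moved:

```python
trace = [(1200, 100), (1231, 98), (1235, 94), (1229, 92), (1197, 21)]
CUTOFF = 30   # made-up cutoff for "precise enough"
for rating, uncertainty in trace:
    status = "classified" if uncertainty <= CUTOFF else "still uncertain"
    print(f"{rating} ~ {uncertainty:3d}: {status}")
```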

Computer science approach to matching?

in PvP

Posted by: Ponidis The Psycho.8612

Ponidis The Psycho.8612

GW2 uses Glicko2 ratings, which perform reasonably. What you’re describing with a Bayesian approach is more or less Microsoft’s Trueskill algorithm; you just need a step to inject some uncertainty between each match so it doesn’t arbitrarily converge to zero uncertainty without the possibility of further refinements.

That ends up working like a Kalman filter tracking a fundamentally unknown, moving target, which a player’s skill is.

There are of course improvements that can be made to the above, especially for a team game (how do you combine 5 player ratings into a team rating?), but any of the above is sufficient for what GW2 wants to accomplish.

The issues we’re seeing with the ladder are unrelated to accurate matchmaking, but are instead a product of the desire for the ladder to be both a rating system and a progression system. The former more or less has to be compromised to fit in the latter, and how effective the latter is, well, there’s not a good statistical measure for that.

Great post. However, I would phrase that a bit differently: the problem with the ladder is related to accurate matchmaking, because the progression system (or reward system) that assigns titles and bonuses should not figure into matchmaking. It’s the matchmaking that helps construct ratings, and the ratings that should define rewards, not the other way around. When you start using rewards (divisions, tiers) to define matchmaking, you get a system that does not choose informative (meaningful and also fun) matches.

Personally, I don’t care about the rewards. I used to play GW1 PvP in a top-200 guild and we did not get any rewards at all for PvP play, other than the pleasure of seeing our matches broadcast. Nowadays, I get a ton of rewards, but matches in S2 are not that fun, at least for now. Maybe the system will converge, after I have been farmed by all the better players out there.

Computer science approach to matching?

in PvP

Posted by: Phaeton.9582

Phaeton.9582

There is an alternative to removing the tier system altogether, which is to introduce placements at the start of the season, coupled with growing the player base substantially.

Divisions can be fun, and they have a proven track record in other (bigger) PvP games.


Phaatonn, London UK