I’m throwing some ideas out for testing match-making policies without putting thinking, feeling human beings through them. Maybe these ideas help someone at ArenaNet, maybe not, maybe that someone knows them already. In any case, here they are.
Player experience in structured PvP can be quantified by statistical metrics. For example, the average player's experience is balanced and neutral if
- losing streaks are short,
- win-rates of most players are close to 50%, and
- the variance of win-rate is small.
Conversely, if win-rate fluctuates a lot and most losing streaks are long, then people perceive match-making to be unfair. To evaluate a previously implemented match-making policy, ArenaNet can compute the relevant metrics from past matches. But it is hard to gather the metrics for new policies without trying them out. Anecdotal evidence suggests that the match-making policy of season 2 is not performing too well on variance of win-rate across time and on the average length of losing streaks.
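To make the metrics concrete, here is a minimal sketch of how they could be computed from a match log. The data layout (a dict from player id to an ordered list of win/loss results) and the function names are my own assumptions, not anything ArenaNet uses. The variance here is taken across players; measuring fluctuation across time would need a per-player, windowed version.

```python
from statistics import mean, pvariance

def win_rate(results):
    """Fraction of wins in a list of booleans (True = win)."""
    return sum(results) / len(results)

def losing_streaks(results):
    """Lengths of all maximal runs of consecutive losses."""
    streaks, run = [], 0
    for won in results:
        if won:
            if run:
                streaks.append(run)
            run = 0
        else:
            run += 1
    if run:
        streaks.append(run)
    return streaks

def summarize(history):
    """history: hypothetical layout, player id -> ordered list of results."""
    rates = [win_rate(r) for r in history.values() if r]
    streaks = [s for r in history.values() for s in losing_streaks(r)]
    return {
        "mean_win_rate": mean(rates),
        "win_rate_variance": pvariance(rates),  # variance across players
        "mean_losing_streak": mean(streaks) if streaks else 0.0,
        "longest_losing_streak": max(streaks, default=0),
    }

# Toy data, made up for illustration only.
history = {
    "a": [True, False, False, True],
    "b": [False, False, False, True],
}
stats = summarize(history)
```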
The general idea is: Estimate the metrics of new policies by simulation.
1. Build a model of player skill that predicts the outcomes of matches with reasonable accuracy.
2. Simulate matches with the player-skill model and some candidate match-making policy.
3. Compute the metrics from the simulation result.
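The three steps could be wired together roughly like this. Everything in the sketch is an assumption made for illustration: the logistic (Elo-style) win probability stands in for a real skill model, `random_policy` is a placeholder baseline rather than any actual policy, and the skills are drawn from a made-up normal distribution.

```python
import random

def win_probability(skill_a, skill_b):
    """Assumed Elo-style chance that team A beats team B."""
    return 1.0 / (1.0 + 10 ** ((skill_b - skill_a) / 400))

def random_policy(queue, rng, team_size):
    """Placeholder policy: shuffle the queued players and cut into two teams."""
    players = queue[:]
    rng.shuffle(players)
    return players[:team_size], players[team_size:2 * team_size]

def simulate(skills, policy, n_matches, team_size=5, seed=0):
    """Play n_matches and return player id -> ordered list of results."""
    rng = random.Random(seed)
    players = list(skills)
    history = {p: [] for p in players}
    for _ in range(n_matches):
        queue = rng.sample(players, 2 * team_size)
        team_a, team_b = policy(queue, rng, team_size)
        avg_a = sum(skills[p] for p in team_a) / team_size
        avg_b = sum(skills[p] for p in team_b) / team_size
        a_wins = rng.random() < win_probability(avg_a, avg_b)
        for p in team_a:
            history[p].append(a_wins)
        for p in team_b:
            history[p].append(not a_wins)
    return history

# Hypothetical population of 100 players.
skill_rng = random.Random(1)
skills = {f"p{i}": skill_rng.gauss(1500, 200) for i in range(100)}
history = simulate(skills, random_policy, n_matches=1000)
```

Swapping `random_policy` for another function with the same signature lets one compare candidate policies on the same simulated population; the returned `history` is what the metrics would be computed from.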
Step (1) is the most difficult because player skill is hard for the game to observe directly. Here’s a simple model: the team with the higher average win-rate is predicted to win. One could certainly build more sophisticated models using tools like hidden Markov models or Markov decision processes. Without the real data, it’s hard to say how accurate a model is; perhaps the average win-rate is a good enough predictor, who knows. To measure accuracy, one could build the model from half the data of season 1 and test it on the other half.
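A minimal sketch of that accuracy check, assuming a match log stored as `(team_a, team_b, a_won)` tuples (a made-up format). Win-rates are learned from the first half of the log and the "higher average win-rate wins" rule is scored on the second half. Ties are arbitrarily predicted for team A, and players unseen in the first half default to 50%; both are my choices, not part of the idea itself.

```python
from collections import defaultdict

def win_rates(matches):
    """Per-player win-rate from (team_a, team_b, a_won) records."""
    wins, games = defaultdict(int), defaultdict(int)
    for team_a, team_b, a_won in matches:
        for p in team_a:
            games[p] += 1
            wins[p] += a_won
        for p in team_b:
            games[p] += 1
            wins[p] += not a_won
    return {p: wins[p] / games[p] for p in games}

def predict_a_wins(team_a, team_b, rates, default=0.5):
    """Predict a win for team A if its average win-rate is at least team B's."""
    def avg(team):
        return sum(rates.get(p, default) for p in team) / len(team)
    return avg(team_a) >= avg(team_b)

def holdout_accuracy(matches):
    """Fit on the first half of the log, score on the second half."""
    half = len(matches) // 2
    train, test = matches[:half], matches[half:]
    rates = win_rates(train)
    correct = sum(predict_a_wins(a, b, rates) == a_won for a, b, a_won in test)
    return correct / len(test)

# Toy log, made up for illustration only.
matches = [
    (("x", "y"), ("z", "w"), True),
    (("x", "z"), ("y", "w"), True),
    (("x", "w"), ("y", "z"), True),
    (("x", "y"), ("z", "w"), True),
    (("y", "z"), ("x", "w"), False),
    (("z", "w"), ("x", "y"), True),
]
accuracy = holdout_accuracy(matches)
```

Splitting by position in the log keeps the test half strictly later than the training half; a random split is another option if the order of matches does not matter.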
Parties queueing together are a tricky part of (2). One can handle them with varying degrees of sophistication.
- Use previously formed parties.
- Build a statistical model for parties based on previously formed parties.
- Build a statistical model for parties based on statistical models for friends, guilds, time zones and so on.
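The simplest of these options, reusing previously formed parties, might look like the sketch below. It assumes a hypothetical list of observed parties (tuples of player ids) and greedily draws them in random order until a match has enough players. It can come up short if the remaining parties do not fit, so a real simulation would need to retry or back-fill with solo players.

```python
import random

def fill_queue(observed_parties, slots=10, rng=None):
    """Draw previously seen parties at random, up to `slots` players in total.

    Skips a party if it is too large for the remaining slots or if one of
    its members is already queued. May return fewer than `slots` players.
    """
    rng = rng or random.Random()
    candidates = observed_parties[:]
    rng.shuffle(candidates)
    queue, used, free = [], set(), slots
    for party in candidates:
        if len(party) <= free and used.isdisjoint(party):
            queue.append(party)
            used.update(party)
            free -= len(party)
        if free == 0:
            break
    return queue

# Made-up party log for illustration.
observed = [
    ("a", "b"), ("c",), ("d", "e", "f"), ("a", "g"),
    ("h",), ("i", "j"), ("k",), ("l", "m"),
]
queue = fill_queue(observed, slots=6, rng=random.Random(0))
```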
Players gaming the system present a confounding factor. Again, one would have to look at the actual data to know whether these players make a statistically significant impact. If they do, assumptions must be changed to account for such behavior. For example, instead of “players play to win all the time”, the skill model may assume “players play to win only if their MMR is low enough”. (That is not the case for season 2, of course; the current match-making policy seems especially designed to make intentional losing unprofitable.)
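As a sketch of what such a changed assumption could look like in a skill model: below, a player contributes full skill only while their MMR is at or below some cap, and plays worse on purpose above it. The cap, the penalty, and the function name are all invented for illustration; real values would have to come from the data.

```python
def effective_skill(true_skill, mmr, mmr_cap, throw_penalty=300.0):
    """Skill a player brings to a match under an assumed 'tank above the cap' behavior.

    mmr_cap and throw_penalty are hypothetical parameters, not measured values.
    """
    if mmr > mmr_cap:
        return true_skill - throw_penalty  # player is intentionally losing
    return true_skill

honest = effective_skill(1600.0, mmr=1500.0, mmr_cap=1550.0)
tanking = effective_skill(1600.0, mmr=1600.0, mmr_cap=1550.0)
```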