yaron, yes, we agree on almost everything. I think you're missing something conceptually about RD, though. You keep saying...
Quote:
(actually, it is sound to use uncertainty in matching, but you have to use it both ways: the 800(350) noob can be matched with anyone in the 450-1150 range, while an established 1250(50) only has the 1200-1300 range. This also solves the problem of noobs not finding sim games)
This is not how RD (the rating deviation, which is a standard deviation) works. 800(350) means, mathematically and statistically, that there is a 68.27% probability that the player's true rating lies between 450 and 1150. That leaves a 31.73% chance that the player's true rating is outside this range. It's customary to go out to 3 RD, which gives a 99.73% confidence interval. Statistically, that means we can say with 99.73% confidence that the player's true rating lies between -250 and 1850. In engineering, as well as in many real-world applications and forecasting, it is customary to use 6 RD, which gives a 99.9999998027% confidence interval, or about 1 in 506,800,000 cases lying outside the range.
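To make those numbers concrete, here is a minimal Python sketch. It assumes the uncertainty around a rating is normally distributed, and the function names (coverage, interval) are just mine for illustration; they don't come from Glicko or from TFW.

```python
import math

def coverage(k):
    """Probability that a normally distributed value lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

def interval(rating, rd, k):
    """Range covered by going k RDs either side of the rating."""
    return (rating - k * rd, rating + k * rd)

# The 800(350) new player from the example, at 1, 3 and 6 RD.
for k in (1, 3, 6):
    lo, hi = interval(800, 350, k)
    print(f"{k} RD: {lo:.0f} to {hi:.0f}, coverage {coverage(k):.10%}")
```

The point is that 450-1150 is only the 1 RD band; a matchmaker that treats it as a hard limit would miss the player's true rating roughly a third of the time.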
yaron wrote:
This can be solved either by giving new players lower, realistic scores (the method I prefer). It can also be solved by introducing specific exceptions to both systems, to deal with new players' unrealistically high scores (the method you seem to prefer). When I said that these exceptions are "hacking the system", I didn't mean the rating system itself, but rather the match-making system, and the (purely cosmetic) display system.
My problem with introducing these exceptions is twofold:
1. It makes those systems more complicated.
2. You will have to introduce such exceptions to every future system that looks at ratings, because the basic issue (new players having unrealistic scores) has not been solved.
1) The change is actually a very simple equation.
2) The basic issue is solved. There's no reason to ever have to change it again. I think you're missing something here. The Glicko rating system couldn't care less if you match a 250(350) player against a 1900(50) player. It will just spit out the rating changes. It doesn't really matter whether we start players at 800(350) or 1300(350). After enough games they will migrate to the same rating. Choosing a starting rating of 800(350) should make the player reach their true rating faster, so I agree that this is a more reasonable starting value.
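Here is a rough Python sketch of that point, using the Glicko-1 update formulas from Glickman's paper. The simplifications are mine: every game is treated as its own rating period, RD is never inflated between periods, and the opponent is always a hypothetical 1000(50) player. The helper names (g, expected, update, play) are illustrative only.

```python
import math

Q = math.log(10) / 400  # the constant q from the Glicko formulas

def g(rd):
    """Glicko weighting factor that discounts an opponent's uncertain rating."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)

def expected(r, r_opp, rd_opp):
    """Expected score of a player rated r against an opponent r_opp(rd_opp)."""
    return 1 / (1 + 10 ** (-g(rd_opp) * (r - r_opp) / 400))

def update(r, rd, r_opp, rd_opp, score):
    """One-game Glicko-1 update; score is 1 (win), 0.5 (draw) or 0 (loss)."""
    e = expected(r, r_opp, rd_opp)
    inv_d2 = Q**2 * g(rd_opp)**2 * e * (1 - e)  # this is 1/d^2
    new_var = 1 / (1 / rd**2 + inv_d2)          # new RD squared
    new_r = r + Q * new_var * g(rd_opp) * (score - e)
    return new_r, math.sqrt(new_var)

def play(start, results, r_opp=1000, rd_opp=50):
    """Run a new player (RD 350) through a fixed list of results."""
    r, rd = start, 350
    for s in results:
        r, rd = update(r, rd, r_opp, rd_opp, s)
    return r, rd

results = [1, 0] * 10  # assumed record: alternating wins and losses
low = play(800, results)
high = play(1300, results)
print("started at 800: ", low)
print("started at 1300:", high)

# A lopsided pairing is not a problem for the maths either.
print("250(350) loses to 1900(50):", update(250, 350, 1900, 50, 0))
```

With the same record, the two starting points end up much closer together than the 500 points they began with, and the system handles the 250(350) vs 1900(50) game like any other.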
What I'm saying is that I think the system is fine the way it is, as long as it's not used in its current state for matchmaking new players. This is for the reason we both mentioned above: players coming into TFW are, realistically, probably low-rated players initially.
Quote:
[[ I didn't see anything in Mark Glickman's docs about starting new players with the average. I really think he doesn't care about starting ratings because he's only concerned with how ratings affect other ratings, and high uncertainty takes care of that. ]]
Actually, it's the first step he mentions.
Step 1. Determine a rating and RD for each player at the onset of the rating period.
(a) If the player is unrated, set the rating to 1500 and the RD to 350.
Here's my source, a paper written by Mark Glickman himself.
http://math.bu.edu/people/mg/glicko/glicko.doc/glicko.html
In the end, this shouldn't matter as long as the matchmaking system is solid.
I know I've debated a lot of what you said in this post, yaron, but I actually agree with almost all of it. I do think there is some confusion between us on a couple of points, though.