 Post subject: Re: Suggestions from a relative noob – Starting Ratings
 Post Posted: Fri Apr 02, 2010 4:39 pm 
Joined: Thu Jan 07, 2010 10:41 pm
Posts: 83
A hidden rating system is easy to implement mathematically. It doesn't have to be a complex formula to get the job done, as a simple 2nd degree polynomial equation can gradually move a player's rating from any starting value to their actual value over a period of time.

Sometimes it's a lot easier to understand it visually. The image below shows an example of a new player with a starting rating of 0. The table on the left shows their Actual Rating and their Displayed Rating. Their Actual Rating is hidden for the first 20 games and all they see is the Displayed Rating in the right column. As you can see, their rating increases even when they lose, and if they win it makes a much bigger jump. The plot on the right essentially shows how much of the player's rating is still hidden. You can see that the first time their rating ever goes down from losing is after their 19th game and it only goes down 6 points. After the 20th game, their Displayed Rating = Actual Rating and the provisional rating period is over.

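[Image: table of Actual Rating and Displayed Rating per game, with a plot of the percentage of rating still hidden]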

The coefficients in this formula can easily be modified to start a new player with any rating for any provisional time period. So for example, you could start a new player with a 500 rating and keep them in the provisional period for 50 games. The only difference would be a more gradual migration towards their Actual Rating.

For those who were wondering exactly how the Displayed Rating was calculated, it is:

Displayed Rating = Actual Rating * (100 - y) / 100, where y is the formula displayed on the chart, aka the percentage of rating hidden.
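A rough sketch of this calculation in Python, treating y as a percentage (so the product is divided by 100). The exact polynomial is only visible on the chart, so the quadratic in `hidden_percent` is an assumed stand-in and not the formula from the image; the function names and the 20-game default are also just illustrative.

```python
def hidden_percent(games_played: int, provisional_games: int = 20) -> float:
    """Percentage of the rating still hidden (y), falling from 100 to 0.

    Assumed curve: y = 100 * (1 - n/N)^2. The real coefficients are on the
    chart in the post and are not reproduced here.
    """
    if games_played >= provisional_games:
        return 0.0
    remaining = 1.0 - games_played / provisional_games
    return 100.0 * remaining ** 2


def displayed_rating(actual_rating: float, games_played: int,
                     provisional_games: int = 20) -> float:
    """Displayed Rating = Actual Rating * (100 - y) / 100."""
    y = hidden_percent(games_played, provisional_games)
    return actual_rating * (100.0 - y) / 100.0


if __name__ == "__main__":
    # A player whose actual rating happens to stay at 1300 the whole time.
    for n in (0, 5, 10, 15, 20):
        print(n, round(displayed_rating(1300, n)))
```

With this particular curve the displayed rating always starts at 0; starting at some other value (like the 500 mentioned above) would need different coefficients.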


Last edited by Psyclone on Fri Apr 02, 2010 5:02 pm, edited 1 time in total.

 Post subject: Re: Suggestions from a relative noob – Starting Ratings
 Post Posted: Fri Apr 02, 2010 4:53 pm 
Joined: Thu Nov 26, 2009 4:10 am
Posts: 829
Just proves that hidden ratings could improve this game a lot more ^_^.


 Post subject: Re: Suggestions from a relative noob – Starting Ratings
 Post Posted: Fri Apr 02, 2010 4:57 pm 
Joined: Tue Jan 08, 2008 2:07 am
Posts: 1045
Psyclone: Yes, this is a better idea than just hiding it completely. x needs to be uncertainty rather than games played, though.

Zblader: I don't think yaron was trying to offend you. It is often hard to tell if people are joking when you are reading.


 Post subject: Re: Suggestions from a relative noob – Starting Ratings
 Post Posted: Fri Apr 02, 2010 5:53 pm 
Joined: Thu Jan 07, 2010 10:41 pm
Posts: 83
jed: Maybe I'm getting confused here, but isn't uncertainty only used for matchmaking?

I think there are two slightly different discussions going on in this thread. This might sound a little complicated or confusing, but I think we need to separate the discussion of Hidden Ratings from the discussion of Starting Rating.

HIDDEN (DISPLAYED) RATING

The Displayed Rating's only purpose is to give the new player a sense of progress until they have played enough games to be close to their actual rating. Displayed Rating is 100% cosmetic and has no effect on who a player is matched up against. That is still done by the current Actual Rating - (Uncertainty).

STARTING RATING

The Starting Rating is a different topic which is what I started this thread discussing. We kind of got off on a tangent when Hidden (Displayed) Rating came up.

There's some concern that the Starting Rating is too high for new players. You can't simply change a new player's starting rating without changing the balance of the system in place. I think the most important thing is making sure that these new players get paired up against some of the lower rated players or other new players. This should theoretically provide a much more competitive and pleasant learning experience.

IMPLEMENTATION

New players, on average, are going to lose far more games than they win. I think you can easily implement both Hidden Rating and some type of provisional matchmaking formula to make sure new players are playing against the correct players and don't see their rating take a huge hit when they lose.

Adding a Hidden (Displayed) Rating will be much less demoralizing to the new player and hide the fact that their Starting Rating is 1300. I don't think you need to hide the Uncertainty. Once again, this effect is purely cosmetic and not used in any way for matchmaking purposes.

A provisional matchmaking formula can be something as simple as what I mentioned in my initial post: some type of provisional formula that dips a new player's rating down a little further than the normal formula. For example:

Rating – (1.5 * Uncertainty)
or
Rating – Uncertainty – (200 - Games Played * 10)

This would last for only the first 20 games or so and ensure that new players are playing against weaker opponents than the current model.
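Here's how those two formulas might look in code. This is only a sketch: the function names are made up, and the clamp at zero in the second one is an added assumption so the extra penalty disappears after 20 games instead of turning into a bonus.

```python
def provisional_scaled(rating: float, uncertainty: float,
                       factor: float = 1.5) -> float:
    """Matchmaking value: Rating - (1.5 * Uncertainty)."""
    return rating - factor * uncertainty


def provisional_ramp(rating: float, uncertainty: float,
                     games_played: int) -> float:
    """Matchmaking value: Rating - Uncertainty - (200 - Games Played * 10).

    The extra penalty is clamped at 0 (an assumption, not in the post) so
    it fades out over the first 20 games and never goes negative.
    """
    penalty = max(0, 200 - games_played * 10)
    return rating - uncertainty - penalty


if __name__ == "__main__":
    # Brand-new 1300 (350) player under both formulas.
    print(provisional_scaled(1300, 350))
    print(provisional_ramp(1300, 350, games_played=0))
```

Neither function touches the stored rating; they only produce the number the matchmaker would compare. In practice the uncertainty would also shrink as games are played, which the example above ignores.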


 Post subject: Re: Suggestions from a relative noob – Starting Ratings
 Post Posted: Sat Apr 03, 2010 12:33 pm 
Joined: Tue Jul 28, 2009 1:47 am
Posts: 150
jed wrote:
Zblader: I don't think yaron was trying to offend you. It is often hard to tell if people are joking when you are reading.


Indeed, no offense meant, Zblader. My first post to the thread was about losing 300 points in the beginning, and how hard it was for me to get them back, so your "give or take 300 points" comment made me think you might be jesting.

Psyclone wrote:
jed: Maybe I'm getting confused here, but isn't uncertainty only used for matchmaking?


Uncertainty is a key component of the Glicko rating system (which is used in TFW). It's a measure of the rating system's level of confidence in a specific player's rating. This means that the ratings of players with high uncertainty fluctuate more wildly after they win/lose a game (because the system wasn't very sure about those ratings to begin with). Conversely, when you play against a high uncertainty opponent, your rating isn't significantly altered (because the system can't rely on your opponent's rating in order to make good inferences about your own ability).
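Both effects can be seen in a small sketch of a single-game update. It assumes the standard Glicko-1 formulas from Mark Glickman's published description and treats one game as a whole rating period; TFW's actual code isn't shown in this thread, so the function names here are just for illustration.

```python
import math

Q = math.log(10) / 400  # Glicko-1 scaling constant


def g(rd: float) -> float:
    """Attenuation factor: gets smaller as the opponent's RD grows."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def expected_score(r: float, r_opp: float, rd_opp: float) -> float:
    """Expected score (0..1) of a player rated r against the opponent."""
    return 1 / (1 + 10 ** (-g(rd_opp) * (r - r_opp) / 400))


def rating_change(r: float, rd: float, r_opp: float, rd_opp: float,
                  score: float) -> float:
    """Rating change after one game (score: 1 win, 0.5 draw, 0 loss)."""
    e = expected_score(r, r_opp, rd_opp)
    inv_d2 = Q**2 * g(rd_opp) ** 2 * e * (1 - e)
    return Q / (1 / rd**2 + inv_d2) * g(rd_opp) * (score - e)


if __name__ == "__main__":
    # Same win, same opponent: the high-uncertainty player moves much more.
    print(rating_change(1300, 350, 1300, 50, 1))
    print(rating_change(1300, 50, 1300, 50, 1))
    # Same established player: a win over a high-uncertainty opponent counts less.
    print(rating_change(1300, 50, 1300, 350, 1))
```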


 Post subject: Re: Suggestions from a relative noob – Starting Ratings
 Post Posted: Sat Apr 03, 2010 1:33 pm 
Joined: Tue Jul 28, 2009 1:47 am
Posts: 150
The basic design principle of the rating system is this:

A player's rating accurately reflects that player's ability to win games.

Every system in TFW that looks at players' ratings assumes this implicitly: similar matching pairs people with similar ratings, on the assumption that they're evenly matched; we display people's ratings in their profiles, assuming this gives them a realistic measure of how they're doing in the game; we award ladder points based on rating; etc.

As long as this principle holds true, all those systems are going to work as advertised.

However, if the rating of new players does not reflect their ability, all systems break down with respect to these players: they get steamrolled in their sim matches, and they see their displayed score plunging when in fact they might be rapidly improving.

It seems that most of the discussion in this thread has been about ways to hack the various systems of TFW so that they work as desired, in spite of the ratings themselves being messed up: we're trying to hack the similar system to match new players with their equals in actual ability (instead of their equals in rating), and we're trying to hack the displayed score to show new players' actual ability instead of their rating.

Some of these suggestions are really clever, and will probably achieve their stated goals. However, they're still hacks (i.e., solutions that don't respect the system's design principles), and as such they carry two costs:

1. They're complicated. This is a big problem because the rating system and its derived systems need to be understood not only by coders, but also by players. The simpler they are, and the fewer exceptions they have, the better.
2. A hacker's work is never done. If you don't solve the underlying design problem, you will need to go on hacking. Every time you install a new feature that takes ratings as input, you will need to hack that feature to deal with new players' unrealistic ratings.

Instead, giving new players realistic ratings will smoothly and naturally solve all those problems.

jed wrote:
Yaron I know you tried to convince me that it wouldn't affect overall rating if we start people off lower but I still think it would just lower the overall numbers and we would end up with the same effect in 6 months.


I don't think I can present a watertight mathematical proof that this will not happen (even though it is still my prediction that it will not). Instead, let me address this concern by making two other points:

1. What can you lose? Let's suppose you change the initial rating to, say, 800, making no other change. One of two things will happen. Best case: no overall downward slide is observed. You've just solved all rating related problems cleanly and simply without having to hack every present (and future!) system that looks at ratings. Worst case: there's some overall slide. You're back to square one, but no damage was done, and you can still implement all those other suggestions.

2. Even if it turns out that lowering initial ratings does result in an overall lowering of ratings, I would still argue against hacking all present and future systems that look at ratings. Instead, I would still suggest using realistic initial ratings, but also hacking the rating system itself so that those ratings can't pull other ratings down. For example, you could make players with fewer than x games against humans (or more than y uncertainty) unable to affect opponents' ratings (even though their own rating can be changed as normal). If they can't affect opponents' ratings, they can't pull them down. I prefer one hack that forces the system to work as advertised, to a series of hacks designed to cope with the fact that it doesn't.

2a. Of course, the Glicko system already has this "hack" as part of its design: people with high uncertainty can't significantly affect opponents' ratings, and can't pull them down (this is why I predict no slide). For this reason, I suggest trying the lower initial ratings without the hack. It will only become necessary if it turns out that there is, in fact, some slide.
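For what it's worth, the fallback hack from point 2 could be as small as the sketch below. Everything in it is hypothetical: the `Player` fields, the threshold values standing in for x and y, and the idea that the rating changes have already been computed by the normal Glicko update.

```python
from dataclasses import dataclass


@dataclass
class Player:
    rating: float
    uncertainty: float
    games_vs_humans: int


MIN_GAMES = 20           # hypothetical value for "x"
MAX_UNCERTAINTY = 200.0  # hypothetical value for "y"


def is_provisional(p: Player) -> bool:
    """True if this player should not be able to move opponents' ratings."""
    return p.games_vs_humans < MIN_GAMES or p.uncertainty > MAX_UNCERTAINTY


def apply_changes(a: Player, b: Player, change_a: float, change_b: float) -> None:
    """Apply precomputed rating changes, ignoring any change that would be
    caused by a provisional opponent. A provisional player's own rating
    still moves as normal when the opponent is established."""
    if not is_provisional(b):
        a.rating += change_a
    if not is_provisional(a):
        b.rating += change_b


if __name__ == "__main__":
    noob = Player(rating=800, uncertainty=350, games_vs_humans=3)
    vet = Player(rating=1400, uncertainty=50, games_vs_humans=500)
    apply_changes(noob, vet, change_a=-10.0, change_b=2.0)
    print(noob.rating, vet.rating)
```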


 Post subject: Re: Suggestions from a relative noob – Starting Ratings
 Post Posted: Sat Apr 03, 2010 4:02 pm 
Joined: Tue Jan 08, 2008 2:07 am
Posts: 1045
yaron: The matching of similar players is easily fixed and I can still use the normal Glicko. I just have to change how uncertainty affects things. I haven't changed it yet since there aren't enough people online right now. Seems better to give them a game vs a better player than no game.

As for the rating displayed to the user, there are 2 problems with the Glicko system.
1) Glicko is designed so that 1/2 of the people have their rating drop when they start playing, since it starts you at what should be the average. (The average right now is ~1400 for people with uncertainty<70). So basically it is designed to demoralize.
2) Most people don't take into account the uncertainty. So they see 1300 (350) as being better than 1200 (50) when in fact it should be considered worse.

So with these issues and the fact that a player's skill varies pretty drastically from game 1 to game 20, I think it is good to obscure or translate the Glicko rating into something more meaningful/less discouraging to new users.


 Post subject: Re: Suggestions from a relative noob – Starting Ratings
 Post Posted: Sat Apr 03, 2010 4:15 pm 
Joined: Thu Jan 07, 2010 10:41 pm
Posts: 83
OK, I read more about the Glicko system. I have a Bachelor's in Aerospace Engineering and an MBA in Marketing and Statistics. I've taken 800 and 900 level statistics classes in graduate school as well. I'm already familiar with the ELO system and after reading the Glicko system, the mathematical concept makes sense to me.

The changes to the rating system I suggested actually aren't any kind of "hack". A "hack" to the system would imply that we were changing the formulas in the actual rating system. The two changes I proposed were:

1) Using a hidden rating - this has no effect on the Glicko system in place. It's just cosmetic and the purpose is essentially so that the user gets to see their rating increase for the first x games whether they win or lose. If they win their rating will make a much bigger jump of course.

2) Using a provisional rating for matchmaking purposes - this provisional rating is only used for matching opponents with each other, it is not used to calculate their rating in any way. The Glicko rating system does not provide a matchmaking system. All it does is calculate the ratings.

One thing to keep in mind though is that if you still think it's "hacking", you're already "hacking" the system as it's implemented in TFW.

1) You've arbitrarily chosen the starting rating of 1300 instead of 1500. Why not 800 then? From what I understand of the Glicko rating system, arbitrarily lowering the starting rating to 800 from 1300 or 1500 won't break down the system and shouldn't have any major or long-term effects. Glicko is not a zero-sum system like the original ELO.

2) You're already admitting that 1300 (350) is not a reasonable starting rating because you're using -1*RD to set up matches. If 1300(350) was a reasonable starting rating, then you should be matching the player against equally higher and lower rated players.

Once again, the Glicko rating system is not a matching system. What I've been suggesting all along is a change to the matching system, not the rating system.


 Post subject: Re: Suggestions from a relative noob – Starting Ratings
 Post Posted: Sun Apr 04, 2010 1:51 am 
Joined: Tue Jul 28, 2009 1:47 am
Posts: 150
jed wrote:
As for the rating displayed to the user, there are 2 problems with the Glicko system.
1) Glicko is designed so that 1/2 of the people have their rating drop when they start playing, since it starts you at what should be the average. (The average right now is ~1400 for people with uncertainty<70). So basically it is designed to demoralize.
2) Most people don't take into account the uncertainty. So they see 1300 (350) as being better than 1200 (50) when in fact it should be considered worse.


I think the Glicko system works a little differently than you describe in the above paragraphs. I agree that it would be a problem if it did work in those ways. Let me tackle them in reverse order:

2) There is absolutely no reason to subtract uncertainty from a player's rating. Uncertainty is the amount by which the player's rating might deviate from her true ability. However, this deviation can go both ways. A 1200(50) might have a real ability of 1150, but is equally likely to really be a 1250. The 1300(350) might be a 950, but she's just as likely to be a 1650! If I had to bet on the outcome of such a game, I would bet on the 1300(350), because on average, she's supposed to be the better player. More to the point: the Glicko system itself is betting on the 1300(350). If the 1300(350) loses this game, she will lose more points than she would gain for winning it. That's because she's favoured to win (you can verify this by sticking these numbers in the Glicko formulas).

[of course, in the specific context of TFW, I'd never bet on a 1300(350), because I know that this is the specific score artificially given to new players. But in general, your best estimate of a player's ability is her rating, not her (rating minus uncertainty). And this is the estimate used by the Glicko system]

1) In TFW, we can pretty much assume that new player = bad player. However, this isn't generally true. On a chess server, for example, the latest joiner might very well be an expert. Glicko is designed to handle both cases, so it makes no assumptions about a new player's ability. The trick is giving the new player a high uncertainty, which in practice means that the new player's ability could be anywhere. In most Glicko implementations, it doesn't really matter what the starting rating is, because the high uncertainty causes it to rapidly converge to where it should be, and it also prevents it from affecting the ratings of other players. The average of current players is a reasonable choice, given that there's no reason to assume current players are any better than new joiners (as would be the case in chess).

In TFW we do care about the player's starting rating, because we use it for things other than affecting other people's scores (for example, matching). However, we have the advantage of knowing something about the new player: they're not very likely to be TFW experts when they join, because they've never had any experience with the game (unlike the chess situation).

When we give a new player a rating of 1300(350), we're misleading the Glicko system. In effect, we're telling it that the player's real ability can be anywhere between 950 and 1650, even though we know that it's impossible for it to be anywhere near 1650 (and 1300 is pretty much out of the question, too). If we gave her a score of, say, 800(350), it would mean that her true ability lies somewhere in the 450-1150 range (which sounds like a reasonable range for new players).

[[ I didn't see anything in Mark Glickman's docs about starting new players with the average. I really think he doesn't care about starting ratings because he's only concerned with how ratings affect other ratings, and high uncertainty takes care of that. ]]
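For anyone who wants to stick the numbers from point 2) into the formulas, here is a minimal sketch. It assumes the standard Glicko-1 equations with a single game as the rating period, which may not be exactly how TFW applies them; the function names are only illustrative.

```python
import math

Q = math.log(10) / 400  # Glicko-1 scaling constant


def g(rd: float) -> float:
    """Attenuation factor based on the opponent's RD."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def expected_score(r: float, r_opp: float, rd_opp: float) -> float:
    """Expected score (0..1) against the given opponent."""
    return 1 / (1 + 10 ** (-g(rd_opp) * (r - r_opp) / 400))


def rating_change(r: float, rd: float, r_opp: float, rd_opp: float,
                  score: float) -> float:
    """Rating change after one game (score: 1 win, 0 loss)."""
    e = expected_score(r, r_opp, rd_opp)
    inv_d2 = Q**2 * g(rd_opp) ** 2 * e * (1 - e)
    return Q / (1 / rd**2 + inv_d2) * g(rd_opp) * (score - e)


if __name__ == "__main__":
    # The 1300(350) player against the 1200(50) player.
    print("expected score:", expected_score(1300, 1200, 50))
    print("points for a win:", rating_change(1300, 350, 1200, 50, 1))
    print("points for a loss:", rating_change(1300, 350, 1200, 50, 0))
```

The expected score comes out above 0.5, so the system treats the 1300(350) as the favourite, and the loss is larger in size than the win.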

jed wrote:
I just have to change how uncertainty affects things. I haven't changed it yet since there aren't enough people online right now. Seems better to give them a game vs a better player than no game.


If you mean matching people by (rating-uncertainty), then I would argue against this. A returning 1570(350) player should be matched with her equals - 1570(50) players. They're not any better than she is, they've just played more recently, so their score is more certain. If she's matched with 1270(50) players (same rating-uncertainty), she'll crush them. This logic only breaks down when matching 1300(350) noobs - but that's because they shouldn't really be 1300's, not because it's sound to use uncertainty in matching.

(actually, it is sound to use uncertainty in matching, but you have to use it both ways: the 800(350) noob can be matched with anyone in the 450-1150 range, while an established 1250(50) only has the 1200-1300 range. This also solves the problem of noobs not finding sim games)
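A sketch of the two-way idea in the parenthetical above. The rule used here (an opponent is acceptable when her rating falls inside the searching player's rating ± uncertainty window) is one possible reading, not something TFW is known to implement, and the names are made up.

```python
from typing import List, Tuple

# A player is represented as (rating, uncertainty) for this sketch.
PlayerInfo = Tuple[float, float]


def match_range(rating: float, uncertainty: float) -> Tuple[float, float]:
    """Window of opponent ratings: rating - uncertainty .. rating + uncertainty."""
    return (rating - uncertainty, rating + uncertainty)


def eligible_opponents(seeker: PlayerInfo, pool: List[PlayerInfo]) -> List[PlayerInfo]:
    """Players in the pool whose rating lies inside the seeker's window
    (assumed one-directional rule: only the seeker's window is checked)."""
    low, high = match_range(*seeker)
    return [p for p in pool if low <= p[0] <= high]


if __name__ == "__main__":
    pool = [(500, 50), (1250, 50), (1100, 300)]
    print(match_range(800, 350))    # window for the 800(350) noob
    print(match_range(1250, 50))    # window for the established 1250(50)
    print(eligible_opponents((800, 350), pool))
```

The wide window is what gives the noob more potential opponents, while an established player's narrow window keeps her games close.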


 Post subject: Re: Suggestions from a relative noob – Starting Ratings
 Post Posted: Sun Apr 04, 2010 2:19 am 
Joined: Tue Jul 28, 2009 1:47 am
Posts: 150
Psyclone, I think we might be agreeing on most issues, but just misunderstanding each other.
I certainly didn't mean that your suggestions are hacking the Glicko system itself - you didn't suggest any kind of alteration to it!

The thing is, TFW has many auxiliary systems that take as input the ratings produced by the Glicko system. Specifically, there's the matching system, and the "rating display" system.

Right now, these systems are very simple. The matching system just checks that the ratings are within a certain range (I think a recent change made that range RD dependent). The display system just displays the ratings as they are (with RD in parentheses). Both of them work well, except for the specific case of new players.

This can be solved by giving new players lower, realistic scores (the method I prefer). It can also be solved by introducing specific exceptions to both systems, to deal with new players' unrealistically high scores (the method you seem to prefer). When I said that these exceptions are "hacking the system", I didn't mean the rating system itself, but rather the match-making system, and the (purely cosmetic) display system.

My problem with introducing these exceptions is twofold:
1. It makes those systems more complicated.
2. You will have to introduce such exceptions to every future system that looks at ratings, because the basic issue (new players having unrealistic scores) has not been solved.

Psyclone wrote:
1) You've arbitrarily chosen the starting rating of 1300 instead of 1500. Why not 800 then? From what I understand of the Glicko rating system, arbitrarily lowering the starting rating to 800 from 1300 or 1500 won't break down the system and shouldn't have any major or long-term effects. Glicko is not a zero-sum system like the original ELO.


I completely agree with everything you've said here. Now you just need to convince Jed. ;)

Psyclone wrote:
2) You're already admitting that 1300 (350) is not a reasonable starting rating because you're using -1*RD to set up matches. If 1300(350) was a reasonable starting rating, then you should be matching the player against equally higher and lower rated players.


Again, I agree completely. 1300(350) is not a reasonable starting rating - this is the main point of all my posts in this thread. As I said in my post above, matching by (rating-1*RD) instead of just rating is a hack that is only made necessary because of this unreasonably high rating, and will cause problems when matching old players whose RD is high because of inactivity.
(However, I think this is not yet implemented, and we are now matched by just rating).

Psyclone wrote:
Once again, the Glicko rating system is not a matching system. What I've been suggesting all along is a change to the matching system, not the rating system.


I understand and agree. I'm just saying that if the starting rating is fixed, we wouldn't need this change to the matching system, because simple matching by rating would work across the board.

