Who name selection criteria
- bringoutyourdead
- Forums Admin & general flunky
- Posts: 1432
- Joined: Fri Nov 07, 2008 1:11 pm
- Location: Texas, USA
Every time I've run CensusPlus I've felt uncomfortable about how the /who name requests are created.
The ideal process makes the fewest requests needed to find every character currently online.
Currently Rollie uses:
local function GetNameLetters()
    return { "a", "b", "c", "d", "e", "f", "g", "i", "o", "p", "r", "s", "t", "u", "y" };
end
That is a total of 15 letters instead of the assumed worst case of all 26 English letters. The problem is that Blizzard can't limit names to strict ASCII.
Covering English, German, Spanish, Portuguese, and French takes 42 symbols from the Latin-1 character set (ignoring case, per Wikipedia).
And I've already spotted characters in use that are in the Latin-1 character set but are not used by any of the above languages.
The problem I see is that we are certainly making requests that return data we have already seen, and we are almost certainly missing the edge cases.
I'm not yet sure how WhoLib handles the diacritics and ligatures used by most of the covered languages, or Blizzard's UTF-8 character set, which includes characters not in the above languages. Nor am I sure yet how the WoW API responds to single-character name requests, i.e. which character/case/accent combinations match.
I'm looking at letter frequency in character names by examining names from the top (membership) guilds on realms in all languages, using Wowhead to get guild numbers and Blizzard's web API to download the membership rosters.
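Because names are UTF-8, a name such as Äçóòñ occupies more bytes than it has letters, which matters for any per-letter analysis. The following is a minimal sketch in plain Lua (no WoW API calls); the helper name `utf8Chars` is hypothetical, and the pattern assumes well-formed UTF-8 input.

```lua
-- Split a UTF-8 string into its characters (one table entry per character).
-- The pattern matches one lead byte followed by any continuation bytes.
local UTF8_CHAR = "[\1-\127\194-\244][\128-\191]*"

-- Hypothetical helper, not part of CensusPlus or WhoLib.
local function utf8Chars(name)
  local chars = {}
  for ch in string.gmatch(name, UTF8_CHAR) do
    chars[#chars + 1] = ch
  end
  return chars
end

local name = "Äçóòñ"      -- assumes this source file is saved as UTF-8
local chars = utf8Chars(name)
print(#name, #chars)       -- byte length vs. character count
print(chars[1])            -- first character of the name
```

With a splitter like this, counting letters or picking a first letter works per character rather than per byte. How the game itself matches accented letters in a /who request is still the open question raised above.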
Already I've found a disturbing error.
From US - Cenarius, guild Infection:
I see member Äçóòñ, who as of his last activity is a level 67 Draenei Priest and appears to have been in the 60s throughout all of June and late May.
Yet a character query here returns a level 16 priest seen only on May 23 of this year, even though Cenarius/Alliance has always been a well-covered faction, currently at 300 censuses in the last 30 days.
So the questions are:
Why has Äçóòñ been missed in census runs when he was very active during the month of May?
Or: why/how was he even found once?
I'm going to continue researching this problem, and I hope to be able to provide Rollie with some code to help cover all names, maybe even with less duplication in the request pattern.
Last edited by bringoutyourdead on Mon Oct 24, 2011 2:52 pm, edited 1 time in total.
bringoutyourdead wrote:So the questions are:
Why has Äçóòñ been missed in census runs when he was very active during the month of may.
Or.. Why/How was he even found once!
When "/who 60-69" returns 48 or fewer online players, it doesn't delve into classes or races. If that player was one of those 48 (or fewer) people online, he has already been seen as online.
Similarly, if 60-69 returns more than 49, it starts querying by race (without classes or names yet). Here, fewer than 48 might be applicable, and he will get caught here.
Or if 49 or more Draenei were online, there might be fewer than 48 priests online.
I'm sure you get the drift.
However, if more than 49 Draenei Priests of level 67 are online at the same time, he will never be caught, unless he is actually within the first 48 found.
I hope this makes sense...
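The drill-down described in this reply can be sketched roughly as follows. This is not CensusPlus code: `who` is a mock standing in for the real /who request, the roster is generated, and `WHO_CAP` is an assumed cap (the thread mentions both 48 and 49). Only the level-range and race steps are shown.

```lua
-- Assumed cap: a reply with this many rows is treated as possibly truncated.
local WHO_CAP = 49

-- Mock roster standing in for the players currently online.
local online = {}
for i = 1, 120 do
  online[i] = {
    name  = "Player" .. i,
    level = 60 + (i % 10),
    race  = (i % 3 == 0) and "Draenei" or "Human",
  }
end

-- Mock /who: returns at most WHO_CAP players that pass every filter.
local function who(filters)
  local rows = {}
  for _, p in ipairs(online) do
    local ok = p.level >= filters.minLevel and p.level <= filters.maxLevel
    if ok and filters.race and p.race ~= filters.race then ok = false end
    if ok then
      rows[#rows + 1] = p
      if #rows == WHO_CAP then break end
    end
  end
  return rows
end

local RACES = { "Draenei", "Human" }
local seen, requests = {}, 0

local function record(rows)
  for _, p in ipairs(rows) do seen[p.name] = true end
end

local function census(minLevel, maxLevel)
  requests = requests + 1
  local rows = who({ minLevel = minLevel, maxLevel = maxLevel })
  record(rows)
  if #rows < WHO_CAP then return end   -- complete answer, no split needed
  for _, race in ipairs(RACES) do      -- reply was capped: refine by race
    requests = requests + 1
    record(who({ minLevel = minLevel, maxLevel = maxLevel, race = race }))
    -- A real scan would refine further (class, then name letter) when this
    -- reply is capped too; anyone beyond the cap at the last step is missed.
  end
end

census(60, 69)
local count = 0
for _ in pairs(seen) do count = count + 1 end
print(requests, count, #online)
```

In this mock the Draenei query comes back under the cap and is complete, while the Human query is capped, so some Humans stay unseen. That is the "never caught unless within the first 48" case from the reply.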
Doing census mainly on Draenor; Raluf - Nimsay - Lusmo - Quixx - Sosyan - Garthog - Trubin - Zalistra - Zesmi and Djaang
- bringoutyourdead
My example Äçóòñ was last (and only) seen by CensusPlus last May.
Cenarius, a high-population server (currently >15,000 Alliance), currently has 132 Draenei at levels 60-69, 108 characters of all races at level 67, and 13 Draenei at level 67.
Assuming minimal changes in the last 5 months (a bad assumption, but the only one we can work with), most likely at least one of the 300 average census runs during the month would have drilled down to at least:
level67/Draenei
which would return fewer than 49, so a name split would not be needed.
Part 2 of the question is answered: since he was an active player during the timeframe, he would have been found. But part 1 still stands: with the player being active for the month and 300 census runs, how could he have been missed every time except once?
He could very easily be missed: if his active hours are not the same as those of the person running the censuses on the server, he won't appear. 300 censuses in a month is only 10 a day. If somebody plays, say, evenings only with it censusing every 5-10 minutes, while your example player plays mornings only and logged on in the evening just that one time when he was level 16, then there's your answer.
There's no way to account for different playtimes really; a server would need at least 72 censuses per day to stand a reasonable chance of catching even 90% of people who log in (and that's assuming they're spaced perfectly evenly, one every 20 minutes, which still misses anybody who only logged on for 10 minutes in between). So chances are, that is why your example has been missed.
Balgair brings up a very valid point... the current data for Cenarius is well spread over the course of a day... but that was not necessarily true of May / June, when the mentioned character was active.
Just as an example... the character could have been active only during early morning hours on weekdays, while whoever was censusing only played during evenings and weekends due to work / school.
Also, 72 is actually far too low a figure... "Full" coverage would require scans at roughly 1 minute intervals on low population factions, and even high population ones would require roughly 3 minute intervals to have good coverage of the low population hours (mostly between 02 and 06 in EU).
So, the figure lies somewhere between 480 and 1440 scans per day... and even then, all characters would have to stay online over 5 minutes to guarantee that at least one scan hits them.
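The figures quoted in this exchange follow from dividing the minutes in a day by the scan interval. A small sketch of that arithmetic (the function name `scansPerDay` is just for illustration):

```lua
local MINUTES_PER_DAY = 24 * 60

-- Number of evenly spaced scans that fit into one day.
local function scansPerDay(intervalMinutes)
  return MINUTES_PER_DAY / intervalMinutes
end

print(scansPerDay(20))  -- one scan every 20 minutes (the "72 per day" case)
print(scansPerDay(3))   -- one scan every 3 minutes
print(scansPerDay(1))   -- one scan every minute
print(300 / 30)         -- 300 censuses spread over 30 days, per day
```

The last line is the current Cenarius coverage mentioned earlier in the thread, which is well below any of the three interval-based figures.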
And as we all know, this isn't how things work... In fact, I miss some of my own characters on a daily basis, because it's really rare for me to have time to run a full scan on every single one of them.
For my own benefit, I concentrate on those characters of mine who manage to level up, are affected by a guild disbanding or by joining one, etc. Also, characters on low population factions like EU-Scarshield Legion are easier, as the census length is only a couple of minutes even at peak times... as opposed to EU-Defias Brotherhood, where 15+ minutes isn't that rare.
And IF I had multiple mains on EU-Silvermoon Alliance, the situation would be even worse, as the census length can run to over 30 minutes... and still mostly covers 85s... I have actually considered requesting that Rollie reverse the scanning to start from 1 and go upwards... that way more partial scans would cover the levelling characters.
But that's just my point of view...
In any case, happy censusing, everyone!
- bringoutyourdead
After my last post .. I realized that I should have added one more piece of info.
http://us.battle.net/wow/en/character/C ... óòñ/simple
Here is the Blizzard data page for the character. As you can see from the full (50 event) activity list, he was very busy in battlegrounds and dungeons, i.e. this isn't a quick in-and-out player. I think you can safely say at least 1 hour.
But it does still lead to the question of what hours of the day he played vs. when census runs are done.
The other thing to remember is that CensusPlus found him at level 16 on the 23rd,
and 7 days later he is getting achievements for Sunken Temple (min level 45), Stratholme (min level 37), and Blackrock Depths (min level 42), and two days later he records his 4th kill of Emperor Dagran Thaurissan (min level 47).
And two days after that he gains Classic Dungeonmaster and level 60.
While leveling is faster than it used to be, he had to have been playing for hours.
Also, this is one of the realms I played on, and I remember census runs taking much more than 5 minutes each (with a reasonably fast machine); there were lots of people on.
Oh well.. I've grabbed the top 20 guilds on Cenarius (>7000 character names).
Time to do the same for language opening realms in the EU region.
You forget the possibility that the character might have benefited from the RAF program and been leveling way beyond "normal" speeds.
Also... there is a small chance that Blizzard's changes to the game might have rendered characters with only special characters in their names "invisible" to /who... You could test for that, I think...
- bringoutyourdead
I am building a list of characters from different realms in order to better understand the letter frequency breakdown.
For the English language the analysis shows the frequency sequence as "ETAONRIS HDLF CMU GYPWB VKXJQ Z".
The 'top twelve' letters comprise about 80% of the total usage. The 'top eight' letters comprise about 65% of the total usage. (from Wikipedia)
If you are interested see http://en.wikipedia.org/wiki/Letter_frequency where letter frequencies for other languages are included.
I have captured over 22,000 character names using Blizzard web API calls from five realms so far
(one each Alliance|Horde on US realms, ditto for EU-Eng realms, and one EU-FR Horde realm).
I am in the process of working on the following: an EU-FR Alliance realm, EU-De Die Todeskrallen for both sides, and EU-Sp C'thun for Horde.
These realms are at or near the highest census reports for their region.
I am looking for more realms to capture, with preference toward those more highly populated by people from countries and languages other than the base expected languages. My goal is to capture about 100,000 character names, or about 3% of total characters per Rollie's quick stats.
The ten most common letters in the various languages change with each language.
English: 'etaoi nshrd'
French: 'esait nrulo' (Indo-European: Romance)
Spanish: 'eaosr nidlc' (Indo-European: Romance)
Portuguese: 'aeosr indmt' (Indo-European: Romance)
Italian: 'eaion lrtsc' (Indo-European: Romance)
German: 'enisr atdhu' (Indo-European: Germanic)
Swedish: 'eantr slido' (Indo-European: Germanic)
Turkish: 'aeinr ldkmu' (Turkic: a non Indo-European language)
Dutch: 'enati rodsl' (Indo-European: Germanic)
Polish: 'aoiez nscwr' (Indo-European: Slavic)
What I don't know (for English) is what percentage of all English words would be captured, after allowing for repeated letters within words, when only the top 10 letters are used as keys in selecting the words.
This information can be, and is, used (by Rollie) in doing name selection breakouts when /who requests pull too many names.
However, two problems occur when the most frequent letters are used. First, you get a lot of duplication in the names. Second, you can run into the problem where more than the maximum returnable number of names match the selection letter, and the /who request fails with no further ability to subset your selection.
So a better approach may be to select for the least used letters. The /who requests will return quickly with zero or few names.
The decision to quit searching by letter can be set to a fixed count (Rollie currently uses 15 letters) or to when x number (or percentage) of duplicate names are returned for the same census run.
This has the advantage of possibly faster census runs and most certainly more accurate runs, with the understanding that this issue only occurs on very high population census runs, for example Human/top level/paladin/letter 'a'. This is an edge condition which doesn't happen often, but I have seen it happen when running in verbose mode. We can have a high expectation, but not certainty, that the names will get captured later under the other 14 key letters.
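One way to read the proposal above is sketched below: query the least used letters first and stop once a given share of all names returned in the run are duplicates. Everything here is illustrative. The names, the letter order, and the `MAX_DUP_RATIO` threshold are made up, and `whoByLetter` is a mock rather than a real /who call (it also only handles ASCII letters).

```lua
-- Mock result set for one over-full bucket (names are made up).
local bucket = { "Zaxq", "Jovik", "Quell", "Xanj", "Arena", "Elara", "Rine" }

-- Mock name query: every name in the bucket containing the letter.
local function whoByLetter(letter)
  local rows = {}
  for _, name in ipairs(bucket) do
    if string.find(string.lower(name), letter, 1, true) then
      rows[#rows + 1] = name
    end
  end
  return rows
end

-- Letters ordered from least to most used (hypothetical ordering).
local letters = { "q", "j", "x", "z", "v", "k", "l", "n", "i", "r", "e", "a" }
local MAX_DUP_RATIO = 0.6   -- hypothetical stop threshold

local function scan()
  local seen, requests = {}, 0
  local returned, dups = 0, 0
  for _, letter in ipairs(letters) do
    requests = requests + 1
    for _, name in ipairs(whoByLetter(letter)) do
      returned = returned + 1
      if seen[name] then dups = dups + 1 else seen[name] = true end
    end
    -- Stop once most of what the run returns is already known.
    if returned > 0 and dups / returned >= MAX_DUP_RATIO then break end
  end
  return seen, requests
end

local seen, requests = scan()
local count = 0
for _ in pairs(seen) do count = count + 1 end
print(requests, count, #bucket)
```

The stopping rule is a trade-off: a low threshold saves requests but can end the scan before a name that only contains common letters has been returned, so it would need tuning against real data.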
Currently (at >22,000 names) I see the following frequency of the 57 letters used by the Latin-1 Character set. (ignoring cap/lowercase)
ASCII letters:
letter,A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z
count,13502,3069,3626,4968,11416,2046,3403,5517,9939,1033,4225,8185,4582,9396,8372,2553,283,10833,7839,7131,4771,1669,1788,1097,3497,2073
rank,1,18,15,11,2,21,17,10,4,25,14,7,13,5,6,19,27,3,8,9,12,23,22,24,16,20
Accented and other Latin-1 letters:
letter,Ä,Ö,Ü,ß,È,Á,É,Ú,Â,Ê,Î,Ô,Û,Ë,Ï,ÿ,Ç,Ñ,Æ,Å,Ø,À,Ì,Í,Ð,Ò,Ó,Õ,Ù,Ý,Þ
count,180,140,62,34,95,157,314,36,72,62,78,56,23,106,124,47,27,15,47,85,212,90,93,160,32,53,115,11,30,11,0
rank,29,32,42,47,36,31,26,46,41,42,40,43,51,35,33,45,50,52,45,39,28,38,37,30,48,44,34,53,49,53,57
or sorted by rank
A,E,R,I,N,O,L,S,T,H,D,U,M,K,C,Y,G,B,P,Z,F,W,V,X,J,É,Q,Ø,Ä,Í,Á,Ö,Ï,Ó,Ë,È,Ì,À,Å,Î,Â,Ê,Ü,Ô,Ò,Æ,ÿ,Ú,ß,Ð,Ù,Ç,Û,Ñ,Õ,Ý,Þ
The last letter (Þ), which hasn't been seen in any name, is only used in the Icelandic language.
Rollie uses:
Letters: a,b,c,d,e,f,g,i,o,p,r,s,t,u,y
Ranking: 1,18,15,11,2,21,17,4,6,19,3,8,9,12,16
(where ranking is what I have found so far)
I am not saying that what Rollie uses is wrong by any means, but I do wonder if better selection and end-of-selection criteria could make CensusPlus more complete and/or faster.
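A frequency table like the one in this post could be produced along the following lines. This is a sketch, not the script actually used. It assumes the counts are total occurrences of each letter across all names, folds case for ASCII and for the Latin-1 letters (leaving ß and ÿ as they are, as in the table), and runs on a made-up three-name sample. The helper names are hypothetical.

```lua
local UTF8_CHAR = "[\1-\127\194-\244][\128-\191]*"

-- Fold one character to upper case: ASCII via string.upper, and the
-- Latin-1 letters U+00E0..U+00FE (except the division sign U+00F7) by
-- shifting the second UTF-8 byte down by 32. ß and ÿ are left unchanged.
local function foldUpper(ch)
  if #ch == 1 then return string.upper(ch) end
  local b1, b2 = string.byte(ch, 1, 2)
  if #ch == 2 and b1 == 195 and b2 >= 160 and b2 <= 190 and b2 ~= 183 then
    return string.char(b1, b2 - 32)
  end
  return ch
end

-- Count every occurrence of every (case-folded) character.
local function letterCounts(names)
  local counts = {}
  for _, name in ipairs(names) do
    for ch in string.gmatch(name, UTF8_CHAR) do
      local key = foldUpper(ch)
      counts[key] = (counts[key] or 0) + 1
    end
  end
  return counts
end

-- Turn the counts into a list sorted by descending count.
local function ranked(counts)
  local list = {}
  for letter, n in pairs(counts) do
    list[#list + 1] = { letter = letter, count = n }
  end
  table.sort(list, function(a, b)
    if a.count ~= b.count then return a.count > b.count end
    return a.letter < b.letter
  end)
  return list
end

-- Made-up sample, not the 22,000 captured names; file assumed to be UTF-8.
local sample = { "Äçóòñ", "Legolas", "Élara" }
for rank, entry in ipairs(ranked(letterCounts(sample))) do
  print(rank, entry.letter, entry.count)
end
```

If the intent is instead to count how many names contain a letter at least once, which is closer to what a /who letter query returns, the inner loop would need to record each letter only once per name.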
-
- Superior Census Taker
- Posts: 63
- Joined: Sun Sep 04, 2005 6:51 am
Quoting from another thread:
"The reason for my question was to find realms with people who might use their native language when selecting character names. So from what you have said I know that I should concentrate on the EU-Eng realms."
I should think that all realms follow this pattern. But I would expect that what you will find is basically this: French, German and Spanish realms will resemble the Wikipedia data, while the English ones will have the highest diversity, due to containing both absolutely and relatively the largest numbers of non-native English speakers.
Also, your current method of massing all data into one is actually causing you to create data on the most common letters on US and EU realms combined. Which, of course, is helpful in a way, especially from the CensusPlus point of view.
Now, let's compare:
Current 15: A, B, C, D, E, F, G, I, O, P, R, S, T, U, Y
Data 15: A, C, D, E, H, I, K, L, M, N, O, R, S, T, U
Which leads to:
Common: A, C, D, E, I, O, R, S, T, U
Which in turn leads to this:
Option 1: Cutting the number of checks to those 10 will reduce the time needed, but will also drop accuracy.
Option 2: No changes at all, no change to current situation.
Option 3: Drop B, F, G, P and Y, add H, K, L, M and N. In theory, no time change, but increased accuracy.
Option 4: Add H, K, L, M and N. Accuracy and time needed increased.
Option 5: Use option 4 and add one or more of the following in this order of preference: Z, W, V, X, J and É. With each addition time and accuracy increase.
After those, your data reaches a statistically significant breakpoint, after which adding additional letters will greatly increase the time needed but will not noticeably affect accuracy any more.
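The comparison behind these options is plain set arithmetic on the two 15-letter lists, and can be reproduced with a short sketch (the helper name `filter` is only for illustration):

```lua
local current = "ABCDEFGIOPRSTUY"   -- the 15 letters CensusPlus uses now
local data    = "ACDEHIKLMNORSTU"   -- the 15 most frequent letters in the sample

-- Letters of `a` that are (keep = true) or are not (keep = false) in `b`.
local function filter(a, b, keep)
  local out = {}
  for letter in string.gmatch(a, "%a") do
    local inB = string.find(b, letter, 1, true) ~= nil
    if inB == keep then out[#out + 1] = letter end
  end
  return table.concat(out, ", ")
end

print("Common:", filter(current, data, true))
print("Drop:  ", filter(current, data, false))
print("Add:   ", filter(data, current, false))
```

Rerunning this with an updated "Data 15" string after more names are captured would show directly whether the drop and add lists in Option 3 still hold.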
I suggest that you collect more data and make a new post after a while, so we can see if the additional data has any significant impact on the frequency... Adding more French data will add more És, but whether it will be enough to actually change the order significantly remains to be seen... Adding more Spanish data will also increase the absolute number of some special characters, but I doubt their impact on the actual frequency ranking, as they don't appear alone in normal language. Granted, some people go overboard with special characters in their names.
Which leads to the possibility that Blizzard's naming policies cause an "effects of X-rays on zebras in Qatar" sort of thing, as mentioned in Wikipedia...
- bringoutyourdead
gmmmpresser wrote:I admire your diligence.
But considering we have not seen or heard of Rollie since Jan, I just wonder if it is all worth the effort.
Well, since for me it is mostly an exercise in logic and Lua programming training, it is worth the effort even if it goes nowhere.
However, I hope that isn't the case. On the other hand, someone needs to check with Mr. Sellers, since the domain name Warcraftrealms.com expires before the end of November!
- bringoutyourdead
1974ER wrote:Quoting from another thread:
Also, your current method of massing all data into one is actually causing you to create data on what are the most common letters on US and EU realms combined. Which, of course is helpful in a way, especially from the Censusplus point of view....
Yes, I'm aware of that.

1974ER wrote:I suggest that you collect more data and make a new post after a while, so we can see if the additional data has any significant impact on the frequency ...
I will do that.
1974ER wrote:Granted, some people go overboard with special characters in their names.
That is without a doubt. On the other hand, people are generally lazy in selecting names. I expect this is the reason that you will find a lot more names starting with the letter "A" than is normal for name words.
Many violations of Blizzard's naming rules are active in character names: famous real people's names, names from others' intellectual property (e.g. movie or book character names), etc. I expect you will find as many copies of the name Legolas as there are realms in WoW. The other violation of the naming rules is one that impacts CensusPlus directly: the pure gibberish names.
- bringoutyourdead
Blizzard seems to be having a problem with the web API setup out of the Paris server farm???
I'm trying to get guild info for realm "Chants éternels"
http://eu.battle.net/api/wow/guild/Chan ... ds=members
which should return a file with guild roster data, fails.
The apparent top x population guilds all fail.. grrr
I'm getting 404 and 500 (server maintenance or programming error) on my requests.
A Google search found a character on that realm,
and
http://eu.battle.net/wow/en/character/c ... tte/simple works
http://eu.battle.net/api/wow/character/ ... inaigrette? mostly fails but very rarely succeeds
http://eu.battle.net/api/wow/character/ ... elds=guild
has yet to return anything other than a 404 error..
sigh
Ah.. found a live guild on that realm.. and got results...
I wonder if Wowhead has a problem and gave me a list of bogus guilds.. or did that realm have a massive exodus?
I guess I'll ignore this realm and get another.. or find a better source of guild names.
I just blew up Wowhead's data for guild Cerberus: I did a resync, and they got back 200+ records of UNKNOWN!!
At least I got all 16 Horde guilds listed as >200 members out of C'thun (EU-Sp) with only one 500 error message, and a repeat attempt a minute later got that missing guild.
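Since a repeat attempt a minute later recovered the one guild that returned a 500, a roster download loop could retry on that status. The sketch below is hypothetical throughout: `fetch` and `wait` are injected stand-ins (no real HTTP library is assumed), the URL string is a placeholder, and giving up at once on any other status such as 404 is an assumption that may be too strict given the intermittent 404s described above.

```lua
-- fetch(url) is a hypothetical function returning an HTTP status and a body;
-- wait(seconds) is a hypothetical pause. Both are passed in so the sketch
-- does not depend on any particular HTTP library.
local function fetchWithRetry(fetch, wait, url, maxTries, delaySeconds)
  for attempt = 1, maxTries do
    local status, body = fetch(url)
    if status == 200 then return body, attempt end
    -- Assumption: only a 500 is worth retrying; anything else gives up.
    if status ~= 500 then return nil, attempt, status end
    if attempt < maxTries then wait(delaySeconds) end
  end
  return nil, maxTries, 500
end

-- Mock server: fails once with 500, then answers.
local calls = 0
local function mockFetch(url)
  calls = calls + 1
  if calls == 1 then return 500, nil end
  return 200, "roster for " .. url
end

local function mockWait(seconds) end   -- no real sleeping in the sketch

local body, tries = fetchWithRetry(mockFetch, mockWait, "guild-roster-url", 3, 60)
print(tries, body)
```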
The letter A happens to be very common in many languages, so it's probably not laziness...
Amy, Ami, Ana, Ara, Ari, Aki, Ann, Ali, Mae, Mia, Lea, Al, Ab, Alf, Art, etc... plenty of short names contain an A (or several).
Yes, gibberish names (especially the "xfgfghdgfdgfdhfgdh" types favored by bots) really mess things up...
As for that French realm... check your spellings as you have at least "éternels", "eternels" and "éternals" in your last post... :/
Other than that, I look forward to seeing more results...

- bringoutyourdead
1974ER wrote: As for that French realm... check your spellings as you have at least "éternels", "eternels" and "éternals" in your last post... :/
Other than that, I look forward to seeing more results...
Arrgh! I knew I should have cut and pasted.. that was just my fumble fingers in the message.. all the requests were done correctly as éternels.
And of course the Wowhead update failure was their coding..
Oh, and I'm over 30k names in my database, and grinding along.
Also, as an additional thought... Once MoP is released, the racial spread will change again, and the number of cases with 50+ characters of the same race, class and level online simultaneously will drop. Despite that, census run times will increase due to the added complexity of checking any single level whose result exceeds 49...
Provided, of course, that Rollie actually has enough time to rebuild CensusPlus as the current interface doesn't have enough room for adding both a race and a class and even more levels... 1-5 are already not visible...
I guess we'll just have to wait and see... :/
- bringoutyourdead
bringoutyourdead wrote: I'm not yet sure how the WhoLib handles diacritics and ligatures used by most of the covered languages, or the Blizzard UTF-8 character set, which includes characters not in the above languages. Nor am I sure yet how the WoW API responds to single-character name requests.. i.e. which character/case/accent combos.
Over the weekend I had a programming revelation... collation.
I know Blizzard uses Oracle and MySQL for the database engines.. but I don't think they have ever acknowledged which engine is used for what purpose.
Either way.. when a SQL query is generated (which after all is what a /who request turns into for Blizzard), the result is determined by the collation settings.
I am also assuming that Blizzard would choose to keep all servers on all realms, no matter what region or datacenter, on the same collation. I am also assuming that they would choose the general (default) one, utf8_general_ci.
This means that a search for "A" would return a match on A, a, À, à, Á, á, Â, â, Ã, ã, Ä, ä, Å, å, Æ, æ, where all variations are considered the same letter.
This means the selection criteria return to the standard 26 letters, with an insignificant frequency change due to the diacritics and ligatures.
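To illustrate what an accent- and case-insensitive comparison of this kind does, here is a rough Python sketch. It is not MySQL's utf8_general_ci and says nothing about what Blizzard's servers actually do; `fold_letter` is a hypothetical helper that only approximates the idea through Unicode decomposition. Ligatures and stroked letters such as Æ and Ø have no decomposition, so this sketch leaves them as separate letters; whether a real collation folds them is a separate question.

```python
import unicodedata


def fold_letter(text):
    """Approximate a case- and accent-insensitive key for text.

    Decomposes each character (NFKD), drops the combining accent
    marks, and lowercases what is left.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


# The accented variants of A listed above (without the ligature Æ).
variants = ["A", "a", "À", "à", "Á", "á", "Â", "â", "Ã", "ã", "Ä", "ä", "Å", "å"]
same_as_a = all(fold_letter(v) == "a" for v in variants)
print(same_as_a)
print(fold_letter("Äçóòñ"))
```

If the server folds names this way for searches, one query per base letter would be enough to catch the accented variants as well.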
- bringoutyourdead
bringoutyourdead wrote: Currently (at >22,000 names) I see the following frequency of the 57 letters used by the Latin-1 character set (ignoring upper/lowercase):
letter A B C D E F G H I J K L M
count 13502 3069 3626 4968 11416 2046 3403 5517 9939 1033 4225 8185 4582
rank 1 18 15 11 2 21 17 10 4 25 14 7 13
letter N O P Q R S T U V W X Y Z
count 9396 8372 2553 283 10833 7839 7131 4771 1669 1788 1097 3497 2073
rank 5 6 19 27 3 8 9 12 23 22 24 16 20
letter Ä Ö Ü ß È Á É Ú Â Ê
count 180 140 62 34 95 157 314 36 72 62
rank 29 32 42 47 36 31 26 46 41 42
letter Î Ô Û Ë Ï ÿ Ç Ñ Æ Å
count 78 56 23 106 124 47 27 15 47 85
rank 40 43 51 35 33 45 50 52 45 39
letter Ø À Ì Í Ð Ò Ó Õ Ù Ý Þ
count 212 90 93 160 32 53 115 11 30 11 0
rank 28 38 37 30 48 44 34 53 49 53 57
or sorted by rank:
A,E,R,I,N,O,L,S,T,H,D,U,M,K,C,Y,G,B,P,Z,F,W,V,X,J,É,Q,Ø,Ä,Í,Á,Ö,Ï,Ó,Ë,È,Ì,À,Å,Î,Â,Ê,Ü,Ô,Ò,Æ,ÿ,Ú,ß,Ð,Ù,Ç,Û,Ñ,Õ,Ý,Þ
The last letter, which hasn't been seen in any name, is only used in the Icelandic language.
Rollie uses:
Letters: a,b,c,d,e,f,g,i,o,p,r,s,t,u,y
Ranking: 1,18,15,11,2,21,17,4,6,19,3,8,9,12,16
(where ranking is what I have found so far)
And at 35 thousand names:
letter A R E I N O L S T H U
count 20782 16909 16794 15016 14287 12845 12545 11804 10720 8266 7521
rank 1 2 3 4 5 6 7 8 9 10 11
letter D M K C Y G B P Z F V
count 7411 7292 6667 5674 5144 5070 4395 3922 3016 2923 2450
rank 12 13 14 15 16 17 18 19 20 21 22
letter W X J É Ø Ä Q Ï Ö Â Ë
count 2385 1788 1538 611 534 488 409 399 368 353 312
rank 23 24 25 26 27 28 29 30 31 32 33
letter Á Í Î Ô À Ó Ü Ê È Ì Æ
count 300 294 252 205 189 180 178 166 157 138 112
rank 34 35 36 37 38 39 40 41 42 43 44
letter Å Û ÿ Ò Ð Ú ß Ñ Ù Ç Ý
count 102 96 91 82 71 65 64 55 51 35 28
rank 45 46 47 48 49 50 51 52 53 54 55
letter Õ Þ
count 22 0
rank 56 57
So the letters D and U swap positions.. and the letter Q drops further down in frequency.
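For anyone wanting to build this kind of table, the counting itself is simple. Below is a minimal Python sketch, assuming the names have already been downloaded into a list. `letter_frequency` and the sample names are made up for illustration; this is not the script or the data behind the numbers above. It counts every occurrence of a letter, which is an assumption, since the post doesn't say whether its counts are per occurrence or per name.

```python
from collections import Counter


def letter_frequency(names):
    """Count how often each letter occurs across all names, ignoring case.

    Returns (letter, count, rank) tuples, most frequent first.
    Ties keep first-seen order and simply get consecutive ranks.
    """
    counts = Counter()
    for name in names:
        # lower() keeps one symbol per letter (e.g. ß stays ß)
        counts.update(ch for ch in name.lower() if ch.isalpha())
    ordered = counts.most_common()
    return [(letter, count, rank)
            for rank, (letter, count) in enumerate(ordered, start=1)]


# Hypothetical sample; the real input would be the downloaded guild rosters.
sample = ["Legolas", "Légolas", "Äçóòñ", "Anna"]
for letter, count, rank in letter_frequency(sample):
    print(letter, count, rank)
```

Accented letters are counted separately from their base letters here, matching the 57-letter layout of the tables above.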
Seems R and E also swapped places as did W and V. 
I admit that I know very little about databases, but based on how names work on Blizzard servers, I have to assume your assumption is faulty. Why? Because if it were true, it would be impossible to have a Legolas, Légolas, Legolás and Lególas on the same server, correct?
Not to mention that languages such as Finnish, Swedish and German would get severely butchered...
"Jäämies" to "jaamies" = from "Iceman" to "Yes-man",
"körtti" to "kortti" = from a certain type of religious person to a card,
"säkki" to "sakki" = from a bag to a group of fairly big size,
"mökä" to "moka" = from a loud noise to a big mistake,
"sää" to "saa" = from weather to may/is allowed to,
"isä" to "Isa" = from father to a name of a woman,
"mäki" to "maki" = from a hill to a specific type of ape.
"Åbu" to "Abu" = from a name of town to a name of a man,
"schön" to "schon" = from pretty to already, etc...
In some cases, even the language is changed...
"bär" (bear) to bar, "öl" (beer) to "ol" (oil), etc...
Granted, Blizzard probably didn't consider such points, but the point is that altering even one letter can and will have catastrophic consequences in certain situations.
Also, though this already falls outside the main point, some of the letters have similar pronunciations while others do not, and even the native language of the speaker affects things:
For example the letter É: Whether it comes out as a E, Ee or even Ö is variable.
Spanish doesn't pronounce H, but apply that to Finnish and you get gibberish... "Hilman haava" becomes "Ilman aava" and Hilma's wound turns into Air's wide open sea...
Even though he died a long time ago, you may have heard of a man called Larry A. Thorne... but he was born Lauri Allan Törni. He, like millions of other people, changed his name because it was easier for Americans to pronounce and write down.
And so things go today as well... a fairly well-known (here) reporter who lives in the US tells everyone to call her Twiggy, because nobody manages to write her real name, Varpu (meaning: twig), correctly.
