Who name selection criteria

Found a bug with the CensusPlus UI Mod? Post it here!
bringoutyourdead
Forums Admin & general flunky
Posts: 1432
Joined: Fri Nov 07, 2008 1:11 pm
Location: Texas, USA

Post by bringoutyourdead »

And the most common name (out of 62804 listed) is....
not worth telling..
there are 16 names listed exactly the same 3 times.
and 653 names that are exact duplicates (ie 2x)
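The duplicate tally above can be reproduced with a count of counts. This is only a minimal sketch: the sample names are made up for illustration, and matching is assumed to be exact (case and accents both count), as the "exactly the same" wording suggests.

```python
from collections import Counter

def multiplicity_histogram(names):
    """Map 'times a name is listed' -> 'how many distinct names are listed that often'."""
    per_name = Counter(names)          # name -> number of listings
    return Counter(per_name.values())  # listings -> number of names

# Made-up sample list, not data from the thread.
sample = ["Barbie", "Legolaz", "Barbie", "Legolãs", "Thrall", "Thrall", "Thrall"]
hist = multiplicity_histogram(sample)
print(hist[3], "name(s) listed 3 times,", hist[2], "name(s) listed 2 times")
```

Because the comparison is exact, Legolãs and Legolaz count as two different names here.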

And Legolas wasn't listed.. however Legolãs and Legolaz were. :roll:


Post by bringoutyourdead »

So what if we change selectors and use the top 15 most frequent letters as found in my names list?
This will have to be the next posting.
Most common name... there are 16 names that are used 3 times and 653 names used 2 times.

Out of 62804 names in list, 62687 are found and 117 are missed
Duplicate finding of names 253030 times.

Letters followed by a _ are the new substitute selectors.
Columns: letter and key number = count of new names found -- percent of all names newly found by this letter -- running total percentage -- duplicate names found again -- percentage duplicates -- duplicates vs. new finds

A 1 = 37077 -- 59.04%
E 2 = 14112 -- 22.47% -- 81.51% -- 16521 -- 26.31% -- 1.17:1
R 3 = 4449 -- 7.08% -- 88.59% -- 25697 -- 40.92% -- 5.78:1
I 4 = 3476 -- 5.53% -- 94.12% -- 23745 -- 37.81% -- 6.83:1
N _5 = 1075 -- 1.71% -- 95.84% -- 24539 -- 39.07% -- 22.83:1
O 6 = 1240 -- 1.97% -- 97.81% -- 22523 -- 35.86% -- 18.16:1
L _7 = 467 -- 0.74% -- 98.55% -- 22019 -- 35.06% -- 47.15:1
S 8 = 304 -- 0.48% -- 99.04% -- 21175 -- 33.72% -- 69.65:1
T 9 = 113 -- 0.18% -- 99.22% -- 19378 -- 30.85% -- 171.49:1
H _10 = 96 -- 0.15% -- 99.37% -- 14761 -- 23.50% -- 153.76:1
D 11 = 75 -- 0.12% -- 99.49% -- 13603 -- 21.66% -- 181.37:1
U 12 = 93 -- 0.15% -- 99.64% -- 13450 -- 21.42% -- 144.62:1 -- at this point we have found ~ the same character percentage as the current selector keys find.
M _13 = 60 -- 0.10% -- 99.73% -- 13262 -- 21.12% -- 221.03:1
K _14 = 39 -- 0.06% -- 99.80% -- 12092 -- 19.25% -- 310.05:1
C 15 = 11 -- 0.02% -- 99.81% -- 10265 -- 16.34% -- 933.18:1

Old key duplicates found 204876, new key duplicates found 253030:
Old key duplicates Vs. Found = 3.27:1
New key duplicates Vs. Found = 4.04:1

So we could swap selectors and get 0.16% (117) more names found from this universe of 62804 characters.
Or we could cut the number of letter selector keys by 3 and process slightly faster. Dropping to 217411 duplicates and 62583 found or a ratio of 3.47:1. :twisted:

Personally I would rather find the extra 100 characters in the universe. It is just a matter of asking the server the right questions.
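For anyone who wants to repeat this kind of tally on their own name list, here is a rough sketch of the bookkeeping. It is not the script used for the figures above. It assumes a selector "finds" a name when the letter appears anywhere in the name, ignoring case, and that accented letters do not match their plain counterparts; the real server-side matching may differ. The function name and sample names are hypothetical.

```python
def selector_stats(names, selectors):
    """Walk the selectors in order and count, for each one, the entries found
    for the first time ("new") and the entries an earlier selector already
    found ("dup"). Entries are tracked by list position, so the same name
    listed twice counts twice."""
    found = set()
    rows = []
    for letter in selectors:
        hits = {i for i, name in enumerate(names) if letter in name.lower()}
        new = hits - found
        dup = hits & found
        found |= new
        rows.append((letter, len(new), len(dup)))
    return rows, found

# Made-up sample list, not data from the thread.
names = ["Barbie", "Legolaz", "Thrud", "Xyz", "Mook"]
rows, found = selector_stats(names, "aer")

total_dups = sum(dup for _, _, dup in rows)
for letter, new, dup in rows:
    print(f"{letter.upper()} = {new} new -- {dup} duplicates")
print(f"found {len(found)}, missed {len(names) - len(found)}, "
      f"duplicates vs. found = {total_dups / len(found):.2f}:1")
```

Swapping the selector string for another ordering or another letter set gives the same comparison as the old key vs. new key figures.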


Post by bringoutyourdead »

So Rollie did select a reasonable collection of keys.

In a real world situation, if the complete new key set were used, 1974ER, who currently has an update count of 16,139,532, would have an update count of 18,721,857 with the new keys.


Post by bringoutyourdead »

Stepping away from rows and rows of names... and looking at code again.

Would you run the following macro for me.

First do a /who query that will fail with too many results (CensusPlus must not be running.. nor anything else that would call /who)

/Who 85-85 will probably work

after that is done, enter and execute the following macro from the chat line

/run numResults, totalCount = GenNumWhoResults(); out="num = "..numResults..",Count = "..totalCount;print(out);

this should get the counts from your who request, format them for easy reading and print the output.

Count should be the 49 limit that can be displayed.
num will be??? I believe it will be the actual number of level 85 characters online when the server processes the /who request.

1974ER
Epic Censi
Posts: 762
Joined: Fri Nov 07, 2008 3:30 am

Post by 1974ER »

Whoa, so many posts... :D

I'll try to comment some:

The amount of duplicates is so low, because you actually have too small a sample. For example, as of this writing, there are 207 "Barbie"s and 131 "Légolas"es on EU realms alone. (Figures from EU Armory.)

As for letter order, it doesn't matter for you, since you work on a ready set of data. I was referring to the impact of letter order on the actual results of a census run. Why is that important? Well, simply because the censuses take sufficiently long that people can log in or out during the process. The most accurate method would check letters such as A, E, R, I and N first, leaving the rarer ones for later.

Quote: "My 2nd goal is to investigate if there is anyway to (1st improve the accuracy of the census, 2nd to make the process of gathering the data faster.)"

If you have paid close attention to what I have written so far, you should have noticed that accuracy and efficiency are somewhat at odds with each other and what we have right now appears to be a fairly good compromise.

Quote: "In a real world situation if the complete new keys were used: 1974ER who currently has an update count of 16,139,532 with the new key update count would be 18,721,857"

Interesting hypothesis, but wrong. You can't extrapolate my results from your data, because it's partially based on realms I can't access at all and largely based on realms I visit only rarely, if ever. Even if the data were based entirely on my results, the triggers, the behaviour (and character names) of all other players and my own censusing habits would still impact the end results.

I also tested the macro as supplied and suggested by you on multiple factions. The results: Absolutely nothing happened, no result, no error message, nothing, just nothing. Sorry.


Post by bringoutyourdead »

1974ER wrote:The amount of duplicates is so low, because you actually have too small a sample.
Hey! 62 thousand is a large number... why it is .. let us see.. hmm :( 0.07% of all characters listed in the Warcraft Realms database! :wink:
1974ER wrote:As for letter order, it doesn't matter for you, since you work on a ready set of data. .
Truth.. but I have no other option, so I must work with what I have. I would love to have Rollie comment here that he ran a frequency analysis on an early subset of his data.. something like 20-30 million characters.
1974ER wrote:I was referring to the impact of letter order to actual results of a census run. Why is that important? Well, simply because the censuses take sufficiently long that people can log in or out during the process. The most accurate method would check letters such as A, E, R, I and N first, for the rarer ones later.
Hmm.. you might want to think that over again. By the activity graphs found on this site, for approximately 15 hours a day, the later the time, the more players an instant count would find. So that would imply you would want to run the high hit selectors later in your census run {which is what Rollie does}

As far as "accuracy and efficiency" are concerned, I think we just need to respectfully agree that we disagree.

One of the advantages of improved efficiency would be the ability to increase the number of chances to catch characters online. A simple change would be to include an option switch to allow a user to stop census runs at the Minimum Cutoff that Rollie had to accept on his database. (Or to have the run go all the way to 1-1 for local use.)
1974ER wrote:Quote: "In a real world situation if the complete new keys were used: 1974ER who currently has an update count of 16,139,532 with the new key update count would be 18,721,857"
Interesting hypothesis, but wrong.
Yes I know it was fallacious, but I couldn't find the tongue-in-cheek emoticon. :lol:
1974ER wrote:I also tested the macro as supplied and suggested by you on multiple factions. The results: Absolutely nothing happened, no result, no error message, nothing, just nothing. Sorry.
Hmm.. not what I expected.. at least I thought it would print 'num = ,Count ='
ok.. how about this instead
EDIT: stop the presses... error found not GenNum.. but GetNum.. sigh
after submitting the manual /who that returns max results(+)
/run local numResults, totalCount = GetNumWhoResults(); print("num = ",numResults," ,Count = ",totalCount);
and if that doesn't display then
/run local numResults, totalCount = GetNumWhoResults(); DEFAULT_CHAT_FRAME:AddMessage(format("num = %i Count = %i",numResults,totalCount));

Assuming you get numbers with this.. an interesting experiment would be to do a /who 1-85
and see what numResults returns.. it could be the total number of characters in current faction online at that time.


Post by 1974ER »

More comments:

Sample is too small in terms of SERVERS, not names themselves.

There is a chance that he didn't do one at all, but went with data from Wikipedia or other similar source.

As to your "Hmm". I did think it over, several times in fact, before I posted and I stand by my desire to have the high hit selectors first. And I stand by it exactly because the count isn't instant, not even close. (Semi-)instant counts would render the order irrelevant.

I did not actually look at or think of the graphs before your comment... but to be honest, the graphs support my way of doing things. :D As the character count goes up, the likelihood of, for example, the A queries "blowing 50" increases, rendering their exact amount irrelevant to a degree. However, the rare triggers will generally catch more characters the later they are in the sequence. And due to the population patterns, this will be true roughly 21 to 22 hours per day, the only exception being the point where population drops quickly and decisively within a short time period, generally between 22 and 23.30 server time, at least on all fairly high population EU servers I took a quick, closer look at (about a dozen). And even during that period, the "high hit triggers first" principle will guarantee that as many characters as possible are seen before they log off for the day...

Granted, things would be very different if the population was almost flat, but high around the clock. However, I have not seen a single server that would have that. Flat, low, sure, but flat, high, no. On a further note, low population servers aren't even very relevant, because they generally fail to even trigger letter level checks. Some fail to trigger class level checking... and some, very small ones... fail even to trigger a racial level check even on character level 85.

Secondly, http://www.warcraftrealms.com/charsheet/131852629 is one of my censusalts. Whether by design or accident, the database no longer has a cutoff point. It appears more likely that the system now purges level 1s (and maybe up to level 9 or so) who aren't seen regularly enough. Either way, there are now very low level characters in the database, and high level characters created after the last couple of times Rollie tweaked things have histories starting from level 1 or so, if the faction has been censused a lot. One of my newer alts is an example:

http://www.warcraftrealms.com/charsheet/138693416

This differs from a lot of older characters, none of whom have histories below level 10... unless we go to really old characters who existed before cutoffs were ever implemented, in other words, as far as I can tell, mostly Vanilla and very early TBC stuff.

Your first macro was tested several times... if there are more than 49 then the result always looks like this.

num = 50,Count = 49

If the number is less than 49:

num = x,Count = x, where x is the number of characters found by the /who.

Also tried the second one multiple times, it produced another result of nothing.

In closing as it is too late again...

Ideally, we would like a wholly unlimited /who, which would allow us, at the extreme, to do /who 1-85 and get a quick result. As has been noted elsewhere, even a limit of, for example, 99 instead of 49 might basically eliminate the "blowover problems".
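To make the effect of the 49-row limit concrete, here is a rough sketch of the drill-down described in this thread: query a level, and when the result overflows, narrow by race, then class, then letter. It is an illustration only, not the CensusPlus code. The who() function is a hypothetical stand-in for the server, the population is made up, and the narrowing order is taken from the description above.

```python
WHO_LIMIT = 49  # display cap discussed in the thread

def who(population, **filters):
    """Hypothetical stand-in for a /who query: returns at most WHO_LIMIT rows
    plus a flag saying whether more characters matched than were returned."""
    def matches(char):
        for key, want in filters.items():
            if key == "letter":
                if want not in char["name"].lower():
                    return False
            elif char[key] != want:
                return False
        return True
    hits = [c for c in population if matches(c)]
    return hits[:WHO_LIMIT], len(hits) > WHO_LIMIT

def census_level(population, level, races, classes, letters):
    """Census one level, narrowing by race, class, then letter on overflow."""
    seen = {}
    queries = 0
    blowovers = 0  # overflows with no narrower query left to try

    def run(filters, remaining):
        nonlocal queries, blowovers
        rows, overflowed = who(population, **filters)
        queries += 1
        for char in rows:
            seen[char["id"]] = char
        if not overflowed:
            return
        if not remaining:
            blowovers += 1
            return
        (key, values), rest = remaining[0], remaining[1:]
        for value in values:
            run({**filters, key: value}, rest)

    run({"level": level}, [("race", races), ("cls", classes), ("letter", letters)])
    return seen, queries, blowovers

# Made-up population: 120 level 85 characters spread evenly over 2 races x 2 classes.
races, classes = ["Orc", "Troll"], ["Mage", "Rogue"]
population = [
    {"id": i, "level": 85, "race": races[i % 2],
     "cls": classes[(i // 2) % 2], "name": f"Toon{i}"}
    for i in range(120)
]
seen, queries, blowovers = census_level(population, 85, races, classes, "aerin")
print(f"{len(seen)} of {len(population)} seen in {queries} queries, {blowovers} blowovers")
```

In this made-up case every race and class combination fits under the limit, so the letter queries never run, which matches the point about smaller populations never triggering letter level checks. Raising WHO_LIMIT to 99 in the sketch would stop the narrowing one step earlier.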


Post by bringoutyourdead »

1974ER wrote:Whether by design or accident, the database no longer has a cutoff point.
Ah! I wasn't aware that Rollie had opened his storage back up to all levels again. I stopped playing right after the first of the year, when he was having problems. Well that makes the switch option I mentioned meaningless.
1974ER wrote:Your first macro was tested several times... if there are more than 49 then the result always looks like this.
num = 50,Count = 49
Good.. hmm, unfortunate. as www.wowwiki.com shows
numWhos, totalCount = GetNumWhoResults();
totalCount
Number - number of users matching the query
numWhos
Number - number of entries actually returned

Which implied the possibility that totalCount could actually be the 'Total Count'.
Instead it really is nothing more than an indicator of an incomplete return of all available information. sigh. That tosses aside another idea I had.
I had more than half expected this result, as it is how the libwho-2.0 library presents the data back to the calling program.
1974ER wrote:If the number is less than 49:
num = x,Count = x, where x is the number of characters found by the /who.
Which is what it should. {at least it shows I have learned something of the code. :shock: :lol: } And it verifies that the current CensusPlus code is vulnerable to data loss/corruption by another mod doing who requests directly or via the libwho calls.
The solution to the problem is to just finish the integration of the libwho library with CensusPlus.
1974ER wrote:Also tried the second one multiple times, it produced another result of nothing.
That is the fun of working with Blizzard's API stuff. They make it available but do not support it. Much of the understanding that people had figured out about the API got lost when Blizzard changed the forum software. The code fragments are still available on the web, but the discussion about how to use them disappeared.
As I was giving you the DEFAULT_CHAT_FRAME: I was saying to myself, well yes all these examples in the wiki show this.. but where is default_chat_frame defined?


Post by bringoutyourdead »

Just for giggles.. I ran the 15 least common English letters as selectors and with all of them only found 90% of the names.

Rollie
Site Admin
Posts: 4783
Joined: Sun Nov 28, 2004 11:52 am
Location: Austin, TX

Post by Rollie »

You guys are nuts =)

So for the Cliff Notes version of this, does it make sense to adjust the filtering?


Post by bringoutyourdead »

I have placed an alternate character list in the program, but to be honest the difference in processing time vs. completeness of data capture is so small that I don't feel there is any reason to change the list already in use.


Post by 1974ER »

We may be nuts... but in a good way. :D Also, I would tend to agree with Bringoutyourdead's decision. Making alterations, or more precisely addition(s), to the character list would increase the length of a census without improving accuracy much. One overall tricky thing is that there are such huge variations between factions, even within a single server, and on large servers also within the day. Some large realms double, triple, or in rarer cases quadruple their activity levels when one compares early mornings to the peak hours between roughly 19.30 and 21 (at least on EU servers, I haven't really looked into how things run in the US, with the higher timezone spread, etc).
