The support for this site is beginning to really bother me. Lich King battlegrounds are not supported, much less Cataclysm ones.
The census addon still functions and the data is still there, but it depends on contributions, which are very lacking. Uni uploader seems to have stopped working in the last couple of days.
Wouldn't it be better to just data mine the armory? Then you wouldn't have to worry about contributors of any kind. I'd love to see the exact same data displayed as it currently is but all data pulled from the armory instead of addons and uploading. Wouldn't that be great?
In the meantime, does anyone know any site that actually does armory datamining that has race/class/server population counts?
Armory Datamining?
-
- Census Taker
- Posts: 54
- Joined: Mon Mar 26, 2007 11:22 am
- Location: Greenville, SC
First, to avoid repeating myself excessively, please read my post in the Uni uploader thread.
Secondly, no. It is impossible to pull the exact same data from the armory: it has no total hourly activity levels and such, it is only accurate to the day at best, and at worst it has been known to be off by several weeks. Also, armory doesn't account for levels 1-9 at all; Censusplus does, to a degree.
And to your final question... I don't think there will ever be such a thing. The only way to reach that level of data would be for Blizzard to grant everyone the ability to pull any selected data directly out of the armory itself. Even then, the system would be clunky due to the sheer amount of data. And again, the way armory currently works, the data would actually be LESS accurate in certain ways.
As for an outside crawler application... did you even remotely consider how complicated such a thing would have to be? Or the kind of strain it would put on the servers? I wouldn't be at all surprised if such a large scale data request got interpreted as a Denial of Service attack on Blizzard's armories.
I admit that I will be pulling the following figures out of my ***, but if you can throw me better ones, please do so.
Armories would need to be crawled a minimum of 4 times per hour to compare to theoretical WCR values. Each crawl would have to cover at least 10 million characters (assuming every EU and US player has an average of just 2 characters at or above level 10 and there are a total of about 5 million active EU & US subscriptions). So, EU and US armories would need to handle 40 million+ queries an hour (excluding non-crawler armory users).
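For anyone who wants to plug in better figures, the estimate above is just a multiplication. A minimal Python sketch, using only the guessed numbers from this post (the function name is made up for illustration, and none of the inputs are measured values):

```python
def crawler_queries_per_hour(subscriptions, chars_per_account, crawls_per_hour):
    """Character lookups per hour if every crawl requests every character."""
    return subscriptions * chars_per_account * crawls_per_hour


# Guessed inputs from the post above, not measured values.
queries = crawler_queries_per_hour(
    subscriptions=5_000_000,   # active EU & US subscriptions (guess)
    chars_per_account=2,       # characters at or above level 10 (guess)
    crawls_per_hour=4,         # minimum to compare to WCR values (guess)
)

print(f"{queries:,} queries per hour")           # 40,000,000 queries per hour
print(f"{queries / 3600:,.0f} queries per second")  # 11,111 queries per second
```

Swap in your own subscription or character counts and the total scales linearly.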
Can WCR (or any other private site for that matter) even handle 40 million+ inputs per hour??? Just so that you get a general idea of how much that is... as of right now... I have somewhat over 14.3M updates. It has taken me a bit under 2 years and 7 months to reach that figure. As a rough guess, that means I might easily have submitted about 40 million+ units of data, as not even half the data is an actual update.
Granted, as a human being I am very inefficient... but that's not the point. The point is... as it is, Balgair, a few other active census users and I have managed to cause problems for Rollie with our high upload figures... now try multiplying that traffic by a factor of... ummm... 2 years, 7 months is roughly 365+365+210 days x 24 hours per day... 940 x 24 = 22560 hours, so a crawler pushing 40 million units every hour would be roughly 22560 times as much traffic as what I generate alone, on long term average.
Ask Rollie if the site can handle the equivalent of 22560 "copies of 1974ER" on an hourly basis and I think he'll tell you something along the lines of "No way, are you f**king kidding me?"
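The factor works out like this. A rough Python sketch, assuming (as guessed above) about 40 million units submitted over about 940 days, against a crawler that needs 40 million per hour; the variable names are only for illustration:

```python
DAYS = 365 + 365 + 210   # roughly 2 years and 7 months
HOURS = DAYS * 24

my_units_total = 40_000_000          # rough guess from the post
crawler_units_per_hour = 40_000_000  # crawler estimate from the post

my_units_per_hour = my_units_total / HOURS
factor = crawler_units_per_hour / my_units_per_hour

print(HOURS)                     # 22560
print(round(my_units_per_hour))  # 1773
print(round(factor))             # 22560
```

Because both totals are the same 40 million, the factor is simply the number of hours it took one uploader to get there.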

EDIT: Removed couple of minor typos.
-
- Census Taker
- Posts: 54
- Joined: Mon Mar 26, 2007 11:22 am
- Location: Greenville, SC
Then how do sites like this exist:
http://armorydatamine.wordpress.com/
http://www.guildox.com/go/g.asp?a=7&r=&w=
http://www.wowprogress.com/gearscore/us
http://www.wowhead.com/profiles
~
It's not like you'd have to stop getting contributions from players. Now that I've got uni uploader set up and I coded census+ to automatically do a 30 day purge every time it runs an auto-census, I'll be uploading data automatically from now on without having to remember to do so. But also getting data the same way these websites do would only improve the data overall, wouldn't it?
I know what you mean about high upload volume, as I play on multiple realms my upload volume spiked for a bit to the point where it started giving 'errors'. Rollie actually contacted me and asked me to purge my data and start over so I did. I asked him to put in an auto-prune option but he didn't. =/ Now that I know a little more about addon coding I put it in myself the other day, just before the uploads stopped working. -.-
What I don't get is how other sites can obviously have a much bigger kind of database (including achievement completion data, and gear and everything) but this site can't.
Take for instance wowhead's profiler: http://www.wowhead.com/profiles
It says it has 18,276,121 total characters at the time I posted this.
Granted, several, perhaps even the majority of those characters will not have been updated from the armory in several days. As I understand it, the majority of their 'armory sync' data comes from some user requesting the data on a per-character basis. Even so, limiting census data to those characters that have been updated in the last 30 days should be a major amount of data that is currently not being included, as far as I can see.
You address the answer in your last paragraph... they exist because, unless I am mistaken, they are all driven by user requests of various kinds.
Bigger database? Ummm... No... 18,276,121 versus the current 91,125,041 of Warcraftrealms... after Rollie pruned out several million deleted / abandoned low-level characters a few months back. Also, armory in itself retains almost no historical data... of which WCR has plenty.
Furthermore, (many of) those sites have big holes in their knowledge bases... I know this because I have tried a few... in many cases they had zero knowledge that many of my characters, and the guilds I was / am in, ever existed... despite the fact that I have been playing WoW almost daily for over 3 years and some of the guilds I have characters in at the moment are actually even older than that. That's a rather huge oversight, don't you think?
Also consider this... Censusplus can process data of online characters at a rate of roughly 85 to 110 characters per minute... Can you (or any other person) execute 85+ manual queries per minute on any site? I don't think so.
Let's use your 18.2+ million character profiles as a baseline... an automated system would require roughly 72.8+ million units of input per hour to attain similar result levels.
Why is that? Well, it's simple. Censusplus can only count active characters online during a census. An armory crawler would have to request ALL characters every time, because it would have no idea if there have been additions / changes to the available data.
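Here is the same sum for that baseline, next to what a single Censusplus client gets through in an hour. A small Python sketch; the 4 crawls per hour is the same guess as earlier in the thread, and the function names are made up for illustration:

```python
PROFILES = 18_200_000   # wowhead profile count, rounded down as in the post
CRAWLS_PER_HOUR = 4     # guessed minimum, same as earlier in the thread


def full_crawl_lookups_per_hour(profiles, crawls_per_hour):
    """Lookups per hour when every crawl has to request every profile."""
    return profiles * crawls_per_hour


def census_chars_per_hour(chars_per_minute):
    """Characters one Censusplus client processes in an hour."""
    return chars_per_minute * 60


print(full_crawl_lookups_per_hour(PROFILES, CRAWLS_PER_HOUR))  # 72800000
print(census_chars_per_hour(85), census_chars_per_hour(110))   # 5100 6600
```

The gap between the two comes from the crawler having to ask about every character, while Censusplus only ever sees the ones that are online.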
A much more limited crawler could be handy though... a similar user driven one, to weed out erroneous data... like the fact that some characters that have been deleted, switched guild (and gone offline before the next census run) / name / faction or transferred out are still shown in guild rosters here, because by the nature of Censusplus it's not possible to update something that no longer exists or simply isn't available in game.
WCR isn't perfect... but entirely (or even mostly) replacing Censusplus users as a source of data with an armory crawler is not technically viable, in my humble opinion.