Process delay

pencey

Post by pencey »

thanks, bringoutyourdead -- good insight into how it works. and excellent name :)

In a nutshell:
The "power-censusers" should purge after uploading, since they likely don't care about looking at the data when in the game.

If you just play a few realms, you might want to avoid purging/pruning since you'd like to be able to browse the stats in-game.


If top uploaders are uploading 10-15 MB files several times per day in order to get credit for finding new/levelled characters, and a large portion of that data is already in the database, that's going to cause a big slowdown in a system that is already just treading water. The backlog has gone from 116.384 to 116.368 hours behind in the last 5 hours, so it has barely moved (though I think it catches up a lot in the late-night hours, since there is less data).

There could be a way to avoid this repeated processing of the data. Although maybe the site is already doing this..

- make a new table with these columns:
User, Realm, Faction, lastCensusTime
- lastCensusTime is not the same as last upload time. It's the time of the last census taken on a particular realm+faction by that user.
- When a file is uploaded and parsed, skip over any data for a realm that is older than that lastCensusTime -- don't check the database to see if the data is duplicated, since you already know it is old because this data was parsed the last time they uploaded their file.
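The proposed table could be sketched roughly as follows. This is only an illustration of the idea in Python, not how the site's upload script works: the in-memory dictionary, the function name `filter_new_records`, and the record field names are all hypothetical, and the newest `last_seen` value in a file is used as a stand-in for the census time.

```python
from datetime import datetime

# Hypothetical stand-in for the proposed table:
# (user, realm, faction) -> time of the last census already processed.
last_census_time = {}

def filter_new_records(user, records):
    """Return only the records seen after the stored lastCensusTime.

    Each record is a dict with 'realm', 'faction', 'name', 'level' and
    'last_seen' (a datetime). These field names are assumptions.
    """
    fresh = []
    newest = {}
    for rec in records:
        key = (user, rec["realm"], rec["faction"])
        cutoff = last_census_time.get(key)
        if cutoff is not None and rec["last_seen"] <= cutoff:
            continue  # parsed in an earlier upload, so no database lookup
        fresh.append(rec)
        if key not in newest or rec["last_seen"] > newest[key]:
            newest[key] = rec["last_seen"]
    # Advance the stored times only after the whole file has been scanned.
    last_census_time.update(newest)
    return fresh
```

Uploading the same file twice would return every record the first time and an empty list the second time, which is the saving the proposal is after.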

bringoutyourdead
Forums Admin & general flunky
Posts: 1432
Joined: Fri Nov 07, 2008 1:11 pm
Location: Texas, USA

Post by bringoutyourdead »

use the Search button at the top of any forum page and search for 'purge' as subject and '1974er' as author.. you will see that the "power-censusers" :) are doing as you suggest/expect.

The real problem is that we the users of this site are pushing Rollie's resources to the limit and it is just going to take time for the backlog to clear.

pencey

Post by pencey »

i just opened up the data file and it looks like i misunderstood how it works.

i thought that each time you took a census, it'd add another line if it found the same character again (pencey level 50, nov 15 2010... pencey level 54, nov 30 2010), but it looks like it actually just keeps the most current data for that character (pencey level 54, nov 30 2010).

so.. nevermind, frequent purgings are not going to make the huge difference I was expecting.

the only 'cure' would be less-frequent uploading, but as you said, you don't get as good data that way. (I thought you'd end up with the same data, since i thought it tracked each viewing of each character)
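The "keeps only the most current data" behaviour described in this post can be pictured with a small Python sketch. The dictionary layout and the function name `record_sighting` are assumptions made for illustration; the real data file is a Lua saved-variables file whose exact structure is not shown in this thread.

```python
from datetime import date

def record_sighting(store, name, level, seen_on):
    """Keep a single entry per character, overwritten by newer sightings."""
    current = store.get(name)
    if current is None or seen_on >= current["last_seen"]:
        store[name] = {"level": level, "last_seen": seen_on}

store = {}
record_sighting(store, "pencey", 50, date(2010, 11, 15))
record_sighting(store, "pencey", 54, date(2010, 11, 30))
print(len(store), store["pencey"]["level"])
```

After both sightings the store still holds one entry for the character, carrying the later level and date, so repeated censuses do not add lines.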

Balgair
Araiceil
Posts: 1716
Joined: Fri Sep 30, 2005 11:47 am
Location: UK

Post by Balgair »

At the moment while I'm censusing as much as I am, I'm pruning all data older than 2 days every day. Now, I could totally empty my datafile but I happen to like seeing the stats as well, and it's pretty tricky getting a balance between wanting to see the data and keeping the file small enough, since even if I do purge entirely, it'll be over 10MB within a few hours anyway! (The file upload limit is 10MB via the website, but higher if you use UniUploader; not sure of the max there but I'm regularly uploading 12-15MB files, even with pruning heavily)

1974ER
Epic Censi
Posts: 762
Joined: Fri Nov 07, 2008 3:30 am

Post by 1974ER »

Excellent posts by Bringoutyourdead and Balgair. I'll throw in a few more thoughts of mine. I just did a complete purge about 45 minutes ago. The last file I submitted last night had grown to a total size slightly exceeding 9 MB (I upload manually, so I am restricted by the 10 MB upload limit) and according to the purge information, contained a few more than 86 000 individual characters.

As I write these words, my current .lua file has already exceeded 3 500 characters. I have finished a couple of dailies (giving one of my alts a new level :D ) and, through the clever use of manual who, covered 10 of my actually played characters and 9 factions on a total of 7 realms.

pencey

Post by pencey »

edit: nothing to see here, it's on to a new file now!
--

last processed file has been "2010-11-25 22:29:18" for about 4 hours now.
Is this a normal delay for a large file, or has the database stalled :( ?

Hybuir
Gear Dependent Squirrel
Posts: 1471
Joined: Tue Sep 06, 2005 6:22 am
Location: Austin, TX

Post by Hybuir »

wait.

Dameon
Census Taker
Posts: 19
Joined: Thu Sep 20, 2007 6:54 am
Location: Ohio

Post by Dameon »

"power-censusers" /slap /bite Oh that tasted bad. ack ack cough cough. I think all big submitters should be banned for about...oh...4 years. Then maybe I can catch up! hee hee

J/K

D
Dameonii, Damon, Lucci, Luccia, Nyctherius, Jedrus, Deathzone, Bankershours of Whisperwind

1974ER
Epic Censi
Posts: 762
Joined: Fri Nov 07, 2008 3:30 am

Post by 1974ER »

Examines time spent to results ratio. *coughs* Try... about 38 years? *chuckles loudly*

Sidenote: "Power-censusers" are not to be consumed internally. :D

bringoutyourdead
Forums Admin & general flunky
Posts: 1432
Joined: Fri Nov 07, 2008 1:11 pm
Location: Texas, USA

Post by bringoutyourdead »

On the tenth day...

Saw something very interesting today.. mid afternoon.. I started seeing a large increase in characters from level 20 down to level 1 on some of the servers I've been running Census+ on. The program had to break the chunking size all the way down to n-n (e.g. 2-2) on many of the levels, and some mid-teen levels even had to be broken down by race to get the results below the overflow number (50).

While it is true that I don't normally watch my counts .. I think this is the first time in two years.. that I've seen such a large activity bubble at the starting end of the game.

Since today was the last day of my trial account, after my final census run I did a day-by-day prune back and checked the census after each prune:
step        file size (bytes)   characters pruned
unpruned    9,018,002           -
prune 10    8,935,090           749
prune 9     7,731,966           10837
prune 8     7,174,089           5070
prune 7     6,471,893           6384
prune 6     5,489,386           9020
prune 5     5,402,419           722
prune 4     4,266,269           10303
prune 3     2,913,941           12247
prune 2     2,793,031           1101
prune 1     2,083,486           7203
prune 0     41,552              18604

Remember /census prune x --- means prune out all characters that haven't been seen in the last x days. At prune 0 the only data left in the censusPlus.lua file was the record of realm/factions that I had run census runs and the timestamp of those runs.
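The prune rule described here ("remove everything not seen in the last x days") can be sketched in Python. This is not the addon's code (the addon itself is written in Lua); the function name `prune` and the name-to-timestamp mapping are assumptions for illustration.

```python
from datetime import datetime, timedelta

def prune(characters, days, now):
    """Drop characters not seen within the last `days` days.

    `characters` maps a character name to its last-seen datetime
    (an assumed layout). Returns (kept, number_pruned).
    """
    cutoff = now - timedelta(days=days)
    kept = {name: seen for name, seen in characters.items() if seen >= cutoff}
    return kept, len(characters) - len(kept)
```

With `days=0` the cutoff is the current moment, so every character seen earlier is dropped, which matches the observation that only the realm/faction run records were left after prune 0.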

pencey

Post by pencey »

bringoutyourdead wrote:On the tenth day...
I think this shows the importance of pruning tightly and often. If you upload this file daily (and it appears you typically upload more than once per day), you are generating ~18604 character updates per day (probably more on a weekend). But if you have not pruned for 10 days, the database has to do a lot of extra work: it needs to check 63636 characters that do not actually need updating (since they were in your previous uploads), so only 22.6% of your file (18604 of 82240 characters) is useful. Over 75% of the character lookups generated from the file are useless (and since you usually upload twice a day, it's actually more like 87%).

I'm not sure what the time ratio of (character lookup + update) vs (just character lookup) is, but it could be that your files take twice as long to process as they otherwise would.

and i'd imagine most anonymous uploaders don't prune until they hit the filesize limit, so they probably have months of data for their server in one file.. who knows what horrible ratio they have..

(unless i'm still not understanding how this works.. maybe the upload script has a fast way of telling if old data should not be compared to the database..)
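The percentages in this post can be re-derived from the prune table with a few lines of Python. The only assumption is the post's own reading of the table: the characters removed by prune 0 are treated as the ones seen in the last day, and everything removed by prune 10 through prune 1 is treated as stale.

```python
# Characters removed at each prune step, copied from the table above.
pruned_per_step = {
    10: 749, 9: 10837, 8: 5070, 7: 6384, 6: 9020, 5: 722,
    4: 10303, 3: 12247, 2: 1101, 1: 7203, 0: 18604,
}

total = sum(pruned_per_step.values())   # every character in the unpruned file
fresh = pruned_per_step[0]              # seen within the last day
stale = total - fresh                   # already sent in earlier uploads

fresh_share = fresh / total
stale_share = stale / total
print(total, fresh, stale)
print(f"fresh {fresh_share:.1%}, stale {stale_share:.1%}")
```

This gives 82240 characters in total, 63636 of them stale, and a fresh share of 22.6%. The "more like 87%" figure for twice-daily uploads is an estimate in the post and cannot be derived from the table alone.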

1974ER
Epic Censi
Posts: 762
Joined: Fri Nov 07, 2008 3:30 am

Post by 1974ER »

Pencey, I am sorry to point out something again... Your math has a big logic error. Because a character is only listed once in the .lua and can level up multiple times during a single day (or, in rare cases, change guild status), even submitting the exact same character multiple times in a day can produce new data.

Also, the number of people who cover a really large number of factions on even a semi-regular basis is really limited. Balgair, myself, bringoutyourdead and maybe half a dozen others.

Furthermore, there comes a point where pruning really doesn't help anymore. Granted, only I and Balgair break that point often, but that's exactly why there is no point in asking Rollie to write a prune in the style /censusplus prune hours 6 to kill all data older than 6 hours.

Just as a further example: Yesterday... I made four submissions... by the first submission, the .lua had gone from fully purged to 1+ MB... by the 4th, about 15 hours later, it was a hair under 5 MB. And I only covered 21 different factions. But... I ran almost 100 censuses and 19 of those factions were covered 4 times, the remaining two thrice.

I didn't purge 0 this morning... and the 6th post-purge submission I just made had a file size of 7.78 MB. Compared to yesterday, the number of factions hasn't changed at all... but there will be plenty of fresh data, both new characters and updates... and several characters will be of (much) higher level, a few will have left / joined guilds. Others will just have their last seen date changed...

But.. no matter how much I would look into the file... I could not tell if I have seen a specific character once, twice, thrice, 4, 5 or 6 times. Or perhaps only on the 2nd and 5th time.

bringoutyourdead
Forums Admin & general flunky
Posts: 1432
Joined: Fri Nov 07, 2008 1:11 pm
Location: Texas, USA

Post by bringoutyourdead »

Just to add an example - on multiple occasions I've played the game to see just how fast I could level a character. Using the advantage of some of the World events, I've been able to go from level 1 to level 20 in about 12 game hours.

And if I remember correctly the 1st Draenei (or maybe it was Blood Elf) went from start to level cap in just 4 days real time.

pencey

Post by pencey »

If you mean my older posts, yes, I had lots of mistakes, since I didn't understand the data format (I thought every viewing of a character was recorded, since that sort of data would be needed to generate hourly character activity, but now I see it actually just tracks the most recent level seen and the date it was seen, and hourly stats are recorded just as character counts associated with a census time). I should edit those to point out my mistakes..

But, if you mean my post about bringoutyourdead's table of pruning data, then I don't see the error..

Given that table of data, if he submitted his file the day before, and then submitted the updated file 'today', the 'today' submission would contain 18604 character updates, but the whole file has 80000+ characters in it, 60000+ of which were submitted the day before, so the database has to do a lot of wasted queries finding out that all those old characters have not changed.

No?

--
pencey wrote: In a nutshell:
The "power-censusers" should purge after uploading, since they likely don't care about looking at the data when in the game.
Here i was especially wrong. Top submitters are actually more efficient (in terms of char updates vs stale chars) since they are forced to prune/purge more often.

A bad type of uploader would probably be the daily uploader who does only one or two realms and only prunes/purges when they reach the limit. They'll be submitting files with 20000 characters in them, but only maybe 2000 of them were seen since their last upload, so 18000 of those characters do not actually need to be processed, but the database will still check them anyway. 9:1 ratio of stale:new data.

bringoutyourdead
Forums Admin & general flunky
Posts: 1432
Joined: Fri Nov 07, 2008 1:11 pm
Location: Texas, USA

Post by bringoutyourdead »

But then we are back to the problem that if I pruned or purged after every upload, then all I would see locally is NEW characters.. so I would end up uploading even more often than the couple of times a day that I do now, since for all I could tell I might be seeing a big uptick in characters in the game.

bringoutyourdead
Forums Admin & general flunky
Posts: 1432
Joined: Fri Nov 07, 2008 1:11 pm
Location: Texas, USA

Post by bringoutyourdead »

I have a suggestion.. why don't you create a helper addon that copies the local censusPlus.lua data file, purges it and then after a census run does a compare of the two files to create a new difference-only file that could then be uploaded?
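The comparison step of that suggestion could look something like the Python sketch below. It only illustrates the diff logic and is not an addon: the function name `diff_only` and the representation of each file as a name-to-record dictionary are assumptions, and parsing the actual censusPlus.lua file is left out.

```python
def diff_only(before, after):
    """Return the entries of `after` that are new or changed versus `before`.

    Both arguments map a character name to a record (assumed layout),
    here a (level, guild) tuple.
    """
    return {
        name: record
        for name, record in after.items()
        if before.get(name) != record
    }

# `before` is the saved copy taken prior to the purge,
# `after` is what the next census run collected.
before = {"Damon": (60, "Guild A"), "Lucci": (20, None)}
after = {"Damon": (61, "Guild A"), "Lucci": (20, None), "Jedrus": (1, None)}
print(diff_only(before, after))
```

Here only the levelled character and the newly seen one end up in the difference file, while the unchanged one is left out. If the record also carried a last-seen date, every re-seen character would count as changed, so whether to include that field is a design choice.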

1974ER
Epic Censi
Posts: 762
Joined: Fri Nov 07, 2008 3:30 am

Post by 1974ER »

Pencey, sorry for not being specific enough. I was pointing out the error within a single day. Part of what you said is true, but the final conclusion concerning multiple uploads within a single day is faulty because of the point I made about levelling speeds.

Speaking on a theoretical level: From a pure data-efficiency point of view, everyone should do a full purge directly after every single submission. But, as Bringoutyourdead pointed out, that would lead to many people making many more, smaller submissions, which in turn are less data efficient. In addition, many people like to have at least a little bit of local history available for themselves.

Also, such a behaviour pattern would cause even more data loss. I can't do 84+ uploads and purges per day. That's technically impossible, because it would mean I would need to stay up 21+ hours every day (84 uploads at the minimum 15-minute interval)... and I would deliver less data than I do now.

Even further, your nutshell conclusion isn't utterly wrong. I know I create a fairly large amount of "stale" info, but the only way to reduce it would be to do multiple purges per day and despite that strictly limit the number of submissions. And that's where we hit another snag. As far as the realms I actually play on (there are several), I would actually like to retain a lot more data, but I can't, if I want to do a lot of censusing.

As for the "bad uploader"... That would actually be someone who submits a very large number of times per day, never prunes / purges and only censuses one or two factions. Consider someone who has say 40000 characters in the .lua and submits every half hour, each submission containing only 500 more characters. After just 8 hours... there are about 640k+ "stale" characters in the queue (in 16 different files). In a similar timeframe, I might do 2 big uploads, the first containing about 75000 characters and the latter 86000. Even if my first submission didn't produce any updates at all, the second would add 11000 to check... far more than the 8000 (16 times 500) from the other person, with less than 1/4 of the waste (150k vs 640k) even with me doing a horribly bad job, with just two files.
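The comparison in this last paragraph can be checked with a short Python calculation. The numbers are the ones given in the post; the variable names are made up for the sketch, and the worst case for the two big uploads (no updates at all in the first file) is taken as stated.

```python
# Frequent small uploader: 40000 characters already in the file,
# one upload every half hour for 8 hours, 500 new characters each time.
uploads = 8 * 2
base_characters = 40000
new_per_upload = 500

small_fresh = uploads * new_per_upload            # 8000
small_stale_min = uploads * base_characters       # 640000, ignoring growth
# Characters added by earlier uploads are re-sent too, hence the "+" in 640k+.
small_stale_growth = sum(new_per_upload * k for k in range(uploads))  # 60000

# Two big uploads: 75000 characters, then 86000 (worst case: first is all stale).
big_fresh = 86000 - 75000                         # 11000
big_stale = 75000 + 75000                         # 150000

print(small_fresh, small_stale_min, small_stale_growth)
print(big_fresh, big_stale, big_stale / small_stale_min)
```

The ratio comes out at about 0.23, consistent with "less than 1/4 of the waste", and it drops a little further if the 60000 re-sent additions are counted on the frequent uploader's side.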

gmmmpresser
Superior Census Taker
Posts: 63
Joined: Sun Sep 04, 2005 6:51 am

Post by gmmmpresser »

1974ER wrote:people making much more smaller submissions, which in turn are less data efficient.
Why would this be 'less data efficient'? Surely uploading/processing 300 characters would be more efficient than trying to upload/process 3000 characters.
1974ER wrote: In addition, many people like to have at least a little bit of local history available for themselves.
Can I ask what people use this 'local history' for?
Myself, I quite like to see how many people are online when I am playing. The display shows this, and I find it interesting to see how this changes over (playing) time. Other than that, why would you look through the .lua file?

Balgair
Araiceil
Posts: 1716
Joined: Fri Sep 30, 2005 11:47 am
Location: UK

Post by Balgair »

I like to see the breakdown of races/classes on my main server, personally - sure I can look it up on the website, but why do that when it's right here ingame? Plus when the site is behind on processing or I'm curious about nontracked characters it's all the more handy - how else would I have known that 625 troll druids were rolled on my server in the first day of 4.0.3a for example? Pretty impressive stat, which the site wouldn't have told me, since even when it catches up, it doesn't show the 1-30s.

1974ER
Epic Censi
Posts: 762
Joined: Fri Nov 07, 2008 3:30 am

Post by 1974ER »

Simple, there are several reasons:

1) Even a .lua with zero characters isn't an utterly empty file.

2) As a result of 1) and notes further on: 10 files of 300 is worse than one file of 3000.

3) Minimum submission interval is 15 minutes. If the census takes less than 15 minutes and the person is doing just one faction, there is either wasted time or, if multiple consecutive censuses are run, the results overlap heavily.

4) Without very frequent purging, the next submission would contain a lot of just processed data, which would be reprocessed for nothing.

5) At least for everyone who uploads manually: Less time spent uploading and therefore more time for censusing (as one can't upload and census at the same time due to the fact that one has to logout / exit to upload).

6) Site does only one search for next file, not ten.

And so on... that list is hardly exhaustive.

As for your second question...

I can't give you an all covering answer. I, myself, pretty much use it the same way you do, keeping an eye on changes.

EDIT: Balgair, surely you mean 1-29? ;) *nitpick, sorry :( *
