Process delay

Balgair · Post by **Balgair** » Thu Dec 02, 2010 4:40 pm

Lol dammit, yes you're right, nitpicker

You knew what I meant though

pencey · Post by **pencey** » Thu Dec 02, 2010 7:29 pm

bringoutyourdead wrote:I have a suggestion.. why don't you create a helper addon that copies the local censusPlus.lua data file, purges it and then after a census run does a compare of the two files to create a new difference only file that could then be uploaded?

something like this could be good... but i think ideally the database would simply be able to tell which data to not bother querying the db for, to avoid complicating things for the users.

an idea for how to implement this, which is hopefully relatively simple to do (not requiring too much change to existing code, just adding new code)...
--For each census taken, associate a unique identifier with it (perhaps the time, server, and character name), and then give this identifier an ID within the file. Eg "1 = dec 2 2010 23:00:00, firetree, pencey". Or use a GUID if the database supports them, along with the date/time for ordering purposes (http://en.wikipedia.org/wiki/Globally_unique_identifier), which would be faster in the db -- 1={3F2504E0-4F89-11D3-9A0C-0305E82C3301}=(the date/time of the census)
--Then, for each character, have the ID for the GUID for the time that char was updated.
--When a file is done processing, save the most recent GUID into a table in the database and put the date for it in there, too.
--When a file is uploaded, check the GUIDs it contains to see if any are found in the DB. if a match is found, then ignore any characters which, in the .lua file, were last updated before the census associated with that GUID was taken.
--In the database, delete GUIDs that are over a month old, since you don't want to keep track of GUIDs you're never going to see again (if someone is uploading once per month and not pruning, well, they aren't causing much strain on the db)
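
A rough sketch of the bookkeeping side of the steps above, in Python purely for illustration. The thread doesn't show the real processing script or its database, so the in-memory `registry` dict, the function names and the 30-day `MAX_AGE` constant are all made-up stand-ins, not the site's actual code.

```python
import uuid
from datetime import datetime, timedelta

# Hypothetical stand-in for a "processed censuses" table:
# maps census GUID -> date/time the census was taken.
MAX_AGE = timedelta(days=30)  # the "over a month old" rule, assumed to mean 30 days


def new_census_id():
    """Addon side: make a unique identifier for one census run."""
    return str(uuid.uuid4())


def record_processed(registry, guid, taken_at):
    """Server side: remember the most recent census GUID of a processed file."""
    registry[guid] = taken_at


def prune_registry(registry, now):
    """Delete GUIDs older than MAX_AGE and return how many were removed."""
    stale = [guid for guid, taken_at in registry.items() if now - taken_at > MAX_AGE]
    for guid in stale:
        del registry[guid]
    return len(stale)
```

Calling `prune_registry` once per processing run would keep the table small, since only GUIDs from roughly the last month are ever worth matching against.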

pencey · Post by **pencey** » Thu Dec 02, 2010 7:44 pm

1974ER wrote:As for the "bad uploader"... That would actually be someone who submits very large number of times per day, never prunes / purges and only censuses one or two factions.

Yes, that would be pretty much a worst-case scenario, but rare.

I'm trying to think of where the most database time wastage is coming from, causing the 120 hour delay we are on.

Sorry for targeting the "power-censusers" earlier -- you guys are actually fairly efficient because you need to purge so often due to the file size limit.

Users who upload several times per week and never prune are probably relatively common (it's what I would have been if not for delving in here due to noticing the big delay going on right now). If there are 100 people uploading 90% inefficient files once per day, that's a lot of wasted database time.

In the end, there are a few competing issues, with tradeoffs.
Factors...
1. frequency of censusing
2. frequency of uploading
3. level of pruning (or purging)

1. more is better. not much tradeoff on this factor.
2. it's good to upload often because this generates the most detailed data for the site. However, uploading more often means the site needs to do more work.
3. if you prune tightly (or purge), you create the most efficient upload files, but it means you need to upload more often and you get no in-game history.

So a nice middle ground needs to be found for these things

Using something like my idea above of having the upload script know what data to ignore will help with 3. 2 is more difficult to balance, but I think if 3 can be solved, 2 will not be a problem.

1974ER · Post by **1974ER** » Fri Dec 03, 2010 5:09 am

Pencey, I am not computer savvy enough to say whether your idea is feasible... but even if it is, I already see a problem with it. It will inflate the file size, meaning that a file of specific size will contain even less actual data than before. If the size change is significant enough, it will actually be counterproductive.

As for your three factors:

1) Yes, but each extra census on the same faction by the same person between submissions has diminishing returns. Also, hourly activity only really requires one census per hour. Extra censuses add characters, but again with diminishing returns.

2) Due to 1), uploading more often than once per hour makes little sense, except in cases where one has to cut the session short or is censusing so little that stretching the session would be counterproductive to efficiency.

3) File size efficiency is really up to each individual user, it can not be forced from outside (except by reducing the maximum file size allowed). Which could be done... and it would really inconvenience a very limited number of people. Namely myself and Balgair greatly and Bringoutyourdead, FuxieDK and maybe a dozen others to a lesser degree.

If Balgair and I submit somewhat smaller sized files more frequently and everyone else needs to prune / purge more often, the overall effect on the site should be positive, I think.

I am a bit tired (didn't sleep too well last night), so if anyone spots any major logic failures, please post them. Thank you for reading!

pencey · Post by **pencey** » Fri Dec 03, 2010 9:12 am

I'm not sure file size really matters that much -- if the file is 5% bigger (or contains 5% less data per 10MB), but it reduces the number of database queries by an average of over 50%... I'd say that's a good thing

3) my idea lets you keep all your local data (up to the upload size limit, anyway) -- the parsing script can easily figure out what data is redundant and can be ignored, rather than checking each character against the database to see that it's already there. So I don't think efficiency of file size is a particularly big deal -- it's efficiency of database access that I suspect is the problem. I think the site is probably wasting over 50% of its time working on data that has already been uploaded, rather than ignoring the redundant data (ie not accessing the database to see if it needs to update it).

On the other hand, for all I know, it could be that updates of the db are like 1000 times slower than checking if a char+level is already there, in which case the only solution is to upload and/or census less..

1974ER · Post by **1974ER** » Fri Dec 03, 2010 11:03 am

Actually, though the impact might be small, it does matter. First of all, bigger files upload slower. -> Less time spent censusing as one has to stay out of game during upload.

Or reversed, for those of us who often hit the limit: a 5% loss per 10 MB file (one file lasting about 36 hours) is around 4500 characters (10 MB equals approximately 90000 characters). At one file per 1.5 days, that's roughly 20 x 4500 = 90000+ checks lost per month = 1.08M+ checks per year. Theoretical, of course, as I don't run 100+ censuses every day. Just to give you a rough scale.
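
The same estimate written out step by step, as a quick check of the figures. The 30-day month and 12-month year are assumptions used to reproduce the "roughly 20" files per month; the other numbers come straight from the post.

```python
CHARS_PER_FILE = 90_000   # "10 MB equals approximately 90000 characters"
OVERHEAD_PERCENT = 5      # the 5% size increase under discussion
HOURS_PER_FILE = 36       # one full 10 MB file roughly every 36 hours
DAYS_PER_MONTH = 30       # assumption, to match "roughly 20" files per month

# Characters that no longer fit into one 10 MB file.
lost_per_file = CHARS_PER_FILE * OVERHEAD_PERCENT // 100

# Number of full files submitted in a month at that pace.
files_per_month = DAYS_PER_MONTH * 24 // HOURS_PER_FILE

lost_per_month = lost_per_file * files_per_month
lost_per_year = lost_per_month * 12

print(lost_per_file, files_per_month, lost_per_month, lost_per_year)
# prints: 4500 20 90000 1080000
```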

Also, after just a month... the database might easily contain 3000+ unique IDs from just me, of which more than 2800 would constantly be utterly useless, because due to the upload limit, I would have been forced to prune / purge the relevant data from my local file within about 36 hours. Maybe even less if I were to census very heavily.

And they could not be deleted any sooner, because it would be overly complicated to set up deletion patterns for individual submitters, so everyone would have to fall under the same 30-day rule.

Overall speaking, the best way to reduce clutter is to send over as little "stale" data as possible. *ponders* I wonder... would the best solution actually be shortening the maximum age of data? For example, dropping the maximum prune value from 30 to 20, 15 or even 10 days? I mean... data that is more than a week old... is fairly likely to be (mostly) out of date anyway. And asking people to upload once a week (or per 2 weeks), minimum... doesn't seem too unreasonable to me. Comments?

pencey · Post by **pencey** » Fri Dec 03, 2010 11:40 am

1974ER wrote: Overall speaking, the best way to reduce clutter is to send over as little "stale" data as possible. *ponders* I wonder... would the best solution actually be shortening the maximum age of data? For example, dropping the maximum prune value from 30 to 20, 15 or even 10 days? I mean... data that is more than a week old... is fairly likely to be (mostly) out of date anyway. And asking people to upload once a week (or per 2 weeks), minimum... doesn't seem too unreasonable to me. Comments?

Having the parsing script know how to ignore stale data based on when the data was collected would work better because it simply wouldn't matter what the prune limit was. It would never waste time accessing the database for bad data (beyond the initial overhead of finding the GUID to find out what data was already uploaded and should be ignored).

How to upload without being in the game:
Can't you just make a copy of the file and upload that? Then go back into the game while the copy is still uploading.
Anyway, as things are currently, we are losing lots of data because rollie had to cut out updating of chars under level 30 and delete a bunch of really old data... so that 5% is looking good to me if it means the processing is twice as fast and we can keep that other data..

In other news, my first submission finally went through

649 new, 1618 updated. Firetree server.. no one was updating Horde so I made a Horde alt.. I think they outnumber Alliance by at least 2:1.

1974ER · Post by **1974ER** » Fri Dec 03, 2010 12:04 pm

But... wouldn't it have to check absolutely every single character's GUID against the database anyway? Meaning... open file, check all GUIDs against database, then go through all that need a check for level up / guild change, execute and save changes, close file, move on to next file. In other words, having fewer characters overall to check would be better, right?

Theoretically, yes... but for small files, the copying and switching folders would last longer than the upload itself... and for big files, it would increase the time needed as well, because one can't upload the file WHILE it's being copied either.

And the problem with the database is more related to its size than the uploads... you might not have realized this yet... but according to the DB stats info... it currently contains almost 94 MILLION different characters. That's a HUGE pile of data to sift through, no matter how well indexed it is.

EDIT: Typo + congratulations on your first new ones and updates!

Balgair · Post by **Balgair** » Fri Dec 03, 2010 12:08 pm

ER, you do realise that you CAN upload the file whilst ingame? I've always done it that way, just don't log out while it's uploading. It's only on logout that the file is written to (and it's on logout of the character, not upon exiting the game entirely, so whenever you switch realms the file is updated). So if you want to upload while censusing, just pick a nice large realm that'll take 5 minutes or so to census anyway, problem solved no matter how slow your upload speed may be

1974ER · Post by **1974ER** » Fri Dec 03, 2010 12:12 pm

Ummm... I tried that... and the file wiped, producing nothing. Never tried it again. And no, I didn't log out during upload.

Balgair · Post by **Balgair** » Fri Dec 03, 2010 12:18 pm

Heh never done that to me, only time it's wiped has been when I've crashed or otherwise logged out not the regular way (logging in on the wrong account by accident causing a force logout on the client I was already logged in on, oops). I pretty much always upload while ingame, never had a problem yet!

pencey · Post by **pencey** » Fri Dec 03, 2010 12:18 pm

1974ER wrote:Ummm... I tried that... and the file wiped, producing nothing. Never tried it again. And no, I didn't log out during upload.

Try logging out first, since the addon might need you to log out before it actually writes the data to the file and saves it.

1974ER · Post by **1974ER** » Fri Dec 03, 2010 12:23 pm

Pencey, I always do that, just because it is exactly as you (and Balgair) said, the file is saved upon log out (or exit). The point is, I tried Balgair's ingame method and the file wiped. Which is why I am not taking any risks. I lose enough data to crashes and other problems already. Estimated total data loss for this year alone already exceeds 30 MB or roughly 270000 characters.

EDIT: Typos, again.

pencey · Post by **pencey** » Fri Dec 03, 2010 12:49 pm

1974ER wrote:But... wouldn't it have to check absolutely every single character's GUID against the database anyway? Meaning... open file, check all GUIDs against database, then go through all that need a check for level up / guild change, execute and save changes, close file, move on to next file. In other words, having fewer characters overall to check would be better, right?

Theoretically, yes... but for small files, the copying and switching folders would last longer than the upload itself... and for big files, it would increase the time needed as well, because one can't upload the file WHILE it's being copied either.

And the problem with the database is more related to its size than the uploads... you might not have realized this yet... but according to the DB stats info... it currently contains almost 94 MILLION different characters. That's a HUGE pile of data to sift through, no matter how well indexed it is.

EDIT: Typo + congratulations on your first new ones and updates!

No, it would only need to check the list of GUIDs until it finds one, and then it knows which characters to ignore.

It's like this... (census number, guid, date, server, character who took the census)
Census=1, GUID-48451487498454874987, Nov 30 2010, 3:00 PM, Firetree, Pencey
Census=2, GUID-91564984584894748496, Dec 01 2010, 6:00 PM, Firetree, Pencey
Census=3, GUID-16489451458979874874, Dec 02 2010, 9:00 PM, Firetree, Pencey

(server, faction, name, level, census number)
Firetree, Alliance, RarelyPlaysDude, level 30, Census=1
Firetree, Alliance, Pencey, level 54, Census=2
Firetree, Alliance, 1974ER, level 77, Census=3
... and way more characters with varying Census=# between 1-3.
The higher the Census=#, the more recent the update.

Now imagine the first time I upload this file is after the 2nd census (so the update of 1974ER is not there). It will add both RarelyPlaysDude and Pencey to the database and update their level. And then at the end, it'll record that GUID for the 2nd census in the database.

Now, when I upload the file after taking the 3rd census (and pencey and rarelyplaysdude were not seen in this example, so don't need to be updated), the script looks up the Census=3 GUID in the database.. it's not there, so it knows if Census=3, it should update.
It then looks up the Census=2 GUID. It finds it. Therefore it knows to ignore all chars with Census=2 or even Census=1.

There's some logic that needs to be thought about for how to get this to work for multiple characters and servers, but that's the basic idea.
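
The walkthrough above can be tried out with a few lines of Python. This is only a sketch of the idea under the example's own assumptions (one censusing character, one server, census numbers that always increase with time); the function names are invented here, and `known_guids` is a plain set standing in for the lookup against the real database.

```python
from datetime import datetime

# Census list from the example file: census number -> (guid, time taken).
censuses = {
    1: ("GUID-48451487498454874987", datetime(2010, 11, 30, 15, 0)),
    2: ("GUID-91564984584894748496", datetime(2010, 12, 1, 18, 0)),
    3: ("GUID-16489451458979874874", datetime(2010, 12, 2, 21, 0)),
}

# Characters from the example file: (server, faction, name, level, census number).
characters = [
    ("Firetree", "Alliance", "RarelyPlaysDude", 30, 1),
    ("Firetree", "Alliance", "Pencey", 54, 2),
    ("Firetree", "Alliance", "1974ER", 77, 3),
]


def newest_known_census(censuses, known_guids):
    """Check GUIDs newest-first and stop at the first one the db already has.

    Returns that census number, or 0 if none of the GUIDs are known.
    """
    for number in sorted(censuses, reverse=True):
        guid, _taken_at = censuses[number]
        if guid in known_guids:
            return number
    return 0


def characters_to_process(censuses, characters, known_guids):
    """Keep only characters updated after the newest already-processed census."""
    cutoff = newest_known_census(censuses, known_guids)
    return [char for char in characters if char[4] > cutoff]
```

With an empty `known_guids` every character is kept; once the Census=2 GUID is in the set, only the Census=3 character (1974ER) is left, and the other two never cause a per-character database lookup.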

10 MB takes mere seconds to copy. You're losing like 10 seconds here making a copy of a file (Delete previous copy. select current file. Ctrl+C, Ctrl+V, wait all of 5-10 seconds for it to copy.. you don't even need to change directories).

Yeah, the size of the database is a big problem. Doing so many queries (searches) on a large database is going to slow down processing. So we want to reduce the number of queries. I think my idea could reduce the number of queries by at least 50% (probably more like 90% when it comes to people who don't need to prune/purge).
Although, perhaps it's the updates that are the slowest part, in which case all we can do is submit less (non-redundant) data...

pencey · Post by **pencey** » Fri Dec 03, 2010 12:53 pm

1974ER wrote:Pencey, I always do that, just because it is exactly as you (and Balgair) said, the file is saved upon log out (or exit). The point is, I tried Balgair's ingame method and the file wiped. Which is why I am not taking any risks. I lose enough data to crashes and other problems already. Estimated total data loss for this year alone already exceeds 30 MB or roughly 270000 characters.

EDIT: Typos, again.

making a copy of a file should never mess up the original file.. you really should try this at least once more.

Balgair · Post by **Balgair** » Fri Dec 03, 2010 12:54 pm

But... wouldn't it already do this? Random sample pulled from my lua file:

["Misshh"] = {
80, -- [1]
"Omega Chapter", -- [2]
"2010-11-30", -- [3]

It's got the date right there, so I'd assume it already would filter them out based on date, meaning that your suggestion would only have ANY effect on people who upload the same server more than once in a day.

pencey · Post by **pencey** » Fri Dec 03, 2010 1:18 pm

I don't think it uses it for that purpose, though. I think it just uses that to report on the site when that character was last updated or reached a certain level.

If that character were updated at 1am, and then another person updated them at 11pm, it would reject the 11pm. Surely it's not working that way.
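
To make the 1am / 11pm point concrete, here is a small Python comparison. It assumes a strict "only accept if newer than what is stored" rule, which is a guess for the sake of illustration; nothing in this thread confirms how the site actually uses the date field. The date string is the one from Balgair's sample.

```python
from datetime import datetime


def is_newer(stored, incoming):
    """Hypothetical acceptance rule: take the update only if it is strictly newer."""
    return incoming > stored


# Two sightings of the same character on the same day.
seen_1am = datetime(2010, 11, 30, 1, 0)
seen_11pm = datetime(2010, 11, 30, 23, 0)

# The .lua sample only stores a day, e.g. "2010-11-30".
stored_day = datetime.strptime("2010-11-30", "%Y-%m-%d").date()

# With day-only values the 11pm sighting looks no newer than the 1am one.
accepted_by_date = is_newer(stored_day, seen_11pm.date())

# With full timestamps the 11pm sighting is clearly newer.
accepted_by_timestamp = is_newer(seen_1am, seen_11pm)

print(accepted_by_date, accepted_by_timestamp)
# prints: False True
```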

1974ER · Post by **1974ER** » Fri Dec 03, 2010 1:26 pm

Slow down, things are getting mixed up with each other:

I didn't say making a copy made a mess, I said trying to submit a file while ingame wiped it, even though I didn't log out, exit or crash.

Also, copying a file requires two directories, because one file can only exist once in any given directory. So the process would look like this: Select Censusplus.lua, hit Ctrl + C, switch to another directory, press Ctrl + V, if necessary, hit yes to overwrite old copy, wait for copy to finish, then assign copy to be uploaded, log back into game, continue using the original, while the copy is being uploaded. Unnecessarily complicated.

And if Balgair's assumption is correct, then GUIDs would actually slow things down, because they would get assigned to all uploads, even those which don't benefit from them at all.

EDIT: Posting speed is exceeding my typing speed. Pencey... you forgot that the files get processed in chronological order. The system only discards characters that have a time stamp OLDER than the current date for that character, problem sorted.

gmmmpresser · Post by **gmmmpresser** » Fri Dec 03, 2010 3:52 pm

Can I ask 1 simple question?
Why are we continuing to have this debate considering
a) None of us can ACTUALLY do anything about it.
b) There is still a backlog of about 6 days worth of data waiting to be processed.
c) There are obviously serious problems with the database which are beyond our control.

Personally the best thing at the moment might be for everyone to log out of the game. Turn off your computers, and STEP AWAY FROM THE CONSOLE.

Go outside and smell the fresh air. Let the poor system try and do its own thing in its own time.

pencey · Post by **pencey** » Fri Dec 03, 2010 5:37 pm

1974ER wrote:Slow down, things are getting mixed up with each other:

I didn't say making a copy made a mess, I said trying to submit a file
while ingame wiped it, even though I didn't log out, exit or crash.

Also, copying a file requires two directories, because one file can
only exist once in any given directory. So the process would look like
this: Select Censusplus.lua, hit Ctrl + C, switch to another
directory, press Ctrl + V, if necessary, hit yes to overwrite old
copy, wait for copy to finish, then assign copy to be uploaded, log
back into game, continue using the original, while the copy is being
uploaded. Unnecessarily complicated.

I don't understand your objection here... you don't want to wait 2 minutes or whatever for the 10 meg upload to complete, but you also don't want to wait 10 seconds for the file to copy to a new directory (or the same directory -- it will rename it Copy of Censusplus.lua automatically)? It can't take more than 3 seconds to switch directories.. you can have two explorer windows open, too...
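
For anyone who would rather not do the copy by hand, the same step could be scripted. A minimal Python sketch: `make_upload_copy` is a made-up helper, the "Copy of" prefix just mimics what Explorer does, and the demo runs in a throwaway temp directory with placeholder contents rather than a real game folder.

```python
import shutil
import tempfile
from pathlib import Path


def make_upload_copy(original):
    """Copy the data file next to itself and return the path of the copy.

    An existing copy from an earlier run is overwritten.
    """
    original = Path(original)
    copy_path = original.with_name("Copy of " + original.name)
    shutil.copy2(original, copy_path)
    return copy_path


# Demo in a temporary directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "Censusplus.lua"
    original.write_text("-- placeholder contents\n")

    copy_path = make_upload_copy(original)
    copy_name = copy_path.name
    same_contents = copy_path.read_text() == original.read_text()
    original_still_there = original.exists()
```

The copy is what gets uploaded; the original stays where the addon expects it and is never modified by the copy step.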

1974ER wrote: And if Balgair's assumption is correct, then GUIDs would actually slow
things down, because they would get assigned to all uploads, even
those which don't benefit from them at all.

As I said, I don't think that date is used for eliminating checks for redundant data. I could be wrong, in which case yeah the GUID idea isn't necessary because it's already doing something similar.

1974ER wrote: EDIT: Posting speed is exceeding my typing speed. Pencey... you forgot
that the files get processed in chronological order. The system only
discards characters that have a time stamp OLDER than the current date
for that character, problem sorted.

The problem is, in order to know whether to discard a character, the system (as I currently understand it) needs to query the database. Querying the database is bad if you don't need to do it. The GUID way (or potentially that existing date, but I doubt it's being used this way) will eliminate a huge number of database accesses.

gmmmpresser wrote:Can I ask 1 simple question?
Why are we continuing to have this debate considering
a) None of us can ACTUALLY do anything about it.
b) There is still a backlog of about 6 days worth of data waiting to be processed.
c) There are obviously serious problems with the database which are beyond our control.

Personally the best thing at the moment might be for everyone to log out of the game. Turn off your computers, and STEP AWAY FROM THE CONSOLE.

Go outside and smell the fresh air. Let the poor system try and do its own thing in its own time.

I explained a potential solution to the problem... and then tried to clarify... although I'm clarifying to the wrong person, but hopefully with the extra detail, it'll make sense now when rollie reads it

Anyway, my work here is done! Hopefully this method is useful and allows for even more detailed data to be uploaded.