
In a nutshell:
The "power-censusers" should purge after uploading, since they likely don't care about looking at the data while in the game.
If you just play a few realms, you might want to avoid purging/pruning since you'd like to be able to browse the stats in-game.
If top uploaders are uploading 10-15 MB files several times per day to get credit for finding new or levelled characters, and much of that data is already in the database, that is going to cause a big slowdown in a system that is already just treading water. It has gone from 116.384 to 116.368 hours behind in the last 5 hours, so it has barely moved (though I think it catches up a lot in the late-night hours, since there is less data then).
There could be a way to avoid this repeated processing of the data, although maybe the site is already doing this.
- Make a new table with these columns:
User, Realm, Faction, lastCensusTime
- lastCensusTime is not the same as the last upload time. It is the time of the last census that user took on a particular realm+faction.
- When a file is uploaded and parsed, skip any data for that realm+faction from a census at or before that lastCensusTime. Don't check the database to see whether the data is duplicated; you already know it is old, because it was parsed the last time that user uploaded their file.
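A minimal sketch of the idea above, in Python with an in-memory SQLite database. Everything here is hypothetical: the `last_census` table, the `CensusRecord` shape and `filter_new_records` are illustrative names, not the site's actual schema or code, and it assumes each parsed record carries the timestamp of the census that produced it.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class CensusRecord:
    # Hypothetical shape of one parsed entry from an uploaded file.
    realm: str
    faction: str
    census_time: int  # assumed: timestamp of the census that produced this record
    character: str
    level: int


def init_db(conn: sqlite3.Connection) -> None:
    # One row per (user, realm, faction): the time of that user's last census there.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS last_census (
               user TEXT NOT NULL,
               realm TEXT NOT NULL,
               faction TEXT NOT NULL,
               lastCensusTime INTEGER NOT NULL,
               PRIMARY KEY (user, realm, faction)
           )"""
    )


def filter_new_records(conn, user, records):
    """Keep only records newer than the user's lastCensusTime for their
    realm+faction, then advance lastCensusTime to the newest census seen."""
    cutoffs = {
        (realm, faction): t
        for realm, faction, t in conn.execute(
            "SELECT realm, faction, lastCensusTime FROM last_census WHERE user = ?",
            (user,),
        )
    }
    newest = dict(cutoffs)
    fresh = []
    for rec in records:
        key = (rec.realm, rec.faction)
        if rec.census_time <= cutoffs.get(key, -1):
            # Already parsed on a previous upload: skip without any duplicate lookup.
            continue
        fresh.append(rec)
        if rec.census_time > newest.get(key, -1):
            newest[key] = rec.census_time
    for (realm, faction), t in newest.items():
        conn.execute(
            "INSERT OR REPLACE INTO last_census (user, realm, faction, lastCensusTime) "
            "VALUES (?, ?, ?, ?)",
            (user, realm, faction, t),
        )
    conn.commit()
    return fresh


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    old = CensusRecord("Realm A", "Alliance", 100, "Foo", 10)
    new = CensusRecord("Realm A", "Alliance", 200, "Foo", 12)
    print(len(filter_new_records(conn, "alice", [old])))       # first upload: 1 record kept
    print(len(filter_new_records(conn, "alice", [old, new])))  # re-upload: only the new one
```

The design point is that the check costs one small lookup per user per upload, instead of a per-character duplicate query against the main data, so repeat uploads that are mostly old data become cheap to process.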