data reliability measure

Suggestions for WarcraftRealms.com
Post Reply
znog
Posts: 37
Joined: Sun Jan 02, 2005 4:37 pm

data reliability measure

Post by znog »

Since WarcraftRealms relies on players uploading their data, some additional thought could be given to data reliability: for example, I suspect some servers only look atypical because a single census has been uploaded, and that census was taken at a time when only a few players were online (like the 126 Horde characters on FR beta right now).

Of course, as the site's fame and the number of uploads increase over time, the information gets more and more accurate, but a measure of quality is still required.

Some choices have already been made in this domain (after all, data quality IS one of the few core issues WarcraftRealms has to deal with). These measures/choices certainly improve overall quality, but I think it is important to realize the data will never be perfect, and to systematically publish a measure of error alongside the data.


Here are some of the (perfectible) ideas I have in mind about the issue.

How to measure data quality?
==================
Errors come from three sources:
- A/ in-game data collection errors (characters not logged in when the census is launched remain unseen)
- B/ irregular data uploads ==> this is the problem I would like to measure in this post, since I think it's the priority (note that features like the "Most Wanted list" already aim at this issue)
- C/ data processing (but this error should only be the result of deliberate choices in data processing: filtering characters below level 10, the 30 Day Data Window, the choices offered or not on the interface, factorial component mappings)
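To make the C/ category concrete, the deliberate filters mentioned above (dropping low-level characters, keeping only recent sightings) could look like the sketch below. Field names, the level-10 cutoff, and the 30-day constant are assumptions for illustration, not WarcraftRealms internals.

```python
from datetime import datetime, timedelta

def filter_census(chars, now, min_level=10, window_days=30):
    """Keep characters at or above min_level seen within the data window."""
    cutoff = now - timedelta(days=window_days)
    return [c for c in chars
            if c["level"] >= min_level and c["last_seen"] >= cutoff]

now = datetime(2005, 1, 15)
chars = [
    {"name": "Ari", "level": 42, "last_seen": datetime(2005, 1, 10)},
    {"name": "Bo", "level": 5, "last_seen": datetime(2005, 1, 12)},   # below level 10
    {"name": "Cy", "level": 60, "last_seen": datetime(2004, 11, 1)},  # outside the 30-day window
]
print([c["name"] for c in filter_census(chars, now)])  # -> ['Ari']
```

Any error these filters introduce is a known, chosen trade-off, which is what distinguishes C/ from A/ and B/.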

My main question here is: how do we measure B/?
This problem has been tackled more scientifically elsewhere, for instance in statistics (polls have margins of error) and information theory, so these are only very rough ideas.

1/ Detect anomalies (like a server with 100 players) and refuse to take this data into account, since it is clearly false. Maybe even refuse to display it (but this would not be very nice to the person who made the effort to collect it, so perhaps just display the data with a warning, or in a different color). The problem here is finding a good algorithm to sort good data from bad data.
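One simple version of such an algorithm: flag snapshots whose character count is tiny in absolute terms or far below the server's median snapshot. The function name and both thresholds here are hypothetical, just to show the shape of the idea.

```python
def flag_suspect_censuses(counts, min_chars=500, ratio=0.25):
    """Return indices of census snapshots that look anomalous.

    A snapshot is suspect if it saw very few characters in absolute
    terms, or far fewer than the median snapshot for that server.
    """
    if not counts:
        return []
    median = sorted(counts)[len(counts) // 2]
    suspects = []
    for i, n in enumerate(counts):
        if n < min_chars or n < ratio * median:
            suspects.append(i)
    return suspects

# A server with one tiny upload (the "126 Horde" case) and two normal ones:
print(flag_suspect_censuses([126, 2100, 1980]))  # -> [0]
```

Flagged snapshots could then be displayed with the warning or color change suggested above, rather than silently dropped.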

2/ Evaluate the added value of a new upload: this is subtle, but neat. Here is the idea: the more added value there is in a new upload, the less accurate the stored information was.
So each time the database for a faction/server is updated, an evaluation of the number of modifications should be made. If there are few changes, the data was good. If there are lots of changes (new levels, new characters, new guilds, whatever), the data was quite inaccurate.
This is simple and neat, provided the indicator is well calibrated, but it has a major flaw: it evaluates the quality of the previous dataset, not the new one! The indicator is imperfect in another way too: two lua files collected within a short period of time give the impression of good data quality, so some smoothing (like a moving average) is required.
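The two pieces of idea 2/ can be sketched as follows: a change ratio per upload, plus an exponential moving average so two uploads made minutes apart cannot fake a "high quality" reading. All names, the record layout, and the smoothing factor are hypothetical.

```python
def change_ratio(old, new):
    """Fraction of records in the new upload that are added or modified.

    `old` and `new` map character name -> (level, guild).
    """
    if not new:
        return 0.0
    changed = sum(1 for name, rec in new.items() if old.get(name) != rec)
    return changed / len(new)

def smooth(scores, alpha=0.3):
    """Exponential moving average of per-upload change ratios."""
    ema = None
    for s in scores:
        ema = s if ema is None else alpha * s + (1 - alpha) * ema
    return ema

old = {"Ari": (12, "Guild A"), "Bo": (30, "Guild B")}
new = {"Ari": (13, "Guild A"), "Bo": (30, "Guild B"), "Cy": (10, None)}
print(change_ratio(old, new))  # 2 of 3 records are new or changed
```

A high smoothed score would mean uploads keep discovering big changes, i.e. the stored data for that server is chronically stale.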

Still, I'd say this is a good way to tackle the issue.




NB: measuring data quality is one thing, but improving it is necessary as well. E.g. for the B/ errors, developing a simple executable that does the job of uploading the file, maybe even automatically after checking that the lua file has been updated, would be useful as well!
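The "check the lua has been updated" part could be as simple as polling the file's modification time and uploading only when it changes. This is a sketch; the file path and `upload` function are placeholders, not a real WarcraftRealms client.

```python
import os
import time

def upload(path):
    print("uploading", path)  # placeholder for the real HTTP upload

def watch(path, poll_seconds=60, iterations=1):
    """Poll the file's mtime; upload only when it has changed."""
    uploads = 0
    last_mtime = None
    for _ in range(iterations):
        if os.path.exists(path):
            mtime = os.path.getmtime(path)
            if mtime != last_mtime:
                upload(path)
                last_mtime = mtime
                uploads += 1
        time.sleep(poll_seconds)
    return uploads
```

Run with `iterations=None`-style long polling in a real tool; the point is just that the user never has to remember to upload by hand.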

User avatar
Rollie
Site Admin
Posts: 4783
Joined: Sun Nov 28, 2004 11:52 am
Location: Austin, TX
Contact:

Post by Rollie »

All excellent points. I do have some plans to at least display more information on the quality of the data that has been submitted. Your idea of evaluating submitted data is an interesting one and something I'll have to think on.

I'll think about all this more and see what I can come up with =)

Post Reply