Most important stat missing

Suggestions for WarcraftRealms.com
BambisFaline

Post by BambisFaline »

I was looking over the Top Contributor's page and noticed that you list number of new characters found and number of updated characters.

Those are nice numbers to know, but the page leaves out the single most important number:

How many censuses have been taken.

The problem is, someone on a low population server can do 100 censuses in a month and come up with lower numbers than someone on a high population server who only does 10, yet the low-pop person who took many censuses has provided better data than the high-pop person who took few.

I suggest a new stat: The total number of censuses taken per month.

1974ER
Epic Censi
Posts: 762
Joined: Fri Nov 07, 2008 3:30 am

Post by 1974ER »

Interesting idea. :) I do, however, see a couple of minor problems.

Your idea assumes that all submission makers run censuses on only one single faction. And that, as we all know, is not true of anyone who scores really high figures on new/updated lists. No single faction, no matter how large, can put a person at the top of the existing lists.

I'll make a short example of the second one:

Person A runs 100 censuses on a low population faction.
Person B runs 100 censuses on a high population faction.
Person C runs 50 censuses on a low population faction and 50 censuses on a high population faction.

All three people have run the same number of censuses... but... the truth is that from WCR's point of view, person C has done more than A or B, though he has done less for a specific faction.

Some of the readers might be going: "Huh? Where did he get that idea?". Well, simple. From a general point of view, having as broad an information base as possible is the best solution, as the following shows:

Person A runs 100 censuses on a single faction.
Person B runs 1 census on a hundred different factions.

Next, 100 players look at WCR for info. Assuming that there were exactly 100 factions, person A would provide less than 100% coverage on 1% of factions, but person B would provide more than 1% coverage on 100% of factions. Assuming the 100 players are fairly evenly spread... every single one of them will get a bit of info from person B, but for most of them, person A's efforts mean nothing. The only situation where A's efforts would mean more would be one where the players were so heavily concentrated on the single faction A censused that they would make up 51 or more of the 100. And that kind of situation is very far from happening in any foreseeable near future on the real realms.
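
To make the coverage argument above concrete, here is a minimal Python sketch. It assumes exactly 100 factions with one interested player on each (the evenly spread case from the example). The function name `players_served` and the faction numbering are made up for illustration and are not anything WCR provides.

```python
def players_served(censused_factions, players_per_faction):
    """Count players whose faction received at least one census."""
    return sum(
        count
        for faction, count in players_per_faction.items()
        if faction in censused_factions
    )

# Assumption: 100 factions, one interested player on each (evenly spread).
players_per_faction = {faction: 1 for faction in range(100)}

person_a = {0}              # 100 censuses, all on faction 0
person_b = set(range(100))  # 1 census on each of the 100 factions

served_by_a = players_served(person_a, players_per_faction)
served_by_b = players_served(person_b, players_per_faction)
print(served_by_a, served_by_b)
```

Under these assumptions person A's data reaches 1 of the 100 players and person B's reaches all 100, although A's single faction is covered in far more depth.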

Also... I dislike bragging, so I apologize for asking and answering... how many people reading this can say with high level of certainty that they average more than 42 censuses per day? If someone is wondering why I picked number 42... no, it has nothing to do with Douglas Adams or the meaning of life... :D It is simply the number of censuses that I need to make as minimum to cover each faction that I am currently, permanently or semi-permanently, keeping an eye on, twice.

I can. :D

I am not against your suggestion; in fact, if it is easy or fairly easy, please put it in, Rollie! :) I am simply saying that most, if not all, of the people on it will also feature prominently in the top 10s or just below them.

On a further note, WCR needs all three types of people, those who run a lot of censuses on low population factions, those who run smaller numbers of censuses on larger factions and also those who do a little bit all over the place.

As Rollie likes to put it: Even a single person can do a lot for WCR! :D

With that note, happy censusing, everyone! :D

Edit: some typos. :(

Rollie
Site Admin
Posts: 4783
Joined: Sun Nov 28, 2004 11:52 am
Location: Austin, TX

Post by Rollie »

It would be interesting to see that one, and not too tough to gather that data either.
phpbb:phpinfo()

BambisFaline

Post by BambisFaline »

1974ER wrote:Interesting idea. :) I do, however see a couple of minor problems.

Your idea assumes that all submission makers run censuses on only one single faction. And that, as we all know, is not true of anyone who scores really high figures on new/updated lists. No single faction, no matter how large, can put a person at the top of the existing lists.
Ah, but you see, the problem has already been addressed. Even though one person can run several small censuses, they are all uploaded under a single user name. Thus if Person A runs 100 censuses on one server and Person B runs 30 censuses on four servers, Person B obviously ran more censuses than Person A.

So I think total number of censuses would still properly reflect the amount of effort a single user has put in rather than character count.

gmmmpresser
Superior Census Taker
Posts: 63
Joined: Sun Sep 04, 2005 6:51 am

Post by gmmmpresser »

I, for one, would be very interested in how many censuses you do 1974.

1974ER
Epic Censi
Posts: 762
Joined: Fri Nov 07, 2008 3:30 am

Post by 1974ER »

Bambi's Faline wrote:Ah, but you see, the problem has already been addressed. Even though one person can run several small censuses, they are all uploaded under a single user name. Thus if Person A runs 100 censuses on one server and Person B runs 30 censuses on four servers, Person B obviously ran more censuses than Person A.

So I think total number of censuses would still properly reflect the amount of effort a single user has put in rather than character count.
I am sorry, but I have to disagree with you, because you entirely missed one of my points. I will try another simplified example to clarify:

Person A runs 100 censuses on a low population faction.
Person B runs 100 censuses on a high population faction.

Now, they both ran the same number of censuses, but the effort made by person B is greater. Why? Because censuses are easy to run, running one in itself is not an effort... BUT... the TIME for doing so is.

No one present can claim to have unlimited time; in fact, for most of us time is the single biggest limiter of our censusing efforts. If I wanted, I could probably run 100 censuses on EU-Silvermoon Horde in less than 3 hours, even during peak hours. Running 100 censuses on EU-Argent Dawn Alliance side during peak hours... estimated time: about 42 hours.
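
As a rough worked version of those two estimates, the sketch below converts them to minutes per census. The 3-hour and 42-hour totals are taken from the post and treated as exact for the sake of the arithmetic; the helper name `minutes_per_census` is invented here.

```python
def minutes_per_census(total_hours, censuses):
    """Average wall-clock minutes spent per census."""
    return total_hours * 60 / censuses

# Estimates quoted in the post, treated as exact for illustration.
silvermoon_horde = minutes_per_census(3, 100)
argent_dawn_alliance = minutes_per_census(42, 100)
time_ratio = 42 / 3

print(silvermoon_horde, argent_dawn_alliance, time_ratio)
```

On these figures a peak-hour census on the larger faction costs roughly 14 times as much time as one on the smaller faction, which is the gap a plain census count would hide.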

Actually, I think that on average, I am already giving EU-Silvermoon Horde more than the above-mentioned amount of effort per month, because it is such a quick census to run. As for EU-Argent Dawn Alliance, they mostly get about 1 census during peak hours per day; the other censuses are outside the peak times, as I prefer to spread my efforts to get better coverage, due to having characters that I play with on multiple servers and in two cases on both sides of a server.

Also, in my personal opinion, anyone who runs censuses on faction(s) they don't play on, puts in more effort, because most of us don't have multiple accounts and therefore running a census elsewhere is directly taken out of our time, which in theory could be spent to play.

Gmmmpresser: If Rollie manages to introduce this feature, you will get to know. Actually, I would like to know it myself. :D I only know the figure is very high, in excess of 1200 per month with very high probability. Yesterday, the daily figure exceeded 70 as I ran 4 factions once and 21 at least 3 times, about 7 or 8 of them 4 times... = 4 + 63 + 7 or 8 > 70. Due to waking up early for work and my large time difference to WCR, the results of the 21 first actually went to the submission of 23:06:55 on Wednesday, as for me the time was 07:06:55 and some of the 7 or 8, which were submitted 16:33:28, were actually done very early this morning, as that translates to 00:33:28 for me.

Even if one subtracted those and counted just what was done during Thursday, no matter which clock one looks at, the daily figure would still exceed 46.

As for today, the 00:32:04 submission already contains data from 21 factions and due to minor technical problems, possibly 22 or 23 censuses + a few misc bits of data from uncompleted ones.

I have to leave for work in about 25 minutes, so I don't have time to do another "full 21", but I will be submitting a few more censuses before I leave. :D

Have a nice day, everyone! I will return to check the forums after work. :D

BambisFaline

Post by BambisFaline »

1974ER wrote:Now, they both ran the same number of censuses, but the effort made by person B is greater. Why? Because censuses are easy to run, running one in itself is not an effort... BUT... the TIME for doing so is.
You make a good point in that a high pop server takes more time to run than a low pop server. But I think your example isn't typical. It is unlikely that you are going to get a bunch of people running the same number of samples. While it does take more time to run a census on a high population server, we also have to consider that it takes time to be around for each census. Thus, while it may take Person A 15 minutes to run a single census on a high pop server, it takes Person B more time than the 5 minutes * two to run two censuses on a low population server, simply because he also had to make the time to be available, while Person A can pop in and out and be done.

I also think my point stands that multiple surveys provide better data than limited surveys. So if Person A does one survey on a high pop server that requires 15 minutes to run and Person B does 15 surveys on a low pop server that take one minute each to run, Person B put in more effort (had to be available 15 separate times as opposed to one time for Person A) but also provided better data (pop over time rather than a single snapshot).

Extreme examples, but this is why I think total number of censuses taken would be a good stat to report along with characters seen. I think it better reflects the effort put in, but then that's imho and ymmv. :)

Dameon
Census Taker
Posts: 19
Joined: Thu Sep 20, 2007 6:54 am
Location: Ohio

Post by Dameon »

Ok, my question is for you guys that are putting up the big numbers: when do you actually play? It sounds like all you're doing is running the census on various servers. I let my census run on auto and it usually runs several times while I am playing, but of course I only submit maybe once a day and only from one server.
Dameonii, Damon, Lucci, Luccia, Nyctherius, Jedrus, Deathzone, Bankershours of Whisperwind

1974ER
Epic Censi
Posts: 762
Joined: Fri Nov 07, 2008 3:30 am

Post by 1974ER »

Bambi's Faline: If he did so on 15 different days, I would agree with you... but then we run into this: Two (or more) people could run 15 censuses on a certain faction in a month in many different patterns:

One scan every two days during early mornings.
One scan every two days during peak hours.
About 4 scans per week with random spaces in between, but only during workdays.
About 4 scans per week with random spaces in between, but only during weekends.
15 scans in one week and nothing during the rest...

Etc... and they all would get 15 censuses. The joke here is that 15 does not necessarily equal 15, in fact, on WCR, it almost never does. :D

Though, admittedly, that one too falls under IMHO, YMMV.

Dameon: This isn't necessarily true of all big submission makers, but I have characters that I actually play fairly often on 4 different realms, in 6 factions. Also, I try to use time effectively... for example, at the moment I type these words, I am also running a census on EU-Ysera, Horde side (where I do not play) and will be switching to something else in a moment, while I continue to share my thoughts here. :D

In very general terms, very high figures require time, censusing multiple realms and submitting data often. Also, at least on EU-realms, the best time to catch lots of people online is about between 19 and 22 o'clock. Running censuses on realms on the most wanted lists tend to produce higher amounts of new and updated characters as the data sadly often contains month long holes. :( But I guess you knew most of this theory. Going to edit a bit later, need to go AFK for a short while.

Edit: I actually wanted to write more, but I am feeling pretty tired... I have already been awake almost 18 hours and the day has been anything but usual in other ways as well... It has been over 16 years since I last disassembled, cleaned, reassembled and fired an assault rifle or listened to lectures on changes in troop designations, laws and so on...
Last edited by 1974ER on Sat Feb 06, 2010 2:36 pm, edited 1 time in total.

Dameon
Census Taker
Posts: 19
Joined: Thu Sep 20, 2007 6:54 am
Location: Ohio

Post by Dameon »

1974ER I wasn't trying to be critical, so I hope you didn't take my previous post wrong. I was just looking at the numbers I submit and those of others and am just totally amazed at how high those are and how one could have time to play at the same time. lol I guess multiple accounts would help also. Oh and I am only on one realm although I have toons on 3 other realms but never play them. I have 9 on Whisperwind including one horde. Guess I am stuck on the alliance side. lol

Anyhow you guys keep up the good work. Someday when you have grown tired of submitting I may have a chance to catch you. lol

D
Dameonii, Damon, Lucci, Luccia, Nyctherius, Jedrus, Deathzone, Bankershours of Whisperwind

1974ER
Epic Censi
Posts: 762
Joined: Fri Nov 07, 2008 3:30 am

Post by 1974ER »

Dameon, I did not take your question as critique; I took it as honest curiosity and tried to reply the best I could, considering English isn't my native language and being tired makes me even more prone to various errors than native speakers.

Yes, multiple accounts would help a lot. I only have one though. They were one thing that slipped my mind while writing the previous post and its edit.

As for catching up... I don't think a very large number of people will bypass me, Kerebo or Balgair any time soon. In a few hours, I will have been a member here for 15 months... and to be honest, Balgair and Kerebo can attest that pulling 500k+ updates in a month is not easy; doable, yes, but not easy. Doing so month after month is even more difficult. That means anyone starting from scratch today can reserve about... 13 months minimum to break into top 3 updaters, provided all three of us quit censusing today forever... :D And even if someone managed 1M+ updates per month, hitting top 3 updaters would take about half a year, bypassing me almost 8 months... :D

On the new characters side, the situation isn't much different, realistically about 8 to 9 months minimum into top 3 (this assumes about 140k new ones per month... which is a figure I for example have never reached myself) and about 10 to secure position 1. If I recall correctly, my monthly average is a bit over 80k new at the moment... assuming that speed... about 13 months into top 3, about 15 to bypass me and take 1st place.

Edit: Multiple typos. Going to bed now or I will soon fall asleep on top of my keyboard. :/

BambisFaline

Post by BambisFaline »

1974ER wrote:Bambi's Faline: If he did so on 15 different days, I would agree with you... but then we run into this: Two (or more) people could run 15 censuses on a certain faction in a month in many different patterns:

One scan every two days during early mornings.
One scan every two days during peak hours.
About 4 scans per week with random spaces in between, but only during workdays.
About 4 scans per week with random spaces in between, but only during weekends.
15 scans in one week and nothing during the rest...

Etc... and they all would get 15 censuses. The joke here is that 15 does not necessarily equal 15, in fact, on WCR, it almost never does. :D
I must admit that I'm not quite sure what the point is you're trying to make.

When it comes to data and stats, the number of samples determines how accurate your data is.

One person running 15 censuses at one minute each provides better data than one person running a single census that takes 15 minutes.

I would go so far as to say that the person running the 15-one minute census put in more work than the person running the one-15 minute census because he had to be available 15 times versus the one time the other person had to.

This certainly isn't meant to minimize the time someone spends running a census on a high population server. Far from it. I reside on a high-pop server, so I understand how much time it takes. My original point, which I think might be well to reiterate, is that total number of censuses run would be a good stat to see.

1974ER
Epic Censi
Posts: 762
Joined: Fri Nov 07, 2008 3:30 am

Post by 1974ER »

Unfortunately, I can see that... I will try again:

Given an identical number of samples, two people can generate data, which has different overall accuracy. Let's say two people run just one single census during a month:

Person A runs his census at 06:00, person B at 20:00. The probability that person B generates more data is very high, which means his data gives a better picture of the overall population in that faction than person A's.

Given an identical amount of time, two people can generate data, one of them will produce more accurate data, the other will generate a greater volume of data. For example 2 x 7,5 minutes versus 1 x 15 minutes.

Given an identical amount of BOTH samples AND time, two people can still produce highly variable overall accuracy.

Person A runs 15 x 1 minute censuses (one census per day on January 2nd, 4th, 6th, 8th, etc...)
Person B runs 15 x 1 minute censuses (but he chooses to run 8 of them on January 1st and 7 on January 2nd)

On February 1st, we look back. All of person A's data is still visible, but this is no longer true of person B. They both used 15 minutes and provided 15 samples, but person A still produced data with better accuracy. So, neither number of samples nor the time used alone determine the accuracy of the data that was created.
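
A minimal sketch of the two schedules above, counting how many distinct days each one actually covers. The year 2010 is an assumption (the example only says "January"), and `distinct_days` is a made-up helper; no retention rule for WCR data is modelled here.

```python
from datetime import date

# Person A: one census every second day (January 2nd, 4th, ..., 30th).
person_a = [date(2010, 1, day) for day in range(2, 31, 2)]

# Person B: 8 censuses on January 1st and 7 on January 2nd.
person_b = [date(2010, 1, 1)] * 8 + [date(2010, 1, 2)] * 7

def distinct_days(census_dates):
    """Number of different calendar days on which a census was run."""
    return len(set(census_dates))

print(len(person_a), distinct_days(person_a))
print(len(person_b), distinct_days(person_b))
```

Both schedules contain 15 censuses, but the first touches 15 different days and the second only 2, which is one way to see why equal sample counts need not mean equal coverage.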

Quote: "I would go so far as to say that the person running the 15-one minute census put in more work than the person running the one-15 minute census because he had to be available 15 times versus the one time the other person had to." I agree 100%, provided the person doing the one minute ones did not run them directly one after another, producing an identical block of time, which however would contain less data.

To further clarify what I meant with 15 not equaling 15... try this one...

Person A runs 2 censuses, time taken: 5 + 10 minutes.
Person B runs 2 censuses, time taken: 7,5 + 7,5 minutes.
Person C runs 2 censuses, time taken: 1 + 14 minutes.
Person D runs 2 censuses, time taken: 6 + 9 minutes.

All have 2 samples, all spent 15 minutes... but what about the data quality, quantity and accuracy? Truth is, given just the 4 lines of info, I can't tell. Can you?

BambisFaline

Post by BambisFaline »

1974ER wrote:Given an identical number of samples, two people can generate data, which has different overall accuracy. Let's say two people run just one single census during a month:

Person A runs his census at 06:00, person B at 20:00. The probability that person B generates more data is very high, which means his data gives a better picture of the overall population in that faction than person A's.
Ah, I understand now. Unfortunately, you are incorrect in an important matter. In your example above, Person B does not in fact give a "better picture of the overall population". Person B, like Person A, simply gave a snapshot of the population at a given time. The fact that Person A probably had to run his survey longer due to a higher population does not make his data "better". Only multiple surveys taken multiple times will provide a good picture of the population.

To further clarify what I meant with 15 not equaling 15... try this one...

Person A runs 2 censuses, time taken: 5 + 10 minutes.
Person B runs 2 censuses, time taken: 7,5 + 7,5 minutes.
Person C runs 2 censuses, time taken: 1 + 14 minutes.
Person D runs 2 censuses, time taken: 6 + 9 minutes.

All have 2 samples, all spent 15 minutes... but what about the data quality, quantity and accuracy? Truth is, given just the 4 lines of info, I can't tell. Can you?
Well, we can answer some questions:

Data quality: The data is what the data is.

Data quantity: Each ran 2 censuses. How much time it took to run each census is completely irrelevant to the value of the data collected.

Data accuracy: There is the only real question mark, as the accuracy is not about what data is collected, but how the data is interpreted.

Let me use your example as an example:

Person A runs two censuses, each at 2 AM, taking 1 minute to run the survey because of low server population and thus putting in a total of two minutes of actual data collecting.

Person B also runs two censuses, one at 5 PM on Saturday and one at 12 Noon on Sunday. Each census took fifteen minutes to run, thus putting in a total of 30 minutes of actual data collecting.

Person A's censuses provide far superior information to Person B's because he took them at the same time of day, while Person B took his at different times. This in spite of the fact that Person B took longer to collect his data.

Now, that above example is presented in a vacuum, since multiple people are running census at different times on the same server. The data is collected, lumped together, and interpreted.

But since we're breaking down data collectors into who found the most new characters and who saw the most people online, we also need to see who is running the most censuses, as more censuses are far better than the number of toons seen. After all, who provides better data? Person A, who ran 3000 censuses in a month and saw only one player each time, or Person B, who ran one survey during the month and saw 2 million players? Person B would top the chart, but Person A ultimately provided more useful data.

1974ER
Epic Censi
Posts: 762
Joined: Fri Nov 07, 2008 3:30 am

Post by 1974ER »

This is getting interesting, for which I thank you. :D

Now... unfortunately, I disagree with almost the entire content of your post. However, I am not saying that you are wrong, just that I have a highly different opinion.

Paragraph 1: The second to last phrase, I assume you meant to say person B, because otherwise the phrase makes zero sense. Also, I said "better", not "good". If I am allowed to have only one snapshot worth of data, I want to have one taken at a time, where the population online is likely to be high(est).

Consider the following two examples:

1) You get one snapshot, size 125 characters, timepoint unknown. You can't do much with this. Now, you are given a time, 20:00. Now you can tell that the population is likely to be low OR there is something else unusual going on.

2) You get one snapshot, size 2913 characters, timepoint unknown. Without any extra information you know that this snapshot was NOT taken during early morning and the faction is very unlikely to have a very low population.

In other words, I said better, because in my opinion, if the number of samples is very low, the value of the size of those samples increases. In short, having one small snapshot is worth less than having one large snapshot.

Now to the longer part:

Quality: I have to disagree, at least according to your later example of my example. The quality is increased if the latter snapshot was taken at the same time on some other day. That information, however, was intentionally not in my example. So the quality of the data can not be determined accurately.

Quantity: I disagree; time is not the relevant factor, but for a different reason: the relevant factor is the number of characters in the examples. Again, an unknown factor, because we don't know if the two snapshots came from the same or different factions. For example, person B could have very heavy overlap if the snapshots came from the same faction and zero overlap if they came from two different ones. Person C would have the least amount of overlap even if all four had chosen to snapshot the same faction twice.

Accuracy: Question mark sure... but the reason is different again... If all eight snapshots came from different factions, we can't interpret much... but if all four people censused the same faction, we actually start to see things.

Next going to your example...

You ran into a brick wall, head first. :( Your example already assumes something you did not consider or chose to ignore. Running two censuses at 2 AM will find a very small number of characters. Both the scan at 5 PM on Saturday and the one at 12 on Sunday will most likely find (a lot) more people online.

The two at 2 AM have a high probability of overlap, so the data quantity will not increase much from the first to the second. The scan from 5 PM will probably be bigger than the one from noon Sunday... even so, in my opinion the following is valid: The superior information is provided by the person who saw the most characters twice... and that person is more likely to be person B. For person A to produce superior information, person A would need to be on a server where almost no one who plays on Saturdays at 5 PM plays on Sundays at noon.

I admit that situation might occur, but in my experience, it is unlikely.

Now on to the last paragraph...

Ummm... I am sorry, but just no. If person A saw 3000 different characters once, he provided less information with equal accuracy/quality. And if he saw the same person 3000 times, well, the data is super accurate/of great quality... BUT... it is basically useful to only one person. Person B can't win the quality/accuracy race, but his data is useful to more than 3000 people, no matter what person A did.

You see, in my opinion, the best census data provider is the one with the best ratio of information to censuses run (as long as the number of censuses run > 1). Person A runs 10 censuses at 100 units of new or updated info and scores a total of 1000 new & updated, person B runs 100 censuses getting 10 units of new & updated for an equal total of 1000, while person C runs 1000 censuses and gets 1 new or updated each time for a total of 1000 as well.

Person A did the best job with a ratio of 100 to 1, person B managed a 10 to 1... and person C really should have done something differently with his 1 to 1 ratio.
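
The ratio described above can be sketched in a few lines of Python. The figures are the ones from the example; the dictionary layout and the name `info_per_census` are only illustrative, not a stat WCR is known to compute.

```python
# (total new & updated characters, censuses run) per person, from the example.
contributors = {
    "A": (1000, 10),
    "B": (1000, 100),
    "C": (1000, 1000),
}

def info_per_census(total_new_and_updated, censuses_run):
    """New and updated characters gained per census run."""
    return total_new_and_updated / censuses_run

ratios = {
    name: info_per_census(total, censuses)
    for name, (total, censuses) in contributors.items()
}
best = max(ratios, key=ratios.get)
print(ratios, best)
```

All three end up with the same total of 1000, so only the per-census ratio separates them, and by that measure person A comes out ahead.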

Now the question remains... if person D runs 1 census and scores a total 1000 as well... is the data worthless? In my opinion, no... On its own it is of low quality and accuracy, but it still benefits 1000 people slightly.

This conclusion might also interest you... I may hold both top 10 first spots, both overall and monthly, and there probably aren't very many people here who do 1200+ censuses per month... but there is a very large probability (approximately 99,998% if I calculated correctly ;D ) that the best censuser is someone else entirely, most likely someone we can't see on the lists at all.

Why did I come to this conclusion? Simple. Given the same amount of time, someone else could have run more censuses AND collected more (new & updated) data.

This is turning into another wall of text, so I am going to take a break... and reread the thing a bit later, in order to try to spot my possible goofs. :/

BambisFaline

Post by BambisFaline »

1974ER wrote:This is getting interesting, for which I thank you. :D
Statistical theory is fun, isn't it? :)
Now... unfortunately, I disagree with almost the entire content of your post. However, I am not saying that you are wrong, just that I have a highly different opinion.
Well, I think the difficulty lies in what data you are looking for.

If you want to know who lives on a realm, then certainly a sample run during times of high population will result in more data available for analysis. In this case, more toons seen is better.

However, if you want to know who is living on a realm at what time, then the number of toons seen is irrelevant to the discussion. Your data will be better if you take your samples at the same time of the day.

However (again), in each case, more samples run is always better. The fewer snapshots you take of the population, the more susceptible your analysis of the data is to aberrant fluctuations. So again, running a single sample during a time of high population produces less accurate data of the population of the server than someone who takes multiple samples at less populous times. That's simply how statistics work.
This is turning into another wall of text, so I am going to take a break... and reread the thing a bit later, in order to try to spot my possible goofs. :/
Yeah, that was a bit more than I could read in this sitting. But again, it boiled down to what someone considers to be quality of data. That, as I pointed out, depends on what you are looking for. Who lives on the server or when they live on it? Ultimately, the more samples someone takes will always be superior to someone who takes fewer samples. Again, that is simply how statisticians work.

1974ER
Epic Censi
Posts: 762
Joined: Fri Nov 07, 2008 3:30 am

Post by 1974ER »

:D:D:D

It's not a difficulty... It is a choice, and after reading your latest post, I am seeing that you have just made a different one than I have.

Correct, on this we agree 100%.

This one is where we go separate ways... :D You see, in my opinion, the number of toons is not irrelevant. But that is because we have different aims.

Yes, correct, more samples is always good. However, the difference is that your main interest is in "when", mine is in "who".

This paragraph starts with something I can again agree with. But the end is different, due to our differing views. You aim for the best quality, I for the best quantity.

I apologize, I am prone to producing "walls", given sufficient time, as I am actually a rather slow typist. I tried to write the above to correspond to your paragraphs in order not to need quote (almost) everything.

Disclaimer, I ran out of time, I need to leave for work. I will post this as is and heavily edit it after I get back from work.

EDIT: I had time to think about things while I was at work... :D

I am going to give you a real-world example of why your thinking can lead to funny results. I am E, one of my bosses is M1, my colleagues are M2, M3, M4, H, K, T1 and T2. This is how last week looked:

Monday through Thursday, 07:35, M1, K, T1 and T2 were present, Friday 07:35, K, T1 and T2 were present.

Monday through Thursday, 13:55, M1, K and T1 were present, as well as H, M2 and M3, Friday E, K, T1, H, M3 and M4 were present. Additionally, E was present at the time on Tuesday, M4 on Wednesday.

That is the equivalent of 10 snapshots worth of data. Now, I am going to let you pick, Faline... Do you wish to get four more snapshots (yesterday and today at 07:35 and 13:55) or one from last Thursday at 11:00?

I will wait for your answer before posting more. ;) :D

BambisFaline

Post by BambisFaline »

You are correct that "more samples is always good".

You are incorrect that "best quality" is independent of "best quantity".

The only way you know that a certain time produces a high population is...wait for it...multiple samples run at the same time. :)

The only way you know that a high population isn't transient is...here it is again...multiple samples run at the same time.

There is nothing wrong with running population surveys during times of high population. You certainly will learn more as to who occupies a server. But only multiple samples taken over time will yield accurate results.

Therefore, multiple surveys provide better data. That is the bottom line of statistics. There is no opposing position to that which is accepted in the scientific arena.

1974ER
Epic Censi
Posts: 762
Joined: Fri Nov 07, 2008 3:30 am

Post by 1974ER »

Quote: "You are incorrect that "best quality" is independent of "best quantity"."

I don't think I said that, and if you understood something I said to mean that, then I failed at expressing myself. :( The two are definitely not independent, but due to limited data (we can't run a near infinite number of censuses) they are partially mutually exclusive, which is a completely different thing.

See further below for more.

Quote: "The only way you know that a certain time produces a high population is...wait for it...multiple samples run at the same time."

Ouch... Wrong, wrong, wrong... at least on the EU realms. I have not examined the US realms as closely, but all EU servers I have examined more carefully show similar (mind you, similar, not identical) patterns. Populations are often at their lowest from around 02 to 04 and at their highest from around 19 to 22.

Ergo, even if I run a census on a server that currently has zero entries, if I get to pick the time, I will pick somewhere around 19.30-22.00 as my first choice. And the only way to know that the highest population is not transient (peaking at 23.30, for example) is to run multiple samples covering DIFFERENT times, not just one.

Running two censuses at 19, two at 20 and two at 21 gives better quantity knowledge than running six at 20. Running six at 20 gives better quality knowledge on those characters that are seen more than twice. The two + two + two approach also only takes 2 days, instead of 6. The person going the 2+2+2 route can't exceed quality 2 (seen twice at the same time), but the person going the 6 route is going to zero out on some characters, as they are never on during his single daily census. On the other hand, he will see some people 3, 4, 5 or 6 times, making the quality of data on those characters higher.

The person taking the 2+2+2 route will also have some insight as to whether the online population is going up, down or staying about the same between 19 and 21... the 6 router has zero knowledge here. As I said, we have different aims: you want the highest possible quality, while I am willing to sacrifice some quality in order to guarantee that a greater number of people get at least something.
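The trade-off between the two routes can be sketched with a toy model in Python. Everything here is an assumption made for illustration: the roster, the play hours and the "plays every Nth day" pattern are invented and are not real census data. "Quality" is taken to mean the number of times a character is seen at the same hour, as described above.

```python
from collections import Counter

# Hypothetical roster: name -> (hours online, plays every Nth day).
ROSTER = {
    "Aria":  ({19}, 1),
    "Borin": ({20}, 1),
    "Cael":  ({21}, 1),
    "Dessa": ({19, 20}, 2),
    "Eron":  ({20, 21}, 3),
    "Fenn":  ({19, 20, 21}, 1),
}

def run_censuses(schedule):
    """Run one census per (day, hour) pair.

    Returns a Counter keyed by (name, hour) holding how many times
    that character was seen at that hour.
    """
    seen = Counter()
    for day, hour in schedule:
        for name, (hours, every) in ROSTER.items():
            if hour in hours and day % every == 0:
                seen[(name, hour)] += 1
    return seen

def distinct_characters(seen):
    """Quantity: the set of characters seen at least once."""
    return {name for name, _hour in seen}

def best_quality(seen):
    """Quality: the most sightings of any character at one hour."""
    return max(seen.values())

def population_by_hour(seen):
    """Total sightings per census hour, to spot a rising or falling trend."""
    totals = Counter()
    for (_name, hour), count in seen.items():
        totals[hour] += count
    return totals

# Six days, one census per day at 20.
six_route = run_censuses([(day, 20) for day in range(6)])
# Two days, censuses at 19, 20 and 21 on each day.
spread_route = run_censuses([(day, hour) for day in range(2) for hour in (19, 20, 21)])
```

With this made-up roster the six route reaches quality 6 but only ever sees four of the six characters, while the 2+2+2 route sees all six, is capped at quality 2, and is the only one with per-hour totals for 19, 20 and 21. A different roster would shift the numbers, so treat this as a way to play with the idea rather than as evidence.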

That is what I was talking about earlier.

You see, I think... and I do apologize if I am utterly wrong, that given the following situation we would pick completely opposite approaches:

Server x has 100 Alliance entries, 10 Horde entries.

I assume you would go for Alliance, seeing that it has strong data, which would make it easy to increase the data quality. I, on the other hand, would probably go: "Horde has a miserable amount of data, I should fix that."

Going further, if we were given 30 days and told we are only allowed one census per day, you would presumably pick the time which had the most old entries and fell within your playing time. You would then proceed to run the daily census on the same strike of the clock every day to the best of your ability. And this even if the lesser quality data suggested that the peak population might be several hours off your chosen time.

I, on the other hand, would spread out a bit... most likely picking the probable highest times for the first few days to feel things out and then varying a bit, for example going 19, 20, 21, 19, 20, 21, etc...

At the end of the 30-day period... the likely result would be (assuming the server is not horribly unbalanced) that we have both run 30 censuses; I would have seen more new characters and either updated a few more, tied or lost just by a little, and you would have more "quality 30 characters" than I would.

The last line I agree with. :D And it seems I produced another "wall", sorry. :(

EDIT: Corrected a few minor typos. :)

BambisFaline

Post by BambisFaline »

Again, it simply depends on what you are looking for. If you want to know who inhabits a server, then taking a survey during a time of high population would give you a better picture than taking a survey during a time of low population.

In any case, however, only multiple surveys can give better data.

Which is why, going back to the OP, I think generating a stat as to how many surveys a person runs would be an interesting stat to have. I further posit that someone who runs multiple surveys provides better data than someone who provides fewer. There are caveats to that, as you have pointed out. But ultimately it is only through multiple surveys that we can verify data as accurate. Now, you can go further and say that multiple surveys done during times of high population are more valuable than multiple surveys done during times of low population. I certainly can't argue with that if your intention is solely population makeup. But in either case, you can't get away from the "multiple surveys needed" aspect of it.

I'll go further and say that the stat displayed, such as "WR Updates: 28,212" (my current stat), is misleading, as it tends to give the impression that someone who only has 1,000 WR updates has done less work than my 28,212, even though he may have run 1,000 surveys on a server that was empty besides himself while I ran only 27 surveys during a time of 1K population.

An even more interesting stat (and here may be another suggestion) is a ratio stat: number of updates / number of surveys turned in. That, it seems to me, would be a more accurate measure of the amount of work a person does for WR, as it breaks down into an average of how many toons someone is seeing, giving credit to both high-pop and high-survey runs.
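As a rough sketch of the suggested ratio, here it is in Python. The function name `updates_per_survey` is hypothetical and says nothing about how WarcraftRealms actually stores or computes its stats; the figures are the ones from the example above (1,000 updates over 1,000 surveys, and the displayed 28,212 updates over 27 surveys).

```python
def updates_per_survey(updates: int, surveys: int) -> float:
    """Average number of character updates per survey turned in."""
    if surveys <= 0:
        raise ValueError("surveys must be a positive number")
    return updates / surveys

# The two contributors from the example above.
low_pop_ratio = updates_per_survey(1_000, 1_000)   # many surveys, empty server
high_pop_ratio = updates_per_survey(28_212, 27)    # few surveys, busy server

# Showing the ratio next to the raw survey count keeps both kinds of effort visible.
low_pop_line = f"Surveys: 1,000 | Updates per survey: {low_pop_ratio:.1f}"
high_pop_line = f"Surveys: 27 | Updates per survey: {high_pop_ratio:.1f}"
```

On its own the ratio says how busy the surveyed faction was, so it probably reads best when displayed alongside the number of surveys rather than instead of it.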
