Brad Wardell's views about technology, politics, religion, world affairs, and all sorts of politically incorrect topics.
Statistical analysis makes the difference between success and bankruptcy
Published on November 18, 2003 By Draginol In OS Customization

There are lies, damn lies, and statistics. It's true. Statistics can be used to pass off the most erroneous things as seemingly factual. Those of us who base our business plans on statistics have to be very careful not to let statistics mislead us. Over the years, I have made statistical analysis a key part of my core competency. I track all manner of statistics and have learned how to mush them together to form reasonably accurate conclusions.

Here are 4 tables.

May-01                 DLs
WindOS Soft            911
MSWindowsXP            794
Nascent                703
LunaUIXP V2            570
Gemin                  567

Jul-02                 DLs
Midnite                1480
Serene II              1407
Industrial Disease     815
Kismet                 466
Win2k Extended         403

Feb-03                 DLs
XP Prime               2795
C-N Red                2027
Woodworks              1055
Axion_CCOlor           687
Vgreen                 568

Nov-03                 DLs
Longorn Slate          2121
Atlantek 2             1036
Flooter WB             867
NHA Thy                796
Shadow 2.0             665

So what are these tables, you're thinking? These are the top 5 WindowBlinds skin downloads for a particular day in May 2001, July 2002, February 2003, and November 2003. I want to know whether the overall user base of WindowBlinds is increasing or decreasing.

As anyone who does statistical analysis can tell you, if you are trying to determine the number of users, you cannot go by the top downloads. Ideally, you would use the mean number of downloads. Unfortunately, that would take too long, so I have found an imperfect mechanism for getting a "rough" user base guesstimate.

Throwing out #1 and #2 from each table lets us eliminate skins that might have been highlighted somewhere, been linked by someone, or had other extenuating circumstances. Places 3 through 5 added together make our rough estimate, but only as the start of our statistical analysis journey.

So, places 3 through 5 added together for each one:

May-01 1840
Jul-02 1684
Feb-03 2310
Nov-03 2328
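
As a quick check of the arithmetic, here is a minimal Python sketch that drops the top two entries from each table and sums what is left. The figures are copied from the four tables above; the names `top5` and `trimmed_sum` are made up for this illustration.

```python
# Top-5 download counts for each sampled day, copied from the tables above.
top5 = {
    "May-01": [911, 794, 703, 570, 567],
    "Jul-02": [1480, 1407, 815, 466, 403],
    "Feb-03": [2795, 2027, 1055, 687, 568],
    "Nov-03": [2121, 1036, 867, 796, 665],
}


def trimmed_sum(counts, skip=2):
    """Sum the counts that remain after dropping the `skip` largest ones."""
    ranked = sorted(counts, reverse=True)
    return sum(ranked[skip:])


# Places 3 through 5 added together, per sampled day.
estimates = {day: trimmed_sum(counts) for day, counts in top5.items()}

for day, total in estimates.items():
    print(day, total)
```

The `skip` parameter is there only so the same function can be tried with a different number of discarded top entries.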

Better, but there is still a problem. Look at the months. Web site activity varies greatly by time of year. The peak months are usually October through March. The dead time is June and July. So we have to bias the results a bit:

Date      DLs     Fudge Factor    Modified Result
May-01    1840    1               1840
Jul-02    1684    1.1             1852
Feb-03    2310    0.9             2079
Nov-03    2328    0.9             2095
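
The adjustment is just a multiplication per row. A minimal Python sketch, using the raw sums and fudge factors from the table above (the factors are the article's hand-picked values; the variable names are assumptions for this example):

```python
# Sum of places 3-5 for each sampled day, from the table above.
raw = {"May-01": 1840, "Jul-02": 1684, "Feb-03": 2310, "Nov-03": 2328}

# Hand-picked seasonal fudge factors from the table above:
# above 1 boosts a slow month, below 1 discounts a busy one.
fudge = {"May-01": 1.0, "Jul-02": 1.1, "Feb-03": 0.9, "Nov-03": 0.9}

# Modified result = raw downloads * fudge factor, rounded to a whole download.
adjusted = {day: round(raw[day] * fudge[day]) for day in raw}

for day in raw:
    print(f"{day}  {raw[day]:>5}  x {fudge[day]:<4} = {adjusted[day]}")
```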

Now we have a better idea. If you saw the original results, the tempting thing to do would be to say "Hey look, November has the highest stats ever, we're kicking ass!" But that would be a delusion, and a dangerous one. November is a peak traffic month, just like February is, whereas the earlier stats were from the spring and summer months. After biasing them, the results don't look quite as impressive, but they do show consistent growth. From July 2002 the WindowBlinds population has increased by around 12%.
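
For anyone who wants to reproduce the growth figure, here is a small Python sketch of the percent-change calculation on the modified results. The helper name `percent_change` is an assumption for this example. On the table's numbers, July 2002 to November 2003 comes out a little over 13%, the same ballpark as the rough figure quoted above.

```python
# Seasonally adjusted ("modified") results from the table above.
adjusted = {"May-01": 1840, "Jul-02": 1852, "Feb-03": 2079, "Nov-03": 2095}


def percent_change(old, new):
    """Relative change from `old` to `new`, expressed in percent."""
    return (new - old) / old * 100.0


growth = percent_change(adjusted["Jul-02"], adjusted["Nov-03"])
print(f"Jul-02 to Nov-03: {growth:.1f}%")
```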

Of course, even here there are other issues, such as the day-to-day availability of skins. It turns out that, on average, 1.5 new visual styles are added each day to the world of WindowBlinds, but the quality can vary from week to week. You make do with what you've got.

I've tracked most of the other programs, competitors and unrelated programs alike, to look for trends. This technique definitely works well as a basic guide to popularity. Which gets back to my point -- if you want to be an entrepreneur, you have to wear many hats. I attribute much of our success to knowing the market and making projections based on accurate statistical models. Those who ignore the statistics or don't know how to use them accurately are likely to end up bankrupt in the long term.

 


Comments
on Nov 21, 2003
Very interesting Draginol. Nice approach to get quick results out of the data.

As a scientist I analyse data on a daily basis and must highlight some of the important logical steps you made (not suggesting they are wrong, just they are important)

1) removing #1 and #2. This is a fairly dangerous game as it assumes that every month the 1st 2 download numbers are artificially high. If this is the case then some months maybe only 1 download or even 3 downloads could be artificial. It also has the effect of potentially removing the users who only download 1 or 2 skins a month. This makes it very hard to compare individual months against each other. If you have (and I assume you do) a longer timebase over which you have seen this 1+2 download number skewing then the comparison is useful, but only as a longterm trend and not as a month to month comparison.

2) Fudge factor. Probably totally true, but would have loved to see the data which supports this. Surely you could take the past few years of downloads and work out individual monthly fudge factors.
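
One way the monthly fudge factors in point 2 could be worked out from past downloads is sketched below in Python. This is an assumption about the method, not something the article describes: each calendar month's factor is the overall average divided by that month's average, so slow months get a factor above 1 and busy months a factor below 1. The function name and the toy numbers are invented purely for illustration and are not real download data.

```python
from collections import defaultdict


def seasonal_factors(observations):
    """Per-month multiplier: mean of the monthly means / that month's mean.

    `observations` is an iterable of (month_name, downloads) pairs,
    possibly with several years of data for the same month.
    """
    by_month = defaultdict(list)
    for month, downloads in observations:
        by_month[month].append(downloads)
    month_means = {m: sum(v) / len(v) for m, v in by_month.items()}
    overall = sum(month_means.values()) / len(month_means)
    return {m: overall / mean for m, mean in month_means.items()}


# Toy numbers for illustration only, NOT real download figures.
toy_history = [
    ("Feb", 1300), ("Feb", 1100),
    ("May", 1000), ("May", 1000),
    ("Jul", 700), ("Jul", 900),
]

factors = seasonal_factors(toy_history)
for month, factor in factors.items():
    print(month, round(factor, 2))
```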

One way to improve your analysis, while trying not to waste too much time, would be to trace links. I assume that as well as the downloads you also get information as to where the link to the download (or the download web page) came from. Using this method you could restrict numbering to a few selected links that are not flavour of the month. It would also let you know the number of distinct IP sources.

Another good method of improving the statistics without taking too much time is to use a running average. This allows you to create far more data points, thus using the square root dependence of the error to improve the accuracy. A 3 to 6 month running average including every month's data should be fairly accurate (including 1+2 downloads).
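
A minimal Python sketch of the running-average idea, under the assumption that the noise in each data point is independent, so that averaging n points shrinks the error by a factor of 1/sqrt(n). The function names and the toy series are invented for illustration and are not real download data.

```python
import math


def running_average(values, window):
    """Average each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def relative_error(window):
    """Error of a `window`-point average relative to a single point,
    assuming independent noise in each point."""
    return 1.0 / math.sqrt(window)


# Toy monthly series for illustration only, NOT real download figures.
toy_series = [100, 120, 90, 110, 130, 100]

smoothed = running_average(toy_series, window=3)
print(smoothed)
print(relative_error(3))
```

A longer window smooths more but reacts more slowly to a real change in the trend, which is presumably why a 3 to 6 month window is suggested.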

Paul.
on Nov 10, 2004
Same comments as solitaire: your method is probably a rough estimation, and your personal knowledge of the area certainly gives you some advantage in reading it. But the sample is small, and it is hard to say if it is really representative. I'm not comfortable with the values of the fudge factors and how you actually chose them. But I guess you are just looking for trends more than accurate results...