There are lies, damn lies, and statistics. It's true. Statistics can be used
to pass off the most erroneous things as seemingly factual. Those of us who base
our business plans on statistics have to be very careful not to let statistics
mislead us. Over the years, I have made statistical analysis a key part of my
core competency. I track all manner of statistics and have learned how to mush
them together to form reasonably accurate conclusions.
Here are 4 tables.
May-01 | DLs
WindOS Soft | 911
MSWindowsXP | 794
Nascent | 703
LunaUIXP V2 | 570
Gemin | 567

Jul-02 | DLs
Midnite | 1480
Serene II | 1407
Industrial Disease | 815
Kismet | 466
Win2k Extended | 403

Feb-03 | DLs
XP Prime | 2795
C-N Red | 2027
Woodworks | 1055
Axion_CCOlor | 687
Vgreen | 568

Nov-03 | DLs
Longorn Slate | 2121
Atlantek 2 | 1036
Flooter WB | 867
NHA Thy | 796
Shadow 2.0 | 665
So what are these tables, you're thinking? These are the top 5 WindowBlinds
skin downloads for a particular day in May 2001, July 2002, February 2003, and
November 2003. I want to know if the overall user base of WindowBlinds is
increasing or decreasing.
As anyone who does statistical analysis can tell you, if you are trying to
determine the number of users, you cannot go by the top downloads. Ideally, you
would use the mean number of downloads. Unfortunately, that would take too
long. Instead, I have found an imperfect mechanism for getting a "rough" user
base guesstimate.
Throwing out #1 and #2 from each table allows us to eliminate skins that might
have been highlighted somewhere, linked by someone, or had other extenuating
circumstances. Places 3 through 5 added together make our rough estimate,
though that is only the start of our statistical analysis journey.
So, places 3 through 5 added up for each date:
May-01 | 1840
Jul-02 | 1684
Feb-03 | 2310
Nov-03 | 2328
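For anyone who wants to repeat the trimming step on their own numbers, here is a minimal Python sketch. The download counts are copied from the four tables above; the names `top5` and `trimmed_total` are made up for illustration and are not part of any tool I use.

```python
# Top-5 download counts copied from the four tables above.
top5 = {
    "May-01": [911, 794, 703, 570, 567],
    "Jul-02": [1480, 1407, 815, 466, 403],
    "Feb-03": [2795, 2027, 1055, 687, 568],
    "Nov-03": [2121, 1036, 867, 796, 665],
}


def trimmed_total(downloads, skip=2):
    """Sum the download counts after dropping the `skip` highest ones."""
    ranked = sorted(downloads, reverse=True)  # highest first
    return sum(ranked[skip:])                 # places 3 and below


# One rough estimate per sampled day (places 3 through 5 added together).
rough_estimates = {date: trimmed_total(dls) for date, dls in top5.items()}

for date, total in rough_estimates.items():
    print(date, total)
```

Dropping the top two is a judgment call rather than a rule; the `skip` parameter is there so a different cutoff can be tried.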
Better, but there is still a problem. Look at the months. Web site activity
varies greatly by time of year. The peak months are usually October through
March. The dead time is June and July. So we have to bias the results a
bit:
Date | DLs | Fudge Factor | Modified Result
May-01 | 1840 | 1 | 1840
Jul-02 | 1684 | 1.1 | 1852
Feb-03 | 2310 | 0.9 | 2079
Nov-03 | 2328 | 0.9 | 2095
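The seasonal adjustment is just a multiplication, sketched here in Python. The factors are the ones from the table above (my own judgment calls, not derived values), and the names `raw`, `fudge`, and `adjust` are hypothetical. Rounding to the nearest whole download is an assumption that happens to match the table.

```python
# Places 3-5 totals and the seasonal fudge factors from the table above.
raw = {"May-01": 1840, "Jul-02": 1684, "Feb-03": 2310, "Nov-03": 2328}
fudge = {"May-01": 1.0, "Jul-02": 1.1, "Feb-03": 0.9, "Nov-03": 0.9}


def adjust(total, factor):
    """Scale a raw total by its seasonal factor, rounded to a whole number."""
    return round(total * factor)


# Boost the dead-time sample, discount the peak-month samples.
adjusted = {date: adjust(raw[date], fudge[date]) for date in raw}

for date, value in adjusted.items():
    print(date, value)
```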
Now we have a better idea. If you saw the original results, the tempting
thing to do would be to say "Hey look, November has the highest stats ever,
we're kicking ass!" But that would be a delusion, and a dangerous one. November
is a peak traffic month, just like February is, whereas the earlier stats were
from the spring and summer months. After biasing them, the results don't
look quite as impressive, but they do show consistent growth. Since July
2002, the WindowBlinds population has increased by around 12 to 13%.
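The growth figure can be checked against the adjusted totals in the table. This is only a sketch: `percent_change` is a hypothetical helper, and the inputs are the rounded "Modified Result" values, so the output is as rough as the fudge factors behind it.

```python
def percent_change(old, new):
    """Percentage change from `old` to `new`."""
    return (new - old) / old * 100


# Adjusted totals from the table above.
jul_02, feb_03, nov_03 = 1852, 2079, 2095

growth_to_feb = round(percent_change(jul_02, feb_03), 1)
growth_to_nov = round(percent_change(jul_02, nov_03), 1)

print(growth_to_feb)  # 12.3
print(growth_to_nov)  # 13.1
```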
Of course, even here there are other issues, such as the day-to-day
availability of skins. It turns out that, on average, 1.5 new visual styles are
added each day to the world of WindowBlinds, but the quality can vary from week
to week. Still, you make do with what you've got.
I've tracked most of the other programs, competitors and unrelated programs
alike, to look for trends. This technique definitely works well as a basic guide
for popularity. Which gets back to my point -- if you want to be an
entrepreneur, you have to wear many hats. I attribute much of our success to
knowing the market and making projections based on accurate statistical models.
Those who ignore the statistics or don't know how to accurately use them are
likely to end up bankrupt in the long term.