Path: senator-bedfellow.mit.edu!bloom-beacon.mit.edu!newsfeed.stanford.edu!logbridge.uoregon.edu!newsfeed.direct.ca!look.ca!news.noc.cabal.int!resurrector!guidorepost!not-for-mail
From: [email protected] (Rob Maxwell)
Subject: REPOST: [FAQ] Gathering Traffic Data for Proposed Newsgroups
Newsgroups: alt.config,news.groups,alt.answers,news.answers
X-Repost-Date: 7 Dec 2001 02:33:41 GMT
Message-ID: <7$--$$%[email protected]>
X-Original-Path: sn-us!sn-xit-03!supernews.com!news-out.visi.com!hermes.visi.com!nycmny1-snh1.gtei.net!news.gtei.net!bloom-beacon.mit.edu!senator-bedfellow.mit.edu!dreaderd!not-for-mail
X-Original-Message-ID: <usenet/creating-newsgroups/[email protected]>
X-Original-NNTP-Posting-Host: penguin-lust.mit.edu
Supersedes: <usenet/creating-newsgroups/[email protected]>
Expires: 17 Jan 2002 11:39:34 GMT
X-Last-Updated: 2001/10/31
Organization: none
Followup-To: alt.config,news.groups
Approved: [email protected]
Originator: [email protected]
Date: 03 Dec 2001 11:40:48 GMT
X-Trace: 1007379648 senator-bedfellow.mit.edu 3963 18.181.0.29
Sender: [email protected] (Guido the Resurrector)
X-Reposted-By: [email protected] (Guido the Resurrector)
X-Comments: GtR Repost: The following Usenet article was cancelled, more
X-Comments: than likely by someone other than the original poster.  Please
X-Comments: see the end of this posting for a copy of the cancel.
X-Comments: Guido the Resurrector can be contacted at
X-Comments: [email protected].
Lines: 128
Xref: senator-bedfellow.mit.edu alt.config:307036 news.groups:437134 alt.answers:59242 news.answers:220474

Archive-name: usenet/creating-newsgroups/justification
Last-modified: 10 June 2001
Posting-Frequency: Monthly (on the 1st)
URL: http://www.alt-config.org/justification.htm
Maintainer: Rob Maxwell <[email protected]>
Disclaimer: Approval for *.answers is based on form, not content.

Gathering Traffic Data for Proposed Newsgroups
Or
How to use Google Groups

The traditional expectation that a newsgroup justify its existence by virtue
of existing Usenet traffic goes back to the earliest days. It precedes the
birth of alt.*, the Great Renaming that brought forth the Big 7 (later the
more familiar Big 8 with the creation of the humanities.* hierarchy in 1995),
and even the rise and eventual fall of the backbone Cabal.

In the early 1980s, if discussion of a topic became significant enough, a new
newsgroup was created to centralize the discussion. With relatively few
corporate and university mainframes providing the Unix Users' Network
(Usenet) to a similarly small number of readers, it was fairly easy to see
when a topic was worthy of its own newsgroup. Today, with over three gigabytes
of text-only discussion occurring daily, and with abuse of the alt.* newsgroup
creation process leaving a significant number of alt.* newsgroups uncarried on
any given news server, it has become effectively impossible to see when a
topic becomes popular enough to warrant a newsgroup of its own.

This is where Google Groups comes into the picture. Its story begins in 1995,
when Deja began archiving Usenet text postings. By 2000 the task had become
too overwhelming and expensive, leading Deja to try other ventures; those
efforts ultimately proved futile, and Deja sold its archive and name to the
Internet search engine company Google. After a rough start, Google was finally
able to combine Deja's massive archive with its own recent efforts at
archiving Usenet under the name Google Groups <http://groups.google.com/>.

Getting started

The journey to justification begins at Google Groups' Advanced Group Search
<http://groups.google.com/advanced_group_search>. What you will be looking
for is how often the topic is discussed in English on Usenet. The customary
method is to search for the keyword or phrase over the last ninety days. The
recommended quantity of on-topic posts is ten (10) per day on average. For
the sake of this demonstration we will try to justify a newsgroup for the ABC
television show "20/20".

Start by typing 20/20 into "Find Messages with all of the words" and change
the results dropdown box from "10 messages" to "100 messages". Under Language,
change "Return messages written in" from "any language" to "English". Under
Message Dates, change the start of "Return messages posted between" from
29 Mar 1995 to the date three months before today's date, leaving today's
date as the end. A visual example is available at:
<http://www.alt-config.org/20-20a.gif>

Searching for "20/20" on 27 May 2001 produced these results:

Relevant English Messages for 20/20 from 28 Feb 2001 to 27 May 2001 Results
1-100 of about 12,400. <http://www.alt-config.org/20-20b.gif>

That averages out to 137.78 posts per day (12,400 / 90), which clearly meets
the 10-per-day recommendation. Or does it?
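The arithmetic behind the 10-per-day test fits in a few lines. This is a
minimal Python sketch, assuming the customary 90-day window; the names
average_per_day and meets_guideline are hypothetical helpers, not anything
Google Groups provides. It takes the "about" figure at face value, so it is
only as good as that estimate.

```python
# Minimal sketch of the ten-posts-per-day guideline over a 90-day window.
# Helper names are hypothetical; the figures come from this FAQ.
WINDOW_DAYS = 90      # customary search window
POSTS_PER_DAY = 10    # recommended average of on-topic posts


def average_per_day(post_count: int, window_days: int = WINDOW_DAYS) -> float:
    """Average posts per day across the search window."""
    return post_count / window_days


def meets_guideline(post_count: int,
                    window_days: int = WINDOW_DAYS,
                    per_day: int = POSTS_PER_DAY) -> bool:
    """True when the total reaches per_day * window_days (900 by default)."""
    return post_count >= per_day * window_days


if __name__ == "__main__":
    count = 12_400  # Google's "about" estimate for 20/20 on 27 May 2001
    print(f"{average_per_day(count):.2f} posts/day, "
          f"meets guideline: {meets_guideline(count)}")
```

Run as a script, it prints "137.78 posts/day, meets guideline: True". The
break-even point is 900 posts (10 x 90).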

Refining the search results

Taking a closer look at the 20/20 example shows that the first on-topic
mention of the show is the 14th search result.
<http://www.alt-config.org/20-20c.gif>

This is an extreme example, badly contaminated by "%20", which is how a
space is represented in a URL (spaces themselves are not allowed in URLs).
"%20" often turns up in URLs quoted in posts, as seen in the third search
result for 20/20.
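To see where the "%20" noise comes from, the sketch below uses Python's
standard urllib.parse.quote, which percent-encodes a space as %20. The
function is_url_noise is a hypothetical helper for screening hits you have
copied out by hand; it assumes a hit is noise when the literal phrase 20/20
no longer appears once every %XX escape is blanked out. It does not
reproduce how Google itself matches words.

```python
import re
from urllib.parse import quote

# Spaces are not allowed in URLs, so quote() encodes each one as "%20":
# quote("top 20 list") gives "top%2020%20list", full of stray "20"s.
ENCODED_EXAMPLE = quote("top 20 list")

ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")               # any %XX escape
SHOW_NAME = re.compile(r"(?<![\d/])20/20(?![\d/])")   # "20/20" on its own


def is_url_noise(post_text: str) -> bool:
    """Hypothetical screen: True if no standalone "20/20" survives
    after every %XX escape sequence is replaced with a space."""
    cleaned = ESCAPE.sub(" ", post_text)
    return SHOW_NAME.search(cleaned) is None
```

A hit whose text is only a quoted URL such as
"http://example.com/top%2020%20list" is flagged as noise, while "Did anyone
watch 20/20 last night?" is not.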

Repeating the search for 20/20 with "abc" (the network the show airs on)
added produces radically different results:
Relevant English Messages for 20/20 abc from 28 Feb 2001 to 27 May 2001
Results 1-100 of about 374

Three hundred seventy-four averages out to a mere 4.16 posts per day
(374 / 90), less than half of the recommended ten.
<http://www.alt-config.org/20-20d.gif>

This is why your initial search results must be checked carefully before you
use them. To begin with, there is a known glitch in the software Google
acquired from Deja that usually produces a poor (sometimes comically poor)
estimate of "about" how many results were found. A blatant example of this
was a search for "infertility insurance":

Relevant English Messages for "infertility insurance" from 18 Feb 2001 to 18
May 2001 Results 1 - 4 of about 6. <http://www.alt-config.org/20-20e.gif>

The quick way to see the actual total, or at least enough to see whether
there is justification (900 on-topic messages over 90 days), is to scroll
down to the bottom of the page (or press the [End] key) and click the 10
under Goooooooooogle, which will take you to the 901st message if there is
one. [Note: This is why "100 messages" is selected instead of the default
"10 messages".] The estimate glitch does not matter if the top line then
reads:

Relevant English Messages for "_______" from 28 Feb 2001 to 27 May 2001
Results 901-1000 of about #,###.
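The page arithmetic behind this shortcut can be sketched in Python, assuming
100 results per page as selected above; page_for_result and page_range are
hypothetical helpers. With 100 per page, result 901 is the first result on
page 10; at the default 10 per page it would be buried on page 91.

```python
from typing import Tuple

PER_PAGE = 100        # "100 messages" selected in Advanced Group Search
TARGET = 10 * 90      # ten posts a day for ninety days = 900


def page_for_result(n: int, per_page: int = PER_PAGE) -> int:
    """1-based result page on which the n-th result is listed."""
    return (n - 1) // per_page + 1


def page_range(page: int, per_page: int = PER_PAGE) -> Tuple[int, int]:
    """First and last result numbers listed on a given page."""
    first = (page - 1) * per_page + 1
    return first, first + per_page - 1
```

For example, page_for_result(TARGET + 1) is 10 and page_range(10) is
(901, 1000), matching the top line shown above.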

Things to avoid

Most of the things that can falsely inflate results show up on the last
pages. A weekly Frequently Asked Questions (FAQ) posting on the topic, or one
containing a reference to it, will produce 12-14 identical results, only one
of which is valid. Far worse than this is when the subject ends up in
someone's signature: if that person posts a few messages per day, they can
create a few hundred false hits in the 90-day period. Correcting for a sig hit
requires searching the same time frame for that author's posts to determine
the total number of hits the sig has caused, and then finding out how many of
those posts are actually about the subject being searched.
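Once the false hits have been counted, the correction is plain subtraction.
The sketch below is hypothetical: it assumes you have already counted, by
hand, how many copies of a periodic FAQ matched and, for each author whose
signature mentions the topic, how many of that author's posts matched versus
how many actually discuss it. The numbers in the usage line are invented for
illustration.

```python
from typing import Iterable, Tuple


def adjusted_count(raw_hits: int,
                   faq_copies: int = 0,
                   sig_authors: Iterable[Tuple[int, int]] = ()) -> int:
    """Hypothetical correction of a raw hit count for FAQ and sig false hits.

    faq_copies  -- identical hits from one periodic FAQ; only one copy counts.
    sig_authors -- (hits, on_topic) pairs, one per author whose signature
                   mentions the topic: that author's total hits in the same
                   window, and how many of those posts are really about
                   the subject.
    """
    faq_false = max(faq_copies - 1, 0)  # keep the one valid copy
    sig_false = sum(max(hits - on_topic, 0) for hits, on_topic in sig_authors)
    return max(raw_hits - faq_false - sig_false, 0)
```

For instance, adjusted_count(1000, faq_copies=13, sig_authors=[(270, 5)])
returns 723 (1000 - 12 - 265), which no longer reaches the 900-message mark.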

... END ...

========= WAS CANCELLED BY =======:
Message-ID: <cancel.usenet/creating-newsgroups/[email protected]>
Control: cancel <usenet/creating-newsgroups/[email protected]>
Subject: cmsg cancel <usenet/creating-newsgroups/[email protected]>
From: [email protected] (Rob Maxwell)
Date: Thu, 6 Dec 2001 20:08:44 GMT
X-No-Archive: yes
Newsgroups: microsoft.test,comp.lang.c,news.groups
NNTP-Posting-Host: mail.jbcharles.com 63.227.23.121
Lines: 1
Path: news.uni-stuttgart.de!dns.phoenix-ag.de!newsfeed01.sul.t-online.de!t-online.de!fr.clara.net!heighliner.fr.clara.net!news.tele.dk!small.news.tele.dk!204.71.34.15!news-out.cwix.com!newsfeed.cwix.com!sjc-peer.news.verio.net!sea-feed.news.verio.net!news.verio.net!msrnewsc1!cppssbbsa01.microsoft.com!tkmsftngp01!tkmsftngp07!unacanceller
Xref: news.uni-stuttgart.de control:39747983

autocancel