
IDEAS - HashNet, the future is here

Sent from: acb@cs.monash.edu.au (Andrew C. Bulhak)

A year or two ago, when the AOL/Information Superhighway Hype Explosion
was just getting off the ground, before the Communications Decency Act,
when the biggest immediate worry was the Canter & Siegel book and the 
tide of spamming it promised to trigger, there was a discussion on ways
to prevent spamming.  The more mundane alternatives proposed were things
like hacks to news server software to detect spams using word-frequency
and edit-distance heuristics and drop them on the floor.  However, one
person proposed a system which, unlike Usenet, didn't have multiple 
fixed newsgroups, but in which articles were "categorised" by their
word-frequency signatures, and thus spamming was made impossible.
This was termed "HashNet".
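The word-frequency and edit-distance heuristics mentioned above can be sketched roughly as follows. This is a minimal illustration, not code from the original discussion. The helper names, the tokenising regex and both thresholds are assumptions. A word-frequency signature of this kind is also roughly what a HashNet-style system might have used to categorise articles. Here it is used only to flag a new post that is a near-copy of one already seen, which is how the same advertisement posted to many groups would show up.

```python
import math
import re
from collections import Counter


def signature(text: str) -> Counter:
    """Word-frequency signature: how often each lowercased word occurs."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two signatures (1.0 = same word proportions)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character insertions,
    deletions and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def looks_like_spam(new_post: str, recent_posts: list[str],
                    min_similarity: float = 0.9,
                    max_edit_ratio: float = 0.1) -> bool:
    """Flag a post that nearly duplicates a recent one, by word-frequency
    similarity or by normalised edit distance. Thresholds are assumptions."""
    sig = signature(new_post)
    for old in recent_posts:
        if cosine(sig, signature(old)) >= min_similarity:
            return True
        longest = max(len(new_post), len(old), 1)
        if edit_distance(new_post, old) / longest <= max_edit_ratio:
            return True
    return False
```

Comparing each post against every recent post is quadratic, so a server doing this would presumably limit how many recent posts it keeps for comparison.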

In true Internet fashion, this has come to pass, without anybody 
deliberately implementing it.

Observe the following:

 a)  Many larger newsgroups are so full of drivel that reading them
     in the traditional sense is right out.

 b)  The name of a newsgroup is in many cases no indicator of its content.
     (For example, if one ventures into comp.emulators.ms-windows.wine,
     one is as likely to find a "Know any X Windows servers for Windows???!"
     post, or a "Mac emulators for DOS" post, or a paranoid rant, as anything
     else.)

 c)  Search engines like Altavista allow articles to be chosen by 
     matching terms, constraints, etc., across newsgroups.

The logical progression from this is a system where Altavista-like
search engines exist at many sites and are used as the primary interface.
Eventually in this scenario, the concept of newsgroups will be abandoned
altogether, and news site software will be replaced with servers which
internally index the posts according to keywords and retrieve them using
Altavista-type search engine interfaces.  Of course, these will need to 
run on fast machines, but such machines are becoming commonplace enough.
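A server that "internally indexes the posts according to keywords" is essentially an inverted index. The sketch below is hypothetical: the class name, the tokenising regex and the '+required' / '-excluded' query syntax are assumptions loosely modelled on Altavista-style simple queries, not any real news server's interface. There are no newsgroups; every post goes into one pool and is found only through the index.

```python
import re
from collections import defaultdict


class KeywordNewsServer:
    """Hypothetical news server with no newsgroups: posts are retrieved
    through a keyword index instead of a group name."""

    def __init__(self) -> None:
        self.posts: list[str] = []
        # Inverted index: word -> ids of the posts containing that word.
        self.index: dict[str, set[int]] = defaultdict(set)

    @staticmethod
    def _words(text: str) -> set[str]:
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def post(self, text: str) -> int:
        """Store a post and index it under each distinct word it contains."""
        post_id = len(self.posts)
        self.posts.append(text)
        for word in self._words(text):
            self.index[word].add(post_id)
        return post_id

    def _ids(self, word: str) -> set[int]:
        # .get avoids inserting empty entries for unknown query words.
        return self.index.get(word, set())

    def search(self, query: str) -> list[str]:
        """'+term' must appear, '-term' must not; bare terms are optional
        and, when any '+term' is given, only affect ranking (syntax assumed)."""
        required, excluded, optional = [], [], []
        for term in query.lower().split():
            if term.startswith("+"):
                required.append(term[1:])
            elif term.startswith("-"):
                excluded.append(term[1:])
            else:
                optional.append(term)

        if required:
            hits = set.intersection(*(self._ids(w) for w in required))
        else:
            hits = set().union(*(self._ids(w) for w in optional))
        for w in excluded:
            hits -= self._ids(w)

        # Rank by how many optional terms each post matches, then by age.
        def score(pid: int) -> int:
            return sum(pid in self._ids(w) for w in optional)

        return [self.posts[p] for p in sorted(hits, key=lambda p: (-score(p), p))]
```

For example, after posting "Know any X Windows servers for Windows???!" and "Mac emulators for DOS", a query such as `windows for` would return both, with the first ranked higher because it matches both terms. Because a query only touches the posts listed under its own words, lookups stay cheap even as the pool grows; building and storing the index is where the fast machines would be needed.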

-- 
This message should be considered as the output of a badly-configured 
AI program.

[ mod's note: http://www.fringeware.com/HTML/memetics.html#memelex
  and our email list keywords are a step in this direction... ]