News::Archive 0.14 review

License: Perl Artistic License
Developer: Tim Skirvin
News::Archive is a Usenet news archiving package for downloading and later accessing news articles in bulk.

It can load articles laid out in INN format, retrieve them from a running news server, or just take articles one by one. The News::Archive module is compatible with News::Web and Net::NNTP::Server, so the articles can be shared either via the Web or via NNTP.


use News::Archive;
use News::Article;

my $archive = News::Archive->new( 'basedir' => '/home/tskirvin/kiboze' );

# Get a news article from standard input
my $article = News::Article->new(\*STDIN);
my $msgid = $article->header('message-id');

# Skip articles that are already in the archive
die "Already processed '$msgid'\n"
    if ($archive->article( $msgid ));

# Get the list of groups we're supposed to be saving the article into
my @groups = split('\s*,\s*', $article->header('newsgroups') );
s/\s+//g for @groups;    # strip any stray whitespace from the group names

# Make sure we're subscribed to these groups
foreach (@groups) { $archive->subscribe($_) }

# Actually save the article: headers, a blank separator line, then the body.
my $ret = $archive->save_article(
    [ @{$article->rawheaders}, '', @{$article->body} ], @groups );
$ret ? print "Accepted article $msgid\n"
     : print "Couldn't save article $msgid\n";

News::Archive maintains several files to keep track of its archives:

active file

Keeps track of all newsgroups we are "subscribed" to and all of the information that changes regularly - the number of articles we have archived, the current first and last article numbers, etc.

Watched over with News::Active.
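
As a rough illustration of the kind of record an active file holds, the sketch below parses one line in the classic INN active format (group name, highest article number, lowest article number, flags). It is an assumption that the file News::Active manages looks like this; its real format may carry more fields, such as the article count mentioned above. The helper name parse_active_line is hypothetical and not part of any module.

```perl
use strict;
use warnings;

# Hypothetical parser for one INN-style active line:
#   <group> <highest article> <lowest article> <flags>
# This is NOT News::Active's API, just a sketch of the data involved.
sub parse_active_line {
    my ($line) = @_;
    my ($group, $high, $low, $flags) = split ' ', $line;
    return {
        group => $group,
        first => $low  + 0,    # numify to drop the zero padding
        last  => $high + 0,
        flags => $flags,
    };
}

my $entry = parse_active_line('rec.games.mecha 0000000500 0000000001 y');
printf "%s: articles %d-%d\n", @{$entry}{qw(group first last)};
```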

history database

A simple database keeping track of articles by Message-ID. It makes access by ID easy and ensures that we don't save the same article twice. The database backend used to store these records is chosen by the user.
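
The duplicate check can be pictured as a hash keyed by Message-ID and tied to an on-disk database. The sketch below uses SDBM_File only because it ships with Perl; which backend News::Archive actually uses is up to the user, and the remember helper and the stored value format are assumptions made for illustration.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use SDBM_File;                  # small DBM implementation bundled with Perl
use File::Temp qw(tempdir);

# Keep the demo database in a throwaway directory.
my $dir = tempdir( CLEANUP => 1 );
tie my %history, 'SDBM_File', "$dir/history", O_RDWR | O_CREAT, 0666
    or die "Couldn't open history database: $!\n";

# Hypothetical helper: record a Message-ID and where the article lives.
# Returns 1 if the ID is new, 0 if it has been seen before.
sub remember {
    my ($msgid, $location) = @_;
    return 0 if exists $history{$msgid};
    $history{$msgid} = $location;
    return 1;
}

print remember('<abc@example.com>', 'rec.games.mecha/1') ? "new\n" : "duplicate\n";
print remember('<abc@example.com>', 'rec.games.mecha/2') ? "new\n" : "duplicate\n";
```

The first call reports the article as new and the second as a duplicate, which is the behaviour the history database exists to provide.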

newsgroup file

Keeps track of more static information about the newsgroups we are subscribed to - descriptions, creation dates, etc.

Watched over with News::GroupInfo.

archive directory

Directory structure holding all articles. Each article is saved as a single text file, with one directory level per component of the group name, such as "rec/games/mecha". Crossposts are hardlinked into the directories of the other groups.

Articles are actually divided into sub-directories containing up to 500 articles each, to avoid the performance problems Unix has with very large directories. An individual article is thus stored at a path such as "rec/games/mecha/1.500/1".

Each newsgroup also contains overview information, watched over with News::Overview. This overview file goes in the top of the structure, such as "rec/games/mecha/.overview".
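
The layout described above can be sketched in a few lines of Perl. The helpers group_dir, article_path and overview_path are hypothetical names, not part of News::Archive, and the bucket naming (1.500, 501.1000, and so on) is inferred from the single "1.500" example, so treat it as an assumption. The second half shows how a crosspost could be stored once and hardlinked into a second group.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Path qw(make_path);
use File::Temp qw(tempdir);

my $HASH_SIZE = 500;    # articles per sub-directory, as stated in the text

# "rec.games.mecha" -> "rec/games/mecha"
sub group_dir {
    my ($group) = @_;
    return join '/', split /\./, $group;
}

# Path of article $num in $group, assuming buckets 1.500, 501.1000, ...
sub article_path {
    my ($group, $num) = @_;
    my $first = int( ($num - 1) / $HASH_SIZE ) * $HASH_SIZE + 1;
    my $last  = $first + $HASH_SIZE - 1;
    return join '/', group_dir($group), "$first.$last", $num;
}

# The overview file sits at the top of the group's directory.
sub overview_path {
    my ($group) = @_;
    return group_dir($group) . '/.overview';
}

print article_path('rec.games.mecha', 1), "\n";      # rec/games/mecha/1.500/1
print article_path('rec.games.mecha', 501), "\n";    # rec/games/mecha/501.1000/501
print overview_path('rec.games.mecha'), "\n";        # rec/games/mecha/.overview

# Crosspost: write the article once, then hardlink it into the second group.
my $base = tempdir( CLEANUP => 1 );
my $orig = "$base/" . article_path('rec.games.mecha', 1);
my $copy = "$base/" . article_path('rec.games.misc', 7);
make_path( dirname($orig), dirname($copy) );

open my $fh, '>', $orig or die "Can't write $orig: $!\n";
print {$fh} "Newsgroups: rec.games.mecha,rec.games.misc\n\nHello\n";
close $fh or die "Can't close $orig: $!\n";

link $orig, $copy or die "Can't link $orig to $copy: $!\n";

# Both names now share one inode, so the article is stored only once.
my ($ino_a, $ino_b) = ( (stat $orig)[1], (stat $copy)[1] );
print "hardlinked\n" if $ino_a == $ino_b;
```

Hardlinks only work within a single filesystem, so a layout like this assumes the whole archive lives on one volume.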

You may notice that this layout closely mirrors how INN stores its data. This is intentional - the package is meant to act in many ways like a lighter-weight INN.


Global Variables

The following variables are set within News::Archive, and are global throughout all invocations.

Default value for "debug()" in new objects.

Default value for "hostname()" in new objects. Obtained using

The number of articles to keep in each directory. Default is 500; change this at your own peril, since things may get screwed up later if you change it after archiving any articles!

