News::Archive 0.14 review
News::Archive is a Usenet news archiving package for downloading and later accessing news articles in bulk.
It can load articles laid out in INN format, retrieve them from a running news server, or just take articles one by one. The News::Archive module is compatible with News::Web and Net::NNTP::Server, so the articles can be shared either via the Web or via NNTP.
SYNOPSIS
use News::Archive;
use News::Article;
my $archive = News::Archive->new( 'basedir' => '/home/tskirvin/kiboze' );
# Get a news article
my $article = News::Article->new(*STDIN);
my $msgid = $article->header('message-id');
die "Already processed '$msgid'\n"
    if ($archive->article( $msgid ));
# Get the list of groups we're supposed to be saving the article into
my @groups = split(/\s*,\s*/, $article->header('newsgroups') );
s/\s+//g foreach @groups;    # strip any remaining whitespace
# Make sure we're subscribed to these groups
foreach (@groups) { $archive->subscribe($_) }
# Actually save the article.
my $ret = $archive->save_article(
    [ @{$article->rawheaders}, '', @{$article->body} ], @groups );
print $ret ? "Accepted article $msgid\n"
           : "Couldn't save article $msgid\n";
News::Archive maintains several files to keep track of its archives:
active file
Keeps track of all newsgroups we are "subscribed" to and all of the information that changes regularly - the number of articles we have archived, the current first and last article numbers, etc.
Watched over with News::Active.
history database
A simple database keeping track of articles by Message-ID. Makes access by ID easy, and ensures that we don't save the same article twice. The database backend used for this is chosen by the user.
newsgroup file
Keeps track of more static information about the newsgroups we are subscribed to - descriptions, creation dates, etc.
Watched over with News::GroupInfo.
archive directory
Directory structure of all articles, with each article saved as a single text file. The directories follow the group name, one component of the name per directory, such as "rec/games/mecha". Crossposts are hardlinked into the directory structures of the other groups.
Articles are actually divided into sub-directories containing up to 500 articles, to avoid Unix directory size performance limitations. Individual articles are thus stored in a file such as "rec/games/mecha/1.500/1".
Each newsgroup also contains overview information, watched over with News::Overview. This overview file goes in the top of the structure, such as "rec/games/mecha/.overview".
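The archive directory layout described above can be sketched as a small path calculation. This is only an illustration: article_path() is a hypothetical helper, not part of News::Archive, and it assumes that the sub-directories after "1.500" continue as "501.1000", "1001.1500" and so on, which the text above does not confirm.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of News::Archive): map a group name and an
# article number to a relative path in the layout described above.
sub article_path {
    my ($group, $number, $hash) = @_;
    $hash ||= 500;                        # articles per sub-directory
    die "article numbers start at 1\n" if $number < 1;

    (my $dir = $group) =~ tr{.}{/};       # rec.games.mecha -> rec/games/mecha

    # Assumed bucket naming: first and last article number the bucket can hold.
    my $first = int( ($number - 1) / $hash ) * $hash + 1;
    my $last  = $first + $hash - 1;

    return "$dir/$first.$last/$number";
}

print article_path('rec.games.mecha', 1), "\n";     # rec/games/mecha/1.500/1
print article_path('rec.games.mecha', 500), "\n";   # rec/games/mecha/1.500/500
```

Under the same assumption, the overview file for the group would sit one level above the buckets, at "rec/games/mecha/.overview".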
You may note that these files are very similar to how INN does its work. This is intentional - this package is meant to act in many ways like a lighter-weight INN.
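As with INN's history file, the history database described above boils down to a lookup keyed by Message-ID. The sketch below uses a plain in-memory hash as a stand-in for whatever database is actually configured; the remember() helper and the stored location string are hypothetical and are not the module's API.

```perl
use strict;
use warnings;

# Stand-in for the history database: an in-memory hash keyed by Message-ID.
# A real archive would use a persistent database instead.
my %history;

# Hypothetical helper: record a Message-ID and where the article was stored.
# Returns 1 if the ID is new, 0 if the article has been seen before.
sub remember {
    my ($msgid, $location) = @_;
    return 0 if exists $history{$msgid};
    $history{$msgid} = $location;
    return 1;
}

for my $id ('<a@example.com>', '<b@example.com>', '<a@example.com>') {
    print remember($id, 'rec.games.mecha') ? "saved $id\n" : "duplicate $id\n";
}
```

Keying on the Message-ID gives both properties named above: direct access by ID and a cheap check that prevents saving the same article twice.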
Usage:
Global Variables
The following variables are set within News::Archive, and are global throughout all invocations.
$News::Active::DEBUG
Default value for "debug()" in new objects.
$News::Active::HOSTNAME
Default value for "hostname()" in new objects. Obtained using "Sys::Hostname::hostname()".
$News::Active::HASH
The number of articles to keep in each directory. Default is 500; change this at your own peril, since things may get screwed up later if you change it after archiving any articles!