Search::ContextGraph 0.15 review

by rbytes.net on


License: GPL (GNU General Public License)
File size: 93K
Developer: Maciej Ceglowski
0 stars award from rbytes.net

Search::ContextGraph is a Perl module that implements a spreading activation search engine.

SYNOPSIS

use Search::ContextGraph;

my $cg = Search::ContextGraph->new();

# first you add some documents, perhaps all at once...

my %docs = (
    'first'  => [ 'elephant', 'snake' ],
    'second' => [ 'camel', 'pony' ],
    'third'  => { 'snake' => 2, 'constrictor' => 1 },
);

$cg->bulk_add( %docs );

# or in a loop...

foreach my $title ( keys %docs ) {
    $cg->add( $title, $docs{$title} );
}

# or from a file...

my $cg = Search::ContextGraph->load_from_dir( "./myfiles" );

# you can store a graph object for later use

$cg->store( "stored.cng" );

# and retrieve it later...

my $cg = Search::ContextGraph->retrieve( "stored.cng" );


# SEARCHING

# the easiest way

my @ranked_docs = $cg->simple_search( 'peanuts' );


# get back both related terms and docs for more power

my ( $docs, $words ) = $cg->search('snake');


# you can use a document as your query

my ( $docs, $words ) = $cg->find_similar('First Document');


# Or you can query on a combination of things

my ( $docs, $words ) =
    $cg->mixed_search( { docs  => [ 'First Document' ],
                         terms => [ 'snake', 'pony' ] }
                     );


# Print out result set of returned documents
foreach my $k ( sort { $docs->{$b} <=> $docs->{$a} }
    keys %{ $docs } ) {
    print "Document $k had relevance ", $docs->{$k}, "\n";
}

# Reload it
my $new = Search::ContextGraph->retrieve( "filename" );

Spreading activation is a neat technique for building search engines that return accurate results for a query even when there is no exact keyword match. The engine works by building a data structure called a context graph, which is a giant network of document and term nodes. All document nodes are connected to the terms that occur in that document; similarly, every term node is connected to all of the document nodes that term occurs in. We search the graph by starting at a query node and distributing a set amount of energy to its neighbor nodes. Then we recurse, diminishing the energy at each stage, until this spreading energy falls below a given threshold. Each node keeps track of accumulated energy, and this serves as our measure of relevance.

This means that documents that have many words in common will appear similar to the search engine. Likewise, words that occur together in many documents will be perceived as semantically related. Especially with larger, coherent document collections, the search engine can be quite effective at recognizing synonyms and finding useful relationships between documents. You can read a full description of the algorithm at http://www.nitle.org/papers/Contextual_Network_Graphs.pdf.
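The spreading step described above can be sketched in a few lines of plain Perl. This is only an illustration of the idea, not the module's actual implementation: the "D:" and "T:" node prefixes, the spread() helper, the even split among neighbors and the damping factor of one half are all assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical toy corpus: document title => list of terms.
my %docs = (
    first  => [ 'elephant', 'snake' ],
    second => [ 'camel', 'pony' ],
    third  => [ 'snake', 'constrictor' ],
);

# Build the graph: every document node links to its terms, and every
# term node links back to the documents it occurs in.
my %graph;
for my $doc ( keys %docs ) {
    for my $term ( @{ $docs{$doc} } ) {
        push @{ $graph{"D:$doc"} },  "T:$term";
        push @{ $graph{"T:$term"} }, "D:$doc";
    }
}

# Deposit $energy at $node, then pass a diminished share to each
# neighbor and recurse until the share drops below $threshold.
# Assumption: energy is halved and then split evenly at every hop.
sub spread {
    my ( $graph, $node, $energy, $threshold, $seen ) = @_;
    $seen->{$node} += $energy;
    my @neighbors = @{ $graph->{$node} || [] };
    return $seen unless @neighbors;
    my $share = $energy / ( 2 * scalar @neighbors );
    return $seen if $share < $threshold;
    spread( $graph, $_, $share, $threshold, $seen ) for @neighbors;
    return $seen;
}

# Query on the term 'snake' with 100 units of energy and a cutoff of 1.
my $scores = spread( \%graph, 'T:snake', 100, 1, {} );

# Rank document nodes by accumulated energy, ties broken by name.
my @ranked = sort { $scores->{$b} <=> $scores->{$a} || $a cmp $b }
             grep { /^D:/ } keys %$scores;
printf "%-10s %.2f\n", $_, $scores->{$_} for @ranked;
```

In this toy graph the query on 'snake' reaches the documents 'first' and 'third' and also deposits energy on the terms 'elephant' and 'constrictor', which never appeared in the query, while 'second' shares no terms with the others and is never reached. Because the energy is at least halved on every hop, the recursion is guaranteed to stop.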

The search engine gives expanded recall (relevant results even when there is no keyword match) without incurring the kind of computational and patent issues posed by latent semantic indexing (LSI). The technique used here was originally described in a 1981 dissertation by Scott Preece.

Requirements:
Perl
