Bio::Seq 1.4 review
Bio::Seq is a sequence object, with features.
SYNOPSIS
# This is the main sequence object in Bioperl
# gets a sequence from a file
use Bio::SeqIO;
$seqio = Bio::SeqIO->new( -format => 'embl', -file => 'myfile.dat' );
$seqobj = $seqio->next_seq();
# SeqIO can both read and write sequences; see Bio::SeqIO
# for more information and examples
# get from database
use Bio::DB::GenBank;
$db = Bio::DB::GenBank->new();
$seqobj = $db->get_Seq_by_acc('X78121');
# make from strings in script
use Bio::Seq;
$seqobj = Bio::Seq->new( -display_id => 'my_id',
                         -seq        => $sequence_as_string );
# gets sequence as a string from sequence object
$seqstr = $seqobj->seq(); # actual sequence as a string
$seqstr = $seqobj->subseq(10,50); # slice in biological coordinates
# retrieves information from the sequence
# features must implement Bio::SeqFeatureI interface
@features = $seqobj->get_SeqFeatures(); # just top level
foreach my $feat ( @features ) {
    print "Feature ",$feat->primary_tag," starts ",$feat->start," ends ",
        $feat->end," strand ",$feat->strand,"\n";
    # features retain link to underlying sequence object
    print "Feature sequence is ",$feat->seq->seq(),"\n";
}
# sequences may have a species
if( defined $seqobj->species ) {
    my $species = $seqobj->species;
    print "Sequence is from ",$species->binomial," [",$species->common_name,"]\n";
}
# annotation objects are Bio::AnnotationCollectionI's
$ann = $seqobj->annotation(); # annotation object
# references are one type of annotation you can get; comment and
# dblink are others. Look at Bio::AnnotationCollection for
# more information
foreach my $ref ( $ann->get_Annotations('reference') ) {
    print "Reference ",$ref->title,"\n";
}
# you can get truncations, translations and reverse complements; these
# all give back Bio::Seq objects themselves, though currently with no
# features transferred
my $trunc = $seqobj->trunc(100,200);
my $rev = $seqobj->revcom();
# there are many options to translate - check out the docs
my $trans = $seqobj->translate();
# these functions can be chained together
my $trans_trunc_rev = $seqobj->trunc(100,200)->revcom->translate();
A Seq object is a sequence with sequence features placed on it. The Seq object contains a PrimarySeq object for the actual sequence and also implements its interface.
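The relationship between the Seq object and the PrimarySeq it contains can be sketched as follows. This is a minimal illustration, not taken from the Bioperl documentation: the 30-base sequence, the feature coordinates and the 'example_region' tag are made up, and Bio::SeqFeature::Generic is used only as one convenient Bio::SeqFeatureI implementation.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqFeature::Generic;

# A 30-base example sequence (made up for illustration).
my $seqobj = Bio::Seq->new(
    -display_id => 'my_id',
    -seq        => 'ATGGCCAAGCTTGGATCCGAATTCTGATAA',
);

# Place one feature on bases 1..9 of the forward strand.
my $feat = Bio::SeqFeature::Generic->new(
    -start       => 1,
    -end         => 9,
    -strand      => 1,
    -primary_tag => 'example_region',   # hypothetical tag name
);
$seqobj->add_SeqFeature($feat);

# The sequence itself lives in a contained PrimarySeq object ...
my $primary = $seqobj->primary_seq;

# ... but Bio::Seq answers the same sequence calls directly.
my @feats = $seqobj->get_SeqFeatures;
print "length via Seq:        ", $seqobj->length, "\n";
print "length via PrimarySeq: ", $primary->length, "\n";
print "first codon:           ", $seqobj->subseq(1, 3), "\n";
print "top-level features:    ", scalar(@feats), "\n";
```

Both length calls report the same value because the Seq object passes sequence queries through to its PrimarySeq; the features are held only by the Seq object.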
In Bioperl we have 3 main players that people are going to use frequently:
Bio::PrimarySeq - just the sequence and its names, nothing else.
Bio::SeqFeatureI - a location on a sequence, potentially with a sequence
and annotation.
Bio::Seq - A sequence and a collection of sequence features
(an aggregate) with its own annotation.
Although Bioperl is not tied heavily to file formats, these distinctions do map sensibly onto file formats, and for some bioinformaticians this might help:
Bio::PrimarySeq - Fasta file of a sequence
Bio::SeqFeatureI - A single entry in an EMBL/GenBank/DDBJ feature table
Bio::Seq - A single EMBL/GenBank/DDBJ entry
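To make the mapping to file formats concrete, the sketch below writes one Seq object out twice with Bio::SeqIO: once as Fasta, where only the PrimarySeq-level information is expected to appear, and once as a GenBank entry, which also carries the feature table. The sequence, the feature and the render() helper are hypothetical, and writing to an in-memory filehandle through the -fh argument is an assumption about what the installed Bioperl accepts.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqFeature::Generic;
use Bio::SeqIO;

# Example data, made up for illustration.
my $seqobj = Bio::Seq->new(
    -display_id => 'my_id',
    -seq        => 'ATGGCCAAGCTTGGATCCGAATTCTGATAA',
);
$seqobj->add_SeqFeature(
    Bio::SeqFeature::Generic->new(
        -start       => 4,
        -end         => 9,
        -strand      => 1,
        -primary_tag => 'misc_feature',
    )
);

# Hypothetical helper: write one sequence in the given format and
# return the resulting text instead of creating a file.
sub render {
    my ($seq, $format) = @_;
    my $text = '';
    open my $fh, '>', \$text or die "cannot open in-memory handle: $!";
    my $out = Bio::SeqIO->new( -fh => $fh, -format => $format );
    $out->write_seq($seq);
    $out->close;
    return $text;
}

my $fasta   = render($seqobj, 'fasta');    # sequence and names only
my $genbank = render($seqobj, 'genbank');  # full entry with feature table

print $fasta, "\n", $genbank;
```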
By having this split we avoid a lot of nasty circular references (sequence features can hold a reference to a sequence without the sequence holding a reference to the sequence feature). See Bio::PrimarySeq and Bio::SeqFeatureI for more information.
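The one-way link from a feature to its sequence can be observed directly. The sketch below is illustrative only: the data are made up, and it assumes that add_SeqFeature attaches the sequence to the feature so that entire_seq and seq return something. The check relies only on the Bio::PrimarySeqI interface, not on the exact class of the attached object.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqFeature::Generic;

# Example data, made up for illustration.
my $seqobj = Bio::Seq->new(
    -display_id => 'my_id',
    -seq        => 'ATGGCCAAGCTTGGATCCGAATTCTGATAA',
);
my $feat = Bio::SeqFeature::Generic->new(
    -start       => 4,
    -end         => 9,
    -strand      => 1,
    -primary_tag => 'example_region',   # hypothetical tag name
);
$seqobj->add_SeqFeature($feat);

# The feature can reach the sequence it is placed on ...
my $attached = $feat->entire_seq;
print "feature is attached to a ", ref($attached), "\n";

# ... and can return its own slice of that sequence.
my $feature_bases = $feat->seq->seq;
print "feature bases: $feature_bases\n";
```

The Seq object holds the list of features, while each feature refers back only to the sequence data, which is the split the paragraph above describes.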
Ian Korf really helped in the design of the Seq and SeqFeature system.
Requirements:
Perl