GO::TermFinder 0.7 review

Download
by rbytes.net on

GO::TermFinder is a Perl module that can identify GO nodes that annotate a group of genes with a significant p-value. This package

License: Perl Artistic License
File size: 848K
Developer: Gavin Sherlock, Elizabeth Boyle and Ihab Awad
0 stars award from rbytes.net

GO::TermFinder is a Perl module that can identify GO nodes that annotate a group of genes with a significant p-value.

This package is intended to provide a method whereby the P-values of a set of GO annotations can be determined for a set of genes, based on the number of genes that exist in the particular genome (or in a selected background distribution from the genome), and their annotation, and the frequency with which the GO nodes are annotated across the provided set of genes.

The P-value is simply calculated using the hypergeometric distribution as the probability of x or more out of n genes having a given annotation, given that G of N have that annotation in the genome in general. We chose the hypergeometric distribution (sampling without replacement) since it is more accurate, though slower to calculate, than the binomial distibution (sampling with replacement).

In addition, a corrected p-value can be calculated, to correct for multiple hypothesis testing. The correction factor used is the total number of nodes to which the provided list of genes are annotated, excepting any nodes which have only a single annotation in the background, as a priori, we know that these cannot be significantly enriched.

The client has access to both the corrected and uncorrected values. It is also possible to correct the p-value using 1000 simulations, which control the Family Wise Error Rate - using this option suggests that the Bonferroni correction is in fact somewhat liberal, rather than conservative, as might be expected. Finally, the False Discovery Rate can also be calculated.

The general idea is that a list of genes may have been identified for some reason, e.g. they are coregulated, and TermFinder can be used to find out if any nodes annotate the set of genes to a level which is extremely improbable if the genes had simply been picked at random.

Requirements:
Perl

GO::TermFinder 0.7 search tags