BFAB Help

Annotation Goal

The BFAB data comprise four genomes, Bacillus subtilis, Helicobacter pylori, Listeria innocua and Listeria monocytogenes, annotated using MIPS FunCat. FunCat consists of 28 main functional categories (or branches) that cover general fields such as cellular transport, metabolism and signal transduction. The main branches have a hierarchical, tree-like structure with up to six levels of increasing specificity and a total of 1307 categories. The most general annotation may predict just the top level of MIPS FunCat categories, while the most detailed may predict all sublevels. The BFAB data were annotated using a total of 419 distinct categories. Two categories, 98 and 99, correspond to incomplete annotations and should not be used for annotation.

Estimating annotation performance is challenging for proteins whose predicted and true annotations are similar but not identical. The script simpleStatistic.pl measures annotation performance by counting all non-redundant subcategories (e.g. 01, 01.01, 01.01.01, etc.). Other performance measures, e.g. annotation accuracy at each level of FunCat, may also be used.
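The counting of non-redundant subcategories can be illustrated with a minimal Python sketch. It assumes that each FunCat code is a dot-separated string such as 01.01.03, that an annotation is expanded into the category plus all of its ancestors, and that codes in branches 98 and 99 are skipped. All function names are hypothetical; this is not the code of simpleStatistic.pl, whose exact rules may differ.

```python
# Hypothetical sketch of FunCat set comparison; not simpleStatistic.pl itself.
EXCLUDED_BRANCHES = {"98", "99"}  # incomplete annotations, not to be used


def expand(code: str) -> set[str]:
    """Return a category and all its ancestors, e.g. 01.01.03 -> {01, 01.01, 01.01.03}."""
    parts = code.split(".")
    return {".".join(parts[: i + 1]) for i in range(len(parts))}


def expand_all(codes) -> set[str]:
    """Union of expanded categories, skipping the excluded branches 98 and 99."""
    result: set[str] = set()
    for code in codes:
        if code.split(".")[0] in EXCLUDED_BRANCHES:
            continue
        result |= expand(code)
    return result


def overlap_counts(predicted, annotated):
    """Return (matching, spurious, missed) category counts for one protein."""
    pred, true = expand_all(predicted), expand_all(annotated)
    return len(pred & true), len(pred - true), len(true - pred)


def per_level_overlap(predicted, annotated):
    """Same counts split by FunCat level (1 = main branch, 2 = first sublevel, ...)."""
    pred, true = expand_all(predicted), expand_all(annotated)
    out: dict[int, tuple[int, int, int]] = {}
    for cat in pred | true:
        level = cat.count(".") + 1
        tp, fp, fn = out.get(level, (0, 0, 0))
        out[level] = (
            tp + int(cat in pred and cat in true),
            fp + int(cat in pred and cat not in true),
            fn + int(cat in true and cat not in pred),
        )
    return out


if __name__ == "__main__":
    # A prediction of 01.01.03 against a true annotation of 01.01.05 (plus an ignored 99)
    print(overlap_counts(["01.01.03"], ["01.01.05", "99"]))  # (2, 1, 1)
```

Here the prediction still earns credit for the shared ancestors 01 and 01.01, which is what makes a near-miss count as partially correct; the per-level split corresponds to the level-wise accuracy mentioned above.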

Leave-One-Genome-Out Scheme

We suggest computing results using the leave-one-genome-out scheme, i.e. predicting each genome using the remaining ones as the training set. Note that since the Listeria innocua and Listeria monocytogenes genomes are very similar, they should not be used to predict one another (i.e., only the two genomes Bacillus subtilis and Helicobacter pylori should be used as the training set for either Listeria genome).
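The training/test splits this scheme produces can be enumerated with a short Python sketch. The genome names come from the text above; the function names are hypothetical, and treating every name starting with "Listeria " as a Listeria genome is an assumption of the sketch.

```python
GENOMES = [
    "Bacillus subtilis",
    "Helicobacter pylori",
    "Listeria innocua",
    "Listeria monocytogenes",
]


def is_listeria(name: str) -> bool:
    return name.startswith("Listeria ")


def leave_one_genome_out(genomes):
    """Yield (test_genome, training_genomes); the two Listeria genomes never train each other."""
    for test in genomes:
        train = [
            g for g in genomes
            if g != test and not (is_listeria(g) and is_listeria(test))
        ]
        yield test, train


for test, train in leave_one_genome_out(GENOMES):
    print(f"{test}: train on {', '.join(train)}")
```

Bacillus subtilis and Helicobacter pylori are each predicted from the three other genomes, while each Listeria genome is predicted from Bacillus subtilis and Helicobacter pylori only.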

Data XML file

The sequence parameters were calculated using the same program for all genes. The data for each gene are enclosed in <GENE> tags, and all genes from one genome are enclosed in a <GENOME> tag. The description of each parameter includes the program used to compute it, a reference to that program (a publication and/or a WWW link), and the arguments used to run it (if any). If you use any of these parameters in your study, please cite the corresponding references in your publication.
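A minimal Python sketch of walking this <GENOME>/<GENE> nesting with the standard xml.etree.ElementTree module. Only the GENOME and GENE tag names come from the description above; the root element DATA and the name attribute in the sample are hypothetical placeholders, so check them against the actual Data XML file. For a file on disk, ET.parse(path).getroot() can replace ET.fromstring.

```python
import xml.etree.ElementTree as ET


def genes_per_genome(xml_text: str) -> dict[str, int]:
    """Count <GENE> elements inside each <GENOME> element."""
    root = ET.fromstring(xml_text)
    counts: dict[str, int] = {}
    # iter() searches the whole tree, so GENOME need not be a direct child of the root
    for index, genome in enumerate(root.iter("GENOME")):
        # "name" is a hypothetical attribute; fall back to the genome's position in the file
        label = genome.get("name", f"genome {index + 1}")
        counts[label] = sum(1 for _ in genome.iter("GENE"))
    return counts


# Hypothetical miniature file: real GENE elements carry the sequence parameters.
SAMPLE = """<DATA>
  <GENOME name="Helicobacter pylori">
    <GENE>...</GENE>
    <GENE>...</GENE>
  </GENOME>
  <GENOME name="Listeria innocua">
    <GENE>...</GENE>
  </GENOME>
</DATA>"""

print(genes_per_genome(SAMPLE))
```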

Result XML file

The result file contains predictions for all four genomes. Each predicted functional category should be followed by a confidence value given in a <CONFIDENCE> tag. This value can be used to compute more advanced statistics, e.g. ROC curves.
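How <CONFIDENCE> values turn into an ROC curve can be sketched in Python. The sketch assumes each predicted category has already been reduced to a (confidence, is_correct) pair, for example by checking whether the category appears in the protein's true annotation; that labelling rule and the function names are assumptions, not part of the BFAB tools.

```python
def roc_points(scored):
    """Return ROC points (FPR, TPR) from (confidence, is_correct) pairs.

    A point is emitted only after all predictions sharing a confidence value,
    so tied confidences move the curve diagonally instead of in arbitrary order.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    positives = sum(1 for _, correct in ranked if correct)
    negatives = len(ranked) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both correct and incorrect predictions")
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (confidence, correct) in enumerate(ranked):
        if correct:
            tp += 1
        else:
            fp += 1
        if i + 1 == len(ranked) or ranked[i + 1][0] != confidence:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up example: four predicted categories with their confidence values.
example = [(0.9, True), (0.8, False), (0.7, True), (0.4, False)]
print(auc(roc_points(example)))  # 0.75
```

Pooling the pairs over all proteins of a genome gives one curve per genome; the 0.75 above is the fraction of (correct, incorrect) pairs in which the correct prediction has the higher confidence.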

Statistics

The Perl script simpleStatistic.pl computes statistics for the annotation. It takes the Result and Data XML files as input and writes the computed statistics to an output XML file.
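As a rough illustration of the last step, per-genome match counts can be summarized and serialized to XML with Python's standard library. The tag names STATISTICS, GENOME, PRECISION and RECALL are hypothetical and do not describe the actual output format of simpleStatistic.pl, and the counts in the example call are made up.

```python
import xml.etree.ElementTree as ET


def write_statistics(per_genome_counts):
    """Serialize {genome: (matching, spurious, missed)} counts as precision/recall XML.

    All tag names are hypothetical placeholders.
    """
    root = ET.Element("STATISTICS")
    for genome, (tp, fp, fn) in per_genome_counts.items():
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        node = ET.SubElement(root, "GENOME", name=genome)
        ET.SubElement(node, "PRECISION").text = f"{precision:.3f}"
        ET.SubElement(node, "RECALL").text = f"{recall:.3f}"
    return ET.tostring(root, encoding="unicode")


# Made-up counts, summed over all proteins of one genome.
print(write_statistics({"Listeria innocua": (120, 30, 50)}))
```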

If you have any questions about the data available at this site or suggestions for improving it, please contact us!
See also explanation.txt, which we wrote for a student with no prior knowledge of XML (as we were some time ago :)).

(c) 2005-2008 GSF - Forschungszentrum für Umwelt und Gesundheit, GmbH Ingolstädter Landstraße 1, D-85764 Neuherberg