Annotation Goal |
The BFAB data include four genomes, Bacillus subtilis, Helicobacter pylori, Listeria innocua and Listeria monocytogenes, annotated using MIPS FunCat.
FunCat consists of 28 main functional categories (or branches) that cover general
fields such as cellular transport, metabolism and signal transduction. The main branches have a hierarchical, tree-like structure
with up to six levels of increasing specificity, for a total of 1307 categories.
The most general annotation may predict only the top level of MIPS FunCat categories, while the most detailed may predict all sub-levels.
The BFAB data were annotated using a total of 419 distinct categories. Two categories, 99 and 98, correspond to incomplete annotations and should not be used for annotation.
Estimating annotation performance for proteins whose annotations are similar but do not match exactly is a challenge.
The script simpleStatistic.pl measures annotation performance by counting the number of
all non-redundant subcategories, i.e. 01, 01.01, 01.01.01, etc. Other performance statistics, e.g. annotation accuracy at each level of FunCat, may also be used.
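The counting of non-redundant subcategories can be illustrated with a minimal Python sketch. It is not the logic of simpleStatistic.pl, which is not reproduced here. It assumes that FunCat codes are dot-separated strings such as 01.01.03, and the function names are hypothetical.

```python
def expand_category(code):
    """Return a FunCat code together with all of its ancestors.

    Example: "01.01.03" -> ["01", "01.01", "01.01.03"]
    """
    parts = code.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def expand_annotation(codes, excluded=("98", "99")):
    """Collect the non-redundant set of (sub)categories for one protein.

    Codes under the main categories 98 and 99 (incomplete annotations)
    are skipped.
    """
    expanded = set()
    for code in codes:
        if code.split(".")[0] in excluded:
            continue
        expanded.update(expand_category(code))
    return expanded


def count_matches(predicted, reference):
    """Count shared, predicted and reference subcategories (assumed metric)."""
    pred = expand_annotation(predicted)
    ref = expand_annotation(reference)
    return {
        "correct": len(pred & ref),
        "predicted": len(pred),
        "reference": len(ref),
    }


# A prediction of 01.01.03 against a reference of 01.01.06 and 20
# shares the two upper levels 01 and 01.01.
result = count_matches(["01.01.03"], ["01.01.06", "20"])
print(result)  # {'correct': 2, 'predicted': 3, 'reference': 4}
```

Expanding each code to its ancestors gives partial credit to a prediction that is correct at the upper levels but wrong at the most specific one.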
| Leave One Genome out Schema |
We suggest calculating results using the leave-one-genome-out schema,
i.e. predicting each genome using the remaining ones. Note that because the Listeria innocua and Listeria monocytogenes
genomes are very similar, they should not be used to predict one another (i.e., only two genomes, Bacillus subtilis and Helicobacter pylori, should be used as the training set for either Listeria genome).
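The schema above can be sketched in Python as follows. The genome names come from this page. The function names and the way the similarity constraint is encoded are assumptions made for illustration.

```python
GENOMES = [
    "Bacillus subtilis",
    "Helicobacter pylori",
    "Listeria innocua",
    "Listeria monocytogenes",
]

# Groups of genomes too similar to be used for predicting one another.
TOO_SIMILAR = [{"Listeria innocua", "Listeria monocytogenes"}]


def training_genomes(target, genomes=GENOMES, similar_groups=TOO_SIMILAR):
    """Return the genomes allowed as a training set for `target`."""
    excluded = {target}
    for group in similar_groups:
        if target in group:
            excluded |= group  # drop every genome similar to the target
    return [g for g in genomes if g not in excluded]


def leave_one_genome_out(genomes=GENOMES):
    """Yield (target, training set) pairs, one per genome."""
    for target in genomes:
        yield target, training_genomes(target, genomes)


for target, training in leave_one_genome_out():
    print(target, "<-", training)
```

With this encoding, each Listeria genome is predicted from Bacillus subtilis and Helicobacter pylori only, while the other two genomes are each predicted from the three remaining ones.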
| Data XML file |
The sequence parameters were calculated using the same program for all genes. These data are enclosed in <GENE> tags.
The genes from one genome are enclosed in <GENOME> tags. The description of each parameter includes information on the program
used to perform the analysis, a reference to it (publication reference and/or WWW link) and the arguments used to run the program (if any).
Please include these references in your publication if you use any of these parameters in your study.
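As a rough illustration of the nesting described above, the following Python sketch counts the <GENE> elements inside each <GENOME> element. Only the two tag names come from this page. The root element, the name and id attributes and the sample content are hypothetical and will differ from the real Data XML file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: only the GENOME and GENE tag names are taken
# from the page; everything else is made up for illustration.
SAMPLE = """<DATA>
  <GENOME name="genome_A">
    <GENE id="g1"/>
    <GENE id="g2"/>
  </GENOME>
  <GENOME name="genome_B">
    <GENE id="g3"/>
  </GENOME>
</DATA>"""


def genes_per_genome(xml_text):
    """Map each genome's (assumed) name attribute to its number of genes."""
    root = ET.fromstring(xml_text)
    counts = {}
    for genome in root.iter("GENOME"):
        counts[genome.get("name")] = sum(1 for _ in genome.iter("GENE"))
    return counts


print(genes_per_genome(SAMPLE))  # {'genome_A': 2, 'genome_B': 1}
```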
| Result XML file |
The result file includes predictions for all four genomes. The prediction of each functional category should be followed by a
confidence value given in a <CONFIDENCE> tag. This value can be used to calculate more advanced statistics, e.g.
ROC curves.
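One way to derive a ROC curve from confidence values is sketched below in Python. It assumes that each prediction has already been reduced to a pair (confidence, is_correct), where is_correct says whether the predicted category matches the reference annotation. This pairing, the function names and the sample values are assumptions, not part of simpleStatistic.pl.

```python
def roc_points(scored):
    """Return ROC points (false positive rate, true positive rate).

    `scored` is a list of (confidence, is_correct) pairs. Predictions are
    ranked by decreasing confidence and ties are processed together.
    """
    positives = sum(1 for _, ok in scored if ok)
    negatives = len(scored) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one correct and one wrong prediction")

    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every prediction sharing this confidence value.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Made-up confidence values for five predictions.
scored = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.2, False)]
curve = roc_points(scored)
print(curve)
print(auc(curve))
```

Because only emitted predictions are ranked here, the curve measures how well the confidence value separates correct from wrong predictions. Categories that were never predicted would need to be added as negatives for a full ROC analysis.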
| Statistics |
A Perl script, simpleStatistic.pl, can be used to calculate statistical results for the annotation.
The script takes the Result and Data XML files as input and stores the calculated statistics in an output XML file.
| | If you have any questions concerning the data available at this site or suggestions for improving it, please contact us!
|
| | See also explanation.txt, which we wrote for a student who had no prior knowledge of XML (as was the case for us some time ago :)).