Sequences are annotated according to the "best" FASTA score: for each sequence in the test set,
we select the annotated training-set sequence with the maximum FASTA score (i.e., the highest similarity) and assign all FunCat categories of that training-set
sequence to the test-set sequence. The confidence value is 1/SCORE, where SCORE is the "best" FASTA score.
This annotation therefore corresponds to a k-nearest-neighbor classifier with k = 1.
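
The best-hit procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the original implementation: the FASTA scores are assumed to be precomputed and supplied as a nested dictionary, and the function name `annotate_by_best_hit`, the sequence identifiers, and the category codes in the example are all hypothetical.

```python
from typing import Dict, List, Tuple


def annotate_by_best_hit(
    scores: Dict[str, Dict[str, float]],
    training_categories: Dict[str, List[str]],
) -> Dict[str, Tuple[str, List[str], float]]:
    """Transfer categories from the best-scoring training sequence (1-NN).

    scores[test_id][train_id] is assumed to hold a precomputed FASTA score.
    Returns, per test sequence: (best training id, categories, confidence).
    """
    annotations = {}
    for test_id, hits in scores.items():
        if not hits:
            continue  # no hit in the training set: leave unannotated
        # Nearest neighbor = training sequence with the maximum score.
        best_id = max(hits, key=hits.get)
        best_score = hits[best_id]
        if best_score <= 0:
            continue  # assumption: non-positive scores are not usable
        # Confidence value as stated in the text: 1/SCORE.
        confidence = 1.0 / best_score
        categories = list(training_categories[best_id])
        annotations[test_id] = (best_id, categories, confidence)
    return annotations


# Made-up example data (identifiers, scores, and category codes are placeholders).
example_scores = {
    "test1": {"trainA": 250.0, "trainB": 90.0},
    "test2": {"trainB": 40.0},
    "test3": {},
}
example_categories = {
    "trainA": ["01.01", "20.01"],
    "trainB": ["10.03"],
}

result = annotate_by_best_hit(example_scores, example_categories)
for test_id, (best_id, cats, conf) in result.items():
    print(test_id, best_id, cats, conf)
```

In this sketch a test sequence with no hit is simply left out of the result, which is one way such sequences could end up outside the reported coverage; how the original pipeline handles them is not stated.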

Performance of the method

| genome | total genes | coverage of manually annotated genes (%) | annotations of new genes | sensitivity (%) | specificity (%) |
|---|---|---|---|---|---|
| Helicobacter_pylori | 1576 | 88.6 (771 out of 870) | 351 | 68.1 | 47.4 |
| Bacillus_subtilis | 4112 | 78.1 (2205 out of 2823) | 339 | 76.1 | 77.4 |
| Listeria_monocytogenes | 2846 | 87.8 (1710 out of 1948) | 361 | 86.1 | 81.3 |
| Listeria_innocua | 2968 | 92.8 (1813 out of 1953) | 416 | 86.4 | 80.7 |
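
As a quick check of the coverage column, the percentages can be recomputed from the counts given in parentheses. The sketch below assumes that coverage is the number of manually annotated genes that received an annotation divided by the total number of manually annotated genes, which is consistent with the "out of" figures; the helper name `coverage_percent` is hypothetical. Sensitivity and specificity are not recomputed because their exact definitions are not given here.

```python
def coverage_percent(annotated: int, manually_annotated: int) -> float:
    """Coverage as a percentage, rounded to one decimal place."""
    return round(100.0 * annotated / manually_annotated, 1)


# (annotated, manually annotated) counts taken from the table.
counts = {
    "Helicobacter_pylori": (771, 870),
    "Bacillus_subtilis": (2205, 2823),
    "Listeria_monocytogenes": (1710, 1948),
    "Listeria_innocua": (1813, 1953),
}

coverage = {genome: coverage_percent(a, m) for genome, (a, m) in counts.items()}
for genome, value in coverage.items():
    print(f"{genome}: {value}")
```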