Welcome to FUNcat Annotation Tools (FUNAT)!

 


Accurate automatic classification of protein function remains a challenge for genome annotation. We have benchmarked the automatic annotation of four bacterial genomes using a 5-fold cross-validation procedure and three machine learning methods (linear regression analysis, k-nearest neighbors, and associative neural networks). The analyzed genomes had previously been manually annotated with FunCat categories at MIPS, providing a gold standard. Features describing pairs of sequences, rather than each sequence alone, were used. The descriptors were derived from sequence alignment scores, InterPro domains, synteny information, sequence lengths, and calculated protein properties. Following training, we scored all pairs from the validation sets. For each target protein we selected the pair with the highest predicted score and annotated the target protein with the functional categories of the prototype protein. The neural network approach achieved the highest annotation accuracy. Moreover, the predicted annotation scores differentiated reliable from unreliable annotations. The sequence alignment scores and the descriptors derived from InterPro domains contributed most to the performance of the algorithm. The method was applied to annotate the protein sequences of 180 complete bacterial genomes.
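The annotation-transfer step described above (pick, for each target protein, the pair with the highest predicted score and copy the prototype's categories) can be illustrated with a minimal Python sketch. It is not the server's actual implementation: the function name `transfer_annotations`, the tuple layout of the scored pairs, and the protein identifiers and scores in the example are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical layout: (target_id, prototype_id, predicted_score)
ScoredPair = Tuple[str, str, float]


def transfer_annotations(
    scored_pairs: Iterable[ScoredPair],
    prototype_categories: Dict[str, Set[str]],
) -> Dict[str, Tuple[Set[str], float]]:
    """For each target, keep the highest-scoring pair and copy the
    prototype's functional categories, together with that score."""
    best: Dict[str, Tuple[str, float]] = {}
    for target, prototype, score in scored_pairs:
        # Replace the stored pair only if this one scores strictly higher.
        if target not in best or score > best[target][1]:
            best[target] = (prototype, score)
    return {
        target: (set(prototype_categories.get(prototype, set())), score)
        for target, (prototype, score) in best.items()
    }


# Illustrative data only: identifiers, scores and category codes are made up.
pairs = [("t1", "p1", 0.42), ("t1", "p2", 0.91), ("t2", "p1", 0.30)]
categories = {"p1": {"01.01"}, "p2": {"20.01", "32.01"}}
result = transfer_annotations(pairs, categories)
```

Returning the score alongside the categories keeps the predicted value available as a reliability indicator, in line with the observation that the scores separate reliable from unreliable annotations.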

References

Tetko, I.V.; Rodchenkov, I.V.; Walter, M.C.; Rattei, T.; Mewes, H.W. Beyond the "Best" Match: Machine Learning Annotation of Protein Sequences by Integration of Different Sources of Information. Bioinformatics 2008, 24(5):621-8.

This study was partially supported by the DFG grant TE 380/1-1 to Dr. I.V. Tetko and Prof. H.W. Mewes.

This server is no longer supported. You can still search for old results, but no new annotations will be accepted for calculation.

See also other servers developed by us

 

 

  (c) 2007 GSF - Forschungszentrum für Umwelt und Gesundheit, GmbH Ingolstädter Landstraße 1, D-85764 Neuherberg