Knowledge modules

 


The knowledge networks can be extracted from fundamentally different sources of biological knowledge. The core of our system is the MIPS functional catalogue. Gene sequence similarity and InterPro domain data serve as additional, independent data sources. Manually annotated PPI databases are used as well. The system is very flexible: each knowledge module can be switched on or off depending on the purpose of the study, and an option allows users to upload their own knowledge modules in the specified format.

Functional Catalogue Module (FunCat module)

The FunCat is an annotation scheme for the functional description of proteins. To account for the broad and highly diverse spectrum of known protein functions, the FunCat consists of 28 main functional categories (or branches) that cover general fields such as cellular transport, metabolism and cellular communication/signal transduction. The main branches have a hierarchical, tree-like structure with up to six levels of increasing specificity. In total, the FunCat includes 1307 functional categories.

Each functional category is assigned a two-digit number at its own level of the hierarchy. Its full identifier is formed by prefixing this number with the numbers of the preceding nodes in the upper levels of the tree, with the levels separated by dots. For example, 01 metabolism is a representative of the highest level, and 01.01.03.02.01 biosynthesis of glutamate belongs to the most specific level of the FunCat.
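The dotted numbering can be illustrated with a minimal sketch that lists a category together with all of its ancestors. The function name `funcat_ancestors` is hypothetical and not part of BIOREL; the sketch only assumes that identifiers follow the dot-separated form described above.

```python
from __future__ import annotations


def funcat_ancestors(category_id: str) -> list[str]:
    """Return the category and all of its ancestors, most general first.

    Assumes a dot-separated FunCat identifier such as "01.01.03.02.01".
    """
    parts = category_id.split(".")
    # Every prefix of the identifier names one node on the path from the root.
    return [".".join(parts[:depth]) for depth in range(1, len(parts) + 1)]


print(funcat_ancestors("01.01.03.02.01"))
# ['01', '01.01', '01.01.03', '01.01.03.02', '01.01.03.02.01']
```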

Since there are 1307 different functional categories, one can extract the same number of different networks, one per category. The extraction procedure is very simple: if two genes share a category, they are connected in the corresponding network. The hierarchical, tree-like structure of the FunCat implies a hierarchical organization of the extracted networks. The networks generated by very specific categories (e.g. 01.01.03.02.01 biosynthesis of glutamate) are subnetworks of the networks generated by the corresponding unspecific ones (e.g. 01 metabolism).
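The extraction rule can be sketched as follows. This is an illustration under stated assumptions, not the BIOREL implementation: the function `category_networks`, the input layout (a mapping from gene name to annotated FunCat identifiers) and the toy gene names are all hypothetical. A gene annotated with a specific category is also counted as a member of every ancestor category, which is what makes the specific networks subnetworks of the general ones.

```python
from collections import defaultdict
from itertools import combinations


def category_networks(annotations):
    """Map each FunCat category to the set of gene pairs that share it."""
    members = defaultdict(set)
    for gene, categories in annotations.items():
        for category in categories:
            parts = category.split(".")
            # Register the gene under the category and all of its ancestors.
            for depth in range(1, len(parts) + 1):
                members[".".join(parts[:depth])].add(gene)
    # One network per category: connect every pair of member genes.
    return {
        category: {frozenset(pair) for pair in combinations(sorted(genes), 2)}
        for category, genes in members.items()
    }


# Hypothetical toy annotations, for illustration only.
annotations = {
    "geneA": ["01.01.03.02.01"],
    "geneB": ["01.01.03.02.01"],
    "geneC": ["01.05"],
}
networks = category_networks(annotations)

# The specific network is contained in the network of its top-level branch.
print(networks["01.01.03.02.01"] <= networks["01"])
```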

Sequence similarity (SS) module

The base information used by the module is a pairwise similarity score between the amino acid sequences of two genes. The FASTA pairwise scores were retrieved from the SIMAP database. The input values were calculated as -log10(E-value). Pairwise scores with an E-value > 0.1 were excluded from the analysis. The edge weight between two genes is proportional to the similarity score.
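As a rough sketch of this transformation, the snippet below converts an E-value into an edge weight. The function name, the `scale` parameter (the proportionality constant) and the floor applied to E-values reported as zero are assumptions made for illustration; the text above does not specify how such cases are handled.

```python
import math

E_VALUE_CUTOFF = 0.1    # pairs with a larger E-value are excluded (from the text)
E_VALUE_FLOOR = 1e-300  # hypothetical floor so that an E-value of 0 can be logged


def similarity_weight(e_value, scale=1.0):
    """Return scale * -log10(E-value), or None if the pair is excluded."""
    if e_value > E_VALUE_CUTOFF:
        return None
    return scale * -math.log10(max(e_value, E_VALUE_FLOOR))


for e_value in (1e-5, 0.1, 0.5):
    print(e_value, similarity_weight(e_value))
```

An E-value exactly at the cutoff of 0.1 is kept and receives a weight of about 1, the smallest weight any retained edge can have under these assumptions.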

There are several reasons to include the sequence similarity (SS) module in the BIOREL system. First of all, it reflects any bias in the network that can be attributed to the sequence similarity of the genes. For example, the module can be very helpful in analyses of gene expression data for estimating cross-hybridization effects: any systematic bias towards similar expression profiles among genes with similar sequences will be detected. However, estimating this effect is not as simple as it seems. Genes with similar sequences are functionally related, so one needs to separate two effects: sequence similarity and similar function. These effects can be estimated by applying the BIOREL system twice to the network extracted from the expression data, the first time using only the FunCat module and the second time using only the SS module. If the functional bias of the network and the set of genes classified as relevant are similar in both cases, then most edges in the network connect only genes that share strong sequence similarity (and are thus functionally related), and there are no edges that connect functionally similar genes without sequence similarity. Such a result would indicate a strong cross-hybridization signal in the analyzed expression data.

InterPro Domain (IPD) module

The base information used is the protein domain composition provided by the InterPro database. The number of different networks extracted by this module corresponds to the number of domains, since each domain generates one network. The extraction procedure creates an edge between two genes if their proteins both contain the corresponding domain. Any systematic bias in the network due to similar domain composition of interacting genes will be estimated by this module.

Gene Neighborhood module

The base information used is the physical distance between two genes on the chromosome. The weight of the edge between two genes is inversely proportional to the distance separating them on the chromosome. Two options are implemented: the distance is measured either in number of genes or in number of nucleotides. Any systematic bias in the gene interactions reflected in the network due to gene neighborhood on the chromosome will be estimated by this module.
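The two distance options can be sketched as below. This is only an illustration: the function `neighborhood_weights`, the input layout of (name, chromosome, start coordinate) tuples, the use of start coordinates to measure nucleotide distance, the proportionality constant of 1, and the choice to create no edge between genes on different chromosomes are all assumptions not stated in the text.

```python
from collections import defaultdict
from itertools import combinations


def neighborhood_weights(genes, unit="genes"):
    """Return {(gene1, gene2): 1 / distance} for genes on the same chromosome.

    genes: iterable of (name, chromosome, start) tuples.
    unit:  "genes" (difference in gene order) or "nucleotides"
           (difference in start coordinates).
    """
    if unit not in ("genes", "nucleotides"):
        raise ValueError("unit must be 'genes' or 'nucleotides'")
    by_chromosome = defaultdict(list)
    for name, chromosome, start in genes:
        by_chromosome[chromosome].append((start, name))
    weights = {}
    for located in by_chromosome.values():
        located.sort()  # order genes along the chromosome
        ranked = enumerate(located)
        for (i, (start_i, name_i)), (j, (start_j, name_j)) in combinations(ranked, 2):
            distance = (j - i) if unit == "genes" else (start_j - start_i)
            if distance > 0:  # skip genes with identical coordinates
                weights[(name_i, name_j)] = 1.0 / distance
    return weights


# Hypothetical coordinates, for illustration only.
genes = [("g1", "chrI", 100), ("g2", "chrI", 600), ("g3", "chrI", 1100), ("g4", "chrII", 50)]
by_gene_count = neighborhood_weights(genes, unit="genes")
by_nucleotides = neighborhood_weights(genes, unit="nucleotides")
print(by_gene_count)
print(by_nucleotides)
```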

Protein-protein interaction (PPI) module

There are several databases on protein-protein interactions in yeast. They include manually curated catalogues of known protein complexes as well as data from high-throughput experiments, such as two-hybrid experiments, genetic interactions, etc. Although they were assembled differently, they are similar in storage format, so the same network extraction procedure can be used in all cases. An edge of the binary network is constructed if two proteins are involved in an interaction according to the database record. In its web configuration, the BIOREL system employs only manually curated catalogues of known protein complexes.

User defined knowledge modules

This option can be used for many purposes. First, biological information is very dynamic, and new sources of information that consider genes from different biological perspectives can arise. We therefore allow the user to add data to our knowledge base. Second, this option makes it possible to infer the relevance of the target network from its associations with a set of user-supplied networks. Such networks need not represent biological knowledge; they can be other networks extracted by different methods or from different kinds of high-throughput data. This kind of analysis gives insight into the differences and similarities between networks extracted by different statistical methodologies or from different kinds of data. It can be very useful for benchmarking network inference procedures.

 

 

 

  (c) 2005 GSF - Forschungszentrum für Umwelt und Gesundheit, GmbH Ingolstädter Landstraße 1, D-85764 Neuherberg






























