
BIOREL

 

 

METHODS

 


 

 


We begin with a brief explanation of the network bias evaluation procedure, which has three main steps. In the first step, biological knowledge (database information) is converted into network format; the details for each database can be found in the description of the BIOREL knowledge modules. In the second step, the bias of the associations of each gene in the analyzed network is quantified by regression analysis, that is, the analyzed network is aligned to the biological knowledge gene by gene. For a given gene, a strong bias indicates that its associations in the network are enriched in certain functional categories (the null distribution is computed from statistics on random networks). In the last step, the overall network bias (relevance) score is defined as the fraction of genes in the network with significantly biased associations.

A gene network structure can be formalized in matrix form. Each element of the matrix quantifies the connection strength between the corresponding pair of genes, and each column reflects the associations of one particular gene. The whole network structure can be decomposed into small subnetworks (referred to below as elementary networks). For the purpose of our study we decompose the network so that each elementary network reflects the associations of one particular gene. The elementary network is therefore formalized mathematically as a vector, namely the corresponding column of the network matrix (see Figure 1).
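The decomposition described above can be sketched in a few lines of Python. This is only an illustration: the gene names, the weights, and the helper names `elementary_network` and `decompose` are hypothetical and are not part of BIOREL.

```python
# Hypothetical toy network: a symmetric weight matrix for four genes.
genes = ["g1", "g2", "g3", "g4"]
network = [
    [0.0, 1.0, 0.5, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 2.0],
    [0.0, 0.0, 2.0, 0.0],
]

def elementary_network(matrix, j):
    """Return column j of the matrix: the associations of gene j with every gene."""
    return [row[j] for row in matrix]

def decompose(matrix, names):
    """Map each gene name to its elementary network (one column per gene)."""
    return {name: elementary_network(matrix, j) for j, name in enumerate(names)}

elementary = decompose(network, genes)
print(elementary["g3"])  # [0.5, 0.0, 0.0, 2.0]
```

Each entry of `elementary` is the vector that the procedure below compares against the reference knowledge for that gene.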



Figure 1. A) Different ways to represent a network structure: graphical and matrix form. B) A possible approach to decomposing the network structure into elementary networks, each of which reflects the connections of only one node.



Networks extracted from biological knowledge databases or from other reliable sources are referred to as reference networks. These networks represent the current knowledge about functional associations between genes. The gene network structures whose biological relevance is to be quantified are referred to as target networks.

For each gene X from the target network the following procedure is applied. The information from the knowledge databases is formalized in a reference matrix $x_{ik}$. The element $x_{ik}$ quantifies the association of gene i (index i runs over all genes in the target network) with gene X in reference network k (index k runs over all reference networks, e.g. categories, selected for analysis). On the other hand, the association of gene i with gene X in the target network is formalized by the element $y_i$. In the next step the vector y is regressed against the matrix $x_{ik}$: $y_i = \sum_k a_k x_{ik} + e_i$, where the $a_k$ are the regression coefficients and $e_i$ is the residual for gene i. The multiple correlation coefficient R is a quantitative measure of the correlation between the reference matrix and the vector y. The R value is used to estimate the bias (relative to the database information used) introduced by the associations of gene X in the target network.

The corresponding p-value reflects the statistical significance of R. It represents the probability of obtaining the same correlation between the elementary target network and the reference matrix by chance, under the null hypothesis that both $x_{ik}$ and $y_i$ were generated randomly. In reality this assumption does not hold, so the null distribution has to be estimated from statistics on random networks. For this purpose a random vector z is generated (the random analogue of vector y, representing the associations of gene X in a random network) and the value $R_z$ is computed. This is repeated an appropriate number of times, depending on the chosen significance level, to collect statistics of $R_z$ for random networks. Based on the $R_z$ statistics we estimate the significance of the R value at different levels and classify the associations of gene X as biased or non-biased at each level. The overall functional bias of the network is defined as the fraction of genes in the network with significantly biased associations.
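The per-gene procedure can be illustrated with a minimal, standard-library-only sketch. Several points here are assumptions and not taken from BIOREL: the regression includes an intercept, the random vector z is obtained by simply shuffling y, the p-value is the usual empirical estimate (hits + 1) / (n + 1), and all function names and the toy data are hypothetical.

```python
import random

def solve(a, b):
    """Solve the linear system a * coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("reference matrix has linearly dependent columns")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - s) / m[r][r]
    return coef

def multiple_r(x, y):
    """Multiple correlation coefficient R of y regressed on the columns of x."""
    rows = [[1.0] + list(r) for r in x]  # intercept column: an assumption of this sketch
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(y) / len(y)
    sst = sum((yi - mean_y) ** 2 for yi in y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    if sst == 0:
        return 0.0
    return max(0.0, 1.0 - sse / sst) ** 0.5

def empirical_p(x, y, n_random=999, seed=0):
    """Return R and the share of random vectors z whose R_z reaches the observed R."""
    rng = random.Random(seed)
    r_obs = multiple_r(x, y)
    hits = 0
    for _ in range(n_random):
        z = y[:]
        rng.shuffle(z)  # stand-in for the associations of gene X in a random network
        if multiple_r(x, z) >= r_obs - 1e-12:
            hits += 1
    return r_obs, (hits + 1) / (n_random + 1)

def network_bias(p_values, alpha=0.05):
    """Fraction of genes whose associations are significantly biased at level alpha."""
    return sum(p <= alpha for p in p_values) / len(p_values)

# Toy data: 8 genes, 2 reference categories; y coincides with the first category.
x = [[1, 0], [1, 0], [1, 1], [0, 1], [0, 0], [0, 0], [0, 1], [0, 0]]
y = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
r, p = empirical_p(x, y)
print(round(r, 6), p)
```

Running `empirical_p` for every gene of the target network and passing the resulting p-values to `network_bias` gives the overall score at a chosen significance level. The number of random vectors limits the smallest p-value that can be resolved, which is why it has to match the significance level of interest.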





Figure 2. The principles used to assess the biological relevance of the associations of gene X in the target network. For gene X from the target network, the corresponding elementary networks are extracted from the gene reference networks. Together they form the reference matrix $x_{ik}$ (the available knowledge about the functional associations of gene X). The associations of gene X in the elementary target network are expressed by the vector y. In the next step, regression analysis is applied to estimate the statistical correspondence between the matrix $x_{ik}$ and the vector y.



Two options are implemented for generating the random networks. In both cases the topology of the target network is preserved and only the genes assigned to the nodes are permuted. In the first case only genes from the target network are permuted. In the second case the permutation draws on the whole set of genes of the analyzed organism. The difference between the two random models makes it possible to evaluate the bias introduced by the set of genes in the target network.
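The two random models can be sketched as follows. The edge list, the gene names, and the function names are hypothetical, and drawing genes without replacement in the second model is an assumption of this sketch.

```python
import random

def permute_within_network(node_genes, rng):
    """Model 1: reshuffle the network's own genes over its nodes."""
    shuffled = node_genes[:]
    rng.shuffle(shuffled)
    return shuffled

def permute_from_genome(node_genes, genome, rng):
    """Model 2: fill the nodes with genes drawn without replacement from the whole genome."""
    return rng.sample(genome, len(node_genes))

def relabel(edges, labels):
    """Apply node labels to a fixed topology given as pairs of node indices."""
    return [(labels[i], labels[j]) for i, j in edges]

# Hypothetical target network: a chain of four nodes; the topology never changes.
edges = [(0, 1), (1, 2), (2, 3)]
node_genes = ["a", "b", "c", "d"]
genome = node_genes + ["e", "f", "g", "h"]

rng = random.Random(0)
random_net_1 = relabel(edges, permute_within_network(node_genes, rng))
random_net_2 = relabel(edges, permute_from_genome(node_genes, genome, rng))
print(random_net_1)
print(random_net_2)
```

In the first model the gene set of the random network is identical to that of the target network, so only the wiring between those genes is randomized. In the second model the gene set itself is random, which is what exposes any bias due to the selection of genes.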

Along with the bias of the target network, the set of genes with significantly biased associations is identified. For each gene in this set, the categories that contribute most to explaining its associations in the target network are inferred. In other words, these categories are significantly over- or under-represented among the gene's associations in the network. The overall statistics of such categories across the network indicate which kinds of gene interactions prevail in the target network. This information can serve as a basis for deeper insight into the biology and the biases of the network.
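As a small illustration of the overall category statistics, the sketch below tallies how often each category appears among the significantly biased genes. The input mapping is hypothetical: it stands for the per-gene result of the inference step, whose details are not given here.

```python
from collections import Counter

# Hypothetical input: for each significantly biased gene, the categories
# inferred as major contributors to its associations.
contributing = {
    "geneA": ["transport", "metabolism"],
    "geneB": ["transport"],
    "geneC": ["cell cycle", "transport"],
}

def category_statistics(per_gene):
    """Count in how many biased genes each category contributes."""
    counts = Counter()
    for categories in per_gene.values():
        counts.update(set(categories))  # count each category once per gene
    return counts

stats = category_statistics(contributing)
print(stats.most_common(1))  # [('transport', 3)]
```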


 

 

  (c) 2005 GSF - Forschungszentrum für Umwelt und Gesundheit, GmbH Ingolstädter Landstraße 1, D-85764 Neuherberg






























