EXAMPLE (analysis of gene neighborhood networks for several model organisms)
Using the BIOREL system, we estimated the relevance of gene networks extracted from gene neighborhood information for different model organisms.
The procedure for extracting the neighborhood network was very simple. The weight of the edge between two genes decreased with the number of genes separating them physically on the chromosome.
The weight of the edge between two consecutive genes was set to 1, between genes separated by exactly one gene to 0.5, by two genes to 0.33, and so on
(the edge weight equals 1/(n+1), where n is the number of separating genes).
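The weighting rule above can be illustrated with a minimal sketch. The function name `neighborhood_edges`, the gene identifiers, and the optional `max_separation` cutoff are assumptions made for illustration only; the text does not say whether a cutoff was applied, and this is not the BIOREL implementation.

```python
def neighborhood_edges(gene_order, max_separation=None):
    """Return (gene_a, gene_b, weight) tuples for genes on one chromosome.

    gene_order     -- genes listed in their physical order on the chromosome
    max_separation -- hypothetical cutoff on the number of separating genes
                      (an assumption; None means all pairs are connected)
    """
    edges = []
    for i, gene_a in enumerate(gene_order):
        for j in range(i + 1, len(gene_order)):
            n = j - i - 1  # number of genes separating the pair
            if max_separation is not None and n > max_separation:
                break
            # Edge weight as described in the text: 1/(n+1)
            edges.append((gene_a, gene_order[j], 1.0 / (n + 1)))
    return edges


# Illustrative gene identifiers, not taken from any real genome
edges = neighborhood_edges(["g1", "g2", "g3", "g4"], max_separation=2)
for gene_a, gene_b, weight in edges:
    print(gene_a, gene_b, round(weight, 2))
```

Consecutive genes receive weight 1, genes separated by one gene receive 0.5, and genes separated by two genes receive about 0.33, matching the values given above.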
We estimated the functional bias of such networks for 8 model organisms: Arabidopsis thaliana, Bacillus subtilis, Helicobacter pylori, Listeria monocytogenes, Listeria innocua, Thermoplasma acidophilum,
Saccharomyces cerevisiae, and Neurospora crassa. The standard output of the BIOREL system for all cases is available here
(Arabidopsis thaliana, Bacillus subtilis, Helicobacter pylori, Listeria monocytogenes, Listeria innocua, Thermoplasma acidophilum, Saccharomyces cerevisiae, Neurospora crassa).
Table 3. BIOREL evaluation of gene neighborhood networks in eu- and prokaryote genomes.

Network bias tested by BIOREL (all genomes): genes located closely on the chromosome have the same function or have similar sequences.
BIOREL modules (all genomes): FunCat module, Sequence Similarity and InterPro Domain modules.

| genome | bias score (a) | top enriched categories |
|---|---|---|
| Arabidopsis thaliana | 0.20 | Sequence similarity (>90%) |
| Bacillus subtilis | 0.22 | FunCat categories (>70%) |
| Helicobacter pylori | 0.18 | FunCat categories (>80%) |
| Listeria monocytogenes | 0.19 | FunCat categories (>70%) |
| Thermoplasma acidophilum | 0.14 | FunCat categories (>50%) |
| Saccharomyces cerevisiae | 0.04 | Sequence similarity (>60%) |
| Neurospora crassa | 0.07 | Sequence similarity (>90%) |

(a) The bias score is the proportion of genes in the network with significantly (p < 0.01) biased associations.
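As a rough illustration of the bias score definition, the sketch below computes the proportion of genes whose associations are significantly biased at p < 0.01. The function name `bias_score` and the per-gene p-values are hypothetical; how BIOREL derives these p-values is not described in this section, so they are simply taken as given.

```python
def bias_score(p_values, alpha=0.01):
    """Proportion of genes whose associations are significantly biased.

    p_values -- mapping from gene identifier to a per-gene p-value
                (assumed to be computed elsewhere)
    alpha    -- significance threshold; 0.01 as in the table footnote
    """
    if not p_values:
        return 0.0
    significant = sum(1 for p in p_values.values() if p < alpha)
    return significant / len(p_values)


# Made-up p-values for five illustrative genes
example = {"g1": 0.001, "g2": 0.2, "g3": 0.009, "g4": 0.5, "g5": 0.03}
score = bias_score(example)
print(score)
```

In this made-up example two of the five genes fall below the threshold, so the score is 0.4.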
As expected, the functional bias of the neighborhood network for bacterial species was much stronger (approximately 20%) than the functional bias for eukaryotes (4-7%, except Arabidopsis thaliana at 20%).
The statistical analysis of the categories enriched in the networks for bacterial and eukaryotic species reveals a principal difference in the origins of the two effects.
For eukaryotes, the proportion of cases in which sequence similarity was the only category explaining the associations of genes classified as relevant was strikingly higher
(approximately 80% of cases) than for bacterial species (approximately 15-20% of cases), where in most cases (approximately 70%) the associations were mainly explained by FunCat categories.
Thus the functional bias of the neighborhood network for eukaryotes can mainly be attributed to gene duplication events, while for bacteria it is accounted for by the operon structure of the genome: genes within the same operon do not necessarily share strong sequence similarity but are involved in the same biological function.
We would like to point out that the functional bias of the gene neighborhood network for bacterial species is lower than the share of genes in the genome expected to be organized in operons (50-80%).
The gene neighborhood network, as constructed in this example, reflects operon structure only partially.
For instance, genes at operon borders are functionally unrelated but are still connected in the gene neighborhood network.
On the other hand, functionally related operons are sometimes physically separated on the chromosome, so many relevant edges are absent from the gene neighborhood network.