For the statistical test we use the hypergeometric distribution.
The p-value is calculated as follows:
$N=$all_count; number of genes in the genome
(e.g. 6723 genes)
$S=$fun_all{$fun_set_num1}; number of genes in the genome that belong to the functional category
(e.g. 1500 genes in functional category 01)
$n=$set_count; number of genes in the gene list
(e.g. 4 genes)
$k=$count_set1; hits, i.e. genes of the gene list that belong to the functional category
(e.g. 2 of the 4 genes in the gene list are in functional category 01)
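formula:

\[ p = \frac{\binom{S}{k}\binom{N-S}{n-k}}{\binom{N}{n}} \]

\[ \text{P-value} = pp = \sum_{i=k}^{n} \frac{\binom{S}{i}\binom{N-S}{n-i}}{\binom{N}{n}} \]

Here p is the probability of exactly k hits, and the P-value sums that probability over i = k, ..., n hits.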
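
The calculation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool's own code: the function names hypergeom_pmf and enrichment_p_value are hypothetical, and the P-value is read as the sum from k hits up to n hits.

```python
from math import comb


def hypergeom_pmf(N, S, n, k):
    """Probability of exactly k hits: C(S,k) * C(N-S,n-k) / C(N,n).

    N: genes in the genome, S: genes in the functional category,
    n: genes in the gene list, k: list genes found in the category.
    """
    return comb(S, k) * comb(N - S, n - k) / comb(N, n)


def enrichment_p_value(N, S, n, k):
    """Probability of k or more hits (assumed reading of the P-value sum)."""
    upper = min(n, S)  # no more hits are possible than list genes or category genes
    return sum(hypergeom_pmf(N, S, n, i) for i in range(k, upper + 1))


# Example values taken from the text: 6723 genes in the genome, 1500 genes in
# functional category 01, 4 genes in the gene list, 2 of them in the category.
p_exact = hypergeom_pmf(6723, 1500, 4, 2)
p_value = enrichment_p_value(6723, 1500, 4, 2)
print(f"p (exactly 2 hits) = {p_exact:.4f}")
print(f"P-value (2 or more hits) = {p_value:.4f}")
```

math.comb works on exact integers, so the binomial coefficients do not overflow for genome-sized inputs. Only the final division is carried out in floating point.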