Organ System Heterogeneity DB provides information on the phenotypic heterogeneity of diseases, drugs and mutations in mouse genes across the 26 organ systems defined in the MedDRA ontology at the SOC (System Organ Class) level. The database covers 4,865 human diseases, 1,667 drugs and 5,361 genetically modified mouse models. Disease symptoms, drug side-effects and phenotypes from genetically modified mouse models are mapped to the MedDRA SOC level, and the heterogeneity (normalized Shannon entropy) is calculated from the corresponding annotation frequencies of all SOCs.
1. Specify the entity type you are interested in (Disease, Drug, Gene or Multi-search) and enter a query term. With Multi-search, you can search for multiple diseases, genes or drugs by separating the individual search terms with ' | '. Multi-search looks for exact matches of the individual search terms.
2. For results matching the query, organ system distributions are plotted in addition to the organ system heterogeneity scores. If desired, these results can be sorted in ascending or descending order of the organ system heterogeneity score. On pointing to the abbreviated name of an organ system or to the bar above, the full organ system name is displayed. For a result of interest, click on Select for more information on its related perturbations and phenotypes.
3. In the results of a Multi-search query, a distance measure between the first sub-query and each of the other sub-queries is also shown. This distance gives a quantitative measure of the similarity of the organ system distributions. The results can be sorted by this distance.
4. On clicking Select, you will be directed to a new web page that offers options for comparing the organ system distribution of the selected entity with those of its known associations and of other entities with the most similar organ system distributions.
5. Click on View High Level Phenotypes to view the phenotypes of the selected entity. The new page lists the phenotype descriptions at the High Level Term (HLT) level of the MedDRA ontology. The HLT level groups the individual phenotypic features based upon anatomy, pathology, physiology, etiology or function. Their mapping to the organ system classes is indicated by colored squares. In addition, the number of specific phenotypes under each high level phenotype is shown. Pointing the mouse over the option Sources(s) of symptom data (Sources(s) of side-effect data or Sources(s) of phenotype data) displays links to the original sources.
6. Click on Disease Genes to compare the organ system distribution of the disease with the organ system distributions of the genes associated with it. To compare with the organ system distributions of indicated or contraindicated drugs, click on Indicated Drugs or Contraindicated Drugs, respectively. Similarly, after a search for a drug, Drug Targets, Indications and Contraindications compare the organ system distribution of the drug with those of its known drug targets, indications and contraindications, respectively. After a search for a gene, Associated Diseases and Interacting Drugs compare the organ system distribution of the gene with those of its associated diseases or related drugs, respectively. For a disease, the database also lets you compare its organ system distribution with those of the genes and drugs with the most similar profiles via Genes and Drugs, respectively. Corresponding options are available for a selected drug or gene. The screenshot below shows the comparison of the disease Schizophrenia with its associated genes.
7. In the 'Disease Genes' results, the sources reporting the association of disease and a gene can be found by pointing the cursor to the gene. Similarly, in the 'Associated Diseases' results, the sources reporting the association of gene and a disease can be found by pointing the cursor to the disease. In the 'Indications', 'Contraindications', 'Indicated Drugs' and 'Contraindicated Drugs' results, details on the interaction type of disease-drug associations can be found by pointing to the listed diseases or drugs. The interaction type can be 'contraindicate', 'induces', 'may prevent' or 'may treat'. The distance between the organ system distributions of the entities being compared is also shown.
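The Multi-search and comparison steps above can be illustrated with a small sketch. It assumes that the reported distance is the Euclidean distance between SOC frequency values, as stated in the methods description of this page. The function names, the search terms and the SOC abbreviations below are placeholders invented for illustration; they are not part of the database or its interface.

```python
import math


def parse_multi_search(query):
    """Split a Multi-search query on ' | ' into individual search terms."""
    return [term.strip() for term in query.split(" | ") if term.strip()]


def euclidean_distance(freqs_a, freqs_b):
    """Euclidean distance between two SOC frequency profiles.

    Each profile is a dict mapping a SOC label to its annotation frequency;
    a SOC missing from a profile counts as frequency 0.
    """
    socs = set(freqs_a) | set(freqs_b)
    return math.sqrt(
        sum((freqs_a.get(s, 0.0) - freqs_b.get(s, 0.0)) ** 2 for s in socs)
    )


def rank_by_distance(terms, profiles):
    """Distance from the first term to every other term, sorted ascending."""
    reference = profiles[terms[0]]
    distances = [
        (term, euclidean_distance(reference, profiles[term])) for term in terms[1:]
    ]
    return sorted(distances, key=lambda pair: pair[1])


# Toy profiles with made-up values (not taken from the database).
profiles = {
    "alpha": {"Nerv": 0.5, "Psych": 0.5},
    "beta": {"Nerv": 0.5, "Psych": 0.5},
    "gamma": {"Card": 1.0},
}
terms = parse_multi_search("alpha | beta | gamma")
ranked = rank_by_distance(terms, profiles)
print(ranked)
```

In this toy input, "beta" has the same profile as the first sub-query and therefore distance 0, while "gamma" affects a different organ system and ends up last when sorting by ascending distance.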
All phenotype data presented and utilized by this database is based on annotations with terms from the Medical Dictionary for Regulatory Activities (MedDRA).
We obtained the phenotypic data from disease and drug-related electronic documents provided by publicly accessible sources (in 2012) using a semi-automatic text mining approach.
Disease data was extracted from OMIM clinical synopses as well as the following web resources: The Merck Manual of Diagnosis and Therapy, The Merck Manual Home Health Handbook, A.D.A.M. Medical Encyclopedia via MedlinePlus, and CureResearch. In total, we collected signs and symptoms coded with MedDRA for 4,865 diseases.
Following the procedure in the SIDER database, the phenotypic data for drugs was parsed from public documents describing observed adverse drug events directed at health care professionals or the public such as drug labels, monographs or assessment reports published by the U.S. Food and Drug Administration (provided by FDA and Dailymed), the Medicines and Healthcare products Regulatory Agency (MHRA, UK), BC Cancer Agency (Canada), MedEffect (only clinical report data, Canada), and the European Medicines Agency (EMA). Altogether, we obtained MedDRA-coded side effect data for 1,667 drugs.
We used gene-phenotype annotations provided by Mouse Genome Informatics (MGI) to extract phenotypic features of mouse mutations. Here the phenotypic descriptors are organized in the mammalian phenotype ontology. The terms of this ontology were translated into the Unified Medical Language System (UMLS) with the help of the tool MetaMap.
This application from the National Library of Medicine maps biomedical text to the UMLS Metathesaurus using natural language processing. We manually curated the matches with a score higher than 845, with 1000 being the highest, to ensure a high number of associations between the mammalian phenotype ontology and the UMLS Metathesaurus without false positive mappings. Finally, we only kept those UMLS concepts that were linked to MedDRA, yielding a set of 5,361 mutated mouse genes with phenotype data coded in MedDRA.
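The two automatic filters in this mapping step can be sketched as follows. This is only an illustration under stated assumptions: the record layout (dicts with "mp_term", "cui" and "score" keys), the function name and all identifiers are hypothetical, and the manual curation described above is not represented in code.

```python
SCORE_THRESHOLD = 845  # cutoff stated in the text; the highest score is 1000


def select_candidates(matches, meddra_linked_cuis, threshold=SCORE_THRESHOLD):
    """Keep matches scoring above the threshold whose concept is linked to MedDRA.

    `matches` is a list of dicts with hypothetical keys "mp_term", "cui" and
    "score"; `meddra_linked_cuis` is a set of concept identifiers that have
    a MedDRA link.
    """
    return [
        m
        for m in matches
        if m["score"] > threshold and m["cui"] in meddra_linked_cuis
    ]


# Placeholder records (identifiers are invented, not real ontology terms).
matches = [
    {"mp_term": "term_1", "cui": "CUI_A", "score": 1000},
    {"mp_term": "term_2", "cui": "CUI_B", "score": 845},  # not above the cutoff
    {"mp_term": "term_3", "cui": "CUI_C", "score": 900},  # no MedDRA link
]
kept = select_candidates(matches, meddra_linked_cuis={"CUI_A", "CUI_B"})
print([m["mp_term"] for m in kept])
```

Note that the comparison is strict, matching the wording "higher than 845", so a match scoring exactly 845 is dropped.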
Information on genes associated with 2,971 diseases was taken from DisGeNET (integrating data from GAD, MGD, OMIM, CTD, PubMed and Uniprot), which we complemented with up-to-date information provided by MedGen, Uniprot, and Orphadata.
From the STITCH 3 database, we extracted targets for 1,002 drugs, keeping only interactions with a confidence score higher than 0.7 and discarding indirect interactions.
The National Drug File - Reference Terminology (NDF-RT) is an extended version of the VHA National Drug File (NDF) and contains information on drugs approved in the U.S. We obtained the public version of the NDF-RT (accessed May 2, 2012) and extracted information on indications (attributes may_prevent, may_treat, and induces) and contraindications (attribute CI_with) for drugs and diseases included in our drug and disease thesaurus, respectively. In total, we collected 2,229 drug-disease contraindications and 2,592 indications.
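The grouping of NDF-RT attributes described above can be sketched as a simple lookup. The attribute names come from the text; the record layout (drug, attribute, disease tuples), the function names and the example entries are assumptions made for illustration and do not reflect the actual NDF-RT file format.

```python
# Attribute grouping as stated in the text.
INDICATION_ATTRIBUTES = {"may_prevent", "may_treat", "induces"}
CONTRAINDICATION_ATTRIBUTES = {"CI_with"}


def classify(attribute):
    """Map an NDF-RT attribute name to an association type, or None."""
    if attribute in INDICATION_ATTRIBUTES:
        return "indication"
    if attribute in CONTRAINDICATION_ATTRIBUTES:
        return "contraindication"
    return None


def collect_associations(records, known_drugs, known_diseases):
    """Group (drug, disease) pairs by type, keeping only thesaurus entries."""
    grouped = {"indication": set(), "contraindication": set()}
    for drug, attribute, disease in records:
        kind = classify(attribute)
        if kind is None or drug not in known_drugs or disease not in known_diseases:
            continue
        grouped[kind].add((drug, disease))
    return grouped


# Placeholder records (invented names, hypothetical layout).
records = [
    ("drug_a", "may_treat", "disease_x"),
    ("drug_a", "CI_with", "disease_y"),
    ("drug_b", "may_treat", "disease_x"),   # drug_b is not in the thesaurus
    ("drug_a", "other_attribute", "disease_x"),  # attribute not used here
]
grouped = collect_associations(
    records, known_drugs={"drug_a"}, known_diseases={"disease_x", "disease_y"}
)
print(grouped)
```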
As a measure of the organ system heterogeneity of a drug, a mouse gene, or a disease, we calculated the Shannon entropy of the annotation frequencies of all high-level MedDRA System Organ Classes (SOCs) linked to the collected individual phenotypic traits, normalized by the maximum possible entropy:
$$H = -\frac{\sum_{i=1}^{n} p(x_i)\,\log p(x_i)}{\log n}$$
Here, 'p(x_i)' refers to the annotation frequency of SOC i and 'n' equals the number of different SOCs (26). This formula evaluates the breadth of the phenotypic effects across organ systems by accounting for the relative abundance, rather than the number, of phenotypic traits affecting each organ system. Low heterogeneity values correspond to perturbations that mainly influence a few organ systems (0 if only one organ system is affected), while high values represent effects on multiple organ systems to a similar extent (1 if all organ systems are affected equally).
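A minimal sketch of this heterogeneity score, assuming the annotation frequencies are obtained by dividing per-SOC annotation counts by their total. The function name, the input layout (a dict of counts per SOC) and the SOC labels are illustrative assumptions, not the database's actual implementation.

```python
import math

N_SOCS = 26  # number of MedDRA System Organ Classes stated in the text


def normalized_entropy(soc_counts, n_socs=N_SOCS):
    """Shannon entropy of SOC annotation frequencies divided by log(n_socs).

    `soc_counts` maps a SOC label to the number of phenotypic traits
    annotated to it; SOCs with zero count contribute nothing to the sum.
    """
    total = sum(soc_counts.values())
    if total == 0:
        raise ValueError("at least one annotation is required")
    entropy = 0.0
    for count in soc_counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log(p)
    return entropy / math.log(n_socs)


# All traits in a single organ system: heterogeneity 0.
print(normalized_entropy({"Nerv": 7}))

# Traits spread evenly over all 26 organ systems: heterogeneity 1
# (up to floating-point rounding).
print(normalized_entropy({f"SOC{i}": 3 for i in range(N_SOCS)}))
```

Because the entropy is divided by log(n), the base of the logarithm does not affect the result.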
The similarity between the organ system distributions of two perturbations is measured as the Euclidean distance between the SOC frequency values of the perturbations.
Supported Browsers
The database is best viewed in Firefox (Version 14.0.1 and above), Chrome (Version 35), Safari (Version 5.05 and above), Opera (Version 22), and Internet Explorer (Version 10 and above).