Protein complexes are key molecular entities that integrate multiple gene products to perform cellular functions. The CORUM database is a collection of experimentally verified mammalian protein complexes.
We use the complex names given in the literature, including synonyms. An example is the eukaryotic chaperonin CCT (chaperonin containing TCP-1), which is also well known as TRiC (TCP-1 ring complex). If no name is found for a protein complex, we define one, usually composed of the gene names of its subunits, e.g. ‘BRCA1-RAD51 complex’ or ‘Ubiquitin E3 ligase (FBXW7, CUL1, SKP1A, RBX1)’.
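Complex names are assigned by curators, so there is no formal rule. The two fallback patterns in the examples can be reproduced with a small helper, though. The sketch below is hypothetical (the function name and the optional `label` parameter are illustrative, not part of CORUM). It joins gene names with hyphens and appends "complex", or lists them in parentheses after a functional label.

```python
def default_complex_name(gene_names, label=None):
    """Compose a fallback complex name from subunit gene names (hypothetical helper).

    Without a label:        'BRCA1-RAD51 complex'
    With a functional label: 'Ubiquitin E3 ligase (FBXW7, CUL1, SKP1A, RBX1)'
    """
    if not gene_names:
        raise ValueError("at least one gene name is required")
    if label is None:
        # Pattern 1: gene names joined by hyphens, followed by 'complex'.
        return "-".join(gene_names) + " complex"
    # Pattern 2: functional label with the gene names listed in parentheses.
    return f"{label} ({', '.join(gene_names)})"
```

For example, `default_complex_name(["BRCA1", "RAD51"])` returns `'BRCA1-RAD51 complex'`.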
Most protein complexes in CORUM originate from human (65%), followed by mouse (14%) and rat (14%).
The subunits of protein complexes are annotated according to their UniProt entries. Only the primary accessions are stored as identifiers in CORUM. Associated information such as gene names and protein names is retrieved via the BioRS sequence retrieval system, which provides up-to-date information from the primary data sources.
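Subunits are stored only as primary UniProt accessions, so a quick syntactic check is useful when importing or cross-referencing subunit lists. The sketch below uses the accession format published in the UniProt documentation (6- and 10-character accessions). The helper name is hypothetical. The check confirms syntax only, not that the entry exists or is primary, and it rejects isoform suffixes such as `-2`.

```python
import re

# UniProt accession syntax as documented by UniProt:
# [OPQ][0-9][A-Z0-9]{3}[0-9]  or  [A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_uniprot_accession(identifier: str) -> bool:
    """Return True if `identifier` has the syntax of a UniProt accession."""
    return UNIPROT_ACCESSION.fullmatch(identifier) is not None
```

A gene name such as `BRCA1` fails the check, while an accession such as `P38398` passes.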
Frequently, the molecular characterization of a complex's composition is limited to identifying its subunits. Where the stoichiometry of the subunits has been analysed, this information is given in the ‘Number of subunits’ field (see, e.g., complex 960).
For species such as rat, pig or sheep, some proteins are not found in UniProt. In such cases, orthologs from related organisms are used and the substitutions are noted in the comment field.
In some articles, the description of certain subunits is ambiguous. This can occur if only one variant of the protein was known at the time of the experiments, or if several very similar proteins exist and the authors did not determine which isoform or variant was part of the complex. In such cases, we collect all possible protein entries and mark them in the status field with ‘nd’, which stands for ‘not determined’. If variants exist for more than one subunit of a complex, the variant sets of the individual subunits are distinguished as nd1, nd2, nd3, etc.
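The ‘nd’ labels group alternative entries for one subunit position. The sketch below expands those groups into every candidate composition. It assumes a hypothetical record layout of `(accession, status)` pairs, where `status` is `None` for a determined subunit and `'nd'`, `'nd1'`, `'nd2'`, … otherwise. It also assumes that exactly one member of each group is part of the complex, which is an interpretation, not a documented CORUM rule. The accessions in the example are placeholders.

```python
from itertools import product


def candidate_compositions(subunits):
    """Expand 'nd' variant groups into all possible subunit sets.

    subunits: list of (accession, status) pairs (hypothetical layout).
    status is None for a determined subunit, or an 'nd'/'nd1'/'nd2'/... label.
    Assumes exactly one variant per 'nd' group belongs to the complex.
    """
    fixed = []
    groups = {}  # status label -> alternative accessions, in input order
    for accession, status in subunits:
        if status is None:
            fixed.append(accession)
        else:
            groups.setdefault(status, []).append(accession)
    # Pick one variant from each group; product() of no groups yields one empty tuple.
    return [fixed + list(choice) for choice in product(*groups.values())]


subunits = [
    ("SUB_A", None),
    ("VAR_B1", "nd1"), ("VAR_B2", "nd1"),
    ("VAR_C1", "nd2"), ("VAR_C2", "nd2"),
]
for composition in candidate_compositions(subunits):
    print(composition)
```

With two groups of two variants each, this prints four compositions, starting with `['SUB_A', 'VAR_B1', 'VAR_C1']`. A complex without ‘nd’ subunits yields a single composition.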
For complex subunits, homologous mouse proteins are also provided. These are retrieved from our MfunGD database (http://mips.gsf.de/genre/proj/mfungd/), and CORUM and MfunGD are cross-linked to each other.
The experimental method used to purify the protein complex is annotated according to the PSI-MI standard. The PSI consortium provides the list of methods (http://www.psidev.info/).
We use the Functional Catalogue (FunCat) annotation scheme to characterize protein complex function. The hierarchical structure of FunCat allows users to browse protein complexes with particular cellular functions or localizations. Examples of such subsets are presented on the CORUM home page. Detailed results of such queries are also available via the Browse FunCat tool in the search panel of the web pages.
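FunCat categories are dot-separated hierarchical codes, so browsing a branch means selecting every complex annotated with that category or any of its subcategories. The sketch below compares code components rather than raw string prefixes so that a code such as `10.010` is not mistaken for a child of `10.01`. The function names, the annotation mapping, and the example codes are hypothetical and illustrative only.

```python
def in_funcat_branch(code: str, branch: str) -> bool:
    """True if FunCat `code` equals `branch` or lies beneath it in the hierarchy."""
    code_parts = code.split(".")
    branch_parts = branch.split(".")
    # Component-wise comparison: '10.01.03' is under '10.01'; '10.010' is not.
    return code_parts[: len(branch_parts)] == branch_parts


def complexes_in_branch(annotations, branch):
    """Return sorted complex names with at least one code in `branch`.

    annotations: mapping complex name -> iterable of FunCat codes (hypothetical).
    """
    return sorted(
        name
        for name, codes in annotations.items()
        if any(in_funcat_branch(code, branch) for code in codes)
    )


annotations = {
    "Complex X": ["10.01.03"],
    "Complex Y": ["10.03"],
    "Complex Z": ["14.07"],
}
print(complexes_in_branch(annotations, "10"))
print(complexes_in_branch(annotations, "10.01"))
```

Querying the top-level branch `10` returns both Complex X and Complex Y, while the narrower `10.01` returns only Complex X.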
The evidence for assigning a functional category is given in a separate field. Five evidence types of differing quality are distinguished: (i) experimental evidence (exp), (ii) evidence from literature such as reviews (lit), (iii) known mammalian homolog (kmh), (iv) high-throughput experiment (htp) and (v) predicted function (pred). For all evidence types except predicted function, the corresponding PubMed references are provided.
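The five evidence codes map naturally onto an enumeration. The rule that every code except `pred` carries PubMed references can then be checked when a record is loaded. The sketch below is a hypothetical validator. The function name and the `pmids` list are assumptions, and it does not reject PubMed references attached to `pred`, because the text does not forbid them.

```python
from enum import Enum


class Evidence(Enum):
    EXP = "exp"    # experimental evidence
    LIT = "lit"    # evidence from literature such as reviews
    KMH = "kmh"    # known mammalian homolog
    HTP = "htp"    # high-throughput experiment
    PRED = "pred"  # predicted function (no PubMed reference required)


def check_annotation(evidence_code: str, pmids: list) -> Evidence:
    """Parse an evidence code and require PubMed references for all but 'pred'."""
    evidence = Evidence(evidence_code)  # raises ValueError for unknown codes
    if evidence is not Evidence.PRED and not pmids:
        raise ValueError(
            f"evidence '{evidence.value}' requires at least one PubMed reference"
        )
    return evidence
```

For instance, `check_annotation("exp", [])` raises `ValueError`, whereas `check_annotation("pred", [])` returns `Evidence.PRED`.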
Additional information, such as disease relevance or more detailed information about the cellular function of a complex, is given in the comment field.
This field gives the PMID of the article in which the members of the complex were characterized as its constituents.
© 2003 GSF - Forschungszentrum für Umwelt und Gesundheit, GmbH Ingolstädter Landstraße 1, D-85764 Neuherberg