CORUM – Complete Dataset versus Core Set
The CORUM dataset is generated according to the PSI-MI standard for the
annotation of molecular interactions. Therefore, information about identical or homologous protein
complexes originating from different scientific articles is kept separate. As a result, a complex
that was analyzed, e.g., in mouse, rat and dog is represented as three entries in CORUM, and a
human complex that was isolated by different authors using different purification methods is likewise
represented as separate entries. Users who prefer this comprehensive dataset will select the
Complete Dataset for searches and downloads in CORUM.
However, other users prefer a non-redundant dataset that more closely reflects a catalogue of protein
complexes in a mammalian cell. (i) Therefore, the Core Set includes only one copy of each set of
homologous protein complexes.
During the selection for the Core Set, preference is given to entries where complexes were
thoroughly characterized and to complexes from human. (ii) In addition, for some protein complexes
additional subunits that escaped earlier experiments are identified over time. In such
cases only those complexes that represent the state of the art remain in the Core Set.
The selection of the Core Set is to some extent subjective, but it might be of help to interested users.
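The two selection rules can be illustrated with a small sketch. It is not the procedure used by the CORUM curators, which is partly subjective. It is a toy approximation under stated assumptions: the record fields (entry_id, homology_group, organism, subunits) are hypothetical and do not reflect the CORUM download format, the sample entries are invented, and "state of the art" is approximated by the number of annotated subunits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    # All field names are hypothetical; they are not CORUM's actual schema.
    entry_id: int
    homology_group: str   # assumed key linking identical/homologous complexes
    organism: str
    subunits: frozenset


def select_core_set(entries):
    """Keep one representative entry per homology group.

    Assumed ranking (an approximation of the text, not the curators' rule):
    human entries first, then entries with more annotated subunits.
    On ties, the entry encountered first is kept.
    """
    best = {}
    for entry in entries:
        rank = (entry.organism == "Human", len(entry.subunits))
        current = best.get(entry.homology_group)
        if current is None or rank > current[0]:
            best[entry.homology_group] = (rank, entry)
    return [entry for _, entry in best.values()]


# Invented sample data: group "A" was studied in three organisms,
# group "B" has an older entry and a newer one with an extra subunit.
demo_entries = [
    Entry(1, "A", "Mouse", frozenset({"P1", "P2"})),
    Entry(2, "A", "Rat", frozenset({"P1", "P2"})),
    Entry(3, "A", "Human", frozenset({"P1", "P2"})),
    Entry(4, "B", "Human", frozenset({"Q1", "Q2", "Q3"})),
    Entry(5, "B", "Human", frozenset({"Q1", "Q2", "Q3", "Q4"})),
]

core_set = select_core_set(demo_entries)
```

With this sample, the Complete Dataset corresponds to all five entries, while the sketch keeps the human entry for group "A" and the entry with the additional subunit for group "B". A real selection would also weigh how thoroughly each complex was characterized, which a simple ranking like this cannot capture.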
© 2003 GSF - Forschungszentrum für Umwelt und Gesundheit, GmbH Ingolstädter Landstraße 1, D-85764 Neuherberg