Download
This download site provides access to Medicago genome sequence and gene prediction data.
Genome sequences
Genome Annotation
- DATE_imgag_cds.fa: contains the coding sequences (without introns, UTRs) of all predicted gene models (without partial gene models).
- DATE_imgag_cds_partial.fa: contains the coding sequences (without introns, UTRs) of all PARTIAL (3' and 5') gene models.
- DATE_imgag_prot.fa: contains the protein sequences of all predicted gene models (without partial gene models).
- DATE_imgag_prot_partial.fa: contains the protein sequences of all PARTIAL (3' and 5') gene models.
- NONRED files: contain sequences that are NONREDUNDANT at the genomic/unspliced sequence level. This means that the nonred protein file may still contain identical protein sequences if their gene models differ in their unspliced/genomic sequence. The main purpose of these files is to eliminate redundant gene models caused by BAC clone sequence overlap.
- DATE_imgag_cdsNONRED.fa: contains the NONREDUNDANT (at genomic/unspliced sequence level) coding sequences (without introns, UTRs) of all predicted gene models (without partial gene models).
- DATE_imgag_protNONRED.fa: contains the NONREDUNDANT (at genomic/unspliced sequence level) protein sequences of all predicted gene models (without partial gene models).
- DATE_imgag_prot/cds_partialNONRED.fa: contains the NONREDUNDANT (at genomic/unspliced sequence level) protein/coding sequences of all PARTIAL gene models.
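Because the NONRED files are nonredundant only at the genomic level, identical protein or coding sequences can still occur in them. The following is a minimal sketch, not part of the IMGAG tooling, of how such remaining duplicates could be listed. The function names read_fasta and identical_sequences are illustrative assumptions, and the file name in the usage comment is a placeholder.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one first.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def identical_sequences(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each sequence that occurs more than once to the headers carrying it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for header, seq in records:
        groups[seq.upper()].append(header)
    return {seq: ids for seq, ids in groups.items() if len(ids) > 1}


# Usage (placeholder file name; substitute the actual annotation date):
# with open("DATE_imgag_protNONRED.fa") as handle:
#     for seq, headers in identical_sequences(read_fasta(handle)).items():
#         print(len(headers), "gene models share one protein sequence:", headers)
```

Grouping by the full sequence string is the simplest approach and is adequate for files of this size; for very large files a hash of the sequence could be used as the key instead.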
All annotation is available in GAMEXML format (Apollo compatible) as well. Please use the GameXML download page to retrieve your custom files. TIGRXML formatted files can be downloaded for each annotation round in the TIGRXML FTP subsection (zipped).
By accessing these data, you agree not to publish any articles containing analyses of genes or genomic data on a whole genome or chromosome scale prior to publication by MIPS/IMGAG and/or its collaborators of its comprehensive genome analysis. The data will be available for any kind of publication that does not compete directly with planned publications of MIPS/IMGAG and collaborators. Scientists are strongly encouraged to contact MIPS/IMGAG and/or its principal collaborator about planned publications, their intentions and any potential collaborations.
- Medicago FTP download page
- GameXML download page
- Download custom sequence datasets (e.g. all exons, all introns etc.)
Download format documentation
[>]
Unique Namespace: gene predictions obtained through the IMGAG pipeline are named 'IMGA'
[pipe]
Gene ID: BACAccessionNr_GeneCall.GeneCallVersion
[space]
free text description following FASTA conventions
[space]
Start-stop coordinates, where start is the first nucleotide of the translation start codon ATG and stop the last nucleotide of the translation stop, e.g. TGA. Separated by -/dash/minus, no padding zeros, no whitespace. Coordinate 1 is always the first nucleotide of the sequence as retrieved using the seqversion from EMBL/GENBANK/DDBJ (reversing the sequence to achieve forward orientation relative to the chromosome is not allowed). Gene calls on Crick/reverse/- strand have stop > start
Evidence: a single letter code for the level of evidence that underlies the gene call is given. These codes are:
- F full coverage/FL-cDNA: The complete gene model from translation start to translation stop is covered by expressed Medicago sequence, e.g. FL-cDNA or EST alignments across the full length of the coding sequence.
- E expressed/EST matches: Expression of the gene is supported by Medicago EST sequence that matches the gene call (partially).
- H homology/heterologous: the gene call is supported by similarity to Medicago or other ESTs, protein, FL-cDNA, genomic or other sequences with partial or full-length alignments.
- I intrinsic/ab initio/inferred/hypothetical: the gene call is based only on intrinsic prediction tools such as FGENESH, Genscan or Eugene, and no significant alignments to other sequences are available. Classification is done top-down: a gene call that does not qualify for F is tested for E, one that does not satisfy the requirements of E is tested for H, and any gene call that does not fulfill H is called I.
- L low probability: very small gene calls of fewer than 100 amino acids, regardless of other evidence.
[space]
Method abbreviation that shows the method used to generate the gene call. See the IMGAG pipeline for details.
[space]
Date on which the gene call was made or last modified, in yyyymmdd format
[newline]
Sequence...
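The header layout described above can be read back into its fields with a short parser. The following is a minimal sketch under stated assumptions, not an official IMGAG tool: the names GeneCallHeader and parse_header are illustrative, and the accession, description, coordinates and method abbreviation "EGN" in the usage example are made up. Because the free-text description may itself contain spaces, the four fixed trailing fields are split off from the right.

```python
from dataclasses import dataclass

# Evidence codes as listed in the format description.
EVIDENCE_CODES = {
    "F": "full coverage/FL-cDNA",
    "E": "expressed/EST matches",
    "H": "homology/heterologous",
    "I": "intrinsic/ab initio/inferred/hypothetical",
    "L": "low probability",
}


@dataclass
class GeneCallHeader:
    namespace: str      # e.g. "IMGA"
    bac_accession: str  # BACAccessionNr part of the gene ID
    gene_call: str      # GeneCall part of the gene ID
    version: str        # GeneCallVersion part of the gene ID
    description: str    # free text
    start: int          # first nucleotide of the translation start codon
    stop: int           # last nucleotide of the translation stop codon
    evidence: str       # single-letter code, see EVIDENCE_CODES
    method: str         # method abbreviation
    date: str           # yyyymmdd


def parse_header(line: str) -> GeneCallHeader:
    """Parse one '>NAMESPACE|GeneID description start-stop E METHOD DATE' line."""
    if not line.startswith(">"):
        raise ValueError("not a FASTA header line")
    namespace, sep, rest = line[1:].strip().partition("|")
    if not sep:
        raise ValueError("missing '|' after the namespace")
    # Take coordinates, evidence, method and date from the right-hand side.
    head, coords, evidence, method, date = rest.rsplit(" ", 4)
    gene_id, _, description = head.partition(" ")
    bac_accession, _, call = gene_id.rpartition("_")
    gene_call, _, version = call.partition(".")
    start, _, stop = coords.partition("-")
    if evidence not in EVIDENCE_CODES:
        raise ValueError(f"unknown evidence code: {evidence!r}")
    return GeneCallHeader(namespace, bac_accession, gene_call, version,
                          description, int(start), int(stop),
                          evidence, method, date)


# Usage with a made-up header line:
example = ">IMGA|AC000000_1.2 hypothetical protein 1200-3450 H EGN 20060725"
header = parse_header(example)
print(header.bac_accession, header.start, header.stop,
      EVIDENCE_CODES[header.evidence])
```

The sketch keeps start and stop exactly as written in the header and does not try to infer the strand from them.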
Last update: 25/07/2006