
Gepard

          

Gepard (German for "cheetah"; a backronym for "GEnome PAir - Rapid Dotter") allows the calculation
of dotplots even for large sequences such as chromosomes or bacterial genomes.

Author: Jan Krumsiek


Reference:
Krumsiek J, Arnold R, Rattei T
Gepard: A rapid and sensitive tool for creating dotplots on genome scale.
Bioinformatics 2007; 23(8): 1026-8. PMID: 17309896

Navigation

Use cases
Features
Screenshot
System requirements
Download
Bugs
Source code
Tutorial
Method
Memory issues / Vmatch support
Contact


Use cases

  1. Online comparison of partial or complete genomes from the PEDANT databases without downloading any sequence data.
  2. Online comparison of a user-supplied nucleotide sequence against a genome from the PEDANT database.
  3. Local comparison of two nucleotide or amino acid sequences from user-specified files.
  4. New: Batch dotplot creation via command-line access to Gepard.

Features

  • Rapid calculation of dotplots (under 2 minutes for an E. coli self-plot on a standard computer)
  • Preconfigured parameters: simply specify two sequences and create the dotplot (3 clicks)
  • Easy-to-use interface (mouse zooming, context-sensitive help)
  • Image export (multiple formats)
  • Should run on any common operating system, since it is written in Java
  • Genes covered by the dotplot are linked to their report web pages in the PEDANT database
  • Coloring of genes by functional classification (uses data from PEDANT)
  • Persistent storage of suffix arrays (avoids recalculation)
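The persistent storage of suffix arrays can be pictured with the minimal Python sketch below. It is an illustration under stated assumptions, not Gepard's implementation: the cache directory, the SHA-1 file naming and the pickle format are hypothetical choices made here, and the suffix array is built by naive sorting rather than by the Skew algorithm Gepard uses.

```python
import hashlib
import os
import pickle


def build_suffix_array(seq: str) -> list[int]:
    # Naive construction by sorting all suffixes; adequate for a demo only.
    return sorted(range(len(seq)), key=lambda i: seq[i:])


def cached_suffix_array(seq: str, cache_dir: str) -> list[int]:
    """Return the suffix array of `seq`, reusing a copy stored in `cache_dir`.

    The cache file is named after a SHA-1 hash of the sequence (a hypothetical
    layout), so an identical sequence is never indexed twice.
    """
    key = hashlib.sha1(seq.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, key + ".sa")
    if os.path.exists(path):
        # Cache hit: load the stored array instead of recomputing it.
        with open(path, "rb") as fh:
            return pickle.load(fh)
    # Cache miss: compute once and store the result for later runs.
    sa = build_suffix_array(seq)
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "wb") as fh:
        pickle.dump(sa, fh)
    return sa
```

Keying the file on a hash of the sequence content means a renamed input file still hits the cache, while any change to the sequence forces a fresh calculation.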

Screenshot



Gepard application in remote mode displaying a dotplot of Escherichia coli vs. Shigella flexneri, with genes colored by functional annotation

System requirements

Gepard requires the Java Runtime Environment Version 5.0 or later (http://www.java.com/download/).

It has been tested on the following operating systems:
  • Microsoft Windows 2000 & XP
  • KDE 3 on Linux/Un*x system
  • MacOS 10.x

Download

Latest version: 1.21 (Version changes)
  1. Java Web Start - The convenient way to launch Gepard. Click one of the links below and Java Web Start should take care of downloading and starting Gepard. This also ensures that you are always running the latest version of Gepard.

    Note: Gepard requires special security rights (such as access to the file system). You therefore have to trust the certificate that is shown when the program is launched.

    For more information on the launch variants for different amounts of available memory, see: Memory issues / Vmatch support

    Launch normal version (512MB)
    Launch low memory version (256MB)
    Launch high memory version (1024MB)


  2. Download the program

    • Download archive - Download a compressed archive containing the required JAR files, startup scripts and an offline version of the tutorial.

      gepard-1.21.zip
      gepard-1.21.tar.gz

    • Download JNLP file - You can also download the Java Web Start descriptor files (see above) to your computer. Once all required data has been downloaded, the program runs without an internet connection. Right-click on the links and select "Save as" from the pop-up menu.

Bugs

All known bugs should be fixed in the latest release of Gepard. Thanks to the anonymous bug report senders!

Source code

To obtain a copy of Gepard's source code, please contact me.

Tutorial

Read the tutorial online. An offline version of the tutorial is included in the download archive above.

Method

Gepard utilizes suffix arrays for rapid heuristic dotplot calculation. For large dotplots, it searches for exact word matches of a fixed length (10 by default) from one sequence in the suffix array of the other sequence. Since an arbitrary word can be found in a suffix array by binary search in O(log n) steps, this method reduces the complexity of the dotplot calculation from O(m*n) to O(m * log n), where n is the length of the longer and m the length of the shorter sequence. For small dotplots, the classical window-based dotplot calculation is used.
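The word search described above can be sketched in Python as follows. This is a simplified illustration, not Gepard's code: the helper names are hypothetical, the suffix array is built by naive sorting (Gepard uses the Skew algorithm), and each binary-search step compares up to k characters, so one lookup costs O(k log n) character comparisons. The window-based function shows the classical approach for comparison; with the threshold equal to the window length it reports exactly the same dots as the exact word search.

```python
def suffix_array(s: str) -> list[int]:
    # Naive construction by sorting suffixes (not the Skew algorithm).
    return sorted(range(len(s)), key=lambda i: s[i:])


def find_word(s: str, sa: list[int], word: str) -> list[int]:
    """All start positions of `word` in `s`, via binary search on `sa`.

    Because suffixes are sorted, their k-character prefixes are sorted too,
    so the matches form one contiguous block of the suffix array.
    """
    k = len(word)
    lo, hi = 0, len(sa)
    while lo < hi:  # lower bound: first prefix >= word
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + k] < word:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # upper bound: first prefix > word
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + k] <= word:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def suffix_array_dots(a: str, b: str, k: int = 10) -> set[tuple[int, int]]:
    """Dots (i, j) where a[i:i+k] == b[j:j+k]; a is the shorter sequence."""
    sa = suffix_array(b)  # index the longer sequence once
    dots = set()
    for i in range(len(a) - k + 1):  # one lookup per word of the shorter sequence
        for j in find_word(b, sa, a[i:i + k]):
            dots.add((i, j))
    return dots


def window_dots(a: str, b: str, w: int, threshold: int) -> set[tuple[int, int]]:
    """Classical window dotplot: compare every window pair, O(m * n * w)."""
    dots = set()
    for i in range(len(a) - w + 1):
        for j in range(len(b) - w + 1):
            matches = sum(1 for t in range(w) if a[i + t] == b[j + t])
            if matches >= threshold:
                dots.add((i, j))
    return dots
```

For example, find_word("banana", suffix_array("banana"), "ana") returns [1, 3]. Unlike the window method, the exact word search only reports identical words, which is why it is described as heuristic.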

Memory issues / Vmatch support

The program uses the "Skew" algorithm to calculate the suffix arrays. This algorithm is very memory-intensive, so Gepard may require a large amount of available memory.
Unfortunately, on all operating systems the maximum amount of memory available to the Java VM has to be fixed when the VM is started (the -Xmx option) and cannot be increased while the program is running.
This is why there are different startup scripts for machines with different amounts of memory.

The following table shows the approximate maximum sequence size (assuming a self-plot) for each memory setting. This includes both suffix array and dot matrix calculation.

Memory setting   Approx. maximum sequence size (self-plot)
256MB            ~10 million base pairs
512MB            ~20 million base pairs
1024MB           ~40 million base pairs
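The table implies roughly linear scaling of about 10 million base pairs per 256 MB, i.e. roughly 26 bytes per base pair. The Python sketch below only extrapolates from those three data points; the helper names are hypothetical, and actual memory use may also depend on factors the table does not capture.

```python
from typing import Optional

# Reference point taken from the table: a 256 MB heap handles a self-plot of
# roughly 10 million bp. Linear scaling beyond that is an assumption.
REF_HEAP_MB = 256
REF_BP = 10_000_000
LAUNCHER_SETTINGS_MB = (256, 512, 1024)  # low, normal, high memory versions


def max_self_plot_bp(heap_mb: int) -> int:
    """Approximate largest sequence (bp) for a self-plot with `heap_mb` MB."""
    return heap_mb * REF_BP // REF_HEAP_MB


def required_heap_mb(seq_len_bp: int) -> int:
    """Approximate heap (MB) needed for a self-plot, rounded up."""
    return -(-seq_len_bp * REF_HEAP_MB // REF_BP)  # ceiling division


def pick_setting(seq_len_bp: int) -> Optional[int]:
    """Smallest launcher memory setting that should suffice, or None."""
    need = required_heap_mb(seq_len_bp)
    for mb in LAUNCHER_SETTINGS_MB:
        if mb >= need:
            return mb
    return None
```

Under this estimate a 15 million bp self-plot needs about 384 MB, so pick_setting(15_000_000) selects the normal (512MB) version, while a 50 million bp self-plot exceeds all three settings.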

Gepard supports the program "mkvtree" from the Vmatch package, which can calculate persistent suffix arrays very quickly and with very little memory. Gepard will attempt to use this external binary automatically if it can be found in the program's directory or in a directory listed in the PATH environment variable.

If you are using Vmatch with Gepard, you can run the low-memory version of Gepard, as the mkvtree binary runs outside the Java VM.
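The automatic lookup of mkvtree can be approximated with Python's shutil.which. This is a sketch of the behavior described above, not Gepard's implementation: the function name is hypothetical, and checking the program directory before PATH is an assumption, since the text does not state the search order.

```python
import shutil
from typing import Optional


def find_mkvtree(program_dir: str) -> Optional[str]:
    """Locate an executable named mkvtree (hypothetical helper).

    Looks in the program's own directory first (assumed order), then in the
    directories listed in PATH. Returns the full path, or None if absent.
    """
    # shutil.which(cmd, path=...) restricts the search to the given directory.
    local = shutil.which("mkvtree", path=program_dir)
    if local is not None:
        return local
    # Fall back to the regular PATH lookup.
    return shutil.which("mkvtree")
```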

Contact

Feel free to contact me with suggestions, comments, or questions via email (krumsiek [at] in [dot] tum [dot] de).

The contact web form has been temporarily removed due to massive spam.
Last change: Jan Krumsiek - Jan 27, 2008

 

© 2008-2009 Helmholtz Zentrum München - Deutsches Forschungszentrum für Gesundheit und Umwelt, GmbH Ingolstädter Landstraße 1, D-85764 Neuherberg

 

Disclaimer:
MIPS Databases and associated information are protected by copyright. This server and its associated data and services are for academic, non-commercial use only. The GSF has no liability for the use of results, data or information which have been provided through this server. Neither the use for commercial purposes, nor the redistribution of MIPS database files to third parties nor the distribution of parts of files or derivative products to any third parties is permitted. Commercial users may contact the distributor Biomax Informatics GmbH.