Gepard (German: "cheetah", backronym for "GEnome PAir - Rapid Dotter") allows the calculation of dotplots even for large sequences like chromosomes or bacterial genomes.

Author: Jan Krumsiek

Reference: Krumsiek J, Arnold R, Rattei T. Gepard: A rapid and sensitive tool for creating dotplots on genome scale. Bioinformatics 2007; 23(8): 1026-8. PMID: 17309896

Use cases
Features
Screenshot
Gepard application in remote mode displaying a dotplot of Escherichia coli vs. Shigella flexneri with color-coded functional annotations.

System requirements
Gepard requires the Java Runtime Environment version 5.0 or later (http://www.java.com/download/). It has been tested on the following operating systems:
Download
Latest version: 1.21 (Version changes)
Bugs
All known bugs should be fixed in the latest release of Gepard. Thanks to the anonymous bug report senders!

Source code
To get a copy of Gepard's source code, please contact me.

Tutorial
Read the tutorial online. An offline version of the tutorial is included in the download archive above.

Method
Gepard utilizes suffix arrays for rapid heuristic dotplot calculation. For large dotplots it searches for exact word matches of a certain length (10 by default) from one sequence in the suffix array of the other sequence. Because an arbitrary word of fixed length can be found in O(log n) time within a suffix array, this method reduces the complexity of the dotplot calculation from O(m * n) to O(m * log n), where n is the length of the longer sequence and m the length of the shorter one. For small dotplots the classical window-based dotplot calculation is used.

Memory issues / Vmatch support
The program uses the "Skew" algorithm to calculate the suffix arrays. This algorithm is very memory-intensive, so Gepard might require a large amount of available memory. Unfortunately, on all operating systems the Java VM has to be given its maximum amount of memory at startup. This is why there are different startup scripts for different machines. The following table shows the approximate maximum sequence size (assuming a self-plot) for each memory setting. This includes both suffix array and dot matrix calculation.
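
As an illustration of the word lookup described under Method, here is a minimal Python sketch. It is not Gepard's code: the function names are hypothetical, the suffix array is built by naive sorting rather than the Skew algorithm, and only forward-strand exact matches are reported. It is meant only to show how a binary search over a suffix array yields the dots for each word.

```python
def build_suffix_array(text):
    # Naive construction: sort suffix start positions by the suffix they begin.
    # Assumption for illustration only; Gepard itself uses the Skew algorithm.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_word(text, sa, word):
    """Return all start positions of `word` in `text` using binary search on `sa`."""
    k = len(word)

    # Lower bound: first suffix whose k-character prefix is >= word.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + k] < word:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose k-character prefix is > word.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + k] <= word:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


def dot_matches(seq_a, seq_b, word_len=10):
    """Return (i, j) pairs where the word at seq_a[i:] occurs exactly at seq_b[j:]."""
    sa = build_suffix_array(seq_b)
    dots = []
    for i in range(len(seq_a) - word_len + 1):
        word = seq_a[i:i + word_len]
        for j in find_word(seq_b, sa, word):
            dots.append((i, j))
    return dots


if __name__ == "__main__":
    # Tiny toy input with a short word length so matches are visible.
    print(dot_matches("ana", "banana", word_len=2))
```

Each word of the first sequence costs two binary searches over the suffix array of the second sequence, which is where the O(m * log n) behaviour quoted above comes from (treating the word length as a constant).
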
Gepard supports the program "mkvtree" from the Vmatch package, which is able to calculate persistent suffix arrays in a very short time and with very little memory usage. Gepard will attempt to use this external binary automatically if it can be located in the program's directory or via the PATH environment variable. If you are using Vmatch with Gepard, you may run the low-memory version of Gepard, as the mkvtree binary runs outside the Java VM.

Contact
Feel free to contact me with suggestions, comments, or questions via email (krumsiek [at] in [dot] tum [dot] de). The contact web form has been removed temporarily due to massive spam.
Last change: Jan Krumsiek - Jan 27, 2008
© 2008-2009 Helmholtz Zentrum München - Deutsches Forschungszentrum für Gesundheit und Umwelt, GmbH Ingolstädter Landstraße 1, D-85764 Neuherberg