Bioinformatics Analysis Software - Omixon Color Space Toolkit

The Omixon Color Space Toolkit is a suite of tools for performing bioinformatics analysis. The two main tools are a SOLiD gapped read mapper called Crema, and a statistical SOLiD fine aligner called AMAP. The toolkit also contains a number of utilities for reformatting, sorting, filtering and parsing bioinformatics data, and handling very large files across a number of platforms.

Note that the Omixon Color Space Toolkit was previously called the 'Omixon Variant Toolkit'. We renamed it now that we have two separate toolkits. The second, the Omixon Letter Space Toolkit, is for mapping and aligning 'letter space' (or 'base space') data produced by Illumina, Ion Torrent or Roche 454 sequencers.

Command line Color Space Toolkit
The main tools of the Omixon Color Space Toolkit are available as a standalone Java application, which can be run from the command line. These tools automate a number of analysis steps and automatically run them in parallel if multiple processors are available. The tools also manage their own memory consumption, so no special memory configuration is required.

The command line Toolkit comes with some supporting scripts, plus a comprehensive readme file that describes how to run the tools. It can be tried free of charge under a 14-day evaluation license. Please contact us for a quote to purchase the command line Toolkit.

Color Space Toolkit features
The Toolkit includes highly accurate tools for detecting micro indels, SNPs and MNPs.

It maps color reads that diverge moderately from the reference genome (up to 30% sequence divergence). It places no restrictions on the size of the reference, which, combined with its high sensitivity, makes the Color Space Toolkit well suited to targeted sequencing projects and diagnostics.

The Toolkit also offers another unique feature: a fine alignment tool that takes the read quality scores into account using a well-defined probabilistic model and integrates a choice of DNA mutation models. This leads to significantly improved variant calling accuracy.

User Benefits
• Gives you the best chance to find the structural variation you are looking for
• Finds variants missed by all other techniques
• Maps more reads and thus provides the highest coverage
• Reports calculated structural variations with statistical significance
• Very simple to run
• Intuitive parameters

Bioinformatics features
• Choice of three pre-set sensitivity settings for mapping, plus option to use custom settings
• Choice of two fine alignment modes using different mutation models (one for coding regions, one for general DNA analysis)
• Automatic handling of lower quality bases without dismissing entire short reads or trimming
• No need to run pre-filtering tools: automatic recalibration of quality scores is part of the alignment
• Allows any FASTA or multi-FASTA file to be used as the reference (so both genomes and exomes are supported)
• Runs the mapping and alignment steps together (with an option to run each alone)
• Choice of three strategies for mapping non-specific reads (reads that map to multiple locations)
• Two quality filters available, for automatically screening out very low quality alignments
• Only a few parameters, with sensible defaults

File handling features
• Option to keep the rough mapping output as well as final finely aligned output
• Input in .fastq format; utilities are provided to convert other formats to .fastq
• Output in standard .sam format, compatible with samtools
• Option to write out unmapped reads in .fastq format
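
Because the output is standard .sam, it can also be inspected with a few lines of scripting. The sketch below is only an illustration and is not part of the Toolkit: it reads the first six mandatory SAM columns and checks the standard 0x4 "unmapped" flag bit. The names SamRecord, parse_sam_line and is_unmapped are hypothetical, and the sample record is invented for the example rather than taken from real Toolkit output.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """First six of the eleven mandatory SAM columns."""
    qname: str   # read name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # CIGAR string


def parse_sam_line(line: str) -> SamRecord:
    """Parse one alignment line (not a '@' header line) of a SAM file."""
    fields = line.rstrip("\n").split("\t")
    return SamRecord(fields[0], int(fields[1]), fields[2],
                     int(fields[3]), int(fields[4]), fields[5])


def is_unmapped(record: SamRecord) -> bool:
    """Bit 0x4 of the SAM flag marks a read that was not mapped."""
    return bool(record.flag & 0x4)


# Invented example record, tab-separated as the SAM format requires.
example = "read_1\t0\tchr1\t1042\t37\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII"
record = parse_sam_line(example)
print(record.rname, record.pos, is_unmapped(record))
```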

Technical features
• If multiple processors are available, the toolkit automatically splits the input and runs in parallel
• Memory is managed without any user intervention
• Simple and flexible configuration
• Simple installation
• Comes with instructions and sample configurations

Innovative Alignment
The Color Space Toolkit was designed and written from the ground up to work with SOLiD data.
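
For readers new to SOLiD data, the sketch below illustrates the standard two-base color encoding that color reads use. It is not code from the Toolkit, and the function names are hypothetical. Each color (0 to 3) describes the transition between two adjacent bases. As a simplification, the sketch keeps the first base of the sequence as the leading letter, whereas real SOLiD reads start with a primer base.

```python
# Two-bit codes chosen so that the color of a base pair is the XOR of its codes.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_color_space(bases: str) -> str:
    """Return the leading base followed by one color digit per adjacent base pair."""
    colors = [str(BASE_TO_BITS[a] ^ BASE_TO_BITS[b])
              for a, b in zip(bases, bases[1:])]
    return bases[0] + "".join(colors)


def decode_color_space(read: str) -> str:
    """Rebuild the base sequence by applying each color to the previous base."""
    current = BASE_TO_BITS[read[0]]
    decoded = [read[0]]
    for color in read[1:]:
        current ^= int(color)
        decoded.append(BITS_TO_BASE[current])
    return "".join(decoded)


reference = encode_color_space("ATGGC")   # "A3103"
variant = encode_color_space("ATCGC")     # "A3233", one base changed (G -> C)
print(reference, variant, decode_color_space(reference))
```

Note that the single base change alters two adjacent colors, whereas a lone sequencing error typically alters only one. This property is one reason why tools built specifically for color space can tell real variants from read errors.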

The mapping module of the Color Space Toolkit, called Crema, follows the seed-and-extend paradigm underlying successful tools such as BLAST and SHRiMP. Crema indexes color reads using spaced seeds (also called neighbor seeds) and maps them approximately to a reference sequence database. The underlying data structures are extremely economical in memory use, yet still provide high flexibility for trade-offs between sensitivity and specificity. Crema has an innovative indexing technique and maps more reads than SHRiMP or BFAST at comparable trade-off settings.
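
The general idea of spaced-seed indexing can be sketched as follows. This is a toy illustration under stated assumptions, not Crema's actual indexing technique: the mask "1101", the function names and the example sequences are all hypothetical. A "1" in the mask marks a position that must match, and a "0" marks a position that is ignored, so a seed can still hit when a mismatch falls on an ignored position.

```python
from collections import defaultdict


def seed_key(seq: str, start: int, mask: str) -> str:
    """Collect the characters of seq under the '1' positions of the mask."""
    return "".join(seq[start + i] for i, m in enumerate(mask) if m == "1")


def build_read_index(reads: dict, mask: str) -> dict:
    """Map each spaced-seed key to the (read id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for read_id, read in reads.items():
        for offset in range(len(read) - len(mask) + 1):
            index[seed_key(read, offset, mask)].append((read_id, offset))
    return index


def candidate_hits(reference: str, index: dict, mask: str) -> set:
    """Scan the reference and report (read id, implied read start) candidates."""
    hits = set()
    for ref_pos in range(len(reference) - len(mask) + 1):
        key = seed_key(reference, ref_pos, mask)
        for read_id, offset in index.get(key, ()):
            hits.add((read_id, ref_pos - offset))
    return hits


reference = "0123012230"          # toy color-space reference
reads = {"r1": "231122"}          # reference[2:8] with one color changed
spaced = candidate_hits(reference, build_read_index(reads, "1101"), "1101")
contiguous = candidate_hits(reference, build_read_index(reads, "1111"), "1111")
print(spaced, contiguous)
```

In this toy case the read carries one color mismatch at a position the mask ignores, so the spaced seed still places the read at reference position 2, while a contiguous seed of the same span finds nothing. A real mapper would then pass such candidates to the extend step for verification.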

The fine alignment module of the Color Space Toolkit - called AMAP - uses a combination of information and algorithms to produce its results, including the quality scores from the sequencer. AMAP has two modes, a nucleotide mode (using a DNA mutation model) and a codon mode (using a protein mutation model). The codon mode is only suitable for analysing coding regions, and is offered as part of our Human Exome online analysis service.
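
To give a feel for how quality scores can enter a probabilistic alignment score, the sketch below uses the standard Phred definition (error probability = 10^(-Q/10)) in a deliberately simple likelihood. It is an assumption-laden illustration, not AMAP's model: the function names are hypothetical, substitutions are treated as equally likely, and no mutation model, gaps or color-space effects are included.

```python
import math


def phred_to_error_prob(quality: int) -> float:
    """Standard Phred relation: Q = -10 * log10(P_error)."""
    return 10 ** (-quality / 10)


def alignment_log_likelihood(read: str, ref: str, quals: list) -> float:
    """Log-likelihood of an ungapped alignment given per-position qualities.

    A match contributes log(1 - p_err); a mismatch contributes log(p_err / 3),
    assuming an error is equally likely to produce any of the 3 other symbols.
    """
    total = 0.0
    for read_sym, ref_sym, quality in zip(read, ref, quals):
        # Cap at 0.75 (a random call among 4 symbols) so log() stays finite.
        p_err = min(phred_to_error_prob(quality), 0.75)
        if read_sym == ref_sym:
            total += math.log(1 - p_err)
        else:
            total += math.log(p_err / 3)
    return total


# The same mismatch, once on a low-quality and once on a high-quality position.
low_quality = alignment_log_likelihood("ACGT", "ACCT", [30, 30, 5, 30])
high_quality = alignment_log_likelihood("ACGT", "ACCT", [30, 30, 40, 30])
print(low_quality > high_quality)
```

A mismatch on a low-quality position costs far less than the same mismatch on a high-quality position, which is how quality-aware scoring can keep a read with a few doubtful bases instead of discarding or trimming it.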

