This page contains all the information needed to install and run Twin.

What is Twin?


Twin is a program that aligns in-silico digested contigs to an optical map.


What type of input is expected for Twin?


The Twin executable consumes two files:

  • A binary file containing a sequence of 32-bit integers, each representing a fragment size in bp. This file can be created with the included script om2bytes.py from a text file in the following format: each line contains two decimal numbers, the size of the fragment and its standard deviation (both in kb), separated by white space. The standard deviation is ignored. (This is the same file format used by the match executable distributed with SOMA v2.)
  • A text file containing in-silico digested contigs. This file contains pairs of lines. The first line in each pair contains an identifier, the contig length in bp, and the number of restriction sites, separated by white space. The second line contains a white space delimited list of the restriction site positions. (This is also the format used by the match program distributed with SOMA v2.) These files can be produced from an assembly in FASTA format using the included digest.py script. (Usage example for the AflII enzyme, which cuts 1 bp after the beginning of its recognition sequence: "digest.py contigs.fasta CTTAAG 1 > contigs.silico")
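The in-silico contig format described above can be illustrated with a minimal Python sketch. This is not the included digest.py. The function names are hypothetical, and the sketch assumes that only the forward strand is scanned and that each reported position is the 0-based start of the recognition sequence plus the cut offset. The real script may use a different coordinate convention.

```python
import re

def restriction_sites(sequence, recognition_seq, cut_offset):
    """Return cut positions for every occurrence of recognition_seq.

    Assumption: position = 0-based match start + cut_offset, forward strand only.
    A lookahead is used so that overlapping occurrences are also found.
    """
    pattern = re.compile("(?=" + re.escape(recognition_seq.upper()) + ")")
    return [m.start() + cut_offset for m in pattern.finditer(sequence.upper())]

def silico_record(name, sequence, recognition_seq, cut_offset):
    """Format one contig as the two-line record described in the text."""
    sites = restriction_sites(sequence, recognition_seq, cut_offset)
    # Line 1: identifier, contig length in bp, number of restriction sites.
    header = f"{name} {len(sequence)} {len(sites)}"
    # Line 2: white space delimited list of restriction site positions.
    return header + "\n" + " ".join(str(s) for s in sites)

# Toy contig containing the recognition sequence CTTAAG twice.
record = silico_record("contig1", "AACTTAAGGGGCTTAAGTT", "CTTAAG", 1)
print(record)
```

For real data, use the included digest.py, which defines the positions Twin actually expects.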

Obtaining and installing Twin


Twin can be obtained from the Download page.

Installation

Twin is known to require the following packages be installed:
Notes that may help with installation on CentOS 6 can be found here.

After installing these packages, modify the Makefile to point to your install locations for Boost and sdsl-lite. (It is probably easiest to use your Linux distribution's version of Boost rather than building it yourself.) To install Twin from the source package, unpack the tarball, change to the new directory, and build as follows:

tar zxvf twin-1.0.tar.gz
cd twin/
make


Using Twin


A tutorial is available to supplement the following documentation.

The Twin executable reads in-silico digested contigs in the same file format as SOMA v2. The included script "om2bytes.py" must be used to convert SOMA-formatted optical maps into a binary file, which is the format Twin uses for optical map input. Redirect Twin's standard output to a file (e.g. "twin --silico_map insilico_contigs.txt --opt_map optical_map.bin > twin_out.txt"). The included script "twin2psl.py" converts a file containing Twin's standard output into .psl format.
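To make the optical map conversion step more concrete, here is a minimal Python sketch of the kind of transformation om2bytes.py performs. It is not the included script. It assumes that sizes in kb are multiplied by 1000 and rounded to the nearest bp, and that the 32-bit integers are written unsigned and little-endian. The byte order, signedness, and rounding actually expected by Twin are defined by om2bytes.py itself.

```python
import struct

def optical_map_to_bytes(lines):
    """Convert SOMA-style optical map lines to packed 32-bit integers.

    Each non-empty line holds "<size_kb> <std_dev_kb>"; the standard
    deviation is ignored, as described in the text.
    Assumptions: kb -> bp by rounding, unsigned little-endian output.
    """
    sizes_bp = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        size_kb = float(fields[0])
        sizes_bp.append(int(round(size_kb * 1000)))
    return struct.pack(f"<{len(sizes_bp)}I", *sizes_bp)

# Three fragments of 12.5 kb, 3.2 kb, and 40 kb.
example = ["12.5 0.4", "3.2 0.1", "", "40.0 1.0"]
blob = optical_map_to_bytes(example)
print(len(blob), struct.unpack("<3I", blob))
```

For actual runs, always produce the binary file with the included om2bytes.py so that the layout matches what Twin reads.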

The following is a detailed description of the options used to control Twin:


  --help                     produce help message
  --verbose                  show successful steps in approximate backtracking 
                             search
  --opt_map arg              REQUIRED set optical map binary file
  --silico_map arg           REQUIRED set in-silico digested contigs file
  --fval arg                 precision/recall tradeoff (default 4.0)
  --search_radius arg        radius around silico fragment size that should be 
                             searched for optmap candidates (i.e. tolerance) 
                             (default 1000)
  --largest_maybe_frag arg   size below which TWIN should consider discarding 
                             in-silico digested fragments (default 1000)
  --smallest_frag_length arg size below which in-silico digested fragments 
                             should be always discarded (default 250)
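
When Twin is run from a pipeline, the options above can be assembled programmatically. The following Python sketch is one possible wrapper and is not part of the Twin distribution. The helper names are hypothetical, and it assumes the executable is named "twin" and is on the PATH. Only the flags listed above are used.

```python
import subprocess

# Numeric options documented in the option list above.
DOCUMENTED_OPTIONS = {
    "fval", "search_radius", "largest_maybe_frag", "smallest_frag_length",
}

def build_twin_command(silico_map, opt_map, verbose=False, **options):
    """Build the argument list for a Twin run (hypothetical helper)."""
    unknown = set(options) - DOCUMENTED_OPTIONS
    if unknown:
        raise ValueError(f"undocumented option(s): {sorted(unknown)}")
    cmd = ["twin", "--silico_map", silico_map, "--opt_map", opt_map]
    if verbose:
        cmd.append("--verbose")
    for name in sorted(options):  # sorted for a reproducible command line
        cmd += [f"--{name}", str(options[name])]
    return cmd

def run_twin(output_path, silico_map, opt_map, **kwargs):
    """Run Twin and redirect its standard output to output_path."""
    with open(output_path, "w") as out:
        subprocess.run(build_twin_command(silico_map, opt_map, **kwargs),
                       stdout=out, check=True)

cmd = build_twin_command("insilico_contigs.txt", "optical_map.bin",
                         search_radius=500, fval=4.0)
print(" ".join(cmd))
```

Calling run_twin("twin_out.txt", "insilico_contigs.txt", "optical_map.bin") would mirror the shell redirection shown earlier. The resulting file can then be passed to twin2psl.py.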


FAQ


Questions and answers will be posted here as they arise in the use of Twin.