Cap3 

Cap3 is a fragment assembly program written by Xiaoqiu Huang <xqhuang@cs.iastate.edu>. 

The quick-and-dirty on how to use it is:

(1) Log in to the Unix computer on which it is installed.

(2) Combine all of your sequence fragments into a single FASTA file.

(3) Type:

    cap3 frag.file > cap3.log

The output consists of a file of contigs (ending in .contigs) and a file of unused fragments (ending in .singlets). The file cap3.log records the details of how and why the contigs were selected.
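Step (2) can be scripted when the fragments are spread over several FASTA files. The following is a minimal sketch, not part of CAP3; the function name `combine_fasta` and the input file names are hypothetical, and the only assumption about the format is that each record starts with a `>` header line.

```python
from pathlib import Path


def combine_fasta(inputs, output):
    """Concatenate FASTA files into one file.

    Blank lines are dropped and every line is newline-terminated, so a
    file that lacks a trailing newline cannot glue its last sequence
    line onto the next header. Returns the number of records written.
    """
    count = 0
    with open(output, "w") as out:
        for path in inputs:
            for line in Path(path).read_text().splitlines():
                line = line.rstrip()
                if not line:
                    continue  # skip blank lines
                if line.startswith(">"):
                    count += 1  # each header line starts one record
                out.write(line + "\n")
    return count
```

For example, `combine_fasta(["a.fasta", "b.fasta"], "frag.file")` would produce the `frag.file` used in step (3).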

 

The following is taken directly from the cap3 README file written by the author.

Detailed documentation on CAP3 usage.

Usage: cap3 File_of_reads [options]

File_of_reads is a file of DNA reads in FASTA format

If the file of reads is named 'xyz', then
the file of quality values must be named 'xyz.qual',
and the file of constraints named 'xyz.con'.

Options (default values):

-a N specify band expansion size N > 10 (20)
-b N specify base quality cutoff for differences N > 15 (20)
-c N specify base quality cutoff for clipping N > 5 (10)
-d N specify max qscore sum at differences N > 100 (250)
-e N specify extra number of differences N > 10 (20)
-g N specify gap penalty factor N > 0 (6)
-m N specify match score factor N > 0 (2)
-n N specify mismatch score factor N < 0 (-5)
-o N specify overlap length cutoff N > 20 (30)
-p N specify overlap percent identity cutoff N > 65 (75)
-s N specify overlap similarity score cutoff N > 100 (500)
-u N specify min number of constraints for correction N > 0 (4)
-v N specify min number of constraints for linking N > 0 (2)
-x N specify prefix string for output file names (cap)
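The bounds listed above can be checked before a run is launched. This is a sketch only: `CAP3_BOUNDS` and `build_cap3_command` are hypothetical names, the limits are copied from the table as printed and treated as strict inequalities, and the function merely builds an argument list without executing anything.

```python
# (comparison, limit) for each numeric option, copied from the options table.
CAP3_BOUNDS = {
    "-a": (">", 10), "-b": (">", 15), "-c": (">", 5), "-d": (">", 100),
    "-e": (">", 10), "-g": (">", 0), "-m": (">", 0), "-n": ("<", 0),
    "-o": (">", 20), "-p": (">", 65), "-s": (">", 100),
    "-u": (">", 0), "-v": (">", 0),
}


def build_cap3_command(reads_file, options=None):
    """Return an argument list for cap3; raise ValueError on a bad option."""
    argv = ["cap3", reads_file]
    for flag, value in (options or {}).items():
        if flag == "-x":
            # -x takes a prefix string, so there is no numeric bound to check.
            argv += [flag, str(value)]
            continue
        if flag not in CAP3_BOUNDS:
            raise ValueError(f"unknown option {flag}")
        op, limit = CAP3_BOUNDS[flag]
        in_range = value > limit if op == ">" else value < limit
        if not in_range:
            raise ValueError(f"{flag} must be {op} {limit}, got {value}")
        argv += [flag, str(value)]
    return argv
```

The returned list can be handed to whatever process launcher is in use; options that are omitted keep the defaults shown in parentheses.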

If no quality file is given, then a default quality value of 10 is used for each base.

Input to CAP3

CAP3 takes as input a file of sequence reads in FASTA format. If the names of reads contain a dot ('.'), CAP3 requires that the names of reads sequenced from the same subclone contain the same substring up to the first dot. CAP3 takes two optional files: a file of quality values in FASTA format and a file of forward-reverse constraints.

The file of quality values must be named "xyz.qual", and the file of forward-reverse constraints must be named "xyz.con", where "xyz" is the name of the sequence file. CAP3 uses the same format of a quality file as Phrap.
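The naming rule and the quality-file layout can be illustrated with a short sketch. The helper names `companion_files` and `format_qual_record` are hypothetical, and the layout (a `>` header line followed by whitespace-separated integer quality values) is an assumption based on the Phrap-style quality files mentioned above.

```python
def companion_files(reads_file):
    """Return the file names CAP3 expects next to the sequence file."""
    return {"qual": reads_file + ".qual", "con": reads_file + ".con"}


def format_qual_record(name, scores, per_line=20):
    """Format one read's quality values as a FASTA-style record.

    Assumed layout: a '>' header line, then the integer scores
    separated by spaces, wrapped at per_line values per line.
    """
    lines = [">" + name]
    for i in range(0, len(scores), per_line):
        lines.append(" ".join(str(s) for s in scores[i:i + per_line]))
    return "\n".join(lines) + "\n"
```

For a sequence file named `xyz`, `companion_files("xyz")` gives `xyz.qual` and `xyz.con`; the quality record for each read should carry the same name as the read in the sequence file.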

Each line of the constraint file specifies one forward-reverse constraint of the form:

ReadA ReadB MinDistance MaxDistance

where ReadA and ReadB are the names of two reads, and MinDistance and MaxDistance are distances (integers) in base pairs. The constraint is satisfied if ReadA in forward orientation occurs in a contig before ReadB in reverse orientation, or ReadB in forward orientation occurs in a contig before ReadA in reverse orientation, and their distance is between MinDistance and MaxDistance. CAP3 works better when more constraints are supplied.
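The satisfaction rule can be written out as a small check. This is a sketch under stated assumptions, not CAP3's own code: the `Placement` record and `check_constraint` are hypothetical, "before" is taken to mean a smaller start position, and the distance is assumed to run from the start of the forward read to the end of the reverse read, which may differ from how CAP3 measures it.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    contig: str
    start: int   # leftmost position of the read in the contig
    end: int     # rightmost position of the read in the contig
    strand: str  # "+" for forward, "-" for reverse


def check_constraint(a, b, min_dist, max_dist):
    """Classify a forward-reverse constraint between two placed reads."""
    if a.contig != b.contig:
        return "unsatisfied"
    # Try both roles: (a forward, b reverse) and (b forward, a reverse).
    for fwd, rev in ((a, b), (b, a)):
        if fwd.strand == "+" and rev.strand == "-" and fwd.start < rev.start:
            distance = rev.end - fwd.start  # assumed distance definition
            if min_dist <= distance <= max_dist:
                return "satisfied"
            return "unsatisfied in distance"
    return "unsatisfied"
```

The order in which the two reads are passed does not matter, mirroring the symmetric wording of the rule.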

We have a separate program named "formcon" to generate a constraint file from the sequence file. The program takes an input file of fragments in FASTA format and two integers (minimum distance and maximum distance in bp). The minimum and maximum distances specify a lower and an upper limit on the subclone length, respectively. It produces a file of forward-reverse constraints for CAP3. It is assumed that the names of a forward read and its reverse read each contain a dot and are identical up to the first dot. Because CAP3 uses reads whose ends are clipped, instead of raw reads, to measure their distance, the distance seen by CAP3 could differ from the insert size by 1000 to 1500 bp. For example, if the insert size is 2000 to 3000 bp, we recommend that you use 500 for the minimum distance and 4000 for the maximum distance. The results are in the file with name ending in ".con".
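The pairing rule can be sketched as follows. This is not the formcon program itself: `read_names` and `make_constraints` are hypothetical helpers, and the way forward and reverse reads are told apart (the part after the first dot starting with "F" or "R") is an assumption based on names such as CPBKY55.F and CPBKY55.R; formcon may use a different rule.

```python
def read_names(fasta_lines):
    """Yield read names (the first word of each FASTA header line)."""
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                yield fields[0]


def make_constraints(names, min_dist, max_dist):
    """Pair reads that share a name up to the first dot.

    Assumption: the suffix after the first dot starts with F (forward)
    or R (reverse). Reads without a dot or without a mate are skipped.
    Returns constraint lines of the form 'ReadA ReadB Min Max'.
    """
    forward, reverse = {}, {}
    for name in names:
        if "." not in name:
            continue
        template, suffix = name.split(".", 1)
        if suffix[:1].upper() == "F":
            forward.setdefault(template, name)
        elif suffix[:1].upper() == "R":
            reverse.setdefault(template, name)
    return [
        f"{forward[t]} {reverse[t]} {min_dist} {max_dist}"
        for t in forward
        if t in reverse
    ]
```

With the recommendation above for a 2000 to 3000 bp insert, the call would be `make_constraints(names, 500, 4000)`, and the resulting lines would be written to the ".con" file.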

 

Output from CAP3


Assembly results in CAP format go to the standard output and need to be redirected to a file. Note that clipped 5' and 3' sequences of reads are not shown in CAP3 format output.

CAP3 also produces assembly results in ace file format (".ace"). This allows CAP3 output to be viewed in Consed. Note that clipped 5' and 3' sequences of reads are shown in ace format output.

CAP3 saves consensus sequences in file ".contigs" and their quality values in file ".contigs.qual". Reads that are not used in assembly are put in file ".singlets". Additional information about assembly is given in file ".info".

The CAP3 program reports whether each constraint is satisfied or not. The report is in file ".results". A sample report file is given here:

CPBKY55.F CPBKY55.R 500 6000 3210 satisfied
CPBKY92.F CPBKY92.R 500 6000 497 unsatisfied in distance
CPBKY28.F CPBKY28.R 500 6000 unsatisfied
CPBKY56.F CPBKY56.R 500 6000 10th link between CPBKI23.F+ and CPBKT37.R-
CPBKY70.F CPBKY70.R 500 6000 4th overlap between CPBKM47.F+ and CPBKN28.R-

The first four columns are simply taken from the constraint file. Line 1 indicates that the constraint is satisfied, where the actual distance between the two reads is given in the fifth column. Line 2 indicates that the constraint is not satisfied in distance, that is, the two reads in opposite orientation occur in the same contig, but their distance (given in the fifth column) is out of the given range. Line 3 indicates that the constraint is not satisfied. Line 4 indicates that this constraint is the 10th one that links two contigs, where the 3' read of one contig is "CPBKI23.F" in plus orientation and the 5' read of the other is "CPBKT37.R" in minus orientation. This information suggests that the two contigs should go together in the gap closure phase. Line 5 indicates that the constraint is the 4th constraint supporting an overlap between CPBKM47.F and CPBKN28.R. The overlap is not implemented in the current assembly.
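A report in this layout can be read line by line. The parser below is a minimal sketch based only on the five sample lines above; `parse_results_line` and the dictionary keys are hypothetical, and it assumes that a fifth column made up entirely of digits is the measured distance.

```python
def parse_results_line(line):
    """Split one report line into the constraint, distance, and status."""
    fields = line.split()
    if len(fields) < 5:
        raise ValueError(f"too few columns: {line!r}")
    record = {
        "read_a": fields[0],
        "read_b": fields[1],
        "min_distance": int(fields[2]),
        "max_distance": int(fields[3]),
        "distance": None,  # stays None when no distance is reported
    }
    rest = fields[4:]
    if rest[0].isdigit():
        # A purely numeric fifth column is taken as the actual distance.
        record["distance"] = int(rest[0])
        rest = rest[1:]
    record["status"] = " ".join(rest)
    return record
```

Ordinals such as "10th" are not purely numeric, so link and overlap lines keep their full text in the status field.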

All The Details

The complete help files for cap3, with more details, are located on genome.chmcc.org in the file /usr/local/gcg/doc/cap3.help