11/11/13: A new version of HapCUT (v0.6) is available for download (64-bit Linux binary, compatible with GLIBC 2.3 or greater). HapCUT can now generate haplotypes from fosmid pooled sequencing data and from ligation-based mate-pair sequencing data.
The latest source code for HapCUT is available on GitHub (https://github.com/vibansal/hapcut).
HapCUT is a max-cut based algorithm for haplotype assembly using sequence reads from the two chromosomes of an individual. It can be applied to sequence data generated from next-generation sequencing platforms. HapCUT takes as input the aligned SAM/BAM files for an individual diploid genome and the list of variants (VCF file), and outputs the phased haplotype blocks that can be assembled from the sequence reads.
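To make the haplotype assembly problem concrete, here is a minimal Python sketch on a toy dataset. It is not HapCUT's max-cut algorithm: it simply enumerates every candidate haplotype and keeps the one that disagrees with the fewest read alleles, which is only feasible for a handful of variants. The scoring used here (each fragment is charged its mismatches against the closer of the two complementary haplotypes, often called the minimum error correction or MEC score) is assumed as the objective for illustration, and all names (fragment_errors, mec_score, best_haplotype) and the fragment data are hypothetical.

```python
from itertools import product

# Each fragment (read or read pair) maps a variant index to the allele it
# carries at that heterozygous site: 0 = reference, 1 = alternate.
# Toy data, made up for illustration only.
fragments = [
    {0: 0, 1: 1, 2: 0},
    {0: 1, 1: 0},
    {1: 1, 2: 0, 3: 1},
    {2: 1, 3: 0},
    {0: 0, 3: 0},  # conflicts with the other fragments (e.g. a sequencing error)
]


def fragment_errors(fragment, haplotype):
    """Mismatches between a fragment and the closer of the two haplotypes.

    The second haplotype of a diploid individual is the complement of the
    first at heterozygous sites, so its mismatch count is len - mismatches.
    """
    mismatches = sum(1 for pos, allele in fragment.items()
                     if haplotype[pos] != allele)
    return min(mismatches, len(fragment) - mismatches)


def mec_score(fragments, haplotype):
    """Total number of allele calls that must be flipped to fit the haplotype."""
    return sum(fragment_errors(f, haplotype) for f in fragments)


def best_haplotype(fragments, n_variants):
    """Brute-force search over haplotypes (exponential; toy sizes only)."""
    best, best_score = None, None
    # Fix the first allele to 0: a haplotype and its complement score the same.
    for rest in product((0, 1), repeat=n_variants - 1):
        hap = (0,) + rest
        score = mec_score(fragments, hap)
        if best_score is None or score < best_score:
            best, best_score = hap, score
    return best, best_score


hap, score = best_haplotype(fragments, 4)
print(hap, score)  # prints: (0, 1, 0, 1) 1
```

The result says that one chromosome carries alleles 0, 1, 0, 1 at the four sites, the other carries the complement 1, 0, 1, 0, and one allele call in the reads is inconsistent with this phasing. Real datasets have far too many variants per block for exhaustive search, which is why a heuristic such as the max-cut approach is needed.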
The HapCUT method is described in: Bansal V, Bafna V. HapCUT: an efficient and accurate algorithm for the haplotype assembly problem. Bioinformatics. 2008 Aug 15;24(16):i153-9. PMID: 18689818.
HapCUT has been applied to phase Craig Venter's genome, which was sequenced using Sanger sequencing technology.
A sample dataset from NA18508 (BAM file for a region on chromosome 20, VCF file, and HapCUT output files) can be downloaded from the attachments (HAPCUT-testdata.tar.gz) and used to test HapCUT.
Vince Buffalo from UC Davis has written a Python package called readphaser that converts the phased HapCUT output into FASTA files of phased/unphased reads. Users of HapCUT may find this useful.